Understanding the formulation of SVMs
Kevin Li, CS 189 Spring 2022
In discussion, we started with the assumption that the constraint for SVMs should be:
But what does this mean? And what is c anyway?
It’s hard to come up with an intuitive visualization of c. Instead, it might be easier to think about this alternate constraint (which we’ll see later turns out to be equivalent):
Remember that ∣∣w∣∣yi(xi⋅w+α) is the formula for the distance from any point xi to the decision boundary (hyperplane) defined by w and α. So this constraint is saying that we want all of our training points to be at least distance M away from the boundary. In other words, M plays the role of the margin!
Using this definition, the SVM optimization problem is fairly easy to write:
Note that the only difference between the constraint above and the one used in the discussion worksheet is that this version has an extra ∣∣w∣∣ in the denominator of the left-hand side. So basically, M=∣∣w∣∣c, or equivalently c=∣∣w∣∣⋅M. In other words, c is just the margin scaled by some additional value.
Equivalent formulation in terms of c
(what was used on the discussion worksheet)
Visually, the problem looks like this:
(As a side note, the diagram above might help you see why there must always be a point that lies on the margin on both the positive and negative sides of the boundary. There are two opposing goals here: we’re trying to maximize M, but our training points still need to stay at least distance M away from the boundary. So naturally, we should just keep increasing M until we’re forced to stop — or geometrically, keep expanding the dotted lines until they touch the training data!)
Why don’t we just stop at this simpler formulation? The issue is that we’ve introduced an additional variable M (or c) which makes the problem a little more complicated to optimize.
Luckily, there’s a trick we can use: we can take advantage of the fact that the ∣∣w∣∣ in the constraint ∣∣w∣∣yi(xi⋅w+α)≥M doesn’t really matter. For example, we can arbitrarily change ∣∣w∣∣ to be 10⋅∣∣w∣∣, and adjust w and α accordingly (since we have full control over those variables), so that nothing changes about our constraint:
(The 10’s cancel out and we’re left with the same expression as before.)
Since ∣∣w∣∣ can be whatever we want, why not define it as M1? This seems arbitrary, but it’s nice because it allows us to cancel out the M’s entirely:
And since ∣∣w∣∣=M1, we can replace the M in the objective as well, which leaves us with: