Understanding the formulation of SVMs
Kevin Li, CS 189 Spring 2022


In discussion, we started with the assumption that the constraint for SVMs should be:

yi(xiw+α)cy_i (x_i \cdot w + \alpha) \geq c


But what does this mean? And what is cc anyway?

It’s hard to come up with an intuitive visualization of cc. Instead, it might be easier to think about this alternate constraint (which we’ll see later turns out to be equivalent):

yi(xiw+α)wM\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M


Remember that yi(xiw+α)w\frac{y_i(x_i \cdot w + \alpha)}{||w||} is the formula for the distance from any point xix_i to the decision boundary (hyperplane) defined by ww and α\alpha. So this constraint is saying that we want all of our training points to be at least distance MM away from the boundary. In other words, MM plays the role of the margin!

Using this definition, the SVM optimization problem is fairly easy to write:

maxw,α,MM\max \limits_{w, \alpha, M} M

s.t. yi(xiw+α)wM\text{s.t. }\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M



Note that the only difference between the constraint above and the one used in the discussion worksheet is that this version has an extra w||w|| in the denominator of the left-hand side. So basically, M=cwM = \frac{c}{||w||}, or equivalently c=wMc = ||w|| \cdot M. In other words, cc is just the margin scaled by some additional value.

Simpler formulation

Equivalent formulation in terms of cc

(what was used on the discussion worksheet)

maxw,α,MM\max \limits_{w, \alpha, M} M

s.t. yi(xiw+α)wM\text{s.t. }\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M

maxw,α,ccw\max \limits_{w, \alpha, c} \frac{c}{||w||}

s.t. yi(xiw+α)c\text{s.t. }y_i(x_i \cdot w + \alpha) \geq c


Visually, the problem looks like this:
 
(As a side note, the diagram above might help you see why there must always be a point that lies on the margin on both the positive and negative sides of the boundary. There are two opposing goals here: we’re trying to maximize MM, but our training points still need to stay at least distance MM away from the boundary. So naturally, we should just keep increasing MM until we’re forced to stop — or geometrically, keep expanding the dotted lines until they touch the training data!)
 


Why don’t we just stop at this simpler formulation? The issue is that we’ve introduced an additional variable MM (or cc) which makes the problem a little more complicated to optimize.

Luckily, there’s a trick we can use: we can take advantage of the fact that the w||w|| in the constraint yi(xiw+α)wM\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M doesn’t really matter. For example, we can arbitrarily change w||w|| to be 10w10 \cdot ||w||, and adjust ww and α\alpha accordingly (since we have full control over those variables), so that nothing changes about our constraint:

yi(xi(10w)+(10α))10wM\frac{y_i(x_i \cdot (10 w) + (10 \alpha))}{10 \cdot ||w||} \geq M

(The 10’s cancel out and we’re left with the same expression as before.)

Since w||w|| can be whatever we want, why not define it as 1M\frac{1}{M}? This seems arbitrary, but it’s nice because it allows us to cancel out the MM’s entirely:

yi(xiw+α)1/MM\frac{y_i(x_i \cdot w + \alpha)}{1/M} \geq M

yi(xiw+α)1y_i(x_i \cdot w + \alpha) \geq 1


And since w=1M||w|| = \frac{1}{M}, we can replace the MM in the objective as well, which leaves us with:

maxw,α1w\max \limits_{w, \alpha} \frac{1}{||w||}