Understanding the formulation of SVMs
Kevin Li, CS 189 Spring 2022

In discussion, we started with the assumption that the constraint for SVMs should be:

## $y_i (x_i \cdot w + \alpha) \geq c$

But what does this mean? And what is $c$ anyway?

It’s hard to come up with an intuitive visualization of $c$. Instead, it might be easier to think about this alternate constraint (which we’ll see later turns out to be equivalent):

## $\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M$

Remember that $\frac{y_i(x_i \cdot w + \alpha)}{||w||}$ is the distance from a training point $x_i$ to the decision boundary (hyperplane) defined by $w$ and $\alpha$; the label $y_i \in \{+1, -1\}$ flips the sign so that the expression is positive exactly when $x_i$ is classified correctly. So this constraint says that every training point must be at least distance $M$ away from the boundary, on the correct side. In other words, $M$ plays the role of the margin!
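To make the distance formula concrete, here is a small Python sketch; the hyperplane $(w, \alpha)$, the training points, and the candidate margin $M$ are all made-up numbers for illustration:

```python
import math

def signed_distance(x, y, w, alpha):
    """Distance from point x (with label y in {+1, -1}) to the hyperplane
    w . x + alpha = 0; positive exactly when x is classified correctly."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return y * (dot + alpha) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D training set and hyperplane (||w|| = 5).
w, alpha = [3.0, 4.0], -5.0
points = [([3.0, 2.0], 1), ([0.0, 0.0], -1)]

M = 0.8  # candidate margin
distances = [signed_distance(x, y, w, alpha) for x, y in points]
print(distances)                       # [2.4, 1.0]
print(all(d >= M for d in distances))  # True: every point is at least M away
```

The constraint is satisfied here, but note that $M = 0.8$ is not the *best* margin for this data; the optimization problem below is about pushing $M$ as high as the constraints allow.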

Using this definition, the SVM optimization problem is fairly easy to write:

## $\max_{w, \alpha, M} \; M \quad \text{s.t. } \frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M$

Note that the only difference between the constraint above and the one used in the discussion worksheet is that this version has an extra $||w||$ in the denominator of the left-hand side. So basically, $M = \frac{c}{||w||}$, or equivalently $c = ||w|| \cdot M$. In other words, $c$ is just the margin scaled by some additional value.
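To spell out why the two constraints are interchangeable: $||w||$ is positive, so dividing both sides of the $c$-constraint by it preserves the inequality:

```latex
y_i(x_i \cdot w + \alpha) \geq c
\quad\Longleftrightarrow\quad
\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq \frac{c}{||w||} = M
```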

## Equivalent formulation in terms of $c$

(what was used on the discussion worksheet)

## $\max_{w, \alpha, c} \; \frac{c}{||w||} \quad \text{s.t. } y_i(x_i \cdot w + \alpha) \geq c$

Visually, the problem looks like this: (diagram: the decision boundary, with dotted lines at distance $M$ on either side marking the margin)
(As a side note, the diagram might help you see why there must always be a point lying exactly on the margin on both the positive and negative sides of the boundary. There are two opposing goals here: we're trying to maximize $M$, but every training point must stay at least distance $M$ away from the boundary. So naturally, we should keep increasing $M$ until we're forced to stop; geometrically, we keep expanding the dotted lines until they touch the training data!)

Why don’t we just stop at this simpler formulation? The issue is that we’ve introduced an additional variable $M$ (or $c$) which makes the problem a little more complicated to optimize.

Luckily, there's a trick we can use: the $||w||$ in the constraint $\frac{y_i(x_i \cdot w + \alpha)}{||w||} \geq M$ doesn't really matter. For example, we can scale $w$ to $10w$ and $\alpha$ to $10\alpha$ (we have full control over those variables), which changes $||w||$ to $10 \cdot ||w||$ but leaves the constraint itself unchanged:

## $\frac{y_i(x_i \cdot (10w) + (10\alpha))}{10 \cdot ||w||} \geq M$

(The 10’s cancel out and we’re left with the same expression as before.)
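Here is a quick numerical check of this scale invariance (the hyperplane and point are made up): scaling $w$ and $\alpha$ by the same positive constant leaves the left-hand side untouched:

```python
import math

def margin_lhs(x, y, w, alpha):
    """Left-hand side of the margin constraint: y * (x . w + alpha) / ||w||."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return y * (dot + alpha) / math.sqrt(sum(wi * wi for wi in w))

x, y = [3.0, 2.0], 1
w, alpha = [3.0, 4.0], -5.0

original = margin_lhs(x, y, w, alpha)
scaled = margin_lhs(x, y, [10 * wi for wi in w], 10 * alpha)
print(original == scaled)  # True: the factor of 10 cancels out
```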

Since $||w||$ can be whatever we want, why not fix it to be $\frac{1}{M}$? This choice seems arbitrary, but it's convenient because it cancels out the $M$'s entirely: multiplying both sides of the constraint by $||w||$ gives $y_i(x_i \cdot w + \alpha) \geq M \cdot ||w|| = M \cdot \frac{1}{M} = 1$, or simply:

## $y_i(x_i \cdot w + \alpha) \geq 1$
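As a sanity check on this rescaling (all numbers made up): take a $(w, \alpha)$ whose closest training point is at distance $M$, divide both by $M \cdot ||w||$, and the rescaled weights satisfy $||w|| = \frac{1}{M}$ while that closest point lands exactly on $y_i(x_i \cdot w + \alpha) = 1$:

```python
import math

w, alpha = [3.0, 4.0], -5.0   # hypothetical hyperplane, ||w|| = 5
x, y = [3.0, 2.0], 1          # this point is distance 2.4 from the boundary
M = 2.4

# Rescale w and alpha by 1 / (M * ||w||) so that the new norm is 1 / M.
s = 1.0 / (M * math.sqrt(sum(wi * wi for wi in w)))
w2 = [s * wi for wi in w]
alpha2 = s * alpha

norm_w2 = math.sqrt(sum(wi * wi for wi in w2))
lhs = y * (sum(wi * xi for wi, xi in zip(w2, x)) + alpha2)
print(abs(norm_w2 - 1 / M) < 1e-9)  # True: ||w|| is now 1/M
print(round(lhs, 9))                # 1.0: the point sits right on the margin
```

Note that the rescaling changes nothing about the decision boundary itself, only how we write it down.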

And since $||w|| = \frac{1}{M}$, we can replace the $M$ in the objective as well, which leaves us with: