7  Maximum Margin Classifier

7.1 Introduction

Linear maximum margin classifiers which are also known as linear support vector machines allow us to classify binary data using a geometric approach. In two dimensions, we can use a simple line that separates the classes (under the assumption that they are indeed separable) and additionally maximizes the margins between those classes. In higher dimensions, the line is replaced by a hyperplane.

The goal of this chapter is to gain an intuition about how such a line can be derived visually and analytically.

Throughout the exercises, the following libraries are used for creating figures:

library("tidyverse")
library("ggtext")

7.2 Exercises

Exercise 7.1 (Visual derivation) To solve the first exercise, you can either draw everything in a hand sketched figure, or create your own figures with the {ggplot} library. Consider the following data set comprised of ten data points and two classes with labels -1 and 1.

data <- tibble(x = c(-1,-1,0,0,1,2,2,2.5,3,4),
            y = c(-1,1,1,0,0,2.5,2,3,2.5,4),
            label = factor(c(rep(1,5),rep(-1,5)))
            )
  1. Generate a scatter plot visualizing the data points and their respective classes. Hint: You can visualize the classes using colors or shapes.

  2. To find the optimal separation line geometrically, it is often useful to consider the convex hull of the dataset. Start out by drawing the convex hull in the figure generated in Exercise 1.

    Recall, that the convex hull of a set of points X is defined as the minimal convex set containing X. To create a convex hull with ggplot, consider the following example:

    Example: Let X={(0,0),(0.25,0.75),(0.5,0.5),(1,1),(1,0)}.

    data_example <- tibble(x1 = c(0,0.25,0.5,1,1),
                           x2 = c(0,0.75,0.5,0,1),
                           label = factor(rep(1,5))
                           )
    p_example <- data_example %>% ggplot(aes(x=x1,y=x2)) +
      geom_point(size = 2)+
      theme_minimal()
    p_example

    Then, the convex hull can be generated as follows:

    hull_example <- data_example %>% group_by(label) %>% slice(chull(x1,x2))
    
    p_example +
      geom_polygon(data = hull_example,
                   aes(x=x1, y=x2,color = label, fill = label),
                   alpha = 0.3)

  3. The line with the maximal margin is defined by the line with minimal distance between the points of the different classes. Find these two points on the convex hulls you have just drawn/plotted and label them with c1 and c1 (for the classes with label 1 and 1, respectively). Note that every point on a convex hull is a possible candidate, and they do not necessarily need to correspond with the data points.

  4. Connect the points c1 and c1 with a line perpendicular to the points.

  5. The separation line passes through the center of the line you have just drawn and is orthogonal to it, i.e., the two lines enclose a 90° angle. Draw/plot the separation line s, the line l1 that passes through the support vectors of the class with label 1, and the line l1 that passes through the support vector of the class with label 1.

  6. Add two arbitrary points from each class to the feature space, so that the separation line s found in the previous exercise does not change.

  7. Start fresh with the same data and add a new data point x5 that belongs to a class of your choice so that the new margin between the two classes is equal to 1. As before, sketch/plot the convex hull, the two points points c~1 and c~1 on the convex hull, and the three lines l~1,l~1, and s~ as in Exercise 1.-4.

Exercise 7.2 (Analytical derivation of a maximum margin classifier) From the previous exercise, we know what kind of separation line we can expect when using support vector machines and how to find it graphically. However, determining the separation line analytically is difficult, even for the simple problem of . The number of variables and conditions make this task impractical for a “Pen and Paper” exercise. However, to still get a basic idea of the linear SVM algorithm, we consider an even simpler problem with only two points x1,x2R2 that each belongs to their own class. The two points are given by

x1=(11)ω1={xi:Ti=1},x2=(23)ω1={xi:Ti=1}.

Since both points are the single representative of their respective classes, they are also the support vectors. Furthermore, they are located on the margin. The goal of this exercise is to find the parameters of a separation line with maximal margin.

Recall from the lecture, that the dual problem is given by

LD=i=12αi12i=12j=12αiαjTiTjxixj

subject to the constraint

i=12αiTi=0.

Technically, we also need the constraint αi0,i. However, to keep things simple, we assume that this is satisfied here.

  1. Set up the Lagrange function by plugging all the values into LD and the constraint above. Subsequently, simplify the terms.

  2. To maximize the term LD under the constraint i=12αiTi=0, we need an additional Lagrange function with Lagrange multiplier λ

    Λ(α1,α2,λ)=LD+λ(i=12αiTi).

    Maximize this function and show that the optimal values are given by

    α1=25andα2=25.

    It is sufficient to only calculate the potential extrema since we will also assume for them to be actual extrema.

  3. Based on the previous results, calculate the line parameters w1,w2, and $ b$.

  4. Add the line to the figure below.

    data_ex02 <- tibble(x1= c(1,2), x2 = c(1,3), "T" = factor(c(1,-1)))
    
    data_ex02 %>% ggplot(aes(x=x1,y=x2, color = T)) +
      geom_point(size = 3) +
      xlim(1,2) +
      ylim(1,3) +
      labs(
      color = "Labels"
      )+
      theme_minimal()

  5. Suppose we want to classify a new point x3=(2,2). Assign this point to the correct class, both visually and using the decision function f(x~)=sign(i=12αiTi(xix~+b)).

7.3 Solutions

Solution 7.1 ().

  1. cols <- c("1" = "#212C58", "-1" = "#F58216")
    
    title_text <- glue::glue(
    "Classes 
     <span style='color:{cols['1']};'>
      &omega;<sub>1</sub> (\u25CF)
     </span>
     and 
     <span style='color:{cols['-1']};'>
       &omega;<sub>-1</sub>(\u25A0)
     </span> 
    ")
    
    p <- ggplot() +
      geom_point(data = data,
                 aes(x=x,y=y,color = label, shape = label),
                 size = 2)+
      scale_color_manual(values = cols) +
      scale_shape_manual(values = c(15, 16)) +
      labs(
        title = title_text
      )+
      theme_minimal()+
      theme(
        plot.title = element_markdown(),
        legend.position = "None"
      )+
      coord_fixed()
    p

  2. hull <- data %>% group_by(label) %>% slice(chull(x,y))
    p1 <- p +
        geom_polygon(data = hull,
                     aes(x=x, y=y,color = label, fill = label),
                     alpha = 0.3)+
        scale_fill_manual(values = cols)
    p1

  3. df_annotate <- tibble(
      label = c(
        "<span style='color:red;'> c<sub>1</sub></span>",
        "<span style='color:red;'> c<sub>-1</sub></span>"
      ),
      x = c(0.5,2),
      y = c(0.5,2),
      hjust = c(-0.3, 1.2)
    )
    df_c <- tibble(x = c(0.5,2),
                 y = c(0.5,2))
    
    p2 <- p1 + geom_point(data = df_c,
                    aes(x=x,y=y),
                    shape = c(8,8),
                    size = 3,
                    color = c("red","red"))+
      geom_richtext(data = df_annotate,
                    aes(x=x, y=y, label=label, hjust = hjust),
                    fill = NA,
                    label.color = NA)
    p2

  4. df_line <- tibble(x1 = 0.5, x2 = 2, y1 = 0.5, y2 = 2)
    p3 <- p2 + geom_segment(data = df_line,
                       aes(x = x1, y = y1, xend = x2, yend = y2),
                       color = "red")
    p3

  5. df_annotate_l <- tibble(
      label = c(
        "<span style='color:grey50;'> l<sub>1</sub></span>",
        "<span style='color:grey50;'> l<sub>-1</sub></span>",
        "<span style='color:grey50;'> s</span>"
      ),
      x = c(-0.75,3.5,0.25),
      y = c(1.5,1,2.5)
    )
    
    p4 <- p3 + geom_abline(slope = -1,
                           intercept = 2.5,
                           color = "grey50")+
               geom_abline(slope = -1,
                           intercept = 1,
                           color = "grey50",
                           linetype = 2)+
               geom_abline(slope = -1,
                           intercept = 4,
                           color = "grey50",
                           linetype = 2)+
               geom_richtext(data = df_annotate_l,
                    aes(x=x, y=y, label=label),
                    fill = NA,
                    label.color = NA)
    p4

  6. title_text <- glue::glue(
    "Classes
      <span style='color:{cols['1']};'>
        &omega;<sub>1</sub> (\u25CF)
      </span>
    and 
      <span style='color:{cols['-1']};'>
        &omega;<sub>-1</sub>(\u25A0)
      </span>,
    <br> and additional points 
      <span style='color:{cols['1']};'> 
        x<sub>1</sub>,x<sub>2</sub>(\u25B2)
      </span>
    and
      <span style='color:{cols['-1']};'>
        x<sub>3</sub>,x<sub>4</sub>(\u25C6)
      </span>"
    )
    df_x <- tibble(x = c(-0.5,0,2.5,3),
                   y = c(-0.5,0.5,2.5,3),
                   label =factor(c(1,1,-1,-1)))
    p5 <- p4 + geom_point(data = df_x,
                          aes(x=x,y=y, color = label),
                          shape = c(17,17,18,18),
                          size = 3) +
      labs(
        title = title_text
      )
    p5

  7. title_text <- glue::glue(
    "Classes
    <span style='color:{cols['1']};'>
      &omega;<sub>1</sub> (\u25CF)
    </span>
    and
    <span style='color:{cols['-1']};'>
      &omega;<sub>-1</sub>(\u25A0)
    </span>."
    )
    
    #Define new Data and Hull
    data_new <- rbind(data,c(1,2,1))
    hull_new <- data_new %>% group_by(label) %>% slice(chull(x,y))
    
    #New Annotations
    ## Annotate c
    df_annotate_new <- tibble(
      label = c(
        "<span style='color:red;'> c<sup>~</sup><sub>1</sub></span>",
        "<span style='color:red;'> c<sup>~</sup><sub>-1</sub></span>"
      ),
      x = c(0.5,2),
      y = c(2,2),
      hjust = c(-0.3, 1.2)
    )
    df_c_new <- tibble(x = c(1,2),
                 y = c(2,2))
    ## Annotate l and s
    df_annotate_l_new <- tibble(
      label = c(
        "<span style='color:grey50;'> l<sup>~</sup><sub>-1</sub></span>",
        "<span style='color:grey50;'> l<sup>~</sup><sub>1</sub></span>",
        "<span style='color:grey50;'> s<sup>~</sup></span>"
      ),
      x = c(2.25,0.75,1.4),
      y = c(3,3,3)
    )
    
    # Generate new plot
    p_new <- ggplot() +
      geom_point(data = data_new,aes(x=x,y=y,color = label, shape = label),size = 2)+
      geom_polygon(data = hull_new, aes(x=x, y=y,color = label, fill = label),alpha = 0.3)+
      geom_point(data = df_c_new,
                    aes(x=x,y=y),
                    shape = c(8,8),
                    size = 3,
                    color = c("red","red"))+
      geom_richtext(data = df_annotate_new,
                    aes(x=x, y=y, label=label, hjust = hjust),
                    fill = NA,
                    label.color = NA)+
      geom_vline(xintercept = 1.5, color = "grey50")+
      geom_vline(xintercept = 1, color = "grey50", linetype = 2)+
      geom_vline(xintercept = 2, color = "grey50", linetype = 2)+
      geom_richtext(data = df_annotate_l_new,
           aes(x=x, y=y, label=label),
           fill = NA,
           label.color = NA)+
      scale_fill_manual(values = cols)+
      scale_color_manual(values = cols) +
      scale_shape_manual(values = c(15, 16)) +
      labs(
        title = title_text
      )+
      theme_minimal()+
      theme(
        plot.title = element_markdown(),
        legend.position = "None"
      )+
      coord_fixed()
    
    p_new

Solution 7.2 ().

  1. Setting up the Lagrange function:

    LD=(α1+α2)12(α12122+α1α2115+α2α1115+α22(1)213)=(α1+α2)12(2α1225α1α2+13α22)=α1+α2α12+5α1α2132α22

    Setting up the constraint

    α1α2=0

  2. In order to optimize Λ(α1,α2,λ)=α1+α2α12+5α1α2132α22+λ(α1α2), first calculate the partial derivatives with respect to α1,α2,λ:

    (1)Λα1=12α1+5α2+λ(2)Λα2=1+5α113α2λ(3)Λλ=α1α2

    In order to obtain an extrema, we have to set each of the equations above to 0 and solve them for α1 and α2.

    Adding the terms (1) and (2), we obtain

    12α1+5α2+λ+1+5α113α2λ=02+3α18α2=0α1=8α223

    Plugging α1 into (3) then yields

    8α223α2=05α223=0α2=25

    Since we also need to satisfy α1α2=0 and thus α1=α2, we can deduce α1=25.

    Therefore α1=α2=25.

  3. The first order conditions require

    w=i=12αiTixi,

    i.e., (w1w2)=α1T1x1+α2T2x2=(2525)(4565)=(2545) Calculating b using the Karush–Kuhn–Tucker conditions yields

    αi(Ti(xiw+b)1)=0b=1Tixiw Plugging in the values we obtained for w:

    b=11(11)(2545)=11(23)(2545)=115 The separation line is therefore give by

    w1x1+w2x2+b=0x2=bw1x1w2=115+25x145=11412x1

  4. data_ex02 %>% ggplot(aes(x=x1,y=x2, color = T)) +
      geom_point(size = 3) +
      geom_abline( slope = -0.5, intercept = 11/4) +
      xlim(1,2) +
      ylim(1,3) +
      labs(color = "Labels")+
      theme_minimal()

  5. data_ex02 %>% ggplot() +
      geom_point(aes(x=x1,y=x2, color = T), size = 3) +
      geom_abline( slope = -0.5, intercept = 11/4) +
      geom_point(data = tibble(x=2,y=2), aes(x=x,y=y), size = 3, shape = 17)+ 
      xlim(0,4) +
      ylim(0,4) +
      labs(color = "Labels")+
      theme_minimal()

    Since the point (2,2) is above the decision line, it belongs to class ω1.

    Plugging all the values into the decision function yields:

    f(22)=sign(251(11)(22)+115+251((23)(22)+115))=sign(125)=1. Therefore, the new point belongs to class ω1.