Example of S3VM

We now implement an S3VM in Python using the SciPy optimization methods, which are mainly based on C and FORTRAN implementations. The reader can try it with other libraries, such as NLOpt and LIBSVM, and compare the results. A possibility suggested by Bennett and Demiriz is to use the L1-norm for w, so as to linearize the objective function; however, this choice seems to produce good results only for small datasets. We are going to keep the original formulation based on the L2-norm, using a Sequential Least Squares Programming (SLSQP) algorithm to optimize the objective.
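For reference, the optimization problem encoded by the code that follows (with l labeled and N − l unlabeled samples) can be written as:

```latex
\min_{w,\, b,\, \eta,\, \xi,\, z} \; \frac{1}{2}\lVert w \rVert_2^2
  + C \left( \sum_{i=1}^{l} \eta_i + \sum_{j=l+1}^{N} \min(\xi_j, z_j) \right)
```

subject to y_i(w · x_i + b) + η_i ≥ 1 for the labeled samples, w · x_j − b + ξ_j ≥ 1 and −(w · x_j − b) + z_j ≥ 1 for the unlabeled ones, and all slack variables non-negative.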

Let's start by creating a bidimensional dataset with both labeled and unlabeled samples:

from sklearn.datasets import make_classification

nb_samples = 500
nb_unlabeled = 200

X, Y = make_classification(n_samples=nb_samples, n_features=2, n_redundant=0, random_state=1000)
Y[Y==0] = -1
Y[nb_samples - nb_unlabeled:nb_samples] = 0

For simplicity (and without any impact, because the samples are shuffled), we set the last 200 samples as unlabeled (y = 0). The corresponding plot is shown in the following graph:
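The plot can be reproduced with a few lines of Matplotlib; the marker choices below are assumptions, so the exact styling of the original figure may differ:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

nb_samples = 500
nb_unlabeled = 200

X, Y = make_classification(n_samples=nb_samples, n_features=2,
                           n_redundant=0, random_state=1000)
Y[Y == 0] = -1
Y[nb_samples - nb_unlabeled:nb_samples] = 0

# Square and circular markers for the two classes, crosses for unlabeled points
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(X[Y == -1, 0], X[Y == -1, 1], marker='s', s=30, label='Class -1')
ax.scatter(X[Y == 1, 0], X[Y == 1, 1], marker='o', s=30, label='Class +1')
ax.scatter(X[Y == 0, 0], X[Y == 0, 1], marker='x', s=30, label='Unlabeled')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
fig.savefig('s3vm_dataset.png')
```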

Original labeled and unlabeled dataset

The crosses represent unlabeled points, which are spread throughout the entire dataset. At this point we need to initialize all variables required for the optimization problem:

import numpy as np

w = np.random.uniform(-0.1, 0.1, size=X.shape[1])
eta = np.random.uniform(0.0, 0.1, size=nb_samples - nb_unlabeled)
xi = np.random.uniform(0.0, 0.1, size=nb_unlabeled)
zi = np.random.uniform(0.0, 0.1, size=nb_unlabeled)
b = np.random.uniform(-0.1, 0.1, size=1)
C = 1.0

theta0 = np.hstack((w, eta, xi, zi, b))

As the optimization algorithm requires a single array, we have stacked all vectors into a horizontal array theta0 using the np.hstack() function. We also need to vectorize the min() function in order to apply it to arrays:
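Keeping the index layout of theta0 in mind helps when reading the slices used in the following functions; with 300 labeled and 200 unlabeled samples, the components occupy these positions:

```python
import numpy as np

nb_samples = 500
nb_unlabeled = 200
nb_labeled = nb_samples - nb_unlabeled  # 300

# Layout of theta0 after np.hstack((w, eta, xi, zi, b)):
#   theta[0:2]     -> w   (2 weights)
#   theta[2:302]   -> eta (300 labeled slack variables)
#   theta[302:502] -> xi  (200 unlabeled slack variables)
#   theta[502:702] -> zi  (200 unlabeled slack variables)
#   theta[-1]      -> b   (bias)
theta0 = np.hstack((np.zeros(2), np.zeros(nb_labeled),
                    np.zeros(nb_unlabeled), np.zeros(nb_unlabeled),
                    np.zeros(1)))

print(theta0.shape[0])  # 703
```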

vmin = np.vectorize(lambda x1, x2: x1 if x1 <= x2 else x2)
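Note that NumPy already provides an element-wise minimum, np.minimum(), which behaves identically and is faster; the vectorized lambda is kept here to mirror the min() term in the objective. A quick check of the equivalence:

```python
import numpy as np

vmin = np.vectorize(lambda x1, x2: x1 if x1 <= x2 else x2)

a = np.array([0.5, 2.0, 1.0])
b = np.array([1.0, 1.5, 1.0])

# Both compute the element-wise minimum of the two arrays
print(vmin(a, b))
print(np.minimum(a, b))
```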

Now, we can define the objective function:

def svm_target(theta, Xd, Yd):
    wt = theta[0:2].reshape((Xd.shape[1], 1))

    s_eta = np.sum(theta[2:2 + nb_samples - nb_unlabeled])
    s_min_xi_zi = np.sum(vmin(theta[2 + nb_samples - nb_unlabeled:2 + nb_samples],
                              theta[2 + nb_samples:2 + nb_samples + nb_unlabeled]))

    return C * (s_eta + s_min_xi_zi) + 0.5 * np.dot(wt.T, wt)

The arguments are the current theta vector and the complete datasets Xd and Yd. The dot product of w has been multiplied by 0.5 to keep the conventional notation used for supervised SVMs; the constant can be omitted without any impact. At this point, we need to define all the constraints based on the slack variables; each function (which shares the same parameters as the objective) is parametrized with an index, idx. The labeled constraint is as follows:

def labeled_constraint(theta, Xd, Yd, idx):
    wt = theta[0:2].reshape((Xd.shape[1], 1))

    c = (Yd[idx] * (np.dot(Xd[idx], wt) + theta[-1]) +
         theta[2:2 + nb_samples - nb_unlabeled][idx] - 1.0)

    # SciPy's 'ineq' constraints expect a numeric value that must
    # become >= 0, so we return the expression itself
    return c[0]

The unlabeled constraints, instead, are as follows:

def unlabeled_constraint_1(theta, Xd, idx):
    wt = theta[0:2].reshape((Xd.shape[1], 1))

    c = (np.dot(Xd[idx], wt) - theta[-1] +
         theta[2 + nb_samples - nb_unlabeled:2 + nb_samples][idx - nb_samples + nb_unlabeled] - 1.0)

    return c[0]

def unlabeled_constraint_2(theta, Xd, idx):
    wt = theta[0:2].reshape((Xd.shape[1], 1))

    c = (-(np.dot(Xd[idx], wt) - theta[-1]) +
         theta[2 + nb_samples:2 + nb_samples + nb_unlabeled][idx - nb_samples + nb_unlabeled] - 1.0)

    return c[0]

They are parametrized with the current theta vector, the Xd dataset, and an idx index. We also need to include the non-negativity constraints (≥ 0) for each slack variable:

def eta_constraint(theta, idx):
    # Returning the slack value itself enforces slack >= 0
    return theta[2:2 + nb_samples - nb_unlabeled][idx]

def xi_constraint(theta, idx):
    return theta[2 + nb_samples - nb_unlabeled:2 + nb_samples][idx - nb_samples + nb_unlabeled]

def zi_constraint(theta, idx):
    return theta[2 + nb_samples:2 + nb_samples + nb_unlabeled][idx - nb_samples + nb_unlabeled]

We can now set up the problem using the SciPy convention:

svm_constraints = []

for i in range(nb_samples - nb_unlabeled):
    svm_constraints.append({
        'type': 'ineq',
        'fun': labeled_constraint,
        'args': (X, Y, i)
    })
    svm_constraints.append({
        'type': 'ineq',
        'fun': eta_constraint,
        'args': (i,)
    })

for i in range(nb_samples - nb_unlabeled, nb_samples):
    svm_constraints.append({
        'type': 'ineq',
        'fun': unlabeled_constraint_1,
        'args': (X, i)
    })
    svm_constraints.append({
        'type': 'ineq',
        'fun': unlabeled_constraint_2,
        'args': (X, i)
    })
    svm_constraints.append({
        'type': 'ineq',
        'fun': xi_constraint,
        'args': (i,)
    })
    svm_constraints.append({
        'type': 'ineq',
        'fun': zi_constraint,
        'args': (i,)
    })

Each constraint is represented with a dictionary, where 'type' is set to 'ineq' to indicate an inequality (in SciPy's convention, the function value must be greater than or equal to zero at the solution), 'fun' points to the callable object, and 'args' contains all extra arguments (theta is the main x variable and is added automatically). Using SciPy, it's possible to minimize the objective with either the Sequential Least Squares Programming (SLSQP) or the Constrained Optimization by Linear Approximation (COBYLA) algorithm. We preferred the former because it is faster and more stable:
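The same convention can be seen in miniature on a toy problem (minimizing x² subject to x ≥ 1), which makes the role of the 'ineq' dictionary explicit before we tackle the full S3VM:

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f(x) = x^2 subject to x - 1 >= 0.
# With 'ineq', SciPy requires fun(x) >= 0 at the solution,
# so the unconstrained minimum x = 0 is excluded.
result = minimize(fun=lambda x: x[0] ** 2,
                  x0=np.array([3.0]),
                  constraints=[{'type': 'ineq',
                                'fun': lambda x: x[0] - 1.0}],
                  method='SLSQP')

print(result.x)  # close to [1.0]
```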

from scipy.optimize import minimize

result = minimize(fun=svm_target,
                  x0=theta0,
                  constraints=svm_constraints,
                  args=(X, Y),
                  method='SLSQP',
                  tol=0.0001,
                  options={'maxiter': 1000})

After the training process is complete, we can compute the labels for the unlabeled points:

theta_end = result['x']
w = theta_end[0:2]
b = theta_end[-1]

Xu = X[nb_samples - nb_unlabeled:nb_samples]
yu = -np.sign(np.dot(Xu, w) + b)
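The same decision rule can then classify any new point; here is a minimal sketch with hypothetical parameter values (the real ones come from result['x']):

```python
import numpy as np

# Hypothetical trained parameters; the actual values come from result['x']
w = np.array([0.5, -0.3])
b = 0.1

def predict(Xnew, w, b):
    # Same decision rule applied above to the unlabeled points
    return -np.sign(np.dot(Xnew, w) + b)

Xnew = np.array([[1.0, 1.0], [-1.0, 2.0]])
print(predict(Xnew, w, b))  # [-1.  1.]
```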

In the next graph, it's possible to compare the initial plot (left) with the final one where all points have been assigned a label (right):

As you can see, the S3VM succeeded in finding the right label for all unlabeled points, confirming the existence of two very dense regions, with x ∈ [0, 2] (square dots) and y ∈ [0, 2] (circular dots).

NLOpt is a complete optimization library developed at MIT. It is available for different operating systems and programming languages. The website is https://nlopt.readthedocs.io. LIBSVM is an optimized library for solving SVM problems and it is adopted by Scikit-Learn together with LIBLINEAR. It's also available for different environments. The homepage is https://www.csie.ntu.edu.tw/~cjlin/libsvm/.