My First Classifier

From classes to classifiers, I’ve made my first proper step into machine learning and implemented my first ML algorithm in Python: the k-nearest neighbour classifier.

Here we are in the domain of supervised classification, that is, discrete response variables with known outcomes. The general idea behind k-nearest neighbours is to imagine the training data plotted in p dimensional space, where each point is labelled with its known response. To classify a new point, simply consider the k closest points and choose the majority class among that set of k. Here closeness is defined in terms of the Euclidean distance metric. For more information check out an episode from a great, albeit very short, series on ML put out by the Google Dev channel, on which this post is loosely based.
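Concretely, for two points x = (x_1, \dots, x_p) and y = (y_1, \dots, y_p) in p dimensional space, this distance is

d(x,y) = \sqrt{ \sum_{i=1}^{p} (x_i - y_i)^{2} }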

Before diving into code, consider the task at hand. First off we need a dataset: we will use Fisher’s iris dataset, which comes with sklearn and consists of 4 features (sepal and petal width and length) and 3 classes (3 subspecies of iris). We then need to split this data into training data (to train our classifier) and test data (to test it). The difficult bit is making a classifier that can be trained and then used to predict the outcome of new, unseen data.

To get our data imported and split into train and test we have the following

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x = iris.data
y = iris.target

#hold out 30 of the 150 samples for testing
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=30)

Now to the meat of the problem: building the classifier, which gives us the chance to show off our new classes. Since we want to specify the number of neighbours to consider, when we define the __init__ method we’ll pass the argument k. Just like with the standard classifiers in sklearn, we want the ability to both fit and predict, so we will define a method for each. The __init__ and fit methods are self explanatory; the crux of this lies in the predict method.

For each new point to be classified in x_test we compute the distance to every point in the training data x_train using distance from scipy.spatial. Finding the indices of the k nearest neighbours we can then look up their respective classes. Using mode from scipy.stats we find the majority class and use this as our prediction.

from scipy.spatial import distance
import numpy as np
import scipy.stats as stats

class MyKNN():

    def __init__(self,k):
        self.k = k

    def fit(self,x_train,y_train):
        self.x = x_train
        self.y = y_train

    def predict(self,x_test):
        predictions = np.array([])
        for i in x_test:
            distances = np.array([])
            for j in self.x:
                dist = distance.euclidean(i,j)
                distances = np.append(distances,dist)
            #index of k nearest neighbours
            knn_index = np.argsort(distances)[:self.k]
            #classification of k nearest neighbours (list)
            knn_class = [self.y[idx] for idx in knn_index]
            label = stats.mode(knn_class)
            #label.mode is the most common class amongst the neighbours
            predictions = np.append(predictions,label.mode)
        return predictions

Finally we instantiate our class with k=2 (chosen arbitrarily), call the fit method on our training data and call the predict method on our test data. Importing an accuracy function from sklearn allows us to judge how well our first classifier performs…

clf = MyKNN(2)
clf.fit(x_train,y_train)
pred = clf.predict(x_test)

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test,pred))

93.3% accuracy, not bad for a first shot, and on that note it is safe to say that I am no longer an ML zero.
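As a quick sanity check (not something from the original post), we could run sklearn’s built-in KNeighborsClassifier on the same split, using the accuracy_score we imported above; the homemade classifier and the library one should agree closely.

from sklearn.neighbors import KNeighborsClassifier

#sklearn's own k-nearest neighbour classifier with the same k
sk_clf = KNeighborsClassifier(n_neighbors=2)
sk_clf.fit(x_train,y_train)
sk_pred = sk_clf.predict(x_test)
print(accuracy_score(y_test,sk_pred))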

I’ve recently made a start on Kevin Murphy’s mighty 1000 page book Machine Learning: A Probabilistic Perspective, which so far has proved to be a perfect level of difficulty. As I work my way through this, I hope to implement some more advanced ML algorithms in Python and I’ll be sure to keep this space updated with anything I hack together.

Fractals

While scouring reddit for a programming challenge I stumbled across the programming daily subreddit, which provides daily challenges to readers. One looked at generating Julia fractals, so, sensing an opportunity for some maths and gnarly images, I thought I’d have a go.

The idea behind Julia fractals is to take a complex function and a point in the complex plane and repeatedly apply the function until the magnitude of the value exceeds some threshold (or a maximum number of iterations is reached). Given a pixel whose location corresponds to the point in the plane, we colour it according to the number of iterations required, and repeat for lots of points in the plane.

We used the function f_\omega with \omega = -(0.221 + 0.731i)

\begin{matrix} f_\omega: & \mathbb{C} & \rightarrow & \mathbb{C} \\ & z & \mapsto & z^{2} - \omega \end{matrix}

Iterating over the interval [-1,1] for both the real and imaginary parts in steps of 0.001 gives the following, which I will definitely be using as the new header for the blog.

[Image: Julia set]
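For anyone wanting to reproduce something similar, here is a minimal sketch; the escape radius of 2, the cap of 100 iterations and the use of matplotlib are my assumptions rather than details from the original script.

import numpy as np
import matplotlib.pyplot as plt

omega = -(0.221 + 0.731j)
re = np.arange(-1,1,0.001)   #real parts of the grid
im = np.arange(-1,1,0.001)   #imaginary parts of the grid
counts = np.zeros((len(im),len(re)))

max_iter = 100   #assumed cap on the number of iterations
for row,b in enumerate(im):
    for col,a in enumerate(re):
        z = complex(a,b)
        n = 0
        #apply z -> z^2 - omega until the value escapes or we hit the cap
        while abs(z) <= 2 and n < max_iter:
            z = z**2 - omega
            n += 1
        counts[row,col] = n

plt.imshow(counts,extent=[-1,1,-1,1])
plt.axis('off')
plt.savefig('julia.png',dpi=300)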

More on Classes

With the basics of classes covered, we dive deeper into the ideas of classes and OOP and begin to look at subclasses, inheritance and polymorphism.

First let’s take a look at subclasses. Coming from a maths background, this idea followed very naturally from the idea of subsets; it is essentially a way of grouping objects in a hierarchical structure. A car is a type of vehicle, so one could say that car is a subclass of vehicle; equally dachshund is a subclass of dog, which itself is a subclass of pet. We say that the pet class is the superclass (just like superset in set theory) of the class dog. As with set theory we have implications about the objects in each class, for example

  • All dachshunds are dogs
  • All dachshunds are pets
  • All dogs are pets
  • Not all pets are dachshunds
  • Not all pets are dogs

With this idea of hierarchical structure comes the notion of inheritance of certain features an object may have; for example all pets have names, so since a dog is a pet we would expect it to have a name.

So now we have a feel for the intuitive notion of subclasses and inheritance, let’s see how we’d code it up. Sticking with pets, dogs and dachshunds, we first make our Pet class (like last time) and define the __init__ method, which takes a name and an age. We’ll also add a method called birthday that wishes the pet happy birthday and increments its age.

class Pet():
    
    def __init__(self,name,age):
        self.name = name
        self.age = age
        
    def birthday(self):
        print('Happy Birthday ' + self.name + '!')
        self.age = self.age + 1

To define a subclass no new syntax is needed; we simply pass the class we want as a superclass as an argument when defining the new class. So let’s make the Dog class and define the __init__ method as well as a method called talk.

class Dog(Pet):
    
    def __init__(self,name,age,good_dog):
        super().__init__(name,age)
        self.good_dog = good_dog
    
    def talk(self):
        print('Woof')

Note the syntax super().__init__(name,age) inside the __init__ method. Here we are calling the initialise method of the superclass Pet.

We now have all the tools to observe the features of subclasses and inheritance. Creating an instance fido = Dog('Fido',1,True), we can call the talk method with fido.talk(). More interestingly though, we can call the birthday method from the Pet class, fido.birthday(). We can do this since Dog is a subclass of Pet, and we say that the class Dog inherits the method birthday. Contrastingly, we cannot call the talk method on an instantiation of Pet, since not all pets are dogs.
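To make that concrete, here is how those calls look (the comments show what gets printed):

fido = Dog('Fido',1,True)
fido.talk()       #Woof
fido.birthday()   #Happy Birthday Fido! (and fido.age goes from 1 to 2)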

In order to discuss the notion of polymorphism let us introduce another subclass of Pet called Cat.

class Cat(Pet):
    
    def talk(self):
        print('Meow')

Note here we don’t specify an __init__ method; since we don’t define one explicitly it is inherited from the Pet class and, as always, called automatically on instantiation.

Note we have two subclasses each with a method called talk, which initially might appear worrisome. This demonstrates the notion of polymorphism, that is, the same method name has different effects on different classes.
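A quick illustration using the classes above (the cat’s name and age are just made up for the example; Cat inherits Pet’s __init__ so it takes the same arguments):

pets = [Dog('Fido',1,True),Cat('Whiskers',3)]
for pet in pets:
    pet.talk()   #prints Woof then Meow: same method name, different behaviour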

I feel like I’ve grasped the notion of classes now, and can begin to see how they might be useful. My next challenge is to use this new knowledge to dive into some machine learning and construct my first home made classifier.

My First Class

Despite playing around with Python for a while now I had never managed to get to grips with the object oriented nature of the language, namely classes. After trying and failing to get to grips with this idea I decided to sit down and grind through it once and for all, and finally something has clicked. I’ve been following the playlist from DrapsTV, which has proved massively useful, so here goes my attempt at an explanation based on the things I’ve learnt. Criticism/corrections most welcome.

A class is a data structure or a type, familiar examples being integers, lists, dictionaries etc., which are inbuilt data structures that come as part of the Python language. Classes give us the ability to create new data structures that aren’t included, an example being 2 dimensional vectors (providing we suspend our disbelief regarding the existence of numpy arrays). Classes have data and behaviours stored within them, and these are called attributes and methods. Revisiting the example of lists: when we assign a list to a variable we instantiate (fancy word for bring into our program) an instance of the list data structure.

my_list = [1,2,3,4]

In this case the data or attributes are the elements in the list, and we can access them using indexing, my_list[1]. Similarly the behaviours or methods are things like the ability to reverse a list, which we access via my_list.reverse().
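For example, carrying on with my_list from above:

print(my_list[1])   #2 (indexing starts at 0)
my_list.reverse()
print(my_list)      #[4, 3, 2, 1]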

Now we’re familiar with what a class might do, let’s look at the syntax for creating our first class. True to form, creating a class in Python requires simple and minimal syntax.

class NewClass:

Where class defines a class called NewClass. I believe it is convention to camel case home made classes, that is MyFirstClass.

An important method in a class is the __init__ method, which creates an object of the class you define; this is how instantiation occurs. For those of you familiar with functions, defining a method will appear familiar, and although they are not strictly the same there are definitely some parallels. Again, simple syntax means defining a new method is as easy as

def newMethod(self):

Again here I believe it’s convention to lower camel case home made methods, that is myFirstMethod. Further, it is a requirement in Python that the first argument passed to a method is the self argument, which initially proved a complicated thing for me to grasp, so let me elaborate.

In the case of lists we have an object my_list and a method reverse. Thinking of reverse as a function (although strictly it is a method), if we need to reverse my_list we must pass it as an argument. Therefore, in defining the general method we must tell the method what we’ll be reversing, and that is the instance of the class we are dealing with. This is precisely the notion of self.

We now have all the basic building blocks to make our first class, the 2 dimensional vector. We will call the class Vec2D and define two methods, the __init__ method and a method that calculates the length of the vector.

import math

class Vec2D:

    def __init__(self,x,y):
        self.x = x
        self.y = y

    def vectorLength(self):
        return math.sqrt(self.x**2 + self.y**2)

We can then call into existence multiple instances of the Vec2D class simply by

vec_1 = Vec2D(3,4)
vec_2 = Vec2D(1,1)
and so on

Note Vec2D takes 2 arguments (as per the definition of the __init__ method) to set the components of the vector. To access the x and y attributes (just like we might want to access the elements of a list) we simply use

vec_1.x which returns 3
vec_2.y which returns 1

Further, we can access the length method by

vec_1.vectorLength()

Finally we’ll briefly touch on special methods, for example numeric methods. These let us redefine how the operations + - * / etc. work within our new data structure. To do this we define a new method with a specific name (similarly to the __init__ method), that is

def __add__(self,other):

This again takes the self argument as is custom, but also takes other, since we need another vector to add to the first. Adding this into our code above and writing the logic for component wise addition gives the following

import math

class Vec2D:

    def __init__(self,x,y):
        self.x = x
        self.y = y

    def vectorLength(self):
        return math.sqrt(self.x**2 + self.y**2)

    def __add__(self, other):
        a = Vec2D(self.x + other.x,self.y + other.y)
        return a

The line a = Vec2D(self.x + other.x,self.y + other.y) here perfectly exemplifies all the things above: we’re creating an instance of the class we’ve made, with new arguments that are the attributes of the class itself.

Now we can use this class to add 2 dimensional vectors using the + operation as such

vec_sum = vec_1 + vec_2
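Printing the components of the resulting vector confirms the component wise addition:

print(vec_sum.x,vec_sum.y)   #4 5
print(vec_sum.vectorLength())   #roughly 6.4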

This concludes my first proper dive into the unknown of new topics and indeed writing about them.

Learning Machine Learning

Inspired by a recent video by John Green, My Information Diet, I’ve decided to revamp my current information intake, which consists of frankly far too much Facebook. As a fan of the medium, I turned my attention to finding some data science/machine learning podcasts and almost instantly came across Partially Derivative, whose latest post was entitled Learning Machine Learning.

This podcast laid out a syllabus for machine learning based on the idea of combining 3 key areas: Theory, Application and Immersion, before going on to recommend a decent number of resources. The books are expensive, however a large portion of them appear to be available online for free in pdf form; here is a selection of the ones mentioned in the podcast.

Theory

Application

Immersion

I feel like this will provide a pretty good grounding in the field and gives a really decent structure to what I’ll be looking at in the not too distant future.

Hello World

Hi all, my name’s Alex and I’m a final year Mathematics student at Leeds uni in the UK. During my degree I have started programming, mostly in Python (a little in R), and would say I know my way around the basics fairly well (classes are still sorcery to me at the moment), but my main knowledge comes out of a necessity to do maths. I’ve done a couple of larger projects, including my dissertation/final year project on network dynamics, which involved implementing simple algorithms to model opinion dynamics. The other project was a summer research project about random forests and decision trees and relied heavily on the machine learning library scikit-learn, which allowed me to jump right into machine learning without understanding all that much. A combination of this and watching endless interviews with Demis Hassabis of DeepMind has inspired me to crack on and dive deeper into machine learning.

The intention of this blog is to share and document the process of learning more programming and machine learning as I go from Zero to Machine Learning Hero (lame attempt, originality is a work in progress). Hopefully this will provide me with motivation for a developing hobby and, at a later date, even be a source of inspiration and information for others. With this in mind I’ll provide links to resources I have found useful and (maybe) explanations of new things I learn, in an attempt to solidify my own understanding. I’m currently relying on the books Introduction to/Elements of Statistical Learning (ISL/ESL) as well as the Google Developers machine learning series.

In terms of what can be expected of this space, some short term goals of things I want to tackle include
  • Python classes
  • My first ‘home grown’ machine learning algorithm, i.e. not using the inbuilt classifiers of sklearn
  • Wade through ISL/ESL

That concludes the obligatory first post, time to get started on classes, next post will no doubt be about that.