NB: Introducing Classes

Programming for Data Science

Introduction

Classes are a way of organizing code into bundles of variables and functions, called attributes and methods, respectively.

Each class models something: a thing in the world, a process, a model, or just a convenient way of grouping code.

For example, a logistic regression model would have attributes like:

  • weights
  • an optional intercept term
  • the maximum number of iterations

These attributes help describe the object; they give the object’s state.

The logistic regression model would have functionality such as:

  • the optimization routine used in training
  • a prediction function

The behavior, or functionality, is supported by methods, which are functions included in the class.
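To make this concrete, here is a minimal sketch of such a model written as a Python class. It is a toy, not scikit-learn's LogisticRegression: the class name SimpleLogisticRegression, the learning_rate parameter, and the fit()/predict() method names (borrowed from the scikit-learn convention) are assumptions for illustration. It uses .__init__() and self, which are explained later in this notebook.

```python
import math

class SimpleLogisticRegression:
    """Hypothetical toy model for illustration; not scikit-learn's LogisticRegression."""

    def __init__(self, fit_intercept=True, max_iter=100, learning_rate=0.1):
        # State: the attributes that describe the model
        self.fit_intercept = fit_intercept
        self.max_iter = max_iter
        self.learning_rate = learning_rate
        self.weights = None
        self.intercept = 0.0

    def _sigmoid(self, z):
        # Numerically stable logistic function (avoids overflow in math.exp)
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def _score(self, x):
        # Linear score w . x + b for a single row x
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.intercept

    def fit(self, X, y):
        # Behavior: batch gradient descent on the average log loss
        n, d = len(X), len(X[0])
        self.weights = [0.0] * d
        self.intercept = 0.0
        for _ in range(self.max_iter):
            grad_w = [0.0] * d
            grad_b = 0.0
            for x, target in zip(X, y):
                err = self._sigmoid(self._score(x)) - target
                for j in range(d):
                    grad_w[j] += err * x[j] / n
                grad_b += err / n
            self.weights = [w - self.learning_rate * g
                            for w, g in zip(self.weights, grad_w)]
            if self.fit_intercept:
                self.intercept -= self.learning_rate * grad_b
        return self

    def predict(self, X):
        # Behavior: class 1 if the predicted probability is at least 0.5
        return [1 if self._sigmoid(self._score(x)) >= 0.5 else 0 for x in X]

X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
model = SimpleLogisticRegression(max_iter=50).fit(X, y)
print(model.predict(X))  # → [0, 0, 1, 1]
```

The attributes (weights, intercept, max_iter) hold the state, while fit() and predict() supply the behavior, matching the two lists above.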

Here are a couple of other ways to think of a class:

  • It provides a template for creating an object and for working with the object.
  • It constitutes a kind of definition of something in the world.

A First Example

Ok, let’s look at examples, starting with a very small, simple class.

The class contains:

  • a name, Ferrari458
  • a docstring for a quick description
  • an attribute, cylinders, which is the number of cylinders in the engine
  • a method, print_origin()

class Ferrari458:
    "This is a Ferrari 458 object"
    cylinders = 8

    def print_origin(self):
        "Returns a string"
        return 'I was built in Italy!'

You can learn about the class by printing the docstring:

Ferrari458.__doc__
'This is a Ferrari 458 object'

You can also get detailed help like this:

help(Ferrari458)
Help on class Ferrari458 in module __main__:

class Ferrari458(builtins.object)
 |  This is a Ferrari 458 object
 |  
 |  Methods defined here:
 |  
 |  print_origin(self)
 |      Returns a string
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  cylinders = 8

Next, we create an object from the class (also called an instance of the class).

The class is called like a function.

The process is called instantiation.

myferrari = Ferrari458()

We can show the number of cylinders in myferrari by using the object.attribute notation:

myferrari.cylinders
8

We can call its method .print_origin() to learn where the car was built:

myferrari.print_origin()
'I was built in Italy!'

Note that the method takes self as its first argument.

By doing this, the method can use the self.attribute and self.method() pattern to access the attributes and methods contained in other parts of the class.

Here is an example, with the method .get_cylinders():

class Ferrari458_v2:
    """This is a Ferrari 458 object"""
    cylinders = 8

    def print_origin(self):
        return 'I was built in Italy!'

    def get_cylinders(self):
        return self.cylinders
myferrari = Ferrari458_v2()
myferrari.get_cylinders()
8

For the method .get_cylinders() to see the attribute cylinders, it has to access it as an attribute of self, that is, as self.cylinders.

This is because the class body does not create a scope that its methods can see, the way the global scope of a module does.

Each method can see only its own local variables and the global names of the module in which the class is defined.

Instead of relying on globals, methods share information through self.
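Here is a quick sketch of what goes wrong when a method refers to a class attribute by its bare name instead of through self. The class name Ferrari458_broken is hypothetical.

```python
class Ferrari458_broken:
    """Hypothetical class whose method forgets to use self"""
    cylinders = 8

    def get_cylinders(self):
        # A bare name is looked up in the method's local scope, then the module's
        # globals and builtins. The class body is skipped, so this fails.
        return cylinders

car = Ferrari458_broken()
try:
    car.get_cylinders()
    outcome = "no error"
except NameError:
    outcome = "NameError"
print(outcome)  # → NameError
```

Changing the return line to return self.cylinders fixes it, exactly as in Ferrari458_v2.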

The Meaning of self

self stands for the instantiated object itself.

It is a proxy in the template for an actual instance of the class.

So, to repeat what was said earlier, if you want your method to access the other attributes and methods of an object, you need to put self as its first argument.

Note that when you use the method with an instance, you don’t pass the object name as an argument:

myferrari.get_cylinders() 

The object myferrari is passed implicitly by Python as the self argument.
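One way to see this is to make the call explicitly: look the method up on the class and pass the instance yourself. The two calls below do the same thing.

```python
class Ferrari458_v2:
    """This is a Ferrari 458 object"""
    cylinders = 8

    def print_origin(self):
        return 'I was built in Italy!'

    def get_cylinders(self):
        return self.cylinders

myferrari = Ferrari458_v2()

# The usual call: Python passes myferrari as self behind the scenes
a = myferrari.get_cylinders()
# The equivalent explicit call: the instance is passed as the first argument by hand
b = Ferrari458_v2.get_cylinders(myferrari)
print(a, b)  # → 8 8
```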

You can use any valid name you want for the name of the object itself, but the convention is to use self.

Note that self exists only inside the methods of a class; it is not defined elsewhere in the class body.

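Attributes defined inside a class but outside any method belong to the class itself. They are still reachable through self, because when an instance does not have an attribute, Python looks it up on the class.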
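You can check where an attribute actually lives with the built-in vars(), which returns an object's attribute dictionary. In this sketch cylinders is stored on the class, not on the instance, yet self.cylinders still finds it.

```python
class Ferrari458_v2:
    """This is a Ferrari 458 object"""
    cylinders = 8

    def get_cylinders(self):
        return self.cylinders

myferrari = Ferrari458_v2()
on_instance = 'cylinders' in vars(myferrari)    # the instance's own __dict__ is empty
on_class = 'cylinders' in vars(Ferrari458_v2)   # the class namespace holds cylinders
print(on_instance, on_class, myferrari.get_cylinders())  # → False True 8
```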

The self variable is the mechanism that allows methods to share data without having to pass and return a bunch of variables.
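For example, a method can combine the results of other methods through self.method() calls without any arguments being passed around. The class Ferrari458_describe and its describe() method are hypothetical.

```python
class Ferrari458_describe:
    """Hypothetical variant whose methods call each other through self"""
    cylinders = 8

    def print_origin(self):
        return 'I was built in Italy!'

    def get_cylinders(self):
        return self.cylinders

    def describe(self):
        # Reuse the other methods via self; nothing needs to be passed in
        return f"{self.print_origin()} I have {self.get_cylinders()} cylinders."

car = Ferrari458_describe()
print(car.describe())  # → I was built in Italy! I have 8 cylinders.
```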

Think of self as a data structure that stores the object's own state.

The .__init__() method

There is a special method called .__init__() that will initialize the state of an object when you create it.

Use it to supply more context-dependent information about your instance.

Let’s look at another version of the class with __init__().

class Ferrari458_v3:
    """this is a Ferrari 458 object"""
    cylinders = 8
    
    def __init__(self, color):
        self.color = color

    def print_origin(self):
        return 'I was built in Italy!'

    def get_color(self):
        return self.color

With the .__init__() method in place, we create objects by passing a color.

If we don’t pass this parameter, there will be an error.

This is because we did not define a default value for the color argument in our initialization method.

ferr1 = Ferrari458_v3()
TypeError: __init__() missing 1 required positional argument: 'color'

This works:

ferr1 = Ferrari458_v3("red")

We can access the initialized attribute using the dot operator, just as if it were declared at the top of the class:

ferr1.color
'red'

Or we can call the accessor method that we created.

ferr1.get_color()
'red'

Note that even though we initialized the car object with “red”, we can always change it:

ferr1.color = "Cobalt"
ferr1.get_color()
'Cobalt'
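The TypeError earlier came from color having no default value. Here is a sketch of a variant that supplies one; the class name Ferrari458_v4 and the default "red" are hypothetical choices.

```python
class Ferrari458_v4:
    """Hypothetical variant: color has a default value"""
    cylinders = 8

    def __init__(self, color="red"):
        # If no color is passed, the default "red" is used
        self.color = color

    def get_color(self):
        return self.color

default_car = Ferrari458_v4()       # no TypeError this time
blue_car = Ferrari458_v4("blue")
print(default_car.get_color(), blue_car.get_color())  # → red blue
```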

Instance vs Class Attributes

Notice the difference between the cylinders and the color attributes.

class Ferrari458_v3:
    """this is a Ferrari 458 object"""
    cylinders = 8
    
    def __init__(self, color):
        self.color = color

The first is a class attribute.

It is defined outside of any method.

Its value will apply to all instances of the class, unless the instance overrides it.

The second is an instance attribute.

It is assigned to self inside a method.

Its value is meant to differ from one instance to the next.

Look what happens if we change the value of cylinders in the class:

Ferrari458_v3.cylinders = 12
ferr1.cylinders
12

The change is visible in every instance created from the class, unless an instance has its own cylinders attribute.

Now, if we assign cylinders on an instance, Python creates a new instance attribute that shadows the class attribute, and the class is unaffected.

ferr1.cylinders = 4
Ferrari458_v3.cylinders
12
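The whole sequence can be replayed with a second instance to show that shadowing affects only the instance you assign to, and that deleting the instance attribute makes the class attribute visible again.

```python
class Ferrari458_v3:
    """this is a Ferrari 458 object"""
    cylinders = 8

    def __init__(self, color):
        self.color = color

ferr1 = Ferrari458_v3("red")
ferr2 = Ferrari458_v3("yellow")

Ferrari458_v3.cylinders = 12      # change the class attribute: both instances see it
print(ferr1.cylinders, ferr2.cylinders)                          # → 12 12

ferr1.cylinders = 4               # creates an instance attribute on ferr1 only
print(ferr1.cylinders, ferr2.cylinders, Ferrari458_v3.cylinders) # → 4 12 12

del ferr1.cylinders               # remove the shadowing instance attribute
print(ferr1.cylinders)            # → 12 (falls back to the class attribute)
```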

Summary and Additional Info

An object is a self-contained bundle of methods and attributes.

  • Methods are basically functions.
  • Attributes are basically variables.

A class definition is a template for creating objects.

  • Objects are class instances.
  • Classes are object types.

Objects have their own namespace, somewhat like the local scope of a function.

When objects are first created, they are often expected to have data passed to them.

  • This is called initializing the object.
  • These data are handled internally by the .__init__() method.
  • Data passed this way can later be overwritten by assigning new values to the attributes they were stored in.

Regular methods of a class take self as their first parameter.

  • This stands for the instance itself.
  • All methods and attributes are available to all other methods in the object through the self object.

If a method does not take self as its first argument, it cannot access the internal state or methods of the object.

  • The internal state is just the attributes and their current values.
  • Such methods are called static methods, and in Python they are declared with the @staticmethod decorator.
  • Static methods are useful for bundling helper functions with the class they relate to, even though they do not depend on any particular instance.
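A minimal sketch of a static method follows. The class Ferrari458_v5 and its liters_to_gallons() helper are hypothetical (0.264172 is the number of US gallons in a liter). The second class shows why the decorator matters: without it, calling the method on an instance still passes the instance as the first argument, which raises a TypeError.

```python
class Ferrari458_v5:
    """Hypothetical variant with a static helper method"""
    cylinders = 8

    @staticmethod
    def liters_to_gallons(liters):
        # No self parameter: this helper cannot see cylinders or any other state
        return liters * 0.264172

    def print_origin(self):
        return 'I was built in Italy!'

car = Ferrari458_v5()
print(car.liters_to_gallons(80))            # callable through an instance...
print(Ferrari458_v5.liters_to_gallons(80))  # ...or directly on the class

class NoDecorator:
    """Hypothetical class whose method has neither self nor @staticmethod"""
    def helper(x):
        return x * 2

raised = False
try:
    NoDecorator().helper(3)   # Python passes the instance AND 3: two arguments
except TypeError as err:
    raised = True
    print("TypeError:", err)
```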

There is a lot more to the subject, but this is good enough to get started!