.. _theano_to_pylearn2_tutorial:

=======================
Your models in Pylearn2
=======================

Who should read this
====================

We recommend you spend some time with Pylearn2 and read some of our other
tutorials before starting with this minimalistic technique. If you are
completely new to Pylearn2, have a look at the `softmax regression tutorial`_.

Pylearn2 is great for many things; we'll highlight two here.

* It allows you to experiment with new ideas without much implementation
  overhead. The library was built to be modular, and it aims to be usable
  without an extensive knowledge of the codebase. Writing a new model from
  scratch is usually pretty fast once you know what to do and where to look.
* It has an interface (YAML) that decouples implementation from experimental
  choices, enabling experiments to be constructed in a light and readable
  fashion.

Obviously, there is always a trade-off between being user-friendly and being
flexible, and Pylearn2 is no exception. For instance, users looking for a way
to work with sequential data might have a harder time getting started
(although we're working to make this experience better).

In this tutorial, we will assume that you have built a regression or
classification model with Theano and that the training data can be cast into
two matrices, one for training examples and one for training targets. People
with different requirements may need to work a little more (e.g. by figuring
out how to put their data inside Pylearn2). Still, this tutorial contains
useful information for anyone interested in porting a model to Pylearn2.

How is Pylearn2 used?
=====================

While many researchers use Pylearn2 as their primary research tool, this
doesn't necessarily mean they know or use every feature in Pylearn2. In fact,
you can prototype new models in a very Theano-like fashion: write the model
as a big monolithic block of hard-coded Theano expressions, and wrap that up
in the minimal amount of code necessary to plug it into Pylearn2. **This bare
minimum is what we'll explain here.**

The resulting model may be hard to extend, but it represents a good starting
point. As you explore new ideas and change the code, you can gradually make
it more flexible: a hard-coded input dimension gets factored out as a
constructor argument, composed functions are separated into layers, etc.

Our point: **it is alright to stick to the bare minimum when developing a
model for Pylearn2**. Your code probably won't satisfy any use cases other
than your own, but that is something you can change gradually as you go.
There's no need to overcomplicate things when you start.

The bare minimum
================

Let's look at that *bare minimum*. It involves writing exactly two
subclasses:

* one subclass of `pylearn2.costs.cost.Cost`
* one subclass of `pylearn2.models.model.Model`

Need more than that? Nope. That's it! Let's have a look.

It all starts with a cost expression
------------------------------------

In the scenario we're describing, your model maps an input to an output, the
output is compared with some ground truth using some measure of
dissimilarity, and the parameters of the model are changed to reduce this
measure using gradient information. It is therefore natural that the object
interfacing between the model and the training algorithm represents a cost.

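Before introducing Pylearn2's classes, here is a hedged, stand-alone Theano
sketch of the loop we are about to delegate to the library (the variable
names, sizes and learning rate below are ours for illustration, not part of
any Pylearn2 API): build a cost from the model's output, differentiate it
with respect to the parameters, and turn the gradient into an update.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    X = T.matrix('X')                                    # a batch of inputs
    y = T.matrix('y')                                    # one-hot targets
    W = theano.shared(numpy.zeros((784, 10)), name='W')  # a model parameter
    outputs = T.nnet.softmax(T.dot(X, W))                # the model's output
    cost = -(y * T.log(outputs)).sum(axis=1).mean()      # dissimilarity measure
    grad = T.grad(cost, W)                               # gradient information
    train = theano.function([X, y], cost,
                            updates=[(W, W - 0.01 * grad)])

In Pylearn2, the `Cost` and `Model` subclasses described below share the
graph-building part of this job, while the training algorithm takes care of
computing the gradients and applying the updates.
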
The base class for this object is `pylearn2.costs.cost.Cost`, and it does
three main things:

* It describes what data it needs to perform its duty and how that data
  should be formatted.
* It computes the cost expression by feeding the input to the model and
  receiving its output.
* It differentiates the cost expression with respect to the model parameters
  and returns the gradients to the training algorithm.

What's nice about `Cost` is that if you follow the guidelines we're about to
describe, you only have to worry about the cost expression: the gradient part
is handled by the `Cost` base class, and a very useful `DefaultDataSpecsMixin`
mixin class is provided to handle the data description part (more about that
when we look at the `Model` subclass).

Here is how the subclass should look:

.. code-block:: python

    from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

    class MyCostSubclass(DefaultDataSpecsMixin, Cost):
        # Here it is assumed that we are doing supervised learning
        supervised = True

        def expr(self, model, data, **kwargs):
            space, source = self.get_data_specs(model)
            space.validate(data)

            inputs, targets = data
            outputs = model.some_method_for_outputs(inputs)
            # Compute some loss measure involving outputs and targets,
            # e.g. a negative log-likelihood or a mean squared error
            loss = None  # replace with your actual loss expression
            return loss

The `supervised` class attribute is used by `DefaultDataSpecsMixin` to know
how to specify the data requirements: if it is set to `True`, the cost will
expect to receive inputs and targets; if it is set to `False`, the cost will
expect to receive inputs only. In the example, it is assumed that we are
doing supervised learning, so we set `supervised` to `True`.

The first two lines of `expr` do some basic input checking and should always
be included at the beginning of your `expr` method. Without going into too
much detail, `space.validate(data)` makes sure that the data you get is the
data you requested (e.g. if you do supervised learning, you need an input
tensor variable and a target tensor variable). How to determine "what you
need" will be covered when we look at the `Model` subclass. In the supervised
case, `data` is a tuple containing the inputs as the first element and the
targets as the second element.

We then get the model output by calling its `some_method_for_outputs` method,
whose name and behaviour are really for you to decide, as long as your `Cost`
subclass knows which method to call on the model. Finally, we compute some
loss measure on `outputs` and `targets` and return that as the cost
expression.

Note that things don't have to be *exactly* like this. For instance, you
could ask the model to have a method that takes inputs and targets as
arguments and returns the loss directly, and that would be perfectly fine.
All you need is some way to make your `Model` and `Cost` subclasses work
together to produce a cost expression in the end.

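For the curious, here is a rough, non-authoritative sketch of what
`DefaultDataSpecsMixin` does behind the scenes (the helper name
`sketch_get_data_specs` is ours; the actual implementation lives in
`pylearn2.costs.cost`): it asks the model for its input (and, when
`supervised` is `True`, target) space and source names, and bundles them into
the `(space, source)` pair that `expr` unpacks.

.. code-block:: python

    from pylearn2.space import CompositeSpace

    def sketch_get_data_specs(cost, model):
        """Approximation of DefaultDataSpecsMixin.get_data_specs, for illustration."""
        if cost.supervised:
            # Request (features, targets) batches, formatted according to
            # the model's own input and target spaces
            space = CompositeSpace([model.get_input_space(),
                                    model.get_target_space()])
            source = (model.get_input_source(), model.get_target_source())
            return (space, source)
        else:
            # Request features only
            return (model.get_input_space(), model.get_input_source())

This is why the `Model` subclass described next has to declare its spaces:
the cost relies on them to describe the data it needs.
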
Defining the model
------------------

Now it's time to make things more concrete by writing the model itself. The
model will be a subclass of `pylearn2.models.model.Model`, which is
responsible for the following:

* defining what its parameters are,
* defining what its data requirements are,
* doing something with the input to produce an output.

As is the case with `Cost`, the `Model` base class does many useful things on
its own, provided you set the appropriate instance attributes. Let's have a
look at a subclass example:

.. code-block:: python

    from pylearn2.models.model import Model

    class MyModelSubclass(Model):
        def __init__(self, *args, **kwargs):
            super(MyModelSubclass, self).__init__()

            # Some parameter initialization using *args and **kwargs
            # ...

            self._params = [
                # List of all the model parameters (Theano shared variables)
            ]

            # Some `pylearn2.space.Space` subclass describing the input format
            self.input_space = None
            # This one is necessary only for supervised learning; it is also
            # a `pylearn2.space.Space` subclass
            self.output_space = None

        def some_method_for_outputs(self, inputs):
            # Some Theano computation involving the inputs and the parameters
            raise NotImplementedError

The first thing you should do if you're overriding the constructor is call
the superclass' constructor. Pylearn2 checks for that and will scold you if
you don't.

You should then initialize your model parameters **as shared variables**:
Pylearn2 will build an updates dictionary for your model variables using the
gradients returned by your cost. **Protip: the `pylearn2.utils.sharedX`
function initializes a shared variable with the value and an optional name
you provide. This allows your code to be GPU-compatible without putting too
much thought into it.** For instance, a weights matrix can be initialized
this way:

.. code-block:: python

    import numpy
    from pylearn2.utils import sharedX

    self.W = sharedX(numpy.random.normal(size=(size1, size2)), 'W')

Put all your parameters in a list as the `_params` instance attribute. The
`Model` superclass defines a `get_params` method which returns `self._params`
for you, and that is the method that is called to get the model parameters
when `Cost` is computing the gradients.

Your `Model` subclass should also describe the data format it expects as
input (`self.input_space`) and the data format of the model's output
(`self.output_space`); the latter is required only if you're doing supervised
learning. These attributes should be instances of `pylearn2.space.Space` (and
generally are instances of `pylearn2.space.VectorSpace`, a subclass of
`pylearn2.space.Space` used to represent batches of vectors). Broadly, this
mechanism allows for automatic conversion between different data formats
(e.g. if your targets are stored as integer indexes in the dataset but are
required to be one-hot encoded by the model); a short sketch at the end of
this section illustrates the one-hot case.

The `some_method_for_outputs` method is really where all the magic happens.
Remember, the name of the method doesn't really matter, as long as your
`Cost` subclass knows that it's the one it has to call. This method expects a
tensor variable as input and returns a symbolic expression involving the
input and the model's parameters. What happens in between is up to you, and
this is where you can put all the Theano code you could possibly hope for,
just like you would do in pure Theano scripts.

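As promised, here is a small illustration of the kind of conversion the space
mechanism saves you from writing by hand. The snippet is plain NumPy, and the
helper `to_one_hot` is ours rather than a Pylearn2 function: targets stored
as integer class indices become the one-hot rows a `VectorSpace` output space
would ask for.

.. code-block:: python

    import numpy

    def to_one_hot(indices, num_classes):
        """Convert a vector of integer class indices into one-hot rows."""
        one_hot = numpy.zeros((len(indices), num_classes))
        one_hot[numpy.arange(len(indices)), indices] = 1.
        return one_hot

    print(to_one_hot(numpy.array([2, 0, 1]), num_classes=3))
    # [[ 0.  0.  1.]
    #  [ 1.  0.  0.]
    #  [ 0.  1.  0.]]
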
Examples
========

Let's demonstrate these ideas by writing two models, one which does
supervised learning and one which does unsupervised learning. The data you
train these models on is up to you, as long as it is represented as a matrix
of features (each row being an example) and a matrix of targets (each row
being the target for an example). Obviously, the second matrix is only
required for supervised learning. While this is not the only way to store
data in Pylearn2, it is probably the most common one, so we will use it in
the remainder of this discussion.

For the purposes of this tutorial, we will train models on the venerable
MNIST dataset, which you can download with:

.. code-block:: bash

    wget http://deeplearning.net/data/mnist/mnist.pkl.gz

To make things easier to manipulate, we will decompress the archive (e.g.
with ``gunzip mnist.pkl.gz``) and split it into six different files:

.. code-block:: bash

    python -c "from pylearn2.utils import serial; \
    data = serial.load('mnist.pkl'); \
    serial.save('mnist_train_X.pkl', data[0][0]); \
    serial.save('mnist_train_y.pkl', data[0][1].reshape((-1, 1))); \
    serial.save('mnist_valid_X.pkl', data[1][0]); \
    serial.save('mnist_valid_y.pkl', data[1][1].reshape((-1, 1))); \
    serial.save('mnist_test_X.pkl', data[2][0]); \
    serial.save('mnist_test_y.pkl', data[2][1].reshape((-1, 1)))"

Supervised learning using logistic regression
---------------------------------------------

Let's keep things simple by porting to Pylearn2 the *Hello World!* of
supervised learning: logistic regression. For a refresher, we suggest that
you first read the `deeplearning.net tutorial`_ on logistic regression.

Here is what we need to do:

* implement the negative log-likelihood (NLL) loss in our `Cost` subclass,
* initialize the model parameters W and b,
* implement the model's logistic regression output.

Let's start with the `Cost` subclass:

.. code-block:: python

    import theano.tensor as T

    from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

    class LogisticRegressionCost(DefaultDataSpecsMixin, Cost):
        supervised = True

        def expr(self, model, data, **kwargs):
            space, source = self.get_data_specs(model)
            space.validate(data)

            inputs, targets = data
            outputs = model.logistic_regression(inputs)
            loss = -(targets * T.log(outputs)).sum(axis=1)
            return loss.mean()

We assume our model has a `logistic_regression` method which accepts a batch
of examples and computes the logistic regression output; we will implement
that method in just a moment. We compute the loss as the average negative
log-likelihood of the targets given the logistic regression output, as
described in the deeplearning.net tutorial. Also, notice how we set
`supervised` to `True`.

Now for the `Model` subclass:

.. code-block:: python

    import numpy
    import theano.tensor as T

    from pylearn2.models.model import Model
    from pylearn2.space import VectorSpace
    from pylearn2.utils import sharedX

    class LogisticRegression(Model):
        def __init__(self, nvis, nclasses):
            super(LogisticRegression, self).__init__()

            self.nvis = nvis
            self.nclasses = nclasses

            W_value = numpy.random.uniform(size=(self.nvis, self.nclasses))
            self.W = sharedX(W_value, 'W')
            b_value = numpy.zeros(self.nclasses)
            self.b = sharedX(b_value, 'b')
            self._params = [self.W, self.b]

            self.input_space = VectorSpace(dim=self.nvis)
            self.output_space = VectorSpace(dim=self.nclasses)

        def logistic_regression(self, inputs):
            return T.nnet.softmax(T.dot(inputs, self.W) + self.b)

The model's constructor receives the dimensionality of the input and the
number of classes. It initializes the weights matrix and the bias vector with
`sharedX`. It also sets its input space to a `VectorSpace` of the
dimensionality of the input (meaning it expects the input to be a batch of
examples which are all vectors of size `nvis`) and its output space to a
`VectorSpace` of dimension `nclasses` (meaning it produces a batch of
probability vectors, with one element per possible class). The
`logistic_regression` method does pretty much what you would expect: it
returns a linear transformation of the input followed by a softmax
non-linearity.

How about we give it a try? Save those two code snippets in a single file
(e.g. `log_reg.py`).

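Before training, you can optionally convince yourself that the model is just
Theano under the hood. The following hedged sanity check (it assumes the two
snippets above were saved in `log_reg.py`) compiles the model's output
expression and verifies that it produces one probability vector per example:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    from log_reg import LogisticRegression

    model = LogisticRegression(nvis=784, nclasses=10)
    X = T.matrix('X')
    predict = theano.function([X], model.logistic_regression(X))

    batch = numpy.random.uniform(size=(5, 784)).astype(theano.config.floatX)
    probabilities = predict(batch)
    print(probabilities.shape)        # (5, 10)
    print(probabilities.sum(axis=1))  # each row sums to (approximately) 1
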
Now save the following in `log_reg.yaml`:

.. code-block:: yaml

    !obj:pylearn2.train.Train {
        dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
            X: !pkl: 'mnist_train_X.pkl',
            y: !pkl: 'mnist_train_y.pkl',
            y_labels: 10,
        },
        model: !obj:log_reg.LogisticRegression {
            nvis: 784,
            nclasses: 10,
        },
        algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
            batch_size: 200,
            learning_rate: 1e-3,
            monitoring_dataset: {
                'train' : *train,
                'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
                    X: !pkl: 'mnist_valid_X.pkl',
                    y: !pkl: 'mnist_valid_y.pkl',
                    y_labels: 10,
                },
                'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix {
                    X: !pkl: 'mnist_test_X.pkl',
                    y: !pkl: 'mnist_test_y.pkl',
                    y_labels: 10,
                },
            },
            cost: !obj:log_reg.LogisticRegressionCost {},
            termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
                max_epochs: 15
            },
        },
    }

Run the following command:

.. code-block:: bash

    python -c "from pylearn2.utils import serial; \
    train_obj = serial.load_train_file('log_reg.yaml'); \
    train_obj.main_loop()"

Congratulations, you just implemented your first model in Pylearn2!

*(By the way, the targets you used to initialize the `DenseDesignMatrix`
instances were column matrices of integer labels, yet your model expects
one-hot encoded vectors. You can do that because Pylearn2 does the conversion
for you via the `data_specs` mechanism. That's why specifying the model's
`input_space` and `output_space` is important.)*

Unsupervised learning using an autoencoder
------------------------------------------

Let's now have a look at an unsupervised learning example: an autoencoder
with tied weights. Once again, we recommend that you first read the
corresponding `deeplearning.net tutorial`_.

Here's what we'll do:

* implement the binary cross-entropy reconstruction loss in our `Cost`
  subclass,
* initialize the model parameters W, b and c,
* implement the model's reconstruction logic.

Let's start again with the `Cost` subclass:

.. code-block:: python

    import theano.tensor as T

    from pylearn2.costs.cost import Cost, DefaultDataSpecsMixin

    class AutoencoderCost(DefaultDataSpecsMixin, Cost):
        supervised = False

        def expr(self, model, data, **kwargs):
            space, source = self.get_data_specs(model)
            space.validate(data)

            X = data
            X_hat = model.reconstruct(X)
            loss = -(X * T.log(X_hat) + (1 - X) * T.log(1 - X_hat)).sum(axis=1)
            return loss.mean()

We assume our model has a `reconstruct` method which encodes and decodes its
input, and we compute the loss as the average binary cross-entropy between
the input and its reconstruction. This time, however, we set `supervised` to
`False`.

Now for the `Model` subclass:

.. code-block:: python

    import numpy
    import theano.tensor as T

    from pylearn2.models.model import Model
    from pylearn2.space import VectorSpace
    from pylearn2.utils import sharedX

    class Autoencoder(Model):
        def __init__(self, nvis, nhid):
            super(Autoencoder, self).__init__()

            self.nvis = nvis
            self.nhid = nhid

            W_value = numpy.random.uniform(size=(self.nvis, self.nhid))
            self.W = sharedX(W_value, 'W')
            b_value = numpy.zeros(self.nhid)
            self.b = sharedX(b_value, 'b')
            c_value = numpy.zeros(self.nvis)
            self.c = sharedX(c_value, 'c')
            self._params = [self.W, self.b, self.c]

            self.input_space = VectorSpace(dim=self.nvis)

        def reconstruct(self, X):
            h = T.tanh(T.dot(X, self.W) + self.b)
            return T.nnet.sigmoid(T.dot(h, self.W.T) + self.c)

The constructor looks quite similar to the logistic regression example,
except that this time we don't need to specify the model's output space. The
`reconstruct` method simply encodes and decodes its input.

Let's try to train it. Save the two code snippets in a single file, for
instance `autoencoder.py`.

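As before, here is an optional, hedged sanity check (it assumes the two
snippets above were saved in `autoencoder.py`): reconstructions have the same
shape as the input, and the sigmoid keeps them in the [0, 1] range expected
by the cross-entropy loss.

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    from autoencoder import Autoencoder

    model = Autoencoder(nvis=784, nhid=200)
    X = T.matrix('X')
    reconstruct = theano.function([X], model.reconstruct(X))

    batch = numpy.random.uniform(size=(5, 784)).astype(theano.config.floatX)
    X_hat = reconstruct(batch)
    print(X_hat.shape)               # (5, 784): one reconstruction per input row
    print(X_hat.min(), X_hat.max())  # sigmoid outputs, so within [0, 1]
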
Let's try to train it. Save the two code snippets in a single file. For instance `autoencoder.py`. Then save the following in `autoencoder.yaml`: .. code-block:: none !obj:pylearn2.train.Train { dataset: &train !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix { X: !pkl: 'mnist_train_X.pkl', }, model: !obj:autoencoder.Autoencoder { nvis: 784, nhid: 200, }, algorithm: !obj:pylearn2.training_algorithms.sgd.SGD { batch_size: 200, learning_rate: 1e-3, monitoring_dataset: { 'train' : *train, 'valid' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix { X: !pkl: 'mnist_valid_X.pkl', }, 'test' : !obj:pylearn2.datasets.dense_design_matrix.DenseDesignMatrix { X: !pkl: 'mnist_test_X.pkl', }, }, cost: !obj:autoencoder.AutoencoderCost {}, termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter { max_epochs: 15 }, }, } Run the following command: .. code-block:: bash python -c "from pylearn2.utils import serial; \ train_obj = serial.load_train_file('autoencoder.yaml'); \ train_obj.main_loop()" What have we gained? ==================== At this point you might be thinking *"There's still boilerplate code to write; what have we gained?"* The answer is that we gained access to the plethora of scripts, model parts, costs and training algorithms which are built into Pylearn2. You don't have to reinvent the wheel anymore when you wish to train using SGD and momentum. If you want to switch from SGD to BGD, then Pylearn2 makes this is as simple as changing the training algorithm description in your YAML file. As we pointed out earlier, this demonstrates only the **bare minimum** needed to implement a model in Pylearn2. Nothing prevents you from digging deeper in the codebase and overriding some methods to gain new functionalities. Here's an example of how a few more lines of code can do a lot for you in Pylearn2. Monitoring various quantities during training --------------------------------------------- Let's monitor the classification error of our logistic regression classifier. To do so, you will have to override `Model`'s `get_monitoring_data_specs` and `get_monitoring_channels` methods. The former specifies what the model needs for its monitoring, and in which format they should be provided. The latter does the actual monitoring by returning an `OrderedDict` mapping string identifiers to their quantities. Let's look at how it's done. Add the following to `LogisticRegression`: .. code-block:: python # Keeps things compatible for Python 2.6 from theano.compat.python2x import OrderedDict from pylearn2.space import CompositeSpace class LogisticRegression(Model): # (Your previous code) def get_monitoring_data_specs(self): space = CompositeSpace([self.get_input_space(), self.get_target_space()]) source = (self.get_input_source(), self.get_target_source()) return (space, source) def get_monitoring_channels(self, data): space, source = self.get_monitoring_data_specs() space.validate(data) X, y = data y_hat = self.logistic_regression(X) error = T.neq(y.argmax(axis=1), y_hat.argmax(axis=1)).mean() return OrderedDict([('error', error)]) The content of `get_monitoring_data_specs` may look cryptic at first. Documentation for data specs can be found `here `_. All you really need to know, is that this is the standard method in Pylearn2 to request a tuple whose first element represents features and second element represents targets. The content of `get_monitoring_channels` should more familiar. 
If we launch training again using

.. code-block:: bash

    python -c "from pylearn2.utils import serial; \
    train_obj = serial.load_train_file('log_reg.yaml'); \
    train_obj.main_loop()"

then you'll see the classification error displayed along with the other
monitored quantities.

What's next?
============

The examples given in this tutorial are obviously very simplistic and could
easily be replaced by existing parts of Pylearn2. However, they show a path
one can take to implement arbitrary ideas in Pylearn2.

In order to avoid reinventing the wheel, it is often useful to dig into
Pylearn2's codebase to see what has already been implemented. For example,
the VAE framework relies on the MLP framework to represent the mapping from
inputs to conditional distribution parameters. While it is often desirable to
reuse code, the difficulty of doing so depends on your knowledge of Pylearn2
and on how similar your model is to what is already implemented.

You should never feel ashamed to dump Theano code inside a `Model` subclass'
method as we showed here. The modularity of your code can be improved
gradually, and at your own pace. In the meantime, you can still benefit from
Pylearn2's features, such as human-readable descriptions of experiments,
automatic monitoring of various quantities, easily interchangeable training
algorithms, and so on.