Autodiff

_images/backprop4.png

This module shows how to build the first version of MiniTorch (mini-MiniTorch?) using only Scalar values. This covers key aspects of auto-differentiation: the key technique in the system. Then you will use your code to train a preliminary model.

All starter code is available in https://github.com/minitorch/Module-1 .

To begin, remember to activate your virtual environment first, and then clone your assignment:

>>> git clone {{STUDENT_ASSIGNMENT1_URL}}
>>> cd {{STUDENT_ASSIGNMENT_NAME}}
>>> pip install -Ue .

Module 1 is built upon the previous Module 0, so make sure to pull your files from Assignment 0 to your new repo.

Please continue to follow the Contributing guideline.

Tasks

Task 1.1: Numerical Derivatives

Note

This task requires basic familiarity with derivatives. Be sure to review differentiation rules and the notation for derivatives. Then carefully read the Guide on Derivatives.

Todo

Complete the following function in minitorch/scalar.py and pass tests marked as task1_1.

minitorch.scalar.central_difference(f, *vals, arg=0, epsilon=1e-06)

Computes an approximation to the derivative of f with respect to one arg.

See Derivatives or https://en.wikipedia.org/wiki/Finite_difference for more details.

Parameters
  • f -- arbitrary function from n-scalar args to one value

  • *vals (list of floats) -- n-float values \(x_0 \ldots x_{n-1}\)

  • arg (int) -- the number \(i\) of the arg to compute the derivative

  • epsilon (float) -- a small constant

Returns

An approximation of \(f'_i(x_0, \ldots, x_{n-1})\)

Return type

float

Task 1.2: Scalars

Note

This task requires familiarity with the minitorch.Scalar class. Be sure to first carefully read the Guide on Tracking Variables and to refresh your memory on Python numerical overrides.

Implement the overridden mathematical functions required for the minitorch.Scalar class. Each of these requires wiring the internal Python operator to the correct minitorch.Function.forward() call.

minitorch.scalar.ScalarFunction.forward(ctx, *inputs)

Forward call, compute \(f(x_0 \ldots x_{n-1})\).

Parameters
  • ctx (Context) -- A container object to save any information that may be needed for the call to backward.

  • *inputs (list of floats) -- n-float values \(x_0 \ldots x_{n-1}\).

Should return float the computation of the function \(f\).

Read the example ScalarFunctions that we have implemented for guidelines. You may find it useful to reuse the operators from Module 0.

We have built a debugging tool for you to observe the workings of your expressions to see how the graph is built. You can run it in the Autodiff Sandbox. You can alter the expression at the top of the file and then run the code to create a graph in Streamlit:

>>> streamlit run app.py -- 1

Todo

Complete the following functions in minitorch/scalar.py.

minitorch.scalar.Mul.forward(ctx, a, b)
minitorch.scalar.Inv.forward(ctx, a)
minitorch.scalar.Neg.forward(ctx, a)
minitorch.scalar.Sigmoid.forward(ctx, a)
minitorch.scalar.ReLU.forward(ctx, a)
minitorch.scalar.Exp.forward(ctx, a)
minitorch.scalar.LT.forward(ctx, a, b)
minitorch.scalar.EQ.forward(ctx, a, b)

Todo

Complete the following function in minitorch/scalar.py, and pass tests marked as task1_2. See Python numerical overrides for the interface of these methods. All of these functions should return minitorch.Scalar arguments.

minitorch.Scalar.__lt__(self, b)
minitorch.Scalar.__gt__(self, b)
minitorch.Scalar.__sub__(self, b)
minitorch.Scalar.__neg__(self)
minitorch.Scalar.__add__(self, b)
minitorch.Scalar.log(self)
minitorch.Scalar.exp(self)
minitorch.Scalar.sigmoid(self)
minitorch.Scalar.relu(self)

Task 1.3: Chain Rule

Note

This task is quite tricky, so be sure you understand the chain rule, Variables, and Functions. Be sure to first read the Guide on Autodifferentiation very carefully and read the code for other ScalarFunctions.

Implement the chain_rule function on FunctionBase for functions of arbitrary arguments. This function should be able to backward process a function by passing it in a context and \(d_{out}\) and then collecting the local derivatives. It should then pair these with the right variables and return them. This function is also where we filter out constants that were used on the forward pass, but do not need derivatives.

Todo

Complete the following function in minitorch/autodiff.py, and pass tests marked as task1_3.

minitorch.FunctionBase.chain_rule(ctx, inputs, d_output)

Implement the derivative chain-rule.

Parameters
  • ctx (Context) -- The context from running forward

  • inputs (list of args) -- The args that were passed to FunctionBase.apply() (e.g. \(x, y\))

  • d_output (number) -- The d_output value in the chain rule.

Returns

A list of non-constant variables with their derivatives (see is_constant to remove unneeded variables)

Return type

list of (Variable, number)

Task 1.4: Backpropagation

Note

Be sure to first read the Guide on Backpropagation very carefully and read the code for other ScalarFunctions.

Implement backpropagation. Each of these requires wiring the internal Python operator to the correct minitorch.Function.backward() call.

minitorch.scalar.ScalarFunction.backward(ctx, d_out)

Backward call, computes \(f'_{x_i}(x_0 \ldots x_{n-1}) \times d_{out}\).

Parameters
  • ctx (Context) -- A container object holding any information saved during in the corresponding forward call.

  • d_out (float) -- \(d_out\) term in the chain rule.

Should return the computation of the derivative function \(f'_{x_i}\) for each input \(x_i\) times d_out.

Read the example ScalarFunctions that we have implemented for guidelines. Feel free to also consult differentiation rules if you forget how these identities work.

Todo

Complete the following functions in minitorch/autodiff.py and minitorch/scalar.py, and pass tests marked as task1_4.

minitorch.topological_sort(variable)

Computes the topological order of the computation graph.

Parameters

variable (Variable) -- The right-most variable

Returns

Non-constant Variables in topological order

starting from the right.

Return type

list of Variables

minitorch.backpropagate(variable, deriv)

Runs backpropagation on the computation graph in order to compute derivatives for the leave nodes.

See Backpropagation for details on the algorithm.

Parameters
  • variable (Variable) -- The right-most variable

  • deriv (number) -- Its derivative that we want to propagate backward to the leaves.

No return. Should write to its results to the derivative values of each leaf through accumulate_derivative.

minitorch.scalar.Mul.backward(ctx, d_output)
minitorch.scalar.Inv.backward(ctx, d_output)
minitorch.scalar.Neg.backward(ctx, d_output)
minitorch.scalar.Sigmoid.backward(ctx, d_output)
minitorch.scalar.ReLU.backward(ctx, d_output)
minitorch.scalar.Exp.backward(ctx, d_output)

Task 1.5: Training

If your code works, you should now be able to run the training script. Study the code in project/run_scalar.py carefully to understand what the neural network is doing.

You will also need Module code to implement the parameters Network and for Linear. You can modify the dataset and the module with the parameters at the bottom of the file. Start with this simple config:

PTS = 50
DATASET = minitorch.datasets["Simple"](PTS)
HIDDEN = 2
RATE = 0.5

You can then move up to something more complex, for instance:

PTS = 50
DATASET = minitorch.datasets["Xor"](PTS)

HIDDEN = 10
RATE = 0.5

If your code is successful, you should be able to run the full visualization:

>>> streamlit run app.py -- 1

Todo

Train a scalar model for each of the 4 main datasets.

Add the output training logs and final images to your README file.