=============
Derivatives
=============

Let's begin by discussing derivatives in the setting of programming. We
assume that you have not seen derivatives in a while, so we will start slow
and develop some notation first.

.. jupyter-execute::
   :hide-code:

   import sys
   sys.path.append("../project/")
   import minitorch
   from datasets_plotly import plot_function
   import math

Assume we are given a function,

.. math::

   f(x) = \sin(2 x)

.. jupyter-execute::

   def f(x):
       return math.sin(2 * x)

   plot_function("sin", f)

We can compute a function for its derivative by applying rules from
univariate calculus. We will use the Lagrange notation, where the derivative
of a one-argument function :math:`f` is denoted :math:`f'`, i.e.

.. math::

   f'(x) = 2 \times \cos(2 x)

.. jupyter-execute::

   def d_f(x):
       return 2 * math.cos(2 * x)

   plot_function("d_sin", d_f)

When working with two-argument functions, we use subscripts to indicate which
argument the derivative is taken with respect to. For example,

.. math::

   \begin{eqnarray*}
   f(x, y) &=& x + 2 y \\
   f'_x(x, y) &=& 1\\
   f'_y(x, y) &=& 2\\
   \end{eqnarray*}

We will refer to these as `symbolic` derivatives of the function. When
available, symbolic derivatives are ideal: they tell us everything we need to
know about the derivative of the function.

Visually, derivative functions correspond to slopes of tangent lines in 2D.
Let's start with this simple function:

.. math::

   f(x) = x^2

.. image:: figs/Grad/function.png
   :align: center

Its derivative at an arbitrary input is the slope of the line tangent to the
function at that input.

.. math::

   f'(x) = 2x

.. image:: figs/Grad/tangent.png
   :align: center

The above visual representation informally motivates an alternative approach
for computing a `numerical` derivative. Recall that one definition of the
derivative is the limit of this slope as the secant line approaches the
tangent line:

.. math::

   f'(x) = \lim_{\epsilon \rightarrow 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}

If we set :math:`\epsilon` to be very small, we get an approximation of the
derivative function:

.. math::

   f'(x) \approx \frac{f(x + \epsilon) - f(x)}{\epsilon}

Alternatively, you could imagine approaching :math:`x` from the other side,
which yields a different approximation of the same derivative:

.. math::

   f'(x) = \lim_{\epsilon \rightarrow 0} \frac{f(x) - f(x - \epsilon)}{\epsilon}

You can show that doing both simultaneously yields the best approximation
(you probably proved this in high school!):

.. math::

   f'(x) \approx \frac{f(x + \epsilon) - f(x - \epsilon)}{2\epsilon}

.. image:: figs/Grad/approx.png
   :align: center

The above is known as the `central difference`. You can find a complete
`description <https://en.wikipedia.org/wiki/Finite_difference>`_ of the
method under the name finite differences.

The key benefit of the `numerical` approach is that we do not need to know
everything about the function: all we need is to be able to compute its value
for a given input. From a programming perspective, this means we can
approximate the derivative of any black-box function. In terms of
implementation, it means we can write a `higher-order function` of the
following form::

    def central_difference(f, x):
        ...

Assume we are just given an arbitrary Python function::

    def f(x):
        "Compute some unknown function of x."
        ...

We can then call ``central_difference(f, x)`` to immediately approximate the
derivative of ``f`` at input ``x``. We will see that this approach is not a
great way to train machine learning models, but it provides a generic way to
check that your derivative functions are correct, e.g. via
:doc:`property_testing`.
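One plausible way to fill in such a helper is sketched below. This is only a
rough illustration, not minitorch's exact implementation: the ``*vals`` and
``arg`` parameters (which mirror how ``minitorch.central_difference`` is
called in the examples below) and the default ``epsilon`` value are
assumptions made for this sketch::

    def central_difference(f, *vals, arg=0, epsilon=1e-6):
        "Approximate the derivative of f with respect to the arg-th input at vals."
        # Shift only the selected argument up and down by epsilon.
        upper = [v + (epsilon if i == arg else 0.0) for i, v in enumerate(vals)]
        lower = [v - (epsilon if i == arg else 0.0) for i, v in enumerate(vals)]
        # Slope of the secant line through the two shifted points.
        return (f(*upper) - f(*lower)) / (2 * epsilon)
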
Here are some examples of Module-0 functions with central difference applied.

.. jupyter-execute::

   plot_function("sigmoid", minitorch.operators.sigmoid)

.. jupyter-execute::

   def d_sigmoid(x):
       return minitorch.central_difference(minitorch.operators.sigmoid, x)

   plot_function("Derivative of sigmoid", d_sigmoid)

.. jupyter-execute::

   plot_function("exp", minitorch.operators.exp)

.. jupyter-execute::

   def d_exp(x):
       return minitorch.central_difference(minitorch.operators.exp, x)

   plot_function("Derivative of exp", d_exp)

.. jupyter-execute::

   plot_function("ReLU", minitorch.operators.relu)

.. jupyter-execute::

   def d_relu(x):
       return minitorch.central_difference(minitorch.operators.relu, x)

   plot_function("Derivative of ReLU", d_relu)

.. jupyter-execute::

   def times_5(x):
       return minitorch.operators.mul(x, 5)

   plot_function("Mul by 5", times_5)

.. jupyter-execute::

   def d_times_5(x):
       return minitorch.central_difference(minitorch.operators.mul, x, 5, arg=0)

   plot_function("Derivative of mul by 5", d_times_5)
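The same idea can be turned into a quick correctness check for a symbolic
derivative. As a small illustration (not part of minitorch itself), we can
compare the hand-derived derivative of :math:`\sin(2x)` from above against
the numerical estimate at a few points::

    def f(x):
        return math.sin(2 * x)

    for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
        symbolic = 2 * math.cos(2 * x)          # hand-derived derivative
        numerical = minitorch.central_difference(f, x)
        # Loose tolerance, since central difference is only an approximation.
        assert abs(symbolic - numerical) < 1e-2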