Reduction applied to each region:
[Figures: max reduction; softmax output]
Step function $f(x) = x > 0$ determines correct answer
Step is derivative of ReLU
$$ \begin{eqnarray*} \text{ReLU}'(x) &=& \begin{cases} 0 & \text{if } x \leq 0 \\ 1 & \text{otherwise} \end{cases} \\ \text{step}(x) &=& \text{ReLU}'(x) \end{eqnarray*} $$
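A minimal NumPy sketch of this relationship (the helper names `step` and `relu_grad` are just for illustration):

```python
import numpy as np

def relu_grad(x):
    # Derivative of ReLU: 0 for x <= 0, 1 otherwise
    return (x > 0).astype(float)

def step(x):
    # The step function is exactly ReLU's derivative
    return relu_grad(x)

x = np.array([-2.0, 0.0, 3.0])
print(step(x))  # [0. 0. 1.]
```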
Loss of step tells us how many points are wrong.
Mathematically,
$$\text{step}'(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ 0 & \text{otherwise} \end{cases}$$
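A sketch of this (assuming binary labels in $\{0, 1\}$ and one raw score per point): a step-based loss counts mistakes, but its derivative provides no gradient signal to learn from.

```python
import numpy as np

def step(x):
    return (x > 0).astype(float)

def step_loss(scores, labels):
    # Counts how many points are classified incorrectly (labels in {0, 1})
    predictions = step(scores)
    return np.sum(predictions != labels)

def step_grad(x):
    # step'(x) is 0 on both branches, so no gradient flows
    return np.zeros_like(x)

scores = np.array([-1.5, 0.3, 2.0])
labels = np.array([0.0, 1.0, 0.0])
print(step_loss(scores, labels))  # 1
print(step_grad(scores))          # [0. 0. 0.]
```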
Not a useful function to differentiate
Step is still used to determine the loss function
Would be nice to have a version with a useful derivative
$$\text{sigmoid}(x) = \text{softmax}([0, x])[1]$$
A useful soft version of step, built from softmax (the soft version of argmax).
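A quick sketch of the equivalent closed form (the local helper name `sigmoid` is just for illustration):

```python
import numpy as np

def sigmoid(x):
    # softmax([0, x])[1] = exp(x) / (exp(0) + exp(x)) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 5)
print(sigmoid(x))         # smooth values in (0, 1)
print(sigmoid(x) > 0.5)   # agrees with step(x) on which side of 0 we are
```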
How do we generalize sigmoid to multiple outputs?
Max is a binary associative operator
$\max(a, b)$ returns the larger value
Generalizes $\text{ReLU}$: $\text{ReLU}(a) = \max(a, 0)$
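Because max is binary and associative, it can be folded as a reduction over many values; a small sketch:

```python
from functools import reduce

values = [3.0, -1.0, 7.5, 2.0]

# Fold the binary max over the whole list; associativity means the
# grouping of the pairwise calls does not change the result.
result = reduce(max, values)
print(result)  # 7.5

# ReLU as a special case of the binary max
relu = lambda a: max(a, 0.0)
print(relu(-2.0), relu(3.0))  # 0.0 3.0
```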
The gradient of max is an argmax: a one-hot vector selecting the winner
Grad flows to only one input, and (as with ReLU(0)) ties make it ill-defined
Want a soft version of argmax
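A sketch of this behavior (hypothetical helper name), treating the gradient of max over a vector as a one-hot argmax:

```python
import numpy as np

def max_grad(x):
    # Gradient of max(x) with respect to x: one-hot at the argmax.
    # At ties this choice is arbitrary (np.argmax picks the first winner).
    g = np.zeros_like(x)
    g[np.argmax(x)] = 1.0
    return g

x = np.array([1.0, 4.0, 2.0])
print(max_grad(x))  # [0. 1. 0.] -- only the winning input receives gradient
```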
$$\text{softmax}(\textbf{x}) = \frac{\exp \textbf{x}}{\sum_i \exp x_i}$$
$$\text{softmax}([0, x])[1] = \frac{\exp x}{\exp x + \exp 0} = \sigma(x)$$
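A small, numerically stable sketch that checks the sigmoid identity above:

```python
import numpy as np

def softmax(x):
    # Subtracting the max keeps exp from overflowing; the result is unchanged.
    shifted = np.exp(x - np.max(x))
    return shifted / np.sum(shifted)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 1.7
print(softmax(np.array([0.0, x]))[1])  # ~0.8455
print(sigmoid(x))                      # same value
```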
Softmax Network
Returns a combination of $x$ and $y$ $$f(x, y, r) = x \cdot \sigma(r) + y \cdot (1 - \sigma(r))$$
$$\begin{eqnarray*} f'_x(x, y, r) &=& \sigma(r) \\ f'_y(x, y, r) &=& 1 - \sigma(r) \\ f'_r(x, y, r) &=& (x - y)\, \sigma'(r) \end{eqnarray*}$$
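A sketch of the gate and its gradients, checked against finite differences (all names are illustrative):

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def gate(x, y, r):
    return x * sigmoid(r) + y * (1.0 - sigmoid(r))

def gate_grads(x, y, r):
    s = sigmoid(r)
    # sigmoid'(r) = sigmoid(r) * (1 - sigmoid(r))
    return s, 1.0 - s, (x - y) * s * (1.0 - s)

x, y, r, eps = 2.0, -1.0, 0.5, 1e-6
gx, gy, gr = gate_grads(x, y, r)
# Finite-difference check of df/dr
num_gr = (gate(x, y, r + eps) - gate(x, y, r - eps)) / (2 * eps)
print(gr, num_gr)  # analytic and numeric values agree
```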
Learn which one of the previous layers is most useful. $$\begin{eqnarray*} r &=& NN_1 \\ x &=& NN_2 \\ y &=& NN_3 \end{eqnarray*}$$
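As a sketch of the idea (the layer shapes and names below are made up), three small linear layers feed the gate, and $r$ learns how to weight the other two:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

# Three hypothetical sub-networks, here just linear layers on the same input.
W1, W2, W3 = (rng.normal(size=(4, 1)) for _ in range(3))

def forward(inp):
    r = inp @ W1   # gate logit
    x = inp @ W2   # candidate output 1
    y = inp @ W3   # candidate output 2
    return x * sigmoid(r) + y * (1.0 - sigmoid(r))

inp = rng.normal(size=(2, 4))  # batch of 2 examples
print(forward(inp).shape)      # (2, 1)
```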