Module 2.2 - Tensor Functions¶

Terminology¶

  • 2-Dimensional
  • Math: Matrix
In [2]:
matrix(3, 5)
Out[2]:

Terminology¶

  • Arbitrary dimensions - Tensor (an array in NumPy)
In [3]:
tensor(0.75, 2, 3, 5)
Out[3]:

Terminology¶

  • Dims - # dimensions (x.dims)
  • Shape - # cells per dimension (x.shape)
  • Size - # cells (x.size)

Why not just use lists?¶

  • Functions to manipulate shape
  • Mathematical notation
  • Can act as Variables / Parameters
  • Efficient control of memory (Module-3)

Shape - Transpose¶

In [4]:
matrix(3, 5) | chalk.hstrut(1) | matrix(5, 3)
Out[4]:

Shape Permutation¶

In [5]:
x = minitorch.tensor([[1, 2, 3], [3, 2, 1]])
x.shape
Out[5]:
(2, 3)
In [6]:
x.permute(1, 0).shape
Out[6]:
(3, 2)

Lecture Quiz 1¶

How does this work?¶

  • Storage : a 1-D array of numbers of length size

  • Strides : a tuple that maps a user index to a position in the 1-D storage.

Strides¶

In [7]:
d = (
    matrix(5, 2, "n", colormap=color(5, 2))
    / vstrut(1)
    / matrix(1, 10, "s", colormap=lambda i, j: color(5, 2)(j % 5, j // 5))
)
d.connect(("n", 3, 0), ("s", 0, 3)).connect(("n", 3, 1), ("s", 0, 8))
Out[7]:
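The diagram above pairs each cell of a 2-D tensor with its slot in the flat storage. As a rough sketch (the helper names here are illustrative, not necessarily the exact minitorch API), the mapping is just a dot product between the index and the strides:

# Illustrative sketch: contiguous (row-major) strides and the
# index -> storage-position mapping they define.

def strides_from_shape(shape):
    # Last dimension moves fastest; each stride is the product of the
    # dimensions to its right.
    strides = [1]
    for dim in reversed(shape[1:]):
        strides.insert(0, strides[0] * dim)
    return tuple(strides)

def index_to_position(index, strides):
    # Storage position is the dot product of index and strides.
    return sum(i * s for i, s in zip(index, strides))

strides = strides_from_shape((2, 5))    # -> (5, 1)
index_to_position((0, 3), strides)      # -> 3
index_to_position((1, 3), strides)      # -> 8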

Stride Intuition¶

  • Strides work like numerical bases (place values)
  • Index for position 0? Position 1? Position 2?
In [8]:
tensor(0.75, 2, 2, 2)
Out[8]:

Stride Intuition¶

  • Index for position 0? Position 1? Position 2?

  • $[0, 0, 0], [0, 0, 1], [0, 1, 0]$

In [9]:
(
    tensor(0.5, 2, 2, 2, "n", colormap=lambda i, j, k: color(4, 2)(i * 2 + j, k))
    / vstrut(1)
    / matrix(1, 8, "s", colormap=color(1, 8))
)
Out[9]:
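Using the illustrative helpers sketched earlier, the contiguous strides for shape (2, 2, 2) are (4, 2, 1), which reproduces the index-to-position pattern shown above:

strides = strides_from_shape((2, 2, 2))            # -> (4, 2, 1)
[index_to_position(ix, strides)
 for ix in [(0, 0, 0), (0, 0, 1), (0, 1, 0)]]      # -> [0, 1, 2]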

Lecture Quiz 2¶

Outline¶

  • Tensor Functions
  • Operations
  • Broadcasting

Tensor Functions¶

Goal¶

  • Support the user API
  • Keep track of tensor properties
  • Set up fast / simple functions

Functions¶

  • Moving from Scalar to Tensor Functions
  • Implementation?
In [10]:
def add2(a, b):
    out_tensor = minitorch.zeros(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out_tensor[i, j] = a[i, j] + b[i, j]
    return out_tensor

Issues¶

  • Different code per different dims
  • Big autodiff graph
  • Slow, lots of Python loops
  • Lots of code

Tensor Functions¶

  • Track graph at tensor level
  • Functions wrap / unwrap Tensors
In [11]:
a = minitorch.tensor([3, 2, 1])
b = minitorch.tensor([1, 2, 3])
out = a + b
print(out)
[4.00 4.00 4.00]

Implementation¶

  • Function class (forward / backward)
  • Similar API as scalars
  • Take / return Tensor objects
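As a hedged sketch of this pattern (class names and details here are illustrative, not minitorch's exact internals), a tensor Function bundles a forward pass with a backward pass that produces one gradient tensor per input:

# Illustrative sketch of a tensor-level Function (hypothetical names).

class Context:
    def __init__(self):
        self.saved_values = ()

    def save_for_backward(self, *values):
        # Stash whatever forward needs for computing gradients later.
        self.saved_values = values

class Mul:
    @staticmethod
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a * b                      # one whole-tensor operation

    @staticmethod
    def backward(ctx, grad_output):
        a, b = ctx.saved_values
        # One gradient tensor per input, computed with tensor ops.
        return grad_output * b, grad_output * a

The graph is tracked per tensor operation rather than per scalar, so the autodiff graph stays small even for large tensors.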

Operations¶

Implementing Tensor Functions¶

  • Option: write a separate for loop for each function
  • Lazy: we did this already...
  • Optimization: how do we make it fast?

Strategy¶

  • Implement high-level functions
  • Lift scalar operators to tensors
  • Go back and optimize high-level functions
  • Customize important Functions

Tensor Functions¶

In [12]:
# Unary

new_tensor = a.log()

# Binary (for now, only same shape)

new_tensor = a + b

# Reductions

new_tensor = a.sum()

Tensor Ops¶

  1. Map - Apply to all elements
  2. Zip - Apply to all pairs
  3. Reduce - Reduce a subset

Map¶

In [13]:
set_svg_draw_height(200)
set_svg_height(200)
astyle = Style().line_width(.4).line_color(Color("purple"))
opts = ArrowOpts(shaft_style=astyle)

d = hcat([matrix(3, 2, "a"), vstrut(1) / right_arrow, matrix(3, 2, "b")], 1)
d.connect(("a", 0, 0), ("b", 0, 0), style=opts).connect(("a", 1, 0), ("b", 1, 0), style=opts)
Out[13]:

Examples: Map¶

Unary operations

In [14]:
new_tensor = a.log()
new_tensor = a.exp()
new_tensor = -b
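Under the hood, a map only needs one generic loop over the storage. A minimal sketch, assuming contiguous, same-shape input and output (the real version also has to walk arbitrary strides):

import math

def tensor_map(fn, in_storage, out_storage):
    # Apply fn to every element of the flat input storage.
    for i in range(len(in_storage)):
        out_storage[i] = fn(in_storage[i])

in_storage = [1.0, 2.0, 3.0]
out_storage = [0.0] * 3
tensor_map(math.log, in_storage, out_storage)   # elementwise log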

Zip¶

In [15]:
opts = ArrowOpts(arc_height=0.5, shaft_style=astyle)
opts2 = ArrowOpts(arc_height=0.2, shaft_style=astyle)

d = hcat([matrix(3, 2, "a"), matrix(3, 2, "b"), right_arrow, matrix(3, 2, "c")], 1)
d.connect(("a", 0, 0), ("c", 0, 0), opts).connect(
    ("a", 1, 0), ("c", 1, 0), opts
).connect(("b", 0, 0), ("c", 0, 0), opts2).connect(("b", 1, 0), ("c", 1, 0), opts2)
Out[15]:

Examples: Zip¶

Binary operations

In [16]:
new_tensor = a + b
new_tensor = a * b
new_tensor = a < b
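Zip follows the same pattern with two inputs. A minimal sketch for the same-shape case (broadcasting, covered below, relaxes this):

import operator

def tensor_zip(fn, a_storage, b_storage, out_storage):
    # Pair up corresponding elements of the two flat storages.
    for i in range(len(out_storage)):
        out_storage[i] = fn(a_storage[i], b_storage[i])

out_storage = [0.0] * 3
tensor_zip(operator.add, [3.0, 2.0, 1.0], [1.0, 2.0, 3.0], out_storage)   # -> [4.0, 4.0, 4.0]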

Reduce¶

In [17]:
opts = ArrowOpts(shaft_style=astyle)
d = hcat([matrix(3, 2, "a"), right_arrow, matrix(1, 2, "c")], 1)
d.connect(("a", 2, 0), ("a", 0, 0), style=opts).connect(("a", 2, 1), ("a", 0, 1), style=opts)
Out[17]:

Reduce Options¶

  • Can reduce full tensor
  • Can also just reduce 1 dimension
In [18]:
out = minitorch.rand((3, 4, 5)).mean(1)
print(out.shape)
# (3, 1, 5)
(3, 1, 5)

Examples: Reduce¶

Reduction operations

In [19]:
new_tensor = a.mean()
new_tensor = out.sum(1)

Reduce Example¶

Code

Implementation Notes¶

  • Needs to work on any strides.
  • Start from output. Where does each final value come from?
  • Make sure you really understand tensor data first.
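A minimal sketch of the idea for one concrete case, summing out dimension 0 of a contiguous (rows, cols) tensor; it starts from each output cell and gathers the storage positions that feed it:

def reduce_dim0_sum(storage, rows, cols):
    # storage is row-major, so position = i * cols + j * 1.
    out = [0.0] * cols
    for j in range(cols):                 # one loop per output cell
        for i in range(rows):             # gather along the reduced dimension
            out[j] += storage[i * cols + j]
    return out

reduce_dim0_sum([1, 2, 3, 4, 5, 6], rows=3, cols=2)   # -> [9.0, 12.0]

The general implementation replaces the hard-coded position arithmetic with the tensor's actual strides.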

Broadcasting¶

High Level¶

  • Apply same operation multiple times
  • Avoid loops and writes
  • Save memory

First Challenge¶

  • Relaxing Zip constraints
  • Apply zip without shapes being identical

Motivation: Scalar Addition¶

vector1 + 10
In [20]:
a = minitorch.tensor([1])
b = minitorch.tensor([1, 2, 4])
tensor_to_diagram(b) | chalk.hstrut(1) | tensor_to_diagram(a) 
Out[20]:

Naive Scalar Addition 1¶

  • Repeat the scalar to vector size: $vector1 + [10, 10, 10]$
In [21]:
vector1 = minitorch.tensor([1, 2, 3])
print(vector1 + minitorch.tensor([10, 10, 10]))
[11.00 12.00 13.00]

Naive Scalar Addition 2¶

  • Write a for loop
In [22]:
temp_vector = minitorch.zeros((vector1.shape[0],))
for i in range(temp_vector.shape[0]):
    temp_vector[i] = vector1[i] + 10

Broadcasting¶

  • No intermediate terms
  • Define rules to make different shapes work together
  • Avoid for loops entirely
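For example, with the minitorch tensors from before, the scalar-addition case should need neither the repeated [10, 10, 10] tensor nor a Python loop; a size-1 tensor broadcasts across the whole vector:

vector1 = minitorch.tensor([1, 2, 3])
print(vector1 + minitorch.tensor([10]))   # the single 10 is reused for every position
# expected: [11.00 12.00 13.00]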

Zip With Broadcasting¶

In [23]:
a = minitorch.tensor([1, 2, 4])
b = minitorch.tensor([3, 2])
out = minitorch.zeros((3, 2))
for i in range(3):
    for j in range(2):
        out[i, j] = a[i] + b[j]
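With broadcasting, the same outer sum can be written without Python loops; reshaping a and b so that each has a size-1 dimension lets the zip fill in the (3, 2) result (a sketch using the tensors above):

out2 = a.view(3, 1) + b.view(1, 2)   # (3, 1) zip (1, 2) broadcasts to (3, 2)
# by Rule 3 below, a.view(3, 1) + b works as well, since b's shape (2,) is padded to (1, 2)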

Zip Broadcasting¶

In [24]:
opts = ArrowOpts(head_arrow=empty(), arc_height=0.5, shaft_style=astyle)
opts2 = ArrowOpts(head_arrow=empty(), arc_height=0.2, shaft_style=astyle)

d = hcat([matrix(3, 1, "a"), matrix(1, 2, "b"), right_arrow, matrix(3, 2, "c")], 1)
d.connect(("a", 0, 0), ("c", 0, 0), opts).connect(
    ("a", 1, 0), ("c", 1, 1), opts
).connect(("b", 0, 0), ("c", 0, 0), opts2).connect(("b", 0, 1), ("c", 1, 1), opts2)
Out[24]:

Rules¶

  • Rule 1: Dimension of size 1 broadcasts with anything
  • Rule 2: Extra dimensions of 1 can be added with view
  • Rule 3: Zip automatically adds starting dims of size 1
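The three rules can be sketched as a small shape helper (illustrative only, not necessarily how minitorch implements it):

def broadcast_shapes(a, b):
    # Rule 3: pad the shorter shape with leading dimensions of size 1.
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for da, db in zip(a, b):
        if da == db or da == 1 or db == 1:    # Rule 1: size 1 matches anything
            out.append(max(da, db))
        else:
            raise ValueError(f"cannot broadcast {da} with {db}")
    return tuple(out)

broadcast_shapes((3, 4, 1), (1, 5))   # -> (3, 4, 5), matching the table below

Rule 2 covers the manual case: using view to insert explicit size-1 dimensions yourself, as in the matrix examples that follow.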

Matrix Scalar Addition¶

Matrix + Scalar

matrix1 + tensor([10])
In [25]:
a = minitorch.tensor([1])
b = minitorch.tensor([[1, 2], [3,4]])
tensor_to_diagram(a) | chalk.hstrut(1) | tensor_to_diagram(b) 
Out[25]:

Matrix Scalar Addition¶

Matrix + Vector

In [26]:
matrix1 = minitorch.zeros((4, 3))
a = matrix1.view(4, 3) 
b = minitorch.tensor([1, 2, 3])
out = a + b
In [27]:
tensor_to_diagram(a) | chalk.hstrut(1) | tensor_to_diagram(b) | chalk.hstrut(1) | tensor_to_diagram(a + b)
Out[27]:

Matrix Scalar Addition¶

In [28]:
# Doesn't Work!
# matrix1.view(4, 3) + minitorch.tensor([1, 2, 3, 5])
In [29]:
# Does Work!
# matrix1.view(4, 3) + minitorch.tensor([1, 2, 3, 5]).view(4, 1)

Applying the Rules¶

A          B          =
(3, 4, 5)  (3, 1, 5)  (3, 4, 5)
(3, 4, 1)  (3, 1, 5)  (3, 4, 5)
(3, 4, 1)  (1, 5)     (3, 4, 5)
(3, 4, 1)  (3, 5)     Fail

Exercises¶

A          B          =
(1, 3, 4)  (1, 3, 1)
(1, 4, 4)  (3, 1, 5)
(3, 4, 1)  (1,)

Q&A¶