
Embedding

Get word vector:
word_one_hot = tensor([0 if i != word else 1
                       for i in range(VOCAB)])
embedding = (layer1 * word_one_hot).sum(1)

Dot product of two word vectors:

(word_emb1 * word_emb2).sum()
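A minimal NumPy sketch of the lookup above (the sizes, the random weights, and the word indices are illustrative assumptions): multiplying the weight matrix by a one-hot vector and summing is the same as selecting one column of the matrix.

```python
import numpy as np

VOCAB, DIM = 10, 4                       # illustrative sizes
rng = np.random.default_rng(0)
layer1 = rng.normal(size=(DIM, VOCAB))   # embedding weights, one column per word

word = 3
word_one_hot = np.array([0 if i != word else 1 for i in range(VOCAB)])

# Multiply by the one-hot vector and sum over the vocab axis...
embedding = (layer1 * word_one_hot).sum(1)
# ...which is just selecting a column of the weight matrix.
assert np.allclose(embedding, layer1[:, word])

# Dot product between two word vectors measures their similarity.
emb_a = layer1[:, 3]
emb_b = layer1[:, 7]
similarity = (emb_a * emb_b).sum()
```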

ops → fast ops

Example: map
for i in range(len(out)):
    count(i, out_shape, out_index)
    broadcast_index(out_index, out_shape, in_shape, in_index)
    o = index_to_position(out_index, out_strides)
    j = index_to_position(in_index, in_strides)
    out[o] = fn(in_storage[j])
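A self-contained sketch of this map loop with simplified versions of the indexing helpers (the helper names follow the loop above, but these bodies are assumptions, not the course's actual implementations; `to_index` plays the role of `count`):

```python
def index_to_position(index, strides):
    # Dot the multidimensional index with the strides to get a flat offset.
    return sum(i * s for i, s in zip(index, strides))

def to_index(ordinal, shape, out_index):
    # Convert a flat ordinal into a multidimensional index (row-major).
    for i in range(len(shape) - 1, -1, -1):
        out_index[i] = ordinal % shape[i]
        ordinal //= shape[i]

def broadcast_index(out_index, out_shape, in_shape, in_index):
    # Map an output index back to the input, zeroing broadcast dims of size 1.
    offset = len(out_shape) - len(in_shape)
    for i in range(len(in_shape)):
        in_index[i] = out_index[i + offset] if in_shape[i] > 1 else 0

def tensor_map(fn, out, out_shape, out_strides,
               in_storage, in_shape, in_strides):
    out_index = [0] * len(out_shape)
    in_index = [0] * len(in_shape)
    for i in range(len(out)):
        to_index(i, out_shape, out_index)
        broadcast_index(out_index, out_shape, in_shape, in_index)
        o = index_to_position(out_index, out_strides)
        j = index_to_position(in_index, in_strides)
        out[o] = fn(in_storage[j])

# Map "add 3" over a contiguous (2, 3) tensor with strides (3, 1).
in_storage = list(range(6))
out = [0] * 6
tensor_map(lambda a: a + 3, out, (2, 3), (3, 1),
           in_storage, (2, 3), (3, 1))
# out == [3, 4, 5, 6, 7, 8]
```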
Critical code:

out[o] = in_storage[j] + 3

The + here dispatches to in_storage[j].__add__ or __radd__!

Work
def my_code(x, y):
    for i in range(100):
        x[i] = y + 20
    ...

my_code(x, y)                           # plain Python call

fast_my_code = numba.njit()(my_code)    # compile with numba
fast_my_code(x, y)                      # first call: compiles, then runs
fast_my_code(x, y)                      # later calls: run the compiled code
njit will fail for many Python operations.

Transform
def my_code(x, y):
    for i in prange(100):               # prange: a parallelizable range
        x[i] = y + 20
    ...

my_code(x, y)

fast_my_code = numba.njit(parallel=True)(my_code)
fast_my_code(x, y)
fast_my_code(x, y)
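A runnable version of the transform above. The `njit` and `prange` names are the numba API; the fallback stubs are an assumption added here so the sketch still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback stubs: run the plain-Python version if numba is unavailable.
    def njit(**kwargs):
        return lambda fn: fn
    prange = range

def my_code(x, y):
    # Each iteration is independent, so numba may execute them in parallel.
    for i in prange(100):
        x[i] = y + 20

fast_my_code = njit(parallel=True)(my_code)

x = np.zeros(100)
fast_my_code(x, 5.0)   # first call compiles (if numba is present), then runs
fast_my_code(x, 5.0)   # subsequent calls reuse the compiled code
```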
Replaces for loops with a parallel version.