Property Testing

In [1]:

from hypothesis import example, given
from hypothesis.strategies import integers

Testing and debugging are critical for software engineering in general, and particularly necessary for framework code that will be used in unexpected ways. Unfortunately, machine learning code is notoriously hard to test and debug. Many practitioners seem to just run models and wait until they are trained before starting the debugging process.

But how do you effectively test mathematically oriented code? In many software projects, unit tests can cover most of the input cases. Mathematical functions make this impossible, because their input space is far too large to enumerate.

For example, let's say you have a function that is meant to add two numbers (this sounds really silly, but we will see it is not):

In [2]:

def my_add(a: int, b: int) -> int:
    """A customized integer addition function."""
    out = a
    for _ in range(-b):
        out -= 1
    for _ in range(b):
        out += 1
    return out

A (somewhat naive) unit test might look like this:

In [3]:

def test_add_basic():
    # Check agreement with Python's built-in addition
    assert my_add(10, 7) == 10 + 7
    # Check that order doesn't matter
    assert my_add(10, 7) == my_add(7, 10)

In [4]:

test_add_basic()

This is fine, and it can certainly help catch easy bugs, but it is not very reassuring: a single hand-picked pair of inputs says little about the rest of the input space. It is particularly devastating when your code has been running for 20 hours and then hits a case where your add function fails.

An alternative idea is to test properties instead of specific cases. That is, check whether key aspects of the expected behavior always hold. For instance, you might imagine directly checking whether these properties hold for every pair of integers in some range:

In [5]:

def test_add_naive():
    for a in range(-100, 100):
        for b in range(-100, 100):
            assert my_add(a, b) == a + b
            assert my_add(a, b) == my_add(b, a)

This provides better coverage, but it is also naive and hopelessly inefficient: it runs 40,000 pairs of inputs and still covers only a tiny window of the integers. Unit tests are supposed to be short snippets of code that run quickly while you are developing.

A clever middle ground is to use randomized property checking. This method was popularized by a library called QuickCheck (http://wikipedia.org/wiki/quickcheck). This approach randomly selects interesting inputs in order to test your codebase's correctness. It gives you the speed of the first approach and some of the breadth of the second. Another nice benefit of randomized property checking is that it actually makes tests shorter and easier to write since it generates cases for you.
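
To make the idea concrete, here is a minimal hand-rolled sketch using only the standard library. It is not how QuickCheck or Hypothesis work internally: it just samples uniformly from a fixed range. The helper name check_add_randomly, the range of -1000 to 1000, the trial count, and the seed are all illustrative assumptions, and my_add is repeated so the block runs on its own.

```python
import random


def my_add(a: int, b: int) -> int:
    """Same loop-based addition as in the text, repeated so this block is self-contained."""
    out = a
    for _ in range(-b):
        out -= 1
    for _ in range(b):
        out += 1
    return out


def check_add_randomly(trials: int = 200, seed: int = 0) -> int:
    """Hypothetical helper: check the addition properties on random integer pairs."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    for _ in range(trials):
        a = rng.randint(-1000, 1000)
        b = rng.randint(-1000, 1000)
        # The assertion message records the failing pair for debugging.
        assert my_add(a, b) == a + b, (a, b)
        assert my_add(a, b) == my_add(b, a), (a, b)
    return trials


checked = check_add_randomly()
```

This checks a few hundred pairs instead of every pair in a range, which keeps the test fast while still probing inputs nobody wrote down by hand.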

In MiniTorch, we will use a property checking library in Python called Hypothesis (https://hypothesis.readthedocs.io/). Hypothesis predefines a whole set of building block strategies that the user can pick from when writing tests. (You can also write your own strategies, which you will do in the next assignment.) You can generate integers, floats, lists, strings, etc. Each test is decorated with the strategies that generate the values it operates on:

In [6]:

@given(integers(), integers())
def test_add(a, b):
    # Check agreement with Python's built-in addition
    assert my_add(a, b) == a + b
    # Check that order doesn't matter
    assert my_add(a, b) == my_add(b, a)

The function integers returns a strategy that tells Hypothesis what type of values to test on. It is a function because we may want to constrain the values in various ways, for instance with min_value and max_value. When debugging, we can call example() on a strategy to see a sample value it generates.

In [7]:

integers(min_value=0, max_value=10).example()
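
A constrained strategy plugs into @given in the same way. The following is a minimal sketch under a few assumptions: the bounds of -1000 and 1000 and the names small_ints and test_add_bounded are illustrative choices, and my_add is repeated so the block runs on its own. Bounding the inputs keeps each example cheap, since my_add loops once per unit of |b|.

```python
from hypothesis import given
from hypothesis.strategies import integers


def my_add(a: int, b: int) -> int:
    """Same loop-based addition as in the text, repeated so this block is self-contained."""
    out = a
    for _ in range(-b):
        out -= 1
    for _ in range(b):
        out += 1
    return out


# Illustrative bounds: small enough that the loops in my_add stay fast.
small_ints = integers(min_value=-1000, max_value=1000)


@given(small_ints, small_ints)
def test_add_bounded(a, b):
    # Check agreement with Python's built-in addition
    assert my_add(a, b) == a + b
    # Check that order doesn't matter
    assert my_add(a, b) == my_add(b, a)


# Calling the decorated function with no arguments runs it on generated inputs.
test_add_bounded()
```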

It is also easy to combine randomized testing with example-based testing. This can be useful if you want to pin down specific, easy-to-debug test cases alongside the generated ones.

In [8]:

@given(integers(), integers())
@example(5, 7)
def test_add2(a, b):
    # Check agreement with Python's built-in addition
    assert my_add(a, b) == a + b
    # Check that order doesn't matter
    assert my_add(a, b) == my_add(b, a)