# Differentiable Programs (Neural Networks)

From a functional programming perspective, a neural network is represented by data and functions, much like any other functional program. The only thing that distinguishes a neural network from any other functional program is that it implements a small interface to support differentiation. In this sense, we can consider neural networks to be "differentiable functional programs".

The data in a neural network consists of the values to be fitted: the parameters of the functions that carry out inference, which are modified based on gradients computed through those functions.

As with a regular Haskell program, this data is represented by an algebraic data type (ADT). The ADT can take on any shape needed to model the domain of interest, which allows a great deal of flexibility and makes all of Haskell's strengths in data modeling available: you can use sum or product types, nest types, and so on. The ADT can also implement various typeclasses to take on other functionality.
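As a sketch of what such data modeling might look like, here is a hypothetical (not from Torch.NN) set of model types combining sum types, product types, and nesting. A stand-in `Tensor` alias is used so the snippet is self-contained; real code would use hasktorch tensors.

```haskell
-- Stand-in for a real tensor type; hasktorch code would use Torch tensors.
type Tensor = [[Double]]

-- A sum type selecting among alternatives (here, a nonlinearity).
data Activation = ReLU | Tanh deriving (Show, Eq)

-- A product type grouping the values that make up one layer.
data Layer = Layer { weights :: Tensor, act :: Activation } deriving (Show, Eq)

-- Nesting: a model is a container of layers.
newtype Model = Model { layers :: [Layer] } deriving (Show, Eq)

main :: IO ()
main = print (Model [Layer [[0.1, 0.2]] ReLU, Layer [[0.3]] Tanh])
```

All of these are ordinary Haskell declarations; nothing about them is specific to neural networks until the differentiation interface below is implemented for them.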

The core interface that defines the capability specific to differentiable programming is the `Torch.NN.Parameterized` typeclass:

```haskell
class Parameterized f where
  flattenParameters :: f -> [Parameter]
  default flattenParameters :: (Generic f, Parameterized' (Rep f)) => f -> [Parameter]
  flattenParameters f = flattenParameters' (from f)

  replaceOwnParameters :: f -> ParamStream f
  default replaceOwnParameters :: (Generic f, Parameterized' (Rep f)) => f -> ParamStream f
  replaceOwnParameters f = to <$> replaceOwnParameters' (from f)
```


Note that `Parameter` is simply a type alias for `IndependentTensor` in the context of neural networks (i.e. `type Parameter = IndependentTensor`).

The role of `flattenParameters` is to unroll any arbitrary ADT representation of a neural network into a standard flattened representation: a list of `IndependentTensor` values, which is used to compute gradients.

`replaceOwnParameters` is used to update parameters. `ParamStream` is a type alias for a `State` type whose state is a list of `Parameter` values and whose value type corresponds to the ADT defining the model:

```haskell
type ParamStream a = State [Parameter] a
```
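To make the mechanism concrete, the following self-contained sketch hand-writes a `replaceOwnParameters` for a hypothetical two-field `Linear` record, threading a parameter list through `State`. A stand-in numeric `Parameter` type replaces hasktorch's `IndependentTensor`, and `nextParameter` is an illustrative helper, not part of the Torch.NN API shown above.

```haskell
import Control.Monad.State

-- Stand-in for hasktorch's IndependentTensor.
newtype Parameter = Parameter Double deriving (Show, Eq)

type ParamStream a = State [Parameter] a

-- Illustrative helper: pop the next parameter off the stream.
nextParameter :: ParamStream Parameter
nextParameter = state $ \ps -> case ps of
  (p : rest) -> (p, rest)
  []         -> error "ran out of parameters"

data Linear = Linear { weight :: Parameter, bias :: Parameter }
  deriving (Show, Eq)

-- A hand-written replaceOwnParameters: consume one parameter per field,
-- in field order, to rebuild the record with updated values.
replaceOwnParameters :: Linear -> ParamStream Linear
replaceOwnParameters _ = Linear <$> nextParameter <*> nextParameter

main :: IO ()
main = do
  let updated = evalState
        (replaceOwnParameters (Linear (Parameter 0) (Parameter 0)))
        [Parameter 1.0, Parameter 2.0]
  print updated
```

Running this replaces both fields with the values from the supplied list, in order. This is exactly the bookkeeping that the generic default implementation automates for arbitrary ADTs.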


Note the use of generics. Generics usually allow the compiler to derive `flattenParameters` and `replaceOwnParameters` instances automatically, without any additional code, provided your type is built up from tensors, containers of tensors, or other types built from tensor values (for example, the layer modules provided in Torch.NN). In many cases, as you'll see in the following examples, you will only need to add

```haskell
instance Parameterized MyNeuralNetwork
```


(where `MyNeuralNetwork` is the ADT definition for your model) and the compiler will derive implementations of `flattenParameters` and `replaceOwnParameters`.
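To see how this generic deriving works end to end without depending on hasktorch, the sketch below reimplements a simplified `Parameterized` (only `flattenParameters`) over `GHC.Generics`, with a stand-in `Parameter` type. The `Parameterized'` instances walk the generic representation; the `Linear` and `MLP` types then get their instances from the empty `instance Parameterized ...` declarations alone, mirroring the pattern described above.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

-- Stand-in for hasktorch's IndependentTensor.
newtype Parameter = Parameter [Double] deriving (Show, Eq)

-- Simplified version of the typeclass (flattenParameters only).
class Parameterized f where
  flattenParameters :: f -> [Parameter]
  default flattenParameters
    :: (Generic f, Parameterized' (Rep f)) => f -> [Parameter]
  flattenParameters = flattenParameters' . from

-- Companion class over generic representations.
class Parameterized' f where
  flattenParameters' :: f a -> [Parameter]

instance Parameterized' U1 where                 -- nullary constructors
  flattenParameters' U1 = []

instance Parameterized c => Parameterized' (K1 i c) where  -- fields
  flattenParameters' (K1 x) = flattenParameters x

instance (Parameterized' f, Parameterized' g)
    => Parameterized' (f :*: g) where            -- product types
  flattenParameters' (l :*: r) = flattenParameters' l ++ flattenParameters' r

instance (Parameterized' f, Parameterized' g)
    => Parameterized' (f :+: g) where            -- sum types
  flattenParameters' (L1 x) = flattenParameters' x
  flattenParameters' (R1 x) = flattenParameters' x

instance Parameterized' f => Parameterized' (M1 i t f) where  -- metadata
  flattenParameters' (M1 x) = flattenParameters' x

-- Base case: a Parameter flattens to itself.
instance Parameterized Parameter where
  flattenParameters p = [p]

-- User-defined model types: empty instances suffice.
data Linear = Linear { weight :: Parameter, bias :: Parameter }
  deriving (Generic, Show)
instance Parameterized Linear

data MLP = MLP { layer1 :: Linear, layer2 :: Linear }
  deriving (Generic, Show)
instance Parameterized MLP

main :: IO ()
main = do
  let mlp = MLP (Linear (Parameter [1, 2]) (Parameter [3]))
                (Linear (Parameter [4])    (Parameter [5]))
  print (length (flattenParameters mlp)) -- four parameters, two per layer
```

The real Torch.NN implementation follows the same shape, with additional base instances for tensors and the layer modules it provides, plus the corresponding generic machinery for `replaceOwnParameters`.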