Automatic Differentiation

Automatic differentiation is achieved through the use of two primary functions in the Torch.Autograd module: makeIndependent and grad.

Independent Tensors

makeIndependent is used to instantiate an independent tensor variable from which a compute graph is constructed for differentiation, while grad uses the compute graph to compute gradients.

makeIndependent takes a tensor as input and returns an IO action which produces a Torch.Autograd.IndependentTensor:

makeIndependent :: Tensor -> IO IndependentTensor

What is the IndependentTensor type produced by the makeIndependent action? It's defined in the Hasktorch library as:

newtype IndependentTensor = IndependentTensor {toDependent :: Tensor} deriving (Show)

Thus IndependentTensor is simply a wrapper around the underlying Tensor that is passed in as the argument to makeIndependent. Building up computations using ops applied to the toDependent tensor of an IndependentTensor will implicitly construct a compute graph to which grad can be applied.

All tensors have an underlying property, retrievable with the Torch.Autograd.requiresGrad function, that indicates whether they are differentiable values in a compute graph.[^requires-grad]

let x = asTensor ([1, 2, 3] :: [Float])
y <- makeIndependent (asTensor ([4, 5, 6] :: [Float]))
let y' = toDependent y
let z = x + y'
requiresGrad x => False
requiresGrad y' => True
requiresGrad z => True

In summary, tensors created by tensor constructors (e.g. ones, zeros, fill, randIO, etc.) outside the context of an IndependentTensor are not differentiable. Tensors that are derived from computations on the toDependent value of an IndependentTensor are differentiable, as the above example illustrates.


Once a compute graph is constructed by applying ops and computing derived quantities stemming from the toDependent value of an IndependentTensor, a gradient can be taken using the grad function, passing as the first argument the tensor corresponding to the function value of interest and as the second a list of independent tensor variables with respect to which the derivative is taken:

grad :: Tensor -> [IndependentTensor] -> [Tensor]
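Since the second argument is a list, a single grad call can return gradients with respect to several independent tensors at once. The following is a minimal, untested sketch of this usage; it assumes the umbrella Torch module re-exports the autograd functions and a sumAll reduction, as in recent Hasktorch releases. For out = sum (x * y), the gradient with respect to x is y and vice versa:

import Torch

main :: IO ()
main = do
  x <- makeIndependent (asTensor ([1, 2, 3] :: [Float]))
  y <- makeIndependent (asTensor ([4, 5, 6] :: [Float]))
  -- out = sum_i (x_i * y_i), a scalar
  let out = sumAll (toDependent x * toDependent y)
  -- one gradient tensor per independent tensor, in order:
  -- d out/dx = y and d out/dy = x
  print (grad out [x, y])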

Let's demonstrate this with a concrete example. We create a tensor and derive an IndependentTensor from it:

a <- makeIndependent (ones' [2, 2])
let a' = toDependent a
a' => 
Tensor Float [2,2] [[ 1.0000   ,  1.0000   ],
                    [ 1.0000   ,  1.0000   ]]

Now do some computations on the dependent tensor:

let b = a' + 2
b => 
Tensor Float [2,2] [[ 3.0000   ,  3.0000   ],
                    [ 3.0000   ,  3.0000   ]]

Since b is dependent on the independent tensor a, it is differentiable:

requiresGrad b => True

Applying more operations:

let c = b * b * 3
let out = mean c
c => 
Tensor Float [2,2] [[ 27.0000   ,  27.0000   ],
                    [ 27.0000   ,  27.0000   ]]

Now retrieve the gradient:

grad out [a] => 
[Tensor Float [2,2] [[ 4.5000   ,  4.5000   ],
                     [ 4.5000   ,  4.5000   ]]]