Automatic Differentiation

Automatic differentiation is achieved through the use of two primary functions in the Torch.Autograd module: makeIndependent and grad.

Independent Tensors

makeIndependent is used to instantiate an independent tensor variable from which a compute graph is constructed for differentiation, while grad uses the compute graph to compute gradients.

makeIndependent takes a tensor as input and returns an IO action which produces a Torch.Autograd.IndependentTensor:

makeIndependent :: Tensor -> IO IndependentTensor

What is the IndependentTensor type produced by the makeIndependent action? It's defined in the Hasktorch library as:

newtype IndependentTensor = IndependentTensor {toDependent :: Tensor} deriving (Show)

Thus IndependentTensor is simply a wrapper around the underlying Tensor that is passed in as the argument to makeIndependent. Building up computations using ops applied to the toDependent tensor of an IndependentTensor will implicitly construct a compute graph to which grad can be applied.

All tensors have an underlying property, retrievable with the Torch.Autograd.requiresGrad function, that indicates whether they are differentiable values in a compute graph.[^requires-grad]

let x = asTensor ([1, 2, 3] :: [Float])
y <- makeIndependent (asTensor ([4, 5, 6] :: [Float]))
let y' = toDependent y
let z = x + y'
requiresGrad x => False
requiresGrad y' => True
requiresGrad z => True

In summary, tensors created by tensor constructors (e.g. ones, zeros, fill, randIO, etc.) outside the context of an IndependentTensor are not differentiable. Tensors that are derived from computations on the toDependent value of an IndependentTensor are differentiable, as the above example illustrates.


Once a compute graph is constructed by applying ops and computing derived quantities stemming from the toDependent value of an IndependentTensor, a gradient can be taken using the grad function, passing as the first argument the tensor corresponding to the function value of interest and as the second a list of independent tensor variables with respect to which the derivative is taken:

grad :: Tensor -> [IndependentTensor] -> [Tensor]
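Since the second argument is a list, a single grad call can return gradients with respect to several independent tensors at once. The following is a minimal, untested sketch of this usage; it assumes the umbrella Torch module re-exports the autograd functions and a sumAll reduction, as in recent Hasktorch releases. For out = sum (x * y), the gradient with respect to x is y and vice versa:

import Torch

main :: IO ()
main = do
  x <- makeIndependent (asTensor ([1, 2, 3] :: [Float]))
  y <- makeIndependent (asTensor ([4, 5, 6] :: [Float]))
  -- out = sum_i (x_i * y_i), a scalar
  let out = sumAll (toDependent x * toDependent y)
  -- one gradient tensor per independent tensor, in order:
  -- d out/dx = y and d out/dy = x
  print (grad out [x, y])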

Let's demonstrate this with a concrete example. We create a tensor and derive an IndependentTensor from it:

a <- makeIndependent (ones' [2, 2])
let a' = toDependent a
a' => 
Tensor Float [2,2] [[ 1.0000   ,  1.0000   ],
                    [ 1.0000   ,  1.0000   ]]

Now do some computations on the dependent tensor:

let b = a' + 2
b => 
Tensor Float [2,2] [[ 3.0000   ,  3.0000   ],
                    [ 3.0000   ,  3.0000   ]]

Since b is dependent on the independent tensor a, it is differentiable:

requiresGrad b => True

Applying more operations:

let c = b * b * 3
let out = mean c
c => 
Tensor Float [2,2] [[ 27.0000   ,  27.0000   ],
                    [ 27.0000   ,  27.0000   ]]

Now retrieve the gradient:

grad out [a] => 
[Tensor Float [2,2] [[ 4.5000   ,  4.5000   ],
                     [ 4.5000   ,  4.5000   ]]]