Feedback Alignment
This page contains a short tutorial on how to train a model using feedback alignment. It is loosely adapted from the Flux model zoo's MLP example.
Feedback alignment (Lillicrap et al.), sometimes referred to as random backpropagation, is an attempt at making neural network training more biologically plausible. Networks are usually trained with the backpropagation-of-error algorithm, which relies on a symmetry between the weights used to make predictions in the forward pass and the weights used to propagate error signals backwards when computing weight updates. Since synapses for the most part transmit signals in one direction, it seems unlikely that the exact one-to-one synaptic symmetry required by backprop could somehow arise in biological brains.
For this reason, Lillicrap et al. proposed transporting errors backwards through a set of fixed random weight matrices. Surprisingly, this works fairly well, because the weights used in the forward pass learn to approximately align with the fixed feedback weights.
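To make the contrast concrete, here is a minimal sketch of how the error signal for one hidden layer is computed under backprop versus feedback alignment. The names `W`, `B`, `δ_out`, and `h′` are our own illustration, not Bender.jl internals:

```julia
using Random

Random.seed!(0)
W = randn(Float32, 64, 128)      # forward weights (trained)
B = randn(Float32, 128, 64)      # fixed random feedback weights (never trained), shaped like W'
δ_out = randn(Float32, 64)       # error signal arriving from the layer above
h′ = Float32.(rand(128) .> 0.5)  # derivative of the hidden activation (e.g. relu)

δ_backprop = (W' * δ_out) .* h′  # backprop transports the error through W'
δ_fa       = (B  * δ_out) .* h′  # feedback alignment uses the fixed matrix B instead
```

In both cases the weight update for the layer below is computed from the resulting δ; the only difference is which matrix carries the error backwards.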
The following sections contain some boilerplate code: dependency handling, utilities for loading data and evaluating the model, and the training loop.
Handling dependencies
```julia
# Install Bender.jl from GitHub (it is not in the General registry)
using Pkg; Pkg.add(url="https://github.com/Rasmuskh/Bender.jl.git")

begin
    using Bender, Flux, MLDatasets
    using Flux: onehotbatch, onecold, logitcrossentropy, throttle
    using Flux.Data: DataLoader
    using Parameters: @with_kw
    using DataFrames
end
```
Utilities
```julia
@with_kw mutable struct Args
    η::Float64 = 0.0003      # learning rate
    batchsize::Int = 64      # batch size
    epochs::Int = 10         # number of epochs
    device::Function = gpu   # set as gpu, if gpu available
end
```
```julia
function getdata(args)
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

    # Loading dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Reshape data in order to flatten each image into a vector
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Batching
    train_data = DataLoader((xtrain, ytrain), batchsize=args.batchsize, shuffle=true, partial=false)
    test_data = DataLoader((xtest, ytest), batchsize=args.batchsize, partial=false)

    return train_data, test_data
end
```
```julia
function evaluate(data_loader, model)
    acc = 0.0
    l = 0.0
    numbatches = length(data_loader)
    for (x, y) in data_loader
        # Fraction of correct predictions in this batch
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y))) / size(x, 2)
        l += logitcrossentropy(model(x), y)
    end
    # Average over batches (all batches have equal size since partial=false)
    return acc / numbatches, l / numbatches
end
```
Defining the training loop
```julia
function train(; kws...)
    # Initialize model hyperparameters
    args = Args(; kws...)

    # Create arrays for recording training metrics
    acc_train = zeros(Float32, args.epochs)
    acc_test = zeros(Float32, args.epochs)
    loss_train = zeros(Float32, args.epochs)
    loss_test = zeros(Float32, args.epochs)

    # Load data
    train_data, test_data = getdata(args)

    # Construct model and move model and data to the chosen device
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)

    loss(x, y) = logitcrossentropy(m(x), y)

    # Training
    opt = ADAM(args.η)
    for epoch = 1:args.epochs
        Flux.train!(loss, params(m), train_data, opt)
        acc_train[epoch], loss_train[epoch] = evaluate(train_data, m)
        acc_test[epoch], loss_test[epoch] = evaluate(test_data, m)
    end

    # Return training metrics as a DataFrame
    df = DataFrame([loss_train, loss_test, acc_train, acc_test],
                   [:loss_train, :loss_test, :acc_train, :acc_test])
    return df
end
```
Defining the model
Feedback alignment uses two sets of weights, one for making predictions and one for transporting error signals backwards, so we need to initialize the `GenDense` layer with an extra set of weights. We also need to specify the forward mapping, in this case `linear_asym_∂x`, which behaves as a regular fully connected layer in the forward pass but uses the second set of weights in the backward pass. For more details, see the documentation and/or source code for `linear_asym_∂x`; a sketch of how such a mapping can be implemented follows the model definition below.
```julia
function build_model()
    m = Chain(GenDense(784=>128, 128=>784, relu; forward=linear_asym_∂x),
              GenDense(128=>64, 64=>128, relu; forward=linear_asym_∂x),
              GenDense(64=>10, 10=>64; forward=linear_asym_∂x))
    return m
end
```
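To build some intuition for how a mapping like `linear_asym_∂x` can be symmetric in the forward pass but asymmetric in the backward pass, here is a minimal sketch using a custom `ChainRulesCore.rrule`. The function name `linear_asym_sketch` and its internals are our own illustration under assumed semantics, not Bender.jl's actual implementation; consult the package source for the real definition:

```julia
using ChainRulesCore

# Forward pass: an ordinary linear map y = W * x.
linear_asym_sketch(W, B, x) = W * x

function ChainRulesCore.rrule(::typeof(linear_asym_sketch), W, B, x)
    y = W * x
    function linear_asym_pullback(ȳ)
        ∂W = ȳ * x'        # usual gradient for the forward weights
        ∂B = NoTangent()   # the feedback weights are fixed: no gradient
        ∂x = B * ȳ         # error transported by B (shaped like W'), not by W'
        return NoTangent(), ∂W, ∂B, ∂x
    end
    return y, linear_asym_pullback
end
```

With a rule like this, an AD system such as Zygote computes the ordinary gradient for `W`, leaves `B` untouched, and passes `B * ȳ` to the layer below, which is exactly the error transport described at the top of this page.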
Training the model
Even though fixed random matrices propagate the errors backwards, the network quickly learns to solve the task fairly well. Below we train the network for ten epochs; the `train` function records the model's loss and accuracy on the training and test sets in a `DataFrame`.
```julia
df = train(epochs=10); round.(df, digits=3)
```
| Epoch | loss_train | loss_test | acc_train | acc_test |
|---|---|---|---|---|
| 1 | 0.418 | 0.413 | 0.881 | 0.884 |
| 2 | 0.317 | 0.318 | 0.909 | 0.910 |
| 3 | 0.269 | 0.274 | 0.923 | 0.922 |
| 4 | 0.232 | 0.238 | 0.934 | 0.932 |
| 5 | 0.201 | 0.208 | 0.943 | 0.941 |
| 6 | 0.176 | 0.187 | 0.951 | 0.946 |
| 7 | 0.155 | 0.169 | 0.956 | 0.951 |
| 8 | 0.138 | 0.156 | 0.960 | 0.955 |
| 9 | 0.125 | 0.145 | 0.964 | 0.956 |
| 10 | 0.113 | 0.136 | 0.967 | 0.959 |
References
Timothy P. Lillicrap et al. (2014). Random feedback weights support learning in deep neural networks. arXiv:1411.0247.