associative memory
Created: March 29, 2024
Modified: March 29, 2024


This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Hopfield network

sparse distributed memory

Millidge et al. (2022), Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models (https://arxiv.org/abs/2202.04557)

  • Nice unifying discussion of associative memory models.
  • Notes that there is something of a convergence between modern/continuous generalizations of Hopfield networks, sparse distributed memories, and transformer attention.
  • "A conceptual benefit of our framework is that it makes clear that single-shot associative memory models are simply two-layer MLPs with an unusual activation function (i.e., the separation function), which works best as a softmax or max function, and where the weight matrices directly encode explicit memory vectors instead of being learnt with backpropagation. This leads immediately to the question of whether standard MLPs in machine learning can be interpreted as associative memories instead of hierarchical feature extractors. A crucial requirement for the MLP to function as an associative memory appears to be a high degree of sparsity of intermediate representations (ideally one-hot output) so that an exact memory can be reconstructed instead of a linear combination of multiple memories. "

Ba et al. (2016), Using Fast Weights to Attend to the Recent Past (https://arxiv.org/abs/1610.06258):

  • Uses an associative memory, updated at every time step within a sequence, as fast weights.
  • Instead of explicitly constructing the fast weight matrix (as a sum of outer products of previous hidden states), you can keep the previous states themselves and form linear combinations of them, weighted by their inner products with the current state. Basically a dual formulation, like the kernel trick. A sketch of the equivalence is below.
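A toy demonstration of that primal/dual equivalence (my own sketch: it drops the decay and learning-rate factors that Ba et al. fold into the fast weight update, and just checks that the two formulations retrieve the same thing):

    import numpy as np

    rng = np.random.default_rng(0)
    d = 32
    prev_states = rng.standard_normal((10, d))   # h_1 ... h_10, the remembered hidden states
    query = rng.standard_normal(d)               # current state used to probe the memory

    # Primal form: build an explicit fast weight matrix as a sum of outer products.
    A = sum(np.outer(h, h) for h in prev_states)      # A = sum_i h_i h_i^T
    primal = A @ query

    # Dual form: never materialize A; weight each stored state by its inner product
    # with the query and sum them (i.e., attend over the recent past).
    dual = prev_states.T @ (prev_states @ query)      # sum_i (h_i . query) h_i

    print(np.allclose(primal, dual))                  # True: the two formulations agree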