Blog about short term memory research

    Short term memory computer models

    Eugen G Tarnow  March 10 2013 07:26:07 AM
    I received a negative referee comment stating that, in contrast to my statement - that my theory explained 80% of the variance of the initial recall distribution without using interference or other context-based theories - "extant theories of free recall that invoke contextual-change and interference-based forgetting - the Temporal Context Model (TCM; Howard & Kahana, 2002), the Context-Maintenance and Retrieval (CMR; Polyn, Norman, & Kahana, 2009) model, and the Serial-Free Recall model (Farrell, 2012) - provide excellent descriptive accounts of the total recall and first recall probability serial position curves."

    Models that capture the essence of a natural phenomenon can be extraordinarily useful.  An example is Pauling's chemical bond model: one does not even need a computer to apply it!  Other times models can be of very little use, such as tight-binding models of semiconductors: after the fact they can explain everything, but they can predict very little of value before we know the answers.  These models can interpolate some information from small changes in configurations, but they can extrapolate nothing at all.  (I used to do simulations of the electronic structure of semiconductors, so I know.)

    Another example is the stock market models that led to the financial crisis of 2008 - the modelers had neglected to include the possibility of housing prices going down.  A third example is the unbelievably famous, but incorrect, model of short term memory by Atkinson & Shiffrin (see ).  Here the model authors had the data they wanted to explain and constructed a model that described the data, but then failed to see what it would predict about the same data.  That did not prevent the authors from convincing everyone that short term memory has two components.  Fitting a model to known data does not make the model "explain" anything at all; all it does is fit the data.  The Pauling model works because it goes beyond fitting: it captures the essence of chemical interactions.

    This is my gut feeling about computer models of short term memory in general.  In semiconductors we at least know what equations need to be solved.  In neuroscience we know pretty much nothing, except for the equations describing isolated, extremely simple neurons.

    Nevertheless, can't computer models be useful at all?  Yes, if they are controlled and one is looking to understand particular aspects of memory - for example, if a model is fit to one set of data points and one can then show that another, independent data set is well described as well.  Then the theoretical aspects of the computer model can be important.  But are computer models useful if they have many different parameters fit to the very data they are trying to explain?  Absolutely not.  Then the computer model is simply functioning as a way to interpolate the data, not a way to understand it.
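    To make the in-sample versus independent-data distinction concrete, here is a toy sketch of my own (synthetic data and an assumed one-parameter model - nothing to do with CMR or the Murdock data): fit a single parameter on one data set, then check the error on an independent one.

```python
import numpy as np

# Toy illustration, not the authors' data or model.
rng = np.random.default_rng(1)

def make_data(n):
    # synthetic "experiment": a sine signal of amplitude 2 plus noise
    x = rng.uniform(0, 1, n)
    return x, 2.0 * np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n)

x_fit, y_fit = make_data(50)    # data the parameter is fit to
x_new, y_new = make_data(50)    # independent data set

# one-parameter model: y = a * sin(2*pi*x); least-squares estimate of a
s = np.sin(2 * np.pi * x_fit)
a_hat = np.sum(y_fit * s) / np.sum(s * s)

def rmse(x, y):
    return np.sqrt(np.mean((a_hat * np.sin(2 * np.pi * x) - y) ** 2))

# comparable error on both sets suggests the model generalizes
print(rmse(x_fit, y_fit), rmse(x_new, y_new))
```

    With a controlled model like this, a similar error on the independent set is the meaningful result; a good fit on the first set alone would show nothing.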

    The Context Maintenance & Retrieval (CMR) model is based not on neurons but on concepts such as "context", which is maintained by "search lights" probing the context.  These "search lights" probe memory items nearby in time and in other characteristics.  The model forms a "competition where all of the items compete in parallel to have their features reinstated in the system".  The CMR model can also be described as an "iterative parallel process, where the result of each recall competition affects the course of the subsequent competition."

    Is the CMR model an activation model?  It would seem so.  The authors define "context as a pattern of activity in the cognitive system, separate from the pattern immediately evoked by the perception of a studied item, that changes over time and is associated with other coactive patterns" ... "the notion that the elements of context are activated by some stimulus or event, tend to stay active past the time this stimulus leaves the environment."  Yet there is no discussion of deactivation, which would presumably occur as time passes.

    Each memory item is associated with a vector f, each context with a vector c, and the two interact via two associative matrices: "a given element in an associative matrix describes the connection strength between a particular feature element, and a particular context element."  In addition to the matrices, there are parameters that determine how quickly c is updated by f via one matrix and how quickly f is updated by c via the other.
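    As a rough sketch of the bookkeeping just described - my own illustrative Python, not the authors' code, with made-up dimensions, initializations, and rates - the item vector f, the context vector c, and the two associative matrices might look like this:

```python
import numpy as np

# Illustrative sketch of CMR-style bookkeeping; all values are assumed.
n = 8                     # length of both f and c (assumed equal here)
M_fc = np.eye(n)          # item -> context associations
M_cf = np.zeros((n, n))   # context -> item associations
beta = 0.5                # how quickly f updates c (assumed rate)
gamma = 0.2               # learning rate for the matrices (assumed)

def present_item(i, c):
    """Present item i: drift context toward the item's input context,
    then strengthen both matrices with outer products."""
    global M_fc, M_cf
    f = np.zeros(n)
    f[i] = 1.0                        # one-hot item feature vector
    c_in = M_fc @ f
    c_in = c_in / np.linalg.norm(c_in)
    c = (1 - beta) * c + beta * c_in  # context drifts over time
    c = c / np.linalg.norm(c)
    # each matrix element is a feature-element/context-element strength
    M_fc += gamma * np.outer(c, f)
    M_cf += gamma * np.outer(f, c)
    return c

c = np.zeros(n)
c[0] = 1.0                            # arbitrary starting context
for item in range(1, 5):
    c = present_item(item, c)

# retrieval cue: the current context activates items through M_cf,
# so recently studied items are activated most strongly (recency)
activations = M_cf @ c
print(np.round(activations, 3))
```

    Even this stripped-down sketch reproduces a recency gradient, which shows how easily such architectures generate qualitatively plausible curves before any serious fitting is done.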

    The authors use 11-14 parameters that are fitted to 93 data points of either the Murdock (1962) dataset or the Murdock & Okada (1970) dataset.  When there are so many parameters to fit, there are typically many sets of parameters that give similar fits; the authors use a "genetic algorithm-fitting" technique to deal with that problem.  The 93 fitted data points are neither exhaustive nor independently chosen.  For example, from the probability of first recall only the final three serial positions are used.  Selecting which data points to use amounts to effectively adding parameters to the model.
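    The degeneracy problem is easy to illustrate with a toy model of my own (unrelated to CMR's actual parameters): when parameters enter a model redundantly, very different parameter sets fit the same 93 points equally well, and no fitting algorithm can distinguish between them.

```python
import numpy as np

# Toy illustration of parameter degeneracy, not the authors' procedure.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 93)                 # 93 "data points", as in the paper
y = 2.0 * x + rng.normal(0, 0.01, x.size)

def sse(a, b):
    """Sum of squared errors for the model y = a * b * x.
    a and b only ever enter as the product a*b, so they are
    not separately identifiable from the data."""
    return float(np.sum((a * b * x - y) ** 2))

# wildly different parameter sets, identical product, identical fit
print(sse(1.0, 2.0), sse(0.5, 4.0), sse(0.0078125, 256.0))
```

    A genetic algorithm can pick one of these parameter sets, but it cannot make the data prefer it over the others.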

    Going back to the initial free recall distribution: in order to describe it, the authors first include it in the data that the large number of parameters are fit to.  Then they show that there is a good fit for the complete serial position curve for the initial free recall.  Does this provide an "excellent" descriptive account?  Surely not.  In the end the authors are in fact unhappy with the description of the initial free recalls and suggest that the problem is some form of rehearsal that was not taken into account.

    When the journal asked if the referee wanted to disclose her/his name, the referee declined.  Since this was the PLoS journal, which is unbelievably slow in responding, I decided not to argue my case.