Forget about forgetting

J.Kowalski, November 1994, Enter, Poland. Translated and updated: September 1996

This is a translation of a popular scientific article about memory and learning written by J.Kowalski, Poland, for Enter in November 1994. Translator comments are placed in square brackets.

Contents:

Evolution and memory
Evolutionary optimization of forgetting
Spacing effect
Human memory vs virtual memory in operating systems
Retrievability and stability of memory
Molecular basis of memory
Internals of SuperMemo
Theory and practice
It might work but it cannot be that good
How was SuperMemo developed?
Epilog

Increase the speed of learning 50-fold! This is how SuperMemo World advertises its product: SuperMemo. Is it yet another case of product hype, or is SuperMemo indeed a case worth serious consideration?

The author of SuperMemo, Piotr Wozniak, PhD, a graduate of Adam Mickiewicz University (molecular biology) and the University of Technology in Poznan (computer science), claims that there are very few secrets about why SuperMemo is so effective. When we met in November 1993, Wozniak tried to dispel all my doubts about SuperMemo. It really works, that’s a fact, and nobody has to convince me of this. However, is it really the ultimate? Is it really science, or is it just a product born out of the promotional effort of SuperMemo World’s marketing team?

Evolution and memory

The most interesting and common-sense evidence speaking for SuperMemo comes from evolution. Wozniak’s approach to SuperMemo is deeply evolutionary: it is enough to have some basic understanding of the mechanisms of evolution to understand why memory works the way it does, and why SuperMemo is the ultimate solution to the problem of forgetting.

The nervous system was the evolutionary invention that introduced central control into otherwise homeostatically controlled organisms. It was like introducing a global communist government in a conglomerate of countries and federations, i.e. cells and organs, governed by a purely free market of enzymes, free-flowing metabolites and hormones. In its introduction of communism, evolution was smarter than humans in the sense that communist practices were introduced not outright, but in proportion to available means. The developing nervous system, in the successive stages of evolution, took increasing control over the organism in tandem with the increasing complexity of its neural structure. The culmination of this process was the human brain, the ultimate creator of our civilization. Not only for the successful introduction of neural communism can evolution be viewed as an incredibly smart designer which would not miss an opportunity for improvement. Though its mechanisms are dead slow and purely random, what could not be accomplished by guided progress has been accomplished by the power of the 4 billion years that have passed since the appearance of the first living cell. The infallibility of evolution in the range of what can be accomplished by living matter based on DNA and proteins can be a very useful guide in understanding neurophysiology and human psychology, including the mechanisms of memory. Optimization of the mechanisms involved in memory has been based on fine-tuning the regulatory properties of metabolic and, to a degree, electrochemical processes occurring in the synapse. It did not need the involvement of new organs, or even new cells. Therefore, unlike the invention of flight, which nota bene has been worked out by evolution more than once, optimization of memory can be compared to tuning a radio receiver as opposed to constructing it from scratch (as in the case of developing wings and the ability to fly).
It is not difficult to notice that memory mechanisms have been very much conserved in the course of phylogenesis. After all, the famous American neuroscientist Dr Eric Kandel has spent a few decades studying the primitive nervous system of the mollusc Aplysia californica (just a couple of nerve cells making up the entire system). Kandel’s involvement with Aplysia did not prevent him from drawing far-reaching conclusions concerning the mechanisms of memory in humans. Similarly, one of the most important discoveries in the molecular research on memory in the last decade, the involvement of the membrane protein kinase C in conditioning, was first spotted in the marine snail Hermissenda. In other words, evolution did not take long to figure out the optimum properties of memory, which, according to Wozniak, are as widespread in the nervous system as the citric acid cycle in the organism. In the next sections, we will have a look at the optimum properties of memory and their relationship with SuperMemo. If evolution is indeed infallible, SuperMemo makes the best use of optimum memory properties.

Evolutionary optimization of forgetting

Let us take a more detailed look at why, according to Wozniak, memory and forgetting work in a way that makes SuperMemo possible. Thinking of the brain as a computer is a very useful metaphor. Everyone who has some basic understanding of computation will know that no computer can solve problems without memory. Memory is needed to keep a record of the computation; however, it can also be used to keep a modifiable program. After all, the power of computers rests in their programmability. Human beings, more or less consciously, program their brains using so-called long-term memory, i.e. memory which lasts for months and years. However, they can also use short-term memory, different in its physiological nature, to keep a record of the computation, or thinking, which leads to the solution, response, reflex, etc. Short-term memory, apart from its short-term functions, also serves as the framework for establishing long-term memories.

One of the first questions the user of a PC asks is how much RAM the computer has. The same question was asked by evolution in reference to the brain. Human RAM is enormous in its capacity. Some researchers estimate its size at several gigabits (Wozniak, using simple mathematical models of learning, has also arrived at a hypothetical lifetime limit on the learning capacity of the brain; see later). However, memory is not unlimited, and a living organism cannot attempt to store all incoming information. A very substantial selection has to be made if the storage capacity is not to overflow in a lifetime. It appears that the solution is forgetting. Let the brain filter the incoming messages and store as much as possible in long-term memory. Then let forgetting do the rest of the job by eliminating pieces of information in the order of least relevance. An important question that had to be answered by evolution was in what order pieces of information should be forgotten so as to maximize the survival rate. It is obvious, at least for those who understand the concept of probability in an incompletely specified event space, that encountering an average event increases the probability of the same event being encountered again. For example, if you do not know Mr X and you meet him on the street today, the probability that you meet him again tomorrow must be considered greater than before the first meeting. Naturally, if you meet him again, you have yet more reason to believe in more meetings in the future. In other words, successive repetitions should have an increasing stimulatory effect on memory. Unfortunately, evolution proceeded mostly in the absence of volitional aspects of the human brain; hence, we do not have the capability to forget at will. We cannot decide to free memory by forgetting Mr X on hearing the news that he has died or moved away to the antipodes.
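
The Mr X argument can be made concrete with a small calculation. The sketch below is not taken from Wozniak's work; it uses Laplace's rule of succession, which assumes that each day is an independent trial and that nothing is known beforehand about how likely a meeting is (a uniform prior). The function name and the 30-day window are made up for illustration.

```python
from fractions import Fraction

def chance_tomorrow(meetings: int, days_observed: int) -> Fraction:
    """Laplace's rule of succession: (meetings + 1) / (days_observed + 2).

    Assumes independent days and a uniform prior on the unknown
    daily probability of a meeting (an illustrative assumption).
    """
    if not 0 <= meetings <= days_observed:
        raise ValueError("meetings must lie between 0 and days_observed")
    return Fraction(meetings + 1, days_observed + 2)

# Thirty days without seeing Mr X, then a meeting on day 31, then another on day 32.
before = chance_tomorrow(0, 30)
after_first = chance_tomorrow(1, 31)
after_second = chance_tomorrow(2, 32)
print(before, after_first, after_second)
```

Under these assumptions each meeting raises the estimated chance of the next one, which is the property that memory is said to exploit.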

Spacing effect

There remains the little problem of how the brain can prevent events that are not likely to be encountered in the future from being permanently transferred to memory as a result of a great number of repetitions. The answer was found in the so-called spacing effect, which says that the longer the interval between repetitions, the better the memory effect. This way, a large number of repetitions at short intervals has very little impact on memory. Simply speaking, memory uses the spacing effect and the principle of increasing intervals to fix relevant information in the brain most effectively. An event, once encountered, is temporarily transferred to long-term memory and forgotten in a matter of days. However, if the event is reencountered, memory assumes an increased probability of the event in the future and increases the retention period. Initially, in the retention period, memory is not sensitive to more encounters of the same event. Only at later stages does memory become sensitive again, and a new encounter will then act as a repetition that increases the retention period and makes memory temporarily insensitive to further encounters.
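
The mechanism described above can be sketched as a toy simulation. This is only an illustration of the verbal description, not the SuperMemo algorithm: the first retention period of 3 days, the doubling on each effective repetition, and the insensitive first half of each retention period are all invented parameters.

```python
def retention_after(encounter_days, first_retention=3.0, growth=2.0,
                    insensitive_share=0.5):
    """Return the retention period (in days) after a series of encounters.

    All three parameters are illustrative assumptions, not measured values.
    """
    retention = None        # no memory trace yet
    last_effective = None   # day of the last encounter that counted
    for day in sorted(encounter_days):
        if retention is None or day - last_effective > retention:
            # First encounter, or the trace was already forgotten: start over.
            retention, last_effective = first_retention, day
        elif day - last_effective >= insensitive_share * retention:
            # Memory is sensitive again: the encounter counts as a repetition.
            retention, last_effective = retention * growth, day
        # Otherwise the encounter falls in the insensitive phase and is ignored.
    return retention

crammed = retention_after([0, 0.1, 0.2, 0.3, 0.4, 0.5])  # six encounters in half a day
spaced = retention_after([0, 2, 6, 14, 30, 62])          # six encounters, growing gaps
print(crammed, spaced)
```

With these made-up numbers, six crammed encounters leave the retention period at its initial 3 days, while six spaced encounters extend it to 96 days.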

If anybody doubts the importance of the spacing effect, Wozniak proposes the following example: Could the reader provide the name of the infamous lady who alleged having slept with the majority of Polish parliamentarians? If the reaction is: Sure, yes, wait a second, I am sure I remember it but … aha! then this can be taken as an example of the spacing effect. Despite the fact that the lady dominated Polish political life for a short period of time, many of us might find it hard to recall her name. The reason is simple: hundreds of repetitions of the lady’s name were crammed into a very short period of time. Because of the spacing effect, memory reacted to the phenomenon more as to a single repetition than as to a volley of memory stimulations. The biological value of such a property of the brain may be explained by the fact that events occurring densely in a short period of time may be unworthy of precious memory storage. Otherwise, a great number of repetitions in a week could leave a useless memory trace for a lifetime. Do we really need to remember the name of the promiscuous lady? We don’t … unless we are members of the Polish parliament at breeding age, naturally. Using the computer metaphor again, the problem of choosing the least relevant pieces of knowledge in the process of forgetting is analogous to the problem of paging in virtual memory. In paging, the question is which memory blocks should be discarded to maximize the probability that the next memory reference will concern a block that is already in memory. Unlike in operating systems, the LRU algorithm (Least Recently Used) would not work well for human memory. If LRU were used, the first to be forgotten would be the rusty primitives mastered in primary school. It would be enough to use a calculator for a few months to have the whole multiplication table discarded ahead of this morning’s breakfast.
The grandmother who passed away a decade ago would be another early victim. Definitely, LRU would deprive the brain of flexibility and us … of humanity.
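
To see why LRU would treat the multiplication table so badly, consider a minimal sketch of an LRU store. The class name, the capacity of three facts, and the facts themselves are hypothetical; the eviction rule is the standard least-recently-used policy.

```python
from collections import OrderedDict

class LRUMemory:
    """A tiny least-recently-used store (illustrative, not a model of the brain)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest use first, most recent use last

    def use(self, fact):
        """Recall or learn a fact; return the fact evicted to make room, if any."""
        if fact in self.items:
            self.items.move_to_end(fact)  # mark as most recently used
            return None
        self.items[fact] = True
        if len(self.items) > self.capacity:
            evicted, _ = self.items.popitem(last=False)  # drop the oldest use
            return evicted
        return None

memory = LRUMemory(capacity=3)
memory.use("multiplication table")      # learned long ago, not used since
memory.use("colleague's name")
memory.use("this morning's breakfast")
evicted = memory.use("today's headline")
print(evicted)
```

The fact that has gone unused the longest is discarded first, regardless of how many times it was repeated in the past or how useful it may be in the future.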

Human memory vs virtual memory in operating systems

The question arises immediately: if the biological optimization of memory storage is as efficient as it is in humans, why do developers of operating systems not assign memory attributes to blocks of memory and use increasing intervals combined with the spacing effect in developing, say, the next version of Windows? The key to the answer lies in one major difference between the brain and the operating system: memory blocks can be reloaded from the disk in a wink, which is not true of forgotten memories. You will not see a student at an exam say to the examiner: Wait a second, I have just forgotten it, and must reload it from my slow external storage. Obviously, a crib, or any kind of external reference, can serve as a smart crutch for those who do not wish to burden their mind with the effort of remembering. Sadly, in the dog-eat-dog pace of our civilization, the LRU approach is more and more often applied to humans. Cribs, help systems and encyclopedias play a greater role than memory training. The poor record of American graduates in verbal, analytical and logical tests as compared with Chinese, Koreans, or even students coming from Eastern Europe is a sad side effect of a dynamic capitalist economy promoting shallow LRU education and a race to early accomplishment at any price. Does this LRU trend bode ill for SuperMemo? No, says Wozniak: individuals and governments have long realized the importance of education targeted at areas of lifelong applicability to a modern man. The pressure of the urgent is considered a negative factor not only in education. Even in business! Get into the office of a modern businessman, arguably the primary candidate for stress-related heart disorders (a consequence of LRU thinking and prioritizing), and increasingly often you will find on broad display famous maxims targeted at fighting urgency.
To ground the belief in the new trends even deeper, it is worth noticing that businesspeople are indeed one of the major customer groups of SuperMemo World.

Retrievability and stability of memory

We have reached the point where the evolutionary interpretation of memory indicates that it works using the principles of increasing intervals and the spacing effect. Is there any proof for this model of memory apart from evolutionary speculation? In his doctoral dissertation, Wozniak discussed at length the molecular aspects of memory and presented a hypothetical model of changes occurring in the synapse in the process of learning. The novel element presented in the thesis was the distinction between the stability and retrievability of memory traces. This could not be used to support the validity of SuperMemo because of the simple fact that it was SuperMemo itself that laid the groundwork for the hypothesis. However, increasing molecular evidence seems to coincide with the stability-retrievability model, providing at the same time support for the correctness of the assumptions leading to SuperMemo. In plain terms, retrievability is a property of memory which determines the level of efficiency with which synapses can fire in response to the stimulus, and thus elicit the learned action. The lower the retrievability, the less likely you are to recall the correct response to a question. On the other hand, stability reflects the history of earlier repetitions, and determines the extent of time in which memory traces can be sustained. The higher the stability of memory, the longer it will take for retrievability to drop to the zero level, i.e. to the level where memories are permanently lost. According to Wozniak, when we learn something for the first time we experience a slight increase in the stability and retrievability of synapses involved in coding the particular stimulus-response association. In time, retrievability declines rapidly; this phenomenon is equivalent to forgetting. At the same time, the stability of memory remains at approximately the same level.
However, if we repeat the association before retrievability drops to zero, retrievability regains its initial value, while stability increases to a new level, substantially higher than at primary learning. Before the next repetition takes place, due to increased stability, retrievability decreases at a slower pace, and the inter-repetition interval might be much longer before forgetting takes place. Two other important properties of memory should also be noted: (1) repetitions have no power to increase the stability at times when retrievability is high (spacing effect), (2) upon forgetting, stability declines rapidly.
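
The interplay of the two variables can be sketched in a few lines of code. The article does not give formulas, so everything numeric here is an assumption made for illustration: retrievability is taken to decay exponentially at a rate set by stability, a repetition doubles stability, and retrievability above 0.9 counts as "high". The rapid decline of stability upon forgetting (property 2) is not modeled.

```python
import math

class MemoryTrace:
    """Toy two-variable memory trace; all formulas and constants are assumptions."""

    def __init__(self, stability=1.0):
        self.stability = stability      # days for retrievability to fall to 1/e
        self.days_since_review = 0.0

    def retrievability(self):
        # Assumed exponential decay; higher stability means slower decline.
        return math.exp(-self.days_since_review / self.stability)

    def wait(self, days):
        self.days_since_review += days

    def repeat(self, gain=2.0, high=0.9):
        # Spacing effect: a repetition made while retrievability is still
        # high does not increase stability.
        if self.retrievability() < high:
            self.stability *= gain
        self.days_since_review = 0.0    # retrievability regains its initial value

trace = MemoryTrace()
trace.wait(1.0)
first_drop = trace.retrievability()   # after one day at the initial stability
trace.repeat()                        # spaced repetition: stability increases
trace.wait(1.0)
second_drop = trace.retrievability()  # the same one-day wait now costs less

crammed = MemoryTrace()
crammed.wait(0.01)
crammed.repeat()                      # retrievability still high, so no gain
print(round(first_drop, 3), round(second_drop, 3), crammed.stability)
```

In this sketch the second one-day wait leaves more of the memory retrievable than the first, while the crammed repetition leaves stability unchanged.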

Molecular basis of memory

As mentioned earlier, the molecular mechanisms thought to underlie memory have not been used as the basis for developing SuperMemo. Though the inspiration was mutual, it is rather the retrievability-stability model which is likely to contribute more to understanding the molecular aspects of memory than vice versa. The correlates between the model and the findings on molecular memory might not be striking at first. After all, most research on memory consistently focuses on the concept of the retrievability of a synaptic connection. The concept of stability is absolutely new and no mention of similar phenomena can be found in widely published research. However, short-term memory, as well as the two components of long-term memory, retrievability and stability, fit nicely into the presently investigated models of memory and learning.

Internals of SuperMemo

We have already seen that evolution speaks for SuperMemo, that findings in the field of psychology coincide with the method, and that the facts of molecular biology and the conclusions coming from Wozniak’s model seem to go hand in hand. Now is the time to see how the described mechanisms have been put to work in the program itself. In the course of repetitions, SuperMemo plots the forgetting curve for the student and schedules the repetition at the moment where retention, i.e. the proportion of remembered knowledge, drops to a previously defined level. In other words, SuperMemo checks how much you remember after a week, and if you remember less than desired, it asks you to make repetitions at intervals less than one week long. Otherwise, it checks the retention after a longer period and increases the intervals accordingly. A little kink in this simple picture comes from the fact that items of different difficulty have to be repeated at different intervals, and that the intervals increase as the learning process proceeds. Moreover, the optimum inter-repetition intervals have to be known for an average individual, and these must be used before the program can collect data about the real student. Obviously, a whole mathematical apparatus must be involved to put this machinery to work. All in all, Wozniak says that there have been at least 30 days in his life when he had the impression that the algorithms used in SuperMemo had been significantly upgraded. Each of the cases seemed to be a major breakthrough. The whole development process was just a long succession of trial and error, testing, improving, implementing new ideas, etc. Unfortunately, those good days are over. There has not been any breakthrough improvement to the algorithm since 1991. Some comfort may come from the fact that since then the software has been developing rapidly, providing the user with new options and solutions. Can SuperMemo then be yet better, faster, more effective? Wozniak is pessimistic.
Any further fine-tuning of the algorithms, whether by applying artificial intelligence or neural networks, would be drowned in the noise of interference. After all, we do not learn in isolation from the world. When the program schedules the next repetition in 365 days, and the fact is recalled by chance at an earlier time, SuperMemo has no way of knowing about the accidental recollection and will execute the repetition at the previously planned moment. This is not optimal, but it cannot be remedied by improving the algorithm. Improving SuperMemo now is like fine-tuning a radio receiver in a noisy car assembly hall. The guys at SuperMemo World are now less focused on science. In their view, after the scientific invention, the time has come for the social invention of SuperMemo.
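
The basic scheduling idea, repeating at the moment retention falls to a chosen level, can be sketched as follows. This is not the algorithm used in SuperMemo, whose details the article does not give. The sketch assumes an exponential forgetting curve, fits its decay rate to measured retention by least squares, and solves for the interval. The function names, the decay rate of 0.03 per day, and the 90% target are hypothetical, and the sample measurements are generated from the assumed curve rather than taken from real students.

```python
import math

def fit_decay_rate(observations):
    """Least-squares fit of ln(retention) = -rate * days (line through the origin).

    observations: iterable of (days, retention) pairs with 0 < retention <= 1.
    """
    numerator = sum(days * -math.log(retention) for days, retention in observations)
    denominator = sum(days * days for days, _ in observations)
    return numerator / denominator

def interval_for(target_retention, rate):
    """Days after which an exponential forgetting curve falls to the target."""
    return -math.log(target_retention) / rate

# Synthetic measurements generated from an assumed rate of 0.03 per day.
observations = [(days, math.exp(-0.03 * days)) for days in (1, 7, 14)]
rate = fit_decay_rate(observations)
interval = interval_for(0.9, rate)
print(round(rate, 4), round(interval, 2))
```

Note that a chance recollection between repetitions, the interference mentioned above, would never appear in the observations, which is why no refinement of the fit can account for it.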

[In 1995, one year after this article was written, a new SuperMemo algorithm was developed that promises to increase the speed of learning still further, esp. in the very first weeks of repetitions. In 1996, further improvements made the same algorithm less sensitive to breaks in learning. In late 1997, work started on a neural network SuperMemo that ultimately proved a disappointment. In 2002, algorithm extensions made it possible to execute mid-interval repetitions as an anti-interference mechanism.]

Theory and practice

Using a simple mathematical model, according to Wozniak, one can easily predict what the learning process will look like in the long run. One of the most striking observations is that, apart from the initial period, the speed of learning does not decrease substantially over time (one would rather expect a rapid decline of the knowledge acquisition rate because of the accumulation of outstanding repetitions). Another interesting fact is that even with SuperMemo, one is not likely to master more than several million facts and figures, corresponding to SuperMemo items, in a lifetime. The average learning speed of an average student amounts to about 300 items/year/min (i.e. the student can memorize 300 items per year if he or she works one minute per day).
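
The unit items/year/min lends itself to quick back-of-the-envelope estimates. The sketch below simply multiplies the quoted average by daily study time and the number of years, assuming the rate stays constant (the article says only that it does not decrease substantially after the initial period). The study schedules are hypothetical.

```python
ITEMS_PER_YEAR_PER_DAILY_MINUTE = 300  # average figure quoted in the article

def items_learned(minutes_per_day, years):
    """Items memorized, assuming the average rate holds throughout (a simplification)."""
    return ITEMS_PER_YEAR_PER_DAILY_MINUTE * minutes_per_day * years

one_year_casual = items_learned(10, 1)      # ten minutes a day for a year
career_steady = items_learned(60, 50)       # an hour a day for fifty years
lifetime_extreme = items_learned(180, 60)   # three hours a day for sixty years
print(one_year_casual, career_steady, lifetime_extreme)
```

Under this simplification, ten minutes a day yields 3,000 items in a year, and even three hours a day for sixty years yields about 3.2 million, which is in line with the lifetime ceiling of several million items mentioned above.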

This theoretically predicted speed of learning has been confirmed by Wozniak and Gorzelaczyk more than once in small groups of subjects. A recent poll conducted by SuperMemo World among all registered users in Poland indicates that the average speed of learning reported by registered users of SuperMemo is also close to 300 items/year/min, though individual differences have been more than substantial (from 50 to 3000 items/year/min), and extreme values had to be rejected for a more reliable picture. Simulation experiments based on Wozniak’s model of learning show that a student who stops repetitions after 5 years of work with SuperMemo is likely to forget 60% of the learned material in the first year after the cessation! [this figure was later proven exaggerated] Albeit for shorter periods of time, this staggering figure has been confirmed in practice. At this point one might be disappointed with the volatility of knowledge gained with SuperMemo, but the above figures also confirm once again that learning without SuperMemo is no learning at all.

It might work but it cannot be that good

If one is convinced of the validity of what has been said about SuperMemo so far, will he or she also be convinced that the program is a perfect cure for an ailing memory? Can it really capitalize on the properties of the nervous system and let learning proceed a dozen times faster than in standard circumstances? After all, generations of students have tried to figure out better methods of learning, and a breakthrough comparable with what SuperMemo claims to be seems highly unlikely even to quite an open-minded observer. Wozniak discounts the low-probability argument as a viable source of skepticism, and says that he has more than once traced down evidence that SuperMemo-like approaches to learning have been tried before with a lesser or greater degree of success. Moreover, it is worth noticing that SuperMemo might not have seen the light of day were it not implemented as a computer program which can easily be transferred between individuals. In other words, it could have fallen into oblivion like the previous attempts to put order into the process of learning. One must remember that the skeletal algorithm of SuperMemo was formulated in 1985, and only 1987 saw its very slow expansion in selected scientific circles in Poznan. Another turning point to be kept in view is that SuperMemo World would not have been formed in 1991 were it not for the inspiring meeting of minds between Wozniak and his colleague from the university, Krzysztof Biedalak, currently SuperMemo World’s Vice-President. Both were top students at the university: Wozniak intended to study neuroscience in the US, and Biedalak wanted to do the same in the field of artificial intelligence. Only by coincidence were they both thrown into the world of entrepreneurial science.
All this shows that despite the fact that the principles of SuperMemo are extremely simple and might have been invented several dozen times independently in several dozen countries of the planet, SuperMemo is not just a run-of-the-mill product. The distinctive merit of SuperMemo World was to put the idea into practice, invest a great deal of man-hours in the development of software, and focus on marketing the idea to the potential customer. Otherwise, SuperMemo would have forever remained limited to the small circle of its early enthusiasts.

How was SuperMemo developed?

Perhaps, while in the context of fulfilled-vs-unfulfilled inventions, it is interesting to take a short look at the entire story of SuperMemo from its very beginning. It was 1982 when a 20-year-old student of molecular biology at Adam Mickiewicz University of Poznan, Piotr Wozniak, became quite frustrated with his inability to retain newly learned knowledge in his brain. This referred to the vast material of biochemistry, physiology, chemistry, and English, which one should master when wishing to embark on a successful career in molecular biology. One of the major incentives to tackle the problem of forgetting in a more systematic way was a simple calculation made by Wozniak which showed him that by continuing his work on mastering English using his standard methods, he would need 120 years to acquire all the important vocabulary. This not only prompted Wozniak to work on methods of learning, but also turned him into a determined advocate of the idea of one language for all people (bearing in mind the time and money spent by mankind on translation and learning languages). Initially, Wozniak kept increasing piles of notes with facts and figures he would like to remember. It did not take long to discover that counteracting forgetting requires frequent repetitions, and that a systematic approach is needed to manage all the newly collected and memorized knowledge. Following an obvious intuition, Wozniak attempted to measure the retention of knowledge after different inter-repetition intervals, and in 1985 formulated the first outline of SuperMemo, which did not yet require a computer. By 1987, Wozniak, then a sophomore in computer science, was quite amazed with the effectiveness of his method and decided to implement it as a simple computer program. The effectiveness of the program appeared to go far beyond what he had expected. This triggered an exciting scientific exchange between Wozniak and his colleagues at the University of Technology and Adam Mickiewicz University.
A dozen students at his department took on the role of guinea pigs and memorized thousands of items, providing a constant flow of data and critical feedback. Dr Gorzelaczyk from the Medical Academy was helpful in formulating the molecular model of memory formation and modeling the phenomena occurring in the synapse. Dr Makalowski from the Department of Biopolymer Biochemistry contributed to the analysis of evolutionary aspects of optimization of memory (NB: he was also the one who suggested registering SuperMemo for Software for Europe). Janusz Murakowski, MSc in physics, currently enrolled in a doctoral program at the University of Delaware, helped Wozniak solve mathematical problems related to the model of intermittent learning and simulation of ionic currents during the transmission of action potential in nerve cells. A dozen forthcoming academic teachers, with Prof. Zbigniew Kierzkowski at the forefront, helped Wozniak tailor his program of study to one goal: combining all aspects of SuperMemo in one cohesive theory that would encompass molecular, evolutionary, behavioral, psychological, and even societal aspects of SuperMemo. Wozniak, who claims to have discovered at least several important and never-published properties of memory, intended to solidify his theories by getting a PhD in neuroscience in the US. Many hours of discussions with Krzysztof Biedalak, MSc in computer science, made them both choose another way: try to fulfil the vision of bringing SuperMemo to students around the world.

Epilog

When I asked Wozniak if his models, such as the retrievability-stability model, the model of optimum repetition spacing, etc., had been confirmed by other researchers in the field of memory and learning, I did not get an unambiguously affirmative answer. After all, says Wozniak, the outline of the methodology employed in SuperMemo was published in a worldwide scientific journal only in 1992 (Acta Neurobiologiae Experimentalis), and all his basic findings build on the model of optimum repetition spacing. Why did he wait so long to publish the theory in a respectable journal? According to Wozniak, as early as 1990, he first attempted to publish the results of his early experiments on repetition spacing in several journals, including the most renowned Memory and Cognition. However, the editors, though expressly praising the novel approach to studying memory, constantly tossed him from journal to journal claiming that his paper did not comply exactly with their target profile. Those involved in psychology complained about the intrusion of convoluted computer algorithms, while those closer to mathematics did not want to see their journals review recent literature on the spacing effect in memory. Both these components, and more, however, are central to SuperMemo. A great deal of scepticism was also generated by the regularity of the findings. Wozniak says: the experimental data looked too good to be true; more like they were cooked for the paper.

All in all, one can either trust Wozniak and try SuperMemo, or wait months or years for its true scientific recognition. In the meantime, the marketing team of SuperMemo World is beaming with optimism: it’s enough to ask users of SuperMemo, forty thousand of them in Poland alone, how the method fares in their educational pursuits. The general opinion is more than enthusiastic. SuperMemo simply works and we do not need to prove it to our customers.

In questionnaires received at SuperMemo World, when asked what they like most in the program, users of SuperMemo overwhelmingly indicate its effectiveness. The software may be OK, but what really counts is results in learning. How about dislikes? Users are not pleased with this or that, most often with the fact that, even in Poland, SuperMemo is always released first in English. But there is no particular turn-off that takes precedence. Definitely, nobody questions the fact that with SuperMemo one can learn faster and never worry about forgetting. Taking this rosy picture to heart, one might wonder why SuperMemo has not yet sold millions of copies worldwide. Marczello Georgiew, Marketing Director at SuperMemo World, proposes to recall the problems Graham Bell experienced when trying to introduce his funny machine for talking over a wire, or how pessimistic the predictions of industry futurologists were about the expansion of the air-polluting mechanical horse. Then he adds confidently: It took Wozniak 10 years to turn necessity into invention; give us half this time, and we will turn his invention into a global necessity.