Repetition spacing algorithm used in SuperMemo 2002 through SuperMemo 2006
Dr P.A. Wozniak, April 2, 2002 (updated Jan 2005)
Below you will find a general outline of the seventh major formulation of the repetition spacing algorithm used in SuperMemo. It is referred to as Algorithm SM-11 since it was first implemented in SuperMemo 11.0 (SuperMemo 2002). Although the increase in complexity of Algorithm SM-11 as compared with its predecessors (e.g. Algorithm SM-6) is incomparably greater than the expected benefit for the user, there is substantial theoretical and practical evidence that the increase in the speed of learning resulting from the upgrade may fall into the range from 30 to 50%. In addition, Algorithm SM-11 eliminates problems with repetition delay and repetition advancement evident in Algorithm SM-8.
Historic note: earlier releases of the algorithm

Although the presented algorithm may seem complex, you should find it easier to follow its steps once you understand the evolution of individual concepts such as E-Factor, matrix of optimum intervals, optimum factors, forgetting curves, or mid-interval repetition:

SuperMemo begins the effort to compute the optimum inter-repetition intervals by storing the recall record of individual items (i.e. grades scored in learning). This record is used to estimate the current strength of a given memory trace, and the difficulty of the underlying piece of knowledge (item). The item difficulty expresses the complexity of memories, and reflects the effort needed to produce unambiguous and stable memory traces. SuperMemo takes the requested recall rate as the optimization criterion (e.g. 95%), and computes the intervals that satisfy this criterion. The function of optimum intervals is represented in a matrix form (OF matrix) and is subject to modification based on the results of the learning process. Although satisfying the optimization criterion is relatively easy, the complexity of the algorithm derives from the need to obtain the maximum speed of convergence possible in the light of the known memory models.
Important! Algorithm SM11 is used only to compute the intervals between repetitions of items. Topics are reviewed at intervals computed with an entirely different algorithm (not described here).
This is a more detailed description of Algorithm SM-11:
I(1) = OF[1,L+1]
I(n) = I(n-1) * OF[n,AF]
where:
 OF - matrix of optimal factors, which is modified in the course of repetitions
 OF[1,L+1] - value of the OF matrix entry taken from the first row and the L+1 column
 OF[n,AF] - value of the OF matrix entry that corresponds with the n-th repetition and with item difficulty AF
 L - number of times a given item has been forgotten (from "memory Lapses")
 AF - number that reflects the absolute difficulty of a given item (from "Absolute difficulty Factor")
 I(n) - the n-th inter-repetition interval for a given item
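As an illustration, the two formulas above can be sketched in Python. The OF values below are made-up placeholders (the real matrix is learned from repetition data), and the indexing is simplified:

```python
def intervals(of_matrix, af, lapses, n_reps):
    """Compute the first n_reps inter-repetition intervals (in days).

    of_matrix[n][c] stands for OF[n,c]; the values used here are
    illustrative placeholders, not SuperMemo's actual learned entries."""
    # I(1) = OF[1, L+1]: the first interval is indexed by memory lapses
    result = [of_matrix[1][lapses + 1]]
    # I(n) = I(n-1) * OF[n, AF] for subsequent repetitions
    for n in range(2, n_reps + 1):
        result.append(result[-1] * of_matrix[n][af])
    return result

# Hypothetical OF matrix fragment: row 1 indexed by L+1, later rows by AF
of_matrix = {
    1: {1: 4.0, 2: 2.0},
    2: {3: 1.9},
    3: {3: 1.7},
}
print(intervals(of_matrix, af=3, lapses=0, n_reps=3))
```

Each interval is simply the previous one multiplied by the matrix entry for the current repetition number and item difficulty.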
dOF = dOF_max * a / (t_half + a)
dOF_max = (OF-1) * (OI + t_half - 1) / (OI - 1)
where:
 dOF - decrement to OF resulting from the spacing effect
 a - advancement of the repetition in days as compared with the optimum schedule (note that there is no change to OF if a=0, i.e. the repetition takes place at the optimum time)
 dOF_max - asymptotic limit on dOF for infinite a (note that for a=OI-1 the decrement will be OF-1, which corresponds to no increase in the inter-repetition interval)
 t_half - advancement at which there is half the expected increase in synaptic strength as a result of a repetition (presently this value corresponds roughly to 60% of the length of the optimum interval for well-structured material)
 OF - optimum factor (i.e. OF[n,AF] for the n-th interval and a given value of AF)
 OI - optimum interval (as derived from the OF matrix)
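A quick sanity check of these formulas in Python (the numbers are illustrative; t_half is set at 60% of the optimum interval, as mentioned above):

```python
def dof(a, of, oi, t_half):
    """Spacing-effect decrement to OF for a repetition advanced by a days."""
    dof_max = (of - 1) * (oi + t_half - 1) / (oi - 1)  # asymptotic limit
    return dof_max * a / (t_half + a)

# Illustrative values: a 20-day optimum interval with OF = 2.0
oi, of_val = 20.0, 2.0
t_half = 0.6 * oi  # roughly 60% of the optimum interval

print(dof(0, of_val, oi, t_half))       # a = 0: no change to OF
print(dof(oi - 1, of_val, oi, t_half))  # a = OI-1: decrement equals OF-1
```

The two printed cases reproduce the boundary conditions noted in the definitions: no decrement at the optimum time, and a decrement of OF-1 (no interval growth) when the repetition is advanced by OI-1 days.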
To sum up: repetitions result in computing a set of parameters characterizing the memory of the student: the RF matrix, the G-AF graph, and the FI-G graph. They are also used to compute A-Factors of individual items that characterize the difficulty of the learned material. The RF matrix is smoothed to produce the OF matrix, which in turn is used in computing the optimum inter-repetition interval for items of different difficulty (A-Factor) and different number of repetitions (or memory lapses in the case of the first repetition). Initially, all the student's memory parameters are taken as for a less-than-average student (less-than-average yields faster convergence than average or more-than-average), while all A-Factors are assumed to be equal (unknown).
Optimization solutions used in Algorithm SM-11 have been perfected over 17 years of using the SuperMemo method with computer-based algorithms (first implementation: December 1987). This ensures that the convergence of the starting memory parameters with the actual parameters of the student proceeds in a very short time. In addition, Algorithm SM-11 includes mechanisms that make it insensitive to interference from deviations from the optimum repetition timing (e.g. delayed or advanced repetitions). It can also accept an uninitialized learning process imported from other applications (e.g. other SuperMemos). The introduction of A-Factors and the use of the G-AF graph greatly enhanced the speed of estimating item difficulty. The adopted solutions are the result of constant research into new algorithmic variants. Theoretical considerations, as well as research into the employment of neural networks in repetition spacing, indicate that it is not likely neural networks could effectively compete with the presented algebraic solution (see FAQ below).
Algorithm SM-11 is constantly being perfected in successive releases of SuperMemo, especially to account for newly collected repetition data, convergence data, input parameters, instabilities, etc.
Frequently Asked Questions
How does SuperMemo compute the optimum increase in intervals?
SuperMemo algorithm can be made better
Will SuperMemo use neural networks?
Why don't you continue your experiment with neural networks?
Forgetting curve for ill-formulated items is flattened
Flatter forgetting curve does not increase optimum interval
E-Factors and A-Factors
A-Factors can drop even if grades are good
When inspecting optimization matrices, use repetition category, not repetition number
RF stands for "retention factor"
When inspecting optimization matrices, use repetition category, not repetition number
(Steven Trezise, USA, Apr 20, 1999)
Question:
In my collection, I have items for which I have done between 1 and 8 repetitions. However, when I look at the Cases matrix, there are no entries beyond repetition 3.
Answer:
The algorithm used by SuperMemo updates all optimization matrices using repetition categories, not the actual repetition number (you can view the optimization matrices with Tools : Statistics : Analysis : Matrices). A repetition category is the expected number of repetitions needed to reach the currently used interval. Once the matrices change, the estimation of the repetition category may change too. If, for example, you score well in repetitions and your intervals become longer, it will take fewer repetitions to get to a given interval. In such a case, you might be at the 8th repetition while your repetition category will be 3. All matrices, such as the OF matrix, the RF matrix, etc., will be updated in the third row (not in the 8th row).
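The repetition-category idea can be illustrated with a minimal Python sketch, assuming a single constant OF per item (the real algorithm walks the full OF matrix):

```python
def repetition_category(interval, first_interval, of, max_cat=20):
    """Expected number of repetitions needed to reach the given interval,
    assuming each repetition multiplies the interval by a constant OF.
    Illustrative only; SuperMemo derives this from the OF matrix."""
    cat, i = 1, first_interval
    while i < interval and cat < max_cat:
        i *= of
        cat += 1
    return cat

# An item at its 8th actual repetition whose current interval is 40 days
# may fall into repetition category 3 if intervals now grow fast:
print(repetition_category(interval=40, first_interval=4, of=4.0))  # -> 3
```

An item that needed 8 actual repetitions thus updates row 3 of the matrices, because 3 repetitions would be expected to reach its current interval.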
A-Factors can drop even if grades are good
(Dr. Prateek Mishra, India, Oct 5, 2000)
Question:
I noticed that A-Factor may decrease even if the grade is Bright. Is this correct?
Answer:
Usually, the better the grade, the greater the increase in A-Factor. However, the new value of A-Factor for Repetition>1 is computed by trying to find the column in the OF matrix which provides the best fit for the estimated value of the optimum interval for the just-executed repetition. In other words, the algorithm checks how well the item fared after a given interval, makes a guess as to the optimum interval for similar situations in the future, and then, using the same information, makes a guess as to the true difficulty of the just-repeated item. If your item was considered "A-Factor(i) difficult" on Day(i), and the values in the OF matrix increased substantially before Day(i+1) on which the next repetition takes place, A-Factor(i+1) may appear lower than A-Factor(i) even if you scored Bright (which reflects the fact that the used interval could even be longer). Simply, the relative valuation of difficulty will have changed during the inter-repetition interval. It is worth noting that each time you worry about the accuracy of the SuperMemo algorithm, you should simply look at the measured forgetting index in the statistics (Measured FI). This value should equal your requested forgetting index. Usually it is slightly higher due to minor inaccuracies of the algorithm. However, the main contributor to an overly high measured forgetting index is delay in making repetitions (the picture illustrates a slow 7-month-long recovery of the measured forgetting index from a one-time use of Tools : Mercy in SuperMemo).
To better understand the relationship between the new value of A-Factor and the estimated forgetting index, see this function taken from the SuperMemo source code (simplified for clarity; valid only for Repetition>1):
function GetNewAFactorFromEstimatedFI(IOD:PItemOptimizationData):real;
var col:byte;
    OptimumInterval,OFactor:real;
begin
  with IOD^ do begin
    OptimumInterval:=(UsedInterval*ln(0.9)/ln(1-EstimatedFI/100));
    {Optimum interval computed from EstimatedFI and adjusted to FI=10%}
    OFactor:=OptimumInterval/UsedInterval*UFactor;
    for col:=20 downto 1 do {scan columns of the OF matrix}
      if OFactor>EstimatedOFactor(col,RepetitionsCategory) then
        break;
  end;
  Result:=AFactorFromOFColumn(col); {new value of AFactor}
end;
RF stands for "retention factor"
(Malcolm Macgregor, Saturday, September 21, 2002 12:32 AM)
Question:
What does RF stand for?
Answer:
RF stands for "retention factor". In the ideal world, RF would be the same as OF. However, in the real world, RFs reflect the noise of the stochastic nature of forgetting. In other words, when the sample of measured repetitions is small, RF may deviate from OF. RFs are computed from forgetting curves plotted in the course of repetitions. For a large number of repetitions, when the forgetting curve starts resembling a negatively exponential curve, RFs will approach OFs in value. However, as this approximation process may take months or years, the matrix of OF is derived by smoothing the matrix of RF. Some entries of the RF matrix are never filled out with real data. For example, we never get to the 10th repetition of very easy material; we simply die before the chance to collect the data occurs. This is why the RF matrix is composed of entries based on "intense research", "rough approximation", as well as "educated guess". The formulas used to compute the OF matrix try to find the best fit of the theoretical OF matrix to this miscellany of RF data.
Forgetting curve for ill-formulated items is flattened
(anonymous, Wednesday, July 25, 2001 2:54 PM)
Question:
I had a 5-month break in using SuperMemo. I resumed my repetitions and noticed that I still remembered many items. Initially, SuperMemo asked me to repeat difficult items (e.g. enumerations). To my surprise, I remembered many of these items. SuperMemo required a 15-day interval, while I made my repetitions after 150 days and still succeeded. I no longer believe in the optimality of the SuperMemo algorithm. Probably it is 10 times worse than optimal.
Answer:
Your results are in full compliance with theoretical expectations. It is no surprise that SuperMemo initially tossed the most difficult items at you, and it is no surprise that you showed remarkable recall on these items. Those items clearly belong to those that have not been formulated in compliance with representation rules (e.g. enumerations are notoriously difficult). If you imagine memories as sets of apples (you can see an apple as a single synapse in the brain), good memories are like small collections of well-polished apples. Bad memories (e.g. enumerations, complex items, ambiguous items, etc.) are like large collections of apples of which a few are spoilt. Naturally, spoilt apples rot fast and make recall difficult. After just 15 days, all spoilt apples might have rotted. During the remaining 150 days, the remaining apples rot very slowly. Hence the typical departure of wrongly formulated items from the shape of the classical forgetting curve. For bad items, the curve is flattened (as an expected superposition of several Ebbinghausian curves). SuperMemo blindly obeys your recall criteria. If it takes 15 days to ensure 98% recall, SuperMemo will take no consideration of the fact that at 150 days you may still show 95% recall. This is why SuperMemo 2000 includes leech management. It makes it easy to identify bad items and use auto-postpone or auto-forget options. Auto-postpone will do exactly what you expect, i.e. delay bad items with little impact on overall retention. Auto-forget will help you rebuild memories from scratch. Occasionally, the newly established memory representation will click and your recall will improve. Naturally, the best method against bad items is the use of appropriate representation (see: 20 rules of formulating knowledge for learning). Interestingly, SuperMemo can never predict the moment of forgetting of a single item. Forgetting is a stochastic process, and the algorithm can only operate on averages. A frequently propagated fallacy about SuperMemo is that it predicts the exact moment of forgetting: this is not true, and this is not possible. What SuperMemo does is search for intervals at which items of a given difficulty are likely to show a given probability of forgetting (e.g. 5%). If you look for a numerical measure of the algorithm's inaccuracy, instead of comparing intervals you should rather compare retention levels, as retention is the optimization criterion here. Even for a pure negatively exponential forgetting curve, a 10-fold deviation in interval estimation will result in a difference in retention of R2=exp(10*ln(R1)). This is equivalent to a drop from 98% to 81%. For a flattened forgetting curve typical of badly-formulated items, this drop may be as little as 98%->95%.
For more on non-exponential forgetting curves see: Building memory stability
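The retention arithmetic from the answer above, checked in Python:

```python
import math

r1 = 0.98  # retention at the optimum interval
# A 10-fold error in the interval on a purely exponential forgetting curve:
r2 = math.exp(10 * math.log(r1))  # equivalently r1 ** 10
print(round(r2, 3))  # ~0.817, i.e. a drop from 98% to about 81% retention
```

This is why comparing retention levels, rather than intervals, is the meaningful measure of the algorithm's inaccuracy: a large interval error translates into a much smaller retention error on a flattened curve.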
Flatter forgetting curve does not increase optimum interval
(Tomasz P. Szynalski, Saturday, August 04, 2001 5:53 AM)
Question:
If the forgetting curve is flatter for difficult items, I will remember them for a longer time, right? Does that suggest that ill-formulated items are remembered better?
Answer:
No. A flattened forgetting curve will increase retention measurements in intervals that are a multiple of the optimum interval, as compared with the typical negatively exponential curve for well-structured material. However, the optimum intervals for ill-formulated items will expectedly be shorter, as can be observed on the first-interval graph in Tools : Statistics : Analysis : Graphs : First Interval. The smoothness of this graph depends on the number of repetitions recorded. In the picture below, over 90,000 repetitions have been recorded.
For more on non-exponential forgetting curves see: Building memory stability
Will SuperMemo use neural networks?
(Dawid Calinski, Fri, Apr 06, 2001 15:56)
Question:
I am curious when neural networks will have application in SuperMemo. Although the speed of learning with the algorithm used in SuperMemo 2000 is really great, it could be even faster with neural networks.
Answer:
Neural networks could be used to compute the intervals, but they do not seem to be the best tool in terms of computing power, research value, stability, and, most of all, the speed of convergence. When designing an optimum neural network, we run into similar difficulties as in designing the algebraic optimization procedure. In the end, whatever boundary conditions are set in "classic" SuperMemo are likely to appear, sooner or later, in the network design (as can be seen in Neural Network SuperMemo).
As with all function approximations, the choice of the tool and minor algorithmic adjustments can make a world of difference in the speed of convergence and the accuracy of the mapping. Neural networks could find use in mapping the lesser-known accessory functions that are used to speed up the convergence of the algebraic algorithm (e.g. user-dependent grade correlations that speed up the approximation of A-Factors, etc.).
E-Factors and A-Factors
(Malcolm Macgregor, Saturday, September 21, 2002 12:32 AM)
Question:
You say "Unlike E-Factors... A-Factors express absolute item difficulty and do not depend on the difficulty of other items in the same collection". From my reading on E-Factors, they seem to express the easiness of learning one item, and I cannot see how they depend explicitly on the difficulty of other items.
Answer:
Originally, E-Factors were defined in the same way as O-Factors (i.e. the ratio of successive intervals). As such, they were an "objective" measure of item difficulty (the higher the E-Factor, the easier the item). However, starting with SuperMemo 4.0, E-Factors were used to index the matrix of O-Factors. They were still used to reflect item difficulty. They were still used to compute O-Factors. However, they could differ from O-Factors. In addition, the difficulty of material in a given collection would shape the relationship between O-Factors and E-Factors. For example, in an easy collection, the starting-point O-Factor (i.e. first repetition of the assumed starting difficulty) would be relatively high. As performance in repetitions determines E-Factors, items of the same difficulty in an easy collection would naturally have a lower E-Factor than exactly the same items in a difficult collection. This all changed in SuperMemo 8, where A-Factors were introduced. A-Factors are "bound" to the second row of the OF matrix. This makes them an absolute measure of item difficulty. Their value does not depend on the content of the collection. For example, you know that if A-Factor is 1.5, the third repetition will take place in an interval that is 50% longer than the first interval.
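The closing example in numbers (the first interval is chosen arbitrarily; the binding of A-Factors to the second OF row is as described above):

```python
# A-Factor = 1.5 means the interval preceding the third repetition is 50%
# longer than the first interval. Hypothetical first interval of 4 days:
first_interval = 4.0   # days
a_factor = 1.5
second_interval = first_interval * a_factor  # interval before repetition 3
print(second_interval)  # -> 6.0 days, i.e. 50% longer
```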
SuperMemo algorithm can be made better
(Dawid Calinski, Dec 10, 2004, 01:51:18)
Question:
I bet the SuperMemo algorithm is very good at scheduling reviews. But I also bet it is now very complicated, and therefore you have a limited capability of developing it further.
Answer:
Further improvements to the algorithm used in SuperMemo are not likely to result in further acceleration of learning. However, there is still scope for improvement in handling unusual cases such as dramatically delayed repetitions, massed presentation, items whose contents changed, semantic connections between items, etc.
Interestingly, the greatest progress in the algorithm is likely to come from a better definition of the model of human long-term memory. In particular, the function describing changes in memory stability for different levels of retrievability is becoming better understood. This could dramatically simplify the algorithm. Simpler models require fewer variables, and this simplifies the optimization.
The algorithm based on stability and retrievability of memory traces could also result in better handling of items with low retrievability.
However, as unusual item cases form a minority in the learning process, and testing a new algorithm would take several years, it is not clear if such an implementation will ever be undertaken.
Why don't you continue your experiment with neural networks?
(Bartosz, Poland, Nov 01, 2006, 14:08:40)
Question:
Why don't you continue your experiment with neural networks? I agree with MemAid that your models might be wrong, and that a neural network can find the real truth about how memory works. Neural networks are unprejudiced.
Answer:
It is not true that SuperMemo is prejudiced while a neural network is not. Nothing prevents the optimization matrices in SuperMemo from departing from the memory model and producing an unexpected result. It is true that, over the years, with more and more knowledge of how memory works, the algorithm used in SuperMemo has been armed with restrictions and customized sub-algorithms. None of these were the result of a wild guess, though. The progression of "prejudice" in SuperMemo algorithms is only a reflection of findings from previous years. The same would inevitably affect any neural network implementation if it wanted to maximize its performance.
It is also not true that the original preset values of the optimization matrices in SuperMemo are a form of prejudice. They are an equivalent of pre-training in a neural network. A neural net that has not been pre-trained will also be slower to converge to the optimum model. This is why SuperMemo is "pre-trained" with the model of an average student.
Finally, there is another area where neural networks must either use the existing knowledge of memory models (i.e. carry a dose of prejudice) or lose out on efficiency. The experimental neural network SuperMemo, MemAid, as well as FullRecall have all exhibited an inherent weakness. The network achieves stability when the intervals produce the desired effect (e.g. a specific level of the measured forgetting index). Each time the network departs from the optimum model, it is fed with a heuristic guess on the value of the optimum interval depending on the grade scored during repetitions (e.g. grade=5 would correspond with 130% of the optimum interval in SuperMemo NN, or 120% in MemAid). Algebraic SuperMemo, on the other hand, can compute an accurate value of A-Factor, use the accurate retention measurement, and produce an accurate adjustment of the value of the OF matrix. In other words, it does not guess the optimum interval; it computes its exact value for that particular repetition. The adjustments to the OF matrix are weighted and produce a stable, non-oscillating convergence. In other words, it is the memory model that makes it possible to eliminate the guess factor. In that respect, algebraic SuperMemo is less prejudiced than the neural network SuperMemo.
Neural network SuperMemo was a student project with the sole intent of verifying the viability of neural networks in spaced repetition. Needless to say, neural networks are a viable tool. Moreover, all imaginable valid optimization tools, given sufficient refinement, are bound to produce similar results to those currently accomplished by SuperMemo. In other words, as long as the learning program is able to quickly converge to the optimum model and produce the desired level of knowledge retention, the optimization tool used to accomplish the goal is of secondary importance.
How does SuperMemo compute the optimum increase in intervals? (#3078)
(Phil P., Jul 05, 2007, 15:15:20)
Question:
I am reading your article about the two components of long-term memory. I understand that your learning software SuperMemo uses the described model to optimize learning. I am curious how SuperMemo computes the constant C2 from Equation 2 (i.e. how does it know how much intervals should increase)?
Answer:
Equation 2 is only a vastly simplified reflection of functions actually used in SuperMemo. The simplification was needed to prove the existence of two independent variables required to describe the status of simple memories: stability and retrievability.
C2 tells you how much inter-repetition intervals need to increase in learning to meet your criteria on the admissible level of forgetting. In reality, C2 is not a constant; it depends on a number of factors.
Due to those multiple dependencies, the precise value of C2 is not easily predictable. SuperMemo solves this and similar optimization problems by using multidimensional matrices to represent multi-argument functions, and by adjusting the values of those matrices on the basis of measurements made during the actual learning process. While the initial values of those matrices are derived from a theoretical model or from previous measurements, the actually used values will, over time, differ slightly from those theoretically predicted or those derived from the data of previous students.
For example, if the value of C2 for an item of a given difficulty and a given memory status produces an inter-repetition interval that is longer than desired (i.e. produces a lower-than-desired level of recall), the value of C2 is reduced accordingly.
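This correction step can be sketched minimally in Python. The proportional update rule and the rate below are illustrative assumptions, not SuperMemo's actual formulas:

```python
def adjust_c2(c2, measured_recall, requested_recall, rate=0.5):
    """Nudge the interval-growth factor C2 toward the value that satisfies
    the requested recall. Hypothetical update rule for illustration only."""
    error = measured_recall - requested_recall
    # Recall below target means intervals grew too fast: shrink C2.
    # Recall above target means intervals could grow faster: enlarge C2.
    return c2 * (1 + rate * error)

c2 = 2.0
c2 = adjust_c2(c2, measured_recall=0.90, requested_recall=0.95)
print(c2)  # reduced below 2.0 because recall fell short of the criterion
```

Repeated over many measurements, updates of this measure-verify-correct kind converge the matrix entries toward values that satisfy the learning criteria.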
Historically, C2 can be found in nearly all algorithms used in SuperMemo, often under different names. This review may help you understand how SuperMemo computes C2 today and how it is likely to compute it in the future:
In the paper-and-pencil version of SuperMemo (1985), C2 was indeed (almost) a constant. Set at the average of 1.75 (varying from 1.5 to 2.0 for rounding errors and simplicity), it did not consider material difficulty, stability or retrievability of memories, etc.
In early versions of SuperMemo for DOS (1987), C2, named E-Factor, reflected item difficulty for the first time. It was decreased for bad grades and increased for good grades.
SuperMemo 4 (1989) did not use C2; instead, to compute inter-repetition intervals, it employed optimization matrices for the first time.
In SuperMemo 5 (1990), C2, named O-Factor, was finally represented as a matrix, and it included both the difficulty dimension and the stability dimension. Again, entries of the matrix would be subject to the measure-verify-correct cycle that would, starting with the initial value based on prior measurements, produce a convergence towards the value that would satisfy the learning criteria.
In SuperMemo 6 (1991), C2, in the form of the O-Factor matrix, would be derived from a three-dimensional matrix that would include the retrievability dimension. The important implication of the third dimension was that, for the first time, SuperMemo would make it possible to inspect forgetting curves for different levels of difficulty and memory stability.
In SuperMemo 8 (1997) through SuperMemo 2006, the representation of C2 would not change much; however, the algorithm used to produce a quick and stable transition from the theoretical to the real set of data would gradually get more and more complex. Most importantly, new SuperMemos make better use of the retrievability dimension of C2. Thus, independent of the spacing effect, the student can depart from the initial learning criteria, e.g. to cram before an exam, without introducing noise into the optimization procedure.
Future SuperMemos may or may not use simplified algorithms based on an algebraic formulation of C2 (denoted as SInc, for stability increase, in Equation 7.1). Although the new formula offers simplicity, it holds no promise as to the accuracy of the determination of C2, and would require substantial investment in testing and verifying the algorithms that might be based on it.
Notes for users of SuperMemo for Windows:
Stability dimension of C2 can be inspected visually with Tools : Statistics : Analysis : Approximations
Two-dimensional C2 can be viewed in 3D with Tools : Statistics : Analysis : 3D Graphs
Retrievability dimension of C2 can be seen in several figures in Building memory stability through rehearsal. The retrievability dimension used in computing C2 for the requested forgetting index can be viewed with Tools : Statistics : Analysis : Forgetting curves