Implementing the repetition spacing neural network

Bartosz Dreger, Piotr Wozniak, May 29, 1998

See Neural Network SuperMemo for a brief introduction to the repetition spacing neural network.

**Basic assumption**

The state of memory will be described with only two variables: retrievability (R) and stability (S) (Wozniak, Gorzelanczyk, Murakowski, 1995). The following equation relates R and S:

(1) R=e^{-k*t/S}

where:

- k is a constant
- t is time

For simplicity, we will set k=1 so that stability is defined unambiguously.
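
As a minimal sketch of Eqn (1), the function below computes retrievability from elapsed time and stability. The function name `retrievability` and the choice of days as the time unit are assumptions made for illustration only; they do not come from the original implementation.

```python
import math


def retrievability(t: float, stability: float, k: float = 1.0) -> float:
    """Eqn (1): R = exp(-k * t / S).

    t and stability are assumed to share one time unit (days here).
    """
    if stability <= 0:
        raise ValueError("stability must be positive")
    return math.exp(-k * t / stability)
```

With k=1, retrievability is 1 at t=0 and falls to 1/e when the elapsed time equals the stability.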

**Input and output**

The following functions are to be determined by the network:

(2) S_{i+1}=f_{s}(R, S_{i}, D, G)

(3) D_{i+1}=f_{d}(R, S, D_{i}, G)

The neural network is supposed to generate stability (S) and item difficulty (D) on the output given R, S, D and G on the input:

(4) (R_{i}, S_{i}, D_{i}, G_{i}) => (D_{i+1}, S_{i+1})

where:

- R_{i} is retrievability before the i-th repetition
- S_{i} is stability before the i-th repetition
- S_{i+1} is stability after the i-th repetition
- D_{i} is item difficulty before the i-th repetition
- D_{i+1} is item difficulty after the i-th repetition
- G_{i} is grade given in the i-th repetition

**Error correction for difficulty D**

As in Algorithm SM-8, target difficulty will be defined as the ratio between the second and the first interval. The neural network plug-in (NN.DLL) will record this value for all individual items and use it in training the network:

(5) D_{o}=I_{2}/I_{1}

where:

- D_{o} is guiding difficulty used in error correction (the higher the D_{o}, the lower the difficulty)
- I_{1} is the first optimum interval computed for the item in question (same for all items)
- I_{2} is the second optimum interval computed for the item

Important! The optimum intervals I_{1} and I_{2} are not the ones proposed by the network before its verification but the ones used in error correction after the proposed interval had already been executed and verified (see error correction for stability S)!

The initial value of difficulty will be set to 3.5, i.e. D_{1}=3.5. This is for similarity with Algorithm SM-8 only. As initial difficulty is not known, it cannot be used to determine the first interval. After the first grade is scored, error correction is still impossible because the second optimum interval is not yet known. Once it is known, D_{o} can be used for error correction of D on the output.

To avoid convergence problems in the network, the following
formula will be used to determine the correct output on D:

(6) D_{opt}=0.9*D_{i}+0.1*D_{o}

where:

- D_{opt} is difficulty used in error correction after the i-th repetition
- D_{i} is difficulty before the i-th repetition
- D_{o} is guiding difficulty from Eqn (5)

The convergence factor of 0.9 in Eqn (6) is arbitrary and may change depending on the network performance.
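
Eqns (5) and (6) can be sketched as two small functions. The names `guiding_difficulty` and `difficulty_target`, and the `factor` parameter, are hypothetical and chosen for illustration; only the formulas themselves come from the text.

```python
def guiding_difficulty(i1: float, i2: float) -> float:
    """Eqn (5): D_o = I_2 / I_1 (ratio of second to first optimum interval)."""
    if i1 <= 0:
        raise ValueError("first interval must be positive")
    return i2 / i1


def difficulty_target(d_i: float, d_o: float, factor: float = 0.9) -> float:
    """Eqn (6): D_opt = 0.9 * D_i + 0.1 * D_o.

    `factor` is the arbitrary convergence factor mentioned in the text.
    """
    return factor * d_i + (1.0 - factor) * d_o
```

For instance, with D_{i}=3.5, I_{1}=3 and I_{2}=9, the guiding difficulty is 3.0 and the training target moves only slightly, to about 3.45.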

**Error correction for stability S**

The following formula, derived from Eqn (1) for a forgetting index of 10% (i.e. R=0.9) and k=1, makes it easy to convert between stability and the optimum interval:

I=-ln(0.9)*S

In the optimum case, the network should generate the requested forgetting index for each repetition. A variable forgetting index can easily be used once the stability S is known (see Eqn (1)). For simplicity, we will use a forgetting index of 10% in further analysis.
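
The conversion between stability and the optimum interval can be sketched as follows. The function names and the `forgetting_index` parameter are assumptions for illustration; the generalization to other forgetting indices follows from setting R=1-FI in Eqn (1) with k=1.

```python
import math


def interval_from_stability(stability: float, forgetting_index: float = 0.10) -> float:
    """Optimum interval at which R drops to 1 - forgetting_index (k = 1)."""
    return -math.log(1.0 - forgetting_index) * stability


def stability_from_interval(interval: float, forgetting_index: float = 0.10) -> float:
    """Inverse conversion: S = -I / ln(1 - forgetting_index)."""
    return -interval / math.log(1.0 - forgetting_index)
```

The two functions are inverses of each other, so converting an interval to stability and back returns the original interval up to rounding.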

To accelerate the convergence, the network will measure forgetting index for 25 classes of repetitions. These classes are set by (1) five difficulty categories: 1-1.5, 1.5-2.5, 2.5-3.5, 3.5-5, and over 5, and (2) five interval categories: 1-5, 5-20, 20-100, 100-500 and over 500 days. We will denote the forgetting index measurement for difficulty category m and interval category n as FI(m,n). Additionally, the overall forgetting index FI_{tot} will be measured and used in stability error correction.
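
One way to map a repetition to its category is a sorted-boundary lookup, sketched below. The text does not say which category a boundary value (e.g. a difficulty of exactly 1.5) belongs to; this sketch assumes upper bounds are inclusive, which matches the "over 5" and "over 500" wording. The zero-based indices and all names are likewise illustrative assumptions.

```python
from bisect import bisect_left

# Upper bounds of the first four categories; the fifth is open-ended.
# Assumption: a value equal to a bound falls into the lower category.
DIFFICULTY_UPPER_BOUNDS = [1.5, 2.5, 3.5, 5.0]
INTERVAL_UPPER_BOUNDS = [5, 20, 100, 500]  # days


def category(difficulty: float, interval_days: float) -> tuple[int, int]:
    """Return zero-based (m, n): difficulty category m, interval category n."""
    m = bisect_left(DIFFICULTY_UPPER_BOUNDS, difficulty)
    n = bisect_left(INTERVAL_UPPER_BOUNDS, interval_days)
    return m, n
```

Under these assumptions, an item with difficulty 3.5 repeated after 10 days falls into the third difficulty category and the second interval category.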

The ultimate goal is to reach the forgetting index of 10% in all categories. The following formula will be used in error correction for stability:

(7) FI_{opt(m,n)}=(10*FI_{tot}+Cases(m,n)*FI(m,n))/(10+Cases(m,n))

where:

- FI_{opt(m,n)} is forgetting index used in error correction after a repetition belonging to category (m,n)
- FI_{tot} is the overall forgetting index measured in repetitions
- Cases(m,n) is the number of repetition cases used to measure the forgetting index in category (m,n)

The formula in Eqn (7) is meant to shift the weight in error correction from the overall forgetting index to the forgetting index recorded in a given category as the number of cases in that category increases. For Cases(m,n)=0, we have FI_{opt(m,n)}=FI_{tot}. For Cases(m,n)=10, the weights of the overall and category FI balance, and for a large number of cases, FI_{opt(m,n)} approaches FI(m,n).
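
The weighting in Eqn (7) can be checked numerically with a short sketch. The function name `fi_opt` and the `prior_weight` parameter (the constant 10 in the formula) are illustrative assumptions, and the sample values of 12% and 30% are made up for the demonstration.

```python
def fi_opt(fi_total: float, fi_category: float, cases: int,
           prior_weight: float = 10.0) -> float:
    """Eqn (7): weighted mean of the overall and the per-category FI."""
    return (prior_weight * fi_total + cases * fi_category) / (prior_weight + cases)


if __name__ == "__main__":
    # Hypothetical measurements: overall FI 12%, category FI 30%.
    for cases in (0, 10, 100, 1000):
        print(cases, round(fi_opt(0.12, 0.30, cases), 4))
```

With no cases the result equals the overall forgetting index, with 10 cases it is the midpoint of the two values, and with many cases it tends to the category value.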

The following table illustrates the assumed relationship between FI_{opt(m,n)}, grades and the interval correction applied:

| Grade | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| FI_{opt(m,n)} > 10% | 40% | 60% | 80% | no correction | no correction | no correction |
| FI_{opt(m,n)} = 10% | no correction | no correction | no correction | no correction | no correction | no correction |
| FI_{opt(m,n)} < 10% | no correction | no correction | no correction | 110% | 120% | 130% |

In SuperMemo, grades less than 3 are interpreted as forgetting, while grades of 3 or more are understood as sufficient recall. That is why no correction is used for passing grades when the forgetting index is not lower than requested, and no correction is used for failing grades when it is not higher than requested.

For example, for an excessive forgetting rate, grade=2 and an applied interval of 10 days, the correction would be 80%. Consequently, the network will be instructed to assume Interval=8 as correct. Correct stability would then be derived from S=-8/ln(0.9) and used in error correction.
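
The worked example above can be reproduced with a small lookup. This sketch assumes that the three rows of the correction table correspond to a forgetting index above, equal to, and below the requested 10%, which is how the example reads; the names `SHORTEN`, `LENGTHEN`, `correction_factor` and `corrected_stability` are hypothetical.

```python
import math

REQUESTED_FI = 0.10

# Assumed reading of the table: failing grades shorten the interval when
# forgetting is excessive; passing grades lengthen it when forgetting is low.
SHORTEN = {0: 0.4, 1: 0.6, 2: 0.8}
LENGTHEN = {3: 1.1, 4: 1.2, 5: 1.3}


def correction_factor(fi_opt: float, grade: int) -> float:
    """Interval multiplier for a grade (0-5); 1.0 means no correction."""
    if fi_opt > REQUESTED_FI:
        return SHORTEN.get(grade, 1.0)
    if fi_opt < REQUESTED_FI:
        return LENGTHEN.get(grade, 1.0)
    return 1.0


def corrected_stability(interval_days: float, fi_opt: float, grade: int) -> float:
    """Stability target derived from the corrected interval (k = 1)."""
    corrected_interval = interval_days * correction_factor(fi_opt, grade)
    return -corrected_interval / math.log(1.0 - REQUESTED_FI)
```

Calling `corrected_stability(10, 0.15, 2)` applies the 80% correction and returns the stability that corresponds to an 8-day interval, i.e. -8/ln(0.9).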

The values of interval corrections are arbitrary but shall not
undermine the convergence of the network. In case of unlikely
stability problems, the corrections might be reduced (note that
the environmental noise in the learning process will dramatically
exceed the impact of ineffectively choosing the correction
factors!). Similar corrections used to be applied in successive
SuperMemo algorithms with encouraging results.

**Border conditions**

The following additional constraints will be imposed on the neural network to accelerate the convergence:

- the interval increase (ratio) between two successive repetitions must be at least 1.1 (consequently, difficulty cannot be less than 1.1)
- interval increase cannot surpass 8 after the first repetition, and 4 in later repetitions
- the first interval must fall between 1 and 40 days
- difficulty measure cannot exceed 8

These conditions will not prejudice the network as they have been proven beyond reasonable doubt as true in the practice of using SuperMemo and its implementations over the last ten years.
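
The border conditions listed above can be expressed as simple clamping functions. This is a sketch under stated assumptions: "interval increase" is read as the ratio of the new interval to the previous one, `repetition` is taken to be the number of the repetition just completed (1 for the first), and all function names are hypothetical.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed range [lo, hi]."""
    return max(lo, min(hi, x))


def constrain_difficulty(d: float) -> float:
    """Difficulty must stay between 1.1 and 8."""
    return clamp(d, 1.1, 8.0)


def constrain_first_interval(days: float) -> float:
    """The first interval must fall between 1 and 40 days."""
    return clamp(days, 1.0, 40.0)


def constrain_next_interval(previous: float, proposed: float, repetition: int) -> float:
    """Keep the interval ratio within [1.1, 8] after the first repetition
    and within [1.1, 4] after later ones."""
    max_increase = 8.0 if repetition == 1 else 4.0
    return clamp(proposed, 1.1 * previous, max_increase * previous)
```

For example, a proposed jump from 3 to 100 days after the first repetition would be limited to 24 days under this reading.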

In the pretraining stage, the following form of Eqns (2) and (3) will be used:

(8) D_{i+1}:=D_{i}+(0.1-(5-G)*(0.08+(5-G)*0.02))

(9) S_{i+1}:=S_{i}*D_{i}*(0.5+1/i)

With D_{1}=3.5 and S_{1}=-3/ln(0.9).

Eqn (8) has been derived from Algorithm SM-2 (see E-Factor equation).

Eqn (9) has been roughly derived from Matrix OF in Algorithm SM-8.

D_{1}=3.5 corresponds with the same setting in Algorithm
SM-8.

S_{1}=-3/ln(0.9) corresponds with the first interval of 3
days and forgetting index 10%. The value of 3 days is close to an
average across a wide spectrum of students and difficulty of the
learning material.

Pretraining will also use border conditions mentioned in the
previous paragraph.
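
The pretraining rules in Eqns (8) and (9) can be sketched as follows, using the initial values given above. The function and constant names are illustrative assumptions, and the border conditions are deliberately left out of this sketch to keep it short.

```python
import math

D1 = 3.5                    # initial difficulty, as in Algorithm SM-8
S1 = -3.0 / math.log(0.9)   # stability for a 3-day first interval at FI = 10%


def pretrain_difficulty(d_i: float, grade: int) -> float:
    """Eqn (8): difficulty update derived from the SM-2 E-Factor formula."""
    q = 5 - grade
    return d_i + (0.1 - q * (0.08 + q * 0.02))


def pretrain_stability(s_i: float, d_i: float, i: int) -> float:
    """Eqn (9): stability after the i-th repetition (i starts at 1)."""
    return s_i * d_i * (0.5 + 1.0 / i)
```

A grade of 5 raises difficulty by 0.1, a grade of 4 leaves it unchanged, and lower grades reduce it; after the first repetition, stability grows by a factor of 1.5*D_{1}.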