In this text we would like to show the need for developing
knowledge access systems that would account for the imperfections of human perception,
information processing and memory. The implementation of such systems will result in
enormous savings in the process of learning at all three stages of knowledge acquisition
by the human mind:
knowledge access (i.e. searching for, selecting and
prioritizing information for later review)
learning (i.e. formulating knowledge and transferring it to
knowledge retention (i.e. retaining knowledge in memory by
means of spaced repetition)
In particular, we will try to stress the importance of
repetition spacing algorithms, as well as the value of applying the newly
introduced concept of processing, semantic and ordinal attributes to hypertext documents.
Fusion of the hypertext paradigm with techniques targeted against human forgetfulness
Historically, the development of repetition spacing
algorithms proceeded from common-sense paper-and-pencil applications to increasingly
sophisticated computer algorithms that have finally been implemented in commercial
products, which have gained substantial popularity among students of languages, medicine
and many other subjects.
This development was almost entirely oriented toward
retaining acquired knowledge in the student's memory. A similar development may now
begin with respect to the retrieval and acquisition of knowledge.
Effective learning depends not only on the ability to
retain the learned material in one's memory. Before that, the to-be-learned knowledge
must be identified, pre-processed with a view to understanding, classified, and selected
with respect to its relevance and importance. This process can be greatly enhanced by
simple techniques that are excellent candidates for computer implementation.
Such an implementation becomes more urgent as printed
materials lose ground to the World Wide Web and to the vast market of CD-ROM titles
covering practically every subject domain. The simple pencil, often instrumental in
one's work with printed matter, becomes less and less usable as more multimedia titles
appear on CD-ROM and as the hypermedia available via global computer networks grows
rapidly. Some visionaries even predict the death of printed matter as we know it. The
gap between the effectiveness of browsing printed and hypertext documents seems to grow
by the minute, yet very little attention is still paid to the reader's ability to leave
traces of their work in the document. Most hypertext systems distributed on CD-ROM
provide the user only with annotation and bookmark tools, which leave much room for
improvement.
Let us briefly present example tools and techniques that
can be used in working with printed textbooks, and the inspiration they might provide for
the design of future hypermedia documents.
The first problem with books is that there are usually
too many of them to read. A good selection of the most applicable material is the first
step toward effective acquisition of knowledge. We will, however, leave this subject out
of consideration, because we would like to focus entirely on authoring systems for the
development of hypertext documents, as well as on tools that would enhance such documents
and make them more attractive from the student's standpoint. New technologies, most
notably CD-ROM, will make the author's choice easier in the sense that the vast capacity
of the medium imposes far less stringent constraints on what must be left out of the final
document. On the World Wide Web, the question becomes irrelevant: with appropriate
navigation and search tools, hyperspace may remain virtually unlimited.
After the learning material has been selected, the next
important tool is a bookmark. Apart from reference materials such as encyclopedias,
dictionaries and computer documentation, most printed material permits, and often
requires, a substantial dose of linear progress through the contents. As the time slices
allocated for reading often break one's work in the middle of a linear section,
bookmarks are indispensable. With the advent of hypertext applications, the average
length of linearly processed text is likely to drop dramatically. However, bookmarks do
not only serve as pointers to interrupted reading; they also provide a customizable table
of contents that can be used to quickly access the sections of greatest interest.
Bookmarks were an early and ubiquitous feature of hypertext documents; therefore, we will
not dwell on them in the reasoning that follows.
After picking a book and selecting the relevant bookmark,
the process of reading or browsing begins. First of all, the same bookmark that was used
to access a particular chapter or section may serve as a pointer that helps keep the eyes
focused on the paragraph being processed. This is particularly useful in richly
illustrated texts, or when external interruptions frequently force the reader to look
away from the reading area. In a hypertext document, the counterpart of a paper bookmark
used in reading a textbook should be a cursor that highlights the single semantic unit
currently being processed. The importance of such a cursor may go far beyond the sight
guidance of a traditional bookmark. We will later call such a cursor a semantic focus.
It is not difficult to notice that modern textbooks go further and further in making
individual semantic units of the text less context-dependent. In other words, a randomly
selected sentence from a modern textbook is more likely to be understood on its own than
one taken from a textbook written in the style of a few decades ago. The general trend is
to shift from prose to more precise forms of expression, most visibly through the
proliferation of illustrations, formulas, insert boxes, enumerations, underlined text,
etc. This trend stems from the increasing tendency to convert linear textbooks into
pick-and-read reference materials. This makes the job of a hypertext document author much
easier. It will also let semantic units live a life of their own, to the benefit of
knowledge retrieval and acquisition.
The most important part of a good textbook processing
technique is to leave traces of one's work in the text. After all, it is better to let the
book itself keep track of the reader's progress than to place the entire burden on the
reader's memory. First of all, it is useful to prepare a page chart for every carefully
studied book. The page chart keeps a record of each processed page and its current
processing status. The processing status may assume at least the following three values:
intact - the page has not yet been processed
processed - the page has been read at least once and all its semantic units have been
reviewed and marked with processing attributes, which, very much like page charts,
indicate the processing status (e.g. irrelevant, important, memorized, etc.)
done - the page has been fully processed and needs no further reference. For example,
all its semantic units have been marked as irrelevant, or all its relevant semantic units
have been memorized
In some cases, it may also be worthwhile to distinguish
several degrees of the attribute processed (or read). After all, the page might have been
read once, twice or several times, with its semantic units changing their processing
attributes during each pass.
The rationale behind page charts is to retain constant control over the speed and
direction of processing a particular textbook. The greatest advantages are: (1) no need
to refer to fully processed pages marked as done, and (2) the ability to give priority to
new (intact) material over material that has already been, at least partly, processed.
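A page chart is simple enough to be kept by software. The following minimal sketch assumes a book with pages numbered from 1 and records, for each page, one of the three statuses above plus a read count standing in for the degrees of processed. The class and method names, and the rule "intact pages in page order first, then the least-read processed page", are illustrative assumptions, not a prescribed design.

```python
from enum import Enum


class PageStatus(Enum):
    INTACT = "intact"        # not yet processed
    PROCESSED = "processed"  # read at least once, units marked
    DONE = "done"            # needs no further reference


class PageChart:
    """Hypothetical page chart: a status and a read count for every page."""

    def __init__(self, page_count):
        self.status = {p: PageStatus.INTACT for p in range(1, page_count + 1)}
        self.passes = {p: 0 for p in self.status}  # degrees of "processed"

    def mark_read(self, page):
        self.passes[page] += 1
        if self.status[page] is PageStatus.INTACT:
            self.status[page] = PageStatus.PROCESSED

    def mark_done(self, page):
        self.status[page] = PageStatus.DONE

    def next_page(self):
        """Intact pages first (in page order), then the least-read processed page.

        Pages marked as done are never returned; None means the book is finished.
        """
        intact = [p for p, s in self.status.items() if s is PageStatus.INTACT]
        if intact:
            return min(intact)
        processed = [p for p, s in self.status.items() if s is PageStatus.PROCESSED]
        if processed:
            return min(processed, key=lambda p: (self.passes[p], p))
        return None
```

Giving new material priority and skipping done pages, the two advantages named above, both reduce to the ordering inside `next_page`.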
As mentioned earlier, all semantic units are marked with
processing attributes as reading progresses. These are:
irrelevant - the semantic unit is not worth further reference
relevant - the semantic unit is worth future reference (it may later change its
attribute to irrelevant, to-be-memorized or memorized)
to-be-memorized - the semantic unit seems worth remembering and should be put into a
database of learned material associated with the currently processed book. The actual
transfer of the unit to the database will take place once processing of the book moves
to more advanced stages; occasionally, this may happen much earlier or never.
memorized - the semantic unit has been transferred to a database of learned material,
memorized and subjected to a repetition spacing algorithm. In other words, it needs no
further reference.
The obvious rationale behind marking semantic units with
processing attributes is never to refer to irrelevant or memorized units, to focus
reading attention on relevant units, and to use to-be-memorized units only when
selecting new material for memorization.
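The attributes and their intended use can be sketched as a small state machine. The transition table below is an assumption: the text only states that a relevant unit may later become irrelevant, to-be-memorized or memorized, and that units start as intact. The two helper functions encode the rationale just given: reading skips irrelevant and memorized units, and to-be-memorized units surface only when selecting material for memorization.

```python
from enum import Enum


class Attr(Enum):
    INTACT = "intact"
    IRRELEVANT = "irrelevant"
    RELEVANT = "relevant"
    TO_BE_MEMORIZED = "to-be-memorized"
    MEMORIZED = "memorized"


# Assumed transition table (a sketch; a real system might allow reconsidering
# units marked as irrelevant).
ALLOWED = {
    Attr.INTACT: {Attr.IRRELEVANT, Attr.RELEVANT, Attr.TO_BE_MEMORIZED, Attr.MEMORIZED},
    Attr.RELEVANT: {Attr.IRRELEVANT, Attr.TO_BE_MEMORIZED, Attr.MEMORIZED},
    Attr.TO_BE_MEMORIZED: {Attr.MEMORIZED},
    Attr.IRRELEVANT: set(),
    Attr.MEMORIZED: set(),
}


def change(current, new):
    """Return the new attribute, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot change {current.value} to {new.value}")
    return new


def units_to_read(units):
    """Reading pass: only intact and relevant units, in document order."""
    return [u for u, a in units.items() if a in (Attr.INTACT, Attr.RELEVANT)]


def memorization_candidates(units):
    """Units considered only when selecting new material for memorization."""
    return [u for u, a in units.items() if a is Attr.TO_BE_MEMORIZED]
```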
In the majority of presently available hypertext systems,
it is difficult to develop an equivalent of page charts. Such documents still leave the
impression of straying in a sea of information with little chance of managing access in
a rational way. The main problems are: (1) how to make sure that one does not wade again
through already processed material (while reading, it is easy to enjoy the pleasant
impression of knowing everything beforehand, only to discover that the familiarity of some
formulations is merely a déjà vu effect of having read them before), and (2) how to make
sure that no important section is missed (perhaps the growing drive toward hypertext
documents too large to be covered in any way will eliminate this concern altogether).
Sooner or later, developers of hypertext tools will discover that there is much more to
reading printed books than what has so far been encapsulated in standard hypertext
technologies.
Let us consider a collection of proposed enhancements to
generic hypertext systems that would provide solutions to the problems mentioned earlier.
The first of these problems concerned the selection of
material. In that respect, generic systems offer: (1) the possibility of choosing a
title, (2) collapsible tables of contents, (3) search tools, and (4) bookmarks. All of
that still leaves the reader with the entire document to work with.
The first and easiest step toward customized content is an editable table of contents.
We will discuss possible add-ons to tables of contents in Point 4, as we address the
problem of page charts.
A much more complicated, but probably more desirable, approach to tailoring documents to
particular needs is the use of document filters. The Boolean and fuzzy search procedures
commonly included in hypertext documents can usually yield the list of topics collected
in the search. Such a list is usually sorted by one of two criteria: (1) semantic order,
or (2) number of search hits. Indeed, such a newly generated list of topics can be viewed
as a customized table of contents. However, such a table is not persistent; in other
words, it is usually destroyed when the search procedure is repeated. Moreover, even if
the newly generated table of contents is all the reader is interested in, there is, as a
rule, no way of hiding the remaining contents of the document from other browsing
procedures.
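The search-derived list of topics can be illustrated with a minimal sketch. It assumes plain case-insensitive phrase matching (not a real Boolean or fuzzy engine) and reads "semantic order" as the topic's position in the document's own structure; both are assumptions. `Topic` and `search_toc` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    order: int   # position in the document's own (semantic) order
    title: str
    text: str


def search_toc(topics, phrase, by="hits"):
    """Return (title, hits) for topics containing `phrase`.

    by="hits" sorts by descending hit count (ties kept in document order);
    by="order" keeps the document's own order.
    """
    phrase = phrase.lower()
    found = []
    for t in topics:
        hits = t.text.lower().count(phrase)
        if hits:
            found.append((t, hits))
    if by == "hits":
        found.sort(key=lambda th: (-th[1], th[0].order))
    elif by == "order":
        found.sort(key=lambda th: th[0].order)
    else:
        raise ValueError("by must be 'hits' or 'order'")
    return [(t.title, hits) for t, hits in found]
```

The returned list is exactly the kind of ephemeral table of contents described above: nothing stores it, so the next call replaces it.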
A document filter might have searching abilities similar to those of the standard search
procedures mentioned above; however, its output might take the form of a new document
with its own table of contents. Additionally, a keyword system or, better yet, semantic
attributes associated with particular topics or even individual semantic units might be
used in the search. In other words, instead of looking for words or phrases, the search
would look for keywords or even for semantic content expressed through semantic attributes.
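A document filter of this kind can be sketched as a function that returns a new document object rather than a transient hit list. The sketch assumes that each semantic unit carries a set of semantic attributes (plain strings here) and that a unit matches if it shares at least one attribute with the query; the data model and the names `Unit`, `Topic`, `Document` and `document_filter` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    text: str
    semantic: set = field(default_factory=set)  # hypothetical semantic attributes


@dataclass
class Topic:
    title: str
    units: list


@dataclass
class Document:
    title: str
    topics: list

    def toc(self):
        return [t.title for t in self.topics]


def document_filter(doc, wanted):
    """Build a new document keeping only units tagged with any attribute in `wanted`.

    Topics left without units disappear, so the new table of contents hides
    everything the reader filtered out; the source document is not modified.
    """
    kept = []
    for topic in doc.topics:
        units = [u for u in topic.units if u.semantic & wanted]
        if units:
            kept.append(Topic(topic.title, units))
    return Document(f"{doc.title} [filter: {', '.join(sorted(wanted))}]", kept)
```

Because the result is an ordinary document, it can be stored and browsed like the original, which addresses the persistence and hiding problems raised earlier.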
The ultimate solution with respect to document filters is to let them collect all relevant
semantic units and literally generate a new document from the collected pieces. Before
such a solution can be implemented, a great deal of progress in natural language
processing will be required. In contrast, as Points 4 and 5 will demonstrate, some handy
solutions involving processing attributes might be just a few steps away.
As mentioned earlier, bookmarks are already a standard
fixture in virtually all documents with hypertext capability. Bookmarks may serve to
construct a customized table of contents once the topics most relevant to one's work with
the document have been located. In the context of document filters, one might only
propose that one possible outcome of a search should be an editable bookmark table that
would make it possible to use the results of the search long after it actually took place.
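An editable bookmark table that outlives the search can be sketched as follows. The JSON format, the `(topic_id, title)` shape of search hits and the class name `BookmarkTable` are assumptions made for illustration; the point is only that search results become a list the reader can reorder, rename, prune and save.

```python
import json
from pathlib import Path


class BookmarkTable:
    """Hypothetical editable bookmark table built from search results."""

    def __init__(self, entries=()):
        self.entries = [dict(e) for e in entries]  # each: {"topic_id", "title"}

    @classmethod
    def from_search(cls, hits):
        # hits: iterable of (topic_id, title) pairs from some search procedure
        return cls({"topic_id": tid, "title": title} for tid, title in hits)

    def remove(self, topic_id):
        self.entries = [e for e in self.entries if e["topic_id"] != topic_id]

    def move(self, topic_id, new_index):
        entry = next(e for e in self.entries if e["topic_id"] == topic_id)
        self.entries.remove(entry)
        self.entries.insert(new_index, entry)

    def rename(self, topic_id, title):
        for e in self.entries:
            if e["topic_id"] == topic_id:
                e["title"] = title

    def to_json(self):
        return json.dumps(self.entries)

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))

    def save(self, path):
        Path(path).write_text(self.to_json(), encoding="utf-8")

    @classmethod
    def load(cls, path):
        return cls.from_json(Path(path).read_text(encoding="utf-8"))
```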
The important role of the semantic focus will become
apparent only later, when we consider the link between a hypertext document and the
database of learned material generated during the browsing process. At this point we only
note that its function can be compared to that of a selection bar in menus or a caret in
edit controls and word processors. The position of the semantic focus indicates the
semantic unit currently being processed. Very much as with cursors or selection bars, the
actions undertaken by the reader affect only the selected unit. These actions might be:
(1) changing the processing attributes of the unit, (2) changing the semantic attributes
of the unit (e.g. to determine future search outcomes), (3) transferring the items
associated with the unit to a database of learned material, and (4) performing an editing
action on the unit (delete, print, transfer to another document, etc.).
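The semantic focus can be sketched as a cursor object over a topic's semantic units, with one method per action (1) to (4); deletion stands in for the whole group of editing actions. The names `SemanticUnit` and `SemanticFocus`, and the use of a plain list as the "database", are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticUnit:
    text: str
    processing: str = "intact"
    semantic: set = field(default_factory=set)


class SemanticFocus:
    """Cursor over a topic's semantic units; every action affects only the focused unit."""

    def __init__(self, units):
        if not units:
            raise ValueError("a topic needs at least one semantic unit")
        self.units = units
        self.index = 0

    @property
    def current(self):
        return self.units[self.index]

    def next(self):
        self.index = min(self.index + 1, len(self.units) - 1)
        return self.current

    def previous(self):
        self.index = max(self.index - 1, 0)
        return self.current

    def set_processing(self, attribute):  # action (1)
        self.current.processing = attribute

    def tag(self, *attributes):  # action (2): semantic attributes for later searches
        self.current.semantic.update(attributes)

    def transfer(self, database):  # action (3): `database` is any list-like store
        database.append(self.current.text)

    def delete(self):  # action (4), one example of an editing action
        removed = self.units.pop(self.index)
        # keep the focus on a valid position (the following unit, or the last one)
        self.index = max(0, min(self.index, len(self.units) - 1))
        return removed
```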
Page charts are the most painfully missed tool when moving
from printed matter to hyperspace. The division of books into pages may seem quite
artificial, but the benefits of charting are definitely worth this small inconvenience.
In hypertext documents, the concept of a page has ceased to exist, having been replaced
by the concept of a topic. From the human standpoint, the best access to the entire
semantic structure of topics comes via the table of contents, which is therefore the most
obvious implementation target for a counterpart of page charts. A flexible table of
contents that would make paper a commodity of the past should meet the following conditions:
collapsibility (this feature, allowing chapters to be expanded to sections or collapsed
to the title level, is increasingly apparent in modern hypertext systems)
editability that would let the user choose the sequence of topics, as well as the topics
that should disappear from view, not only at the contents level but also from the
document itself
awareness of the reader's progress through the application of processing attributes
The last point seems the least obvious and deserves the most attention.
As with page charts, the reader should be able to mark
topics with processing attributes (initially set to intact). Marking a topic as
irrelevant or done would be equivalent to erasing it from the table of contents or
leaving it in an easily distinguishable form, e.g. grayed. Marking a topic as processed
might be complemented by an indicator of the degree of processing, which might also be
reflected in the appearance of the topic's title in the table (e.g. through coloring).
Naturally, tagging topics with processing attributes should be available both at the
contents level and at the topic level.
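How processing attributes might change the look of a table of contents can be sketched in a few lines. The sketch assumes plain text output: a `[grayed]` prefix stands in for gray rendering and "(read Nx)" stands in for coloring by degree of processing; `render_toc` and its `(title, attribute, passes)` input are hypothetical.

```python
def render_toc(topics, hide_finished=False):
    """Render a flexible table of contents as text lines.

    topics: list of (title, attribute, passes) in the user's chosen sequence.
    Irrelevant and done topics are either hidden or shown grayed; processed
    topics display their degree of processing (number of reading passes).
    """
    lines = []
    for title, attribute, passes in topics:
        if attribute in ("irrelevant", "done"):
            if hide_finished:
                continue
            lines.append(f"[grayed] {title}")
        elif attribute == "processed":
            lines.append(f"{title} (read {passes}x)")
        else:  # intact
            lines.append(title)
    return lines
```

Editability then amounts to letting the user reorder or delete entries of the `topics` list before rendering.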
Finally, individual semantic units should also be markable
with processing attributes. Initially, all semantic units would be marked as intact. Upon
the first reading, irrelevant items should be marked as irrelevant and, depending on the
user's choice, disappear from the text or appear grayed in their original place.
Semantic units of utmost importance might be immediately transferred to a database of
to-be-memorized items. At the very least, this process would allow the user to paste the
content of the semantic unit, re-edit it and place it in a selected database. However, a
much more desirable solution is to associate all semantic units in a hypertext document
with ready-made collections of items that might be transferred to or linked with the
student's database with a single keystroke (e.g. after optional selection and
pre-processing). Items marked as memorized could also, depending on the set-up, become
invisible or be distinguished by different coloring. The remaining items could be marked
with a degree of relevance (or number of reading passes), the highest degree being
equivalent to the attribute to-be-memorized. The degree of relevance might contribute to
the assignment of ordinal attributes that might later be used in prioritizing
once-accessed items for secondary access. Similarly, to-be-memorized items might also be
tagged with ordinal attributes that, in this case, would determine the order of
memorization. With processing attributes in place, the user would be able to quickly skip
the parts once identified as irrelevant, as well as pay less attention to sections that
have already been entirely mastered by means of a module using repetition spacing
algorithms. Typically, at the early stages of processing the document, intact topics and
units have the highest processing priority. As the work progresses, once-referred-to
units may increasingly come into the focus of attention (e.g. in the order determined by
their ordinal attributes). This will, in all likelihood, move their processing status to
increasing degrees of relevance, up to the point where a decision is made to memorize a
particular semantic unit. Ideally, a collection of simple techniques should be developed
to make sure that the flexible table of contents makes it possible to quantitatively
assess the progress of processing the semantic units in a given topic. For example, the
topic title in the table could be associated with a bar chart showing the proportion of
semantic units in the intact, irrelevant, relevant and memorized categories.
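Two ideas from the paragraph above lend themselves to a short sketch: a reading queue in which intact units come first and once-referred-to relevant units follow in the order of their ordinal attributes, and a text bar chart of a topic's progress. The assumptions are: a lower ordinal means higher priority, to-be-memorized units count toward the relevant category (as the highest degree of relevance), and the symbols `.`, `x`, `r`, `M` are arbitrary stand-ins for colors. Function names are hypothetical.

```python
from collections import Counter

CATEGORIES = ("intact", "irrelevant", "relevant", "memorized")
SYMBOLS = {"intact": ".", "irrelevant": "x", "relevant": "r", "memorized": "M"}


def reading_queue(units):
    """units: list of (unit_id, attribute, ordinal) in document order.

    Intact units first (document order), then relevant units sorted by ordinal
    attribute (lower = more urgent, an assumption). Irrelevant, memorized and
    to-be-memorized units are left out of the reading queue.
    """
    intact = [uid for uid, attr, _ in units if attr == "intact"]
    relevant = sorted((u for u in units if u[1] == "relevant"), key=lambda u: u[2])
    return intact + [uid for uid, _, _ in relevant]


def progress_bar(attributes, width=20):
    """Text bar with the share of a topic's units in each category.

    Segment lengths are rounded, so the bar may differ from `width` by a character.
    """
    counts = Counter("relevant" if a == "to-be-memorized" else a for a in attributes)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        return ""
    return "".join(SYMBOLS[c] * round(width * counts[c] / total) for c in CATEGORIES)
```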
Our experience shows that there is great potential for
increasing the effectiveness of hypertext documents if the proposed tools are provided
both in the software shell and in the document in question.
Our hope is that in the future the student will no longer
need a dedicated program like SuperMemo to benefit from repetition spacing algorithms.
Ideally, the student will obtain access to a hypermedia knowledge base (e.g. on the World
Wide Web) with seamlessly integrated algorithms for optimum spacing of repetitions (e.g.
as a Web browser plug-in). In other words, the focus should shift from software and its
options to knowledge itself.
Naturally, developing a hypermedia interface for a knowledge base associated with a
learning database will place a much greater burden on the authors of a particular
learning system. However, the increased effectiveness of accessing and learning
knowledge will certainly compensate for the higher development costs.
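The text does not commit to a particular repetition spacing algorithm. As one concrete, publicly documented example of what such a plug-in could compute per item, the sketch below follows the SM-2 algorithm used in early versions of SuperMemo: grades 0 to 5, an E-Factor starting at 2.5 and never dropping below 1.3, first intervals of 1 and 6 days, later intervals multiplied by the E-Factor and rounded up, and a restart without changing the E-Factor on grades below 3. Later SuperMemo versions use different algorithms, and this sketch omits SM-2's same-day re-drill of items graded below 4. `ItemState` and `sm2_review` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ItemState:
    repetition: int = 0   # successful repetitions in a row
    interval: int = 0     # days until the next review
    efactor: float = 2.5  # easiness factor (E-Factor)


def sm2_review(state, quality):
    """Return the item's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Failed recall: restart the interval sequence, keep the E-Factor.
        return ItemState(repetition=0, interval=1, efactor=state.efactor)
    if state.repetition == 0:
        interval = 1
    elif state.repetition == 1:
        interval = 6
    else:
        interval = math.ceil(state.interval * state.efactor)
    ef = state.efactor + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return ItemState(repetition=state.repetition + 1, interval=interval,
                     efactor=max(1.3, ef))
```

With three perfect answers in a row, the intervals grow as 1, 6 and 17 days.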
In the optimum case, all semantic units relevant to learning should be associated with
predefined, well-structured items (often in the standard question-answer form). A single
semantic unit might generate from one to several individual to-be-memorized items.
Consequently, developing a seamless hypermedia knowledge base integrated with repetition
spacing algorithms would triple or quadruple the author's effort and costs.
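The association between a semantic unit and its ready-made items can be sketched as a small data model with a one-call transfer into the learner's database. The assumptions: items are question-answer pairs supplied by the author, the learner's database is a dictionary keyed by (unit id, item) that holds fresh scheduling state, and repeated transfers must not create duplicates. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    question: str
    answer: str


@dataclass
class SemanticUnit:
    unit_id: str
    text: str
    items: list = field(default_factory=list)  # author-supplied question-answer items


def transfer(unit, database):
    """Add the unit's predefined items to the learner's database in one step.

    Returns the number of newly added items; items already present are skipped.
    """
    added = 0
    for item in unit.items:
        key = (unit.unit_id, item)
        if key not in database:
            database[key] = {"repetition": 0, "interval": 0}  # fresh scheduling state
            added += 1
    return added
```

A single keystroke in the browser would then map to one `transfer` call on the unit under the semantic focus.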
A large subset of the proposed technologies has already
been implemented as incremental reading.
A practical guidebook to the effective reading of
electronic sources can be found in: Devouring