In this text we would like to show the need for developing knowledge access systems that account for the imperfections of human perception, information processing and memory. Implementing such systems would result in enormous savings in the learning process at all three stages of knowledge acquisition by the human mind:
- knowledge access (i.e. searching for, selecting and prioritizing information for later review)
- learning (i.e. formulating knowledge and transferring it to human memory)
- knowledge retention (i.e. retaining knowledge in memory by means of spaced repetition)
In particular, we will stress the importance of repetition spacing algorithms, as well as the importance of applying the newly introduced concept of processing, semantic and ordinal attributes in hypertext documents.
Fusion of the hypertext paradigm with techniques targeted against human forgetfulness
Implementation shortcomings evident in generic hypertext interfaces
Historically, the development of repetition spacing algorithms proceeded from common-sense paper-and-pencil applications to increasingly sophisticated computer algorithms that have finally been implemented in commercial products, which have gained substantial popularity among students of languages, medicine, and many other subjects.
This development process was almost entirely oriented toward the retention of the acquired knowledge in the student’s memory. Currently, a similar development process may be initiated with respect to the retrieval and acquisition of knowledge.
Effective learning is not based solely on the ability to retain the learned material in one’s memory. Before that, the to-be-learned knowledge must be identified, pre-processed with a view to understanding, and classified and selected with respect to its relevance and importance. This process can be greatly enhanced by simple techniques that are excellent candidates for computer implementation.
This implementation becomes more and more urgent as the role of printed materials diminishes in the wake of the growing role of the World Wide Web and the vast market for CD-ROM titles across all possible subject domains. The straightforward use of a pencil, which is often instrumental in one’s work with printed matter, becomes increasingly impossible as more and more multimedia titles appear on CD-ROM and as hypermedia available via global computer networks grows rapidly. Some visionaries even predict the death of printed matter as we know it. The gap between the effectiveness of browsing printed and hypertext documents seems to grow by the minute, yet very little attention is paid to the reader’s or user’s ability to leave a trace of his work in the document. Most hypertext systems distributed on CD-ROM provide the user only with annotation and bookmark tools, which leave much room for improvement.
Let us briefly present exemplary tools and techniques that can be used in working with printed textbooks, and the inspiration they might provide for the design of future hypermedia documents.
- The first problem with books is that there are usually too many of them to read. A good selection of the most applicable material is the first step toward effective acquisition of knowledge. We will, however, leave this subject out of consideration, because we would like to focus entirely on authoring systems for developing hypertext documents, as well as on the tools that would enhance such documents and make them more attractive from the student’s standpoint. The new technologies, most notably CD-ROM, will make the author’s choice easier in the sense that the vast capacity of the media will place less stringent constraints on what to leave out of the final shape of the document. When we extend this to the World Wide Web, the question becomes irrelevant: with appropriate navigation and search tools, the hyperspace might remain virtually unlimited.
- After selecting the learning material, the important tool to use is a bookmark. Apart from reference materials like encyclopedias, dictionaries, computer documentation, etc., most printed material allows, and often requires, a substantial dose of linear progress through the contents. As the time slices allocated for reading often break one’s work in the middle of a linear section, bookmarks are of indispensable value. With the advent of hypertext applications, the average length of a linearly processed text is likely to drop dramatically. However, bookmarks not only serve as pointers to interrupted reading, but also provide a customizable table of contents that can be used for quickly accessing the sections of greatest interest. Bookmarks have been an early and ubiquitous child of hypertext documents; therefore, we will not consider them further in the reasoning that follows.
- After picking a book and selecting the relevant bookmark, the process of reading or browsing begins. First of all, the same bookmark that was used in accessing a particular chapter or section may serve as a pointer that helps keep the sight focused on the processed paragraph. This is particularly useful in richly illustrated texts, or when external interruptions require frequent shifts of sight beyond the reading area. In a hypertext document, the counterpart of a paper bookmark used in reading a textbook should be a cursor that highlights the single semantic unit currently being processed. The importance of such a cursor may go far beyond the sight guidance of a traditional bookmark. Such a cursor will later be called a semantic focus. It is not difficult to notice that modern textbooks go further and further in making particular semantic units of the text less context-dependent. In other words, if we pick a random sentence from a modern textbook, we are more likely to understand it than would be possible with textbooks written in the style of a few decades ago. The general trend is to shift from prose to more precise forms of expression. This is most visible in the proliferation of illustrations, formulas, insert boxes, enumerations, underlined text, etc. The trend comes from the increasing tendency to convert linear textbooks into pick’n’read reference materials. This makes the job of a hypertext document author much easier. It will also let semantic units live a life of their own, to the benefit of knowledge retrieval and acquisition.
- The most important part of a good textbook processing technique is to leave traces of one’s work in the text. After all, let the book itself learn about the reader’s progress, rather than keeping the entire burden in that respect on the reader’s memory. First of all, it is useful to prepare a page chart for every carefully studied book. The page chart keeps a record of each processed page and its current processing status. The processing status may assume at least the following three values:
- intact - the page has not yet been processed
- processed - the page has been read at least once and all its semantic units have been reviewed and marked with processing attributes, which, very much like in page charts, indicate the processing status (e.g. irrelevant, important, memorized, etc.)
- done - the page has been fully processed, and needs no further reference. For example, all its semantic units have been marked as irrelevant, or all its relevant semantic units have been memorized
In some cases, it may also be worthwhile to distinguish a few degrees of the attribute processed (or read). After all, the page might have been read once, twice, or several times, with its semantic units changing their processing attributes during each pass.
The rationale behind page charts is to have a constant opportunity to control the speed and direction of processing a particular textbook, the greatest advantages being: (1) no need to refer to fully processed pages marked as done, and (2) giving priority to new material (intact) over material that has already been, at least partly, processed (read).
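The page chart described above can be sketched as a small data structure. This is a minimal illustration, assuming a book with numbered pages; the names `PageStatus`, `PageChart`, `mark_read`, `mark_done` and `next_pages` are hypothetical, and counting passes per page is only one possible way of separating degrees of the attribute processed.

```python
from enum import Enum


class PageStatus(Enum):
    INTACT = "intact"        # page not yet processed
    PROCESSED = "processed"  # page read at least once
    DONE = "done"            # page needs no further reference


class PageChart:
    """Hypothetical page chart: a status and a pass count for every page."""

    def __init__(self, page_count: int) -> None:
        self.status = {p: PageStatus.INTACT for p in range(1, page_count + 1)}
        self.passes = {p: 0 for p in self.status}  # degrees of "processed"

    def mark_read(self, page: int) -> None:
        self.passes[page] += 1
        if self.status[page] is PageStatus.INTACT:
            self.status[page] = PageStatus.PROCESSED

    def mark_done(self, page: int) -> None:
        self.status[page] = PageStatus.DONE

    def next_pages(self) -> list[int]:
        """Pages worth visiting: intact pages first, then processed pages
        with the fewest passes; pages marked done are never returned."""
        pending = [p for p, s in self.status.items() if s is not PageStatus.DONE]
        return sorted(
            pending,
            key=lambda p: (self.status[p] is not PageStatus.INTACT, self.passes[p], p),
        )


chart = PageChart(5)
chart.mark_read(1)
chart.mark_read(1)
chart.mark_read(2)
chart.mark_done(3)
print(chart.next_pages())  # [4, 5, 2, 1]
```

The sort key encodes both advantages listed above: done pages drop out entirely, and intact pages outrank pages that have already been read.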
- As mentioned earlier, all semantic units are marked with processing attributes as reading progresses. These are:
- irrelevant - the semantic unit is not worth future reference.
- relevant - the semantic unit is worth future reference (upon which its attribute may change to irrelevant, to-be-memorized or memorized).
- to-be-memorized - the semantic unit seems worth remembering, and should be put into a database with the learned material associated with the currently processed book. The unit will actually be transferred to the database once the processing of the book moves to more advanced stages; on occasion, this may happen much earlier, or never.
- memorized - the semantic unit has been transferred to a database with the learned material, memorized and subjected to a repetition spacing algorithm. In other words, it needs no future reference.
The obvious rationale behind marking semantic units with processing attributes is never to refer to irrelevant or memorized units again, to focus reading attention on relevant units, and to use to-be-memorized units only when selecting new material for memorization.
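The rule stated in the last paragraph can be expressed directly as two filters. The sketch below assumes a flat list of semantic units; the `intact` value is the initial state that the text later assigns to units in hypertext documents, and the names `Attr`, `SemanticUnit`, `units_to_read` and `memorization_candidates` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Attr(Enum):
    INTACT = "intact"                    # not yet reviewed
    IRRELEVANT = "irrelevant"            # never referred to again
    RELEVANT = "relevant"                # worth future reference
    TO_BE_MEMORIZED = "to-be-memorized"  # waiting to be transferred to the database
    MEMORIZED = "memorized"              # handed over to repetition spacing


@dataclass
class SemanticUnit:
    text: str
    attr: Attr = Attr.INTACT


def units_to_read(units: list[SemanticUnit]) -> list[SemanticUnit]:
    """Reading attention goes only to units not yet reviewed or marked relevant."""
    return [u for u in units if u.attr in (Attr.INTACT, Attr.RELEVANT)]


def memorization_candidates(units: list[SemanticUnit]) -> list[SemanticUnit]:
    """To-be-memorized units are consulted only when selecting new material."""
    return [u for u in units if u.attr is Attr.TO_BE_MEMORIZED]


units = [
    SemanticUnit("a"),
    SemanticUnit("b", Attr.IRRELEVANT),
    SemanticUnit("c", Attr.RELEVANT),
    SemanticUnit("d", Attr.TO_BE_MEMORIZED),
    SemanticUnit("e", Attr.MEMORIZED),
]
print([u.text for u in units_to_read(units)])            # ['a', 'c']
print([u.text for u in memorization_candidates(units)])  # ['d']
```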
In the majority of presently available hypertext systems, it is difficult to develop an equivalent of page charts. Such a document still leaves the impression of straying in a sea of information with little chance of managing access in a rational way. The main problems here are: (1) how to make sure that one does not wade again through once-processed material (during reading, it is easy to have a pleasant impression of knowing everything beforehand, only to discover that some of the formulations evoke a déjà vu effect because they have been read before), and (2) how to make sure that no important section is missed (perhaps the increasing drive toward large hypertext documents that cannot be encompassed in any way will eliminate this concern altogether). Sooner or later, developers of hypertext tools will discover that there is much more to reading printed books than what has so far been encapsulated in standard hypertext technologies.
New solutions proposed for hypertext systems
Let us consider a collection of proposed enhancements to generic hypertext systems that would provide solutions to the problems mentioned in the preceding section.
- The first of the mentioned problems concerned the selection of the material. What generic systems have to offer in that respect is: (1) the possibility of choosing a title, (2) collapsible tables of contents, (3) search tools, and (4) bookmarks. All that still leaves the reader with the entire document to work with.
The first and easiest step toward customized content is an editable table of contents. We will discuss possible add-ons to tables of contents in Point 4 as we address the problem of page charts.
A much more complicated, but probably more desirable, approach to customizing documents to particular needs is the use of document filters. Boolean and fuzzy search procedures commonly included in hypertext documents are usually able to yield the list of topics collected in the search. Such a list is usually sorted by one of two criteria: (1) semantic order, or (2) number of search hits. Indeed, such a newly generated list of topics can be viewed as a customized table of contents. However, such a table has no persistence; in other words, it is usually destroyed by repeating the search procedure. Moreover, if the newly generated table of contents was all the reader was interested in, there is, as a rule, no way of hiding the remaining contents of the document from other browsing procedures.
A document filter might have searching abilities similar to those of the mentioned standard search procedures; however, the output of the search might take the form of a new document with a new table of contents. Additionally, a keyword system, or better yet, semantic attributes associated with particular topics or even semantic units, might be used in the search. In other words, instead of looking for words or phrases, the search would look for keywords or even semantic content expressed through semantic attributes.
The ultimate solution with respect to document filters is to let them collect all relevant semantic units and, literally, generate a new document from the collected pieces. Before such a solution can be implemented, considerable progress in natural language processing will be required. In contrast, as will be demonstrated in Points 4 and 5, some handy solutions concerned with processing attributes might be just a few steps away.
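A document filter of the kind described above can be sketched as a function that returns a new document rather than a transient hit list. This is an illustration under stated assumptions: keyword sets stand in for semantic attributes, ranking by the number of matching keywords (with ties kept in semantic order) is only one possible choice, and `Topic`, `Document` and `filter_document` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    units: list[str]                                 # semantic units of the topic
    keywords: set[str] = field(default_factory=set)  # stand-in for semantic attributes


@dataclass
class Document:
    title: str
    topics: list[Topic]

    def table_of_contents(self) -> list[str]:
        return [t.title for t in self.topics]


def filter_document(doc: Document, wanted: set[str], title: str) -> Document:
    """Build a new document holding only topics whose keywords overlap `wanted`,
    ranked by the number of matches; ties keep the original (semantic) order."""
    scored = [(len(t.keywords & wanted), i, t) for i, t in enumerate(doc.topics)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return Document(title, [t for hits, _, t in scored if hits > 0])


doc = Document("Physiology", [
    Topic("Memory", ["..."], {"memory", "brain"}),
    Topic("Heart", ["..."], {"circulation"}),
    Topic("Sleep", ["..."], {"sleep", "memory", "brain"}),
])
notes = filter_document(doc, {"memory", "sleep"}, "Memory and sleep")
print(notes.table_of_contents())  # ['Sleep', 'Memory']
print(doc.table_of_contents())    # ['Memory', 'Heart', 'Sleep']
```

Because the filtered result is a document in its own right, it persists after later searches, and browsing it never reaches the topics that were filtered out.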
- As mentioned earlier, bookmarks are already a standard fixture in all documents that have anything to do with hypertext capability. Bookmarks may serve as a way of constructing a customized table of contents from the most relevant topics located during one’s work with the document. In the context of document filters, we only propose that one of the possible outcomes of a search should be an editable bookmark table, which would make it possible to use the results of the search long after it actually took place.
- The important role of the semantic focus will be shown only later, when we consider the link between a hypertext document and a database with the learned material generated during the browsing process. At this point, we only note that its function can be compared to that of a selection bar in menus or a caret cursor in edit controls or word processors. The position of the semantic focus indicates the currently processed semantic unit. Very much like in the case of cursors or selection bars, the actions undertaken by the user or reader will affect only the selected unit. These actions might be: (1) changing the processing attributes of the unit, (2) changing the semantic attributes of the unit (e.g. to determine future search outcomes), (3) transferring semantic items associated with the unit to a database with the learned material, and (4) performing an editing action on the unit (delete, print, transfer to another document, etc.).
- Page charts are most painfully missed upon moving from printed matter to hyperspace. The division of books into pages may seem quite artificial, but the benefits of charting are definitely worth this little inconvenience.
In the case of hypertext documents, the concept of a page has ceased to exist, being replaced by the concept of a topic. From the human standpoint, the best link to the entire semantic structure of topics comes via the table of contents; hence it is the most obvious implementation target for a counterpart of page charts. A flexible table of contents that would make paper a commodity of the past should meet the following conditions:
- collapsibility (this feature, allowing chapters to be expanded to sections or collapsed to the title level, is increasingly common in modern hypertext systems)
- editability that would let the user choose the sequence of topics, as well as the topics that should disappear from view not only at the contents level, but also from the document itself
- awareness of the reader’s progress through the application of processing attributes
The last point seems the least obvious and deserves the most attention.
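The first two conditions, collapsibility and editability, can be sketched on a nested table of contents. The sketch assumes a simple tree of entries with per-entry `expanded` and `hidden` flags; `Entry`, `visible_lines` and `move` are hypothetical names, and hiding a topic from the document body itself (not only from the table) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    children: list["Entry"] = field(default_factory=list)
    expanded: bool = True  # collapsibility: show or hide the children
    hidden: bool = False   # editability: the reader removed this topic from view


def visible_lines(entries: list[Entry], depth: int = 0) -> list[str]:
    """Render the table of contents, honouring the collapse and hide flags."""
    lines = []
    for e in entries:
        if e.hidden:
            continue
        lines.append("  " * depth + e.title)
        if e.expanded:
            lines.extend(visible_lines(e.children, depth + 1))
    return lines


def move(entries: list[Entry], title: str, new_index: int) -> None:
    """Editability: let the reader choose the sequence of topics at one level."""
    i = next(k for k, e in enumerate(entries) if e.title == title)
    entries.insert(new_index, entries.pop(i))


toc = [
    Entry("1 Memory", [Entry("1.1 Encoding"), Entry("1.2 Forgetting")]),
    Entry("2 Sleep", [Entry("2.1 REM sleep")]),
    Entry("3 Appendix"),
]
toc[1].expanded = False  # collapse "2 Sleep" to the title level
toc[2].hidden = True     # the reader is not interested in the appendix
move(toc, "2 Sleep", 0)
print("\n".join(visible_lines(toc)))
```

The third condition, awareness of the reader’s progress, would attach processing attributes to each entry; it is not modelled here.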
- Finally, individual semantic units should also be markable with processing attributes. Initially, all semantic units would be marked as intact. Upon the first reading, irrelevant items should be marked as irrelevant and, depending on the user’s choice, disappear from the text or appear grayed in their original place. Semantic units of utmost importance might be immediately transferred to a database with to-be-memorized items. At the very least, this process would allow the user to paste the content of the semantic unit, re-edit it and place it in a selected database. However, a much more desirable solution is to associate all semantic units in a hypertext document with ready-made collections of items that might be transferred to, or linked with, the student’s database with a keystroke (e.g. after optional selection and pre-processing). Items marked as memorized could also, depending on the set-up, become invisible or be distinguished by different coloring. The remaining items could be marked with a degree of relevance (or number of reading passes), the highest degree being equivalent to the attribute to-be-memorized. The degree of relevance might contribute to the application of ordinal attributes that could later be used in prioritizing once-accessed items for secondary access. Similarly, to-be-memorized items might also be tagged with ordinal attributes that, in this case, would determine the memorization order. If the processing attributes were applied, the user would be able to quickly skip the parts once identified as irrelevant, as well as pay less attention to those sections that have already been entirely mastered by means of a module using repetition spacing algorithms. Usually, at the early stages of processing the document, the intact topics and units have the highest processing priority. As the work progresses, the once-referred-to units may increasingly get into the focus of attention (e.g. in the order determined by their ordinal attributes). This will, in all likelihood, move their processing status to increasing degrees of relevance, up to the point where a decision is made to memorize a particular semantic unit. Optimally, a collection of simple techniques should be developed to make sure that the flexible table of contents makes it possible to quantitatively assess the progress of processing the semantic units in a given topic. For example, the topic title in the table could be associated with a bar chart showing the proportion of semantic units in the intact, irrelevant, relevant and memorized categories.
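The bar chart and the prioritization rules described in the last point can be sketched together. The sketch assumes that to-be-memorized is represented as the highest degree of relevance (here, arbitrarily, degree 3), that smaller ordinal values come first, and that intact units lead the reading queue as in the early stages of processing; `Unit`, `topic_bar`, `reading_queue` and `memorization_queue` are hypothetical names, and the text bar merely stands in for a graphical chart.

```python
from collections import Counter
from dataclasses import dataclass

TO_BE_MEMORIZED = 3  # assumption: relevance degree 3 counts as "to-be-memorized"
SYMBOLS = {"intact": ".", "irrelevant": "-", "relevant": "r", "memorized": "M"}


@dataclass
class Unit:
    status: str = "intact"  # intact | irrelevant | relevant | memorized
    degree: int = 0         # degree of relevance (e.g. number of reading passes)
    ordinal: float = 0.0    # ordinal attribute: smaller values are processed first


def topic_bar(units: list[Unit], width: int = 20) -> str:
    """Text bar for a table-of-contents entry: each category gets a run of
    symbols proportional to its share of the topic's semantic units."""
    counts = Counter(u.status for u in units)
    n = len(units) or 1
    return "".join(sym * (counts[cat] * width // n) for cat, sym in SYMBOLS.items())


def reading_queue(units: list[Unit]) -> list[Unit]:
    """Intact units first, then relevant units below the memorization
    threshold in the order given by their ordinal attributes."""
    intact = [u for u in units if u.status == "intact"]
    relevant = [u for u in units if u.status == "relevant" and u.degree < TO_BE_MEMORIZED]
    return intact + sorted(relevant, key=lambda u: u.ordinal)


def memorization_queue(units: list[Unit]) -> list[Unit]:
    """Units at the highest relevance degree, in memorization order."""
    ready = [u for u in units if u.status == "relevant" and u.degree >= TO_BE_MEMORIZED]
    return sorted(ready, key=lambda u: u.ordinal)


units = [
    Unit(),
    Unit("irrelevant"),
    Unit("relevant", 1, 0.7),
    Unit("relevant", 3, 0.2),
    Unit("relevant", 2, 0.1),
    Unit("memorized"),
]
print(topic_bar(units, width=12))                            # ..--rrrrrrMM
print([units.index(u) for u in reading_queue(units)])        # [0, 4, 2]
print([units.index(u) for u in memorization_queue(units)])   # [3]
```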
Our experience shows that there is great potential for increasing the effectiveness of hypertext documents if the proposed tools are provided in both the software shell and the document in question.
Integrating repetition spacing technology with a hypertext interface
Our hope is that in the future, the student will never have to work with repetition spacing algorithms through a dedicated program like SuperMemo. Optimally, the student will obtain access to a hypermedia knowledge base (e.g. within the framework of the World Wide Web) with seamlessly integrated algorithms for the optimum spacing of repetitions (e.g. as a plug-in to a Web browser). In other words, the focus should shift from software and its options to knowledge itself.
Naturally, developing a hypermedia interface for a knowledge base associated with a database used in learning will put a much greater burden on the authors of a particular learning system. However, the increase in the effectiveness of accessing and learning knowledge will certainly fully compensate for the higher development costs.
In the optimum case, all semantic units relevant to learning should be associated with predefined, well-structured items (often in the standard question-answer form). A single semantic unit might generate from one to several individual to-be-memorized items. As a result, developing a seamless hypermedia knowledge base integrated with repetition spacing algorithms would triple or quadruple the authors’ effort and costs.
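The association between a semantic unit and its predefined items can be sketched as follows. This is a minimal illustration under stated assumptions: items are plain question-answer pairs, a single `transfer` call stands in for the keystroke that hands them to the student’s database, repeated transfers do not duplicate items, the unit is marked as memorized once its items are handed over (following the attribute definition given earlier), and the repetition spacing algorithm itself is not modelled. `Item`, `SemanticUnit`, `LearningDatabase` and `transfer` are hypothetical names, not SuperMemo’s interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    question: str
    answer: str


@dataclass
class SemanticUnit:
    text: str
    items: list[Item] = field(default_factory=list)  # predefined by the author
    attr: str = "intact"                             # processing attribute


@dataclass
class LearningDatabase:
    items: list[Item] = field(default_factory=list)

    def transfer(self, unit: SemanticUnit) -> int:
        """Add the unit's items that are not yet in the database and mark the
        unit as memorized; scheduling the repetitions is not modelled here."""
        new = [i for i in unit.items if i not in self.items]
        self.items.extend(new)
        unit.attr = "memorized"
        return len(new)


unit = SemanticUnit(
    "Repetition spacing schedules reviews at increasing intervals.",
    [
        Item("What does repetition spacing schedule?", "Reviews of learned items"),
        Item("How do review intervals change over time?", "They increase"),
    ],
)
db = LearningDatabase()
print(db.transfer(unit))  # 2: one semantic unit yields two items
print(db.transfer(unit))  # 0: already transferred
```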