Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 7
Systems can obtain their training examples in several ways.
The training documents are a sample of all of the documents in
the collection. They can be selected randomly and then
categorized, provided by expert reviewers, chosen by the
computer, or determined by some combination of these methods.
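
The selection options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the document IDs, and the scores are hypothetical, and "chosen by the computer" is modeled here as picking the documents a model is least sure about (scores nearest 0.5), which is one possible rule and not necessarily the one any given product uses.

```python
import random

def random_sample(doc_ids, k, seed=0):
    """Pick k training documents uniformly at random from the collection."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, k)

def computer_chosen(scores, k):
    """Pick the k documents whose predicted score is closest to 0.5.

    `scores` maps a document ID to a hypothetical model's estimated
    probability of responsiveness. Choosing the least certain documents
    is an assumed selection rule, used here purely for illustration.
    """
    return sorted(scores, key=lambda d: abs(scores[d] - 0.5))[:k]

def combined(doc_ids, expert_picks, scores, k_random, k_computer, seed=0):
    """Merge expert-provided, random, and computer-chosen examples without duplicates."""
    chosen = list(expert_picks)
    candidates = random_sample(doc_ids, k_random, seed) + computer_chosen(scores, k_computer)
    for d in candidates:
        if d not in chosen:
            chosen.append(d)
    return chosen

# Hypothetical ten-document collection with made-up scores.
doc_ids = [f"doc{i}" for i in range(10)]
scores = {d: i / 10 for i, d in enumerate(doc_ids)}
training_set = combined(doc_ids, ["doc0"], scores, k_random=3, k_computer=2)
```

Whichever route is used, the selected documents still have to be categorized by a reviewer before they can serve as training examples.
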
Predictive coding is a kind of Computer-Assisted Review (CAR) or
Technology-Assisted Review (TAR), but it is not the only kind of
CAR/TAR. Other types include keyword searching, concept
searching, clustering, email threading, more-like-this search, and
near-duplicate detection. These other kinds of CAR/TAR can be very
useful and can reduce the time needed to categorize documents, but
they are not predictive coding: they do not predict, on the basis of
examples, which documents are likely to be responsive versus
non-responsive.
In predictive coding, the computer uses the decisions made by the
expert reviewer(s) to predict how other documents should be
categorized. In clustering or the various kinds of searching, the
documents are organized into groups and, after the computer has
done its work, the reviewers then decide whether each of these
groups should be considered responsive or non-responsive.
In the jargon of machine learning, predictive coding involves
“supervised learning,” while the other approach, when it involves
machine learning, is called “unsupervised learning.” In
predictive coding, the authoritative expert reviewer provides
feedback, or supervision, to the predictive coding system.
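
The contrast between the two approaches can be made concrete with a toy Python sketch. Everything in it is an assumption made for illustration: the word-count "model," the overlap-based grouping, the 0.25 threshold, and the sample documents are hypothetical and far simpler than what a real predictive coding or clustering system would use.

```python
from collections import Counter, defaultdict

def tokens(text):
    return text.lower().split()

# Supervised: learn from documents a reviewer has already categorized.
def train(labeled_docs):
    """Count how often each word appears under each reviewer label."""
    counts = defaultdict(Counter)
    for text, label in labeled_docs:
        counts[label].update(tokens(text))
    return counts

def predict(counts, text):
    """Predict the label whose training documents share the most words with `text`."""
    words = tokens(text)
    return max(counts, key=lambda label: sum(counts[label][w] for w in words))

# Unsupervised: group documents without using any labels.
def cluster(docs, threshold=0.25):
    """Greedily group documents by word overlap (Jaccard similarity) with a group's first document."""
    groups = []
    for text in docs:
        words = set(tokens(text))
        for group in groups:
            first = set(tokens(group[0]))
            if len(words & first) / len(words | first) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    return groups

labeled = [
    ("merger pricing agreement draft", "responsive"),
    ("pricing terms for the merger", "responsive"),
    ("lunch menu for friday", "non-responsive"),
    ("office party friday lunch", "non-responsive"),
]

# Supervised: the reviewer's decisions come first, predictions follow.
model = train(labeled)
prediction = predict(model, "revised merger pricing")

# Unsupervised: the computer groups first, the reviewer decides afterwards.
groups = cluster([text for text, _ in labeled])
reviewer_decisions = {0: "responsive", 1: "non-responsive"}  # one decision per group
```

In the supervised half, the reviewer's labels are an input to the computer's work; in the unsupervised half, the reviewer's judgment is applied only to the groups the computer has already formed.
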