Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 11
8. Linguistic Analysis. Linguists examine responsive and non-responsive
documents to derive classification rules that maximize
the correct classification of documents.
9. Naïve Bayesian Classifier. A system that estimates, for each
word in a new document, the probability that the word came from
the word distribution of the responsive training documents or
from that of the non-responsive training documents. The system
is naïve in the sense that it assumes that all words occur
independently of one another within each category.
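
To make the Naïve Bayesian Classifier concrete, the following is a minimal sketch in Python, not a description of any particular vendor's system. It assumes a simple word-count (multinomial) model, whitespace tokenization, and add-one smoothing so that a word missing from one category does not produce a zero probability; none of these choices come from the definitions above. The function names and the four training documents are hypothetical and purely illustrative.

```python
import math
from collections import Counter


def train(docs):
    """Build per-category word counts from (text, label) training pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training documents
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the category whose word distribution best explains the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the log of the share of training documents in this category.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored (assumption)
            # The "naive" step: each word contributes independently.
            # Add-one smoothing is an assumption of this sketch.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training examples, for illustration only.
training_docs = [
    ("merger pricing agreement", "responsive"),
    ("pricing agreement with competitor", "responsive"),
    ("lunch menu friday", "non-responsive"),
    ("office party friday lunch", "non-responsive"),
]
model = train(training_docs)

print(classify("competitor pricing", model))  # responsive
print(classify("friday lunch", model))        # non-responsive
```

Because the per-word log probabilities are simply added together, the order of the words and any relationship between them play no role in the result, which is exactly the independence assumption described in item 9.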
All of these approaches involve machine learning, except, typically,
Linguistic Analysis (which may or may not include machine
learning components). In machine learning, a computational process
extracts pertinent information from example documents and builds a
mathematical model that allows responsive and non-responsive
documents to be distinguished from one another based on the text
that they contain.
The accuracy of these systems will depend on the specifics of the
implementation and on the qu