Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 16
2. Exploratory Data Analysis. The producing party, recognizing
its obligation to produce responsive documents, begins
document analysis. The technology does not require sharing
training documents or seed sets with the receiving party.
Sharing these documents assumes that the technology
works as expected, but that the producing party requires
“guidance” to identify the correct documents to be produced.
There are many ways to provide this guidance without having
to share non-responsive documents. Legal and strategic
concerns should govern whether these documents are shared;
sharing is not an intrinsic part of the predictive coding
process.
3. Estimate Prevalence. The producing party samples the
document set to estimate prevalence, that is, how rare or
frequent responsive documents are. Prevalence is
important because special steps may be needed to make
predictive coding training efficient if responsive documents
are extremely rare (e.g., less than 1% of the documents are
responsive). Prevalence sampling may be part of the
process of training the predictive coding system.
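
As an illustration of the prevalence estimate in step 3, the sketch below computes a point estimate and a 95% interval from a reviewed random sample. It assumes a simple random sample in which each document is coded responsive or not. The Wilson score interval is one common choice for small proportions, not a method prescribed here. The function name `estimate_prevalence` and the sample counts are hypothetical.

```python
import math


def estimate_prevalence(labels, z=1.96):
    """Return (point estimate, lower bound, upper bound) for prevalence.

    labels: iterable of booleans, True where a sampled document was
            coded responsive (assumes a simple random sample).
    z:      normal quantile; 1.96 corresponds to roughly 95% confidence.
    Uses the Wilson score interval, which behaves better than the plain
    normal approximation when responsive documents are rare.
    """
    labels = list(labels)
    n = len(labels)
    if n == 0:
        raise ValueError("sample must contain at least one reviewed document")

    k = sum(1 for is_responsive in labels if is_responsive)
    p_hat = k / n

    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom

    return p_hat, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical sample: 1,500 documents reviewed, 12 coded responsive.
sample_labels = [True] * 12 + [False] * 1488
p_hat, low, high = estimate_prevalence(sample_labels)
print(f"estimated prevalence {p_hat:.2%}, 95% interval {low:.2%} to {high:.2%}")
```

If the upper bound falls below about 1%, the collection is in the low-prevalence range described above, where special steps may be needed to make training efficient.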
4. Predictive Coding Training. The producing party begins
predictive coding training. The producing party may report
accuracy statistics along the way, or, if training is brief, at the
end of training. Not all predictive coding tools yield
meaningful statistics during the course of training. Some