Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014

2. Exploratory Data Analysis. The producing party, recognizing its obligation to produce responsive documents, begins document analysis. The technology does not require sharing training documents or seed sets with the receiving party. Sharing these documents assumes that the technology works as expected, but that the producing party requires “guidance” to identify the correct documents to be produced. There are many ways to provide this guidance without having to share non-responsive documents. Legal and strategic concerns should govern whether these documents are shared; sharing is not an intrinsic part of the predictive coding process.

3. Estimate Prevalence. The producing party samples the document set to estimate prevalence: how rare or frequent are responsive documents? Prevalence is important because special steps may be needed to make predictive coding training efficient if responsive documents are extremely rare (e.g., less than 1% of the documents are responsive). Prevalence sampling may be part of the process of training the predictive coding system.

4. Predictive Coding Training. The producing party begins predictive coding training. The producing party may report accuracy statistics along the way or, if training is brief, at the end of training. Not all predictive coding tools yield meaningful statistics during the course of training. Some
