Intro to Predictive Coding: Overview & Interpretation of Terminology June 2014 | Page 22
Glossary
Collection – A group of documents, typically
gathered for a particular matter or purpose.
Information retrieval scientists tend to use
several well-known document collections (e.g.,
RCV1) for testing and comparison purposes.
Confidence interval – The range of values within
which the true result is expected to lie. If you
drew repeated samples from the same population and
computed an interval from each one, you would
expect those intervals to contain the true value
about the proportion of times given by the
confidence level (e.g., 95%).
For example, in an election poll, the proportion of
people favoring a candidate is described as being
within a range of, say, plus or minus 5%. All other
things being equal, the narrower the confidence
interval you want, the larger the sample size needs
to be. Said another way, the larger the sample
size, the narrower the confidence interval.
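
To make the link between sample size and interval width concrete, here is a minimal sketch in Python. It is an illustration, not a method taken from this glossary. It assumes a simple random sample, the normal approximation for a proportion, and a 95% confidence level (z = 1.96). The function names margin_of_error and required_sample_size are hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval
    for a proportion p estimated from a simple random sample of size n.
    z = 1.96 corresponds to a 95% confidence level (an assumption here)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest sample size whose margin of error does not exceed `margin`.
    p = 0.5 is the most conservative choice (it gives the widest interval)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    # Larger samples give narrower intervals.
    for n in (100, 400, 1600):
        print(n, round(margin_of_error(0.5, n), 4))
    # Sample size needed for plus or minus 5% at 95% confidence.
    print(required_sample_size(0.05))
```

Under these assumptions, quadrupling the sample size halves the margin of error, and a margin of plus or minus 5% calls for a sample of 385 documents or respondents.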