23. Stein et al. (2006) discuss the impact of uncorrected optical character recognition (OCR) output on machine learning tasks, concluding that the impact is limited. With the rise of mass digitization projects such as Google Books, we believe systems must be tolerant of significant levels of "noisy" or uncorrected data.
