The numbers in Table 1 can only be taken as rough estimates. We have had to make various assumptions in order to construct these figures, and some data sources are contradictory or simply unavailable. Here we list the most serious methodological qualifications, each of which offers interesting challenges for those who would seek to refine these estimates.

  • Duplication.

    It is very difficult to distinguish "copies" from "original" information. A newspaper, for example, is published on paper, often published on the Web as well, and is generally archived on microfilm. In fact, most printed materials are produced and/or archived magnetically. There is also a lot of duplication within each medium: many newspapers reproduce stock prices, wire stories, advertisements, and so on. Ideally, we would like to measure the storage required for the unique content in the newspaper, but that quantity is very hard to put a number on. As indicated above, the duplication issue is particularly serious for digital storage, since little of what is stored on individual hard drives is unique. We've tried to adjust for this as best we can and have documented our assumptions in the detailed treatment of each medium.

  • Compression.

    There is no unambiguous way to measure the size of digital information. A 600-dot-per-inch scanned digital image of text can be compressed to about one hundredth of its original size. A DVD version of a movie can be 1,000 times smaller than the original digital image. We've made what we thought were sensible choices with respect to compression, steering a middle course between the high estimate (based on "reasonable" compression) and the low estimate (based on highly compressed content). It is worth noting that the ability to compress digital information to different degrees, depending on need, is a significant advantage of digital over analog storage.

  • Archival Media.

    Should information stored as "backup" be included in the total? This question arises for microfilm, rewritable CD-ROMs, and even print, but digital magnetic tape is the most difficult case. Tape's most common use is to archive material held on hard drives, and therefore it should not count towards the stock of "original information" produced each year. Industry rules of thumb suggest that there is about ten times as much storage on tape as on hard drives. This ratio has been falling as more and more data is stored on arrays of hard drives, which are much more convenient to use. We've omitted most tape storage for this reason. However, we should also note that vast quantities of original scientific data are stored in tape libraries; we describe a few such repositories in the detailed treatment of magnetic storage.

  • World and US production.

    The US produces about 25% of all textual information and about 30% of the photographic information, a significant fraction of the world's total. We don't have good data on magnetic storage, but it seems plausible that the US produces at least half of the content stored on magnetic media. We've used numbers for world production when available, but in some cases have had to extrapolate from US production. Little data is available about information production in the Third World.

  • Growth rates.

    The production of unique content in books, photos, and CDs is barely growing. DVD content is growing rapidly, but that's because it is a new medium and a significant amount of legacy content is being converted. By contrast, shipments of digital magnetic storage are essentially doubling every year.

  • TV and Radio.

    Original TV content produced each year is generally stored on magnetic camcorder tapes and so is counted in that category of storage media. Much radio content is simply broadcast music, which we have already captured with the CD statistics. See Table 3 for information on how much storage it would take to back up all TV and radio broadcasts with minimal adjustment for duplication.
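
To make the arithmetic behind the compression and growth-rate qualifications above concrete, here is a minimal sketch. Only the 600 dpi resolution, the roughly one-hundredfold compression of scanned text, and the annual doubling of magnetic storage shipments come from the discussion above. The page size, the one-bit-per-pixel depth, and all function names are illustrative assumptions, not figures used in Table 1.

```python
# Illustrative arithmetic only. The page geometry and bit depth below are
# assumptions; the 600 dpi figure, the ~100x compression of scanned text,
# and the yearly doubling of shipments are taken from the text.

DPI = 600               # from the text
PAGE_WIDTH_IN = 8.5     # assumption: US letter page
PAGE_HEIGHT_IN = 11.0   # assumption: US letter page
BITS_PER_PIXEL = 1      # assumption: bitonal (black-and-white) scan


def raw_scan_bytes(dpi=DPI, width_in=PAGE_WIDTH_IN,
                   height_in=PAGE_HEIGHT_IN, bits_per_pixel=BITS_PER_PIXEL):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(dpi * width_in) * round(dpi * height_in)
    return pixels * bits_per_pixel // 8


def compressed_bytes(raw, compression_factor):
    """Size after compression by the given factor (100 means 1/100th)."""
    return raw / compression_factor


def shipments_after(base, years, annual_factor=2.0):
    """Shipments after `years` of compound growth (2.0 means doubling)."""
    return base * annual_factor ** years


if __name__ == "__main__":
    raw = raw_scan_bytes()
    print("raw page scan (bytes):", raw)
    print("compressed 100x (bytes):", compressed_bytes(raw, 100))
    print("relative shipments after 3 years:", shipments_after(1.0, 3))
```

Under these assumptions, the same scanned page counts as either several megabytes or a few tens of kilobytes depending on the compression choice, which is why the high and low estimates diverge so widely. Likewise, a stock that doubles every year grows eightfold in three years.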