1. Powers of Ten

The Powers of Ten table is helpful in illustrating the relative size of gigabytes, terabytes, petabytes and the like.
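
As a rough illustration of how these units relate, the sketch below converts a raw byte count into the largest decimal unit that fits. It assumes the decimal (power-of-ten) definitions of each unit; the function name describe_bytes and the abbreviations are illustrative and are not taken from the report.

```python
# Decimal (power-of-ten) byte units, smallest to largest.
# Each entry is (abbreviation, exponent), so 1 TB = 10**12 bytes.
UNITS = [
    ("B", 0),
    ("KB", 3),
    ("MB", 6),
    ("GB", 9),
    ("TB", 12),
    ("PB", 15),
    ("EB", 18),
]


def describe_bytes(n):
    """Express n bytes in the largest decimal unit not exceeding n."""
    name, exp = UNITS[0]
    for unit_name, unit_exp in UNITS:
        if n >= 10 ** unit_exp:
            name, exp = unit_name, unit_exp
    return f"{n / 10 ** exp:g} {name}"


print(describe_bytes(2 * 10 ** 12))  # 2 TB
print(describe_bytes(1500))          # 1.5 KB
```

Each step up the list is a factor of 1,000, so a petabyte is a thousand terabytes and a million gigabytes.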

2. Upper and lower estimates

The upper estimate is a reasonably "hard" number based on published data. The lower estimate is an attempt to adjust for duplication and compression. Here is a quick summary of some of those adjustments:

  • Paper. There is some duplication among ISBNs because the same title appears in paperback, hardback, and different editions. There is also duplication of financial pages, ads, and so on in newspapers. We used CPC compression, which preserves images; converting to ASCII instead eliminates the images but compresses the text dramatically.
  • Film. If we use JPEG compression rather than PhotoCD, we get a much smaller number for the storage requirements of images.
  • Music CDs. If we use MP3 compression, we get a much smaller number for the storage requirements of audio files.
  • Magnetic. We assume that about 20 percent of magnetic storage is unique.
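
The adjustments above all have the same shape: start from the upper estimate, keep only the unique share, and divide by whatever compression applies. The following is a minimal sketch of that arithmetic, not the report's actual calculation. The function name lower_estimate, its parameters, and the 1,000-terabyte upper figure are hypothetical; only the 20 percent unique share for magnetic storage comes from the text.

```python
def lower_estimate(upper_bytes, unique_fraction=1.0, compression_ratio=1.0):
    """Scale an upper estimate down for duplication and compression.

    unique_fraction:   share of the content that is not a duplicate (0 < f <= 1).
    compression_ratio: original size divided by compressed size (>= 1).
    """
    if not 0 < unique_fraction <= 1:
        raise ValueError("unique_fraction must be in (0, 1]")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be at least 1")
    return upper_bytes * unique_fraction / compression_ratio


# Hypothetical upper estimate for magnetic storage: 1,000 terabytes.
upper = 1000 * 10 ** 12

# The text assumes about 20 percent of magnetic storage is unique.
lower = lower_estimate(upper, unique_fraction=0.20)
print(lower / 10 ** 12, "terabytes")
```

For film or music CDs, the same function would be called with a compression_ratio reflecting the JPEG or MP3 saving; the summary above does not state those ratios, so they are left to the caller.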

3. Reading the data

The report's Site Map provides links to summary reports on each medium. The summaries provide links to detailed reports and spreadsheets containing the raw data. For each medium, we have distinguished between originals and copies, and between the yearly flow of production and the accumulated stock. We've also described growth rates and compression issues for each medium.

4. Acknowledgements

Gray and Shenoy (2000) provide useful information on trends in magnetic storage. Lesk (1997) conducted an earlier study that attempted to estimate the total stock of information. Pool (1984) examined the flow of information in the US circa 1980. See the individual acknowledgements for the names of people who helped us.