Friday, October 3, 2008

Week 6: Preservation in Digital Libraries

Research Challenges in Digital Libraries

We must research digital libraries in order to get a grasp on where we can take them. They are currently too widespread and heterogeneous for anyone to really understand what is going on. We also need to figure out how to preserve digital libraries as they are now for future study.

Big Issues:
1. We must figure out how to deal with all the digital libraries and preserve them while using humans as infrequently as possible.

2. We must protect the digital archives now. They require a lot of effort to maintain, so we must find a way to do that while, again, using humans as infrequently as possible.

3. We need to look at economic and business models of digital libraries to see how we can maintain these things in ways beyond technology. How can we afford to keep them up?

4. To expand the usefulness of digital libraries, new technologies need to be created. These technologies should make DLs cheaper while, again, using humans as infrequently as possible.

5. We need shared and scalable infrastructure to support digital libraries. Sequestering them within institutions prevents interoperability and scalability, which hinders the usefulness of digital libraries.

Open Archival Information System Reference Model: An Introductory Guide

Open: the reference model was developed in an open public forum, so anyone could participate.
Archival Information System: an organization of people and systems that has accepted the responsibility to preserve information and make it available.

An OAIS must:
1. Get the appropriate information.
2. Make sure they have long-term control of the information.
3. Know their user community.
4. Have appropriate metadata for the user to understand the info.
5. Follow documented policies and procedures so that the information is preserved against all reasonable contingencies.
6. Make it available to the user.

Tasks of OAIS:
-Ingest (of data)
-Preservation Planning
-Data Management
-Archival Storage
-Administration
-Access (of data to user)

Types of information packages:
-Submission Information Package
-Archival Information Package
-Dissemination Information Package
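To make the package flow a little more concrete, here is a toy sketch of how a submission package might become an archival package at Ingest and a dissemination package at Access. OAIS is a conceptual model and does not prescribe any data format, so every class, field, and value below (including the SHA-256 checksum and the "Example Gazette" title) is a hypothetical illustration, not part of the standard.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SubmissionInformationPackage:
    """SIP: what a producer hands over to the archive (fields are hypothetical)."""
    content: bytes
    descriptive_metadata: dict


@dataclass
class ArchivalInformationPackage:
    """AIP: what the archive keeps, i.e. the content plus preservation metadata."""
    content: bytes
    descriptive_metadata: dict
    fixity_sha256: str
    provenance: list = field(default_factory=list)


@dataclass
class DisseminationInformationPackage:
    """DIP: what a consumer receives in response to a request."""
    content: bytes
    descriptive_metadata: dict


def ingest(sip: SubmissionInformationPackage) -> ArchivalInformationPackage:
    """Ingest: accept a SIP and wrap it with preservation metadata (a checksum and a provenance note)."""
    return ArchivalInformationPackage(
        content=sip.content,
        descriptive_metadata=dict(sip.descriptive_metadata),
        fixity_sha256=hashlib.sha256(sip.content).hexdigest(),
        provenance=["ingested from producer SIP"],
    )


def access(aip: ArchivalInformationPackage) -> DisseminationInformationPackage:
    """Access: check that the stored copy is intact, then build a DIP for the user."""
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
        raise ValueError("fixity check failed; the AIP may be corrupted")
    return DisseminationInformationPackage(aip.content, dict(aip.descriptive_metadata))


sip = SubmissionInformationPackage(b"front page scan", {"title": "Example Gazette"})
aip = ingest(sip)
dip = access(aip)
print(dip.descriptive_metadata["title"])  # Example Gazette
```

The point of the sketch is only that the three packages are different things: the archive adds its own preservation information to what the producer submits, and users get a package built from the archival copy rather than the original submission.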

This model provides a formula for digital library producers to follow; by following it, they can produce an efficient, effective digital library. The paper does not give any guidance on the technology or infrastructure needed to make this happen, but it does provide guideposts for what the technology and infrastructure must do.

Preservation Management of Digitized Materials
- The authors state that guidance is needed for digital preservation. It seems to be a recurring theme.


This book is too extensive to take notes on in much detail. However, it is an extremely interesting and useful guide for a novice who wants to get a handle on the field of digital libraries. It introduces the reader to the vocabulary, explains why this information is vital, describes how digital libraries are made, who uses them, and what the rules and requirements are, and provides models for institutions to follow as they delve into this realm. Since this is a very new world, and many librarians are long out of library school, this sort of resource, perhaps supplemented by additional instruction, can help them get up to speed. Staying abreast of technological developments is important, and digital libraries are a huge part of that.

Actualized Preservation Threats
The National Digital Newspaper Program (NDNP) is an effort by the Library of Congress (LC) and its partners to "Chronicle America" by digitally preserving printed newspapers. It "also has a digital repository component that houses the digitized newspapers, supporting access and facilitating long-term preservation. Taking on access and preservation in a single system was both a deliberate decision and a deviation from past practices at LC." The authors wrote this paper to discuss the work done so far, specifically the preservation threats the program encountered over two years.

Types of failures:
Media- The portable hard drives used to transport the digital images from the awardees to LC failed. This was handled by making 'fixity checks' part of the transfer process and keeping a copy at the awardee's site until LC had verified that it received the files intact.
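A fixity check just means recording a checksum for each file before it is shipped and recomputing it on arrival. The paper does not say which algorithm or tooling NDNP used, so this is a minimal sketch assuming SHA-256 and a simple manifest dictionary; the function names and the temporary-folder demo are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large page images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict:
    """Sender side: map each file path (relative to root) to its checksum before shipping."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_transfer(manifest: dict, received_root: Path) -> list:
    """Receiver side: list files that are missing or whose checksum changed in transit."""
    problems = []
    for rel_path, expected in manifest.items():
        received = received_root / rel_path
        if not received.is_file() or sha256_of(received) != expected:
            problems.append(rel_path)
    return problems


# Demo with temporary folders standing in for the awardee's copy and LC's copy.
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp())
(src / "page1.tif").write_bytes(b"image data")
manifest = make_manifest(src)
shutil.copytree(src, dst, dirs_exist_ok=True)
print(verify_transfer(manifest, dst))  # []
(dst / "page1.tif").write_bytes(b"corrupted in transit")
print(verify_transfer(manifest, dst))  # ['page1.tif']
```

Under this scheme, the awardee would delete its own copy only after `verify_transfer` comes back empty, which matches the practice of holding a copy until LC confirms receipt.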

Hardware- Internal hard drives failed. Data loss was generally avoided by grouping the drives in a RAID 5 array with a hot spare, so the failure of any single drive did not lose data. Data was lost only when a second event occurred in the array while the system was rebuilding onto the hot spare.
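Why a second failure during the rebuild is fatal comes down to how RAID 5 parity works: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be recomputed from the others, but two missing blocks cannot. This toy sketch works on a single stripe of hypothetical 4-byte blocks; real arrays do this in the controller or OS and rotate the parity block across drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks; this is how RAID 5 computes parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(stripe):
    """Fill in one missing block (None) by XOR-ing all surviving blocks in the stripe."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise RuntimeError("more than one block lost: RAID 5 parity cannot recover them")
    restored = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        restored[missing[0]] = xor_blocks(survivors)
    return restored


# One toy stripe: three data blocks plus their parity block.
data = [b"NEWS", b"PAPR", b"SCAN"]
parity = xor_blocks(data)

# A single drive fails: its block can be recomputed (e.g. onto the hot spare).
print(rebuild([b"NEWS", None, b"SCAN", parity])[1])  # b'PAPR'

# A second drive fails before the rebuild finishes: the data cannot be recovered.
try:
    rebuild([b"NEWS", None, None, parity])
except RuntimeError as err:
    print(err)
```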

Software- Three software problems occurred. The first was a validation problem: records that had passed validation but 'did not conform to the appropriate NDNP profile' were put into the NDNP repository. This was fixed with new validation rules. The second was more serious. During transformation of the newspaper title records, XML was stripped from the original METS records, and the transformation was also producing invalid METS records. This broke the application and made parts of the data unreadable. The third problem occurred when the XFS file system was corrupted, which caused data loss. In a large, complex system such as this, problems are harder to prevent and harder to diagnose when they occur. This is a serious weakness of very large digital libraries.
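The first software problem shows that passing schema validation is not the same as meeting a profile: a profile adds stricter rules on top of the schema. Here is a minimal sketch of such a second layer of checks using Python's standard XML parser. The METS namespace and the PROFILE attribute are real parts of METS, but the specific rules and the profile identifier below are hypothetical stand-ins, not the actual NDNP profile.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def profile_errors(xml_text: str, required_profile: str) -> list:
    """Profile-level checks meant to run after ordinary schema validation.

    The rules below are hypothetical stand-ins for a real profile's requirements.
    """
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{METS_NS}}}mets":
        errors.append("root element is not mets:mets")
    if root.get("PROFILE") != required_profile:
        errors.append(f"PROFILE attribute is not {required_profile!r}")
    # dmdSec is optional in the METS schema, but a profile can make it mandatory.
    if root.find(f"{{{METS_NS}}}dmdSec") is None:
        errors.append("missing mets:dmdSec (descriptive metadata)")
    return errors


PROFILE = "urn:example:newspaper-issue"  # hypothetical profile identifier
# Trimmed samples for illustration only; they are not complete, schema-valid METS.
good = (
    '<mets xmlns="http://www.loc.gov/METS/" PROFILE="urn:example:newspaper-issue">'
    '<dmdSec ID="d1"/><structMap/></mets>'
)
bad = '<mets xmlns="http://www.loc.gov/METS/"><structMap/></mets>'
print(profile_errors(good, PROFILE))  # []
print(profile_errors(bad, PROFILE))
```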

Operator- In one error, a series of files was deleted accidentally. In another, the operator accidentally ingested the same batches multiple times, or perhaps did not purge a successful ingest before re-ingesting it, producing many duplicates.
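One common guard against the duplicate-ingest error is to make ingest refuse a batch it has already seen, keyed on a checksum of the batch's contents. The paper does not describe NDNP doing this; the `Repository` class, its methods, and the sample batch are a hypothetical sketch of the idea.

```python
import hashlib


class Repository:
    """Toy repository that refuses to ingest an identical batch twice (hypothetical design)."""

    def __init__(self):
        self._batches = {}  # batch checksum -> name of the batch first ingested with it

    @staticmethod
    def batch_checksum(files: dict) -> str:
        """Checksum over file names and contents, in sorted order so file order does not matter."""
        digest = hashlib.sha256()
        for name in sorted(files):
            digest.update(name.encode("utf-8") + b"\0")
            digest.update(hashlib.sha256(files[name]).digest())
        return digest.hexdigest()

    def ingest(self, batch_name: str, files: dict) -> bool:
        """Record the batch and return True, or return False if it is already present."""
        key = self.batch_checksum(files)
        if key in self._batches:
            return False
        self._batches[key] = batch_name
        return True

    def purge(self, files: dict) -> None:
        """Remove a batch's record so that it can be deliberately re-ingested."""
        self._batches.pop(self.batch_checksum(files), None)


repo = Repository()
batch = {"page1.tif": b"scan 1", "page2.tif": b"scan 2"}
print(repo.ingest("batch_01", batch))  # True
print(repo.ingest("batch_01", batch))  # False: duplicate refused
repo.purge(batch)
print(repo.ingest("batch_01", batch))  # True: re-ingest after purge
```

Because the key comes from the content rather than the batch name, a batch ingested a second time is caught even if the operator gives it a new name; the explicit `purge` step is what allows an intentional re-ingest.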

The paper concludes that in a task this large, errors will occur in many different ways, no matter what one does to protect against them. This makes a large digitization project extremely daunting, since one of its tasks is to make sure that the files are not only accessible but also permanently preserved.


This is Katie's favorite person in the world. His name is Kevin. Yes, all 3 of us have K names. It was not planned: Katie came prenamed, and we didn't have any choice in our names.
