Friday, November 28, 2008
Muddiest Point 12
If a digital library's security is compromised and data under copyright protection is stolen, can the library or organization that maintains the digital collection be held legally liable?
Friday, November 21, 2008
Muddiest Point
I don't know if this counts as a muddiest point, but it's a thought. If a digital library is created for a specific population that is likely to be computer literate enough to access the material, ought we to be concerned about making sure computer-illiterate people can access it? Digital libraries are created with the grand intention of democratizing access to materials, but in practice we have to place serious limits on who can access them anyway, to abide by copyright laws. So I feel that 'disenfranchising' users who are unable to use computers or the Internet is less of a problem than it is made out to be, and it is rapidly becoming even less of one.
Week 12
Implementing Policies for Access Management
Just like regular libraries need access policies for users, so do digital libraries. They are constrained by copyright laws and other requirements. The model in this article is dynamic, allowing policies to change for users as required. The policy information is stored as metadata and is not applied until a user attempts to access the material, at which point the user must be authenticated. Different types of users (faculty, student) will have different policies, and the different categories of material in the library (general, reserve, reference) will have different policies depending on the role of the user. The operations allowed on each type of material change for each type of user. From these pieces you can build a policy table.
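A minimal sketch of the policy table idea, assuming the roles and material categories from the article's examples. The operations (`view`, `download`, `print`), `POLICY_TABLE`, and the function names are hypothetical. The policy is only looked up when an authenticated user actually requests an item, and because the table is plain data, a policy can change without touching the access code.

```python
from dataclasses import dataclass

# Hypothetical policy table: (user role, material category) -> allowed operations.
# Roles and categories follow the article's examples; operations are assumptions.
POLICY_TABLE = {
    ("faculty", "general"): {"view", "download", "print"},
    ("faculty", "reserve"): {"view", "download"},
    ("faculty", "reference"): {"view"},
    ("student", "general"): {"view", "download"},
    ("student", "reserve"): {"view"},
    ("student", "reference"): {"view"},
}


@dataclass
class User:
    name: str
    role: str
    authenticated: bool = False


@dataclass
class Item:
    title: str
    category: str  # stored as metadata on the item


def is_permitted(user: User, item: Item, operation: str) -> bool:
    """Evaluate the policy at access time, and only for authenticated users."""
    if not user.authenticated:
        return False
    allowed = POLICY_TABLE.get((user.role, item.category), set())
    return operation in allowed


alice = User("alice", "student", authenticated=True)
reading = Item("Course reading", "reserve")
print(is_permitted(alice, reading, "view"))      # True
print(is_permitted(alice, reading, "download"))  # False
```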
Having a good user interface is vital to allowing these policies to be maintained without irritating the user. The user should not be bombarded with many requests for authentication, passwords, and other nonsense. It should appear as seamless as possible.
This article offered a good set of guidelines for access management. It is important to balance the need to uphold copyright laws with the need to provide a good experience to the user, and this model seems to strike that balance.
Lesk Ch 9
This chapter was informative and interesting on a number of levels. First of all, it presents the economic issues that digital libraries are facing in a very clear, straightforward way. It does this while also addressing the economic issues facing all libraries in the modern world: the increasing cost of journal subscriptions, whether to offer only digital subscriptions, and the constant need to prove one's worth to the umbrella organization, even though the library brings in very little money. It discusses these things simply, with plenty of explanation and examples. It is a good chapter for understanding library economics.
Arms Ch. 6
This chapter discussed many of the same things that Lesk did, dealing with the economic framework of a digital library. The interesting part of this chapter compared to Lesk was the discussion of the legal issues related to digital libraries, particularly copyright. We have had copyright laws for a long time, but they have been cast in a new light since the advent of the digital age. Libraries and publishers are still trying to work out a good way to charge for these things, and how to appease everyone.
Personally, I like the idea of paying for it with advertising. As browsers and web languages have become more advanced, it has become possible to make advertising less obtrusive. Plenty of bloggers support themselves with advertising, and doing so is becoming less stigmatized. I see nothing wrong with a free institution supporting its product with advertising, as long as the ads are appropriate and not annoying.
Arms Ch 7
This chapter covered many of the same topics as the first article: access management. However, Arms went into greater detail about the protocols for access management, and also delved into the world of digital library security and encryption. If you are going to restrict access to the materials in a digital library, you need to make sure that it is difficult for someone to access them without authorization. Understanding encryption is necessary.
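Arms covers encryption; this sketch swaps in a closely related technique from Python's standard library, an HMAC signature. It does not hide content, but it makes a forged or altered access ticket detectable, which is the "difficult to access without authorization" part. The ticket format, `SECRET_KEY`, and the function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret. A real deployment would generate this
# randomly and never send it to clients.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_ticket(user_id: str, item_id: str, expires_at: int) -> str:
    """Issue a ticket after the user authenticates. Assumes IDs contain no '|'."""
    payload = f"{user_id}|{item_id}|{expires_at}"
    return f"{payload}|{_sign(payload)}"


def verify_ticket(ticket: str, item_id: str, now: Optional[int] = None) -> bool:
    """Accept only an unaltered, unexpired ticket issued for this item."""
    parts = ticket.split("|")
    if len(parts) != 4:
        return False
    user_id, ticket_item, expires_at, signature = parts
    expected = _sign(f"{user_id}|{ticket_item}|{expires_at}")
    if not hmac.compare_digest(signature, expected):
        return False  # forged or tampered ticket
    if ticket_item != item_id:
        return False  # genuine ticket, but for a different item
    current = int(time.time()) if now is None else now
    return current < int(expires_at)


ticket = issue_ticket("alice", "item42", expires_at=2_000_000_000)
print(verify_ticket(ticket, "item42", now=1_700_000_000))  # True
print(verify_ticket(ticket.replace("alice", "mallory"), "item42", now=1_700_000_000))  # False
```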
In closing, here is an insanely happy puppy frolicking through the field.
 
Friday, November 7, 2008
Weekly Response 10
Arms Ch. 8: Usability and interface design
Aspects of usability: interface design, functional design, data and metadata, computer systems and networks. These are built on the conceptual model.
Interface design: appearance on the screen and manipulation by the user
Functional design: functions available to the user
Data and metadata: provided by the library
Computer systems and networks: necessary to make everything work.
Desktop metaphor: pioneered at Xerox PARC and popularized by the amazing Apple, it is the most common interface now: a graphical interface that mimics an actual desk with folders, files and documents on it.
Browser function: retrieve a file from a web server and render it on the computer.
Digital libraries must accept that browsers are how users access the DL.
Digital Library Design for Usability
Interface Usability: Learnability, efficiency, memorability, and errors.
Organizational Usability: accessibility, compatibility, integrability into workplaces, and social-organizational expertise.
Take home message: Know your community. Build the DL software to serve them.
Evaluation of Digital Libraries
Digital library designers and digital library users are at war!
Evaluation of digital libraries is not widespread because they are very complex, it is too early in their development to evaluate them, nobody cares, there isn't enough money, and evaluation is not part of the culture. Hence, it is difficult for DL designers to realize that the users are angry. This is unfortunate, because libraries of any sort are supposed to cater to patrons. Without happy patrons, the DL will not be used. If it is not used, then funding will dry up. Make your users/patrons happy!
Designing User Interfaces
Usability goals:
1. Appropriate to user needs.
2. Must be reliable in function.
3. Must be standardized to ease learning.
4. Must complete projects on time and within budget.
Thoughts: This article lays out many of the ideas mentioned before, but in a more general sense. One of the ideas it discusses is standardization and compatibility. The more standardized a program is, and the more compatible it is with other programs and with its own earlier versions, the more successful it will be. Think of Microsoft, Windows and the Office suite. Their success was a function of the compatibility and standardization of all of those things. What surprises me so much about Vista and the new Office 2007 suite is how much they abandoned that premise. Office 2007 is barely compatible with its previous versions, and they completely revamped everything so it is new and not standardized. No wonder no one likes it! Perhaps Microsoft should learn from its own lessons.
Friday, October 17, 2008
Muddiest Point
I'm not sure I understand the Deep Web. Does that mean that it hasn't been indexed by a web crawler? How can a given website owner know if his site is in the deep or visible web?
Weekly Response 8
Chapter 1. Definition and Origins of OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): greater interoperability between digital libraries and more efficient dissemination of information. Gee, that sounds like the general goal of all libraries. One thing I'm learning in this class is that while digital libraries are very different from traditional libraries in terms of structure and management, they have the same goals. Get information to the people! Preserve it for future people!
Scope: Metadata, using XML. It is moving toward working with other classes of metadata and with full content. The metadata is specifically for document-like objects in digital form. Digital libraries often hold not just books and papers, but digital images, digital objects, and other things that require metadata.
Purpose: define a standard way to move metadata information from point A to point B in the world wide web; to facilitate sharing and aggregation of metadata.
They accomplish this by dividing the universe into OAI data providers (have the content and/or metadata) and OAI service providers (harvest info from data providers and make it available). This follows the client/server model. Data providers are the servers, service providers are the clients. This model allows one-stop shopping.
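To make the data provider / service provider split concrete, here is a minimal sketch of the service-provider (client) side: it sends `ListRecords` requests for Dublin Core (`oai_dc`) metadata and follows `resumptionToken`s until the provider signals the last page. The verb, argument names and namespaces are from OAI-PMH 2.0; the base URL, the function names, and the injected `fetch` function are assumptions of this sketch, which skips error responses and other verbs.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Tuple, Union
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url: str,
            fetch: Callable[[str], Union[str, bytes]]) -> Iterator[Tuple[str, str]]:
    """Pull (identifier, title) pairs from a data provider, page by page."""
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for record in root.iterfind(".//oai:record", NS):
            identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
            title = record.findtext(".//dc:title", default="", namespaces=NS)
            yield identifier, title
        token_el = root.find(".//oai:resumptionToken", NS)
        # An empty or missing resumptionToken marks the last page.
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:
            return
```

Against a live repository, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; passing it in keeps the sketch testable offline.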
   
What it is not: an open access system, an archival standard, Dublin Core, or a realtime/dynamic search service.
Federated Searching: Put it in its place
Users want a search box! Give simple and easy access to information in one place, just like Google does. Whether or not the answer is the best one or from the best source is a moot point. Therefore, make federated searching mimic Google: one stop shopping that spits out an answer.
The Truth about Federated Searching
1. Does not search everything, ever! You will still have to consult other sources.
2. You will still get duplicates. Truly eliminating them would require downloading the full records, which takes too long.
3. Relevancy is not perfect because it is only looking at the citation.
4. Federated searching ought to be used as a service, not purchased as software. Updates happen too often to make owning it feasible.
5. The federated search engine does not search your catalog any better than your own search engine does; at best, it searches it just as well.
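The points above can be seen in a small sketch: the query fans out to whichever sources are configured (and only those), results are merged with duplicates collapsed by a crude citation key, and relevance is judged on the title alone. The sources, the `title`/`sources` record fields, and the scoring rule are hypothetical simplifications, not any vendor's product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, str]
Source = Callable[[str], List[Record]]


def normalize(text: str) -> str:
    """Crude citation key: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def citation_score(query: str, record: Record) -> int:
    """Relevance judged only on the citation (here, just the title)."""
    title_words = set(normalize(record["title"]).split())
    return sum(term in title_words for term in normalize(query).split())


def federated_search(query: str, sources: Dict[str, Source]) -> List[Record]:
    # Query every configured source in parallel; anything not listed in
    # `sources` is simply never searched.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(search, query) for name, search in sources.items()}
        results = {name: future.result() for name, future in futures.items()}

    # Merge, collapsing records whose normalized titles match. Duplicates
    # whose citations are worded differently will slip through.
    merged: Dict[str, Record] = {}
    for name in sources:
        for record in results[name]:
            key = normalize(record["title"])
            if key in merged:
                merged[key]["sources"] += f", {name}"
            else:
                merged[key] = {**record, "sources": name}

    return sorted(merged.values(), key=lambda r: citation_score(query, r), reverse=True)
```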
The Z39.50 Information Retrieval Standard
Z39.50 is a standard allowing patrons to search other libraries' catalogs using their own library's interface. A client machine sends a search to the server, and the matching records are returned to and displayed on the client machine.
The server has all the catalog information and it retrieves the appropriate information and returns it to the user machine. Each set of database records has a set of access points for the collection.
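A toy sketch of the access-point idea: the client names the access point with a Bib-1 "use" attribute (4 = title, 21 = subject, 1003 = author), written here in the Prefix Query Format used by the YAZ toolkit, and the server answers only on access points it supports. The real protocol exchanges binary ASN.1/BER messages rather than strings; `ToyTarget`, `client_search`, and the record fields are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]

# A few Bib-1 "use" attributes, which name the access point being searched.
USE_ATTRIBUTES = {"title": 4, "subject": 21, "author": 1003}


def build_query(access_point: str, term: str) -> str:
    """Client side: express the search as a PQF string, e.g. @attr 1=4 "term"."""
    return f'@attr 1={USE_ATTRIBUTES[access_point]} "{term}"'


class ToyTarget:
    """Stand-in for a library's Z39.50 server; it answers searches only on
    the access points it declares."""

    def __init__(self, records: List[Record], supported: Optional[Dict[int, str]] = None):
        self.records = records
        # Map of use-attribute number -> record field.
        self.supported = supported or {4: "title", 21: "subject", 1003: "author"}

    def search(self, pqf: str) -> List[Record]:
        _, attribute, quoted_term = pqf.split(" ", 2)
        use = int(attribute.split("=")[1])
        if use not in self.supported:
            raise ValueError(f"unsupported access point: use attribute {use}")
        field = self.supported[use]
        term = quoted_term.strip('"').lower()
        return [r for r in self.records if term in r.get(field, "").lower()]


def client_search(target: ToyTarget, access_point: str, term: str) -> List[Record]:
    """The patron's own interface builds the query; the remote target runs it."""
    return target.search(build_query(access_point, term))
```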
Search Engine Technology and Digital Libraries
Since libraries are academic institutions with minimal universal searching capacity, and places like Google are universal search engines with minimal (although still a lot!) academic focus, the best of both worlds would be to marry the two: the academic internet! Google does have Google Scholar now, though it launched in November 2004, a few months after this article was written in June 2004. My understanding is that Google Scholar works by bringing up papers and publications known to be 'academic' in nature that fulfill the search request. If you are searching from an academic IP address (like Pitt!), it will sort things so that emphasis is given to information available through the databases that that IP address subscribes to. So, if you search Google Scholar from a Pitt computer, you are likely to retrieve full-text items that you could have found through a database available at Pitt, but with the comfort of the Google interface.
This article appears to focus on academic libraries indexing the academic internet and making it available. Essentially, they would be putting the "LIBRARIAN APPROVED!" stamp on it. This helps the uninitiated user tell an appropriate, trustworthy source from an inappropriate, untrustworthy one.
Labels: access, federated searching, week 8, weekly response, Z39.50
Friday, October 10, 2008
Muddiest Point
This is not about the lecture, but I do need clarification on the final project, so here's the place.
I know we are supposed to have variation among the 3 digital collections. Could we do 2 collections of digital photographs of objects that are unrelated to each other and then a 3rd collection of scanned material? Would that be varied enough?
Week 7: Access in Digital Libraries
Web Search Engines: Part 1
Problems for Web Searchers:
1. Infrastructure: must have enough computers and hardware to handle the volume of requests arriving in a given period of time.
2. Crawling algorithms: Bots that go around the internet and index it. Crawlers start with a list or queue of good 'seed URLs', sites that have lots of links to other good websites. They then add all the unseen links to the queue, and save the content for indexing. They keep doing this until they hit the end of the queue.
Speed: One crawler can't do the whole internet! Multiple crawlers must work in parallel, with URLs divided among them by a hashing function. Each crawler machine has internal parallelism as well, with multiple threads working at once.
Politeness: Don't harass the website's servers!
Excluded content: must check robots.txt to determine what content should not be crawled.
Duplicate content: Avoid it.
Continuous crawling: Have a priority queue so important URLs are checked more frequently than low-value/static URLs.
Spam: Prevent it! Blacklists, etc.
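The crawling steps above can be sketched as a breadth-first loop over a queue with a "seen" set. To keep it runnable offline, this assumes an in-memory link graph in place of real HTTP fetches; the host names, the `ToyBot` agent, and the function names are hypothetical. The robots.txt check uses Python's real `urllib.robotparser`, and CRC32 is just one possible hash for splitting URLs among crawlers.

```python
import zlib
from collections import deque
from typing import Dict, List
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Hash the host name so all URLs from one site go to the same crawler,
    which keeps per-site politeness in one place."""
    host = urlparse(url).netloc
    return zlib.crc32(host.encode("utf-8")) % num_crawlers


def crawl(seeds: List[str], links: Dict[str, List[str]],
          robots_txt: Dict[str, str], agent: str = "ToyBot") -> List[str]:
    """Breadth-first crawl over an in-memory link graph.

    `links` stands in for fetching a page and extracting its links;
    `robots_txt` maps host -> robots.txt text (hosts without one are open).
    """
    parsers: Dict[str, RobotFileParser] = {}
    queue = deque(seeds)
    seen = set(seeds)  # never enqueue the same URL twice
    fetched: List[str] = []
    while queue:
        url = queue.popleft()
        host = urlparse(url).netloc
        if host in robots_txt:
            if host not in parsers:
                parser = RobotFileParser()
                parser.parse(robots_txt[host].splitlines())
                parsers[host] = parser
            if not parsers[host].can_fetch(agent, url):
                continue  # excluded content: skip it
        # A real crawler would wait between requests to the same host
        # (politeness) and store the page for indexing here.
        fetched.append(url)
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched
```

Continuous crawling would replace the plain `deque` with a priority queue keyed on how important or how changeable each URL is.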
Web Search Engines: Part 2
Indexing algorithms: The indexer scans each document for indexable terms. These are then weighted by position and repetition to indicate their importance.
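A minimal sketch of a positional inverted index: each term maps to the documents it appears in and the positions where it occurs, which is the raw material for weighting by position and repetition. The tokenizer and the `term_weight` formula are illustrative assumptions only; real engines keep their weighting formulas to themselves.

```python
import re
from typing import Dict, List

Index = Dict[str, Dict[int, List[int]]]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[int, str]) -> Index:
    """Map each term to {doc_id: [positions where it occurs]}."""
    index: Index = {}
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return index


def term_weight(positions: List[int]) -> float:
    """Illustrative only: repetition adds weight, and an early first
    occurrence (e.g. in a title) adds a bonus."""
    return len(positions) + 1.0 / (1 + positions[0])


index = build_index({1: "Digital libraries preserve digital objects",
                     2: "Libraries lend books"})
print(index["digital"])                 # {1: [0, 3]}
print(term_weight(index["digital"][1]))  # 3.0
```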
Real indexers:
Scaling up: divide the load among many machines, and fill up memory space with partial inverted files, and then combine the partials.
Term look up: So many phrases, so little time. Engines use trees, hierarchies and 2 level structures to make things more efficient.
Compression: Save space, compress data structures. This also makes searches faster.
Phrases: produce lists of common phrases.
Anchor text: words used to describe a link. A strongly repeated anchor text gives a good clue as to what the website is about.
Link popularity: the more people link to you, the better you are. This is proof that life really is a popularity contest, no matter what your mother told you.
Query-independent score: a page that scores highly on measures unrelated to the query (such as link popularity) can rank better, even if it matches the query less well.
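Two of the "real indexer" tricks above, scaling up and compression, fit in one small sketch: index each batch of documents separately, merge the partial inverted files, then store each postings list as gaps between sorted document IDs in variable-byte form. Gap plus variable-byte coding is one common textbook scheme, assumed here; the readings do not say which scheme any engine actually uses.

```python
from typing import Dict, List

Postings = Dict[str, List[int]]


def partial_index(docs: Dict[int, str]) -> Postings:
    """Index one batch of documents: term -> sorted doc IDs."""
    index: Postings = {}
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index


def merge_partials(partials: List[Postings]) -> Postings:
    """Combine partial inverted files into one."""
    merged: Postings = {}
    for part in partials:
        for term, doc_ids in part.items():
            merged.setdefault(term, []).extend(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def vbyte_encode(doc_ids: List[int]) -> bytes:
    """Encode gaps between sorted IDs, 7 bits per byte; the high bit marks
    the last byte of each number."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        chunk = []
        while True:
            chunk.insert(0, gap % 128)
            if gap < 128:
                break
            gap //= 128
        chunk[-1] += 128
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    ids: List[int] = []
    n = prev = 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte
        else:
            n = n * 128 + (byte - 128)
            prev += n
            ids.append(prev)
            n = 0
    return ids
```

Small gaps fit in a single byte, which is why the IDs are stored as differences rather than as raw numbers.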
   
Query processing algorithms: a simple query processor looks up each query word in its dictionary and locates its postings list. It then scans the postings lists for documents they have in common.
Make them faster! Skip unnecessary parts of the lists, end the results list early, and number the documents in decreasing order of query-independent score. Another option: cache! Precompute and store HTML results pages for popular searches, and spit these out upon request.
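A minimal sketch of that query processor with two of the speed-ups: because doc IDs are assumed to be numbered in decreasing query-independent score, scanning can stop after the first `k` matches, and `functools.lru_cache` stands in for a cache of precomputed results pages. The postings lists and `search` are hypothetical, and Python sets stand in for the skip structures a real engine would use.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical postings lists. Doc IDs were assigned in decreasing order of
# query-independent score, so lower IDs are the "better" documents.
POSTINGS: Dict[str, List[int]] = {
    "digital": [1, 2, 4, 7, 9],
    "library": [2, 3, 4, 9],
    "economics": [4, 5, 9],
}


@lru_cache(maxsize=1024)  # stands in for cached results pages
def search(query: str, k: int = 10) -> Tuple[int, ...]:
    terms = query.lower().split()
    if not terms:
        return ()
    # Drive the scan from the shortest postings list and check membership
    # in the others.
    lists = sorted((POSTINGS.get(term, []) for term in terms), key=len)
    others = [set(postings) for postings in lists[1:]]
    hits: List[int] = []
    for doc_id in lists[0]:
        if all(doc_id in other for other in others):
            hits.append(doc_id)
            if len(hits) == k:
                break  # end early: remaining matches have lower query-independent scores
    return tuple(hits)


print(search("digital library"))  # (2, 4, 9)
```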
Henzinger:
The first part of this article focused on the same issues related to web search engines as the previous two articles.
Where it differed was in the third section:
Content Quality: How do you deal with wrong or misleading information? This is a topic that has occupied librarians' attention for a while. We produce guides, tutorials and lists on how to filter out the 'junk' on the internet. It's easy to forget that search engines try to help out with that too. The question is not whether someone has tricked the search engine into giving results that are inappropriate to the query, but whether the information provided is correct, even if it answers the query. PageRank and HITS are good measures, but not perfect. Anchor text might be useful, but junky websites can still have quality links. The most plausible approach is text-based analysis.
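As a point of reference for the link-based measures, here is a minimal PageRank sketch using power iteration. It assumes the commonly cited damping factor of 0.85 and treats a page with no outlinks as linking to every page; it is a textbook simplification, since the engines' actual ranking formulas are secret. HITS is not shown.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Each page repeatedly shares its rank among the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank
```

In the usage `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})`, page `c` ends up ahead of `b` because it is linked from two pages instead of one.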
Quality Evaluation: Measure the number of clicks a given result gets, and then the number of click-throughs from that website.
Web Conventions: These habits of websites must be adhered to for the search engine to be able to use them correctly.
Anchor Text: Text in the links describes the link.
Links: appropriate and interesting links for the website's audience, related to the website's content.
Meta tags: like metadata in a library catalog, meta tags in a webpage can describe the site's content.
Duplicate hosts: multiple domain names resolve to the same site for increased visibility. This is why typing in "pubmed.gov" sends you to http://www.ncbi.nlm.nih.gov/sites/entrez/. This is called a mirror. Search engines run the risk of providing results for each of those names, even though they have identical content. However, duplicates can be hard to recognize if the ads on the pages differ slightly from one viewing to the next, and if a crawl of the full site is incomplete, the hosts might not appear to be duplicates. A good way to avoid them is to predict whether similar domain names are likely to be duphosts.
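One way around the varying-ads problem is to compare sampled pages by overlapping word sequences (shingles) and Jaccard similarity instead of exact equality. This sketch assumes the same few paths have already been fetched from each host; the 4-word shingles, the 0.8 threshold, and the function names are illustrative assumptions, not Henzinger's method.

```python
from typing import Dict, List, Set, Tuple

Shingles = Set[Tuple[str, ...]]


def shingles(text: str, k: int = 4) -> Shingles:
    """All runs of k consecutive words in the page text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: Shingles, b: Shingles) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def likely_duphosts(samples: Dict[str, Dict[str, str]],
                    threshold: float = 0.8) -> List[Tuple[str, str]]:
    """`samples` maps host -> {path: page text}. Two hosts are flagged when
    their shared paths are, on average, near-duplicates."""
    hosts = sorted(samples)
    pairs: List[Tuple[str, str]] = []
    for i, first in enumerate(hosts):
        for second in hosts[i + 1:]:
            shared = set(samples[first]) & set(samples[second])
            if not shared:
                continue
            scores = [jaccard(shingles(samples[first][p]), shingles(samples[second][p]))
                      for p in shared]
            if sum(scores) / len(scores) >= threshold:
                pairs.append((first, second))
    return pairs
```

Comparing every pair of hosts is expensive, which is why predicting likely duphosts from similar names first (for example, names that differ only by "www.") is attractive.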
Vaguely-structured data: Prose on a website that is marked up with HTML to affect how it is seen by the viewer. This HTML can give clues to the website. Large text followed by small text can imply that the small text is further details about the large text. Pages with an image in the upper left are often personal pages. Pages with more meta mistakes are likely to be of lower quality.
These readings gave me some interesting insight into how search engines work. I know that none of them are specific, because the actual algorithms are tightly guarded secrets. But they do give a clue as to why we get the results that we do, and just how hard the programmers work to fight off the spammers and such.
Friday, October 3, 2008
Week 6: Preservation in Digital Libraries
Research Challenges in Digital Libraries
We must research digital libraries in order to get a grasp on where we can take them. They are too widespread and heterogeneous for us to really understand everything that's going on at the moment. We also need to figure out how to preserve the digital libraries as they are now for future study.
Big Issues:
1. We must figure out how to deal with all the digital libraries and preserve them while using humans as infrequently as possible.
2. We must protect the digital archives now. They require a lot of effort to maintain, so we must find a way to do that while, again, using humans as infrequently as possible.
3. We need to look at economic and business models of digital libraries to see how we can maintain these things in ways beyond technology. How can we afford to keep them up?
4. In order to expand the usefulness of digital libraries, new technologies need to be created. This needs to happen in order to make DLs cheaper while using humans as infrequently as possible.
5. We need shared and scalable infrastructure to support digital libraries. Sequestering them within institutions prevents interoperability and scalability, which hinders the usefulness of digital libraries.
Open Archival Information System Reference Model: An Introductory Guide
Open: reference model was developed in an open public forum: anyone could participate.
Archival Information System: people and institutions who agree to preserve info and make it available.
An OAIS must:
1. Get the appropriate information.
2. Make sure they have long term control of the information.
3. Know their user community.
4. Have appropriate metadata for the user to understand the info.
5. Make sure information is totally preserved.
6. Make it available to the user.
Tasks of OAIS:
-Ingestion (of data)
-Preservation Planning
-Data Management
-Archival Storage
-Administration
-Access (of data to user)
Types of information packages:
-Submission Information Package
-Archival Information Package
-Dissemination Information Package
This model provides a formula for digital library producers to follow. By doing so, they could produce an efficient, effective digital library. The paper does not provide any guidance on the technology or infrastructure to make this happen, but it does provide the guideposts of what sorts of things the technology and infrastructure must do.
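Because the guide describes functions rather than technology, the following is purely an illustration of how the SIP → AIP → DIP flow might be modeled. The class and function names are hypothetical, and a plain dict stands in for Archival Storage. Using a SHA-256 checksum as the AIP's fixity information is an assumption of this sketch.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SubmissionPackage:      # SIP: what a producer hands to the archive
    identifier: str
    content: bytes
    descriptive_metadata: dict


@dataclass
class ArchivalPackage:        # AIP: what the archive keeps for the long term
    identifier: str
    content: bytes
    descriptive_metadata: dict
    fixity_sha256: str        # checksum used to detect later corruption
    provenance: list = field(default_factory=list)  # history of what happened to it


@dataclass
class DisseminationPackage:   # DIP: what a consumer receives
    identifier: str
    content: bytes
    descriptive_metadata: dict


def ingest(sip, storage):
    """Ingest: turn a SIP into an AIP and place it in archival storage."""
    aip = ArchivalPackage(
        identifier=sip.identifier,
        content=sip.content,
        descriptive_metadata=dict(sip.descriptive_metadata),
        fixity_sha256=hashlib.sha256(sip.content).hexdigest(),
        provenance=["ingested " + datetime.now(timezone.utc).isoformat()],
    )
    storage[aip.identifier] = aip
    return aip


def access(identifier, storage):
    """Access: confirm the AIP is intact, then hand the consumer a DIP."""
    aip = storage[identifier]
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
        raise ValueError(f"fixity check failed for {identifier}")
    return DisseminationPackage(aip.identifier, aip.content, dict(aip.descriptive_metadata))


storage = {}  # a plain dict standing in for Archival Storage
ingest(SubmissionPackage("news-0001", b"page image bytes", {"title": "Sample Gazette"}), storage)
dip = access("news-0001", storage)
print(dip.identifier, dip.descriptive_metadata)
```

The three package types are deliberately separate. What the archive stores (the AIP) can carry extra preservation information that producers don't supply and consumers don't need.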
Preservation Management of Digitized Materials
- The authors state that guidance is needed for digital preservation. It seems to be a recurring theme.
This book is too extensive to take notes on in much detail. However, it is an extremely interesting and useful guide that helps a novice in digital libraries get a handle on the field. It introduces the reader to the vocabulary and explains why this information is vital. It describes how digital libraries are made, who uses them, and what the rules and requirements are. It also provides models for institutions to follow as they delve into this realm. This is a very new world, and many librarians are long out of library school, so a resource like this, perhaps with additional instruction, can help them get up to speed. Staying abreast of technological developments is important, and digital libraries are a huge part of that.
The National Digital Newspaper Program (NDNP) is an effort to "Chronicle America" by digitally preserving printed newspapers. It "also has a digital repository component that houses the digitized newspapers, supporting access and facilitating long-term preservation. Taking on access and preservation in a single system was both a deliberate decision and a deviation from past practices at LC." The authors wrote this paper to discuss the work done so far. Specifically, they discuss the preservation threats the project encountered over two years.
Types of failures:
Media- Failures of the portable hard drives used to transport the digital images from the awardees to LC. These were addressed by performing 'fixity checks' as part of the transfer process. Each awardee also kept a copy until it was verified that LC had received the data.
Hardware- Internal hard drives failed. They avoided data loss by using RAID 5 arrays with a hot spare, so the failure of a single drive did not lose data. Data was lost only when a second event occurred in an array while the system was rebuilding the failed drive onto the hot spare.
Software- Three software problems occurred. The first was a validation problem: records were put into the NDNP repository that had passed validation but 'did not conform to the appropriate NDNP profile'. This was fixed with new validation rules. The second was more problematic. During transformation of the newspaper title records, XML was stripped out of the original METS records, and the process was also producing invalid METS records. This broke the application and made parts of the data unreadable. The third problem occurred when the XFS file system was corrupted, which caused data loss. In a large, complex system such as this, problems are harder to prevent and harder to diagnose when they occur. This is a serious flaw of huge digital libraries.
Operator- One error occurred when a series of files was accidentally deleted. Another occurred when the operator accidentally ingested the same batches multiple times, or perhaps did not purge a successful ingest before re-ingesting it. This produced many duplicates.
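A fixity check comes down to computing a checksum before a transfer, computing it again afterward, and comparing the two. The paper's actual tools aren't described here. This sketch assumes SHA-256 checksums and files held in memory as a name → bytes dict; real code would stream the files from disk. The same digests can also flag byte-identical files, which is one way duplicate ingests like the operator error above could be caught.

```python
import hashlib
from collections import defaultdict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_manifest(files: dict) -> dict:
    """Before shipping: record a checksum for every file (name -> bytes)."""
    return {name: sha256_hex(data) for name, data in files.items()}


def verify(manifest: dict, received: dict) -> list:
    """After arrival: names that are missing or whose checksum changed."""
    bad = []
    for name, expected in manifest.items():
        data = received.get(name)
        if data is None or sha256_hex(data) != expected:
            bad.append(name)
    return sorted(bad)


def find_duplicates(files: dict) -> list:
    """Groups of names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, data in files.items():
        by_digest[sha256_hex(data)].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


sent = {"batch1/0001.tif": b"page one", "batch1/0002.tif": b"page two"}
manifest = make_manifest(sent)
arrived = {"batch1/0001.tif": b"page one", "batch1/0002.tif": b"page tw0"}  # one changed byte
print(verify(manifest, arrived))  # ['batch1/0002.tif']
```

Keeping the sender's copy until `verify` returns an empty list mirrors the procedure the paper describes for the awardees.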
The conclusions of the paper are that in a huge task such as this, errors are going to occur in many different ways, no matter what one does to protect against them. This makes performing a large digitization project extremely daunting, since one of the tasks is to make sure that the files are not only accessible but also permanently preserved.
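The hardware failure mode described above follows from how RAID 5 works. Each stripe stores one parity block, which is the XOR of its data blocks, so any single missing block can be recomputed from the others. Here is a toy sketch of that arithmetic. Real RAID 5 rotates the parity block across drives and works on disk sectors, not Python bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Data blocks followed by their parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, failed_index):
    """Recompute one lost block (e.g. onto a hot spare) from the survivors."""
    survivors = [b for i, b in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])  # three data "drives" plus parity
print(rebuild(stripe, 1))                          # b'efgh'
```

If a second block is lost before the rebuild finishes, there are two unknowns and only one parity equation, so the stripe can't be recovered. That is the situation in which the project lost data.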
 This is Katie's favorite person in the world. His name is Kevin. Yes, all 3 of us have K names. It was not planned: Katie came prenamed, and we didn't have any choice in our names.
Labels:
Digital library,
preservation,
week 6,
weekly response
Muddiest Point 4
In XML, it seems like there are multiple ways of structuring things to get the same result. Are these rules hard and fast, or fairly soft?
Sunday, September 28, 2008
Flickr Assignment
To see some cool pictures of Batman comic book covers, check out this URL:
http://www.flickr.com/photos/30893186@N05/
Friday, September 26, 2008
Muddiest Point 3
Week 5: XML Galore!
"Introducing the Extensible Markup Language"
XML is extensible: it can be altered and added to indefinitely to tweak the language to suit the needs of the user. This makes it a robust language to use for digital libraries. As things change, XML can accommodate the changes without requiring a total overhaul of the system. Libraries like things that work that way, because it doesn't require them to reinvent the wheel. It is also useful for metadata, because the tags can be used for labeling different types of metadata.
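Here is a small illustration of that extensibility using Python's built-in `xml.etree.ElementTree`: code written for an older record format keeps working after a new element is added. The element names are invented for the example and don't follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET

RECORD_V1 = """<record>
  <title>Sample Title</title>
  <creator>Sample Author</creator>
</record>"""

# Later, the library decides to record rights information as well.
RECORD_V2 = """<record>
  <title>Sample Title</title>
  <creator>Sample Author</creator>
  <rights>Public domain</rights>
</record>"""


def read_basic(xml_text):
    """Existing code: only knows about <title> and <creator>."""
    root = ET.fromstring(xml_text)
    return root.findtext("title"), root.findtext("creator")


print(read_basic(RECORD_V1))                        # ('Sample Title', 'Sample Author')
print(read_basic(RECORD_V2))                        # same result; <rights> is simply ignored
print(ET.fromstring(RECORD_V2).findtext("rights"))  # new code can use the new element
```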
"A Survey of XML Standards" is a good reference source for the many standards built around XML because it points to other resources for further instruction. The sheer number of standards illustrates the extensibility of XML.
"Extending your Markup" is an interesting and short overview of how XML works. Again, it is a good resource for a novice to look at to get started in this new world.
Major definitions:
DTD: Document Type Definition. A DTD declares what kind of information a given field holds, such as author, and defines the structure of the XML document.
DTD elements:
Nonterminal: they contain other elements, arranged as sequences or choices. A DTD defining a book might declare a sequence of child elements such as author.
Terminal: they do not contain other elements. Their content may be parsed character data (#PCDATA), or they may be declared EMPTY or ANY.
DTD attributes: do not prescribe any order, but provide further information about an element
Namespaces: to prevent conflict between two fields that use the same tag but in different contexts (email address vs. postal address) namespaces define the two as distinct. Do not play well with DTDs.
Linking: Goes beyond HTML to describe different types of linking
XLink: describes how 2 documents can be linked
XPointer: links 2 parts of the same document.
XPath: (used by XPointer) describes the linking path
XSLT: Extensible Stylesheet Language Transformations: transforms XML documents into other formats, such as HTML.
XML Schema: overcomes the limitations of DTDs (limited expressiveness and non-XML syntax)
Document definition markup language (DDML): define datatypes
Document content description (DCD)
Schema for object-oriented XML (SOX)
XML-Data (replaced by DCD)
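The email-vs-postal-address namespace example can be made concrete with `xml.etree.ElementTree`, which also supports a limited subset of XPath in `find`/`findall`. The namespace URIs below are placeholders, not real vocabularies.

```python
import xml.etree.ElementTree as ET

DOC = """<contact xmlns:email="http://example.org/ns/email"
         xmlns:post="http://example.org/ns/postal">
  <name>Sample Person</name>
  <email:address>someone@example.org</email:address>
  <post:address>123 Example Street</post:address>
</contact>"""

NS = {"email": "http://example.org/ns/email", "post": "http://example.org/ns/postal"}
root = ET.fromstring(DOC)

# Same local name "address", but two different namespaces, so no clash.
email_address = root.find("email:address", NS).text
postal_address = root.find("post:address", NS).text

# Internally ElementTree spells each tag as "{namespace-uri}local-name".
print([child.tag for child in root])
# XPath-style wildcard: every <address>, whatever its namespace (Python 3.8+).
print([el.text for el in root.findall("{*}address")])
```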
"Introduction to XML schema"
Schemas replace DTDs! They do the same things, such as defining elements, defining child elements, and defining the order of the elements. However, they are more extensible, richer, and more powerful. They support data types and namespaces, and they are themselves written in XML. Essentially, they perform the same function as DTDs, only better.
Friday, September 19, 2008
Muddiest Point week 3
Who assigns a DOI? Is it the creator of the digital object, or an outside organization?
Week 4: META DATA GALORE!
Witten:
Bibliographic systems:
1. Finding: locate item with known info.
2. Collocation: finding other things related to this item, such as other books the author has written.
3. Choice: A list of other available options arranged graphically (other editions) or topically (similar subjects).
Bibliographic entities
1. Documents: analog or digital form
2. Works: inhabitants of bibliographic universe: can have different forms, mediums and editions
3. Editions: multiple publications, revisions. Electronic form is usually a version, release or revision not an edition
4. Authors: Can have different names, numbers of authors, versions of name, can be a group or entity like the LOC. The LOC provides controlled vocabulary and standard names to clear up any problems.
5. Titles: straightforward attribution of the work
6. Subject: key-phrase extraction or key-phrase assignment. LOC uses a controlled vocabulary (LCSH) to standardize subject assignment.
7. Subject classification: organizing books on the shelf by subject. The LC call number system does this, as does Dewey. This allows the user to physically browse the shelves and gain access to the full content to choose materials.
Bibliographic Metadata
1. MARC: Machine-Readable Cataloging: organizes information using numerical tags
2. Dublin Core: same concept, but simplified without all the numerical tags.
3. BibTeX: preferred by scientific and technical authors who use a lot of mathematical structures.
4. Refer: basis of EndNote
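To make the contrast with MARC concrete: MARC identifies the title by the numeric tag 245, whereas Dublin Core uses named elements such as `title`, `creator`, and `date` in the namespace `http://purl.org/dc/elements/1.1/`. The following sketch builds a simple record with `xml.etree.ElementTree`. The `<metadata>` wrapper element and the field values are placeholders, not a prescribed container format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the conventional "dc:" prefix


def dublin_core_record(fields):
    """fields maps a Dublin Core element name to a list of values (elements may repeat)."""
    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": ["Sample Title"],
    "creator": ["Sample Author", "Second Author"],
    "date": ["2008"],
})
print(xml_text)
```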
Metadata for images, etc
1. Tagged Image File Format: TIFF. Used for images. Tags describe elements of the image, such as size, colors, etc.
2. MPEG-7: multimedia content description interface. Tags describe the data in the file.
Extracting Metadata
1. Reading the document helps one understand it.
2. Markup languages give clues as to the content without reading the full document: XML, etc.
3. Extracting information: generic entity extraction can pull information out using clues in the text
4. Bibliographic references: provide information in the form of citations. A citation index, such as a 'works cited' page organizes these.
Setting the Stage
This article reviews several of the metadata systems already covered. What it adds is an explanation of why metadata, its structure, and its organization need to extend to museums and archives, especially as those institutions move to digital resources. Metadata is second nature to libraries because they have been creating it for generations. Their analog metadata can easily be transcribed into digital systems when the items are digitized. Archives and museums, however, have resisted using metadata and instead rely on 'finding aids' to locate their items. This prevents amateur users from finding items independently, and it stands in the way of digitization. This situation needs to be rectified for these institutions to move into the digital age.
Border Crossings
This D-Lib article looks back on the past 10 years of work by the DCMI (Dublin Core Metadata Initiative) management team. It argues that a universal, international system of metadata management is necessary. The more information and metadata become digitized and accessible over the Internet, the more important it is for the systems to be able to speak to each other. An overarching goal of libraries has always been to share information with each other and make materials as accessible as possible to patrons everywhere. The Internet provides the infrastructure to make that happen, but for it to work, all the different systems must be able to communicate. This article focuses on the metadata aspect of that. I found the comparison to the change of rail gauge at the border between Mongolia and China especially applicable. Two libraries do not have the centuries of animosity that those nations do, so communication between their systems certainly shouldn't be that complicated.
Puppy picture!
 A tired puppy is a good puppy.
Friday, September 12, 2008
Week 3 Readings
Lesk Ch. 2
Computer typesetting:
1. Printers
2. Word processing
a. exact appearance of the text
b. content of the text
Text Formats
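1. ASCII standard: 7-bit code (128 characters) covering the unaccented Latin letters, digits, and punctuation
2. Unicode is gaining popularity: covers the characters of all major languages. It was originally designed as a 16-bit-per-character code, but it now defines more characters than 16 bits can hold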
3. Higher level descriptive systems: characters are marked for meaning
a. MARC: Machine-Readable Cataloging
b. SGML: Standard generalized Markup Language
c. HTML: Hypertext Markup Language
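The difference between these encodings is easy to see by encoding a few characters with Python's built-in codecs. The characters are arbitrary examples, and the sketch just counts bytes.

```python
def encoded_sizes(text):
    """Bytes needed to store `text` in ASCII, UTF-8, and UTF-16."""
    try:
        ascii_size = len(text.encode("ascii"))
    except UnicodeEncodeError:
        ascii_size = None  # at least one character is outside ASCII's 128 codes
    return {
        "ascii": ascii_size,
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # "-le" variant: no byte-order mark
    }


# Latin A, e-acute, a CJK character, musical G clef
for ch in ["A", "\u00e9", "\u4e2d", "\U0001D11E"]:
    print(repr(ch), hex(ord(ch)), encoded_sizes(ch))
```

The G clef (U+1D11E) lies outside the first 65,536 code points, so UTF-16 needs two 16-bit units (4 bytes) to store it.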
Document Conversion: analog to digital forms
1. Keying in: expensive
2. Scanning: less expensive
a. Optical character recognition: improving reliability
3. Converted documents can then be made online: digital libraries!
Arms Ch. 3
1. Structure: elements of the document: font, characters, paragraphs, etc
2. Appearance: How the elements are arranged on the page
3. Page-description languages: describe appearance on the page. TeX, PostScript, PDF
4. Encoding characters: ASCII, Unicode, transliteration, SGML, HTML (an application of SGML), XML (a simplified subset of SGML that bridges SGML and HTML)
5. style sheets (formatting on screen/printed page)
a. Cascading style sheets (CSS): used with HTML
b. Extensible Stylesheet Language (XSL): used with XML
6. Page description languages: layout
a. TeX: focus on mathematics
b. PostScript: graphical output for printing, with support for fonts
c. Portable document format (PDF): from PostScript. Similar attributes to reading paper, but on the screen. Can limit unlawful printing. Adobe provides excellent, free PDF readers, making the format widely accepted.
Identifiers and Their Role In Networked Information Applications
1. ISBN, ISSN, OCLC, RLIN: make locating a given object easy.
2. New identifiers are emerging in the electronic world: URLs and URNs
a. URLs: not long lasting locators, very ephemeral.
b. URN: naming authority identifier and object identifier
c. OCLC persistent URL (PURL): maintained for a much longer time than regular URLs, so it is less likely to produce dead links.
d. Serial Item and Contribution Identifier (SICI): using the ISSN, can identify an individual journal issue or article.
e. Book Item and Contribution Identifier (BICI): can identify individual volumes or chapters within a work.
f. Digital object identifier (DOI): based on the URN idea. Can allow copyright limitations to control who has what kind of access
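Identifiers like the ISBN make locating an object reliable partly because they carry a check digit, which catches most typing mistakes before a lookup ever happens. This sketch implements both ISBN check-digit rules. The sample numbers were chosen only because their check digits work out.

```python
DIGITS = set("0123456789")


def is_valid_isbn10(isbn):
    """ISBN-10: weights 10..1, total divisible by 11; 'X' stands for 10 in the last place."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) != 10 or not set(s[:9]) <= DIGITS or not (s[9] in DIGITS or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def is_valid_isbn13(isbn):
    """ISBN-13: alternating weights 1 and 3, total divisible by 10."""
    s = isbn.replace("-", "").replace(" ", "")
    if len(s) != 13 or not set(s) <= DIGITS:
        return False
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


print(is_valid_isbn10("0-306-40615-2"))      # True
print(is_valid_isbn10("0-306-40615-3"))      # False: last digit mistyped
print(is_valid_isbn13("978-0-306-40615-7"))  # True
```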
Digital Object Identifier
1. DOI is the digital identifier of an object, not the identifier of a digital object. It is a persistent identifier.
2. It includes: syntax (the name), resolution of the name to the object, metadata describing the object, and a social infrastructure (policies) that supports interoperability
3. DOI does not preserve the object: it only provides a way of sharing information about the object.
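A DOI's syntax is a prefix and a suffix separated by a slash. The prefix always begins with `10.` and identifies the registrant, and the registrant chooses the suffix. Resolution goes through a proxy such as `https://doi.org/`. The sketch below parses a DOI and builds its resolver URL. The DOI is made up, and the code only builds the URL; it does not contact the resolver.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"


def parse_doi(doi):
    """Split a DOI into (prefix, suffix), accepting an optional 'doi:' label."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, slash, suffix = doi.partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi):
    """URL at which the resolver would redirect to the object's current location."""
    prefix, suffix = parse_doi(doi)
    return RESOLVER + prefix + "/" + quote(suffix, safe="/")  # percent-encode unsafe characters


print(parse_doi("doi:10.1234/example.5678"))  # ('10.1234', 'example.5678')
print(resolver_url("10.1234/example.5678"))   # https://doi.org/10.1234/example.5678
```

Because the URL is built from the DOI rather than stored, the identifier stays the same even when the object moves. Only the resolver's record has to be updated.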
These 4 readings are all centered around communicating meaning about a given object or text. The characters on the page don't mean anything to a computer, so it is necessary to tag them and use appropriate languages so that you can convey that meaning to the computer. When you do that, the computer can organize it in the way you want.
Affixing meaning also applies to identifiers. Without a good identifier, a given object will be very difficult to find. Providing an identifier like a DOI not only helps the user to access the object, but it also provides other information about the object that is translatable across a variety of mediums. This means that the record will be persistent.
All of this applies to digital libraries. What is the point of having a digital library if you can't find what you are looking for? Or if you may have found what you're looking for, but you're not quite sure if it is without looking at the entire object? Providing information about a given object is absolutely vital in any library, including digital libraries.
And, here is an entirely gratuitous puppy picture, for those who are interested.
 We took her camping in Fayette county a few weeks ago. There was a lake there and she swam and swam and swam. She's a water dog, you might say.
Look at those little paws paddling! awwww.
Friday, September 5, 2008
Muddiest Point 1
This is a muddiest point about muddiest points. Do we have to post a muddiest point about the lecture, or can it also be the readings? We didn't have a lecture this week, so obviously this muddiest point is not about the lecture. Speaking of, are the readings for this week supposed to go with next week's lecture? Did I post my response to the readings too early? Does it not really matter?
Week 2 Response
First of all, the over arching theme of these readings is interoperability. A large emphasis is on interchangeable parts: different tools that can be exchanged and used as needed by multiple types of digital libraries. This makes sense, and is a concept that has been around for a long time. Car manufacturers save time, money and effort by building their engines and cars with a lot of parts that can be used in as many of their products as possible. By making sure that every car in their 2008 fleet uses widget A to complete task 1, they can make a whole lot of widget A's all at once and put them in every car. If some cars used widget A, others used widget B and the rest used widget C to complete task 1, they would have to make widget A's, B's and C's, and each of them would require a different factory or machine to produce. That raises the cost of completing task 1. It's what one might call 'reinventing the wheel'.
With this in mind, it is completely logical to carry this concept into the digital library environment. Why reinvent the wheel? Obviously, different digital libraries are going to have different requirements, so they can pick and choose their widgets cafeteria-style. This lowers the cost of developing a digital library, which is why the Suleman article discusses software toolkits for building digital libraries.
Furthermore, a common language allows different digital libraries to talk to each other. This concept is not new to libraries. Much of the technology they produced before the digital age was focused on sharing information between libraries. Union catalogs served this purpose by letting people know what various libraries had available. Bibliographies helped libraries know what was new in their particular field. In the digital universe, libraries sharing what they have and letting their collections communicate is a logical extension of this philosophy. The Payette article gives definitive protocols for the interoperability of digital library systems, along with evidence of their success.
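The "common language" idea can be sketched in a few lines of Python. This is not the protocol from the Payette article; it is a hypothetical `Repository` interface, with two invented repositories that store their data differently but both answer the same `search()` call, so a union-catalog-style function can query both without knowing how either one works inside.

```python
from typing import Protocol


class Repository(Protocol):
    """Hypothetical shared interface: anything with a name and search() can join."""
    name: str

    def search(self, term: str) -> list[str]: ...


class PhotoArchive:
    """Hypothetical repository that keeps items as a dict of id -> caption."""

    def __init__(self, name: str, items: dict[str, str]) -> None:
        self.name = name
        self._items = items

    def search(self, term: str) -> list[str]:
        return [item_id for item_id, caption in self._items.items() if term.lower() in caption.lower()]


class TextArchive:
    """A differently built hypothetical repository: a list of (id, title) pairs."""

    def __init__(self, name: str, records: list[tuple[str, str]]) -> None:
        self.name = name
        self._records = records

    def search(self, term: str) -> list[str]:
        return [rec_id for rec_id, title in self._records if term.lower() in title.lower()]


def union_search(repos: list[Repository], term: str) -> dict[str, list[str]]:
    """Ask every repository the same question through the shared interface."""
    return {repo.name: repo.search(term) for repo in repos}


photos = PhotoArchive("photos", {"p1": "River bridge, 1910", "p2": "Parade on Main Street"})
texts = TextArchive("texts", [("t1", "Bridge construction report"), ("t2", "Town council minutes")])

print(union_search([photos, texts], "bridge"))  # {'photos': ['p1'], 'texts': ['t1']}
```

Either archive could be swapped for another component that offers the same `search()` method, which is the interchangeable-parts argument from earlier in this post.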
Now. Is the Internet a digital library? It is a collection of data and information, in a digital format, that is stored on various servers and can be searched and accessed. By that definition, it is a digital library. However, the Internet is not maintained by a given body or individual. It is full of wrong information and a lot of the good information is hard to find. Much of it has restricted access. Amazon.com has servers storing a lot of personal data, but users can't access it using Google.
One might say that the Internet is a 'bad' digital library. It has many characteristics that the authors of these articles are specifically trying to avoid, and problems that they are trying to overcome in digital libraries. It seems unfair to declare that something isn't a digital library at all just because it is a poor example of one. It is akin to saying that your daughter is not your child anymore because she misbehaved.
However, learning how to overcome these problems and develop robust digital library systems could revolutionize the Internet. Perhaps one day the recalcitrant child will grow up to be a fine, upstanding citizen!
Labels:
Digital library,
Internet,
interoperability,
week 2,
weekly response
Friday, August 29, 2008
Week 1 Responses
While this is my first foray into thinking and learning about digital libraries, I have used them before. Pitt has one that I have looked at, and my local library (Mt. Lebanon Public Library) has a digital library that they produced in connection with the local historical society to place historical photographs online.
I appreciated the definitions and preliminary explanations of digital libraries. I had not realized how nebulous the phrase 'digital library' really is. It can mean almost anything, depending on the situation! What one organization defines as a digital library might be entirely different from another organization's digital library.
The other major idea that intrigued me was mentioned at the end of the Paepcke article, when the author discusses how computer scientists and librarians are disillusioned and disappointed with how the technology has not met their expectations in the past 10 years. They expected it to change the world, and it really hasn't done that.
Logically, however, we should be aware that things really aren't going to change that much when a new technology is introduced. The human race is not exactly known for embracing change quickly, and generally abides by the theory, "if it ain't broke, don't fix it". The library system has been this way for 5,000 years with only minor changes. Librarians and library users are not going to embrace a drastic revolution in the way libraries are structured. The old system must integrate the new technology into its existing architecture, not the other way around.
Gradual change is for the better anyway. When the reforms that followed Vatican II (which opened in 1962) changed the entire structure of the Catholic Mass, the Catholic population (the users!) reacted poorly. Many people left the church because it wasn't what they were used to, and those who didn't were still disgruntled. To say the least, it didn't go well. We don't want to anger our patrons by changing everything around on them with no warning, introduction, or trial period. Then we will surely be out of jobs.
Tuesday, August 26, 2008
First Post

Hi! My name is Katrina Kurtz. This is my blog for Digital Libraries. I am posting this because I am compulsive and feel obligated to have something here until I do the first readings and can post about them.
In the meantime, here's my puppy! Her name is Katie. She's a pain in the butt, but she's cute so she makes up for it. This morning, I had the pleasure of chasing her around to retrieve the following items:
1. A shoe
2. A bag of sunflower seeds
3. My freshly stamped and sealed gas bill payment
4. A bottle cap
5. A pen
6. A plastic bag
After that, she fell asleep next to me, on her back, snoring. I couldn't be mad at her anymore.
