The latest HathiTrust Update (June 2012) is now online. Here are a few highlights. You can access the complete issue here.
- HathiTrust has updated its bibliographic metadata specifications and minimum bibliographic metadata requirements in preparation for moving to Zephir (under development by California Digital Library) as the bibliographic metadata management system for HathiTrust. The requirements are in effect immediately for institutions that have not previously deposited content in HathiTrust.
- The University of Michigan made the first iteration of tools available to aid institutions in transforming, validating, and packaging digital content for deposit in HathiTrust. The tools can be downloaded at http://www.hathitrust.org/ingest_tools.
- Michigan staff completed the majority of development necessary to support a new rights status in HathiTrust Web applications. The status will apply to works that were restored to being in copyright in the United States by the General Agreement on Tariffs and Trade (GATT), but are now in the public domain in the rest of the world. An increasing number of these volumes are being identified as part of CRMS-World, the IMLS-funded continuation of the CRMS project.
- HathiTrust continued working with Boston College and began working with Penn State and the University of Illinois on ingest of volumes digitized by the Internet Archive.
- California Digital Library refined the algorithm used to score spelling suggestions, drawing on queries extracted from HathiTrust log files, and improved how suggestions are made when a query contains stop words or words that have been incorrectly run together. The next step will be to experiment with making suggestions in different languages.
- Michigan removed a long-standing bottleneck in the full-text indexing process, effectively doubling throughput. Staff believe that, under ideal conditions, it should now be possible to index approximately 100,000 documents per hour.
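To put the indexing figure in context, here is a rough back-of-the-envelope sketch of how long a full re-index might take at the stated rate. It assumes that one volume corresponds to one indexed document and that "doubling" means the earlier rate was half the new one; neither assumption is confirmed by the update, and the function name is only illustrative.

```python
# Figures taken from this update. Treating one volume as one indexed
# document is an assumption made purely for illustration.
TOTAL_ITEMS = 10_408_905          # overall database size reported for June 2012
DOCS_PER_HOUR_NOW = 100_000       # approximate rate under ideal conditions
DOCS_PER_HOUR_BEFORE = DOCS_PER_HOUR_NOW / 2  # "doubling" read literally (assumption)


def full_reindex_hours(total_docs, docs_per_hour):
    """Return the hours needed to index total_docs at a constant rate."""
    return total_docs / docs_per_hour


hours_now = full_reindex_hours(TOTAL_ITEMS, DOCS_PER_HOUR_NOW)
hours_before = full_reindex_hours(TOTAL_ITEMS, DOCS_PER_HOUR_BEFORE)

print(f"Now:    {hours_now:.1f} hours (~{hours_now / 24:.1f} days)")
print(f"Before: {hours_before:.1f} hours (~{hours_before / 24:.1f} days)")
```

Under these assumptions a full pass drops from a little under nine days to a little over four, though real runs would rarely see ideal conditions throughout.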
Database Growth and Overall Size
- 50,193 volumes were added to the database during June
- Overall database size is now 10,408,905 items
- Public domain materials make up ~29% or 3,105,587 volumes
- Statistics and Visualizations
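The percentages behind these figures can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the variable names are illustrative, and the growth rate assumes the June additions are already included in the overall total.

```python
# Numbers quoted in the June 2012 update.
added_in_june = 50_193
total_items = 10_408_905
public_domain = 3_105_587

# Share of the collection that is public domain, as a percentage.
public_domain_share = 100 * public_domain / total_items

# Month-over-month growth, assuming the total already includes June's additions.
previous_total = total_items - added_in_june
monthly_growth = 100 * added_in_june / previous_total

print(f"Public domain share: {public_domain_share:.2f}%")
print(f"June growth: {monthly_growth:.2f}%")
```

The computed share comes out slightly above 29%, which suggests the "~29%" in the update is truncated rather than rounded.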
(Via LJ INFOdocket.)