Bowdoin College Library

Text Mining Databases: Library catalog records & metadata

FYI

This page lists library catalogs whose records can be text mined, along with other metadata (data describing data) that can be text mined.

Text from library catalog records

  • WorldCat: WorldCat Search API, including its Terms and Conditions. "WorldCat.org lets you search the collections of libraries in your community and thousands more around the world." "Search WorldCat and retrieve bibliographic records for cataloged items such as books, videos, music and more in WorldCat. Retrieve records in MARC XML or Dublin Core formats. Get item information in the standard bibliographic citation formats (APA, Chicago, Harvard, MLA, and Turabian). Find [out] about libraries that own an item and retrieve links to online catalog records when available."
  • MARC Open-Access, Library of Congress. "An 'open-access' provision, MDSConnect, [...] includes nearly 25 million MARC records, as distributed in the unabridged 2014 Retrospective file sets. These MDS record sets have been made available primarily for research and development usage."
  • Digital Public Library of America (DPLA) "DPLA connects people to the riches held within America’s libraries, archives, museums, and other cultural heritage institutions. All of the materials found through DPLA—photographs, books, maps, news footage, oral histories, personal letters, museum objects, artwork, government documents, and so much more—are free and immediately available in digital format." Access does not require facilitation by a librarian. See also DPLA's Developers page.
  • Catalog of U.S. Government Publications (CGP) "The finding tool for federal publications that includes descriptive information for historical and current publications as well as direct links to the full document, when available." Access does not require facilitation by a librarian. Cataloging records are also available from the usgpo/cataloging-records repository on GitHub.
  • Records from Compass. Compass is the catalog of the Colby, Bates & Bowdoin Libraries. Access requires facilitation by a librarian. Contact us for more information. (We are sorry, but due to the vendor's licensing restrictions, we cannot provide access to the images of book covers in Compass.)
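The WorldCat Search API description above mentions MARC XML output, and the Library of Congress MDSConnect files are also MARC records. As a minimal sketch of a first text-mining step, the Python below pulls titles (field 245, subfield $a) and topical subject headings (field 650, subfield $a) out of MARCXML using only the standard library. It assumes the records have already been saved locally as MARCXML in the `http://www.loc.gov/MARC21/slim` namespace; files in binary MARC would first need converting, for example with a library such as pymarc. The function names and the sample record are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC 21 XML (MARCXML) records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
# Trailing ISBD punctuation that catalogers often leave at the end of a subfield.
TRAILING_PUNCT = " /:;,."
RECORD_TAG = "{" + MARC_NS["marc"] + "}record"


def subfield_text(datafield, code):
    """Return the first subfield with this code, minus trailing punctuation, or None."""
    sub = datafield.find(f"marc:subfield[@code='{code}']", MARC_NS)
    if sub is None or not sub.text:
        return None
    return sub.text.strip().rstrip(TRAILING_PUNCT)


def extract_titles_and_subjects(marcxml):
    """Return a list of (title, subjects) tuples, one per MARCXML record.

    title    -> field 245 $a (title proper)
    subjects -> every field 650 $a (topical subject heading)
    """
    root = ET.fromstring(marcxml)
    # Accept either a single <record> or a <collection> wrapping several records.
    records = [root] if root.tag == RECORD_TAG else root.findall("marc:record", MARC_NS)

    results = []
    for record in records:
        title, subjects = None, []
        for field in record.findall("marc:datafield", MARC_NS):
            tag = field.get("tag")
            if tag == "245" and title is None:
                title = subfield_text(field, "a")
            elif tag == "650":
                heading = subfield_text(field, "a")
                if heading:
                    subjects.append(heading)
        results.append((title, subjects))
    return results


# Hypothetical one-record collection, invented for illustration.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example title :</subfield>
      <subfield code="b">a subtitle /</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Text processing (Computer science)</subfield>
    </datafield>
  </record>
</collection>"""

if __name__ == "__main__":
    for title, subjects in extract_titles_and_subjects(SAMPLE):
        print(title, subjects)
```

Headings gathered this way can be counted or compared across collections; other fields, such as 520 (summary note), can be added to the loop in the same way.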

Other metadata

  • The Internet Archive Metadata API "A non-profit library of millions of free books, movies, software, music, websites, and more." Access does not require facilitation by a librarian.
  • data.gov (U.S. General Services Administration) Descriptions of datasets from the U.S. federal government. Access does not require facilitation by a librarian. See also the Developers page and "Types of organizations that supply datasets to Data.gov." On the APIs: "The data.gov catalog is powered by CKAN, a powerful open source data platform that includes a robust API. Please be aware that data.gov and the data.gov CKAN API only contain metadata about datasets. This metadata includes URLs and descriptions of datasets, but it does not include the actual data within each dataset."
  • NextGen Catalog API, National Archives and Records Administration (NARA)
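The data.gov entry above notes that its CKAN API returns only metadata. The Python below is a minimal sketch of that workflow: it builds a query URL for CKAN's `package_search` action and reduces a response to titles, descriptions (CKAN's `notes` field), and the resource URLs where the actual data lives. The base URL `https://catalog.data.gov/api/3/action/package_search` is an assumption to check against data.gov's developer documentation, the sample response is made up for illustration, and `build_search_url` and `summarize` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode

# Assumed data.gov CKAN endpoint for the package_search action.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=10, start=0):
    """Build a package_search URL; q, rows, and start are standard CKAN parameters."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{CKAN_SEARCH}?{params}"


def summarize(response_text):
    """Reduce a package_search JSON response to (count, datasets).

    Each dataset is (title, description, resource_urls). These are metadata only:
    the URLs point to where the data lives; the data itself is not included.
    """
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    result = payload["result"]
    datasets = []
    for package in result.get("results", []):
        urls = [res["url"] for res in package.get("resources", []) if res.get("url")]
        datasets.append((package.get("title"), package.get("notes") or "", urls))
    return result.get("count", 0), datasets


# Made-up response in the general shape CKAN returns, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{
            "title": "Example dataset",
            "notes": "A placeholder description.",
            "resources": [{"url": "https://example.gov/data.csv"}],
        }],
    },
})

if __name__ == "__main__":
    print(build_search_url("library catalog", rows=5))
    print(summarize(SAMPLE_RESPONSE))
```

To query the live catalog, the URL from `build_search_url` can be opened with `urllib.request.urlopen` and the decoded body passed to `summarize`; CKAN's `start` and `rows` parameters page through larger result sets.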