HT Ingest Service

From MPublishing


What is the service?

In order to ingest digitized books (or booklike volumes) into HathiTrust, each included image must have proper OCR, pageturner, and preservation metadata. Although books digitized by Google, Internet Archive, and other vendors already have these, locally digitized images may not. MPublishing has developed software and workflows that produce this metadata and begin the ingest process on behalf of HathiTrust partners.


Frequently Asked Questions

1. How is pricing structured?

Fees are structured in two tiers:

1-100 books: $5.00/book

101+ books: $4.50/book


(The first 100 books incur a slightly higher fee to account for one-time setup tasks.)
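As a rough illustration of how the two tiers combine, the sketch below estimates a project fee. It assumes the tiers are marginal, meaning the first 100 books are billed at $5.00 each and only the books beyond 100 are billed at $4.50 each; that reading follows from the note about one-time setup tasks but should be confirmed with MPublishing. The function and constant names are hypothetical.

```python
from decimal import Decimal

# Rates taken from the pricing tiers above.
FIRST_TIER_LIMIT = 100
FIRST_TIER_RATE = Decimal("5.00")
SECOND_TIER_RATE = Decimal("4.50")


def ingest_fee(book_count: int) -> Decimal:
    """Estimate the fee in dollars for a number of books.

    Assumption: tiers are marginal, so only books beyond the
    first 100 are charged at the lower rate.
    """
    if book_count < 0:
        raise ValueError("book_count must not be negative")
    first_tier_books = min(book_count, FIRST_TIER_LIMIT)
    second_tier_books = max(book_count - FIRST_TIER_LIMIT, 0)
    return (first_tier_books * FIRST_TIER_RATE
            + second_tier_books * SECOND_TIER_RATE)


if __name__ == "__main__":
    for count in (40, 100, 250):
        print(f"{count} books: ${ingest_fee(count)}")
```

Under this assumption, a 250-book project would be 100 books at $5.00 plus 150 books at $4.50.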


2. Does MPublishing offer volume discounts for larger projects?

No, not at this time.


3. What is the turnaround time?

Normally, books should appear in HathiTrust within 4-6 weeks of initial delivery, provided that they meet the agreed-upon image specifications.


4. Where can I see examples?

The Utah State University collection was ingested via this service.


5. What are the steps in the process?

1. If needed, convert the PDF to bitonal and contone TIFFs.

2. Send bitonal TIFFs to OCR.

3. Add the needed preservation headers to the TIFFs and convert contone TIFFs to JP2s.

4. Manually add necessary structural metadata for Pageturner, page by page.

5. Integrate the OCR, images, and pagetag data into a package and pass it to Core Services for ingest.
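Because the final step combines three per-page inputs, it can help to confirm that every page image has matching OCR and pagetag data before packaging. The sketch below is only an illustration of that check and does not describe MPublishing's actual software; the function name, the component labels, and the use of page identifiers such as "0001" are all assumptions.

```python
def missing_components(images, ocr, page_tags):
    """Map each incomplete page to the components it lacks.

    Each argument is an iterable of page identifiers (hypothetical
    format, e.g. "0001") for which that component exists. Pages are
    driven by the image list, since every image needs OCR and pagetag data.
    """
    components = {"ocr": set(ocr), "page_tags": set(page_tags)}
    gaps = {}
    for page in sorted(set(images)):
        missing = [name for name, pages in components.items()
                   if page not in pages]
        if missing:
            gaps[page] = missing
    return gaps


if __name__ == "__main__":
    gaps = missing_components(
        images=["0001", "0002", "0003"],
        ocr=["0001", "0003"],
        page_tags=["0001", "0002"],
    )
    for page, missing in gaps.items():
        print(f"page {page} is missing: {', '.join(missing)}")
```

An empty result would indicate that the volume is complete enough to package under these assumptions.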
