Choosing a Publishing Model

From MPublishing

Current revision (13:31, 18 September 2013)


Selecting the best publishing model for your materials can be challenging and will probably require some fine-tuning beyond the scope of this document. This guide is meant to give you a sense of what Michigan Publishing can do with the materials you are able to supply and which of those options will suit you best as a finished product.

Note that the models described below apply to how articles are delivered, not to whether articles are published in issues or one at a time.

Page images versus digital text

Michigan Publishing can publish your content as either page images or digital text. In brief:

page images                                           | digital text
------------------------------------------------------+------------------------------------------------------
flip through individual pages                         | scroll through the whole text or a section of it
fine control of typography and typesetting            | regularized styling of content across your entire publication
limited to display of items on a page                 | ability to introduce hyperlinks, clickable footnote references, and high-resolution images
browsing a text is cumbersome without downloading PDF | easy to scroll through a text to browse it
full-text search varies in quality                    | full-text search highly accurate

Examples and more details on each model are given below.

What do you have, and what do you need?

To help you choose a publishing model, review all of the formats in which your materials currently exist. Be as specific as possible about the media (e.g., paper or electronic files, and which electronic file formats) and about the editing stage each format represents (e.g., do your PDFs, QuarkXPress or InDesign files, and/or Word documents contain all final revisions?):

Volumes, Issues, etc. | Available format(s) that match final proofs
_____________________ | ___________________________________________
_____________________ | ___________________________________________
_____________________ | ___________________________________________
_____________________ | ___________________________________________
_____________________ | ___________________________________________


Review whether you need Michigan Publishing to put your project online as a first-time publication or whether it already exists and you need Michigan Publishing to serve as an online co-publisher. (Examples of the latter include journals that already exist in print but seek a parallel online edition, and materials published elsewhere online for which Michigan Publishing will serve as a stable archive.)

The models

The following “models” outline the approaches Michigan Publishing takes to the more common publishing scenarios. Links to Michigan Publishing publications that use each approach are provided for reference.

Models A & B: Page images scanned from paper

Michigan Publishing uses the page image model for journals with a large print backfile whose content is not available in electronic form. Users can "turn" pages in sequence or jump directly to a given page number. Page images can also be used as a quick way to put a monograph online (example: http://hdl.handle.net/2027/spo.5597602.0001.001). For monographs, Models A and B are equivalent.

We process these images with OCR software to allow users to search the full text of a document. This software can operate on most languages written in the Latin script; however, it works best on monolingual texts. The word recognition rate is much lower for texts in multiple languages, although words in the predominant language are recognized more accurately than words in the other languages.

Model A: Journal issue as unit

  • journal example: http://quod.lib.umich.edu/b/basp/

Best for: publications with a large print backfile and inconsistent article pagination.

When scanning a journal from paper, it's most efficient to scan an entire issue without attempting to determine article boundaries at the time of scanning. We can still provide links to individual articles, and the user will be able to turn pages across articles. Because this model prevents Michigan Publishing from implementing article-level access restrictions, we generally choose Model B instead when scanning from paper.

Model B: Journal article as unit

  • journal example: http://quod.lib.umich.edu/m/mjcsl/ (before volume 8)

Best for: publications where new documents sent to Michigan Publishing will also be split at the article level or where article-level access restrictions in the delivery system are critical.

It can be more worthwhile to scan articles separately so that they become separate units in the delivery system, which gives greater flexibility in repackaging journal articles or restricting access by article.

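Model C: Digital text

  • journal example: http://quod.lib.umich.edu/m/mfr/
  • monograph example: http://hdl.handle.net/2027/spo.5621225.0001.001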

Best for: instances where electronic source documents are available.

Michigan Publishing generally prefers to publish digital text because it allows for hyperlinks, multimedia, and rich full-text searching that can be limited to portions of a work. (See, for example, the three "search in" options in A London Provisioner's Chronicle: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=machyn;cc=machyn;page=simple.) In addition, digital text allows the documents to be disseminated in various ways not tied to the print page. If the publishing partner provides PDF files, Michigan Publishing can put these online as an alternative format for readers.

Model D: Page images from PDF files

  • journal example: http://quod.lib.umich.edu/m/mjcsl/ (volume 8 to present)

Best for: cases where a publication includes many diagrams and figures that would be difficult to render in digital text (Model C) and/or cases where a publishing partner values precise page layout that cannot be consistently replicated with digital text.

For some publications, we display page images but also have electronic text underneath that allows for more accurate searching. For this model, we need PDF files in which the text can be highlighted when you open the PDF.

A note of caution: Using page images in this way comes with the following limitations:

  • Text set in more than one column can be difficult to extract correctly.
  • Our current software only allows extraction of text written in the Latin script, so non-Latin text will not be searchable by users.
  • Extracting text from PDF files leads to a number of problems that decrease the accuracy of searching:
    1. Words hyphenated across line breaks can't be automatically reconstructed into whole words.
    2. Other words at line-breaks are often not followed by a space character, causing them to run together with the word on the next line after extraction.
    3. One cannot search for phrases spanning pages, columns, or sometimes even lines.
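
As an illustration only, the following Python sketch shows how the extraction problems listed above can affect whole-word searching. The sample lines, the function names, and the joining strategies are assumptions made for this example; they do not describe the software Michigan Publishing actually uses.

```python
import re


def naive_join(lines):
    """Join extracted lines with no separator, mimicking extraction
    that loses the space at each line break."""
    return "".join(lines)


def space_join(lines):
    """Join lines with a space. This separates run-together words but
    leaves words hyphenated at a line break split in two."""
    return " ".join(lines)


def dehyphenate(text):
    """Guess at line-break hyphenation by deleting every 'hyphen + space'.
    The guess is wrong whenever the hyphen was a real one."""
    return text.replace("- ", "")


def contains_word(text, word):
    """Whole-word, case-insensitive search."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


# Hypothetical lines, as they might come out of a narrow column in a PDF.
lines = ["The scanned back-", "file is large and", "searchable"]

run_together = naive_join(lines)  # "... large andsearchable"
spaced = space_join(lines)        # "... back- file is large and searchable"
```

In this sketch, a search for "searchable" fails on `run_together` because the word has fused with "and", and a search for "backfile" fails on `spaced` because the word is still split at its hyphen. `dehyphenate` recovers "backfile", but applying the same rule to a genuinely hyphenated term such as "full-" followed by "text" produces "fulltext", which is one reason hyphenated words cannot simply be rejoined automatically.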

When using this model for journals, we always treat the journal article, not the issue, as the unit of digitization for greater flexibility, as described in Model B.

Combination of models

We often use one model for backfiles and another for new documents sent to Michigan Publishing. Possible combinations include:

  • Models A & C: journal example (http://quod.lib.umich.edu/m/mdiag/)
  • Models B & C: journal example (http://quod.lib.umich.edu/f/fs/; access restricted to subscribing institutions). This publication currently uses only Model C, but Model B content (for back issues) is forthcoming.
  • Models B & D: journal example (http://quod.lib.umich.edu/m/mjcsl/)
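
The "Best for" guidance in the model sections can be summarized, very roughly, as a small decision helper. The Python sketch below is a simplification for orientation only: the `Materials` fields, the `suggest_model` function, and the order in which the rules are checked are assumptions made for this example, and the actual choice is worked out in discussion with Michigan Publishing.

```python
from dataclasses import dataclass


@dataclass
class Materials:
    """Hypothetical summary of what a publishing partner can supply."""
    has_electronic_source: bool       # e.g. Word or InDesign files with final revisions
    has_text_pdfs: bool               # PDFs in which the text can be highlighted
    layout_or_figures_critical: bool  # precise layout or many diagrams matter
    needs_article_level_access: bool  # article-level restrictions or repackaging
    article_pagination_consistent: bool = True


def suggest_model(m):
    """Return a starting-point model letter ('A' to 'D') for discussion."""
    # Model D: page images from PDFs when layout or figures are critical.
    if m.has_text_pdfs and m.layout_or_figures_critical:
        return "D"
    # Model C: digital text is generally preferred when source files exist.
    if m.has_electronic_source:
        return "C"
    # Assumption: with only text PDFs available, page images from PDF fit best.
    if m.has_text_pdfs:
        return "D"
    # Paper only: Model B is the usual choice; Model A suits backfiles
    # whose article pagination is not consistent.
    if m.needs_article_level_access or m.article_pagination_consistent:
        return "B"
    return "A"
```

For a journal that uses one model for its backfile and another for new material, the helper would be called once for each body of material, for example once for a paper-only backfile and once for new issues supplied as electronic files.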

