Representing the Manuscripts

Our standards for representing the original documents focus on the following features: the structure of the item; the layout and formatting of text; gaps in the text; and authorial deletions, insertions and changes of hand. Our approach here aligns closely with the guidelines set forth by the TEI.

Structural Elements

We encode page breaks using the empty element <pb/>, with the value of the n attribute reflecting the page number in the original document and the value of the facs attribute being the filename of the corresponding image, as in <pb n="1" facs="../images/ew_a1_342_001.jpg"/>. We use the container <div> to divide up documents that have multiple sections, particularly when those sections begin with titles or subtitles, which we encode with <head>. We enclose each paragraph within a <p> element, and use the empty element <lb/> to mark the place where a line break occurs in the original (when the line break falls in the middle of a word, we use a value of "no" for the break attribute, as in <lb break="no"/>). We use the empty element <cb/> to mark the beginning of a column on a page that has more than one column.
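As a rough sketch of how these structural elements might combine, the fragment below encodes one page with two sections, the second of which is set in two columns. The headings and text are invented for illustration; only the elements, the attributes and the image filename come from the description above.

```xml
<body>
  <!-- page break: n gives the page number, facs the image filename -->
  <pb n="1" facs="../images/ew_a1_342_001.jpg"/>
  <div>
    <head>Example Heading</head>
    <!-- the first line ends in the middle of the word "hundred" -->
    <p>The mission served more than one hun<lb break="no"/>dred families this winter.<lb/></p>
  </div>
  <div>
    <head>Example Two-Column Section</head>
    <!-- each cb marks the beginning of a column -->
    <cb/>
    <p>Text of the first column.<lb/></p>
    <cb/>
    <p>Text of the second column.<lb/></p>
  </div>
</body>
```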

Many of the documents we are editing are letters, which generally have openings and closings as structural elements. We encode these using <opener> and <closer>, respectively. These do not need to appear in a <div>, but we place each in its own <div> anyway, as this seems to simplify manipulating their layout. Openers generally include such elements as <date>, <address> and <salute>.
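A minimal sketch of a letter encoded along these lines follows. The date, salutations and paragraph text are invented, and placing the body paragraph in its own <div> is an assumption made here so that the three parts sit side by side.

```xml
<body>
  <div>
    <opener>
      <date>March 3, 1932</date><lb/>
      <salute>Dear Miss White,</salute><lb/>
    </opener>
  </div>
  <div>
    <!-- body of the letter (assumed to sit in its own div) -->
    <p>Thank you for your kind letter of last week.<lb/></p>
  </div>
  <div>
    <closer>
      <salute>Yours truly,</salute><lb/>
    </closer>
  </div>
</body>
```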

Some letters include images of the corresponding envelope. As the TEI-XML P5 Guidelines do not appear to contemplate this situation, we have developed our own approach, treating the envelope as the first page in the corresponding XML document, encoding each address block (sender, receiver), and anything else, as a <p>, with <lb/> at the end of each line.
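The envelope convention could look roughly like the fragment below. The names, addresses and image filename are placeholders, and both the n value and the enclosing <div> are assumptions rather than documented practice.

```xml
<body>
  <!-- envelope treated as the first page; the facs filename is a placeholder -->
  <pb n="1" facs="../images/example_envelope.jpg"/>
  <div>
    <!-- sender address block -->
    <p>Sender Name<lb/>12 Example Street<lb/>Jacksonville, Fla.<lb/></p>
    <!-- receiver address block -->
    <p>Recipient Name<lb/>34 Sample Avenue<lb/>Jacksonville, Fla.<lb/></p>
    <!-- anything else found on the envelope (hypothetical text) -->
    <p>Return in five days<lb/></p>
  </div>
</body>
```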

Layout/Formatting of Text

In our transcriptions, we endeavor to reproduce the formatting of the originals. However, the documents in the Eartha M.M. White Collection fall into diverse categories, with differing formal possibilities that can present themselves in each case. We need to balance, therefore, our interest in replicating the original layout and formatting with the need to have transcribing guidelines that are straightforward and not overly cumbersome. For this purpose, we restrict our efforts in this area to a limited number of factors, which include the use of underlining, italics and boldface; the indentation of paragraphs; horizontal alignment; horizontal positioning; and relative text size. For the purposes of approaching the markup of such aspects, we divide documents into two possible types of text blocks: header and body text.

We consider a header to be the initial title of a document, or the heading of a sub-section. These are most often present in newspaper articles, monographic works, pamphlets or other such formal or structured writing. Letters generally will not contain headings. In the transcription view within our interface, we display different levels of headings in a standardized fashion: an initial heading (<body><head>) will be extra-large, a first-level subheading (<body><div><head>) will be large, a second-level subheading (<body><div><div><head>) will be medium, and a third-level subheading (<body><div><div><div><head>) will be small. Each subheading must be within its own nested <div>, which will also contain all of the body text corresponding to that subsection.
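The nesting described above can be sketched as follows. The heading and paragraph text are placeholders, and the display sizes noted in the comments are those listed in the paragraph above.

```xml
<body>
  <!-- initial heading: displayed extra-large -->
  <head>Document Title</head>
  <div>
    <!-- first-level subheading: displayed large -->
    <head>First-Level Subheading</head>
    <p>Body text of the first-level subsection.<lb/></p>
    <div>
      <!-- second-level subheading: displayed medium -->
      <head>Second-Level Subheading</head>
      <p>Body text of the second-level subsection.<lb/></p>
      <div>
        <!-- third-level subheading: displayed small -->
        <head>Third-Level Subheading</head>
        <p>Body text of the third-level subsection.<lb/></p>
      </div>
    </div>
  </div>
</body>
```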

We assume all headings are centered in the original, and display them in this fashion in the transcription view by default. When they are not centered in the original, we use "left", "right" or "justify" for the style attribute on the respective <head> element(s), as in <head style="left">.

We consider body text to be any text that is not a header. In our documents, such text is generally contained in a paragraph (<p>), an <opener> or a <closer>. We assume the text in such blocks to be left-justified. To indicate otherwise, we use a value of "center", "right" or "justify" for the style attribute on any of these elements (or on <div>, or other elements), as in <p style="center">. The first line of a paragraph is assumed to begin at the left margin. To indicate instead that the first line is indented, we use a value of "indent" for the style attribute on the p element, as in <p style="indent">.
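To make the alignment defaults and overrides concrete, here is a small sketch with invented text. It assumes only the style values named above.

```xml
<div>
  <!-- headings are centered by default; this one is flush left in the original -->
  <head style="left">Example Heading</head>
  <!-- no style: left-justified, first line at the left margin -->
  <p>An ordinary paragraph.<lb/></p>
  <!-- first line indented in the original -->
  <p style="indent">A paragraph whose first line is indented.<lb/></p>
  <!-- centered and right-aligned blocks -->
  <p style="center">A centered line.<lb/></p>
  <p style="right">A right-aligned line.<lb/></p>
</div>
```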

In terms of horizontal positioning, we simplify the possibilities that a document could present. Centered and right-justified text will begin in the corresponding horizontal positions (as determined automatically), but left-aligned text may begin only at the left margin, one-fourth of the way across the page (<opener style="one-fourth">), mid-page (<p style="mid-page">) or three-fourths of the way across (<closer style="three-fourths">). We do not, in other words, endeavor to recreate all the possible horizontal positioning of blocks of text, but rather "round" to one of these positions. When spacing of more than one blank space is used as a layout mechanism for representing separate elements on the same line, we indicate this using <space/>, which will be displayed in a standardized fashion in the interface. All body text will be assumed to be the same size, regardless of actual variations in the original. We do not reproduce vertical spacing. Any number of blank lines is reduced to one blank line in our transcriptions. As a convention, one blank line will follow every paragraph.
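A hedged illustration of the "rounded" positions and of <space/> follows. The text is invented, and placing each block in its own <div> follows the letter convention described earlier.

```xml
<body>
  <div>
    <!-- date begins roughly mid-page in the original -->
    <opener style="mid-page">
      <date>June 1, 1940</date><lb/>
    </opener>
  </div>
  <div>
    <!-- two items separated on one line by a wide run of blank space -->
    <p>Name<space/>Amount<lb/></p>
  </div>
  <div>
    <!-- closing begins about three-fourths of the way across the page -->
    <closer style="three-fourths">
      <salute>Sincerely,</salute><lb/>
    </closer>
  </div>
</body>
```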

We assume all headings to be bold, and display them as such in the transcription view in our interface. When a heading is not bold in the original, we mark up the text of the heading with <hi>, using a value of "normal" for its style attribute. When words in a heading appear in italics or underlined, we do likewise, using values of "italics" or "underlined" for the style attribute. When words appear in bold, in italics or underlined in body text, we do the same. When text is italicized or underlined because a word is being referred to as a word (for example, "weather and whether are homonyms"), or because it is the title of a written work, we also mark it up as <mentioned> or as <title level="m">, respectively, as discussed in the section on semantic tagging below, so that we can also handle it appropriately in our regularized version.
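The fragment below sketches these highlighting conventions with invented text. It assumes that the style value is carried on the <hi> element itself and that <hi> nests inside <mentioned> or <title>; neither the attribute placement nor the nesting order is spelled out above.

```xml
<div>
  <!-- heading that is not bold in the original -->
  <head><hi style="normal">Example Heading</hi></head>
  <!-- underlined word in body text -->
  <p>She was <hi style="underlined">very</hi> pleased with the program.<lb/></p>
  <!-- italicized words referred to as words -->
  <p><mentioned><hi style="italics">Weather</hi></mentioned> and <mentioned><hi style="italics">whether</hi></mentioned> are homonyms.<lb/></p>
  <!-- underlined title of a written work (placeholder title) -->
  <p>He had just read <title level="m"><hi style="underlined">An Example Title</hi></title>.<lb/></p>
</div>
```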

When any text, whether in headings or body, appears in all caps, we transcribe the text in title or sentence case, as appropriate, and indicate the original formatting in all caps as explained in the section Capital Letters below.

Gaps in the Text

We use <gap> to mark an unrecoverable gap in the original, using the value of the reason attribute to explain the circumstances (for example, <gap reason="page missing from document"/>).
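In context, a gap might be recorded as in this sketch. The sentence is invented; <gap> is an empty element, so the reason is given as an attribute rather than as element content.

```xml
<p>The committee met on Tuesday and resolved that <gap reason="page missing from document"/></p>
```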

Deletions, Insertions and Changes of Hand

<del> encloses any material that has been visibly struck from the original. We use the type attribute to indicate the nature of this action (type="strikeout" for struck text, type="overwritten" when something else has been written on top).

<add type="caret"> encloses any material that has been added above the line, or between lines, with a caret or other mark indicating its point of insertion in the original. <add type="no_caret"> encloses any material that has been added, with the point of insertion not indicated (meaning that an editorial decision has been made about where to locate this text).
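A short sketch of deletions and insertions, using invented sentences and only the type values described above:

```xml
<div>
  <!-- "ten" struck out, with "twelve" added above the line at a caret -->
  <p>We received <del type="strikeout">ten</del> <add type="caret">twelve</add> boxes of clothing.<lb/></p>
  <!-- "4" overwritten with "5" -->
  <p>The bill came to <del type="overwritten">4</del>5 dollars.<lb/></p>
  <!-- interlinear addition with no insertion mark; its placement is an editorial decision -->
  <p>The children arrived <add type="no_caret">by train</add> on Friday.<lb/></p>
</div>
```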

We use <note> to transcribe any material we find hand-written into the margin of a document, using the place attribute to indicate which margin ("marginLeft", "marginRight", "marginTop", or "marginBottom"). When the note appears in the left or right margin, we locate this element as close as possible to the place to which it corresponds in the top-to-bottom flow of the text. We use <note type="authorial" place="marginTop"> to record a heading that runs across the top of the page throughout a document but is separate from the flow of the text (as might occur in a diary).
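For instance, a diary page with a running heading and a left-margin annotation might be sketched as follows. All of the text is invented, and the position of the running heading at the start of the <div> is an assumption.

```xml
<div>
  <!-- running heading across the top of the page, separate from the text flow -->
  <note type="authorial" place="marginTop">Diary for the Year</note>
  <p>Visited the mission in the morning.<lb/>
  <!-- marginal note placed next to the line it sits beside in the original -->
  <note place="marginLeft">ask about this</note>Met with the board in the afternoon.<lb/></p>
</div>
```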

We use <supplied> to add material that we believe, according to our editorial discretion, was unintentionally omitted from the text. The reason attribute is used to provide a justification (for example, "omitted in original") and the value of the cert attribute to indicate our degree of certainty (low, medium, or high).
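A supplied word might then be encoded as in this sketch, where the sentence is invented and the attribute values are taken from the examples above:

```xml
<p>We went <supplied reason="omitted in original" cert="high">to</supplied> the mission on Sunday.<lb/></p>
```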

Special Characters

We have transcribed the following with character references: & (the ampersand, meaning “and”) as &amp;, and ° (the degree sign, as in 98° Fahrenheit) as &#176;.
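In a transcription, these references would appear inline, as in the following sketch (the sentence is invented):

```xml
<!-- &amp; renders as the ampersand, &#176; as the degree sign -->
<p>Example &amp; Sons recorded a temperature of 98&#176; that afternoon.<lb/></p>
```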
