Structural elements

<pb n="1" facs="../images/ew_a1_342_001.jpg"/>

Page break. The value of the attribute n should be the page number. If there is no explicit pagination in the document, you can impose it here, numbering sequentially from 1. The value of the attribute facs should be the filename of the corresponding image. (You assigned this previously; see above.)

If you’re working with a document that has page numbers, please record them in your transcription as follows, but number your page breaks and images beginning with 1 (or 001), regardless of those numbers. The following example shows what this would look like in a document that has two pages labeled “53” in the original:

 <pb n="53" facs="../images/ew_a1_342_053.jpg"/>
 <pb n="54" facs="../images/ew_a1_342_054.jpg"/>
 <pb n="55" facs="../images/ew_a1_342_055.jpg"/>

Note that we use <choice> to remove the original page numbers from the edition view, and we continue with the sequential numbering of the n attribute and the file names, ignoring the error in pagination in the original document.
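
As a rough sketch of how that might look, assuming the original number is recorded in <orig> and paired with an empty <reg/> so that it drops out of the edition view (an assumption about this project's practice, not a confirmed convention), the three page breaks could be encoded as follows. The original pages are assumed here to be labeled 53, 53, and 54:

 <pb n="53" facs="../images/ew_a1_342_053.jpg"/><choice><orig>53</orig><reg/></choice>
 <pb n="54" facs="../images/ew_a1_342_054.jpg"/><choice><orig>53</orig><reg/></choice>
 <pb n="55" facs="../images/ew_a1_342_055.jpg"/><choice><orig>54</orig><reg/></choice>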



<p></p>

encloses a paragraph. I strongly recommend putting these elements on their own lines, as follows:

<p>
   This is the text of a paragraph.
</p>

If a paragraph is indented in the original, you can indicate that as follows:

<p rend="indent">

(The rend attribute is used here and elsewhere to describe how things appear in the original, not how they should be rendered in any eventual output of the XML document. See


***Need to add here instructions for using <hi rend=…> for larger/smaller font sizes.***



<lb/>

marks the place where a line break occurs in the transcription. When a word is divided across lines, we use the break attribute to indicate this:

<lb break="no"/>. 

For instance:

   This document has several<lb/> 
   lines of text. Most lines end<lb/> 
   neatly at the end of a word,<lb/> 
   but this line quite stubborn<lb break="no"/> 
   ly does not.<lb/> 

If stubbornly were hyphenated in the original, you would not need to transcribe the hyphen here.



<div></div>

encloses a section or division of some sort within a document. You will likely need this only if your document contains internal headings (see the next element below). A <div> may be nested inside another <div> when appropriate. For instance, this might be useful in a document containing sections (with headings) and, below those, subsections (with subheadings).



<head></head>

encloses a heading. Every heading must go inside its own <div>, except perhaps if you have only one heading at the top of a document (<body> seems to function like a <div> in this case).
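
As an illustration of how <div> and <head> fit together, here is a minimal sketch of a section containing one subsection. The heading text and paragraph contents are invented for the example:

 <div>
  <head>Annual Report</head>
  <p>
   Text of the section.
  </p>
  <div>
   <head>Finances</head>
   <p>
    Text of the subsection.
   </p>
  </div>
 </div>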



<cb/>

marks the beginning of a column on a page that has more than one column. See the example on this page:
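
In markup, a two-column page might be transcribed along these lines. This is only a sketch: the text is invented, and the n attribute numbering the columns is an optional TEI attribute rather than a stated convention of this project:

 <cb n="1"/>
 <p>
  Text of the first column.
 </p>
 <cb n="2"/>
 <p>
  Text of the second column.
 </p>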


<figure><graphic url="ew_w2_648_002_001.jpg"/></figure>

can be used to mark the place that a graphic of some sort (image, drawing, etc.) appears on a page. The url attribute points to the image itself (you’ll need to crop your selection out of the larger archival scan provided by Special Collections). Please note that you need to indicate the source of your image as follows:

<figure><desc>Taken from University of North Florida Special Collections, Eartha M.M. White Collection, Folder W2, Item 648.</desc><graphic url="../images/ew_w2_648_002_001.jpg"></graphic></figure>

For the naming of such cropped files, see Naming Files. More info: 



<table></table>

can be used to represent information that is presented in tabular format. This is the basic structure:
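
A minimal sketch of that structure, assuming the standard TEI <row> and <cell> elements (the cell contents are placeholders):

 <table>
  <row>
   <cell>Row 1, column 1</cell>
   <cell>Row 1, column 2</cell>
  </row>
  <row>
   <cell>Row 2, column 1</cell>
   <cell>Row 2, column 2</cell>
  </row>
 </table>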


(This would be for a table with two rows and two columns. For more examples, see


<list rend="simple">

combined with <head> and <item>, can be used to display a list. rend="simple" should (theoretically) cause the list to be displayed without any numbers or bullets. Here is an example from Chad’s ew_q4_1164.xml:

<list rend="simple">
 <head rend="center">Officers</head>
 <item><name type="person"> Mrs. J.W. Ward </name>,<name role="administrator"> President </name></item>
 <item><name type="person"> Mrs. Phillis Witsell </name>,<name role="administrator"> First Vice President </name></item>
 <item><name type="person"> Mrs. A.J. Williams </name>,<name role="administrator"> Second Vice President </name></item>
 <item><name type="person"> Mrs. T.G. Freeland </name>,<name role="administrator"> Third Vice President </name></item>
 <item><name type="person"> Mrs. E.L. James </name>,<name role="administrator"> Financial Secretary </name></item>
 <item><name type="person"> Mrs. G.N. Griffin </name>,<name role="administrator"> Corresponding Secretary </name></item>
 <item><name type="person"> Miss Eartha M.M. White </name>,<name role="administrator"> City Organizer </name></item>
</list>

For more info, see


Deletions, Insertions and Changes of Hand

<del type="strikeout"></del>

encloses any material that has been crossed out.

<del type="overwritten"></del> 

encloses any material that has been written over.


<add type="caret"></add>

encloses any material that has been added above the line, or between lines, with a caret or other mark indicating its point of insertion in the original.

<add type="no_caret"></add>

encloses any material that’s been added, with the point of insertion not indicated (meaning you’ll have to make an educated guess about where it goes).
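
To show how these elements sit within running text, here is a short sketch. The sentence is invented for illustration and does not come from the collection:

 We will meet on <del type="strikeout">Tuesday</del><add type="caret">Wednesday</add> at the<lb/>
 mission. <add type="no_caret">Bring the ledger.</add><lb/>

Here the writer crossed out “Tuesday” and inserted “Wednesday” with a caret, while “Bring the ledger.” was added without any mark, so its position is the transcriber's best guess.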



<note></note>

can be used to transcribe any material you find handwritten in the margin of a document, using the place attribute to indicate which margin.

<note place="marginLeft"></note>
<note place="marginRight"></note>
<note place="marginTop"></note>
<note place="marginBottom"></note>

If the note is in the left or right margin, you should locate this element as close as possible to the place to which it corresponds in the top-to-bottom flow of the text.

<note type="authorial" place="marginTop"></note> 

can also be used to record a heading that runs across the top of the page throughout a document but is separate from the flow of the text (as might occur in a diary, perhaps).


 <handShift medium="red-ink"/>

is used to mark the place in a document where a shift in the writing takes place, such as a switch of hands (a different person begins to write), a change in the color of ink or writing medium, or some other change. The medium attribute indicates the nature of the change. Here is an example provided by Aislinn:

 She had a nephew that died here last Aug and he was a veteran of the<lb/>
 last <handShift medium="red-ink"/>WAR, <handShift medium="black-ink"/> and left her his insurance

In this example, only the word “WAR” and the comma appear in red ink, and then we have a shift back to black ink.


 <supplied reason="omitted-in-original" cert="high">met</supplied>

can be used to add a word that you believe, according to your editorial discretion, was unintentionally omitted from the text. The reason attribute is used to provide your justification for this addition. The cert attribute indicates your degree of certainty (valid values are low, medium, and high).
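
In context, this might look like the following sketch, where the surrounding sentence is invented for illustration:

 The committee <supplied reason="omitted-in-original" cert="high">met</supplied> on Tuesday evening.<lb/>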


Representing formatting of original document

When we encode documents with TEI-XML, we are concerned more with content than appearance. Indeed, one of the benefits of using XML is that it separates content from how that content will ultimately be presented. That styling is generally done with XSLT, the stylesheet/transformation language for XML (much as CSS is the stylesheet language for HTML). However, we are also interested in recording the appearance, or formatting, of the original document itself. This is why, after all, we are using <pb/>, <lb/> and other such structural elements. Here are a few others that may be useful:

<hi rend="center"></hi>

encloses centered text (see

<hi rend="superscript"></hi> 

encloses raised text.


<hi rend="italic"></hi>

encloses italicized text.

<hi rend="underline"></hi>

encloses underlined text.


If headings or labels of any sort appear in the document in all caps, please transcribe them using title case, and use the rend attribute to indicate the all caps formatting in original, as follows:

 <head rend="case(allcaps)">Indigent Hospital Patients</head>

(This text appears in the original as “INDIGENT HOSPITAL PATIENTS.”)


 <label rend="case(allcaps)">Laws and Rules</label>

(This appears in original as “LAWS AND RULES”.)

If other material appears in all caps in the original, please use the following:

 <emph rend="case(allcaps)"></emph>




<unclear cert="low" reason="Handwritten signature is difficult to read.">J. Henderson</unclear>

can be used to encode your uncertainty about any part of your transcription. The cert attribute indicates your level of certainty (low, medium, high); please use the reason attribute to explain the circumstances.

<!-- -->

is the format for an XML comment, which can be used anywhere to add additional documentation.


Elements for regularization


<choice><abbr></abbr><expan></expan></choice>

can be used to simultaneously record an abbreviation and provide its resolution, as in:
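
Judging from the superscript variant shown just below, the basic form of this example would be along these lines (a reconstruction, so treat it as a sketch):

 <choice><abbr>1st</abbr><expan>first</expan></choice>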


If the letters st in this example actually appeared in raised script in the original, we would document that as follows:

<choice><abbr>1<hi rend="superscript">st</hi></abbr><expan>first</expan></choice>

As a general rule, we will resolve all abbreviations.



<choice><sic></sic><corr></corr></choice>

can be used to simultaneously record and correct a misspelling (a misused homonym, a word spelled incorrectly, etc.):

We had some very nice <choice><sic>whether</sic><corr>weather</corr></choice> last week.


Lisa's dog is partly <choice><sic>Dauchshound</sic><corr>Dachshund</corr></choice>, I think.



<choice><orig></orig><reg></reg></choice>

can be used to modernize a correct, but obsolete, spelling. I am not sure we will have occasion to do so in the current project. If you encounter a case of which you’re unsure, please let me know.

We can, however, use this sequence of elements to regularize punctuation, as in the following examples:

removing comma:
removing period:
adding comma:
changing comma to period:
adding period:
replacing semicolon with comma:
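
As a hedged sketch of how these cases might be encoded, assuming the original punctuation goes in <orig> and the regularized form in <reg>, with an empty element standing for “nothing here” (this use of empty elements is an assumption, not a confirmed project convention):

 removing comma: <choice><orig>,</orig><reg/></choice>
 removing period: <choice><orig>.</orig><reg/></choice>
 adding comma: <choice><orig/><reg>,</reg></choice>
 changing comma to period: <choice><orig>,</orig><reg>.</reg></choice>
 adding period: <choice><orig/><reg>.</reg></choice>
 replacing semicolon with comma: <choice><orig>;</orig><reg>,</reg></choice>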


Gaps in the text/material you add yourself

<gap reason="page missing from document"/> 

can be used to mark a place where there is an unrecoverable gap in the original, using the reason attribute to explain the circumstances. This will often have to do with damage to the original or missing pages.

If the gap is small,

 <supplied reason="text smudged" cert="medium"></supplied> 

can be used to provide the letters or words that can’t be read. Use the reason attribute to give a text explanation for the circumstances, and the cert attribute to indicate your degree of certainty about the solution you’ve provided (the acceptable values are high, medium, and low).



<!-- text here -->

can be used to insert comments into your XML file. This is metadata that will not display in the output of your file. You can use comments of this sort to record doubts or questions you might want to follow up on later.



Special characters

Some characters have to be transcribed in XML with character references. These include:

° (the degree sign, as in 98° Fahrenheit) must be represented in your text as follows:


& (the ampersand, meaning “and”) must be represented as follows:
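
In standard XML, the usual forms are the numeric character reference &#176; (equivalently &#xB0;) for the degree sign and the predefined entity &amp; for the ampersand. A short sketch, with invented sample text:

 The temperature reached 98&#176; Fahrenheit.<lb/>
 <name type="company">Smith &amp; Sons</name>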




Semantic elements

<date when="YYYY-MM-DD"></date> 

encloses a date, however it is articulated. You might see a standard situation like the following:

She was born on <date when="1970-04-01">April 1, 1970</date>.

If you had only year, this would be the format:

<date when="1970">1970</date>

Only month and year would be:

<date when="1980-02">February 1980</date>

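Month and day alone need a special form. A value such as 04-01 won’t validate, but the W3C date format that TEI uses accepts a leading double hyphen in place of the unknown year, so April 1 can be tagged as <date when="--04-01">April 1</date>.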

Sometimes references to dates aren’t explicit, but we can still tag them, as in:

Earlier <date when="1970">that same year</date>, her parents had moved to Florida.


<name type="person"></name>

encloses the proper name of a person, as in:

<name type="person">Nikolai Vitti</name>

This can also be used to mark common nouns or phrases that refer to a specific, identifiable person, as in:

The <name type="person">current superintendent</name> of the local public school system...

where current superintendent refers, for instance, to Nikolai Vitti.



<name type="person_group"></name>

encloses a proper noun indicating the name of a group of people that has a particular name, such as those of a given nationality or some other category. We would typically write this type of word with an initial capital, but not always. Here are some examples:

the <name type="person_group">Seminole</name> and <name type="person_group">Creek</name>


the <name type="person_group">British</name> and <name type="person_group">French</name>





<name type="place"></name> 

encloses the proper names of places of any type. This includes buildings, streets, cities, states (and other political divisions), as well as geographical features like rivers, lakes, etc. For example:

<name type="place">Jacksonville</name>

This can also be used to mark common nouns or phrases that refer to specific, identifiable places, as in:

the level of toxicity in the <name type="place">river</name> has increased...

where river refers, for instance, to the St. John’s.

The subtype attribute can be used to provide a more specific category for a place, as in

<name type="place" subtype="river">St. John's River</name>


<name type="place" subtype="city">Jacksonville</name>

Let’s handle specific street addresses as follows:

<name type="place" subtype="address">123 Main Street</name> 

For places that can be located on a map, let’s include latitude and longitude as follows:

<name type="place">The Clara White Mission<location><geo>30.332632 -81.664020</geo></location></name>

To get this information, search for the place in Google Maps, right click on the location on the map and select “What’s here?”. A box will pop up showing lat. and long. If you aren’t able to copy the numbers from there, click on them, and they will appear in the search box at the left, from where you will be able to copy them.


A question we need to answer: How do we map an area (“Jacksonville,” “La Villa,” a plantation, etc.), as opposed to a single point in space?


<name type="company"></name>


<name type="organization"></name>

can be used to tag the names of such entities.


<name type="event"></name>

can be used to indicate the name of an event, as in:

<name type="event">The World's Fair</name>



<title level="m"></title> 

encloses the title of a monographic (“m”) work (a book, primarily).

<title level="a"></title> 

encloses the title of an “analytic” (“a”) work (a chapter, a journal article, etc.).


Openings and Closings of Letters

These elements are both structural and semantic, so I’m putting them in their own section.

Here is an example of how to mark up the opening of a letter:

 Dear <name type="person">Miss White</name> :<lb/>
 <hi rend="center">
 <choice><sic>your</sic><corr>Your</corr></choice> sister in Christ.<lb/>
 <name type="person">Sarah Best</name>.<lb/>

These elements do not go inside <p> or <head> elements.
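
As a sketch only, assuming the elements meant here are the standard TEI <opener>, <closer>, <salute>, and <signed> (an assumption about what this project uses), the lines above might be wrapped as follows:

 <opener>
  <salute>Dear <name type="person">Miss White</name> :</salute><lb/>
 </opener>

 <closer>
  <salute><choice><sic>your</sic><corr>Your</corr></choice> sister in Christ.</salute><lb/>
  <signed><name type="person">Sarah Best</name>.</signed><lb/>
 </closer>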


Editorial Annotations

<note type="editorial"></note>

can be used to add a note of your own. This would go right after the word or phrase in question.


<mentioned></mentioned>

is used in your note to encode the word to which you are referring (the interface will render this in italics).

Here is an example:

<note type="editorial"><mentioned>Moosa</mentioned> was previously an alternative form in English of the Spanish <mentioned>Mosé</mentioned>... [source].</note>

Please cite an academically rigorous source for your information, using Chicago Author-Date format (see

If you want to flag something for annotation but need to do research before writing the note, you can add a placeholder note of this sort:

<note type="editorial"><mentioned>Moosa</mentioned></note>