The Chicago Foreign Language Press Survey was published in 1942 by the Chicago Public Library Omnibus Project of the Works Progress Administration of Illinois. The purpose of the project was to translate and classify selected news articles that appeared in the foreign language press from 1855 to 1938. The project consists of 120,000 typewritten pages translated from newspapers of 22 different foreign language communities of Chicago.

Editing and Text Encoding

The Newberry version of the Chicago Foreign Language Press Survey consists of nearly 50,000 articles transcribed in XML files conforming to a schema adopted through the guidelines of the Text Encoding Initiative (TEI). The Newberry worked with a digitization vendor, PanGeo Partners of Chicago, which created a base XML transcription for each article from digital images. Much effort went into capturing the structure of the metadata for each article so that the information could be extracted into a database later. The transcriptions record the Internet Archive image identifiers for each sheet, and observe page breaks and page numberings. Within the body of the text only paragraphs and simple table structures have been represented.

After the initial transcription, an editing phase of the project checked the vendor's work and then looked at the articles in bulk to evaluate the work of the original WPA project. Although the 1930s editors and proofreaders took care to maintain a high degree of quality, some inconsistencies and errors inevitably made it through their review. The Newberry project transformed the vendor XML files into new expanded files, mapping key metadata fields into TEI header elements. Through a modified TEI schema suited to this phase of work, these fields could be further constrained, which made it possible to identify and correct typographical and other errors. In addition, this editing step put date values into a consistent format when possible, ensured that subject codes matched the project's list, and edited newspaper and source names to be more consistent. To the extent possible, we have made such corrections in the TEI header. In the body, errors and inconsistencies are left uncorrected when they faithfully transcribe errors in the original; only mistakes introduced during digitization are corrected there.
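
The kind of header checks described above can be illustrated with a minimal Python sketch. It is not the project's actual tooling. The date layouts in DATE_FORMATS, the codes in KNOWN_SUBJECT_CODES, and the function names are all hypothetical, chosen only to show how date values might be normalized "when possible" and how subject codes might be compared against a fixed list.

```python
from datetime import datetime

# Hypothetical date layouts; the layouts actually found in the survey
# are not listed in the text. Month names assume an English locale.
DATE_FORMATS = ("%b. %d, %Y", "%B %d, %Y", "%Y-%m-%d")

# Hypothetical subject codes standing in for the project's own list.
KNOWN_SUBJECT_CODES = {"I A 1 a", "II B 2 d", "III C"}


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no layout matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    # Leave unparseable values for a human editor to review.
    return None


def unknown_codes(codes):
    """Return the codes that are not in the known list.

    Runs of whitespace are collapsed before comparison, so spacing
    differences alone do not count as errors.
    """
    return [c for c in codes if " ".join(c.split()) not in KNOWN_SUBJECT_CODES]
```

Under these assumptions, normalize_date("Jan. 5, 1892") yields "1892-01-05", while a value such as "circa 1900" yields None and would be left for manual review rather than guessed at.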

Where we have been unable as yet to transcribe part of a text, the gap element in the original XML identifies a missing span, which may range from a single letter to a full page. Tabular information is represented with elements for tables, rows, and cells.
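
As a rough illustration of how a reader might work with these elements, the following Python sketch lists gap elements and reads tables from a small sample. The sample document is invented for the example, and the sketch assumes the files use the standard TEI namespace and the standard TEI element names gap, table, row, and cell; the survey's actual files may differ in attributes or namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed here, not confirmed for the survey files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>The society met on <gap reason="illegible" extent="1" unit="word"/> street.</p>
    <table>
      <row><cell>Members</cell><cell>120</cell></row>
      <row><cell>Guests</cell><cell>35</cell></row>
    </table>
  </body></text>
</TEI>"""


def find_gaps(root):
    """Return the attributes of every gap element as a list of dicts."""
    return [dict(gap.attrib) for gap in root.iterfind(".//tei:gap", TEI_NS)]


def read_tables(root):
    """Return each table as a list of rows, each row a list of cell texts."""
    tables = []
    for table in root.iterfind(".//tei:table", TEI_NS):
        rows = [
            ["".join(cell.itertext()).strip()
             for cell in row.iterfind("tei:cell", TEI_NS)]
            for row in table.iterfind("tei:row", TEI_NS)
        ]
        tables.append(rows)
    return tables


root = ET.fromstring(SAMPLE)
print(find_gaps(root))
print(read_tables(root))
```

Collecting the gap attributes in this way makes it straightforward to count how much of a given article remains untranscribed, whether the missing span is a single letter or a full page.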