Processing

Digitising and processing the source

In a first step, the Basel University Library digitised all 116 annual volumes of the Avisblatt from 1729 to 1844, drawing on three different sets of bound volumes, and supplied basic metadata. The project team contributed further metadata, in particular the assignment of each page in the digital copy to its issue.

The structuring and initial processing of the digitised material, which was stored on an IIIF server, was carried out in Freizo, a working environment provided by Data Futures GmbH. Each issue was subdivided into the masthead, the section headings, and the individual advertisements or announcements made by the editor himself. This was done by automatic pre-segmentation followed by manual post-processing in Transkribus, where the text was also recognised using two text recognition models trained by the project team.

The PAGE XML exported from Transkribus was further processed in the programming language R. For this purpose the project developed its own R package, which combines data, processing and query functions and gives suitably experienced users further options for access and analysis.
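To illustrate what reading such an export involves, here is a minimal sketch in Python. It is not the project's code, which is written in R. The sample document is a hypothetical, heavily shortened PAGE XML fragment, the namespace version is an assumption, and the function name read_regions is invented for this example. Real Transkribus exports contain coordinates, text lines and further attributes that are left out here.

```python
import xml.etree.ElementTree as ET

# Assumed PAGE XML namespace version; check the root element of a real export.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hypothetical, simplified sample: one section heading and one advertisement.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg">
    <TextRegion id="r1" type="heading">
      <TextEquiv><Unicode>Es werden zum Verkauff offerirt</Unicode></TextEquiv>
    </TextRegion>
    <TextRegion id="r2" type="paragraph">
      <TextEquiv><Unicode>Ein guter Schreibtisch.</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def read_regions(xml_text):
    """Return (id, type, text) for every TextRegion, in document order."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iterfind(".//pc:TextRegion", NS):
        # Region-level text sits in a TextEquiv that is a direct child of the region.
        unicode_el = region.find("pc:TextEquiv/pc:Unicode", NS)
        text = (unicode_el.text or "").strip() if unicode_el is not None else ""
        regions.append((region.get("id"), region.get("type"), text))
    return regions


regions = read_regions(SAMPLE)
```

Each tuple corresponds to one segmented region, which is the unit that the later preparation steps work on.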

Essential parts of the preparation in R were:

  • assigning the respective category (as indicated by the section header) to each advertisement
  • automated classification/keywording of the advertisements
  • recognising the language of the advertisement (German or French)
  • distinguishing between original advertisements and their reprints in subsequent issues ("posting" vs "reprint"). Advertisements were repeated by default (usually once) unless the advertiser reported to the Avisblatt publisher after the first publication that the advertisement was finished.
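
The logic behind three of these steps can be sketched as follows. This is a simplified illustration in Python under stated assumptions, not the project's R implementation. The Region class, all function names and the stopword lists are hypothetical. Reprints are detected here by exact match with the preceding issue after whitespace and case normalisation, whereas OCR output would realistically need fuzzy matching, and language is guessed from a handful of common words.

```python
from dataclasses import dataclass


@dataclass
class Region:
    issue: int  # running issue number (hypothetical field)
    kind: str   # "header" or "ad"
    text: str


# Tiny illustrative stopword lists, not the project's actual resources.
FRENCH = {"le", "la", "les", "un", "une", "et", "à", "chez"}
GERMAN = {"der", "die", "das", "ein", "eine", "und", "zu", "bey"}


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def guess_language(text):
    """Return 'fr' if more French than German stopwords occur, else 'de'."""
    words = set(normalise(text).split())
    return "fr" if len(words & FRENCH) > len(words & GERMAN) else "de"


def assign_sections(regions):
    """Give every ad the most recent section header seen in its issue."""
    ads, current = [], {}
    for r in regions:
        if r.kind == "header":
            current[r.issue] = r.text
        else:
            ads.append({"issue": r.issue, "section": current.get(r.issue),
                        "language": guess_language(r.text), "text": r.text})
    return ads


def flag_reprints(ads):
    """Mark an ad as 'reprint' if the same text ran in the preceding issue."""
    last_seen = {}  # normalised text -> issue of most recent occurrence
    for ad in ads:  # assumes ads are ordered by issue
        key = normalise(ad["text"])
        ad["status"] = "reprint" if last_seen.get(key) == ad["issue"] - 1 else "posting"
        last_seen[key] = ad["issue"]
    return ads


sample = [
    Region(1, "header", "Es werden zum Verkauff offerirt"),
    Region(1, "ad", "Ein guter Schreibtisch und ein Spiegel."),
    Region(1, "ad", "Une maison et un jardin à vendre."),
    Region(2, "header", "Es werden zum Verkauff offerirt"),
    Region(2, "ad", "Ein guter  Schreibtisch und ein Spiegel."),
    Region(2, "ad", "Ein Fass Wein."),
]
ads = flag_reprints(assign_sections(sample))
```

In this sample the first advertisement of issue 2 repeats one from issue 1 and is therefore labelled a reprint, while the other three are labelled postings.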

Language, section header and keywords can be used in the Database search on this website. We strongly recommend that you familiarise yourself with the specifics beforehand (see Important information and Help regarding search terms).

To read more about the workflows and the decision-making involved, please consult the technical documentation.