Searching the Alberti Magni e-corpus

Albert the Great
A 1986 bronze sculpture by Heinrich Appel

Access the search engine right away

To use the search engine efficiently and to understand what it can and cannot do, it is advisable to read the following explanation at least once.

About the searchable corpus

As of March 2018, 60 of Albert’s works are included in the Alberti Magni e-corpus searchable database. Note that some of them have been edited in the Editio Coloniensis, which offers more reliable versions of the texts than previous editions like Borgnet’s. The first criterion used in determining what works to digitize and incorporate into the Alberti Magni e-corpus is that a given text has not yet been edited by the Albertus-Magnus-Institut and is therefore not yet part of the Aschendorff Verlag electronic corpus. The presence within the Alberti Magni e-corpus of works which the Institut has already published results from the fact that some of them were digitized for this website before the Institut published them (e.g., Liber divisionum), or that an unexpected opportunity presented itself to obtain a given text at low cost (e.g., De generatione et corruptione). For a table comparing the databases of the Alberti Magni e-corpus and of Aschendorff Verlag in relation to the list of Albert’s extant works, read the section History, nature, and goals of the Alberti Magni e-corpus project on this website.

The searchable electronic versions of the works to which this website gives access differ from their printed versions in several respects:

1) Albert frequently uses Greek words, which he and the medieval manuscript tradition more or less distort and which he always writes in Roman letters, but which Borgnet usually decided to turn into more "legitimate" Greek words, often modifying their spelling and even grammatical form, and using the Greek alphabet. In order to simplify and facilitate electronic searches, those words in Greek characters have been romanized. (The system of transliteration used is that of the Perseus project.) Those romanized Greek words are displayed in purple in the passages returned by the search engine. (A similar remark can be made about Albert's limited use of Hebrew letters and words in a small number of commentaries on Scripture, which have also been romanized and which appear in pink.)

2) Also in order to simplify the digitization and electronic searches, the ligatures æ and œ have been replaced by the character pairs ae and oe, respectively, and italics and boldfaced script have been eliminated. The only exceptions are the commentaries on Scripture and some of the very short works (e.g., sermons and letters), where italics (as well as boldfaced script, in the case of Super Iob) had to be kept for legibility's sake.
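As a rough illustration of the ligature replacement described in point 2, here is a minimal Python sketch. The function name expand_ligatures is hypothetical, and the handling of the capital ligatures is an assumption; the tooling actually used for the digitization is not described on this page.

```python
# Hypothetical helper. The lowercase mappings come from the text above;
# the capital forms are an assumption added for completeness.
LIGATURES = str.maketrans({"æ": "ae", "œ": "oe", "Æ": "Ae", "Œ": "Oe"})


def expand_ligatures(text):
    """Replace each ligature with its two-letter equivalent."""
    return text.translate(LIGATURES)


print(expand_ligatures("cælum et cœlestium"))  # caelum et coelestium
```

A search for the string caelum then finds the word however the printed edition spelled it.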

3) In order to facilitate the work of the search engine and its identification of the location of a given match, the format of the titles and labels of text units has been made more systematic. For instance, Caput unicum has been replaced with Caput I; TRACTATUS SECUNDUS with TRACTATUS II; punctuation has been uniformly applied; etc.
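The kind of label normalization described in point 3 can be pictured with the following Python sketch. It is only an illustration: the table ORDINALS and the function normalize_label are hypothetical names, the table is deliberately tiny, and nothing here is taken from the project's actual conversion scripts.

```python
import re

# Hypothetical, deliberately small table; a real one would cover many more ordinals.
ORDINALS = {
    "UNICUS": "I", "UNICUM": "I",
    "PRIMUS": "I", "PRIMUM": "I",
    "SECUNDUS": "II", "SECUNDUM": "II",
    "TERTIUS": "III", "TERTIUM": "III",
}


def normalize_label(label):
    """Replace a spelled-out Latin ordinal at the end of a label with a Roman numeral."""
    def swap(match):
        word = match.group(0)
        # Words that are not in the table (including existing numerals) are kept as-is.
        return ORDINALS.get(word.upper(), word)

    return re.sub(r"[A-Za-z]+$", swap, label.strip())


print(normalize_label("TRACTATUS SECUNDUS"))  # TRACTATUS II
```

Labels that already end in a Roman numeral pass through unchanged, so the function can be applied to every label without checking first.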

4) Albert’s works contain a small number of illustrations, which in their digitized version have been replaced by the mention [ILLUSTRATION]. One can use the printed versions or the image files of these (available on this site) in order to view a given illustration itself.

5) All printed editions contain errors, whether they be typographical or of another kind, and the same is true of the texts that were digitized for this website. Since those errors can in many cases constitute very serious obstacles to an electronic search, the most obvious of them that were identified during the digitization and revision process have been corrected. In the search results produced by the engine, faulty forms are in brown and corrected ones, in green. Since producing new editions, strictly speaking, has never been part of the Alberti Magni e-corpus project, corrections extend only to errors that are very easy to identify, such as: 1) mere typos; 2) the editor forgetting their own spelling conventions; 3) the editor (i.e., Borgnet) misreading the very edition they seek to reproduce (i.e., Jammy). One can also consult the whole list of those corrections, which are almost always confirmed by the Jammy edition of Albert’s works, here.

6) Turning editions from the 19th and the beginning of the 20th centuries into machine-readable texts is a difficult task which itself introduces errors. Each text of the database was revised three times, which allowed for the elimination of almost all such errors. Given the size of the corpus and the speed with which the different revisions had to be accomplished, however, mistakes introduced by the digitization process do remain. (The user of the search engine should keep in mind that the present website was built with very limited financial resources.) The most common mistakes that remain are the following: missing words; missing punctuation marks (especially semicolons); missing a, o, or e in what used to be a ligature (most common case: -ae which now wrongly reads as -a); wrong paragraph divisions; missing or misplaced italics (in the commentaries on Scripture); and erroneous numerals, be they Arabic or Roman. There are probably more remaining errors in the commentaries on Scripture than in the other works, due to their more challenging format. Users who identify any such errors would do a great service to the project by informing the project supervisor, Bruno Tremblay.

The main goal of the transformation of image files into machine-readable texts, at least as far as this website is concerned, is to allow electronic searches. One would be well advised, before using the passages identified by the search engine in scholarly publications, to double-check with the hard copy or the image file of the best available edition.

Note, finally, that whenever the electronic version of a work also contains the text that Albert is commenting on (e.g., Peter Lombard's Sentences), said text is visible on the screen but is NOT searchable.

Features of the search engine

The search engine is a modified version of the one which was originally developed by the University of Waterloo for the electronic Oxford English Dictionary (OED) and which was later adapted in order to suit the needs of the MARGOT project. A few minutes will normally suffice for the average user to get accustomed to its features and to take advantage of its capabilities.

Word(s) or phrase(s) to be searched
The main features of the engine, which allows for Boolean searching, are the following:

a sequence of alphabetic characters matches any word or part of a word in which those characters appear in sequence
an AND operator, which is represented by the plus sign: +
an OR operator, which more precisely means AND/OR and which is represented by the vertical bar: |
a NOT operator, which is represented by the minus sign: -
an EXACT WORD MATCH operator, which cannot be part of a phrase query and which is represented by the equal sign: =
the use of QUOTATION MARKS in order to look for a phrase instead of a word or to include one or more punctuation marks: " "
a WILD CARD operator, which can only be used within quotation marks (as above, between two words or strings of characters) and which is represented by an asterisk: *
the use of BRACKETS in order to establish the order in which to apply the Boolean operators within a given search: ( )
by default, spaces are irrelevant unless they are enclosed within quotation marks (i.e., the spaces then become part of what is searched).
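To make the difference between a plain string, an exact word, and a quoted phrase more concrete, here is a minimal Python sketch. It imitates the behaviour described in the list above and is not the engine's code. The function names are hypothetical, and two points are assumptions: that a word boundary means "not flanked by a letter", and that "spaces are irrelevant" means spaces around an unquoted query are dropped.

```python
import re

LETTER = r"[^\W\d_]"  # any Unicode letter


def matches_string(passage, chars):
    """Unquoted query: matches a whole word or part of a word (outer spaces dropped)."""
    return chars.strip() in passage


def matches_exact_word(passage, word):
    """'=' query: the word must not be flanked by other letters (assumed rule)."""
    pattern = rf"(?<!{LETTER}){re.escape(word)}(?!{LETTER})"
    return re.search(pattern, passage) is not None


def matches_phrase(passage, phrase):
    """Quoted query without wild card: spaces and punctuation are part of the search."""
    return phrase in passage


passage = "de praescientia Dei"
print(matches_string(passage, "scientia"))      # True: part of praescientia
print(matches_exact_word(passage, "scientia"))  # False: not a complete word
print(matches_phrase(passage, " scientia"))     # False: no word begins with it
print(matches_phrase(passage, "scientia "))     # True: praescientia ends with it
```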

NOTE 1: The editorial insertion of page numbers (among other things) within or between words will at times prevent the search engine from retrieving all matches to a phrase query. (Simple word queries are not affected by that situation.) One way to somewhat alleviate the problem is the use of the partial wild card, which within a phrase query potentially replaces anything (i.e., one or many letters or words, punctuation, page number, correction, identification of Greek, nothing, etc.). Thus, "Aristoteles dicit" will not retrieve a passage that contains Aristoteles /234a/ dicit, but "Aristoteles*dicit" will. (The imperfection of the solution is of course that the second query will also retrieve many other passages one is not interested in.)
NOTE 2: Passages that cross sentence boundaries (whether it be a period, an exclamation mark, a question mark, or a paragraph mark) cannot be matched as phrases. Thus, it is perfectly acceptable to search for "hoc est, in angulis directe oppositas, non semper esse", but instead of searching for "Homo est. Homo non est.", one should search for "Homo est."+"Homo non est" within a paragraph.
NOTE 3: Passages that contain both regular roman script and italics (or regular roman script and boldfaced characters, or italics and boldfaced characters) cannot be matched as phrases. Thus, a search for "in Prologo galeato de Daniele" will not match the words in Prologo galeato de Daniele in Super Danielem, praefatio, p. 448, because part of that passage is set in italics, but searching for "Prologo galeato"+Daniele will. This serious limitation applies only to commentaries on Scripture and a few short works (e.g., sermons and letters), given that italics and boldfaced characters were replaced with regular roman script in the digitization of other works.
NOTE 4: The search engine is in general relatively fast. Given the way the system is presently implemented, however, a query which includes exact word matching (i.e., =) is processed more slowly, usually taking between 30 and 60 seconds.
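The behaviour described in NOTE 1 and NOTE 2 can be imitated with a short Python sketch. It is an approximation written for this explanation, not the engine's implementation. The function names are hypothetical, and the sentence-splitting rule (a period, exclamation mark, or question mark followed by whitespace) is an assumption.

```python
import re


def phrase_matches(sentence, phrase):
    """Treat '*' as 'anything or nothing'; every other character is literal."""
    parts = [re.escape(part) for part in phrase.split("*")]
    return re.search(".*?".join(parts), sentence) is not None


def split_sentences(paragraph):
    """Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", paragraph)


def phrase_in_paragraph(paragraph, phrase):
    """A phrase matches only if it fits entirely inside one sentence."""
    return any(phrase_matches(s, phrase) for s in split_sentences(paragraph))


# NOTE 1: an inserted page number breaks the plain phrase but not the wild card.
print(phrase_in_paragraph("Aristoteles /234a/ dicit", "Aristoteles dicit"))  # False
print(phrase_in_paragraph("Aristoteles /234a/ dicit", "Aristoteles*dicit"))  # True

# NOTE 2: a phrase cannot cross a sentence boundary, but two phrases joined by + can.
paragraph = "Homo est. Homo non est."
print(phrase_in_paragraph(paragraph, "Homo est. Homo non est."))  # False
print(phrase_in_paragraph(paragraph, "Homo est.")
      and phrase_in_paragraph(paragraph, "Homo non est"))         # True
```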

Here are a few examples of searches that can be performed, together with the results they will yield:

=scientia: will retrieve all passages that include the exact and complete word scientia;
scientia : will retrieve all passages that include the string of letters scientia, whether it be as a full word (scientia) or as part of a word (e.g., praescientia, scientias, scientiarum, etc.);
" scientia " : will retrieve all passages that include the exact word scientia when surrounded by spaces;
" scientia" : will retrieve all passages that include a space followed by the exact word scientia or by a word that begins with that string of letters (e.g., scientias, scientiarum, etc.);
"scientia " : will retrieve all passages that include the word scientia, followed by a space, or a word that ends with that string of letters (e.g., praescientia), followed by a space;
"scientia*logica": will retrieve all passages that include the string of letters scientia and the string of letters logica, in that precise order (in other words, the engine will search for the following phrase: scientia[+ anything that may or may not come in between]logica);
scientia+=logica : will retrieve all passages that include both the string of letters scientia and the exact word logica;
scientia+" logica" : will retrieve all passages that include the string of letters scientia and a space followed by the exact word logica (or by a word that begins with that string of letters);
scientia-logica : will retrieve all passages that include the string of letters scientia but that exclude the string of letters logica;
" scientia"|logica : will retrieve all passages that include a space followed by one word that begins with (or is the same as) scientia, and/or the string of letters logica;
"scientias logicas"-Avicenna : will retrieve all passages that include the phrase scientias logicas and that exclude the string of letters Avicenna;
(scientia+=logica)-(rhetorica|dialectica) : will retrieve all passages 1) that include the string of letters scientia and the exact word logica, and 2) that exclude the string of letters rhetorica and/or the string of letters dialectica.
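The last example can be read as a composition of simpler tests, which the following Python sketch spells out. It only models the logic of the operators. The function names are hypothetical, the sample passages are invented for the illustration, and the rule used for an exact word (not flanked by a letter) is an assumption.

```python
import re


def string(chars):
    """Plain term: matches as a whole word or as part of a word."""
    return lambda passage: chars in passage


def exact(word):
    """'=' term: the word must stand alone (boundary rule is an assumption)."""
    pattern = re.compile(rf"(?<![^\W\d_]){re.escape(word)}(?![^\W\d_])")
    return lambda passage: pattern.search(passage) is not None


def AND(a, b):  # the + operator
    return lambda passage: a(passage) and b(passage)


def OR(a, b):  # the | operator (and/or)
    return lambda passage: a(passage) or b(passage)


def NOT(a, b):  # the - operator: a, excluding b
    return lambda passage: a(passage) and not b(passage)


# (scientia+=logica)-(rhetorica|dialectica)
query = NOT(AND(string("scientia"), exact("logica")),
            OR(string("rhetorica"), string("dialectica")))

# Invented sample passages.
passages = [
    "scientia logica est de sermone",
    "scientia logica et rhetorica",
    "praescientia et logicae partes",
]
print([p for p in passages if query(p)])  # ['scientia logica est de sermone']
```

The second passage is excluded because it contains rhetorica, and the third because logicae is not the exact word logica.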

Regions to search
This allows the user to determine the textual unit in which the search is performed (and therefore also the context in which each match is to be initially displayed on the results page). This choice is especially important if more than one word or phrase is to be taken into account in the search. The options are:

1) paragraphs and titles: matches will be searched within each paragraph and each title;
2) paragraphs: matches will be searched within each paragraph, excluding titles (note that a paragraph is defined as any group of words coming after a carriage return [“enter” or “return” key] and ending with a carriage return, which is a definition that will be problematic in a very small number of cases, for instance when dealing with a poem, whose individual verses are separated with paragraph marks);
3) sentences: matches will be searched within each sentence, including titles (sentences are taken to be any sequence of words within paragraphs or titles and are separated by periods, question marks or exclamation marks; this occasionally will create inaccurate results since a period, a question mark, or an exclamation mark can sometimes be found within sentences);
4) titles: matches will be searched within each title;
5) sections: matches will be searched within each section (a section is defined as a labeled textual unit [e.g., tractatus, caput, particula, questio, subparticula, etc.] which is not itself subdivided into a smaller such unit, or, in other words, as the region indicated by a complete location, ignoring page numbers);
6) pages: matches will be searched within each page, including titles found on that page;
7) columns: matches will be searched within each column (note that not all texts are divided into columns), including titles when part of a column.
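Why the choice of region matters for a query with more than one term can be seen in a small Python sketch. It covers only two of the seven regions, uses the definitions of paragraph and sentence given in options 2 and 3, and is an illustration rather than the engine's code. The function names and the sample text are invented.

```python
import re


def units(text, region):
    """Split text into search units; only two regions are sketched here."""
    paragraphs = [p for p in text.split("\n") if p.strip()]
    if region == "paragraphs":
        return paragraphs
    if region == "sentences":
        # A sentence runs up to the next period, question mark or exclamation mark.
        return [s.strip()
                for p in paragraphs
                for s in re.findall(r"[^.?!]+[.?!]?", p)
                if s.strip()]
    raise ValueError(f"region not covered by this sketch: {region}")


def count_units_matching(text, region, *terms):
    """Count the units in which every term occurs (an AND query)."""
    return sum(all(term in unit for term in terms) for unit in units(text, region))


text = "Scientia est habitus. Logica est modus sciendi.\nLogica docet."
print(count_units_matching(text, "sentences", "Scientia", "Logica"))   # 0
print(count_units_matching(text, "paragraphs", "Scientia", "Logica"))  # 1
```

The same two terms give no match when the region is the sentence and one match when it is the paragraph, because they occur in different sentences of the same paragraph.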

Stylesheet for display
Three stylesheets are available. The first two are likely to be of interest to most users when searching and the third may be useful when browsing:

Edited: shows the text with the corrections that were made to the simplest and most obvious errors contained in the printed edition (corrected forms are in green);
Unedited: shows the text without said corrections (faulty forms are in brown);
Show tags: displays the XML tagging added to the text so it can be searched by the engine.

Note that the file that is searched contains in effect both the faulty forms and the corrected forms, and that therefore both errors from the printed edition and corrections that were made for this project can be retrieved by the search engine. Choosing the Edited or the Unedited stylesheet determines only what will be seen on the screen when looking at the search results, not the version of the text to be searched.
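The separation between what is searched and what is displayed can be pictured with the following Python sketch. The data structure is purely hypothetical: the site stores this information as XML tagging whose details are not given here, and the misspelling logcia is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    printed: str                     # form found in the printed edition
    corrected: Optional[str] = None  # correction made for the project, if any


def token_matches(token, chars):
    """The search sees both the faulty form and the corrected form."""
    if chars in token.printed:
        return True
    return token.corrected is not None and chars in token.corrected


def render(tokens, stylesheet):
    """The stylesheet only decides which form is shown on screen."""
    shown = []
    for token in tokens:
        use_correction = stylesheet == "Edited" and token.corrected is not None
        shown.append(token.corrected if use_correction else token.printed)
    return " ".join(shown)


tokens = [Token("scientia"), Token("logcia", corrected="logica")]  # invented typo
print(any(token_matches(t, "logcia") for t in tokens))  # True
print(any(token_matches(t, "logica") for t in tokens))  # True
print(render(tokens, "Edited"))    # scientia logica
print(render(tokens, "Unedited"))  # scientia logcia
```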

Start new search, matching spelling (i.e., find exact matches)
This is the default search button, which is activated by pressing the Enter (or Return) key. It allows the user to search for words that match the spelling used in the search box.

Start new search with normalized spelling (j/i, ae/e, etc.)
The presence within the corpus of different editions (and even of discrepancies within a given individual edition) means that a few words are spelled differently from one work to another or within one work. (Think for example of aer/aër, philosophia/phylosophia, caelestium/coelestium, etc.) This button allows for searches that: 1) ignore diacritics, h, apostrophes, and repeated letters; 2) use the following equivalence table: ae/e, ci/ti, d/t, j/i, k/c, m/n, oe/e, ph/f, v/u, and y/i. Users should use this button with some prudence, if only because it will at times yield more results than they may want to work with. Also, because of the way the search engine is written, this function might match unexpected passages when searching for words or strings of letters whose last letter might be the start of a pair that could be normalized. (Concretely, that means requested words or strings of letters ending with -a, -c, -o, and -p.) In a few rare cases users will obtain better results by asking, without normalized spelling, for that string of letters and/or that same string of letters ending with the relevant pair (e.g., philosophia|philosophiae, condic|condici, etc.).
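One way to picture normalized spelling is as a function that reduces every word to a canonical form, so that two spellings are treated as equivalent when their canonical forms coincide. The Python sketch below applies the rules and the equivalence table listed above to whole words. It is an approximation under stated assumptions: the function names, the choice of canonical letter in each pair, and the order of the steps are all choices made for this illustration, and the sketch does not reproduce the engine's behaviour on partial strings.

```python
import re
import unicodedata

# Canonical letter chosen (arbitrarily) for each single-letter pair of the table.
SINGLE = str.maketrans({"t": "d", "j": "i", "k": "c", "m": "n", "v": "u", "y": "i"})


def collapse_repeats(s):
    """Reduce any run of the same letter to a single letter."""
    return re.sub(r"(.)\1+", r"\1", s)


def normalize(word):
    """Return a canonical form; equal forms mean 'same word' for this sketch."""
    # Ignore diacritics: decompose, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word.lower())
    s = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore apostrophes.
    s = s.replace("'", "").replace("\u2019", "")
    # ph/f must be handled before h is dropped.
    s = s.replace("ph", "f").replace("h", "")
    s = collapse_repeats(s)
    # Two-letter pairs: ae/e, oe/e, ci/ti (before the single-letter d/t rule).
    s = s.replace("ae", "e").replace("oe", "e").replace("ti", "ci")
    s = s.translate(SINGLE)
    return collapse_repeats(s)


print(normalize("philosophia") == normalize("phylosophia"))  # True
print(normalize("caelestium") == normalize("coelestium"))    # True
print(normalize("aer") == normalize("aër"))                  # True
```

Because so many letters are merged, unrelated words can end up with the same canonical form, which is one reason why such a search may return more results than expected.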

Corpus
By default, a search is always performed in all listed works. Users can instead limit their search by manually selecting the individual work(s) they are interested in. Someone using the Alberti Magni e-corpus and the Aschendorff Verlag corpus in conjunction may also want to exclude/include the works that are also searchable with the latter by clicking the “Exclude/Include fully available texts in the Editio Coloniensis” button.

Once a search has been done, a first report is displayed at the top of the window. It lists the number of matches found in each work and also the total number of matches found in the whole selected corpus. (1) Clicking on the title of an individual work (in blue) takes the user to a new window, which displays each individual match found in that given work, whereas clicking on the number of matches found in all selected texts (in blue) will allow the user to see all results grouped together. Clicking then on the location of an individual match (in blue) will cause the full page on which that match is found to be displayed, together with browsing buttons to move forward to the next page or back to the previous one. (2) Alternatively, clicking on the button-like title of the work whose matches one would like to examine, which is located in the lower half of the page under "Corpus", will take the user to an outline view of said work, in which red stars mark the parts of the work where matches are found. Clicking then on any star will take the user to the full page that contains the match. (As above, that page includes browsing buttons to move forward to the next page or back to the previous one.)

When searches yield a very high number of matches, the following two functions will at times prove useful:

Number of matches per text to display
The user can decide how many matches will be displayed on a page of results. The default number is set at 10.

Continuing from which match?
This allows the user to determine where the list of displayed results will start: the first that was found, the fifth, the sixteenth, etc. By default, the first result that was found will also be the first to be displayed.
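Taken together, the two settings select a slice of the list of matches, as the following Python sketch shows. The function and parameter names are hypothetical; only the default values (10 matches, starting from the first) come from the description above.

```python
def displayed_matches(matches, per_text=10, start_at=1):
    """Return the matches shown on the results page, counting matches from 1."""
    if per_text < 1 or start_at < 1:
        raise ValueError("both settings must be positive")
    first = start_at - 1
    return matches[first:first + per_text]


matches = [f"match {n}" for n in range(1, 26)]   # 25 invented matches
print(displayed_matches(matches))                # match 1 to match 10
print(displayed_matches(matches, start_at=16))   # match 16 to match 25
```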
