A wildcard character can stand in for any other character or characters in a search query. Many users know the asterisk (*) from web search engines, for instance. The wildcards you can use in Annis are listed below.
Wildcard  Meaning                                       Search item  Search result
.         followed by exactly 1 character               al.          als
..        followed by exactly 2 characters              al..         alle, alte
...       followed by exactly 3 characters              al...        altes, alter
?         previous character is optional                das?         da, das
*         previous character occurs zero or more times  das*         da, das, dass
+         previous character occurs one or more times   das+         das, dass
.+        followed by 1 or more characters              gu.+         gut, gute, gutes, gummieren
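These wildcards are regular-expression operators, so their behaviour can be tried out outside Annis. The following Python sketch assumes that a pattern has to match a whole token, which `re.fullmatch` reproduces; this is our assumption about how Annis applies a pattern, not something stated above. The word list is made up from the search results in the table plus a few forms that should not match, and `matches` is a hypothetical helper. Note that the `+` pattern yielding "das, dass" is `das+`; `da+` would instead match "da", "daa" and so on.

```python
import re

# Made-up word list: the search results from the wildcard table plus a few
# forms (al, gu, daa) that none of the table's patterns should return.
WORDS = ["als", "alle", "alte", "altes", "alter", "da", "das", "dass",
         "gut", "gute", "gutes", "gummieren", "al", "gu", "daa"]

# Search items from the table; the "+" row is written as das+.
PATTERNS = ["al.", "al..", "al...", "das?", "das*", "das+", "gu.+"]


def matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match `pattern` as a whole.

    re.fullmatch anchors the pattern at both ends (assumed to mirror how
    Annis applies a pattern to a token); re.search would also accept
    partial hits, e.g. "das?" would then match "dass".
    """
    return [w for w in words if re.fullmatch(pattern, w)]


for pattern in PATTERNS:
    print(f"{pattern:6} -> {', '.join(matches(pattern, WORDS))}")
```

Each printed line pairs a search item with the words it matches, reproducing the "Search result" column for this small word list.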
Error-annotated Searches
Target Hypotheses (ZH1 and ZH2)
Because Falko and WHiG are error-annotated corpora, every essay has two parallel layers, called target hypotheses, in which errors have been corrected. The first, ZH1 (Zielhypothese 1, target hypothesis 1), corrects errors in orthography, morphology, and syntax. The second, ZH2 (Zielhypothese 2, target hypothesis 2), corrects errors in semantics, lexis, pragmatics, and stylistics. A more detailed overview of the error annotation can be found in the German-language Falko Handbook on Corpus Design and Annotations (p. 31 onwards).
Source and Target Comparison (ZH1Diff and ZH2Diff)
Once errors have been annotated, each target hypothesis is compared with the source text, and the differences are marked on the layers ZH1Diff and ZH2Diff, respectively, using the following tags:
ZH1Diff / ZH2Diff  Meaning      Explanation
INS                insert       a word has been inserted
DEL                delete       a word has been deleted
CHA                change       a word has been changed but stays in place
MOVS               move source  a word has been moved away from this position
MOVT               move target  a word has been moved to this position
MERGE              merge        two or more words have been merged into one
SPLIT              split        one word has been split into two or more words
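To make the tags more concrete, here is a rough Python sketch that derives similar labels for a pair of token sequences. It is a simplified illustration, not the procedure used to annotate Falko or WHiG: the alignment comes from the standard library's `difflib.SequenceMatcher`, `diff_labels` is a hypothetical helper, MERGE and SPLIT are recognised only when the joined words are spelled identically, and a move is assumed whenever a deleted word is inserted unchanged somewhere else. The example sentence is invented.

```python
from difflib import SequenceMatcher

Label = tuple[str, list[str], list[str]]  # (tag, source words, target words)


def diff_labels(source: list[str], target: list[str]) -> list[Label]:
    """Approximate ZH1Diff/ZH2Diff-style tags for a source/target token pair.

    Hypothetical sketch: token alignment comes from difflib, not from the
    procedure actually used to annotate Falko or WHiG.
    """
    labels: list[Label] = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        src, tgt = source[i1:i2], target[j1:j2]
        if op == "equal":
            continue  # identical in source and target: no tag
        if op == "insert":
            labels.append(("INS", src, tgt))
        elif op == "delete":
            labels.append(("DEL", src, tgt))
        elif len(tgt) == 1 and len(src) > 1 and "".join(src) == tgt[0]:
            labels.append(("MERGE", src, tgt))  # e.g. "Fahr rad" -> "Fahrrad"
        elif len(src) == 1 and len(tgt) > 1 and src[0] == "".join(tgt):
            labels.append(("SPLIT", src, tgt))  # e.g. "nichtmehr" -> "nicht mehr"
        else:
            labels.append(("CHA", src, tgt))  # replaced in place

    # A word deleted at one position and inserted unchanged at another is
    # treated as a move: the old position becomes MOVS, the new one MOVT.
    for d, (tag_d, src_d, _) in enumerate(labels):
        if tag_d != "DEL":
            continue
        for k, (tag_k, _, tgt_k) in enumerate(labels):
            if tag_k == "INS" and tgt_k == src_d:
                labels[d] = ("MOVS", src_d, [])
                labels[k] = ("MOVT", [], tgt_k)
                break
    return labels


learner = ["Gestern", "ich", "habe", "ein", "Fahr", "rad", "gekauft"]
target = ["Gestern", "habe", "ich", "ein", "Fahrrad", "gekauft"]
for tag, src, tgt in diff_labels(learner, target):
    print(tag, src, tgt)
```

For the invented sentence, the sketch tags "habe" as MOVT at its new position and MOVS at its old one, and "Fahr rad" to "Fahrrad" as MERGE. Which of two swapped words counts as moved is a choice made by the alignment; a human annotator might tag "ich" instead.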
This tutorial cannot anticipate every error that Annis users might want to search for, so it shows only a few examples.
Remember
If token #2 directly follows token #1, use: #1.#2
If two annotations cover exactly the same token(s), use: #1_=_#2
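The two operators can be pictured as simple conditions on token positions. The Python sketch below is a simplified model, not the Annis implementation: it assumes that each search node covers a contiguous run of tokens, given as hypothetical `Span(first, last)` indices, and the helper names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A search node covering tokens first..last (0-based, inclusive)."""
    first: int
    last: int


def directly_precedes(a: Span, b: Span) -> bool:
    """#1.#2: the last token of #1 is immediately followed by the first token of #2."""
    return a.last + 1 == b.first


def identical_coverage(a: Span, b: Span) -> bool:
    """#1_=_#2: both nodes cover exactly the same tokens."""
    return a.first == b.first and a.last == b.last


# "Gestern ich habe": token 1 ("ich") is directly followed by token 2 ("habe").
print(directly_precedes(Span(1, 1), Span(2, 2)))
# An annotation on "habe" covers exactly the same token as "habe" itself.
print(identical_coverage(Span(2, 2), Span(2, 2)))
```

Both calls print `True`; moving the second span to token 3 would make `directly_precedes` false, because `.` requires adjacency, not just order.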