
School of Creative Studies and Media

What's Hard in German?


Annis - WHiG's freely accessible online corpus platform

Funders

AHRC

Contact us about this WHiG page on Annis

Astrid Ensslin and Cedric Krummes

More information about Annis

Falko and Annis - WHiG's parent project

When citing Annis

Zeldes, Amir, Julia Ritz, Anke Lüdeling & Christian Chiarcos (2009). "ANNIS: A Search Tool for Multi-Layer Annotated Corpora". In: Proceedings of Corpus Linguistics 2009, July 20-23, Liverpool, UK. Online:
http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/mitarbeiter-innen/amir/pdf/CL2009_ANNIS_pre.pdf

Accessing Annis

Go to Annis, the online corpus platform. In the left-hand menu, select each relevant corpus. For WHiG, these are:

  • FalkoEssayL1V2_0: the essay corpus of German native speakers

  • FalkoEssayL2V2_0: the essay corpus of all learners

  • FalkoWHIG1: the WHiG essay corpus of British learners

To select more than one corpus, hold down the Control key while clicking each corpus.

Overview of Annis

Overview of Annis results

Searches typed in the "AnnisQL:" box

with links to YouTube clips

Please note

All search queries are case-sensitive, and all quotation marks must be straight double quotes (not single quotes and not curly quotes).

Phenomenon | Example | Copy-Paste
1 word | the word "Hand" | word="Hand"
2 words | "ich bin" | word="ich" & word="bin" & #1.#2
3 words | "Zum Schluss ist" | word="Zum" & word="Schluss" & word="ist" & #1.#2 & #2.#3
4 words | "Meiner Meinung nach ist" | word="Meiner" & word="Meinung" & word="nach" & word="ist" & #1.#2 & #2.#3 & #3.#4
Non-neighbouring Words | "je" followed later on by "desto" (1 to 6 words in between) | word="je" & word="desto" & #1.1,6#2
Prefix or Word-initial | words starting with "Kauf–" | word=/Kauf.+/
Infix or Word-medial | words that have the stem "–trag–" | word=/.+trag.+/
Suffix or Word-final | words ending in "–ung" | word=/.+ung/
Part-of-Speech (POS), using the Stuttgart-Tübingen Tagset (STTS); see the STTS guidelines and the word forms for closed word classes | an article (includes both definite and indefinite) | pos="ART"
Lemma (the canonical form of a word: singular for nouns, infinitive for verbs, ending-free for adjectives) | all the forms of "klein" | lemma="klein"
Lemma and POS Combination | the verb "meinen" (as opposed to the possessive pronoun) | lemma="meinen" & pos=/VV.+/ & #1_=_#2
POS and Lemma Combination | definite articles | pos="ART" & lemma="d" & #1_=_#2
Lemma followed by POS | an indefinite article followed by a common noun | lemma="ein" & pos="NN" & #1.#2
Sequence of POS, Lemma, and POS | a preposition followed by a definite article followed by a common noun | pos="APPR" & lemma="d" & pos="NN" & #1.#2 & #2.#3
Typos or Anglicisms or Unknown Neologisms | typos or anglicisms or unknown neologisms | lemma="[unknown]"
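
The multi-word searches above all follow one pattern: one word="..." term per word, then one #n.#n+1 link per pair of neighbours. If you need longer phrases, the pattern can be generated rather than typed. The following is a minimal Python sketch, not part of Annis; the function name sequence_query is our own invention, and the sketch assumes that the terms may be written on one line with spaces around the & sign.

```python
def sequence_query(phrase, attribute="word"):
    """Build an Annis query that matches the words of `phrase` as direct neighbours.

    `sequence_query` is a hypothetical helper for illustration, not an Annis function.
    """
    tokens = phrase.split()
    if not tokens:
        raise ValueError("phrase must contain at least one word")
    # One search term per word, e.g. word="ich"
    terms = [f'{attribute}="{token}"' for token in tokens]
    # Link term n to term n+1 with the "directly follows" operator "."
    links = [f"#{n}.#{n + 1}" for n in range(1, len(tokens))]
    return " & ".join(terms + links)


print(sequence_query("ich bin"))
print(sequence_query("Zum Schluss ist"))
```

The printed strings can be pasted into the "AnnisQL:" box. Remember that the words are inserted exactly as typed, so the query stays case sensitive.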

Wildcard Characters

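A wildcard character stands in for one or more other characters in a search query. Many users will know the asterisk (*) from search engines, for instance. The wildcards below can be used in Annis. As in the prefix, infix, and suffix searches above, a pattern containing wildcards goes between forward slashes rather than quotes, for example word=/al./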

Wildcard | Meaning | Search item | Search result
. | followed by 1 letter | al. | als
.. | followed by 2 letters | al.. | alle, alte
... | followed by 3 letters | al... | altes, alter
? | previous letter is optional | das? | da, das
* | previous letter occurs any number of times, or not at all | das* | da, das, dass
+ | previous letter occurs any number of times, but at least once | das+ | das, dass
.+ | followed by 1 or more letters | gu.+ | gut, gute, gutes, gummieren
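
If you want to try out a pattern before running it on a corpus, you can test it on a short word list. The sketch below uses Python's own regular expressions as a stand-in. This is an assumption: Annis may differ from Python in details, and the function name matching_words and the word list are made up for illustration. The sketch requires the pattern to cover the whole word, which is how the prefix, infix, and suffix searches in this tutorial are written.

```python
import re


def matching_words(pattern, words):
    """Return the words that `pattern` covers from first to last letter.

    Python's `re` module stands in for Annis here (an assumption).
    """
    compiled = re.compile(pattern)
    return [word for word in words if compiled.fullmatch(word)]


sample = ["da", "das", "dass", "als", "alle", "alte", "gut", "gute"]

print(matching_words("al.", sample))    # ['als']
print(matching_words("al..", sample))   # ['alle', 'alte']
print(matching_words("das?", sample))   # ['da', 'das']
print(matching_words("das*", sample))   # ['da', 'das', 'dass']
print(matching_words("das+", sample))   # ['das', 'dass']
print(matching_words("gu.+", sample))   # ['gut', 'gute']
```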

 

Error-annotated Searches

Target Hypotheses (ZH1 and ZH2)

Because Falko and WHiG are error-annotated corpora, every essay contains two mirrored layers, called target hypotheses, in which errors have been corrected. The first, ZH1 (Zielhypothese 1, target hypothesis 1), comprises corrections of orthography, morphology, and syntax. The second, ZH2 (Zielhypothese 2, target hypothesis 2), comprises corrections of semantics, lexis, pragmatics, and stylistics. A more detailed overview of error annotation can be found in the German-language Falko Handbook on Corpus Design and Annotations (pp. 31 onwards).

Source and Target Comparison (ZH1Diff and ZH2Diff)

When errors have been annotated, the target hypothesis is compared with the source text, and the differences are marked on the ZH1Diff or ZH2Diff layer in the following way:

ZH1Diff / ZH2Diff | Meaning | Explanation
INS | insert | a word has been inserted
DEL | delete | a word has been deleted
CHA | change | a word has been changed and stays put
MOVS | move source | a word has been moved away
MOVT | move target | a word has moved to this location
MERGE | merge | two or more words have been merged into one
SPLIT | split | one word has been split into two or more words
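
To get a feel for what the INS, DEL, and CHA tags express, it can help to compare a learner sentence with a corrected version yourself. The sketch below does this with Python's standard difflib module. It is only an illustration and not the procedure used to annotate Falko or WHiG: the function name diff_tags and the example sentences are our own, and the sketch does not attempt MOVS, MOVT, MERGE, or SPLIT.

```python
from difflib import SequenceMatcher

# Rough mapping from difflib's operation names to the Diff tags (an assumption).
TAGS = {"insert": "INS", "delete": "DEL", "replace": "CHA"}


def diff_tags(source_tokens, target_tokens):
    """Compare learner tokens with target-hypothesis tokens.

    Returns (tag, source words, target words) for every stretch that differs.
    """
    matcher = SequenceMatcher(a=source_tokens, b=target_tokens, autojunk=False)
    differences = []
    for operation, i1, i2, j1, j2 in matcher.get_opcodes():
        if operation == "equal":
            continue  # identical stretches carry no Diff tag
        differences.append((TAGS[operation], source_tokens[i1:i2], target_tokens[j1:j2]))
    return differences


learner = "ich gehe in Schule".split()
target = "ich gehe in die Schule".split()
print(diff_tags(learner, target))  # [('INS', [], ['die'])]
```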

This tutorial cannot anticipate all the errors Annis users would be interested in, so we merely show you a few examples.

Remember

If one token directly follows another token, use: #1.#2
If two search terms refer to the same token, use: #1_=_#2

Example | Copy-Paste
The word “soll” has been used, but the correct word is “sollte”. | word="soll" & ZH2="sollte" & #1_=_#2
The verb “müssen” has been used, but the correct verb is “dürfen”. | lemma="müssen" & ZH2lemma="dürfen" & #1_=_#2
The auxiliary “sein” has been used, but the correct (passive) auxiliary is “werden”. | lemma="sein" & ZH2lemma="werden" & #1_=_#2
The word “wegen” followed by an error (e.g. wrong article). | lemma="wegen" & ZH1Diff="CHA" & #1.#2
A preposition followed by the wrong article. | pos="APPR" & pos=/ART/ & ZH1Diff="CHA" & #1.#2 & #2_=_#3
A full verb has been moved to a different location. | ZH1Diff="MOVT" & ZH1pos=/VVFIN/ & #1_=_#2
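
The first three examples share one shape: a term on the learner text, a term on a target hypothesis layer, and #1_=_#2 to say that both refer to the same token. For your own word pairs, the query can be put together as in the following Python sketch. The function name correction_query is hypothetical, and the sketch assumes that the terms may be written on one line with spaces around the & sign; the layer names are the ones used in the examples above.

```python
def correction_query(source_layer, source_value, target_layer, target_value):
    """Build a query for tokens where the learner form and the correction differ.

    `correction_query` is a hypothetical helper for illustration, not an Annis function.
    """
    terms = [
        f'{source_layer}="{source_value}"',   # what the learner wrote
        f'{target_layer}="{target_value}"',   # what the target hypothesis says
        "#1_=_#2",                            # both terms refer to the same token
    ]
    return " & ".join(terms)


print(correction_query("word", "soll", "ZH2", "sollte"))
print(correction_query("lemma", "müssen", "ZH2lemma", "dürfen"))
```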
