Icelandic Parsed Historical Corpus
About 1,000,000 words of Icelandic text, drawn from every century from the 12th to the 21st inclusive, annotated for phrase structure, part-of-speech tagged, and lemmatized.
Distribution
Availability: Available - Restricted Use
Licence: LGPL
Restrictions: Attribution, Share Alike
User Nature: Academic, Commercial
Distribution Access/Medium: Accessible Through Interface, Downloadable
Attribution Details: Wallenberg, Joel, Anton Karl Ingason, Einar Freyr Sigurðsson and Eiríkur Rögnvaldsson. 2011. Icelandic Parsed Historical Corpus (IcePaHC). Version 0.9.
Distribution rights holders:
IPR Holder
Contact Person
Monolingual text corpus
Languages
Language Script: Latn
Linguality
Linguality type: Monolingual
Text Format
Size: 73,014 sentences, 1,000,000 tokens
Character encoding: UTF-8
Modalities
Annotation
Syntactic Annotation - Treebanks
Tagset:
StandOff: False
Segmentation level: Sentence, Word
Standard practices conformance: Penn Treebank
Theoretic Model: Phrase structure
Annotation Mode: Mixed
Annotation Manual:
Time Coverage
12th to 21st centuries
Geographic coverage
Metadata
Created: 11/29/2011
Last Updated: 01/25/2013
Version
Version: 0.9
Last Updated: 08/29/2011
Usage
Foreseen Use - NLP Applications, Use NLP Specific: Other
Actual Use - NLP Applications, Use NLP Specific: Other
Document Type: In Proceedings
Eiríkur Rögnvaldsson, Anton Karl Ingason, Einar Freyr Sigurðsson, Joel Wallenberg. The Icelandic Parsed Historical Corpus (IcePaHC). LREC 2012, 2012.
Editor: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Joseph Mariani, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
Book Title: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
ISBN: 978-2-9517408-7-7
Keywords: Icelandic, Faroese, treebank, parsed corpus, annotation
Document Language:
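
Example: reading labeled bracketing
The record lists Penn Treebank conformance and a phrase-structure model, which suggests trees stored as labeled bracketing. The following is a minimal sketch of how such a bracketed tree could be read in Python, not the corpus's own tooling. The sample tree, its labels, and the function names (tokenize, parse, leaves) are hypothetical illustrations and are not taken from the corpus files.

```python
# Minimal reader for Penn-Treebank-style labeled bracketing.
# SAMPLE is a made-up tree for illustration only; it is NOT copied from IcePaHC.
SAMPLE = "(IP-MAT (NP-SBJ (PRO-N hann)) (VBDI kom))"


def tokenize(text):
    """Split a bracketed string into '(' , ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one bracketed tree into nested (label, children) tuples.

    Children are either nested tuples (phrases) or plain strings (words).
    """
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = ""
        # An unlabeled wrapper such as "( (S ...))" has no label token.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first tree")
    return tree


def leaves(tree):
    """Return (tag, word) pairs for every terminal, in left-to-right order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((label, child))
    return pairs


tree = parse(tokenize(SAMPLE))
tagged = leaves(tree)
```

Here tagged holds the part-of-speech tag and word for each terminal. Files in the actual distribution may carry extra information on the leaves or additional wrapper nodes, so any real use should be checked against the corpus documentation.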
