What the Hell Is the Corpus?


In our Digital Resource class we were recently asked to write a long report about the corpus. No problem so far, except for one small thing: what is the corpus? I asked several classmates, who were as lost as I was, so I finally decided to look into the matter seriously. Here is what I found out.

A corpus is basically a set of texts. The texts are gathered so that people can consult them to resolve doubts about language use, analyse how words behave in different contexts, or get statistics on particular words or structures.
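To make the "statistics" part less abstract, here is a minimal sketch in Python. The three sentences are a toy corpus I made up for this example, and the function name word_frequencies is my own; a real corpus would hold thousands or millions of texts.

```python
from collections import Counter
import re

# A made-up toy corpus: three tiny "texts" invented for this example.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A corpus is a set of texts.",
]

def word_frequencies(texts):
    """Count how often each lowercase word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and pull out runs of letters (and apostrophes).
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

freqs = word_frequencies(corpus)
# Show the three most frequent words with their counts.
print(freqs.most_common(3))
```

Counting words is only the simplest kind of statistic, but the same idea (walk through the texts, tally what you find) is behind most corpus queries.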

Corpora can be monolingual or multilingual. New technologies allow them to be stored electronically so, nowadays, the easiest way to get access to a corpus is the Internet. Many corpora also include a layer of extra information known as annotation. This means that the words in the texts are labelled with tags, which make it easier to search for a particular usage or topic. Tags carry information as useful as the type of word (its part of speech) or the root it comes from.
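As a rough illustration of what annotation buys you, the sketch below stores one sentence as a list of (word, tag, root) entries and searches it by tag or by root. The sentence, the tag names (DET, NOUN, VERB, ADP) and the two helper functions are all assumptions of mine for the example; real corpora each define their own tag sets and formats.

```python
# Hypothetical annotated sentence: each entry is (word, part-of-speech tag, root).
# The tag names are illustrative and not taken from any specific corpus.
annotated = [
    ("The", "DET", "the"),
    ("cats", "NOUN", "cat"),
    ("were", "VERB", "be"),
    ("sleeping", "VERB", "sleep"),
    ("on", "ADP", "on"),
    ("mats", "NOUN", "mat"),
]

def find_by_tag(tokens, tag):
    """Return every word whose part-of-speech tag matches `tag`."""
    return [word for word, pos, _ in tokens if pos == tag]

def find_by_root(tokens, root):
    """Return every word form that derives from the given root."""
    return [word for word, _, lemma in tokens if lemma == root]

nouns = find_by_tag(annotated, "NOUN")
forms_of_be = find_by_root(annotated, "be")
print(nouns)
print(forms_of_be)
```

Without the tags, a search for the verb "be" would miss "were" entirely; with them, the search is a simple filter.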

Fields such as computational linguistics, speech recognition and even machine translation rely on the analysis of various types of corpora. There are several corpora of interest to linguistics students and researchers, and some websites help you use them, such as the AHDS guide listed in the bibliography, which I find quite practical. Corpora are growing in use and importance and will soon become an indispensable tool for the analysis of language.

I hope this information helps you in the arduous task of finally understanding what the hell the corpus is.

Bibliography

Wikipedia contributors, “Text corpus”, Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Text_corpus&oldid=203756862 (Accessed April 8, 2008)

Wynne, M. (ed.), “Developing Linguistic Corpora: a Guide to Good Practice”, Oxford: Oxbow Books, 2005. Available online from http://ahds.ac.uk/linguistic-corpora/ (Accessed April 8, 2008)

Published on April 8, 2008 at 10:05 am
