What the Hell Is the Corpus?

Corpus

In our Digital Resources class we were recently asked to write a long report on the corpus. No problem so far, except for a small one: what is the corpus? I asked several classmates, who were as lost as I was, so I finally decided to look into the matter seriously, and here is what I found out.

A corpus is basically a set of texts. They are gathered so that people can consult them about any doubts they may have, analyse how language is used in different situations, or get statistics on particular cases or structures.
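To make the "statistics" part a bit more concrete, here is a minimal sketch in Python of counting word frequencies over a tiny corpus. The three sample sentences and the function name `word_frequencies` are invented for illustration, and the tokenisation is deliberately crude; real corpus tools do a far more careful job.

```python
import re
from collections import Counter

# A toy corpus: three invented sentences standing in for real texts.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat.",
    "A cat is a small animal.",
]

def word_frequencies(texts):
    """Count how often each lowercased word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Very rough tokenisation: runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

frequencies = word_frequencies(corpus)

# Show the three most frequent words with their counts.
print(frequencies.most_common(3))
```

Even on three sentences the result shows the kind of answer a corpus gives you: which words turn up most often, and how often.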

Corpora can be monolingual or multilingual. New technologies allow electronic storage, so nowadays the easiest way to access a corpus is over the Internet. Many corpora include a system of markup known as annotation. This means that entries are labelled with tags, which make it easier to find a particular usage or topic. Tags carry useful information such as the type of word (its part of speech) or the root it comes from.
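As far as I understand it, annotation could be pictured like the minimal sketch below. The `Token` structure, the tag names (DET, NOUN, VERB, ADP) and the two search helpers are my own assumptions for illustration; real corpora each define their own tag sets and file formats.

```python
from collections import namedtuple

# One annotated entry: the word as written, its type, and its root.
Token = namedtuple("Token", ["word", "pos", "lemma"])

# A hand-annotated toy sentence; the tag names are illustrative only.
annotated = [
    Token("The", "DET", "the"),
    Token("cats", "NOUN", "cat"),
    Token("were", "VERB", "be"),
    Token("sleeping", "VERB", "sleep"),
    Token("on", "ADP", "on"),
    Token("mats", "NOUN", "mat"),
]

def find_by_tag(tokens, pos):
    """Return every word whose type-of-word tag matches `pos`."""
    return [t.word for t in tokens if t.pos == pos]

def find_by_lemma(tokens, lemma):
    """Return every word form that comes from the root `lemma`."""
    return [t.word for t in tokens if t.lemma == lemma]

nouns = find_by_tag(annotated, "NOUN")
forms_of_be = find_by_lemma(annotated, "be")

print(nouns)
print(forms_of_be)
```

The point is that the tags let you search for things plain text cannot give you directly, such as "all the nouns" or "every form of the verb be".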

Fields such as computational linguistics, speech recognition and even machine translation rely on the analysis of various types of corpora. There are several corpora of interest to linguistics students and researchers, and some websites, like this one by AHDS that I find quite practical, help you use them. Corpora are increasing in use and importance and will soon become an indispensable tool for the analysis of language.

I hope this information helps you in the arduous task of finally understanding what the hell the corpus is.

Bibliography

Wikipedia contributors, “Text corpus,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Text_corpus&oldid=203756862 (Accessed April 8, 2008)

Wynne, M. (editor), “Developing Linguistic Corpora: a Guide to Good Practice,” Oxford: Oxbow Books, 2005. Available online from http://ahds.ac.uk/linguistic-corpora/ (Accessed April 8, 2008)

Published on April 8, 2008 at 10:05 am
