English Corpora

There are a great number of corpora of the English language. I have had a look at some of the most important ones, and these are my conclusions:

  • British National Corpus (BNC): It covers written and spoken British English of the twentieth century. It is a 100-million-word collection of samples, marked up following the TEI (Text Encoding Initiative) guidelines and distributed in XML format together with the XAIRA software. I found it a little complex to use, but you can also query it in a much simpler way through the Davies website.
  • American National Corpus (ANC): It covers American English. Once completed, it is intended to be comparable to the BNC in number of texts and in uses, but the project is not finished yet. It contains a lot of information, but it is extremely easy to get lost in it. A tip: go to the “Resources” section.
  • Oxford English Corpus: It was created by the team behind Oxford University Press's language research programme and the Oxford English Dictionary. It contains two billion words from all kinds of written sources. Each document comes with interesting information such as the author, the author's gender, etc. It also includes very good explanations of how to use it.
  • Bank of English: It was created by HarperCollins Publishers and the University of Birmingham. It is very useful because it explains what a corpus is.
  • Brown Corpus: This is the Brown University Standard Corpus of Present-Day American English. It is more limited than the others, but it comes with a user-friendly manual.
  • Scottish Corpus of Texts and Speech: It contains texts in Scottish English and in varieties of Scots. Searching is quick and easy, but its interest is limited to Scottish studies.

Published on April 8, 2008 at 10:57 am