Del.icio.us

We have recently been introduced to del.icio.us, a new tool. It is basically a collection of favourites (bookmarks) that offers some advantages over traditional favourites lists.

First of all, del.icio.us lets you store, classify and share links to your favourite websites. As it is extremely difficult to find the information you need among the huge amount of data on the Internet, it is very useful to have something that takes you wherever you want to go, virtually in seconds. It also lets you filter information so you can avoid useless websites.

Its social bookmark manager lets you add the sites you visit to your del.icio.us account with a single click of your mouse, and the sites are classified and ready to use. Tags make it quick and easy to classify as many items as you want. Besides, its complete and practical tools help you use and manage these resources.
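
To make the idea of tags concrete, here is a minimal Python sketch of a tag-based bookmark collection. It assumes a simplified model of our own, not del.icio.us's actual data model or API: the class `BookmarkCollection` and its `add` and `find` methods are hypothetical. Each tag simply points to the set of URLs filed under it, so retrieving everything with a given tag (or combination of tags) takes a lookup and a set intersection.

```python
from collections import defaultdict


class BookmarkCollection:
    """Hypothetical tag-based bookmark store (illustrative only)."""

    def __init__(self):
        # tag -> set of URLs filed under that tag
        self._by_tag = defaultdict(set)

    def add(self, url, *tags):
        """Save a URL and attach any number of tags to it (tags are case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.lower()].add(url)

    def find(self, *tags):
        """Return, sorted, the URLs that carry every one of the given tags."""
        if not tags:
            return []
        # .get() avoids creating empty entries for unknown tags
        sets = [self._by_tag.get(t.lower(), set()) for t in tags]
        return sorted(set.intersection(*sets))


bookmarks = BookmarkCollection()
bookmarks.add("http://ahds.ac.uk/linguistic-corpora/", "corpus", "linguistics")
bookmarks.add("http://blogoscoped.com/archive/2005-05-22-n83.html", "translation")
bookmarks.add("http://en.wikipedia.org/wiki/Text_corpus", "corpus", "reference")

print(bookmarks.find("corpus"))
print(bookmarks.find("corpus", "reference"))
```

Because a link can carry several tags at once, the same page turns up under each of them, which is the main advantage over a traditional folder of favourites, where each link sits in only one place.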

All this makes del.icio.us a website with even more visits than Wikipedia and with a potential that is difficult to quantify.

Bibliography

Consumer contributors, “Del.icio.us: Los favoritos de todos,” Consumer.es Eroski, June 22, 2005, http://www.consumer.es/web/es/tecnologia/internet/2005/06/22/143141.php (accessed April 29, 2008)

Wikipedia contributors, “Del.icio.us,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Del.icio.us&oldid=208689635 (accessed April 29, 2008)

Published on April 29, 2008 at 10:12 am

Will Machine Translation Replace Standard Translation?

Several decades ago, technology advanced enough to produce systems that could take words in one language and replace them with their equivalents in another. This was the beginning of machine translation, and many feared it would replace human translators. So far, the programmes have improved, but translators are still necessary. Will this change in the near or distant future?

At first, machines could only substitute words without interpreting them. The next step was to attempt more complex texts and sequences of words. Corpora helped machines recognise phrases, translate idioms and identify word types. Today, translation software limits the scope of permitted substitutions, which makes systems much more effective. If the language is formulaic, as in legal documents, the results are astonishingly good, but problems arise with more literary texts.
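
The difference between the first stage (substituting words without interpreting them) and the later one (recognising phrases and idioms) can be illustrated with a toy Python sketch. The tiny Spanish-to-English word list and phrase list below are invented for the example, and the longest-phrase-first lookup is only a simplified stand-in for what corpus-based systems do, not how any real translation program works.

```python
# Toy Spanish-to-English word list (hypothetical, for illustration only).
WORDS = {"me": "me", "estás": "you are", "tomando": "taking",
         "el": "the", "pelo": "hair"}

# Hypothetical multi-word entries, standing in for the phrases and idioms
# that corpora helped later systems recognise.
PHRASES = {("me", "estás", "tomando", "el", "pelo"): "you are pulling my leg"}


def word_for_word(sentence):
    """First stage: replace each word by its equivalent; unknown words are kept."""
    return " ".join(WORDS.get(w, w) for w in sentence.lower().split())


def phrase_first(sentence):
    """Later stage: at each position try the longest known phrase, then single words."""
    words = sentence.lower().split()
    longest = max(len(p) for p in PHRASES)
    out, i = [], 0
    while i < len(words):
        # Try multi-word chunks only (length >= 2), longest first.
        for n in range(min(longest, len(words) - i), 1, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No multi-word match here: fall back to word-for-word substitution.
            out.append(WORDS.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(word_for_word("Me estás tomando el pelo"))  # me you are taking the hair
print(phrase_first("Me estás tomando el pelo"))   # you are pulling my leg
```

Word-for-word lookup turns the idiom into nonsense, while checking known phrases first gives a natural English equivalent. A phrase the system has never seen still falls back to literal substitution, which is one reason human translators are still needed.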

However, translators are still necessary. Machines cannot fully understand and translate certain expressions, puns, idioms or simply the author's intention. Besides, an exact equivalence between two languages is not always possible. Nowadays, human translators commonly start with a machine translation and then correct it. But the final touch, the human one, is only understood by other humans. So, unless software that allows machines to think for themselves is invented, machine translation will never replace human translators.

Bibliography

Lenssen, Philipp, “Google Translator: The Universal Language,” Google Blogoscoped, 2005, http://blogoscoped.com/archive/2005-05-22-n83.html (accessed April 15, 2008)

Wikipedia contributors, “Dictionary-based machine translation,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Dictionary-based_machine_translation&oldid=183145791 (accessed April 15, 2008)

Wikipedia contributors, “Machine translation,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Machine_translation&oldid=205089471 (accessed April 15, 2008)

Wikipedia contributors, “Statistical machine translation,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Statistical_machine_translation&oldid=202036737 (accessed April 15, 2008)

Published on April 15, 2008 at 10:05 am

English Corpora

There are a great number of English-language corpora. I have had a look at some of the most important ones, and these are my conclusions:

  • British National Corpus (BNC): It covers written and spoken British English of the late twentieth century. It is a 100-million-word collection of samples, marked up following the guidelines of the Text Encoding Initiative (TEI) and distributed in XML format with the XAIRA software. I found it a little complex to use, but you can also search it in a much simpler way through the Davies website.
  • American National Corpus (ANC): It covers American English. Once completed, it aims to be comparable to the BNC in size and uses, but this work is not finished yet. It contains a lot of information, but it is extremely easy to get lost in it. A tip: go to “resources”.
  • Oxford English Corpus: Created by Oxford University Press, the publishers of the Oxford English Dictionary, as part of its language research programme. It contains two billion words from all types of written sources. Each document includes interesting information such as the author, gender, etc. It includes very good explanations of how to use it.
  • Bank of English: It was created by HarperCollins Publishers and the University of Birmingham. Its website is very useful because it explains what a corpus is.
  • Brown Corpus: The Brown University Standard Corpus of Present-Day American English. It is more limited than the others but comes with a user-friendly manual.
  • Scottish Corpus of Texts and Speech: It contains texts in Scottish English and varieties of Scots. Searching is quick and easy, but its interest is limited to Scottish studies.

Bibliography

Wikipedia contributors, “Text corpus,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Text_corpus&oldid=203756862 (accessed April 8, 2008)

Wynne, M. (editor), “Developing Linguistic Corpora: a Guide to Good Practice,” Oxford: Oxbow Books, 2005, available online from http://ahds.ac.uk/linguistic-corpora/ (accessed April 8, 2008)

Published on April 8, 2008 at 10:57 am

What the Hell Is the Corpus?

Corpus

In our Digital Resources class we have recently been asked to write a long report on the corpus. No problem so far, except for a small one: what is a corpus? I asked several classmates, who were as lost as I was, so I finally decided to look seriously into the matter, and here I present what I have found out.

A corpus is basically a collection of texts. The texts are gathered so that people can consult them to resolve any doubts they may have, analyse how language is used in different situations, or obtain statistics on particular cases or structures.
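
As an illustration of how people consult a corpus, here is a minimal Python sketch of a keyword-in-context (KWIC) concordance, the classic way of showing every occurrence of a word together with its surrounding words. The function name `kwic`, the `width` parameter and the sample text are assumptions made for this example, not part of any particular corpus tool.

```python
import re


def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: `width` words either side of each hit."""
    # Lower-case the text and split it into words, dropping punctuation.
    words = re.findall(r"[\w'-]+", text.lower())
    lines = []
    for i, w in enumerate(words):
        if w == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}")
    return lines


sample = ("A corpus is a set of texts. Linguists consult a corpus to check "
          "how a word is used, and a corpus can give statistics.")

for line in kwic(sample, "corpus"):
    print(line)
```

Counting the lines returned (three hits for "corpus" in the sample) already gives a simple frequency statistic of the kind described above.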

Corpora can be monolingual or multilingual. New technologies allow them to be stored electronically, so nowadays the easiest way to access a corpus is through the Internet. To make them more useful for research, corpora are often annotated: information is added to the entries in the form of tags, which makes it easier to find a particular use or topic. Tags include information as useful as the part of speech (type of word) or the root (lemma) a word comes from.
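
Annotation can be pictured as storing, next to each word, its part of speech and its root (lemma). The following Python sketch assumes a tiny hand-annotated sample with simplified, hypothetical tags (not the tagset of the BNC or any other real corpus) and shows two typical searches: all the forms derived from one root, and how often each type of word appears.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    word: str   # form as it appears in the text
    pos: str    # part-of-speech tag (type of word)
    lemma: str  # root / base form


# Tiny hand-annotated sample; tags are simplified, hypothetical labels.
CORPUS = [
    Token("The", "DET", "the"),
    Token("translators", "NOUN", "translator"),
    Token("translated", "VERB", "translate"),
    Token("the", "DET", "the"),
    Token("texts", "NOUN", "text"),
    Token("and", "CONJ", "and"),
    Token("a", "DET", "a"),
    Token("machine", "NOUN", "machine"),
    Token("translates", "VERB", "translate"),
    Token("too", "ADV", "too"),
]


def forms_of(lemma):
    """All distinct word forms in the corpus that come from the given root."""
    return sorted({t.word for t in CORPUS if t.lemma == lemma})


def pos_counts():
    """Frequency of each part-of-speech tag in the corpus."""
    return Counter(t.pos for t in CORPUS)


print(forms_of("translate"))  # ['translated', 'translates']
print(pos_counts()["NOUN"])   # 3
```

Searching by lemma finds "translated" and "translates" even though neither matches the root letter for letter, which is exactly what a plain text search cannot do.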

Fields such as computational linguistics, speech recognition and even machine translation rely on the analysis of various types of corpora. There are several corpora of interest to linguistics students and researchers, as well as websites that help you use them, like this one by AHDS, which I find quite practical. Corpora are growing in use and importance and will soon become an indispensable tool for language analysis.

I hope this information helps you in the arduous task of finally understanding what the hell the corpus is.

Bibliography

Wikipedia contributors, “Text corpus,” Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Text_corpus&oldid=203756862 (accessed April 8, 2008)

Wynne, M. (editor), “Developing Linguistic Corpora: a Guide to Good Practice,” Oxford: Oxbow Books, 2005, available online from http://ahds.ac.uk/linguistic-corpora/ (accessed April 8, 2008)

Published at 10:05 am