Some Challenges of Automated Annotation in
A Multilingual Scenario

Arindam Roy; Sunita Sarkar; B. S. Purkayastha

Abstract

Annotation is a key ingredient of present-day NLP, and this paper discusses the challenges involved in one of the hardest annotation tasks, sense marking. A large amount of data must be sense marked accurately by human annotators in order to train machines to understand the spoken languages. Sense-marked corpora for various languages facilitate Word Sense Disambiguation (WSD), which is required for translation. Accurately sense marking voluminous data requires a standard and definitive lexicon. In the work reported here, the corpus is drawn from the newspaper and tourism domains. The Princeton WordNet (version 2.1) serves as the sense repertoire for English text, while the Hindi and Nepali WordNets are used for Hindi and Nepali texts respectively. The corpus was tagged independently by different annotators, and their agreement on word sense disambiguation was about 85% across the three languages (English, Hindi and Nepali). Although the senses WordNet lists for a given word are quite specific, there were cases in which the senses provided had limitations and posed challenges to the human sense markers.
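The abstract does not say how the agreement level was computed. As a minimal sketch, assuming it is simple observed agreement (the share of tokens to which two annotators assign the same sense), the calculation could look like the following. The function name `observed_agreement` and the sense tags are hypothetical and are not taken from the paper's corpus.

```python
from itertools import combinations


def observed_agreement(annotations):
    """Mean pairwise fraction of tokens that received the same sense tag.

    `annotations` holds one list of sense tags per annotator, all aligned
    to the same token sequence. This is plain percentage agreement, an
    assumed measure; it does not correct for chance agreement.
    """
    if len(annotations) < 2:
        raise ValueError("need at least two annotators")
    n_tokens = len(annotations[0])
    if n_tokens == 0 or any(len(tags) != n_tokens for tags in annotations):
        raise ValueError("annotators must tag the same non-empty token sequence")

    pair_scores = []
    for tags_a, tags_b in combinations(annotations, 2):
        matches = sum(1 for a, b in zip(tags_a, tags_b) if a == b)
        pair_scores.append(matches / n_tokens)
    return sum(pair_scores) / len(pair_scores)


# Hypothetical sense tags for five tokens (illustrative only).
annotator_1 = ["bank#1", "interest#4", "plant#2", "run#1", "light#3"]
annotator_2 = ["bank#1", "interest#4", "plant#1", "run#1", "light#3"]

agreement = observed_agreement([annotator_1, annotator_2])
print(f"Observed agreement: {agreement:.0%}")
```

Here the two annotators disagree on one token out of five, so the observed agreement is 80%. A chance-corrected statistic such as Cohen's kappa would be a stricter alternative, but the abstract gives no indication that one was used.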

International Journal of Innovative Research in Science, Engineering and Technology