
Sentence Splitter: the process that splits a raw text string into a list of sentences. Split points are typically of two kinds: internal (a combination of exclamation marks, question marks, or one to four dots) and external (a new line). The default Sentence Splitter works well on regular input; for irregular input, the Regular Expression Sentence Splitter is a good alternative. It is based on regular expressions, using the default Java implementation, and improves execution time and robustness.

Named entity recognition/classification: the Gazetteer identifies entity names in the text based on lists. The gazetteer lists are plain text files with one entry per line. Each list represents a set of names, such as names of cities, organizations, days of the week, etc. Users can insert, update, or delete list entries to fit their own application.

Part-of-Speech Tagging: the process of assigning a part-of-speech marker (noun, verb, pronoun, preposition, adverb, conjunction, participle, and article) to each word in an input text. GATE also supports semantic tagging. The Semantic Tagger is based on the JAPE language and contains rules that produce annotated entities as output: person gender (male, female), location type (region, airport, city, country, province…), organization type (company, department, government, newspaper, team…), money, percent, date type, address kind (email, website URL, phone, postcode, IP address…).

Co-reference detection: the process that determines whether word A refers to the same real-world entity as word B. In GATE, co-reference is divided into two parts: Orthographic and Pronominal. The Orthographic Coreference module adds identity relations between named entities (person, organization, location and date) found by the Semantic Tagger. The Pronominal Coreference module uses JAPE grammars to identify quoted speech and match pronouns to their antecedents. For example, it matches “I”, “me”, “my” inside quoted speech to names outside the quoted speech, or matches a pronoun with the most recent referent.

Evaluation of Language Analyzers: GATE provides tools for automatic evaluation: the Annotation Diff tool and the Corpus Benchmark tool. These tools are useful not only as a final measure of performance, but also to support system development by tracking progress and evaluating the impact of changes as they are made.

Text processing for many human languages: GATE provides plugins for processing non-English languages: French, German, Italian, Danish, Chinese, Arabic, Romanian, Hindi, Russian, Welsh and Cebuano.
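To make the idea of internal and external split points in sentence splitting more concrete, here is a minimal Python sketch of a regular-expression splitter. It is not GATE's implementation or API: the two patterns and the function name split_sentences are assumptions chosen only to mirror the description (runs of exclamation or question marks, or one to four dots, as internal splits; line breaks as external splits).

```python
import re

# Internal split (assumed pattern): a run of "!" / "?" marks, or one to
# four dots, followed by whitespace or the end of the text.
INTERNAL_SPLIT = re.compile(r"(?:[!?]+|\.{1,4})(?=\s|$)")

# External split (assumed pattern): one or more line breaks.
EXTERNAL_SPLIT = re.compile(r"[\r\n]+")


def split_sentences(text):
    """Split raw text into a list of sentences (hypothetical helper)."""
    sentences = []
    # External splits first: every line break ends a sentence.
    for block in EXTERNAL_SPLIT.split(text):
        start = 0
        # Then cut each block after every internal split point.
        for match in INTERNAL_SPLIT.finditer(block):
            sentence = block[start:match.end()].strip()
            if sentence:
                sentences.append(sentence)
            start = match.end()
        # Keep trailing text that has no closing punctuation.
        tail = block[start:].strip()
        if tail:
            sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Hello world. How are you?! Fine...\nA line without a full stop"
    for sentence in split_sentences(sample):
        print(sentence)
```

The lookahead keeps a dot inside a number such as 3.14 from being treated as a split. The sketch does not handle abbreviations such as "e.g.", which a production splitter would need to exclude.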
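The list-based lookup performed by a gazetteer can be sketched in the same spirit. The Python code below is illustrative only and does not reproduce GATE's gazetteer format or API: the function names load_gazetteer and annotate, the in-memory list contents, and the case-insensitive matching are all assumptions. Each list is plain text with one entry per line, as described above.

```python
import re


def load_gazetteer(lists):
    """Map each lower-cased entry to the name of its list.

    `lists` maps a list name (e.g. "city") to plain text with one
    entry per line. Hypothetical helper, not a GATE function.
    """
    lookup = {}
    for list_name, text in lists.items():
        for line in text.splitlines():
            entry = line.strip()
            if entry:
                lookup[entry.lower()] = list_name
    return lookup


def annotate(text, lookup):
    """Return (start, end, matched text, list name) for every entry found."""
    if not lookup:
        return []
    # Longer entries come first so that a longer name wins over its prefix.
    entries = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    gazetteer = load_gazetteer({
        "city": "London\nNew York\n",
        "day": "Monday\nFriday\n",
    })
    for annotation in annotate("She flew from London to New York on Friday.", gazetteer):
        print(annotation)
```

Inserting, updating, or deleting an entry then amounts to editing the corresponding list text and reloading the lookup.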