What is ALICE?

ALICE is an abbreviation extraction system for biomedical literature.

1. Background
     The rapid growth of the literature in the MEDLINE database is of great benefit to biomedical researchers.  On the other hand, the unrestricted introduction of new abbreviations, such as gene and protein names, hinders efficient use of the database, because abbreviations in the biomedical literature are highly ambiguous:  one abbreviation may represent multiple expansions.  A support system that helps researchers identify abbreviations in the literature is therefore strongly needed.
2. ALICE system
     To extract abbreviations and their expansions from the biomedical literature, we propose an algorithm called ALICE (Abbreviation LIfter using Corpus-based Extraction).  ALICE consists of three phases:  the Inner Search (IS), the Outer Extraction (OE), and the Validity Judgment (VJ).  The IS phase searches for a candidate and decides whether it is an abbreviation, the OE phase extracts its expansion, and the VJ phase judges whether the pair of abbreviation and expansion is valid.
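
     The three-phase flow can be illustrated with a deliberately small sketch.  It is not the ALICE implementation:  the parenthesis pattern, the initial-letter matching, the tiny stop word set, and all function names below are assumptions made only for this example, whereas ALICE itself relies on many more patterns, rules, and stop word lists.

```python
import re

# Hypothetical stop words for illustration; ALICE's real lists are not reproduced here.
STOP_WORDS = {"the", "of", "and", "in", "a", "an"}


def inner_search(sentence):
    """IS (toy version): find parenthesized tokens that look like abbreviations."""
    for m in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", sentence):
        cand = m.group(1)
        # Assumed rule: a candidate must contain at least one uppercase letter.
        if any(c.isupper() for c in cand):
            yield cand, m.start()


def outer_extraction(sentence, cand, paren_pos):
    """OE (toy version): match the words before the parenthesis by their initials."""
    words = sentence[:paren_pos].split()
    letters = [c.lower() for c in cand if c.isalpha()]
    window = words[-len(letters):]
    if len(window) == len(letters) and all(
        w[0].lower() == l for w, l in zip(window, letters)
    ):
        return " ".join(window)
    return None


def validity_judgment(cand, expansion):
    """VJ (toy version): reject missing expansions and ones starting with a stop word."""
    if expansion is None:
        return False
    first_word = expansion.split()[0].lower()
    return first_word not in STOP_WORDS and len(expansion) > len(cand)


def extract_pairs(sentence):
    """Run IS, OE, and VJ in order and return accepted (abbreviation, expansion) pairs."""
    pairs = []
    for cand, pos in inner_search(sentence):
        expansion = outer_extraction(sentence, cand, pos)
        if validity_judgment(cand, expansion):
            pairs.append((cand, expansion))
    return pairs


if __name__ == "__main__":
    print(extract_pairs("We measured tumor necrosis factor (TNF) levels."))
```

     The sketch only handles the simplest case, in which each letter of the abbreviation is the initial of one preceding word.  Handling the remaining patterns is exactly what the rules described above are for.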
3. Evaluation
     Our algorithm overcomes various limitations of other algorithms by means of many carefully constructed patterns and rules and many stop word lists, all of which are based on repeated examination of a vast amount of biomedical literature.  ALICE tries to recognize all patterns of abbreviations and to extract their expansions with high precision and recall.  It achieved 95% precision and 96% recall on literature randomly selected from the MEDLINE database.  This result helps in constructing a useful abbreviation dictionary, which in turn may lead to a new algorithm for retrieving literature from the MEDLINE database.
4. How to use
     ALICE accepts three types of input.

  • MEDLINE format file
          <<< Notice >>>  Files larger than 1M byte (about 450-500 entries) will be discarded.
  • PubMed ID list
          <<< Notice >>>  IDs must be delimited by a single space (e.g., PMID1 PMID2 PMID3 ...).
  • Free text
          <<< Notice >>>  Please don't break lines.
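
     The three notices above can be checked on the client side before submitting.  The helpers below are a minimal sketch, not part of ALICE:  the function names are hypothetical, and reading "1M byte" as 1,000,000 bytes is an assumption, since the exact threshold used by the server is not stated.

```python
# Assumption: "1M byte" is read as 10**6 bytes; the server's exact limit is not documented here.
MAX_FILE_BYTES = 1_000_000


def within_size_limit(data: bytes, limit: int = MAX_FILE_BYTES) -> bool:
    """Return True if a MEDLINE format file is small enough not to be discarded."""
    return len(data) <= limit


def format_pmid_list(pmids) -> str:
    """Join PubMed IDs with exactly one space between them."""
    return " ".join(str(p).strip() for p in pmids)


def flatten_free_text(text: str) -> str:
    """Remove line breaks by collapsing every run of whitespace into one space."""
    return " ".join(text.split())


if __name__ == "__main__":
    print(format_pmid_list([111, " 222 ", "333"]))
    print(flatten_free_text("first line\nsecond line\r\n third line"))
```

     Note that flatten_free_text also collapses repeated spaces, which is harmless for running text but would change input where spacing is significant.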
5. Publication
     ALICE: An Algorithm to Extract Abbreviations from MEDLINE.
     Ao H, Takagi T.
     J Am Med Inform Assoc 2005; 12: 576-586. PrePrint published May 19 2005; doi:10.1197/jamia.M1757

6. Contact
     We welcome comments and suggestions.  Please send them by e-mail to aohiroko@hgc.jp.