Re: Corpora: job: computational linguistics/lexicography

From: Kees Koster (
Date: Tue Aug 22 2000 - 11:38:58 MET DST


    Dear Jan,

    I know you have a good job, but perhaps you would like to return to the
    Netherlands. Are you interested in the attached project, or do you know a
    suitable linguist (or a very good programmer)?

    Kind regards,

      -- Kees Koster


    The PEKING project (People and Knowledge Information Gathering) is an EU
    5th Framework project addressing the problems of supervised and unsupervised
    classification and (cross-lingual) matching of documents in organizations.

    The proposal was submitted to the EC in May 2000 by the following partners:
     - META4 R&D (coordinator), Univ. of Barcelona, Univ. of Madrid Carlos III
       and CINDOC in Spain
     - Quinary and CRF-FIAT in Italy
     - Univ. of Nijmegen (KUN), Edmond bv and Fiscaal up to Date in The
       Netherlands

    The Commission's reviewers have received it positively with regard to its
    scientific and commercial merits, and contract negotiations are under way.
    The project will start at the end of 2000.

    In the PEKING project, KUN and Edmond will address the real-life situation
    of one Dutch user, the firm FISCAAL. Its situation is typical of many firms
    and institutions that derive their income from providing access to a large
    body of systematically collected documents. At present, the documents are
    classified manually according to a hierarchical thesaurus that is hard to
    keep up to date and to modify. In addition, certain index terms have been
    assigned to the documents manually, and a conventional keyword-based search
    facility is available. Because manual classification and index-term
    assignment are expensive, inflexible and rather subjective, there is a
    pressing need for an automatic disclosure mechanism that replaces, or at
    least supports, the manual classification process.

    The key questions on the application side are:

     - Can an automatically learning system provide a hierarchical
       classification that is good enough for the users?
     - Can automatic classification approach the consistency and quality that
       experts achieve, through experience and insight, in manual classification?
     - Realistically, an automatic system can be expected to process the bulk
       of the documents, leaving only a few hard cases to the human experts.
       Can such a mixed system provide an economically attractive solution to
       the disclosure problems of firms like FISCAAL?

    The technical problems to be solved are:

     - learning reliably from unreliably classified documents
     - exploiting the notion of uncertainty in improving classification results
     - deriving normalized phrasal representations from documents, and
     - using those phrase representations in conjunction with statistical
       learning methods to increase precision in learning.

    The use of phrases also raises new opportunities and problems for
    interlinguality (cross-lingual processing), which have to be addressed.

    KUN proposes to extend the existing LCS prototype into a system that meets
    the requirements of the Dutch user FISCAAL. This work should provide ample
    opportunity for inventing, implementing and evaluating novel ideas in term
    representation and classification strategies.

    KUN is now looking for two postdocs:
     - a computer scientist with an interest in Information Retrieval and a
       solid experience in C++ programming
     - a computational linguist with an interest in Information Retrieval and
       a specialization in the syntax of natural languages.
    Contracts are for two years, with a possible extension.

    This archive was generated by hypermail 2b29 : Tue Aug 22 2000 - 11:37:07 MET DST