Corpora: Corpus frequencies for psycholinguistic experiments

From: Philip Resnik
Date: Thu Jul 13 2000 - 21:32:52 MET DST


    A colleague has asked me whether there are alternatives to the widely
    used Francis and Kucera frequencies that could be used in controlling
    for word frequency in a psycholinguistics experiment. Although there
    are plenty of English corpora out there from which it's easy to
    generate word counts, it occurs to me to wonder whether anyone else
    has already addressed this issue.

    It seems to me that the criteria for selecting a corpus to use as a
    basis for word frequency data in a psycholinguistics setting would be
    that it be (a) large and (b) either as unspecialized as possible or at
    least "balanced" to whatever extent is possible. The second criterion
    might arguably rule out most of the available corpora because they
    comprise primarily newswire. Is this why F&K is still so widely used?

    Again, let me emphasize that the question is whether or not there is
    an alternative to F&K specifically for use in psycholinguistics,
    e.g. for controlling for frequency. I'd suggest that replies go to me
    personally, and I can post a summary if there's interest.

      Philip Resnik, Assistant Professor
      Department of Linguistics and Institute for Advanced Computer Studies

      1401 Marie Mount Hall          UMIACS phone: (301) 405-6760
      University of Maryland         Linguistics phone: (301) 405-8903
      College Park, MD 20742 USA     Fax: (301) 405-7104
