A proposal for supplementary characters in Unicode: Medieval Nordic
by Odd Einar Haugen, Department of Scandinavian languages and
literature, University of Bergen
Version 2.0, 5 February 2003
1. Background
2. Medieval Nordic
3. Comments on first version
4. Composite characters
5. Structure of proposal
6. Proposed characters
1. Background
This is the second version of a proposal published
15 June 2002, entitled A proposal for subranges
within the Private Use Area of Unicode: Supplements
to the Latin alphabet for Medieval texts.
Please refer to this proposal for more background
information on Unicode and Medieval
characters.
In the first proposal, a large number of characters
were proposed for a set of subranges within the
Private Use Area of Unicode. In this second
proposal, the number of characters has been
significantly reduced, and it is now planned as a
formal proposal to Unicode.
A number of
people have responded to the first proposal; in
alphabetical order: Jim Allan, Deborah W. Anderson,
Peter S. Baker, Michael Beddow, Rick McGowan, David
J. Perry, Ken Whistler and Christian Wittern. I
would like to thank them for all the advice and
help I have received. The responsibility for all
omissions, misunderstandings etc., however, remains
mine.
2. Medieval Nordic: chronological and geographical definition
Medieval Nordic
is here used for written sources in Danish,
Icelandic, Norwegian and Swedish up to approx. 1500
(the Reformation). Note, however, that the proposal
also takes into account characters used in younger
Icelandic manuscripts of Medieval works. Unlike in
the other Scandinavian countries, Medieval works
were transmitted in manuscript form in Iceland (or
by Icelandic scribes abroad) for several centuries
after the Reformation.
The Latin
alphabet was introduced in Scandinavia in the 11th
century, but the oldest preserved manuscripts
belong to the 12th century. For more information on
manuscripts and language, a number of handbooks are
available, e.g. the recently published The Nordic
languages, vol. 1 (Berlin - New York: De Gruyter,
2002).
The majority of
characters in the present proposal are also found
in other European Medieval sources. António
Emiliano, Universidade Nova de Lisboa, and Susana
Pedro, Universidade Lusófona, Lisboa, have
recently published a similar proposal for Medieval
Portuguese. For Old English, Peter S. Baker,
University of Virginia, and David J. Perry, Rye
High School, New York, have published fonts with a
selection of special characters for Old English and
other Medieval languages. Cf. the links for all
three projects on the MUFI site.
However, for
purely pragmatic reasons, this proposal only
considers characters needed for Medieval Nordic. As
soon as this proposal has been evaluated by Unicode
and it is known which characters are accepted, it
will be easier to start a process of supplementing
characters for other well-defined areas of Medieval
script.
Several
suggestions in the Medieval Portuguese project have
been taken up here, such as the names of the
subranges and the naming scheme.
3. Comments & criticism of first version of this proposal
The first version
of this proposal met with a number of helpful
comments. The main points were:
- The Private Use
Area ought to be used only as a last resort.
Unicode should be contacted with a formal proposal
for inclusion of characters in one or more of the
official ranges.
- Existing glyphs of similar or near-similar form
ought to be used, unless there are compelling
semantic or practical reasons not to do so.
- Composite characters ought to be avoided, if they
can be acceptably rendered and described by use of
existing combining marks.
- Combinations of base characters and combining
marks ought to be described for the benefit of font
developers, even if they are not included as
composite characters with their own code points.
The same applies to variant letterforms, e.g. the
various forms of "y" commonly recognized (y1, y2
and y3).
4. The problem of composite characters
The previous version of this proposal included a
large number of composite characters (compare the
previous subrange 2 with the revised subrange 2).
On the advice of several correspondents, they have
been removed from the proposal, since they can be
encoded as a sequence of an ordinary base character
and one or more combining marks.
However, the
support for combinations of base characters and
combining marks varies between fonts and operating
systems (Windows XP and Mac OS X). Some
combinations work well, others work less well (the
diacritic may appear, but not in an optimal
position), and a number of combinations do not work
at all. For example, the acute accent
over "e" can be encoded as a sequence of an "e" and
a "combining acute accent". This works well in
Windows XP and Mac OS X, as shown in fig. 1 below.
The same goes for the sequence of an "i" and a
combining acute accent, fig. 2. However, if a "j"
is followed by a combining acute accent, the accent
may not show up at all, or it may be incorrectly
placed, cf. fig. 3.

Fig. 1: 0065 + 0301 works well; the acute accent is
placed correctly horizontally as well as
vertically. This character can also be inserted as
a composite character in the range Latin-1
Supplement, 00E9.

Fig. 2: 0069 + 0301 also works well; the dot is
exchanged for an acute accent, and the latter is
placed correctly horizontally as well as
vertically. This character can also be inserted as
a composite character in the range Latin-1
Supplement, 00ED.

Fig. 3: 006A + 0301 does not work equally well; the
dot has not been removed, and the vertical position
of the acute accent is incorrect. This character
cannot be inserted as a composite character in the
present version of Unicode, and it is unlikely that
it will be accepted as a composite character.

For this reason,
a number of composed characters are exemplified in
this proposal. They are not part of the proposal to
Unicode, but are included so that font developers
can support them in their fonts.
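
The difference between the three cases in figs. 1-3
can also be checked at the level of code points,
independently of any font. The following is a
minimal sketch in Python, not part of the proposal
itself; the helper names compose and code_points
are hypothetical and chosen only for illustration.
It relies on the standard unicodedata module and on
Unicode normalization form NFC, which replaces a
base character and combining mark with a composite
character whenever Unicode defines one.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT


def compose(base, mark=COMBINING_ACUTE):
    """Return the NFC form of base + mark (hypothetical helper).

    If Unicode defines a composite character for the pair, NFC yields
    that single code point; otherwise the sequence is left as it is.
    """
    return unicodedata.normalize("NFC", base + mark)


def code_points(text):
    """List the code points of text as four-digit hex strings."""
    return ["%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    for base in "eij":
        sequence = base + COMBINING_ACUTE
        print(code_points(sequence), "->", code_points(compose(base)))
```

For "e" and "i" the sequence collapses to 00E9 and
00ED respectively, while for "j" the two code
points 006A and 0301 remain, so correct rendering
of that sequence depends entirely on the font and
the operating system.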
5. Structure of the proposal
The proposal is
divided into 12 subranges. In addition to a short
introduction, each subrange contains a table with
four fields for each proposed character:
(a) Glyph (i.e. an image of the character), based
on Courier.
Please note that these glyphs are supplied only for
the sake of illustration. The glyphs in subrange 11
are based on Peter S. Baker's Junicode. Courier is
not very Medieval in style, so many glyphs may look
a little odd, but not too foreign, I hope.
(b) Suggested
entity name, for use in SGML or XML documents.
These names are actually heavily abbreviated forms
of the descriptive names suggested in (d) below.
For more information on the structure of entity
names, see the Menota handbook, ch. 2. Note that
all entity names must begin with "&" and end with
";".
(c) Unicode code
points
This field is left blank, pending the fate of the
proposal to Unicode.
(d) Suggested
descriptive name
The vocabulary and syntax of the descriptive names
are based on Unicode. In the Unicode standard, the
small "a" in the Latin alphabet is described as
LATIN SMALL LETTER A, the small "a" with acute
accent as LATIN SMALL LETTER A WITH ACUTE, etc. By
analogy, the small "a" with a macron and breve is
described in this proposal as LATIN SMALL LETTER A
WITH MACRON AND BREVE, etc.
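
The naming analogy in (d) can be sketched in
Python. This is an illustration only, under the
assumption that a descriptive name is formed from
the Unicode name of the base character plus the
names of the combining marks with "COMBINING" and
"ACCENT" stripped; the function descriptive_name is
hypothetical and does not reproduce the procedure
actually used to name the proposed characters.

```python
import unicodedata


def descriptive_name(base, marks):
    """Build a Unicode-style descriptive name (hypothetical helper).

    base  -- a single base character, e.g. "a"
    marks -- a string of one or more combining marks
    """
    parts = []
    for mark in marks:
        name = unicodedata.name(mark)  # e.g. "COMBINING ACUTE ACCENT"
        if name.startswith("COMBINING "):
            name = name[len("COMBINING "):]
        if name.endswith(" ACCENT"):
            name = name[:-len(" ACCENT")]
        parts.append(name)
    return unicodedata.name(base) + " WITH " + " AND ".join(parts)


if __name__ == "__main__":
    # Existing character: the result can be compared with Unicode's own name.
    print(descriptive_name("a", "\u0301"))
    print(unicodedata.name("\u00e1"))
    # Macron (U+0304) followed by breve (U+0306), as in the example above.
    print(descriptive_name("a", "\u0304\u0306"))
```

For "a" with a combining acute accent the sketch
gives the same name as Unicode assigns to 00E1, and
for "a" with macron and breve it gives LATIN SMALL
LETTER A WITH MACRON AND BREVE.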
6. Proposed characters
In the present
proposal, the number of proposed characters has
been reduced from 340 to 112 (+ 36). Numbers in
parentheses refer to characters not yet finally
attested (marked with grey colour in the relevant
ranges).