Medieval Unicode Font Initiative


Disclaimer: This site is managed by scholars in Medieval studies with the aim of establishing a consensus on the use of Unicode among medievalists. It is not affiliated with or endorsed by Unicode.


A proposal for supplementary characters in Unicode: Medieval Nordic

by Odd Einar Haugen, Department of Scandinavian languages and literature, University of Bergen
Version 2.0, 5 February 2003

 

1. Background
2. Medieval Nordic
3. Comments on first version
4. Composite characters
5. Structure of proposal
6. Proposed characters

 

1. Background

This is the second version of a proposal published 15 June 2002, entitled A proposal for subranges within the Private Use Area of Unicode: Supplements to the Latin alphabet for Medieval texts. Please refer to this proposal for more background information on Unicode and Medieval characters.

In the first proposal, a large number of characters were proposed for a set of subranges within the Private Use Area of Unicode. In this second proposal, the number of characters has been significantly reduced, and it is now intended as a formal proposal to Unicode.

A number of people have responded to the first proposal; in alphabetical order: Jim Allan, Deborah W. Anderson, Peter S. Baker, Michael Beddow, Rick McGowan, David J. Perry, Ken Whistler and Christian Wittern. I would like to thank them all for their advice and help. The responsibility for all omissions, misunderstandings etc., however, remains mine.

 

2. Medieval Nordic: chronological and geographical definition

Medieval Nordic is here used for written sources in Danish, Icelandic, Norwegian and Swedish up to approx. 1500 (the Reformation). Note, however, that the proposal also takes into account characters used in younger Icelandic manuscripts of Medieval works. Unlike in the other Scandinavian countries, Medieval works in Iceland (and among Icelandic scribes abroad) continued to be transmitted in manuscript form for several centuries after the Reformation.

The Latin alphabet was introduced in Scandinavia in the 11th century, but the oldest preserved manuscripts belong to the 12th century. For more information on the manuscripts and the language, a number of handbooks are available, e.g. the recently published The Nordic languages vol. 1 (Berlin - New York: De Gruyter, 2002).

The majority of characters in the present proposal are also found in other European Medieval sources. António Emiliano, Universidade Nova de Lisboa, and Susana Pedro, Universidade Lusófona, Lisboa, have recently published a similar proposal for Medieval Portuguese. For Old English, Peter S. Baker, University of Virginia, and David J. Perry, Rye High School, New York, have published fonts with a selection of special characters for Old English and other Medieval languages. Cf. links for all three projects on the MUFI site.

However, for purely pragmatic reasons, this proposal only considers characters needed for Medieval Nordic. As soon as this proposal has been evaluated by Unicode and it is known which characters are accepted, it will be easier to start a process of supplementing characters for other well-defined areas of Medieval script.

Several suggestions in the Medieval Portuguese project have been taken up here, such as the names of the subranges and the naming scheme.

 

3. Comments & criticism of first version of this proposal

The first version of this proposal met with a number of helpful comments. The main points were:

- The Private Use Area ought to be used only as a last resort. Unicode should be contacted with a formal proposal for inclusion of characters in one or more of the official ranges.
- Existing glyphs of similar or near-similar form ought to be used, unless there are compelling semantic or practical reasons not to do so.
- Composite characters ought to be avoided, if they can be acceptably rendered and described by use of existing combining marks.
- Combinations of base characters and combining marks ought to be described for the benefit of font developers, even if they are not included as composite characters with their own code points. The same applies to variant letterforms, e.g. the various forms of "y" commonly recognized (y1, y2 and y3).

 

4. The problem of composite characters

The previous version of this proposal included a large number of composite characters (compare the previous subrange 2 with the revised subrange 2). On the advice of several correspondents, they have been removed from the proposal, since they can be encoded as a sequence of an ordinary base character and one or more combining marks.

However, the support for combinations of base characters and combining marks varies between fonts and between operating systems (here Windows XP and Mac OS X). Some combinations work well, others work less well (the diacritic may appear, but not in an optimal position), and a number of combinations do not work at all. For example, the acute accent over "e" can be encoded as a sequence of an "e" and a "combining acute accent". This works well in Windows XP and Mac OS X, as shown in fig. 1 below. The same goes for the sequence of an "i" and a combining acute accent, fig. 2. However, if a "j" is followed by a combining acute accent, the accent may not show up at all, or it may be incorrectly placed, cf. fig. 3.

Fig. 1: 0065 + 0301 works well; the acute accent is placed correctly horizontally as well as vertically. This character can also be inserted as a composite character in the range Latin-1 Supplement, 00E9.

Fig. 2: 0069 + 0301 also works well; the dot is replaced by the acute accent, which is placed correctly horizontally as well as vertically. This character can also be inserted as a composite character in the range Latin-1 Supplement, 00ED.

Fig. 3: 006A + 0301 does not work equally well; the dot has not been removed, and the vertical position of the acute accent is incorrect. This character cannot be inserted as a composite character in the present version of Unicode, and it is unlikely that it will be accepted as a composite character.

For this reason, a number of composed characters are exemplified in this proposal. They are not part of the proposal to Unicode, but are included so that font developers can support them in their fonts.
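The contrast between figs. 1-2 and fig. 3 is partly an encoding matter, not only a rendering one: Unicode defines precomposed characters for "e" and "i" with acute (00E9, 00ED), but none for "j" with acute. The following minimal Python sketch, using only the standard unicodedata module, makes this visible by applying canonical composition (NFC) to each sequence. It says nothing about how a particular font positions the accent; that is the rendering problem described above.

```python
import unicodedata

def describe(seq: str) -> str:
    """Return the code points of seq as 'U+XXXX' strings joined by ' + '."""
    return " + ".join(f"U+{ord(ch):04X}" for ch in seq)

# The three sequences from figs. 1-3: a base letter followed by
# U+0301 COMBINING ACUTE ACCENT.
sequences = {
    "fig. 1": "e\u0301",
    "fig. 2": "i\u0301",
    "fig. 3": "j\u0301",
}

for label, seq in sequences.items():
    # NFC replaces a base + combining sequence by a single code point
    # only when Unicode defines a precomposed character for it.
    composed = unicodedata.normalize("NFC", seq)
    print(f"{label}: {describe(seq)} -> NFC {describe(composed)}")
```

For figs. 1 and 2 the sequence collapses to 00E9 and 00ED respectively, while the fig. 3 sequence stays as two code points, so a font has to handle it as a combining sequence.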

 

5. Structure of the proposal

The proposal is divided into 12 subranges. In addition to a short introduction, each subrange contains a table with four fields for each proposed character:

(a) Glyph (i.e. an image of the character), based on Courier
Please note that these glyphs are supplied only for the sake of illustration. The glyphs in subrange 11 are instead based on Peter S. Baker's Junicode. Courier is hardly Medieval in style, so many glyphs may look a little odd, but not too foreign, I hope.

(b) Suggested entity name, for use in SGML or XML documents.
These names are actually heavily abbreviated forms of the descriptive names suggested in (d) below. For more information on the structure of entity names, see the Menota handbook ch. 2. Note that in a document, every entity name must be preceded by "&" and followed by ";".
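In an SGML or XML document, each entity is declared once (typically in the DTD) and then referenced in the text as "&name;", and the parser substitutes the declared replacement text. The Python sketch below is a minimal stand-in for that substitution, not a real XML workflow. The entity table is hypothetical: "eacute" is the familiar ISO/HTML name for e with acute, while "jacute" is invented here to illustrate an entity whose replacement text is a base letter plus a combining mark, as discussed in section 4.

```python
import re

# Hypothetical entity table for illustration only; not taken from the proposal.
# An entity may expand to one code point or to a base + combining sequence.
ENTITIES = {
    "eacute": "\u00e9",   # precomposed LATIN SMALL LETTER E WITH ACUTE
    "jacute": "j\u0301",  # no precomposed form: j + COMBINING ACUTE ACCENT
}

# An entity reference: "&", a name starting with a letter, then ";".
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def expand_entities(text: str, table: dict[str, str]) -> str:
    """Replace every known entity reference in text; leave unknown ones intact."""
    def substitute(match: re.Match) -> str:
        return table.get(match.group(1), match.group(0))
    return ENTITY_REF.sub(substitute, text)

# "&eacute;" and "&jacute;" are expanded; "&unknown;" is left as it is.
print(expand_entities("s&eacute;r &jacute; &unknown;", ENTITIES))
```

In practice an XML parser performs this expansion from the DTD declarations; the regular expression only mimics the visible effect.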

(c) Unicode code points
This field is left blank, pending the fate of the proposal to Unicode.

(d) Suggested descriptive name
The vocabulary and syntax of the descriptive names are based on Unicode. In the Unicode standard, the small "a" in the Latin alphabet is described as LATIN SMALL LETTER A, the small "a" with acute accent as LATIN SMALL LETTER A WITH ACUTE, etc. By analogy, the small "a" with a macron and breve is described in this proposal as LATIN SMALL LETTER A WITH MACRON AND BREVE, etc.
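The naming analogy can be illustrated with Python's standard unicodedata module, which exposes the official Unicode character names. The helper descriptive_name below is hypothetical and merely mechanises the pattern described above (base letter, then WITH, then the diacritics joined by AND). It reproduces an existing name for "a" with acute and builds the proposed name for "a" with macron and breve. The last line shows how that character would be encoded, if it has no code point of its own, as a sequence of already named characters.

```python
import unicodedata

def descriptive_name(base_letter: str, diacritics: list[str]) -> str:
    """Build a Unicode-style descriptive name (hypothetical helper).

    base_letter is a single Latin letter; its case selects SMALL or CAPITAL.
    diacritics are diacritic names such as "MACRON", in order of mention.
    """
    case = "SMALL" if base_letter.islower() else "CAPITAL"
    name = f"LATIN {case} LETTER {base_letter.upper()}"
    if diacritics:
        name += " WITH " + " AND ".join(diacritics)
    return name

# The pattern reproduces an existing Unicode name ...
print(descriptive_name("a", ["ACUTE"]))  # LATIN SMALL LETTER A WITH ACUTE
print(unicodedata.name("\u00e1"))        # LATIN SMALL LETTER A WITH ACUTE

# ... and extends by analogy to a name suggested in this proposal.
print(descriptive_name("a", ["MACRON", "BREVE"]))

# As a sequence, each part already carries an official name.
print([unicodedata.name(ch) for ch in "a\u0304\u0306"])
```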

 

6. Proposed characters

In the present proposal, the number of proposed characters has been reduced from 340 to 111 (+ 36). Numbers in parentheses refer to characters not yet finally attested (marked with grey colour in the relevant ranges).

No.  Name of range                          Version 1.0  Version 2.0
1    Mixed script characters                19           12
2    Diacritical characters                 183          9
3    Small capitals                         19           6 (+ 13)
4    Enlarged minuscules                    28           9 (+ 5)
5    Ligatures                              15           12
6    Punctuation marks                      4            5
7    Base line abbreviation marks           15           15
8    Combining abbreviation marks           11           10
9    Precomposed abbreviated characters     8            12
10   Superscript (interlinear) characters   22           4 (+ 18)
11   Metrical symbols                       12           12
12   Critical and epigraphical signs        4            5
     Total                                  340          111 (+ 36)
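As a cross-check of the table, the totals can be recomputed by summing each column, keeping the not yet attested characters (the numbers in parentheses) in a separate column. A short Python tally, with the counts copied from the table:

```python
# Counts per subrange, copied from the table:
# (name of range, version 1.0, version 2.0 attested, version 2.0 not yet attested)
SUBRANGES = [
    ("Mixed script characters", 19, 12, 0),
    ("Diacritical characters", 183, 9, 0),
    ("Small capitals", 19, 6, 13),
    ("Enlarged minuscules", 28, 9, 5),
    ("Ligatures", 15, 12, 0),
    ("Punctuation marks", 4, 5, 0),
    ("Base line abbreviation marks", 15, 15, 0),
    ("Combining abbreviation marks", 11, 10, 0),
    ("Precomposed abbreviated characters", 8, 12, 0),
    ("Superscript (interlinear) characters", 22, 4, 18),
    ("Metrical symbols", 12, 12, 0),
    ("Critical and epigraphical signs", 4, 5, 0),
]

# Sum each numeric column separately.
v1_total = sum(row[1] for row in SUBRANGES)
v2_total = sum(row[2] for row in SUBRANGES)
v2_pending = sum(row[3] for row in SUBRANGES)

print(f"Version 1.0: {v1_total}")
print(f"Version 2.0: {v2_total} (+ {v2_pending})")
```

This reproduces the totals of 340 for version 1.0 and 111 (+ 36) for version 2.0.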


Version 2.0, 5 February 2003 OEH