Proposal to adopt an NWI titled "Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe"

Scope

The new CEN standard should determine the characters required for names in the broader sense. This means names of individuals in accordance with the specifications of the civil status law as well as names of legal entities, products, patents, names of countries, towns and streets, as well as titles of documents or laws. Names in the broader sense can denominate specific objects, but also virtual constructs such as product groups or musical styles.

The normative part of the standard determines the subset of the characters and character sequences included in Unicode that are required for IT applications for the electronic processing of names (in the broader sense) based on the Latin script. This subset must therefore be supported by all IT applications compliant with the standard, at least for all data fields intended for Latin names.

The normative part should also define a mapping of the normative letters to the capital letters A to Z, for cases where such a mapping is required. For this purpose, the recommendations of the relevant standard ICAO 9303, Part 3 for machine-readable travel documents must be applied and extended. This standard can in turn provide an impetus to add characters that are still missing from the ICAO mapping.
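A mapping to the capital letters A to Z could look roughly like the following Python sketch. It is not the mapping of the standard: the function name `to_a_z` and the table `EXPLICIT_MAP` are hypothetical, the table is a small illustrative subset, and the listed pairs are believed to follow the ICAO 9303, Part 3 recommendations but should be checked against the official table, which also allows alternatives for some letters.

```python
import unicodedata

# Illustrative subset only (assumption): letters that need an explicit
# replacement because removing the diacritic is not enough.
EXPLICIT_MAP = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "Æ": "AE", "Ø": "OE", "Å": "AA",
    "Þ": "TH", "Ł": "L", "Đ": "D", "Ð": "D",
}

def to_a_z(name: str) -> str:
    """Map a Latin-script name to the capital letters A to Z."""
    out = []
    # str.upper() already turns "ß" into "SS".
    for ch in unicodedata.normalize("NFC", name).upper():
        if ch in EXPLICIT_MAP:
            out.append(EXPLICIT_MAP[ch])
            continue
        # Decompose the letter and keep only its A-Z base character.
        for part in unicodedata.normalize("NFD", ch):
            if "A" <= part <= "Z":
                out.append(part)
            elif unicodedata.combining(part):
                continue  # drop diacritical marks
            elif part.isalpha():
                # A letter this sketch has no rule for: fail loudly.
                raise ValueError(f"no mapping for U+{ord(part):04X}")
            else:
                out.append(part)  # spaces, hyphens etc. stay unchanged
    return "".join(out)

print(to_a_z("Müller"), to_a_z("Łukasz"), to_a_z("Strauß"))
```

Raising an error for unmapped letters, rather than silently dropping them, makes gaps in the table visible, which is the kind of gap the proposal wants to report back to ICAO.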

The standard should primarily be aimed at authorities and organisations operating IT applications for inter-authority data exchange or for data exchange with citizens and businesses.

It should cover in its entirety

• the European official languages, so not only Latin characters, but also, for example, Greek and Cyrillic characters,

• European minority languages,

• Latin characters that may occur as a result of transliteration from another script – the basis for this must be the relevant ISO standards,

• characters used in European registers, such as civil status registers and commercial registers,

• characters required for data exchange between social insurance carriers, and

• characters required for the Europe-wide data exchange between authorities.

The standard should not include statements on historic characters or on the treatment of continuous text. It does not regulate the representation of characters (glyphs).

The standard should take into account the existing standard DIN 91379 “Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe”, which is already mandatory for public administrations in Germany.

The legacy character sets ISO/IEC 8859-1, ISO/IEC 8859-15 and Windows 1252 should also be a basis for the compilation of characters within the standard. However, a few of their characters may be omitted following expert evaluation.

To ensure interoperability of all implementations of the standard, a normalisation form (i.e. NFC) and an encoding (i.e. UTF-8) for the Unicode characters must also be defined.
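The effect of fixing a normalisation form and an encoding can be sketched in Python with the standard `unicodedata` module. The helper name `to_wire` is hypothetical; the sketch only shows why two visually identical inputs need to be normalised before exchange.

```python
import unicodedata

def to_wire(name: str) -> bytes:
    """Normalise to NFC, then encode as UTF-8 for exchange."""
    return unicodedata.normalize("NFC", name).encode("utf-8")

decomposed = "Zoe\u0308"   # "e" followed by a combining diaeresis
precomposed = "Zo\u00eb"   # single code point U+00EB

# The two strings look the same but differ as code point sequences.
print(decomposed == precomposed)

# After NFC and UTF-8 both yield the same bytes.
print(to_wire(decomposed) == to_wire(precomposed))
```

Without an agreed normalisation form, a receiving system comparing the raw strings would treat the two spellings as different names.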

The standard should describe the conformity requirements for implementations of the standard.

An annex should recommend data types that restrict content to certain subsets of characters, which may be especially helpful for interface agreements. The data types help to achieve the goal of a conclusive list of permitted characters, which is needed to implement robust algorithms and applications for personal identification and for the creation of official documents and certificates.
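How a data type restricted to a subset of characters might be checked is sketched below in Python. The allow-list `ALLOWED` is a tiny hypothetical example and not the character list of any standard; the helper names `split_units` and `violations` are likewise assumptions. The point of the sketch is that a permitted "character" can be a sequence of a base character plus combining marks, so the check has to work on such units rather than on single code points.

```python
import string
import unicodedata

# Hypothetical, deliberately tiny allow-list (assumption, for illustration).
ALLOWED = set(string.ascii_letters) | {
    " ", "-", "'",
    "É", "é", "Ł", "ł",
    "M\u0302",  # M + combining circumflex: a sequence, not one code point
}

def split_units(text: str) -> list[str]:
    """Split NFC text into base characters with their combining marks."""
    units: list[str] = []
    for ch in unicodedata.normalize("NFC", text):
        if units and unicodedata.combining(ch):
            units[-1] += ch   # attach the mark to the preceding base
        else:
            units.append(ch)
    return units

def violations(text: str) -> list[str]:
    """Return all units of text that the data type does not permit."""
    return [u for u in split_units(text) if u not in ALLOWED]

print(violations("Łukasz"))        # permitted letters only
print(violations("Wei\u017fs"))    # contains the historic long s
```

An interface agreement could then reject any field for which `violations` returns a non-empty list.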

Non-normative files attached to the standard should provide:

• a technical implementation of data types,

• an XML file including all characters and character groups listed in the standard,

• an XML Schema that defines the structure of the XML file of all listed characters,

• a tabular overview of all characters and character groups listed in the standard that is easy for people to read,

• a mapping of the characters contained in the standard to legacy character sets for the implementation of transitional solutions that are as interoperable as possible, and

• a mapping of characters from legacy character sets that are not part of the standard into the character set of the standard, for the same reason.

Purpose

For IT systems, the choice of character encoding for the recorded data is an important decision. Many Latin characters do not occur in widely used 8-bit character sets such as ISO 8859-1 or ISO 8859-15. Names containing such characters cannot then be processed correctly. Various methods are used to map the missing characters to the available characters. Conflicts arise when data is exchanged between IT systems that support different character sets. Even if they use the same character set, different handling of non-existent characters can lead to interoperability problems.
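The limits of the 8-bit character sets can be observed directly with Python's built-in codecs. The helper `can_encode` and the two "systems" are hypothetical; they merely illustrate that the same name can be representable in one legacy character set but not in another, and that two systems may replace a missing character differently.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the character set."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

LEGACY = ("iso8859_1", "iso8859_15", "cp1252")

for name in ("Šárka", "Łukasz"):
    print(name, {enc: can_encode(name, enc) for enc in LEGACY})

# Two hypothetical systems handling the missing "Ł" in different ways:
system_a = "Łukasz".encode("iso8859_1", errors="replace").decode("iso8859_1")
system_b = "Łukasz".replace("Ł", "L")
print(system_a, system_b)
```

Both systems use the same character set, yet they store different strings for the same person, which is the interoperability problem described above.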

Unicode (ISO/IEC 10646) is a standard that claims to be able to encode all existing characters. However, the characters must also be visualised – for display on the screen and for printing on paper. The fonts required for this are not able to display all Unicode characters correctly. Unicode allows almost any combination of basic characters and diacritical marks. The automatic generation of glyphs for fonts is faulty – not only when two diacritical marks are placed over a basic character, but especially then. Other automatic connections are also not error-free.

Historical characters that were only used in the Middle Ages, for example, are also part of Unicode. The support of characters that cannot actually occur in current names leads to incorrect data. For example, errors made by civil servants when entering names in German civil registers by hand have led to incorrect characters being used instead of the correct transliteration results for the transfer of Greek names into the Latin script. The Latin names of the persons concerned cannot be correctly recorded or displayed by many IT systems.

The similarity of different basic characters and diacritical marks creates a risk of confusion. This risk is significantly greater if all existing basic characters and diacritical marks are always available. Intentional or unintentional duplicates can arise and the correct identification of persons or companies can fail. Correct identification is essential in many fields of activity. Errors can have fatal consequences.
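The risk of confusion can be made concrete with a short Python sketch. The helper names are hypothetical, and `scripts_used` relies on the first word of the Unicode character name as a rough stand-in for the script, which is a heuristic assumption and not a proper script lookup.

```python
import unicodedata

def code_points(text: str) -> list[str]:
    """List each character with its code point and Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def scripts_used(text: str) -> set[str]:
    """Heuristic: take the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

latin = "Ana"
mixed = "\u0410na"   # starts with CYRILLIC CAPITAL LETTER A

print(latin == mixed)          # the strings differ although they look alike
print(code_points(mixed)[0])
print(scripts_used(mixed))

# Similar diacritical marks: caron versus breve on the same base letter.
print(code_points("\u01ce\u0103"))
```

If the permitted characters for a name field are limited to a defined list, a mixed-script duplicate of this kind can be rejected at the point of entry.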

During the creation of the standard DIN 91379 “Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe”, which contains all the Latin characters required by the German administration for processing names, it emerged that many of these characters are not supported by widely used fonts. So-called house fonts of administrations can often only handle the characters of the known 8-bit character sets. But even fonts with broad Unicode support fail with characters that are not explicitly listed in the Unicode standard, but are formed by combinations of basic characters and sometimes several diacritical marks. The ISO standards for transliterations of other scripts into the Latin script contain such characters.
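Characters that exist only as combinations can be recognised in Python by the fact that NFC normalisation cannot reduce them to a single code point. The helper `nfc_code_points` is a hypothetical name, and the two sequences shown are examples chosen for illustration.

```python
import unicodedata

def nfc_code_points(text: str) -> list[str]:
    """Return the code points of text after NFC normalisation."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# "e" + combining acute has a precomposed form and collapses to one code point.
print(nfc_code_points("e\u0301"))

# "M" + combining circumflex and "A" + combining double acute have no
# precomposed form, so they remain sequences of two code points.
print(nfc_code_points("M\u0302"))
print(nfc_code_points("A\u030b"))
```

A font therefore has to position the combining mark over the base character itself, which is where fonts with otherwise broad Unicode coverage can fail.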

The DIN 91379 standard has revealed which additional characters must be supported by IT systems to correctly process the names of citizens, companies, products, patents, countries, streets and the titles of documents and laws. The list of required characters can be used to check whether software, hardware and interface definitions are suitable for processing names.

A standard that lists the Unicode characters required for processing names for all participating countries is required for a functioning data exchange between countries. At least these characters must be supported by all parties involved. This will also make it possible to exclude all other characters for the exchange of names. This standardisation project is intended to create such a standard for European countries.

The draft standard CEN/TS 17489-2 “European Breeder Documents” already references DIN SPEC 91379 or DIN 91379, which is another reason why it makes sense to continue its standardisation at CEN level.


Please email further comments to: debbie.stead@bsigroup.com
