Finding CJK Unicode Codes
Step 1: Finding Your Character's Encoding
- Go to the Macchiatto Unicode 4.1.0 Chart noted below.
- Click the check-box “Show Code” (upper right)
- Paste your Chinese character into the search box, and click “Find”
NOTE: It's NOT necessary to change the drop-down labeled "Aegean_Numbers."
Simply paste in your CJK character, and click "Find."
- View the chart with characters and their Unicode values.
NOTE: Although the Unicode values in the chart are each linked, the links do not work… I’ve been using this for years, and they never seem to work. Nevertheless, I think this is a useful page.
"Macchiato Unicode 4.1.0
Chart"
Search By Chinese
Character for Unicode Value
Screen Shot
Step 2: Getting More information about your character
- Use the above method to find the Unicode value
- Go to the “Unihan Database Lookup” page to get more information about
the character by searching its Unicode value
Information provided by Hisako Kotaka
- "EACC value"
EACC (East Asian Character Code) serves as the framework for the MARC 8 CJK character set
used in the MARC 21 Standard. Unihan Database users will notice that the
"EACC" field in the record for particular characters is
sometimes empty. CJK Character records in the Unihan DB lacking an EACC value are not valid in the OCLC environment
(although they are valid UTF-8 (Unicode) characters).
- "Variants"
Some Unihan Database records for CJK characters include a link to a "Variant"
character, which is another Unihan database record for a related character.
Users should check variants to see whether or not the variant has an
EACC value, and is therefore usable in the OCLC environment.
- Moving Target
Although OCLC Used Unicode 4.0 for its OCLC Connexion ver. 2.10 MARC
Verification program, Unihan is a dynamic file and its modification
continues. As MARC 21/MARC 8 standards change, issues regarding CJK characters
in cataloging and other metadata will also change.
What to do When Your Character is not in MARC 8
When doing CJK cataloging in OCLC, only the sub-set of Unicode encoded
characters known as "
MARC 8"
may be used. Occasionally, a non-MARC 8 character is used in an
item being cataloged. Even though the character may be part of the
Unicode character set, since it is not included in MARC 8, it is not
possible to use the character in an OCLC record. Following are
some suggestions for what to do in that case, provided by Hisako Kotaka
at OCLC.
Enter a note field to clarify the missing character(s). Using
"姃"
(a character not included in MARC 8) as example, compose a 500 field as
a clue to users, such as:
500 Missing Chinese character in the 700 field consists of 女 in
radical and 正, with Unicode code point 59C3.
Unicode code values can be searchable with "Character Map" in Microsoft
Windows.
PLEASE
FOLLOW:
MS Program Accessories=> System Tools=> Character Map=> Click on “Advanced
view” at this point and set Font=Arial Unicode MS ; Character
set=Unicode ; Group by=Ideographs by Radicals.
All of the Unicode ideograph characters are displayed for you to
choose from, and you
will see the code point value of the selected character at the bottom of
the table.
If the user’s ILS has been configured with the UTF-8 characters handling,
the missing character in the OCLC MARC master record should be properly
displayable as 姃 once your local edit and export is done with UTF-8.
Section Top
Page Top