2001 Annual Meeting

 
 

OCLC CJK Users Group 2001 Annual Meeting

Saturday, March 24, 2001

Holiday Inn Chicago City Center

Room Lasalle 1

300 East Ohio Street

Chicago, IL


(Continental Breakfast provided)

Agenda

8:00-8:30 a.m.    Continental Breakfast Hosted by OCLC
8:30-8:50 a.m.    Chair's Report (Hideyuki Morimoto, U.C. Berkeley)
8:50-10:05 a.m.   OCLC CJK Users Group Annual Meeting Program

  1. 8:50-8:55 a.m.    Introduction (Wen-ling Liu, Indiana University)

  2. 8:55-9:00 a.m.    10th Anniversary of the Users Group (Karl Lo, U.C. San Diego)

  3. 9:00-9:40 a.m.    Unicode and East Asian Ideographs (John Jenkins, Unicode Consortium)

  4. 9:40-9:45 a.m.    Update on the Harvard-Yenching Library's Retrospective Conversion (James Lin, Harvard)

  5. 9:45-10:00 a.m.   Pinyin Task Force Report (Sarah Elman, U.C.L.A.)

  6. 10:00-10:05 a.m.  Questions and Answers

10:05-10:15 a.m.  Break
10:15-11:50 a.m.  OCLC Report

  1. OCLC 21st Century Global Strategy: Focusing on Metadata Services (Marty Withrow, Director, Metadata Services Division)

  2. OCLC Metadata Policy and Standards: Focusing on OCLC Pinyin Conversion (Glenn Patton, Director, Metadata Policy and Standards Division)

  3. OCLC Asia Pacific Activities (Eliza Sproat, OCLC Asia Pacific Division)

  4. OCLC CJK and Z-Client Update: Focusing on User Support Issues (Hisako Kotaka, CJK Product Manager, Metadata Services Division)

  5. OCLC Contracted Services (Bing Yu, Manager, AsiaLink)

  6. OCLC Answer Sheets to User's Questions and Concerns

  7. OCLC Name Authority File Processing (Susan Westberg)

  8. OCLC Arabic vs. CJK MARC Display and Script Editing (Hisako Kotaka)



Minutes:

After a continental breakfast hosted by OCLC, Hideyuki Morimoto, Chair, convened the meeting at 8:30. He thanked OCLC for its continuous financial and logistical support of the annual meeting. In particular, we are grateful for the financial support provided so that our guest speaker, John Jenkins, could join us from California.
Convener: Hideyuki Morimoto
Recorder: Sharon Domier
Photographer: Abraham Yu


Chair's Report
Mr. Morimoto introduced the current officers, committee members, and their activities:
Current Officers (1999-2001):

  1.  Chair: Hideyuki Morimoto, University of California, Berkeley

  2.  Vice-Chair/Chair-Elect: Wen-ling Liu, Indiana University

  3.  Chinese officer: Hsi-chu Bolick, University of North Carolina, Chapel Hill

  4.  Japanese officer: Sharon Heather Domier, University of Massachusetts, Amherst

  5.  Korean officer: Joy Kim, University of Southern California

  6.  Member-at-large: Fung-yin K. Simpson, University of Illinois, Urbana-Champaign


Website Management Committee:

  1. Chair: Abraham Yu, University of California, Irvine

  2. Assisted by: Fung-yin Simpson, University of Illinois, Urbana-Champaign

The committee maintained the OCLC CJK Users Group Web site and managed the electronic voting process.

Program Committee Members:

  1. Chair: Wen-ling Liu

  2. Members: Joy Kim and Sharon Domier

Membership Officer:

  1. Fung-yin Simpson

The Membership Officer expanded the membership, kept track of member moves, and generated a list of members' email addresses.

Pinyin Task Force:

  1. Chair: Hsi-chu Bolick, University of North Carolina, Chapel Hill

  2. Members: Yu-lan Chou, University of California, Berkeley; Sarah Su-erh Elman, University of California, Los Angeles; Wen-ling Liu, Indiana University; Daphne Wang, University of Oregon

The Task Force participated in reviewing OCLC pinyin conversion test files; conducted a Pinyin Conversion Survey of OCLC CJK member libraries regarding pinyin conversion and pinyin cataloging issues for Chinese language materials; analyzed the responses; and reported its findings.

The Chair encouraged close communication with:

  1. OCLC, by arranging for the OCLC CJK Users Group meeting, collaborating on other meetings such as the Z39.50 session, communicating Users Group concerns to OCLC, and informing the Users Group membership of organizational and software changes.

  2. Members, by communicating directly with each new member to encourage participation.

Bylaws: An insufficient level of support for revising the bylaws regarding the terms of office (Section C, Article V) led the Executive Committee to leave the bylaws as they were.

Nominating Committee:

  1. Joy Kim

  2. Wen-ling Liu

  3. Hideyuki Morimoto

The committee nominated 11 members to stand for office. All were thanked for their willingness to put their names up for the vote. Mr. Morimoto then announced the election results and introduced the new officers, who will serve from 2001 to 2003:

  1. Chair: Wen-ling Liu, Indiana University

  2. Vice-Chair/Chair-Elect: Philip Melzer, Library of Congress

  3. Chinese officer: Meng-fen Su, University of Texas at Austin

  4. Japanese officer: Toshie Marra, University of California, Los Angeles

  5. Korean officer: Mikyung Kang, University of California, Los Angeles

  6. Member-at-large: Vickie Fu Doll, University of Kansas, Lawrence

OCLC CJK Users Group Annual Meeting Program
Wen-ling Liu introduced the agenda, the speakers, and the Program Committee members. Because 2001 marks the 10th anniversary of the Users Group, the Program Committee invited one of its founders, Karl Lo of UC San Diego, to talk about the prospects and future of the Users Group. Dr. Lo served as Chair of the Users Group from 1993 to 1995.
Thanks to the generous travel support of OCLC, especially Glenn Patton, the Program Committee was able to invite Mr. John Jenkins, who works at Apple and is one of the Technical Directors of the Unicode Consortium, to give a presentation on Unicode and East Asian ideographs.

10th Anniversary Speech -- Karl Lo, University of California, San Diego
Karl Lo graciously provided the audience with his vision of the future and how we should anticipate our future so that we can steady our course and reap its rewards. He began with a quote from Jay Jordan, President and CEO of OCLC, from the new strategic plan that is available on the OCLC web site.
"In the next three years, we will extend the present OCLC library cooperative of 38,000 institutions in 76 countries into a truly global, digital community.  This will involve developing new Web based services, implementing a new technological platform, and, most important, reaffirming a commitment to library cooperation."

"Extending the OCLC cooperative: a three year strategy." http://www.oclc.org/strategy/ Available also as a PDF file. (Accessed 15 April 2001.)

The OCLC CJK database consists of 2 million records, and each record contains approximately 5,000 characters. This means that the entire OCLC CJK database could comfortably fit on the hard drive of a normal desktop computer and still leave room for the contents of Si ku quan shu. Each of these desktop computers containing personal digital libraries can be connected to the Internet. Our challenge is to find the way to unlock the power of the personal digital library. If we can meet the challenge that OCLC poses with its own strategic plan to change from a bibliographic utility to a virtual library of multilingual, multiscript, multimedia libraries, we will no longer be just catalogers but virtual library organizers and users.
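
As a rough check of the storage claim above, the arithmetic can be sketched as follows. The record count and characters per record are the figures quoted in the talk; the bytes-per-character values are assumptions (the minutes do not say how the records are encoded), so the results are order-of-magnitude estimates only.

```python
# Back-of-the-envelope size of the OCLC CJK database described in the talk.
RECORDS = 2_000_000          # figure quoted in the talk
CHARS_PER_RECORD = 5_000     # approximate figure quoted in the talk


def database_size_gb(bytes_per_char: int) -> float:
    """Return the estimated database size in gigabytes (10**9 bytes)."""
    total_chars = RECORDS * CHARS_PER_RECORD
    return total_chars * bytes_per_char / 10**9


if __name__ == "__main__":
    # Assumed storage widths: 1, 2, or 3 bytes per character.
    for width in (1, 2, 3):
        size = database_size_gb(width)
        print(f"{width} byte(s) per character: about {size:.0f} GB")
```

Under these assumed widths the estimate falls between roughly 10 and 30 GB.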

Unicode & East Asian Ideographs -- John Jenkins, Apple Computer
The PowerPoint presentation is available as a PDF file: <http://homepage.mac.com/jenkins/Papers/OCLC.pdf>
Unicode is a trademark owned by the Unicode Consortium <http://www.unicode.org>; it can't be part of a product name that is trademarked by someone else.  The Unicode Standard is available both in book and online format. The most recent version of the standard is Unicode 3.1.  ISO/IEC 10646 is very close to Unicode, though not exactly the same.  The original work on Unicode was done by Xerox and has since gone worldwide.  With each enhancement to Unicode, the number of ideographs has increased dramatically. The most recent version (3.1) includes 43,253 new ideographs.
There are ten design principles.

    1. Unicode text is simple to parse and process

    2. Unicode text is not stateful (if you lose part of the text, the rest can still be interpreted)

    3. Unicode encodes characters, not glyphs

    4. Unicode defines plain text (it does not deal with rich text)

    5. Unicode uses logical order (e.g. Bengali, Arabic)

    6. Unicode unifies characters from different scripts (e.g. Chinese, Japanese, Korean)

    7. Unicode uses dynamic composition (so you can get to ideographs by using description sequences)

    8. Unicode uses equivalent sequences (e followed by a combining acute accent is equivalent to é)

    9. Unicode is convertible (it is a superset of most character sets in current use, but not EACC or CCCII), e.g. convert Shift-JIS to Unicode to GB. Unicode will probably tackle EACC in the future.

    10. There are a variety of benefits to Unicode use
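
Principles 8 and 9 can be illustrated with a short sketch using Python's standard library. This is not taken from the presentation. The choice of NFC normalization for the comparison, the sample string, and the use of GB 18030 as the "GB" target are all assumptions made for illustration, and the helper names are hypothetical.

```python
import unicodedata

# Principle 8: a base letter plus a combining mark is canonically
# equivalent to the precomposed character.
decomposed = "e\u0301"       # "e" + COMBINING ACUTE ACCENT
precomposed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def shift_jis_to_gb(data: bytes) -> bytes:
    """Principle 9: convert Shift-JIS bytes to a GB encoding via Unicode."""
    text = data.decode("shift_jis")   # legacy bytes -> Unicode string
    return text.encode("gb18030")     # Unicode string -> GB 18030 bytes (assumed target)


if __name__ == "__main__":
    # The raw code point sequences differ, but they compare equal after NFC.
    print(decomposed == precomposed)
    print(canonically_equal(decomposed, precomposed))

    # Round trip: Shift-JIS -> Unicode -> GB 18030 -> Unicode.
    sample = "日本語"
    gb_bytes = shift_jis_to_gb(sample.encode("shift_jis"))
    print(gb_bytes.decode("gb18030") == sample)
```

Using Unicode as the pivot means each legacy character set needs only one mapping table (to and from Unicode) rather than one per pair of character sets.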

How do ideographs get into the standard? The number of ideographs is huge, and the problem is how to decide which ones get into the standard. The problem is solved by the Ideographic Rapporteur Group (IRG) <http://www.cs.cuhk.edu.hk/~irg/>, which decides whether characters in one character set are the same as characters in another set. Each character is also given a dictionary position (e.g. in the Kangxi dictionary, Dai Kanwa jiten, Hanyu da ci dian, Daejaweon). Virtual positions are assigned to characters that are missing from particular dictionaries. There are duplicate ideographs in Unicode to cover variant pronunciations and compatibility ideographs. There are now 71,089 ideographs in the standard, with more unique ideographs than in any dictionary.

Retrospective Conversion Projects at Harvard-Yenching Library -- James Lin, Harvard-Yenching Library
In 1996 the Harvard-Yenching Library signed a contract with OCLC to work on the second phase of retrospective conversion for its East Asian collections.  This project, the largest OCLC CJK project to date, will be completed on time by the end of June 2001. As of that date, all of the Harvard-Yenching Library's titles will be accessible online worldwide.
In the past eight years, the Harvard-Yenching Library has undertaken two retrocon projects. In the first one, grants from the Korea Foundation in Seoul, Korea, and the United Daily News Group in Taipei, Taiwan, supported the conversion of 17,000 Korean and 42,500 Chinese card catalog records into machine-readable format, with both romanized and vernacular scripts.  OCLC was selected to work on the project, and an official contract was signed between Harvard University and OCLC on October 22, 1993.
The project started in January 1994, was carried out according to contract, and was completed in January 1995. In fact, OCLC finished the work a week earlier than the scheduled date.
The second retrocon project started in June 1996. Harvard again contracted with OCLC, this time to convert approximately 325,000 catalog cards to machine-readable format over a period of five years. The project was funded jointly by Harvard University and the Harvard-Yenching Institute, with each committing up to 1.1 million dollars.
The materials to be converted during this project include Chinese, Japanese, Korean, and Vietnamese monographs, serials, and microforms. CJK rare books are also included. It is worth mentioning that we try to provide analytics for every large series in our collection. For example, we analyzed every title in Han'guk yôktae munjip ch'ongsô, a Korean series published in 1987 that contains the collected works of 3,500 Korean authors in 3,000 volumes. We also completed the analytics for the Chinese "Si ku quan shu" series and all other "Si ku" related series.
The final completion of the retrocon project enables the Harvard-Yenching Library to realize its long-standing goal: the computerization of its entire catalog in both romanized and vernacular scripts. It also enriches the OCLC WorldCat database and makes these valuable East Asian research materials readily available to scholars and researchers around the world.

Pinyin Task Force Report -- Sarah Elman, University of California, Los Angeles
The OCLC CJK Users Group Pinyin Task Force was asked by OCLC to review its first conversion test file in December 2000. A total of 440 pairs of records (consisting of "before" and "after" images of the bibliographic records) were sent to us. The records were divided equally among the five Task Force members. Review findings and comments were sent to OCLC in early January. OCLC then worked on improving the conversion program based on our comments. The second test file, consisting of the same set of records, was sent to us in late January. We completed the second review in early February.
The second test file showed noticeable improvements, and some of the problems in the previous test file had been corrected. However, only about 35% of the records were error-free. The following problems remained unsolved as of the second review:

1.  Some Wade-Giles elements did not convert -- This occurred mostly in

  1.  Qualifiers of conference names (x11 -- $c)

  2.  Statement of responsibility in 245 field ($c -- many records only converted partially)

  3.  Publication date (260 $c)

  4.  Name of parts and volume designation for series ($p and $v)

  5.  Notes containing mixed text or text in quotation/parentheses

  6.  Geographic subdivisions (650 $z, especially the 2nd $z)

  7.  Subordinate body in x10 $b

  8.  Wade-Giles texts in vernacular fields.

2.  Converted Pinyin subject headings in test records do not match the authority file.

  1. We hope that OCLC will devote more effort to the authority control aspect of the conversion program. Many libraries will depend on OCLC to deliver authority records based on the converted bibliographic records so it is important to ensure that access points in converted bibliographic records are accurate and conform to the national authority file.

3.  Inadvertent conversion (i.e., Wade-Giles data which should not have been converted to Pinyin) Examples include:

  1. Western language titles and names that happen to be in the Wade-Giles form, such as parallel titles and the statements of responsibility following the parallel titles

  2.  English words, such as "to", were converted

4.  The names of single-character Chinese counties have been incorrectly converted to one word. Examples:

  1. Huaxian             Correct form:  Hua Xian

  2. Lingxian            Correct form:  Ling Xian


5.  Inconsistency in capitalization and word division of names of jurisdictions and geographic features, such as Sheng, Shi, Xian, Xingzhengqu, Zhonghua Renmin Gongheguo, etc.
6.  The apostrophe is not supplied when the first syllable ends with the letter n and the second begins with the letter g.
7.  Gibberish appears after conversion, especially in the 245 field.
8.  Another major problem is the "garbage in, garbage out" phenomenon. Many records have typos, spelling errors, incorrect diacritics, etc.  The inconsistent use of hyphens in names is among the most challenging problems that OCLC and all librarians need to grapple with.
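
Problems 4 and 6 describe mechanical rules, so they can be sketched in a few lines. The following is only an illustration of those two rules as stated in the report, not OCLC's conversion program. Both function names are hypothetical, the syllable pair in the usage lines is an invented example, and the county helper assumes the caller already knows that the name is a single-character county name.

```python
def join_syllables(first: str, second: str) -> str:
    """Join two pinyin syllables into one word.

    Inserts an apostrophe when the first syllable ends with "n" and the
    second begins with "g" (the case described in problem 6).
    """
    if first.endswith("n") and second.startswith("g"):
        return f"{first}'{second}"
    return first + second


def split_single_character_county(name: str) -> str:
    """Rewrite a run-together county name such as "Huaxian" as "Hua Xian".

    Assumes the caller has already established that `name` is a
    single-character county name (the case described in problem 4).
    """
    suffix = "xian"
    if name.lower().endswith(suffix) and len(name) > len(suffix):
        stem = name[: -len(suffix)]
        return f"{stem} Xian"
    return name


if __name__ == "__main__":
    print(join_syllables("xin", "gan"))                # apostrophe inserted
    print(join_syllables("bei", "jing"))               # no apostrophe needed
    print(split_single_character_county("Huaxian"))
    print(split_single_character_county("Lingxian"))
```

A real converter would also need a reliable way to decide which names are single-character counties, which is exactly the kind of judgment the Task Force found missing in the test files.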

OCLC Reports
OCLC Reports to the CJK Users Group are available from: ftp://ftp.rsch.oclc.org/pub/documentation/cjk_users_group/
Due to time constraints only the first two presentations were given.  Written reports covering the rest of the intended presentations were included in the written documentation distributed by OCLC at the meeting.

OCLC 21st Century Global Strategy: Focusing on Metadata Services Roadmaps -- Marty Withrow,  Director, Metadata Services Division
Marty Withrow began by recognizing the outgoing committee members for their service. He outlined the three-year strategic plan that Karl Lo mentioned earlier.  OCLC has made a decision to change its strategic plan to reflect the changes that libraries and librarians have made from being custodians of the book to being service-oriented information managers.
WorldCat is going worldwide. Instead of waiting for libraries to bring their data to Dublin, OCLC is going to use linking technology to go out to the data. Data means more than books; it will also include images, sound files, and other data.
The Extended WorldCat will cover four major service areas:

1. Metadata

    Metadata reflects OCLC's move from providing support for traditional cataloging records to comprehensive metadata creation.  The metadata format structures will expand to include structures appropriate for materials held in museums, art institutes, and other institutions.  The new metadata structures will be based on standards and can be integrated with local systems and materials vendors.  They will provide multiformat, multilanguage, multistyle, comprehensive coverage. This is a change whereby OCLC is going to reach beyond its own borders to find data. If records are not in OCLC, then OCLC will obtain records and data from other countries. OCLC will provide a variety of services such as metadata maintenance (authorities, bib notification), metalinking (tables of contents, links to vendors or publishers), just-in-time metadata (like PromptCat), and contract work (TechPro). At the same time, OCLC will work to eliminate separate products (CatME, CJK, Passport) and move to a one-stop shopping browser for metadata creation and retrieval.  Phase One will see an enhanced CatME and a merger of CORC and CatExpress.  Phase Two will see web-based ILL software and the elimination of Passport. Phase Three will pull together all the metadata software and functions into one package.


2. Archiving and content management

    This involves creating a digital vault and making it available worldwide. Examples would include harvesting and archiving websites.


3. Discovery and navigation

    An example of this is acting as a Google Library Partner ("find it at my library"). People could set up profiles that integrate local library holdings and purchase options (Amazon, etc.) into search engines. Another example would be virtual reference services (a 24/7 Ask-a-Reference-Librarian service).


4. Service fulfillment

    An example of this would be enhanced interlibrary loan and links between interlibrary loan and bookstores. Another example might be enhanced profiling, where content is delivered in your language.


Marty Withrow's final message to the group was "Weave Libraries into the Web and the Web into Libraries."

Questions and Answers.

  1. When will we be able to do NACO from within CJK software? We will have to wait until the integrated software is available.

  2. When will we be able to do CJK in CORC? It won't be until the summer of 2002, when OCLC will provide Unicode support.

  3. When will we be able to include vernacular in ILL requests? OCLC is working on the implementation of the ISO protocol for record exchange.

OCLC Metadata Policy and Standards: Focusing on OCLC Pinyin Conversion -- Glenn Patton, Director, Metadata Policy and Standards Division
Glenn Patton's presentation focused on the Pinyin Conversion project.
Details on the OCLC Pinyin Conversion Project are available from: <http://www.oclc.org/oclc/pinyin/index.htm>
Details on the Pinyin Conversion timeline and procedures are available from: <http://lcweb.loc.gov/catdir/pinyin/>
Details on Local Catalog Conversion through OCLC are available from: <http://www.oclc.org/oclc/pinyin/1localcat.htm>
Meeting participants received a summary sheet from Glenn, "OCLC Metadata Standards and Quality Update," with details on the conversion process.
There were 152,000 authority records converted in September 2000:

  1. 121,000 converted cleanly

  2. 31,000 were converted and marked for manual review

  3. 8,400 non-unique names were converted and marked for manual review

Generally speaking, the conversion went well, but some Wade-Giles headings were not retained in the conversion process because the cross-references were not called for (name-title cross references, subordinate references). Participants at ALA Midwinter called for those headings to be retained for automated authority control. Approximately 25,000 headings have been retrieved and added back to the records as cross references. They are coded as |w nnea (earlier established form, may not display). The Library of Congress will make all the converted headings available through its distribution.
OCLC is currently working on the fifth and final test of the conversion software. Almost all problems noted by the Pinyin Conversion Task Force have been dealt with during the last round of conversion software tweaking. The only issue that OCLC can't deal with is the "garbage in, garbage out" problem in the original Wade-Giles records.  Missing diacritics in Wade-Giles words will generate incorrect Pinyin words.
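
The |w nnea coding mentioned above is a four-position control subfield in which each character carries its own meaning. A minimal sketch of how such a value could be decoded follows. The position labels and code meanings reflect one reading of the MARC 21 authority format and list only the codes needed for this example, so they should be checked against the format documentation; the function name is hypothetical.

```python
# Position-by-position meanings for the $w control subfield.
# Assumption: labels and codes follow the MARC 21 authority format;
# only the codes needed for the "nnea" example are listed here.
W_POSITIONS = (
    ("special relationship", {"n": "not applicable"}),
    ("tracing use restriction", {"n": "not applicable"}),
    ("earlier form of heading", {
        "e": "earlier established form of heading",
        "n": "not applicable",
    }),
    ("reference display", {
        "a": "reference not displayed",
        "n": "not applicable",
    }),
)


def decode_control_subfield(value: str) -> dict[str, str]:
    """Map each character of a $w value to a description of its position."""
    decoded = {}
    for (label, codes), code in zip(W_POSITIONS, value):
        decoded[label] = codes.get(code, f"unlisted code {code!r}")
    return decoded


if __name__ == "__main__":
    for label, meaning in decode_control_subfield("nnea").items():
        print(f"{label}: {meaning}")
```

Read this way, the third position marks the retained Wade-Giles heading as an earlier established form, and the fourth suppresses its display, which matches the gloss given in the presentation.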
The first round of conversion in OCLC will be CONSER serial records. Next will be the WorldCat Chinese records, beginning with the newest records. Then OCLC will work on other records that contain Wade-Giles. The emphasis is on fixing access points first. All conversion activity is targeted to be completed by Fall 2001.
Local file conversion will begin once the WorldCat software is done, because the same software will be used. There are six options, but the three basic choices remain:

    1. Conversion of your local database

    2. Conversion based on archive records of your OCLC activity

    3. Delivery of converted master records

See the OCLC website for prices and details. OCLC wants its users to know that OCLC has become much more proactive about doing maintenance to the database and correcting errors in bibliographic records.

Questions and Answers.

  1. When will OCLC be able to begin doing local file conversion? It will probably be May.

  2. Is there a deadline to sign up for local file conversion? No, it can be done anytime once the software is ready.

  3. What is the order for conversion? CONSER serials will be converted first, followed by recent Chinese language records in WorldCat. However, there are many versions of WorldCat, so OCLC needs to be careful to synchronize conversion of the various files.

  4. What happens to records marked for review? OCLC hasn't come up with a final plan yet, but hopes to work with its Chinese language staff to review the records.  OCLC wants participants to know that they too will have their own records marked for review that will need to be dealt with at an institutional level.


The meeting adjourned at 12:30.


Respectfully submitted,

Sharon Domier, Recorder
University of Massachusetts Amherst