On Recent Activities of Oriental COCOSDA

                                                               Oct. 22, 1996
Dear Oriental COCOSDA members,

It was very nice to see some of the members of Oriental COCOSDA at ICSLP'96 
in Philadelphia.  

I am sending you the following material:
  1)summary of the talk during the meeting, 
  2)revised members' list of Oriental COCOSDA, and 
  3)list of Japanese speech/text corpora most of which are open to the public. 

Please note that E-mail addresses and telephone/facsimile numbers of some of 
the members have changed.  Please tell me if there is any change in the list.

We are planning to have a meeting of Oriental COCOSDA in Hong Kong in March 
1997.  Comments are welcome.

Best regards,


Institute of Information Sciences and Electronics
University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305, Japan

Tel.   +81-298-53-5187/5503/5382
Fax.   +81-298-53-5206

                          ORIENTAL COCOSDA

Shuichi Itahashi talked with Profs. Jialu Zhang and Renhua Wang from China, 
Prof. Hsiao-Chuan Wang from Taiwan and Profs. Yongju Lee and Cheolwoo Jo from 
Korea.  He also met Prof. Hyong-Soon Kim from Korea.

The topic of the talk was holding a preparatory meeting of Oriental COCOSDA. 
Initially, it was thought that it would be good to have such a meeting in 
Hong Kong in April 1997, as ICCPOL'97 is scheduled to be held there during 
April 2-4, 1997. 

 * ICCPOL: International Conference on Computer Processing of Oriental 
           Languages

However, it has turned out that the Japan-China Joint Symposium on Spoken 
Language will be held in China during Apr. 1 - 4, and some Japanese and 
Chinese researchers will participate in it. 

Moreover, there are only 6 months left before next April, and 6 months is 
too short a period to organize an international workshop.  Therefore, it 
would be more realistic to first have a preparatory one-day meeting with a 
small number of people.

The meeting date should be either before or after these conferences.  
It seems that
                    Fri. Mar. 21, 1997

will be the most feasible date for the meeting.

As I have not yet had any contact with anyone at the host site, we cannot 
decide the date at this moment.  I will try to establish contact with 
a suitable person as soon as possible.

The agenda of the meeting will be:
1. Speech/text corpora of oriental languages:
   Chinese (Mainland, Hong Kong and Taiwan), Japanese, Korean.

2. Speech input/output systems standardization:
   Speech synthesis, speech recognition/understanding, dialogue understanding.

3. Oriental COCOSDA Workshop
   When ?  Where ?  How ?

4. Expanding Oriental COCOSDA to West Pacific COCOSDA including Australia and 
   some other regions.

5. Miscellaneous

Participants (tentative):

 China: Jialu Zhang (Institute of Acoustics, Chinese Academy of Science)
        Renhua Wang (University of Science and Technology of China)
        Ditang Fang (Tsing Hua University)
        Taiyi Huang (Institute of Automation, Chinese Academy of Science)

 Hong Kong: Chorkin Chan (Hong Kong University)

 Taiwan: Hsiao-Chuan Wang (National Tsing Hua University)

 Korea: Cheolwoo Jo (Changwon National University)
        Hyongsoon Kim (Pusan University)
        Yongju Lee (Won Kwang University)

 Japan: Hiroya Fujisaki (Science University of Tokyo)
        Satoru Hayamizu (Electrotechnical Laboratory)
        Shuichi Itahashi (University of Tsukuba)
        Tetsunori Kobayashi (Waseda University)
        Akira Kurematsu (University of Electrocommunications)
        Kiyohiro Shikano (Nara Institute of Science and Technology)
        Toshiyuki Takezawa (ATR Interpreting Telecommunications Res. Lab.)
        Nick Campbell (ATR Interpreting Telecommunications Res. Lab.)

                 Japanese Speech and Text Corpora
                                                                 Oct. 7, 1996

1. ASJ Speech Corpus: 7 CD-ROMs, Y3,000+@/vol.
   Vols. 1 to 3: ATR PB sentences, 64 speakers, 9,600 sentences, read speech.
   Vols. 4 to 6: Guide task sentences, 36 speakers, 12,474 sentences, 
                 read speech.
   Vol. 7: Simulated dialogues with transcription, 37 speakers, 37 dialogues.
   Research purpose use only

2. ATR Speech Corpus
   Sets A, C, E: Y600,000/vol.
   Sets B, F: Y350,000/vol.
   Set D: Y270,000/vol.   
   Academic discount available

   2.1 Set A (20 volumes): 8,500 isolated words, 20 professional speakers, 
                           20/12 kHz.
   2.2 Set B (10 volumes): 503 PB sentences, 12 speakers, phoneme labeling, 
                           20/12 kHz.
   2.3 Set C (24 volumes): 750 words from Set A and 150 sentences from Set B,
                           40 speakers, 20 kHz.
   2.4 Set D (2 volumes): 400 sentences, two professional speakers, phoneme 
                          labeling with prosodic & linguistic tags.
   2.5 Set E (4 volumes): 5,000 PB English short sentences of high frequency
   2.6 Set F (6 volumes): PB sentences, 6 professional speakers.

3. ATR Dialogue Corpora
   Y50,000 each, Academic discount available

   3.1 International Conference (Telephone)
   3.2 International Conference (Keyboard)
   3.3 Travel guide (Telephone)
   3.4 Travel guide (Keyboard)

4. JEIDA Corpus
   4.1 JEIDA JCSD Corpus: 323 items, 150 speakers, 76 DATs, 
                          CD-ROMs being produced with LDC.
   4.2 JEIDA Noise Database: 47 sorts of noise of 17 categories, 18 DATs, 
                             CD is also available.
   4.3 JEIDA Synthetic Speech CD: 9 Japanese speech synthesizers.

5. RWCP corpus
   5.1 Spoken Dialogue Corpus (48 dialogues) in 4 CD-ROMs
   5.2 Text Corpus including MITI white paper with morphological analysis data 
       and Mainichi Shimbun newspaper articles with morphological analysis data
   Research purpose use only

5. Priority Area Project "Spoken Dialogue"
   Vol. 1: 31 dialogues with 29 speakers
   Vol. 2: 19 dialogues with 14 speakers
   Vol. 3: 13 dialogues with 16 speakers
   Vol. 4: 26 dialogues, 5 lectures with 23 speakers
   Tasks: secretary system, schedule management, travel guide, puzzles, 
          telephone shopping, Map Task, etc.
   Distribution restricted

6. Japanese Newspaper Corpus
6.1. Nikkei Shimbun Newspaper
     1990 - 1994 full text version, index version contains headlines only, 
     with browsing tools, one CD-ROM for one year articles, Y130,000/vol.

6.2. Mainichi Shimbun Newspaper
     Y120,000 for 1991, 1992
     Y19,800 for 1993

7. EDR Electronic Dictionaries
   Basic cost: Y1,200k for each dictionary, Y9,000k for 8 dictionaries.
   Royalties required for commercial use.
   Academic discount available
   Refer to WWW ""

   7.1. Japanese Word Dictionary (250k words)
   7.2. English Word Dictionary (190k words)
   7.3. Concept Dictionary (400k concepts)
   7.4. Japanese-English Dictionary (230k words)
   7.5. English-Japanese Dictionary (160k words)
   7.6. Japanese Co-occurrence Dictionary (900k words)
        with Japanese text corpus of 220k sentences
   7.7. English Co-occurrence Dictionary (460k words)
        with English text corpus of 160k sentences
   7.8. Technical Term Dictionary: Japanese(120k words), English(80k words)

8. ETL Speech Databases for Research
   PB 1542 words with acoustic phonetic labeling
   Research purpose use only

9. Priority Area Project "Spoken Japanese"
   18 CDs and 3 CD-ROMs including various Japanese spoken dialects
   Distribution restricted

10. Priority Area Project "Spoken Language" and DSR Project "Spoken Japanese"
    2 CD-ROMs including isolated words and continuous speech
    Distribution restricted

11. Tohoku University and Panasonic Isolated Spoken Word Database in 6 CD-ROMs
    1)212 PB words spoken by 30 males and 30 females with phonemic labeling 
    2)3285 words spoken by 6 males and 6 females
    Distribution restricted
Web site:

Existing organizations related to spoken language processing.

1)Chinese COCOSDA
  Prof. Jialu Zhang and several members
2)KCCSLP: Korean Coordinating Committee for Spoken Language Processing
  Prof. Souguil Ann and several members
3)Speech Database Committee, Acoustical Society of Japan
  Prof. S. Itahashi and 30 members
4)Speech Input/Output Systems Expert Committee, JEIDA
  Prof. S. Itahashi and 20 members
  JEIDA: Japan Electronic Industry Development Association
5)LRSI: Linguistic Resources Sharing Initiative
  Dr. T. Yokoi and 24 members
6)Database Workshop of RWCP (Real World Computing Partnership) 
  Prof. S. Itahashi and 14 members
7)ATR Interpreting Telecommunications Research Laboratories
  Dr. Yamazaki and many members
9)JSPS Research Project by Special Grant for Promotion of Science and 
  Technology for Exploration of Future 
  "Man-Machine Dialogue Systems Through Spoken Language"    
  Prof. H. Fujisaki and 13 members
10)Monbusho International Scientific Research Program: Joint Research on 
   "Spoken Language Databases and Prosodic Labeling"
  Prof. H. Fujisaki and 14 members