The “Dai-Onsen (大音泉)”, or “the series of Corpus of Spontaneous Speech in Japanese” (CMSJ/CDSJ/CCSJ/CLSJ/CKSJ/CPSJ), is a database containing a large collection of Japanese speech data with transcriptions, originally developed by Timehill Inc. (2016-2022, Japan).
It is the world's largest corpus of spontaneous (free) speech and has been used for both academic and commercial purposes, mainly in the US, UK and EU.
The corpus can also be used for a wide variety of research purposes, such as linguistics, phonetics, pragmatics, semantics, lexicology, cognitive science, psychology, sociology, cultural anthropology, dialect studies and Japanese language studies, as well as material for speech therapists or translators and as a source of creative ideas for producing dramas and films.
The price of this corpus varies depending on the purpose of use, period, region, quantity, etc. If you are interested in this corpus, please contact us at:
info@timehill.biz
After the necessary email exchange, we will provide the sales price of the corpus and the contract document. Thank you for your understanding.
The list and outline of this corpus are as follows:
■48kHz/16bit/wav, Mic (Shure SM10 and others), recorded in a studio booth
★16kHz/16bit/wav, Mic (iPhone 13 or other smartphones), recorded in a quiet room
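As a minimal sketch of how a delivered file could be checked against the two formats above (48kHz/16bit or 16kHz/16bit WAV), the following uses only Python's standard `wave` module. The function names `describe_wav`, `matches_format` and `make_silent_wav` are hypothetical helpers introduced for illustration and are not part of the corpus distribution; the sketch also assumes the files are uncompressed PCM WAV.

```python
import io
import wave


def describe_wav(source):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s).

    `source` may be a file path or a binary file-like object.
    Assumes an uncompressed PCM WAV file.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def matches_format(source, expected_rate, expected_bits=16):
    """True if the file has the expected sample rate and bit depth."""
    rate, bits, _, _ = describe_wav(source)
    return rate == expected_rate and bits == expected_bits


def make_silent_wav(rate, seconds=1, bits=16, channels=1):
    """Build an in-memory silent WAV, used here only to demonstrate the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits // 8)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (rate * seconds * channels * (bits // 8)))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    studio_like = make_silent_wav(48000)
    print(describe_wav(studio_like))
```

With a real file, `matches_format("some_file.wav", 48000)` would be the studio-format check and `matches_format("some_file.wav", 16000)` the smartphone-format check; the file name here is a placeholder.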
■CMSJ: Corpus of Monologue Speech in Japanese (300 hours in total)
1. Duration > 15-20 minutes per speech file
2. Speaker > 1 Adult/session (file) [1,200 Adults in total]
3. Topics >
① My Life Story
② My Miracle Year
③ My daily life or Favorite things
■CDSJ: Corpus of Dialogue Speech in Japanese (300 hours in total)
1. Duration > 20-30 minutes per speech file
2. Speaker > 2 Adults/session (file) [1,200 Adults in total]
① 2 Adults in the roles of Shop clerk and Customer
② 2 Adults as Friends
③ 2 Adults as Brothers or Sisters / Parents and their child
3. Topics >
① Conversation Between Shop clerk and the Customer
② Conversation Between Friends
③ Conversation Between Brothers or Sisters / Parents and their child
■CCSJ: Corpus of Conversation Speech in Japanese (100 hours in total)
1. Duration > 30-40 minutes per speech file
2. Speaker > 3 Adults/session (file) [600 Adults in total]
① 3 Adults in the roles of Shop clerk and Customers
② 3 Adults as Friends
③ 3 Adults as Brothers or Sisters / Parents and their child
3. Topics >
① Conversation Among Shop clerk and the Customers
② Conversation Among Friends
③ Conversation Among Brothers or Sisters / Parents and their child
■CLSJ: Corpus of Lecture Speech in Japanese (80 hours in total)
1. Duration > 15-30 minutes per speech file
2. Speaker > 1 Professor/session (file) [5 Professors in total]
3. Topics >
① Humanities
② Economics
③ Advanced science and technology
■CKSJ: Corpus of Kids Speech in Japanese (50 hours in total)
1. Duration > 15-20 minutes per speech file
2. Speaker > 1 Kid (Age 4 to 10)/session (file) [200 Kids in total]
3. Topics >
◎ Daily life or Favorite things
■CPSJ: Corpus of Telephony Speech in Japanese (300 hours in total)
1. Duration > 10-30 minutes per speech file
2. Speaker > 2 Adults/session (file) [1,200 Adults in total]
① 2 Adults in the roles of Shop clerk and Customer
② 2 Adults as Friends
③ 2 Adults as Brothers or Sisters / Parents and their child
3. Topics >
① Conversation Between Shop clerk and the Customer
② Conversation Between Friends
③ Conversation Between Brothers or Sisters / Parents and their child
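For readers who want to work with the outline programmatically, the figures listed above can be captured in a small table, sketched here in Python. The `SubCorpus` class and the helper functions are hypothetical names introduced for illustration. The file-count range is only a rough bound derived from the stated total hours and per-file durations, not a figure supplied by the vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCorpus:
    code: str
    kind: str
    hours: int                 # total hours as listed
    speakers_per_session: int  # speakers per session (file) as listed
    speakers_total: int        # total speakers as listed
    minutes_per_file: tuple    # (shortest, longest) duration per file


# Values copied from the outline above.
CATALOG = [
    SubCorpus("CMSJ", "Monologue", 300, 1, 1200, (15, 20)),
    SubCorpus("CDSJ", "Dialogue", 300, 2, 1200, (20, 30)),
    SubCorpus("CCSJ", "Conversation", 100, 3, 600, (30, 40)),
    SubCorpus("CLSJ", "Lecture", 80, 1, 5, (15, 30)),
    SubCorpus("CKSJ", "Kids", 50, 1, 200, (15, 20)),
    SubCorpus("CPSJ", "Telephony", 300, 2, 1200, (10, 30)),
]


def total_hours(catalog):
    """Sum of the listed hours across all sub-corpora."""
    return sum(c.hours for c in catalog)


def file_count_range(corpus):
    """Rough (min, max) file count implied by total hours and file duration."""
    shortest, longest = corpus.minutes_per_file
    total_minutes = corpus.hours * 60
    return total_minutes // longest, total_minutes // shortest


if __name__ == "__main__":
    print("Total hours:", total_hours(CATALOG))
    for corpus in CATALOG:
        print(corpus.code, file_count_range(corpus))
```

Summing the listed hours gives 1,130 hours across the six sub-corpora.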