This corpus consists of 70 transcripts of audio recordings from a cross-sectional study of 70 Cantonese-speaking children aged 2;6 to 5;6, with one interview file per child. The data were collected and processed at Hong Kong University by Zehava Weizman, Paul Fletcher, and Emily Ma. The naturalistic spoken-language data come from 10 children, five boys and five girls, at each six-month interval between 2.5 and 5.5 years of age, covering the whole preschool range. The children were recruited from a Cantonese-speaking preschool in Hong Kong. Although socioeconomic status was not taken into account in recruitment, the children were predominantly middle-class. Each child in the sample was prescreened, pretested using the Reynell Developmental Language Scales (RDLS, Cantonese version), and audiotaped in conversation for a total of about one hour. The adult-child language sampling was carried out in the child’s preschool. A warm-up task was conducted at the beginning of the session to ensure that the child was comfortable with the investigator and the task. The language sample aimed to elicit a minimum of 100 utterances, which was usually achieved within 20 minutes. To provide sufficient opportunities for verbalization and a diversity of syntactic and lexical forms, while keeping the samples as comparable as possible across children, the conversation was organized around familiar bath/dress/feed/sleep routines. The children’s dates of birth and ages are available in the header of each transcript. Digitized audio is also available.
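
The sampling design described above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the age groups fall exactly on the six-month points from 2;6 to 5;6 in years;months notation, and the helper names (age_to_months, months_to_age, age_bands) are hypothetical and not part of any corpus tooling.

```python
def age_to_months(age: str) -> int:
    """Convert a 'years;months' age string such as '2;6' to total months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def months_to_age(total: int) -> str:
    """Convert total months back to 'years;months' notation."""
    return f"{total // 12};{total % 12}"


def age_bands(start: str, end: str, step_months: int = 6) -> list[str]:
    """List the age points from start to end (inclusive) at a fixed interval."""
    lo, hi = age_to_months(start), age_to_months(end)
    return [months_to_age(m) for m in range(lo, hi + 1, step_months)]


# Assumption: one group at each six-month point, 10 children per group.
CHILDREN_PER_BAND = 10
bands = age_bands("2;6", "5;6")
total_children = len(bands) * CHILDREN_PER_BAND

print(bands)           # the seven age points
print(total_children)  # number of children implied by the design
```

Under these assumptions there are seven age groups of 10 children each, which matches the 70 children and 70 transcripts reported for the corpus.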