Evaluating corpus-based speech synthesis systems
Brazil is a country of continental size, the largest and most populous in South America. The Speech Synthesis Workshop (SSW) 2027 will take up the theme of linguistic diversity, highlighting two of the many Brazilian accents. In line with this theme, the Blizzard Challenge 2027 will focus on synthesizing speech in two accents of Brazilian Portuguese and in one other major variant, European Portuguese as spoken in Portugal.
Brazilian Portuguese and European Portuguese differ in pronunciation, vocabulary, grammar, and formal usage. While Brazilian Portuguese has a more "open", melodic, and nasal sound, European Portuguese is more "closed", often omitting unstressed vowels.
We will provide participating teams with data covering the three accents (São Paulo, Recife, and European Portuguese from Portugal), totalling approximately 790 hours.
This year's task is accented zero-shot TTS for spontaneous Portuguese speech. It comprises a mandatory Hub task and an optional Spoke task, both described in the Rules.
Accented zero-shot TTS in spontaneous Portuguese speech, restricted to publicly available datasets.
Accented zero-shot TTS in spontaneous Portuguese speech, with unrestricted data sources.
The main difficulty will be to disentangle accent and timbre, allowing the model to reproduce accentual variations without compromising speaker identity.
This difficulty is compounded by the limited availability of Portuguese data and by the fact that we will provide only spontaneous speech recordings, which may be of lower quality than typical read-speech recordings. Participating teams are free to enhance the audio quality if they choose to. The challenge will provide a large dataset of approximately 790 hours of speech covering the three accents, but teams are not required to use all (or indeed any) of it. The target speakers for the zero-shot Hub and Spoke tasks are unseen and not included in this dataset, but each of them speaks one of the three accents represented in it.
Another challenge is that the provided datasets are annotated in different ways. A transcription will be provided for all accents, but teams may re-transcribe the speech if they wish. The Recife subset has manual prosodic segmentation; the São Paulo subset was segmented automatically with WhisperX; and the European Portuguese (EP) subset was also segmented automatically, using the state-of-the-art EP CAMÕES ASR model for transcription-reference alignment.
Please note the deadlines and milestone dates below for participating in the Blizzard Challenge 2027.
| Date | Milestone |
|---|---|
| July 1, 2026 | Challenge announcement and registration open |
| July 1, 2026 | Training dataset released |
| January 12, 2027 | Team registration closes |
| February 1, 2027 | Test dataset released to participants |
| April 12, 2027 | Deadline for participants to submit their systems (23:59 AoE) |
| April 13, 2027 | Last day for registration fee payment |
| April 19, 2027 | Evaluation systems go live |
| June 19, 2027 | End of evaluation period |
| June 21, 2027 | Results announced |
| July 21, 2027 | Deadline for workshop submissions (23:59 AoE) |
| August 1, 2027 | Notification of acceptance |
| August 16, 2027 | Camera-ready version |
| August 29 - September 2, 2027 | INTERSPEECH 2027, São Paulo, Brazil |
| September 6-8, 2027 | SSW14, Maceió, Alagoas, Brazil |
| September 9, 2027 | Blizzard Challenge 2027, Maceió, Alagoas, Brazil |
ICMC-USP
ICMC-USP
Venturus, ICMC-USP
IST, University of Lisbon
ICMC-USP
IST, University of Lisbon, INESC-ID
INESC-ID
UFMT, CEIA
For inquiries about the conference, please send an email to: here
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.