spych.data.dataset.io

class spych.data.dataset.io.SpychDatasetLoader(main_features=None)
class spych.data.dataset.io.LegacySpychDatasetLoader(main_features=None)

Loads a dataset stored in the old spych format. The format consists of the following files:

wavs.txt : [wav-id] [relative-wav-path]

utterances.txt : [utt-id] [wav-id] [start] [end]

transcriptions.txt : [utt-id] [transcription]

transcriptions_raw.txt : [utt-id] [transcription raw] (transcription with punctuation)

utt2spk.txt : [utt-id] [speaker-id]

speaker_info.json : {
    "speaker_id" : {
        "gender" : "m"/"f", …
    }
}
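
As a rough illustration of the file layout above, the following sketch parses the whitespace-separated text files with plain Python. It is not the loader's actual implementation: the helper names `read_key_value_file` and `read_utterances` are hypothetical, and it assumes one record per line and numeric `start`/`end` values.

```python
def read_key_value_file(path):
    """Parse lines of the form '[key] [value ...]' into a dict.

    Hypothetical helper, suitable for wavs.txt, transcriptions.txt,
    transcriptions_raw.txt and utt2spk.txt as described above.
    """
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split only once so values (e.g. transcriptions) keep their spaces.
            parts = line.split(maxsplit=1)
            records[parts[0]] = parts[1] if len(parts) > 1 else ""
    return records


def read_utterances(path):
    """Parse '[utt-id] [wav-id] [start] [end]' lines into a dict.

    Hypothetical helper. Assumption: start and end are numeric,
    so they are parsed as floats.
    """
    utterances = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            utt_id, wav_id, start, end = fields
            utterances[utt_id] = (wav_id, float(start), float(end))
    return utterances
```

With these helpers, `read_key_value_file("utt2spk.txt")` would return a mapping from utterance id to speaker id, and `read_utterances("utterances.txt")` a mapping from utterance id to a `(wav_id, start, end)` tuple.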

class spych.data.dataset.io.KaldiDatasetLoader(main_features=None)
static feature_scp_generator(path)

Return a generator over all feature matrices defined in an scp file.

static read_float_matrix(rx_specifier)

Return the float matrix for the given rx specifier as a NumPy array.

static write_float_matrices(scp_path, ark_path, matrices)

Write the given dict of matrices (utt-id mapped to float ndarray) to the given scp and ark files.
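
For context on the scp files and rx specifiers referred to above: in Kaldi, an scp file typically holds one `<utt-id> <rx-specifier>` pair per line, and the rx specifier commonly has the form `path/to/file.ark:offset`, where the offset is the byte position of the matrix inside the archive. The sketch below only splits such entries into their parts. It assumes that common form, and `parse_rx_specifier` and `parse_scp_line` are hypothetical names, not part of spych.

```python
def parse_rx_specifier(rx_specifier):
    """Split 'archive-path:byte-offset' into (path, offset).

    Assumption: the specifier ends in ':<offset>'. If no numeric offset
    is present, the whole string is treated as the path and offset is None.
    """
    path, sep, offset = rx_specifier.rpartition(":")
    if sep and offset.isdigit():
        return path, int(offset)
    return rx_specifier, None


def parse_scp_line(line):
    """Split an scp line '<utt-id> <rx-specifier>' into (utt_id, (path, offset))."""
    utt_id, rx_specifier = line.strip().split(maxsplit=1)
    return utt_id, parse_rx_specifier(rx_specifier)
```

Reading the matrix itself would then mean opening the archive, seeking to the offset, and decoding Kaldi's matrix format, which this sketch deliberately leaves out.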

class spych.data.dataset.io.TudaDatasetLoader(main_features=None)

Loads the German Tuda corpus.