spych.data.dataset

Data structure

class spych.data.dataset.Dataset(path=None, loader=None)

Represents an audio dataset.

Notes on paths: All paths stored in the dataset object (audio files, features) are absolute.

Parameters:
  • path – Path to a directory on the filesystem, which acts as root folder for the dataset. If no path is given the dataset cannot be saved on disk.
  • loader (spych.data.dataset.io.DatasetLoader) – This object is used to load and save the dataset. By default spych.data.dataset.io.SpychDatasetLoader is used.
add_features(utterance_idx, feature_matrix, feature_container)

Adds the given features to the dataset. Features are stored directly on the filesystem, so the dataset must have a path set.

Parameters:
  • utterance_idx – Utterance to which the features correspond.
  • feature_matrix – A numpy array containing the features.
  • feature_container – Name of the container to store the features in.
add_file(path, file_idx=None, copy_file=False)

Adds a new file to the dataset.

Parameters:
  • path – Path of the file to add.
  • file_idx – The id to associate the file with. If None or already in use, a new one is generated.
  • copy_file – If True the file is copied to the dataset folder, otherwise the given path is used directly.
Returns:

File object

add_segmentation(utterance_idx, segments=None, key=None)

Adds a new segmentation.

Parameters:
  • utterance_idx – Utterance id the segmentation is associated with.
  • segments – Segments can be a string (which will be space-separated into tokens) or a list of segments.
  • key – A key this segmentation is associated with. (If None, the default key is used.)
Returns:

Segmentation object
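The handling of the segments argument described above (a string is split into tokens, a list is used as-is) can be sketched independently of spych; `normalize_segments` is a hypothetical helper, not part of the library:

```python
def normalize_segments(segments):
    """Turn the segments argument into a list of tokens: a string is
    space-separated into tokens, any other iterable is used as-is."""
    if isinstance(segments, str):
        return segments.split()
    return list(segments)

print(normalize_segments("hello world again"))  # from a transcript string
print(normalize_segments(["hello", "world"]))   # from an explicit token list
```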

add_speaker(speaker_idx=None, gender=None)

Adds a new speaker to the dataset.

Parameters:
  • speaker_idx – The id to associate the speaker with. If None or already in use, a new one is generated.
  • gender – Gender of the speaker.
Returns:

Speaker object

add_subview(name, subview)

Add the subview to this dataset.

add_utterance(file_idx, utterance_idx=None, speaker_idx=None, start=0, end=-1)

Adds a new utterance to the dataset.

Parameters:
  • file_idx – The file id the utterance is in.
  • utterance_idx – The id to associate with the utterance. If None or already in use, a new one is generated.
  • speaker_idx – The speaker id to associate with the utterance.
  • start – Start of the utterance within the file [seconds].
  • end – End of the utterance within the file [seconds]. -1 equals the end of the file.
Returns:

Utterance object

create_feature_container(name, path=None)

Create a new feature container

export_subview(name)

Return a subview as a standalone dataset.

generate_features(feature_pipeline, target_feature_name, source_feature_name=None)

Creates a new feature container with features generated by the given pipeline. If source_feature_name is not given, the pipeline needs an extraction stage.

import_dataset(import_dataset, copy_files=False)

Merges the given dataset into this dataset.

Parameters:
  • import_dataset – Dataset to merge
  • copy_files – If True the files are copied to this dataset’s folder, otherwise the existing paths are used.
import_file(file, copy_file=False)

Import a copy of the given file and return the new file object.

import_segmentation(segmentation)

Adds an existing segmentation to the dataset. Uses key and utterance-id from the segmentation object.

import_speaker(speaker)

Import a copy of the given speaker and return the new speaker.

import_utterance(utterance)

Import a copy of the given utterance and return the new utterance.

classmethod load(path, loader=None)

Loads the dataset from the given path, using the given loader. If no loader is given the spych loader is used.

name

Get the name of the dataset (equals the basename of the path, if it is not None).

num_subviews

Return number of subviews.

remove_files(file_ids, delete_files=False)

Removes the given files from the dataset.

Parameters:
  • file_ids – List of file-idx’s or file objects
  • delete_files – If True, also delete the files from the filesystem
remove_utterances(utterance_ids)

Removes the given utterances by id.

Parameters:
  • utterance_ids – List of utterance ids
save()

Save this dataset at self.path.

save_at(path, loader=None, copy_files=False)

Save this dataset at the given path. If the given path differs from the currently set path, the path is updated.

Parameters:
  • path – Path to save the dataset to.
  • loader – If you want to use another loader (e.g. to export to another format). Otherwise it uses the loader associated with this dataset.
  • copy_files – If True the files are also copied to the new path, if not already there.
subdivide_speakers(target_number_of_speakers)

Divides the available speakers in the dataset into more speakers, so that the total number of speakers equals target_number_of_speakers.

Parameters:
  • target_number_of_speakers – Target number of speakers
class spych.data.dataset.Subview(filtered_utterances=set(), filtered_speakers=set(), dataset=None)

A subview is a filtered view on a dataset. For example, it may contain only a subset of the utterance-ids.

does_utterance_match(utterance)

Return True if the given utterance matches all filter criteria. Otherwise return False.
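The matching logic can be sketched independently of spych. The assumption that an empty filter set means "no filtering" is inferred from the constructor defaults and is not stated in the reference:

```python
def does_utterance_match(utterance_idx, speaker_idx,
                         filtered_utterances=set(), filtered_speakers=set()):
    """Return True if the utterance passes every non-empty filter set.
    The default sets are only read, never mutated, so the shared defaults
    are safe here and mirror the documented Subview signature."""
    if filtered_utterances and utterance_idx not in filtered_utterances:
        return False
    if filtered_speakers and speaker_idx not in filtered_speakers:
        return False
    return True

print(does_utterance_match('utt-1', 'spk-1', {'utt-1'}))  # True: id in filter
print(does_utterance_match('utt-2', 'spk-1', {'utt-1'}))  # False: filtered out
```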

Validation

class spych.data.dataset.Validator(metrics=[], expected_file_format=<spych.audio.format.AudioFileFormat object>)

Class to validate a dataset.

classmethod full_validator(expected_file_format=<spych.audio.format.AudioFileFormat object>)

Returns a validator, that checks all metrics.

static get_features_with_missing_file(dataset)

Return a dictionary (feature-container → list of utterance-ids) of utterances for which the feature file is missing.

static get_files_empty(dataset)

Return a list of file-idx’s that contain no data.

static get_files_missing(dataset)

Return a list of file-idx’s where the actual file is missing.

static get_files_with_wrong_format(dataset, expected_format=<spych.audio.format.AudioFileFormat object>)

Return a list of file-idx’s that don’t conform to the given audio format.

static get_files_without_utterances(dataset)

Return a list of file-idx’s that don’t reference any utterances.

static get_speakers_without_gender(dataset)

Return a list of speaker-idx’s, where the gender is not defined.

static get_utterances_with_invalid_start_end(dataset)

Check if there are any utterances that have invalid start/end time.

Start/end values must satisfy:
  • both are floats
  • both may be empty, which equals 0 / -1
  • end >= start
  • start >= 0
  • end >= 0 or end = -1
Parameters:
  • dataset – Dataset to check.
Returns:

List of utterance-ids with invalid start/end.
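The rules above can be sketched as a standalone check; `has_valid_start_end` is a hypothetical helper, not part of the spych API:

```python
def has_valid_start_end(start, end):
    """Check an utterance's start/end times against the rules listed above."""
    try:
        start, end = float(start), float(end)
    except (TypeError, ValueError):
        return False        # not interpretable as floats
    if start < 0:
        return False        # start must be >= 0
    if end == -1:
        return True         # -1 means "until the end of the file"
    return end >= start     # implies end >= 0, since start >= 0

print(has_valid_start_end(0, -1))     # True: the "empty" 0 / -1 case
print(has_valid_start_end(2.0, 0.5))  # False: end < start
```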
static get_utterances_with_missing_file_idx(dataset)

Return a list of utterances that reference a file-id that doesn’t exist.

static get_utterances_with_missing_speaker(dataset)

Return a list of utterance-idx’s with a missing or invalid speaker.

static get_utterances_without_segmentation(dataset)

Return a dictionary of key/utterance-idx’s where no segmentation is available.

validate(dataset)

Return the validation results for the selected validation metrics. (dictionary metric/results)

Rectification

class spych.data.dataset.Rectifier(tasks=[], expected_file_format=<spych.audio.format.AudioFileFormat object>)

Provides functionality to fix invalid dataset parts.

static remove_empty_wavs(dataset_to_fix)

Delete all files that have no content.

static remove_files_missing(dataset_to_fix)

Delete all file references where the referenced wav file doesn’t exist.

static remove_files_with_wrong_format(dataset_to_fix, expected_format=<spych.audio.format.AudioFileFormat object>)

Delete all wav files with the wrong format (Sampling rate, sample width, …).

static remove_utterances_with_missing_file(dataset_to_fix)

Remove all utterances where the referenced file doesn’t exist.

Iteration

class spych.data.dataset.BatchGenerator(datasets)

Class that provides functions to generate batches from one or more datasets. If multiple datasets are given, only utterances that exist in all of them are considered.
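The "only utterances that exist in all datasets" rule amounts to a plain set intersection, sketched here independently of the spych API:

```python
def common_utterance_ids(utterance_id_lists):
    """Return the sorted utterance-ids that are present in every dataset,
    given one list of utterance-ids per dataset."""
    common = set(utterance_id_lists[0])
    for utt_ids in utterance_id_lists[1:]:
        common &= set(utt_ids)
    return sorted(common)

# Only 'utt-2' and 'utt-3' occur in both datasets.
print(common_utterance_ids([['utt-1', 'utt-2', 'utt-3'],
                            ['utt-2', 'utt-3', 'utt-4']]))
```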

batches_with_features(feature_name, batch_size, feature_pipeline=None)

Return a generator which yields batches. One batch contains concatenated features from batch_size utterances.

Parameters:
  • feature_name – Name of the feature container in the dataset to use.
  • batch_size – Number of utterances in one batch
  • feature_pipeline – If not None processes the features with the pipeline.
Returns:

generator

batches_with_spliced_features(feature_name, batch_size, splice_sizes=0, splice_step=1, repeat_border_frames=True)

Return a generator which yields batches. One batch contains the concatenated features of batch_size utterances. Each yielded item is a list with one entry per dataset in this generator; every entry is an ndarray with the concatenated features.

Parameters:
  • feature_name – Name of the feature container in the dataset to use.
  • batch_size – Number of utterances in one batch
  • splice_sizes – Number of previous and next frames to append to each sample (e.g. with a splice-size of 4 a sample contains 9 frames in total). If a list of ints is given, a different splice-size is used for each dataset.
  • splice_step – Number of frames to move forward for the next sample.
  • repeat_border_frames – If True repeat the first and last frame for splicing. Otherwise pad with zeros.
Returns:

generator
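The splicing behaviour can be sketched for a single feature matrix with NumPy; `splice` is a hypothetical helper that follows the parameter semantics described above:

```python
import numpy as np

def splice(features, splice_size=4, splice_step=1, repeat_border_frames=True):
    """Splice a (num_frames, dim) matrix: each output row concatenates a frame
    with its splice_size neighbours on either side (2*splice_size+1 frames)."""
    num_frames, _ = features.shape
    mode = 'edge' if repeat_border_frames else 'constant'  # repeat vs. zero-pad
    padded = np.pad(features, ((splice_size, splice_size), (0, 0)), mode=mode)
    window = 2 * splice_size + 1
    return np.stack([padded[i:i + window].reshape(-1)
                     for i in range(0, num_frames, splice_step)])

feats = np.arange(10.0).reshape(5, 2)      # 5 frames, 2 features each
print(splice(feats, splice_size=1).shape)  # (5, 6): 3 frames of 2 features
```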

batches_with_utterance_idx_and_features(feature_name, batch_size, feature_pipeline=None)

Return a generator which yields batches. One batch contains features from batch_size utterances. Yields a list of lists. Every sublist contains first the utterance id and following the feature arrays (ndarray) of all datasets.

e.g. If there are two source datasets:

[
  [utt_id, dataset_1_feature, dataset_2_feature],
  [utt_id2, dataset_1_feature2, dataset_2_feature2],
  …
]

Parameters:
  • feature_name – Name of the feature container in the dataset to use.
  • batch_size – Number of utterances in one batch
  • feature_pipeline – If not None processes the features with the pipeline.
Returns:

generator

batches_with_utterance_idxs(batch_size)

Return a generator which yields batches. One batch is a list of utterance-ids of size batch-size.
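The batching behaviour can be sketched independently of spych. Here the final batch may be smaller than batch_size; whether spych drops or yields a partial last batch is not stated in the reference:

```python
def batches_of_utterance_ids(utterance_ids, batch_size):
    """Yield lists of utterance-ids with batch_size entries each; a smaller
    final batch is yielded if the ids don't divide evenly."""
    batch = []
    for utt_id in utterance_ids:
        batch.append(utt_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

print(list(batches_of_utterance_ids(['u1', 'u2', 'u3', 'u4', 'u5'], 2)))
# [['u1', 'u2'], ['u3', 'u4'], ['u5']]
```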