spych.data.dataset

Data structure

class spych.data.dataset.Dataset(path=None, loader=None)

Represents an audio dataset.

Notes on paths: All paths stored in the dataset object (audio files, features) are absolute.

Parameters:
  • path – Path to a directory on the filesystem, which acts as root folder for the dataset. If no path is given the dataset cannot be saved on disk.
  • loader (spych.data.dataset.io.DatasetLoader) – This object is used to load and save the dataset. By default spych.data.dataset.io.SpychDatasetLoader is used.
add_features(utterance_idx, feature_matrix, feature_container)

Adds the given features to the dataset. Features are stored directly on the filesystem, so the dataset must have a path set.

Parameters:
  • utterance_idx – Utterance to which the features correspond.
  • feature_matrix – A numpy array containing the features.
  • feature_container – Name of the container to store the features in.
add_file(path, file_idx=None, copy_file=False)

Adds a new file to the dataset.

Parameters:
  • path – Path of the file to add.
  • file_idx – The id to associate the file with. If None or already in use, a new one is generated.
  • copy_file – If True the file is copied to the dataset folder, otherwise the given path is used directly.
Returns:

File object

add_segmentation(utterance_idx, segments=None, key=None)

Adds a new segmentation.

Parameters:
  • utterance_idx – Utterance id the segmentation is associated with.
  • segments – Segments can be a string (which will be space-separated into tokens) or a list of segments.
  • key – A key this segmentation is associated with. (If None, the default key is used.)
Returns:

Segmentation object
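The handling of the segments argument described above (a string is split into tokens, a list is used as-is) can be sketched independently of spych; `normalize_segments` is a hypothetical helper, not part of the library:

```python
def normalize_segments(segments):
    """Turn the segments argument into a list of tokens: a string is
    space-separated into tokens, any other iterable is used as-is."""
    if isinstance(segments, str):
        return segments.split()
    return list(segments)

print(normalize_segments("hello world again"))  # from a transcript string
print(normalize_segments(["hello", "world"]))   # from an explicit token list
```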

add_speaker(speaker_idx=None, gender=None)

Adds a new speaker to the dataset.

Parameters:
  • speaker_idx – The id to associate the speaker with. If None or already in use, a new one is generated.
  • gender – Gender of the speaker.
Returns:

Speaker object

add_subview(name, subview)

Add the subview to this dataset.

add_utterance(file_idx, utterance_idx=None, speaker_idx=None, start=0, end=-1)

Adds a new utterance to the dataset.

Parameters:
  • file_idx – The file id the utterance is in.
  • utterance_idx – The id to associate with the utterance. If None or already in use, a new one is generated.
  • speaker_idx – The speaker id to associate with the utterance.
  • start – Start of the utterance within the file [seconds].
  • end – End of the utterance within the file [seconds]. -1 equals the end of the file.
Returns:

Utterance object

create_feature_container(name, path=None)

Create a new feature container

export_subview(name)

Return a subview as a standalone dataset.

generate_features(feature_pipeline, target_feature_name, source_feature_name=None)

Creates a new feature container with features generated by the given pipeline. If source_feature_name is not given, the pipeline needs an extraction stage.

import_dataset(import_dataset, copy_files=False)

Merges the given dataset into this dataset.

Parameters:
  • import_dataset – Dataset to merge
  • copy_files – If True the files are copied to this dataset’s folder, otherwise the existing paths are used.
import_file(file, copy_file=False)

Import a copy of the given file and return the new file object.

import_segmentation(segmentation)

Adds an existing segmentation to the dataset. Uses key and utterance-id from the segmentation object.

import_speaker(speaker)

Import a copy of the given speaker and return the new speaker.

import_utterance(utterance)

Import a copy of the given utterance and return the new utterance.

classmethod load(path, loader=None)

Loads the dataset from the given path, using the given loader. If no loader is given the spych loader is used.

name

Get the name of the dataset (equals the basename of the path, if it is not None).

num_subviews

Return number of subviews.

remove_files(file_ids, delete_files=False)

Removes the given files from the dataset.

Parameters:
  • file_ids – List of file-idx’s or file objects
  • delete_files – If True, also delete the files from the filesystem
remove_utterances(utterance_ids)

Removes the given utterances by id.

Parameters:
  • utterance_ids – List of utterance ids
save()

Save this dataset at self.path.

save_at(path, loader=None, copy_files=False)

Save this dataset at the given path. If the given path differs from the currently set path, the path is updated.

Parameters:
  • path – Path to save the dataset to.
  • loader – If you want to use another loader (e.g. to export to another format). Otherwise it uses the loader associated with this dataset.
  • copy_files – If True the files are also copied to the new path, if not already there.
subdivide_speakers(target_number_of_speakers)

Divides the available speakers in the dataset into more speakers, so that the total number of speakers equals target_number_of_speakers.

Parameters:
  • target_number_of_speakers – Target number of speakers
class spych.data.dataset.Subview(filtered_utterances=set(), filtered_speakers=set(), dataset=None)

A subview is a filtered view on a dataset. For example, it may contain only a subset of the utterance-ids.

does_utterance_match(utterance)

Return True if the given utterance matches all filter criteria. Otherwise return False.
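The matching logic can be sketched independently of spych. The assumption that an empty filter set means "no filtering" is inferred from the constructor defaults and is not stated in the reference:

```python
def does_utterance_match(utterance_idx, speaker_idx,
                         filtered_utterances=set(), filtered_speakers=set()):
    """Return True if the utterance passes every non-empty filter set.
    The default sets are only read, never mutated, so the shared defaults
    are safe here and mirror the documented Subview signature."""
    if filtered_utterances and utterance_idx not in filtered_utterances:
        return False
    if filtered_speakers and speaker_idx not in filtered_speakers:
        return False
    return True

print(does_utterance_match('utt-1', 'spk-1', {'utt-1'}))  # True: id in filter
print(does_utterance_match('utt-2', 'spk-1', {'utt-1'}))  # False: filtered out
```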

Validation

class spych.data.dataset.Validator(metrics=[], expected_file_format=<spych.audio.format.AudioFileFormat object>)

Class to validate a dataset.

classmethod full_validator(expected_file_format=<spych.audio.format.AudioFileFormat object>)

Returns a validator, that checks all metrics.

static get_features_with_missing_file(dataset)

Return a dictionary (feature-container → list of utterance-ids) of utterances for which the feature file is missing.

static get_files_empty(dataset)

Return a list of file-idx’s that contain no data.

static get_files_missing(dataset)

Return a list of file-idx’s where the actual file is missing.

static get_files_with_wrong_format(dataset, expected_format=<spych.audio.format.AudioFileFormat object>)

Return a list of file-idx’s that don’t conform to the given audio format.

static get_files_without_utterances(dataset)

Return a list of file-idx’s that don’t reference any utterances.

static get_speakers_without_gender(dataset)

Return a list of speaker-idx’s, where the gender is not defined.

static get_utterances_with_invalid_start_end(dataset)

Check if there are any utterances that have invalid start/end time.

Start/end values must satisfy:
  • both are floats
  • both may be empty, which equals 0 / -1
  • end >= start
  • start >= 0
  • end >= 0 or end = -1
Parameters:
  • dataset – Dataset to check.
Returns:

List of utterance-ids with invalid start/end.
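The rules above can be sketched as a standalone check; `has_valid_start_end` is a hypothetical helper, not part of the spych API:

```python
def has_valid_start_end(start, end):
    """Check an utterance's start/end times against the rules listed above."""
    try:
        start, end = float(start), float(end)
    except (TypeError, ValueError):
        return False        # not interpretable as floats
    if start < 0:
        return False        # start must be >= 0
    if end == -1:
        return True         # -1 means "until the end of the file"
    return end >= start     # implies end >= 0, since start >= 0

print(has_valid_start_end(0, -1))     # True: the "empty" 0 / -1 case
print(has_valid_start_end(2.0, 0.5))  # False: end < start
```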
static get_utterances_with_missing_file_idx(dataset)

Return a list of utterances that reference a file-id that doesn’t exist.

static get_utterances_with_missing_speaker(dataset)

Return a list of utterance-idx’s with a missing or invalid speaker.

static get_utterances_without_segmentation(dataset)

Return a dictionary of key/utterance-idx’s where no segmentation is available.

validate(dataset)

Return the validation results for the selected validation metrics. (dictionary metric/results)

Rectification

class spych.data.dataset.Rectifier(tasks=[], expected_file_format=<spych.audio.format.AudioFileFormat object>)

Provides functionality to fix invalid dataset parts.

static remove_empty_wavs(dataset_to_fix)

Delete all files that have no content.

static remove_files_missing(dataset_to_fix)

Delete all file references where the referenced wav file doesn’t exist.

static remove_files_with_wrong_format(dataset_to_fix, expected_format=<spych.audio.format.AudioFileFormat object>)

Delete all wav files with the wrong format (Sampling rate, sample width, …).

static remove_utterances_with_missing_file(dataset_to_fix)

Remove all utterances where the referenced file doesn’t exist.

Iteration

class spych.data.dataset.BatchGenerator(datasets)

Class that provides functions to generate batches from one or more datasets. If multiple datasets are given, only utterances that exist in all of them are considered.
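The "only utterances that exist in all datasets" rule amounts to a plain set intersection, sketched here independently of the spych API:

```python
def common_utterance_ids(utterance_id_lists):
    """Return the sorted utterance-ids that are present in every dataset,
    given one list of utterance-ids per dataset."""
    common = set(utterance_id_lists[0])
    for utt_ids in utterance_id_lists[1:]:
        common &= set(utt_ids)
    return sorted(common)

# Only 'utt-2' and 'utt-3' occur in both datasets.
print(common_utterance_ids([['utt-1', 'utt-2', 'utt-3'],
                            ['utt-2', 'utt-3', 'utt-4']]))
```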

batches_with_features(feature_name, batch_size, feature_pipeline=None)

Return a generator which yields batches. One batch contains concatenated features from batch_size utterances.

Parameters:
  • feature_name – Name of the feature container in the dataset to use.
  • batch_size – Number of utterances in one batch
  • feature_pipeline – If not None processes the features with the pipeline.
Returns:

generator

batches_with_spliced_features(feature_name, batch_size, splice_sizes=0, splice_step=1, repeat_border_frames=True)

Return a generator which yields batches. One batch contains the concatenated features of batch_size utterances. Each yielded item is a list with one entry per dataset in this generator; every entry is an ndarray with the concatenated features.

Parameters:
  • feature_name – Name of the feature container in the dataset to use.
  • batch_size – Number of utterances in one batch
  • splice_sizes – Number of previous and next frames to append to each sample (e.g. with a splice-size of 4 a sample contains 9 frames in total). If a list of ints is given, a different splice-size is used for each dataset.
  • splice_step – Number of frames to move forward for the next sample.
  • repeat_border_frames – If True repeat the first and last frame for splicing. Otherwise pad with zeros.
Returns:

generator
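The splicing behaviour can be sketched for a single feature matrix with NumPy; `splice` is a hypothetical helper that follows the parameter semantics described above:

```python
import numpy as np

def splice(features, splice_size=4, splice_step=1, repeat_border_frames=True):
    """Splice a (num_frames, dim) matrix: each output row concatenates a frame
    with its splice_size neighbours on either side (2*splice_size+1 frames)."""
    num_frames, _ = features.shape
    mode = 'edge' if repeat_border_frames else 'constant'  # repeat vs. zero-pad
    padded = np.pad(features, ((splice_size, splice_size), (0, 0)), mode=mode)
    window = 2 * splice_size + 1
    return np.stack([padded[i:i + window].reshape(-1)
                     for i in range(0, num_frames, splice_step)])

feats = np.arange(10.0).reshape(5, 2)      # 5 frames, 2 features each
print(splice(feats, splice_size=1).shape)  # (5, 6): 3 frames of 2 features
```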

batches_with_utterance_idx_and_features(feature_name, batch_size, feature_pipeline=None)

Return a generator which yields batches. One batch contains features from batch_size utterances. Yields a list of lists. Every sublist contains first the utterance id and following the feature arrays (ndarray) of all datasets.

e.g. If there are two source datasets:

[
  [utt_id, dataset_1_feature, dataset_2_feature],
  [utt_id2, dataset_1_feature2, dataset_2_feature2],
  …
]

Parameters:
  • feature_name – Name of the feature container in the dataset to use.
  • batch_size – Number of utterances in one batch
  • feature_pipeline – If not None processes the features with the pipeline.
Returns:

generator

batches_with_utterance_idxs(batch_size)

Return a generator which yields batches. One batch is a list of utterance-ids of size batch-size.
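The batching behaviour can be sketched independently of spych. Here the final batch may be smaller than batch_size; whether spych drops or yields a partial last batch is not stated in the reference:

```python
def batches_of_utterance_ids(utterance_ids, batch_size):
    """Yield lists of utterance-ids with batch_size entries each; a smaller
    final batch is yielded if the ids don't divide evenly."""
    batch = []
    for utt_id in utterance_ids:
        batch.append(utt_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

print(list(batches_of_utterance_ids(['u1', 'u2', 'u3', 'u4', 'u5'], 2)))
# [['u1', 'u2'], ['u3', 'u4'], ['u5']]
```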