spych.data.dataset¶
Data structure¶
-
class
spych.data.dataset.
Dataset
(path=None, loader=None)¶ Represents an audio dataset.
Notes on paths: All paths stored in the dataset object (audio files, features) are absolute.
Parameters: - path – Path to a directory on the filesystem, which acts as root folder for the dataset. If no path is given the dataset cannot be saved on disk.
- loader (
spych.data.dataset.io.DatasetLoader
) – This object is used to save the dataset. By default spych.data.dataset.io.SpychDatasetLoader
is used.
-
add_features
(utterance_idx, feature_matrix, feature_container)¶ Adds the given features to the dataset. Features are stored directly to the filesystem, so this dataset has to have a path set.
Parameters: - utterance_idx – Utterance to which the features correspond.
- feature_matrix – A numpy array containing the features.
- feature_container – Name of the container to store the features in.
-
add_file
(path, file_idx=None, copy_file=False)¶ Adds a new file to the dataset.
Parameters: - path – Path of the file to add.
- file_idx – The id to associate the file with. If None, or if the id already exists, a new one is generated.
- copy_file – If True the file is copied to the dataset folder, otherwise the given path is used directly.
Returns: File object
-
add_segmentation
(utterance_idx, segments=None, key=None)¶ Adds a new segmentation.
Parameters: - utterance_idx – Utterance id the segmentation is associated with.
- segments – Either a string (which is split on spaces into tokens) or a list of segments.
- key – A key this segmentation is associated with. (If None the default key is used.)
Returns: Segmentation object
-
add_speaker
(speaker_idx=None, gender=None)¶ Adds a new speaker to the dataset.
Parameters: - speaker_idx – The id to associate the speaker with. If None or already exists, one is generated.
- gender – Gender of the speaker.
Returns: Speaker object
-
add_subview
(name, subview)¶ Add the subview to this dataset.
-
add_utterance
(file_idx, utterance_idx=None, speaker_idx=None, start=0, end=-1)¶ Adds a new utterance to the dataset.
Parameters: - file_idx – The file id the utterance is in.
- utterance_idx – The id to associate with the utterance. If None, or if the id already exists, a new one is generated.
- speaker_idx – The speaker id to associate with the utterance.
- start – Start of the utterance within the file [seconds].
- end – End of the utterance within the file [seconds]. -1 equals the end of the file.
Returns: Utterance object
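The start and end parameters above are given in seconds, with -1 standing for the end of the file. As a minimal sketch of how such a range could be mapped to sample indices, assuming a known sampling rate and file length (the helper utterance_sample_range is hypothetical and not part of spych):

```python
def utterance_sample_range(start, end, sampling_rate, num_samples):
    """Map an utterance's start/end (in seconds) to sample indices.

    Hypothetical helper, not part of spych. An end of -1 is read as
    "until the end of the file", as described for add_utterance.
    """
    first = int(round(start * sampling_rate))
    if end == -1:
        last = num_samples
    else:
        # Never read past the end of the file.
        last = min(int(round(end * sampling_rate)), num_samples)
    return first, last


# A 10 second file sampled at 16 kHz.
whole_file = utterance_sample_range(0, -1, 16000, 160000)
part = utterance_sample_range(1.5, 4.0, 16000, 160000)
print(whole_file, part)
```

The first call covers the whole file, the second a 2.5 second slice starting at 1.5 seconds.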
-
create_feature_container
(name, path=None)¶ Create a new feature container.
-
export_subview
(name)¶ Return a subview as a standalone dataset.
-
generate_features
(feature_pipeline, target_feature_name, source_feature_name=None)¶ Creates a new feature container with features generated by the given pipeline. If source_feature_name is not given, the pipeline needs an extraction stage.
-
import_dataset
(import_dataset, copy_files=False)¶ Merges the given dataset into this dataset.
Parameters: - import_dataset – Dataset to merge
- copy_files – If True, moves the wavs to this dataset’s folder.
-
import_file
(file, copy_file=False)¶ Import a copy of the given file and return the new file object.
-
import_segmentation
(segmentation)¶ Adds an existing segmentation to the dataset. Uses key and utterance-id from the segmentation object.
-
import_speaker
(speaker)¶ Import a copy of the given speaker and return the new speaker.
-
import_utterance
(utterance)¶ Import a copy of the given utterance and return the new utterance.
-
classmethod
load
(path, loader=None)¶ Loads the dataset from the given path, using the given loader. If no loader is given the spych loader is used.
-
name
¶ Get the name of the dataset (equals the basename of the path, if the path is not None).
Returns: name
-
num_subviews
¶ Return number of subviews.
-
remove_files
(file_ids, delete_files=False)¶ Deletes the given wavs.
Parameters: - file_ids – List of file ids or file objects.
- delete_files – If True, also delete the actual files.
-
remove_utterances
(utterance_ids)¶ Removes the given utterances by id.
Parameters: utterance_ids – List of utterance ids
-
save
()¶ Save this dataset at self.path.
-
save_at
(path, loader=None, copy_files=False)¶ Save this dataset at the given path. If the path differs from the current path set, the path gets updated.
Parameters: - path – Path to save the dataset to.
- loader – If you want to use another loader (e.g. to export to another format). Otherwise it uses the loader associated with this dataset.
- copy_files – If true the files are also stored in the new path, if not already there.
-
subdivide_speakers
(target_number_of_speakers)¶ Split the available speakers in the dataset into additional speakers so that the total number of speakers equals target_number_of_speakers.
Parameters: target_number_of_speakers – Target number of speakers
-
class
spych.data.dataset.
Subview
(filtered_utterances=set(), filtered_speakers=set(), dataset=None)¶ A subview is a filtered view on a dataset. For example, it may use only a subset of the utterance ids.
-
does_utterance_match
(utterance)¶ Return True if the given utterance matches all filter criteria. Otherwise return False.
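To illustrate what matching "all filter criteria" could mean for the two filter sets of a subview, here is a minimal sketch. It assumes that an empty filter set places no restriction, which the text above does not state; the Utterance tuple and the standalone function are hypothetical stand-ins, not spych's classes.

```python
from collections import namedtuple

# Hypothetical stand-in for an utterance; spych's own class may differ.
Utterance = namedtuple("Utterance", ["idx", "speaker_idx"])


def does_utterance_match(utterance, filtered_utterances=frozenset(),
                         filtered_speakers=frozenset()):
    """Return True if the utterance passes every non-empty filter.

    Assumption: an empty filter set places no restriction.
    """
    if filtered_utterances and utterance.idx not in filtered_utterances:
        return False
    if filtered_speakers and utterance.speaker_idx not in filtered_speakers:
        return False
    return True


utt = Utterance(idx="utt-1", speaker_idx="spk-a")
print(does_utterance_match(utt, {"utt-1"}, {"spk-a"}))
print(does_utterance_match(utt, {"utt-1"}, {"spk-b"}))
```

Under this assumption the first call matches both filters, while the second fails the speaker filter.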
-
Validation¶
-
class
spych.data.dataset.
Validator
(metrics=[], expected_file_format=<spych.audio.format.AudioFileFormat object>)¶ Class to validate a dataset.
-
classmethod
full_validator
(expected_file_format=<spych.audio.format.AudioFileFormat object>)¶ Returns a validator that checks all metrics.
-
static
get_features_with_missing_file
(dataset)¶ Return a dictionary mapping each feature container to the list of utterance ids for which no feature file exists.
-
static
get_files_empty
(dataset)¶ Return a list of file-idx’s that contain no data.
-
static
get_files_missing
(dataset)¶ Return a list of file-idx’s where the actual file is missing.
-
static
get_files_with_wrong_format
(dataset, expected_format=<spych.audio.format.AudioFileFormat object>)¶ Return a list of file-idx’s that don’t conform to the given audio format.
-
static
get_files_without_utterances
(dataset)¶ Return a list of file-idx’s that don’t reference any utterances.
-
static
get_speakers_without_gender
(dataset)¶ Return a list of speaker-idx’s, where the gender is not defined.
-
static
get_utterances_with_invalid_start_end
(dataset)¶ Check if there are any utterances that have invalid start/end time.
Start and end must satisfy:
- both are floats
- both can be empty (defaults: start 0, end -1)
- end >= start
- start >= 0
- end >= 0 or end = -1
Parameters: dataset – Dataset to check.
Returns: List of utterance-ids with invalid start/end.
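The rules above can be written out as a small check for a single utterance. This is a sketch under stated assumptions, not spych's implementation: the helper has_valid_start_end is hypothetical, it accepts ints as well as floats, and it treats end = -1 as exempt from the end >= start rule, since -1 means "until the end of the file".

```python
def has_valid_start_end(start=None, end=None):
    """Check the start/end rules for a single utterance.

    Hypothetical helper. Missing values fall back to the defaults
    0 (start) and -1 (end of file).
    """
    start = 0 if start is None else start
    end = -1 if end is None else end

    # Both values must be numeric; bool is excluded on purpose.
    for value in (start, end):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False

    if start < 0:
        return False
    if end == -1:
        return True  # -1 means "until the end of the file"
    # With start >= 0 this also guarantees end >= 0.
    return end >= start


print(has_valid_start_end(1.0, 2.5))
print(has_valid_start_end(2.0, 1.0))
```

A validator along these lines would collect the ids of all utterances for which the check returns False.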
-
static
get_utterances_with_missing_file_idx
(dataset)¶ Return a list of utterances that reference a wav-id that doesn’t exist.
-
static
get_utterances_with_missing_speaker
(dataset)¶ Return a list of utterances that have no speaker or reference an invalid speaker.
-
static
get_utterances_without_segmentation
(dataset)¶ Return a dictionary of key/utterance-idx’s where no segmentation is available.
-
validate
(dataset)¶ Return the validation results for the selected validation metrics. (dictionary metric/results)
-
Rectification¶
-
class
spych.data.dataset.
Rectifier
(tasks=[], expected_file_format=<spych.audio.format.AudioFileFormat object>)¶ Provides functionality to fix invalid dataset parts.
-
static
remove_empty_wavs
(dataset_to_fix)¶ Delete all files that have no content.
-
static
remove_files_missing
(dataset_to_fix)¶ Delete all file references where the referenced wav file doesn’t exist.
-
static
remove_files_with_wrong_format
(dataset_to_fix, expected_format=<spych.audio.format.AudioFileFormat object>)¶ Delete all wav files with the wrong format (Sampling rate, sample width, …).
-
static
remove_utterances_with_missing_file
(dataset_to_fix)¶ Remove all utterances where the referenced file doesn’t exist.
-
Iteration¶
-
class
spych.data.dataset.
BatchGenerator
(datasets)¶ Class that provides functions to generate batches from a single dataset or from multiple datasets. If multiple datasets are given, only utterances that exist in all of them are considered.
-
batches_with_features
(feature_name, batch_size, feature_pipeline=None)¶ Return a generator which yields batches. One batch contains concatenated features from batch_size utterances.
Parameters: - feature_name – Name of the feature container in the dataset to use.
- batch_size – Number of utterances in one batch
- feature_pipeline – If not None processes the features with the pipeline.
Returns: generator
-
batches_with_spliced_features
(feature_name, batch_size, splice_sizes=0, splice_step=1, repeat_border_frames=True)¶ Return a generator which yields batches. One batch contains the concatenated features of batch_size utterances. Yields list with length equal to the number of datasets in this generator. The list contains ndarrays with the concatenated features.
Parameters: - feature_name – Name of the feature container in the dataset to use.
- batch_size – Number of utterances in one batch
- splice_sizes – Appends x previous and next features to the sample (e.g. if the splice size is 4, the sample contains 9 features in total). If a list of ints is given, the different splice sizes are used for the different datasets.
- splice_step – Number of features to move forward for the next sample.
- repeat_border_frames – If True, repeat the first and last frame for splicing. Otherwise pad with zeroes.
Returns: generator
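The splicing parameters can be illustrated with a small pure-Python sketch. It is not spych's implementation: splice_frames is a hypothetical helper that works on plain lists instead of ndarrays, and it assumes that splice_step is the distance between the centre frames of consecutive samples.

```python
def splice_frames(frames, splice_size=0, splice_step=1,
                  repeat_border_frames=True):
    """Concatenate each frame with splice_size neighbours on both sides.

    Hypothetical sketch; frames is a list of equal-length lists.
    """
    if not frames:
        return []
    dim = len(frames[0])
    if repeat_border_frames:
        left, right = frames[0], frames[-1]
    else:
        left = right = [0.0] * dim
    # Pad so that the first and last frame also have full context.
    padded = [left] * splice_size + list(frames) + [right] * splice_size

    spliced = []
    for center in range(0, len(frames), splice_step):
        window = padded[center:center + 2 * splice_size + 1]
        spliced.append([value for frame in window for value in frame])
    return spliced


frames = [[1.0], [2.0], [3.0]]
print(splice_frames(frames, splice_size=1))
print(splice_frames(frames, splice_size=1, repeat_border_frames=False))
```

With a splice size of 1, every sample holds three frames: the centre frame plus one neighbour on each side, where the borders are either repeated or zero-padded.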
-
batches_with_utterance_idx_and_features
(feature_name, batch_size, feature_pipeline=None)¶ Return a generator which yields batches. One batch contains features from batch_size utterances. Yields a list of lists. Every sublist contains the utterance id first, followed by the feature arrays (ndarray) of all datasets.
e.g. If there are two source datasets:
[
  [utt_id, dataset_1_feature, dataset_2_feature],
  [utt_id2, dataset_1_feature2, dataset_2_feature2],
  …
]
Parameters: - feature_name – Name of the feature container in the dataset to use.
- batch_size – Number of utterances in one batch
- feature_pipeline – If not None processes the features with the pipeline.
Returns: generator
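The batch layout described above can be reproduced with a short generator. This is a sketch, not the library's code: batches_with_idx_and_features is a hypothetical function, each dataset is modelled as a plain dict from utterance id to features, and the ids are sorted only to make the output deterministic.

```python
def batches_with_idx_and_features(feature_sets, batch_size):
    """Yield batches of [utt_id, feature_1, ..., feature_n] entries.

    Hypothetical sketch: feature_sets is a list of dicts (one per
    dataset) mapping utterance id to that dataset's features.
    """
    # Only utterances present in every dataset are considered.
    common = set(feature_sets[0])
    for features in feature_sets[1:]:
        common &= set(features)
    utt_ids = sorted(common)  # sorting is an assumption, for determinism

    for offset in range(0, len(utt_ids), batch_size):
        yield [[utt_id] + [features[utt_id] for features in feature_sets]
               for utt_id in utt_ids[offset:offset + batch_size]]


dataset_1 = {"a": [1], "b": [2], "c": [3]}
dataset_2 = {"a": [10], "c": [30], "d": [40]}
for batch in batches_with_idx_and_features([dataset_1, dataset_2], 2):
    print(batch)
```

Utterances "b" and "d" are skipped here because each of them exists in only one of the two datasets.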
-
batches_with_utterance_idxs
(batch_size)¶ Return a generator which yields batches. One batch is a list of utterance ids of length batch_size.
-