Audio Augmentation in TensorFlow and PyTorch

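For image-related tasks, rotating, blurring, or resizing images are common data augmentation methods. Compared with other data types, image augmentation is very intuitive: we only need to look at an image to see how it was transformed, and we can make a first assessment of the effect by eye. Although augmentation is most common in the image domain, it can be applied in other domains as well. This article covers data augmentation for audio.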

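This article presents two ways to apply augmentation to a dataset in TensorFlow. The first modifies the data directly; the second applies the augmentation during the network's forward pass. In addition, we show how to achieve the same functionality with torchaudio's built-in methods.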

Direct audio augmentation

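First we need to generate an artificial audio dataset. Rather than loading a pre-existing dataset, we repeat one sample from the librosa library as many times as needed: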
import librosa
import tensorflow as tf


def build_artificial_dataset(num_samples: int):
    data = []
    sampling_rates = []

    for _ in range(num_samples):
        # Load the same librosa example clip each time; y is the waveform, sr its sampling rate
        y, sr = librosa.load(librosa.ex('nutcracker'))
        data.append(y)
        sampling_rates.append(sr)

    # The sampling rate doubles as a placeholder label for each clip
    features_dataset = tf.data.Dataset.from_tensor_slices(data)
    labels_dataset = tf.data.Dataset.from_tensor_slices(sampling_rates)
    dataset = tf.data.Dataset.zip((features_dataset, labels_dataset))

    return dataset


ds = build_artificial_dataset(10)
This creates a Dataset object along the way. Plain NumPy arrays would work too; choose whichever fits your needs.
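As a rough illustration of the plain-NumPy alternative mentioned above, the following sketch builds the same kind of dataset as two arrays. The function name build_numpy_dataset and the synthetic 440 Hz sine wave are assumptions made here so the example runs without downloading the librosa clip; they are not part of the original article.

```python
import numpy as np


def build_numpy_dataset(num_samples: int, sr: int = 22050, seconds: float = 1.0):
    """Return (features, labels) as plain NumPy arrays.

    A synthetic 440 Hz sine wave stands in for the librosa example clip
    (an assumption, so the sketch needs no audio download).
    """
    t = np.arange(int(sr * seconds), dtype=np.float32) / sr
    y = np.sin(2.0 * np.pi * 440.0 * t).astype(np.float32)
    features = np.stack([y] * num_samples)             # one row per repeated clip
    labels = np.full(num_samples, sr, dtype=np.int32)  # sampling rate as placeholder label
    return features, labels


features, labels = build_numpy_dataset(10)
```

Arrays like these could later be passed to tf.data.Dataset.from_tensor_slices if a Dataset object turns out to be more convenient.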

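With the small dataset ready, we can start applying augmentations. For simplicity, this article uses the audiomentations library with only three transforms: PitchShift, Shift, and AddGaussianNoise. The first two shift the pitch (PitchShift) and the data (Shift, which can be thought of as rolling the data; for example, a dog's bark would be moved by +5 seconds). The last transform makes the signal noisier, which makes the task harder for the neural network. Next, we combine all three augmentations into one pipeline: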
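from audiomentations import Compose, AddGaussianNoise, PitchShift, Shift

# Each transform is applied with probability p.
# Parameter names follow the audiomentations version used in the article;
# other releases may name Shift's arguments differently.
augmentations_pipeline = Compose(
    [
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
        PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
        Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
    ]
)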
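To make the effect of two of these transforms more concrete, here is a NumPy-only sketch of what a rolling shift and additive Gaussian noise do to a signal. The helpers roll_shift and add_gaussian_noise are hypothetical names introduced for illustration; this is not how audiomentations implements Shift or AddGaussianNoise, only the underlying idea.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is reproducible


def roll_shift(y: np.ndarray, fraction: float) -> np.ndarray:
    """Circularly shift the signal by a fraction of its length."""
    return np.roll(y, int(round(fraction * len(y))))


def add_gaussian_noise(y: np.ndarray, amplitude: float) -> np.ndarray:
    """Add zero-mean Gaussian noise scaled by `amplitude`."""
    noise = rng.standard_normal(len(y)).astype(y.dtype)
    return y + amplitude * noise


y = np.arange(8, dtype=np.float32)
shifted = roll_shift(y, 0.25)        # every sample moves 2 positions to the right
noisy = add_gaussian_noise(y, 0.01)  # same shape, slightly perturbed values
```

Samples pushed past the end wrap around to the start, which is why the shift can be described as rolling the data.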
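Before feeding data in, some extra code is needed. Because we are working with a Dataset object, this code tells TensorFlow to temporarily convert the tensors to NumPy arrays before they are passed to the augmentation pipeline: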
def apply_pipeline(y, sr):
    # Called with NumPy inputs; sr is cast to a plain int for audiomentations
    shifted = augmentations_pipeline(samples=y, sample_rate=int(sr))
    return shifted


@tf.function
def tf_apply_pipeline(feature, sr):
    """
    Applies the augmentation pipeline to audio files
    @param feature: audio data
    @param sr: sampling rate
    @return: augmented audio data and the unchanged sampling rate
    """
    augmented_feature = tf.numpy_function(
        apply_pipeline, inp=[feature, sr], Tout=tf.float32, name="apply_pipeline"
    )

    return augmented_feature, sr


def augment_audio_dataset(dataset: tf.data.Dataset):
    dataset = dataset.map(tf_apply_pipeline)

    return dataset
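With these helper functions in place, we can augment the dataset. Finally, we append a dimension at the end, which turns each audio sample from shape (num_data_points,) into (num_data_points, 1), indicating mono audio: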
ds = augment_audio_dataset(ds)
ds = ds.map(lambda y, sr: (tf.expand_dims(y, axis=-1), sr))
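The shape change from the trailing expand_dims call can be checked in isolation. The sketch below uses NumPy's np.expand_dims, which performs the same reshaping as tf.expand_dims for this case; the 16 kHz one-second clip of zeros is a made-up stand-in for a real sample.

```python
import numpy as np

y = np.zeros(16000, dtype=np.float32)  # hypothetical one-second mono clip at 16 kHz
y_mono = np.expand_dims(y, axis=-1)    # add a trailing channel axis
print(y.shape, "->", y_mono.shape)     # (16000,) -> (16000, 1)
```

The trailing axis of size 1 acts as the channel dimension, which is the layout many audio models expect for mono input.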
This completes the direct audio augmentation.

Full article:

https://www.overfit.cn/post/5b6e5fe4acd84e4ca18a444b522b1c05
Original post: https://www.cnblogs.com/deephub/p/16048644.html