
Sound Event Detection with FilterAugment

Official implementation of

  • Heavily Augmented Sound Event Detection utilizing Weak Predictions (DCASE2021 Challenge Task 4 technical report)
    by Hyeonuk Nam, Byeong-Yun Ko, Gyeong-Tae Lee, Seong-Hu Kim, Won-Ho Jung, Sang-Min Choi, Yong-Hwa Park
    DCASE arXiv
    - The arXiv version fixes some minor errors.
  • FilterAugment: An Acoustic Environmental Data Augmentation Method (Accepted to ICASSP 2022)
    by Hyeonuk Nam, Seong-Hu Kim, Yong-Hwa Park
    ICASSP2022 arXiv

Ranked 3rd place in the IEEE DCASE 2021 Challenge Task 4, and accepted to ICASSP 2022.

Also, refer to Frequency Dynamic Convolution SED, an upgraded SED model building on this work!

FilterAugment

FilterAugment is an audio data augmentation method proposed in the above papers for training acoustic models on audio/speech tasks. It applies random weights to randomly selected frequency bands of the input spectrogram. For more details, refer to the papers mentioned above.

  • This example shows two types of FilterAugment applied to the log mel spectrogram of a 10-second audio clip: (a) shows the original log mel spectrogram, (b) shows the log mel spectrogram with step-type FilterAugment applied, and (c) shows the log mel spectrogram with linear-type FilterAugment applied.
  • The applied filters are shown below: filter (d) is applied to (a) to produce (b), and filter (e) is applied to (a) to produce (c).

  • Step-type FilterAugment produces several frequency bands whose amplitudes are uniformly increased or decreased, while linear-type FilterAugment produces a continuous filter with distinct peaks and dips.
  • For our participation in DCASE 2021 Challenge Task 4, we used the prototype FilterAugment, which is step-type FilterAugment without the minimum bandwidth hyperparameter. The code for this prototype is defined as "filt_aug_prototype" at utils/data_aug.py @ line 52.
  • Code for the updated FilterAugment, including the step and linear types used for the ICASSP submission, is defined as "filt_aug" at utils/data_aug.py @ line 7. A simplified sketch of the idea is shown below.
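
For intuition only, here is a minimal sketch of the step-type idea: split the frequency axis into a few random bands and add a random gain to each band of the log mel spectrogram. This is not the repository implementation (see utils/data_aug.py for that), and the band-count and gain-range defaults below are illustrative assumptions.

import torch

def filter_augment_step(log_mel, db_range=(-6.0, 6.0), n_bands=(3, 6)):
    # log_mel: (batch, n_freq, n_frames) log mel spectrogram
    batch, n_freq, _ = log_mel.shape
    # pick how many frequency bands to use
    n_band = torch.randint(n_bands[0], n_bands[1] + 1, (1,)).item()
    # random band boundaries along the frequency axis
    bounds = torch.sort(torch.randint(1, n_freq, (n_band - 1,))).values
    bounds = torch.cat([torch.tensor([0]), bounds, torch.tensor([n_freq])])
    # one random gain per band and per example, drawn from db_range
    gains = torch.rand(batch, n_band) * (db_range[1] - db_range[0]) + db_range[0]
    out = log_mel.clone()
    for b in range(n_band):
        # add the band's gain to every frequency bin inside the band
        out[:, bounds[b]:bounds[b + 1], :] += gains[:, b].view(-1, 1, 1)
    return out

# example: augmented = filter_augment_step(torch.randn(8, 128, 626))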

Requirements

Python 3.7.10 is used with the following libraries:

  • pytorch==1.8.0
  • pytorch-lightning==1.2.4
  • torchaudio==0.8.0
  • scipy==1.4.1
  • pandas==1.1.3
  • numpy==1.19.2

Other requirements are listed in requirements.txt.
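
Assuming the pinned versions above are reflected in requirements.txt, the dependencies can be installed with:

pip install -r requirements.txt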

Datasets

You can download the datasets by referring to the DCASE 2021 Task 4 description page or the DCASE 2021 Task 4 baseline. Then, set the dataset directories in the config yaml files accordingly. You need the DESED real datasets (weak/unlabeled in domain/validation/public eval) and the DESED synthetic datasets (train/validation).

Training

You can train a model and save it in the exps folder by running:

python main.py

Model settings:

There are 5 configuration files in this repo. The default setting is the optimized linear FilterAugment with the ICASSP setting described in the paper submitted to ICASSP. The other 4 model settings, from the DCASE technical report, use the prototype FilterAugment. To train model 1, 2, 3 or 4 from the DCASE technical report, run the following instead.

# for example, to train model 3:
python main.py --config dcase_model3

Results of the prototype FilterAugment with the DCASE settings (models 1~4) on the DESED Real Validation dataset:

Model                      PSDS1   PSDS2   Collar-based F1
DCASE2021 Task4 baseline   0.353   0.553   42.1%
proto-FiltAug model 1      0.408   0.628   49.0%
proto-FiltAug model 2      0.414   0.608   49.2%
proto-FiltAug model 3      0.381   0.660   31.8%
proto-FiltAug model 4      0.052   0.783   19.8%
  • These results are from models trained with a single run for each setting.

Results of the updated FilterAugment with the ICASSP settings on the DESED Real Validation dataset:

Model                      PSDS1   PSDS2   Collar-based F1   Intersection-based F1
w/o FiltAug                0.387   0.598   47.7%             70.8%
optimized step FiltAug     0.412   0.634   47.4%             71.2%
optimized linear FiltAug   0.413   0.636   49.0%             73.5%
  • These results are the maximum values of each metric over 3 separate runs for each setting (refer to the paper for details).

Reference

DCASE 2021 Task 4 baseline

Citation & Contact

If this repository helped your work, please cite the papers below!

@techreport{nam2021heavilyaugmnetedsed,
    Author = "Nam, Hyeonuk and Ko, Byeong-Yun and Lee, Gyeong-Tae and Kim, Seong-Hu and Jung, Won-Ho and Choi, Sang-Min and Park, Yong-Hwa",
    title = "Heavily Augmented Sound Event Detection utilizing Weak Predictions",
    institution = "DCASE2021 Challenge",
    year = "2021",
    month = "June",
}

@INPROCEEDINGS{nam2021filteraugment,
    author={Nam, Hyeonuk and Kim, Seong-Hu and Park, Yong-Hwa},
    booktitle={ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
    title={Filteraugment: An Acoustic Environmental Data Augmentation Method}, 
    year={2022},
    pages={4308-4312},
    doi={10.1109/ICASSP43922.2022.9747680}
}

Please contact Hyeonuk Nam at frednam@kaist.ac.kr for any query.
