Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing

Alistair Carson, Alec Wright, Jatin Chowdhury, Vesa Välimäki and Stefan Bilbao

Welcome to the accompanying web-page for our DAFx24 submission.

For example code, see here.

Abstract

In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.

Audio Examples

Below are examples of five LSTM RNN models from the GuitarML Tone Library. These models are designed for operation at a sample rate (SR) of 44.1kHz. The audio examples below are the output signals when operating at different inference SRs, using the methods outlined in the paper:

  • Naive (operating the original RNN at different SRs; first column)
  • State-trajectory network (STN)
  • Linearly interpolated delay line (LIDL)
  • All-pass filter delay line (APDL)
  • Cubic interpolated delay line (CIDL): this is our recommended method for the highest-quality sample rate conversion.
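As a rough illustration of the interpolated delay-line idea, the sketch below computes Lagrange fractional-delay taps of order 1 (linear, as in LIDL) and order 3 (cubic, as in CIDL) for running a 44.1kHz model at 48kHz. It is not the code from the paper: the function name `lagrange_taps`, the variable names, and the convention that the taps apply to the past states h[n-1], h[n-2], ... (so the feedback loop stays causal) are all assumptions made for this example.

```python
import numpy as np

def lagrange_taps(frac, order):
    """Lagrange FIR taps approximating a delay of `frac` samples.

    Tap k weights the sample that is k steps older than the newest tap:
    taps[k] = prod over j != k of (frac - j) / (k - j).
    """
    k = np.arange(order + 1)
    taps = np.ones(order + 1)
    for j in range(order + 1):
        mask = k != j
        taps[mask] *= (frac - j) / (k[mask] - j)
    return taps

train_sr, target_sr = 44100.0, 48000.0
ratio = target_sr / train_sr         # state must be delayed by about 1.088 samples
frac = ratio - np.floor(ratio)       # fractional part, about 0.088

# Assumed convention: taps weight h[n-1], h[n-2], ... at the new rate.
lidl_taps = lagrange_taps(frac, order=1)   # linear interpolation
cidl_taps = lagrange_taps(frac, order=3)   # cubic Lagrange interpolation
```

In both cases the taps sum to one, and when the fractional part is zero they collapse to a single unit tap, i.e. a pure delay.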

Note that for integer oversampling (e.g. 44.1kHz to 88.2kHz), the latter three methods produce an identical output, so only one is shown.
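A minimal sketch of why the integer case needs no interpolation: with an integer oversampling factor M, the state feedback can be taken from exactly M samples back, so any interpolator reduces to the same pure delay. The `toy_cell` function below is a hypothetical stand-in for a trained recurrent cell, and `run_rnn` is an illustrative helper rather than anything from the paper or the GuitarML models.

```python
from collections import deque

import numpy as np

def toy_cell(h, x):
    # Hypothetical stand-in for a trained recurrent cell (assumption).
    return np.tanh(0.5 * h + x)

def run_rnn(x, delay=1):
    """Run h[n] = toy_cell(h[n - delay], x[n]) from a zero initial state."""
    line = deque([0.0] * delay, maxlen=delay)  # line[0] holds h[n - delay]
    out = np.empty(len(x))
    for n, xn in enumerate(x):
        h = toy_cell(line[0], xn)
        line.append(h)  # the oldest state drops off the front
        out[n] = h
    return out

# The same sinusoid sampled at a base rate and at twice that rate.
x_over = np.sin(2 * np.pi * 0.005 * np.arange(128))
x_base = x_over[::2]

y_base = run_rnn(x_base, delay=1)  # original recurrence at the base rate
y_over = run_rnn(x_over, delay=2)  # 2x oversampled, feedback delayed by 2 samples
```

In this toy setting, every second sample of `y_over` matches `y_base`, because the two-sample feedback delay spans the same time interval as one sample at the base rate.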


1) Peavey 6505+ tube amp – high gain

Input signal
Delay-based methods: LIDL, APDL, CIDL
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL | APDL* (ours) | CIDL (ours)
44.1kHz |  | - | - | - | -
48kHz
88.2kHz
96kHz

*Here, the APDL method produces unwanted artefacts at 48kHz, as noted in the paper. This audio clip was chosen specifically to demonstrate these artefacts.


2) Blackstar HT40 tube amp – clean

Input signal
Delay-based methods: LIDL, APDL, CIDL
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL | APDL (ours) | CIDL (ours)
44.1kHz |  | - | - | - | -
48kHz
88.2kHz
96kHz


3) Blackstar HT40 tube amp – high gain

Input signal
Delay-based methods: LIDL, APDL, CIDL
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL | APDL (ours) | CIDL (ours)
44.1kHz |  | - | - | - | -
48kHz
88.2kHz
96kHz


4) Rockman acoustic simulator pedal

Input signal
Delay-based methods: LIDL, APDL, CIDL
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL | APDL (ours) | CIDL (ours)
44.1kHz |  | - | - | - | -
48kHz
88.2kHz
96kHz


5) Xotic SP compressor pedal

Input signal
Delay-based methods: LIDL, APDL, CIDL
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL | APDL (ours) | CIDL (ours)
44.1kHz |  | - | - | - | -
48kHz
88.2kHz
96kHz