Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing
Alistair Carson, Alec Wright, Jatin Chowdhury, Vesa Välimäki and Stefan Bilbao
Welcome to the accompanying web page for our DAFx24 submission.
For example code see here.
Abstract
In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
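As a rough illustration of why the recurrent state delay matters, the toy sketch below runs a scalar recurrent unit at its original rate and at twice that rate. It assumes, as a simplification not taken from this page, that the delay-based approach amounts to feeding back the state from M samples ago when oversampling by an integer factor M. The function `run_toy_rnn`, the weights `w` and `u`, and the sample-repetition oversampling are hypothetical choices made only so that the even-indexed samples line up with the original-rate output; none of this is the paper's implementation.

```python
import numpy as np

def run_toy_rnn(x, state_delay=1, w=0.8, u=-0.9):
    """Hypothetical scalar recurrent unit: h[n] = tanh(w*x[n] + u*h[n - state_delay])."""
    h = np.zeros(len(x))
    for n in range(len(x)):
        # States before the start of the signal are taken to be zero.
        prev = h[n - state_delay] if n >= state_delay else 0.0
        h[n] = np.tanh(w * x[n] + u * prev)
    return h

rng = np.random.default_rng(0)
x_base = rng.uniform(-1.0, 1.0, 64)   # stand-in for audio at the training rate
x_over = np.repeat(x_base, 2)         # crude 2x oversampling by sample repetition

y_base = run_toy_rnn(x_base, state_delay=1)    # original rate
y_naive = run_toy_rnn(x_over, state_delay=1)   # unmodified recurrence at 2x rate
y_delay = run_toy_rnn(x_over, state_delay=2)   # state delayed by 2 samples at 2x rate

# Compare every second oversampled output sample with the original-rate output.
err_naive = np.max(np.abs(y_naive[::2] - y_base))
err_delay = np.max(np.abs(y_delay[::2] - y_base))
print(f"naive error: {err_naive:.3e}, delayed-state error: {err_delay:.3e}")
```

In this toy setting, the two-sample state delay makes the even-indexed outputs reproduce the original-rate output, whereas the unmodified recurrence departs from it.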
Audio Examples
Below are audio examples from five LSTM RNN models taken from the GuitarML Tone Library. The models were designed to operate at a sample rate (SR) of 44.1kHz. Each clip is the output signal when a model is run at a different inference SR, using the methods outlined in the paper:
- Naive (operating the original RNN, unmodified, at different SRs; shown in the "Original RNN" column)
- State-trajectory network (STN)
- Linearly interpolated delay line (LIDL)
- All-pass filter delay line (APDL)
- Cubic interpolated delay line (CIDL): this is our recommended method for the highest-quality sample rate conversion.
Note that for integer oversampling (e.g. 44.1kHz to 88.2kHz), the latter three methods produce an identical output, so only one is shown.
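The note on integer oversampling can be checked numerically for the two interpolation-based methods. The sketch below assumes that the state delay of `infer_sr / train_sr` samples is approximated by Lagrange-weighted taps on h[n-1], h[n-2], and so on (order 1 for linear, order 3 for cubic); this tap layout and the helper names `lagrange_taps` and `state_delay_taps` are illustrative assumptions rather than details confirmed on this page. The all-pass variant is not covered here.

```python
import numpy as np

def lagrange_taps(delta, order):
    """Lagrange interpolation weights for fractional offset `delta`,
    using taps at integer offsets 0..order."""
    taps = np.ones(order + 1)
    for k in range(order + 1):
        for j in range(order + 1):
            if j != k:
                taps[k] *= (delta - j) / (k - j)
    return taps

def state_delay_taps(train_sr, infer_sr, order):
    """Weights on h[n-1], h[n-2], ... approximating a delay of
    infer_sr / train_sr samples (assumed tap layout, see lead-in)."""
    delta = infer_sr / train_sr - 1.0  # delay beyond the first tap
    return lagrange_taps(delta, order)

linear_48k = state_delay_taps(44100, 48000, order=1)
cubic_48k = state_delay_taps(44100, 48000, order=3)
linear_88k = state_delay_taps(44100, 88200, order=1)
cubic_88k = state_delay_taps(44100, 88200, order=3)

print("48kHz   linear:", linear_48k, "cubic:", cubic_48k)
print("88.2kHz linear:", linear_88k, "cubic:", cubic_88k)
```

At 88.2kHz both sets of weights collapse to a single unit tap on h[n-2], i.e. a pure two-sample delay, which is consistent with the interpolated methods giving the same output for integer oversampling. At 48kHz the weights differ between the linear and cubic cases.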
1) Peavey 6505+ tube amp – high gain
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL (delay-based) | APDL* (delay-based, ours) | CIDL (delay-based, ours) |
---|---|---|---|---|---|
Input signal | | | | | |
44.1kHz | | - | - | - | - |
48kHz | | | | | |
88.2kHz | | | | | |
96kHz | | | | | |
*Here, the APDL method produces unwanted artefacts at 48kHz, as noted in the paper. This audio clip was specifically chosen to demonstrate these artefacts.
2) Blackstar HT40 tube amp – clean
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL (delay-based) | APDL (delay-based, ours) | CIDL (delay-based, ours) |
---|---|---|---|---|---|
Input signal | | | | | |
44.1kHz | | - | - | - | - |
48kHz | | | | | |
88.2kHz | | | | | |
96kHz | | | | | |
3) Blackstar HT40 tube amp – high gain
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL (delay-based) | APDL (delay-based, ours) | CIDL (delay-based, ours) |
---|---|---|---|---|---|
Input signal | | | | | |
44.1kHz | | - | - | - | - |
48kHz | | | | | |
88.2kHz | | | | | |
96kHz | | | | | |
4) Rockman acoustic simulator pedal
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL (delay-based) | APDL (delay-based, ours) | CIDL (delay-based, ours) |
---|---|---|---|---|---|
Input signal | | | | | |
44.1kHz | | - | - | - | - |
48kHz | | | | | |
88.2kHz | | | | | |
96kHz | | | | | |
5) Xotic SP compressor pedal
Inference SR | Original RNN (trained at 44.1kHz) | STN | LIDL (delay-based) | APDL (delay-based, ours) | CIDL (delay-based, ours) |
---|---|---|---|---|---|
Input signal | | | | | |
44.1kHz | | - | - | - | - |
48kHz | | | | | |
88.2kHz | | | | | |
96kHz | | | | | |