This page hosts the material for my Master’s thesis research, which was awarded Outstanding Work in Graduate Research by the faculty of NYU’s Music Technology department. The thesis, video defense, and associated compositions and software are linked below.
Network Modulation Synthesis: New Algorithms for Generating Musical Audio Using Autoencoder Networks
A new framework is presented for generating musical audio using autoencoder neural networks. With this framework, called network modulation synthesis, users can create synthesis architectures and apply novel generative algorithms to move more easily through the complex latent parameter space of an autoencoder model.
The algorithms provide mechanisms for making subtle or drastic changes to generated audio without searching for new audio encodings. Additionally, autoencoder networks without autoregressive generation can use the proposed predictive feedback algorithm to create audio that changes over time, a necessity for music composition. Spectrograms and time-series encoding analysis demonstrate that the new algorithms provide simple mechanisms for users to generate time-varying parameter combinations, and therefore auditory possibilities, that are difficult to create by generating audio from handcrafted encodings.
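The core idea of modulating a latent encoding over time, rather than hand-crafting a new encoding for each sound, can be illustrated with a minimal sketch. Everything here is hypothetical: the `decode` function is a toy stand-in for an autoencoder decoder, and the modulation shape and parameter names are illustrative, not the algorithms defined in the thesis.

```python
import math

def decode(z):
    """Toy stand-in for an autoencoder decoder (hypothetical):
    renders one 256-sample frame whose harmonic amplitudes are
    the latent values."""
    n = 256  # samples per frame
    return [sum(a * math.sin(2 * math.pi * (k + 1) * t / n)
                for k, a in enumerate(z))
            for t in range(n)]

def modulate(z, frame, rate=0.5, depth=0.3):
    """Apply a slow sinusoidal modulation to each latent dimension,
    so successive frames decode to gradually changing audio."""
    return [a + depth * math.sin(2 * math.pi * rate * frame / 10 + k)
            for k, a in enumerate(z)]

# One handcrafted encoding yields time-varying audio once modulated.
encoding = [0.8, 0.2, 0.5, 0.1]
audio = []
for frame in range(10):
    audio.extend(decode(modulate(encoding, frame)))
```

The point of the sketch is only that a single static encoding, passed through a time-varying modulation before decoding, produces evolving audio without any search for new encodings.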
Three compositions were created as a proof of concept for the algorithms’ compositional efficacy. The CANNe autoencoder network was chosen as the base generative model. The compositions utilize the new framework in three musical contexts: offline algorithmic composition and audio generation, interactive performance using words to generate synthesis parameters, and sample generation for composition and live performance using DAWs or other musical software. The compositions were reviewed by an expert panel of computer music composers for their demonstration of the algorithms’ use in music composition. Reviewers noted a sonic similarity to granular synthesis and praised the framework’s capacity for complex, layered sounds, while noting that it lacks the sonic flexibility and frequency range available in more common synthesis methods.
Implementations of the new algorithms are provided for the open-source CANNe synthesizer network, alongside three new pre-trained CANNe models using popular data sets for generative models. The general framework can be adapted to other autoencoder networks for audio synthesis.
link to document
Due to the COVID-19 pandemic, 2020 Master's defenses were conducted as pre-recorded video defenses with a live Q&A session. Below, you can view my pre-recorded defense.
to: alex, with regret (coded composition, 2019)
This composition showcases network modulation synthesis in a classic computer music compositional setting. The piece was created entirely offline using Python code. Its notable features are heavy use of the basic network modulation synthesis algorithm, feedback, and time-varying parameters.
bloviation encouraged! (audio-video composition, 2019)
This piece showcases the compositional algorithms in an interactive performance system. The parameters of the network model are determined by the makeup of the input words, so the music depends entirely on the player. The audio is mixed in stereo but could be extended to a multichannel setup, which would also suit presenting the work as an interactive installation. The piece heavily features pitch shifting and amplitude envelopes to differentiate sounds. An example performance can be viewed below.
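One plausible shape for such a word-to-parameter mapping can be sketched as follows. This is purely illustrative, not the mapping used in the piece: the function name, dimension count, and letter-position scheme are all assumptions.

```python
def word_to_encoding(word, dims=8):
    """Derive a latent encoding from a word's letters (hypothetical
    mapping): letters are distributed round-robin across the latent
    dimensions, and each dimension averages its letters' alphabet
    positions, scaled into [0, 1]."""
    sums = [0.0] * dims
    counts = [0] * dims
    for i, ch in enumerate(word.lower()):
        if ch.isalpha():
            sums[i % dims] += (ord(ch) - ord('a')) / 25.0
            counts[i % dims] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Any deterministic mapping of this kind makes the music depend entirely on the words the player types, since identical input always yields the same synthesis parameters.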
bell / boom (2020)
This composition showcases the network modulation synthesis software as a method of generating interesting samples. Samples are played back through a Max/MSP patch and a MIDI controller. This composition heavily features predictive feedback to create the steadier droning notes, and a complex synthesis tree. The tree creates related samples in five voices, and the voices are mixed together so that every note played on the MIDI controller blossoms into a layered texture. Sometimes groups and voices are changed mid-note to introduce some timbral chaos.
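The idea of a synthesis tree deriving related samples in several voices can be sketched minimally. This is a hypothetical illustration, not the tree structure or operators defined in the thesis: a root encoding is perturbed once per voice so the voices stay audibly related.

```python
import random

def make_tree(root, n_voices=5, spread=0.1, seed=0):
    """Derive one perturbed child encoding per voice from a root
    encoding (hypothetical sketch of a synthesis tree). Small
    perturbations keep the voices related; a larger spread would
    introduce more timbral chaos."""
    rng = random.Random(seed)
    return [[v + rng.uniform(-spread, spread) for v in root]
            for _ in range(n_voices)]

voices = make_tree([0.5, 0.3, 0.7, 0.2])
# Each voice's encoding would be decoded separately and the results
# mixed, so one triggered note blossoms into a layered texture.
```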
All relevant code is available on GitHub.