🎬 Python Audio Visualization Cheat Sheet

Create stunning audio visualizations in Python using librosa for audio analysis and moviepy for video creation. This guide provides a complete workflow from loading an audio file to exporting a video.


🛠️ 1. Installation #

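Install the necessary libraries: librosa for audio analysis, moviepy for video assembly, matplotlib for drawing frames, numpy for array handling, and tqdm for the progress bar used in the complete example. The code in this guide uses the MoviePy 1.x API (moviepy.editor, set_audio, subclip). MoviePy 2.0 removed moviepy.editor and renamed these methods, so the command below pins a 1.x release.

pip install librosa "moviepy<2" matplotlib numpy tqdm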

🎵 2. Audio Processing with Librosa #

librosa is the core library for analyzing audio and extracting features.

Loading Audio #

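Load an audio file as a floating-point time series (y) together with its sample rate (sr). By default, librosa.load mixes the signal down to mono and resamples it to 22050 Hz; pass sr=None to keep the file's native sample rate.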

import librosa

file_path = 'your_audio.mp3'
y, sr = librosa.load(file_path)

# y: 1-D NumPy array of float32 samples (the waveform)
# sr: sample rate in Hz (22050 by default; use sr=None for the native rate)
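
As a quick sanity check on how y, sr, and video length relate, the sketch below computes the clip duration and the number of video frames needed at a given frame rate. It uses a synthetic sine wave as a stand-in for a loaded file, and the helper name frame_budget is hypothetical, not part of librosa.

```python
import math
import numpy as np

def frame_budget(y, sr, fps=30):
    """Return (duration in seconds, number of video frames needed)."""
    duration = len(y) / sr                 # samples / samples-per-second
    n_frames = math.ceil(duration * fps)   # round up so the tail is covered
    return duration, n_frames

# Stand-in for `y, sr = librosa.load(...)`: 2 seconds of a 440 Hz tone
sr = 22050
t = np.arange(2 * sr) / sr
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)

duration, n_frames = frame_budget(y, sr, fps=30)
print(duration, n_frames)
```

Knowing the frame count up front helps estimate memory use, since every frame is held as an uncompressed image array before encoding.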

Feature Extraction #

Analyze the audio to extract meaningful features that can drive the visualization.

  • Spectrogram: A visual representation of the spectrum of frequencies as they vary with time.

    import numpy as np
    D = librosa.stft(y)
    S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
    
  • Beat Tracking: Find the tempo and the frames where beats occur.

    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    
  • Harmonic-Percussive Separation: Separate the audio into harmonic (tonal) and percussive (rhythmic) components.

    y_harmonic, y_percussive = librosa.effects.hpss(y)
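
One way to let these features drive a picture is to convert them into one value per video frame. The sketch below turns a list of beat times (such as beat_times above) into a pulse that jumps to 1.0 on each beat and decays afterwards. The function name beat_pulse, the decay constant, and the hard-coded beat times in the usage line are illustrative assumptions, not librosa output.

```python
import numpy as np

def beat_pulse(beat_times, n_frames, fps=30, decay=0.15):
    """Per-video-frame intensity: 1.0 on a beat, exponential decay after it."""
    frame_times = np.arange(n_frames) / fps
    beat_times = np.sort(np.asarray(beat_times, dtype=float))
    if beat_times.size == 0:
        return np.zeros(n_frames)
    # Index of the most recent beat at or before each frame (-1 if none yet)
    idx = np.searchsorted(beat_times, frame_times, side='right') - 1
    since = frame_times - beat_times[np.clip(idx, 0, None)]
    pulse = np.exp(-since / decay)
    pulse[idx < 0] = 0.0  # frames before the first beat stay at zero
    return pulse

# Hypothetical beats at 0.5 s and 1.0 s, rendered at 30 fps
pulse = beat_pulse([0.5, 1.0], n_frames=45, fps=30)
```

The resulting array could scale a line width, a circle radius, or a background brightness in each frame.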
    

📊 3. Generating Visualization Frames #

Use matplotlib to render one image per video frame. These images are later compiled into a video.

Plotting with librosa.display #

librosa.display provides easy-to-use functions for plotting audio data.

import matplotlib.pyplot as plt
import librosa.display

fig, ax = plt.subplots(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr, ax=ax)
ax.set_title('Waveform')
plt.show()

Creating a Custom Visualizer Frame #

For a dynamic video, you’ll generate one image per audio frame in a loop. Here’s how to capture a Matplotlib figure as a NumPy array without displaying it.

import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

def create_frame(S_frame):
    # Build the figure directly (not through pyplot) so nothing is displayed
    # and there is no pyplot-managed figure to close afterwards
    fig = Figure(figsize=(5, 5), dpi=100)
    canvas = FigureCanvas(fig)
    ax = fig.gca()
    
    # Your custom plotting logic here
    ax.plot(S_frame, color='cyan')
    ax.axis('off')
    fig.tight_layout(pad=0)
    
    # Render the canvas and copy the pixels into an (H, W, 3) uint8 array.
    # buffer_rgba() is used because tostring_rgb() was deprecated and later
    # removed in newer Matplotlib releases.
    canvas.draw()
    image = np.asarray(canvas.buffer_rgba())[..., :3].copy()
    return image
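
A single STFT column has over a thousand bins with librosa's default settings, which is often more detail than one frame can show. As one possible input for the custom plotting logic, the sketch below condenses a magnitude column into a small number of log-spaced bars. The helper to_bars and its parameters are hypothetical, and it assumes S_frame is a 1-D array of non-negative magnitudes.

```python
import numpy as np

def to_bars(S_frame, n_bars=32):
    """Average a 1-D magnitude spectrum into up to n_bars log-spaced bands."""
    S_frame = np.asarray(S_frame, dtype=float)
    n_bins = S_frame.shape[0]
    # Log-spaced band edges over bins 1..n_bins (bin 0, the DC term, is skipped)
    edges = np.geomspace(1, n_bins, n_bars + 1)
    # Rounding can create duplicate edges at the low end; drop them so every
    # band is non-empty (this is why fewer than n_bars values may come back)
    edges = np.unique(np.round(edges).astype(int))
    # Mean magnitude inside each [start, stop) band
    return np.array([S_frame[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

# Flat spectrum with 1025 bins, the size produced by a 2048-point STFT
bars = to_bars(np.ones(1025), n_bars=32)
```

Inside create_frame, the bars could be drawn with ax.bar(range(len(bars)), bars) in place of the line plot.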

🎞️ 4. Video Creation with MoviePy #

moviepy can take a sequence of images (as file paths or NumPy arrays) and compile them into a video file.

Creating a Clip from Images #

The ImageSequenceClip class is perfect for this task. It takes a list of image frames and a desired frames-per-second (fps) rate.

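# MoviePy 1.x import path; MoviePy 2.x removed moviepy.editor
# (there, import the same classes from `moviepy` directly)
from moviepy.editor import ImageSequenceClip, AudioFileClip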

# Assume `image_frames` is a list of numpy arrays from your visualizer
fps = 30 # Frames per second
video_clip = ImageSequenceClip(image_frames, fps=fps)
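
ImageSequenceClip expects every frame to have the same size, so a small check before building the clip can surface mismatches early. This sketch reflects an assumption about what is worth validating (equal shapes, three color channels, uint8 data), and check_frames is a hypothetical helper, not a MoviePy function.

```python
import numpy as np

def check_frames(frames, fps=30):
    """Validate a list of RGB frames and return the clip length in seconds."""
    if not frames:
        raise ValueError("no frames to encode")
    shape = frames[0].shape
    if len(shape) != 3 or shape[2] != 3:
        raise ValueError(f"expected (height, width, 3) frames, got {shape}")
    for i, frame in enumerate(frames):
        if frame.shape != shape:
            raise ValueError(f"frame {i} has shape {frame.shape}, expected {shape}")
        if frame.dtype != np.uint8:
            raise ValueError(f"frame {i} has dtype {frame.dtype}, expected uint8")
    return len(frames) / fps

# Blank 500x500 frames standing in for visualizer output
image_frames = [np.zeros((500, 500, 3), dtype=np.uint8) for _ in range(60)]
print(check_frames(image_frames, fps=30))  # 2.0
```

The returned duration is also a convenient value to compare against the audio length before the two are combined.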

Adding Audio and Exporting #

Set the audio of your video clip to the original audio file and write the final output.

audio_clip = AudioFileClip(file_path)
# set_audio is the MoviePy 1.x name; MoviePy 2.x calls it with_audio
final_clip = video_clip.set_audio(audio_clip)

final_clip.write_videofile(
    'output_video.mp4', 
    codec='libx264', 
    audio_codec='aac',
    fps=fps
)

🔄 5. Complete Workflow Example #

Here’s a simplified workflow to tie everything together.

import librosa
import numpy as np
from moviepy.editor import ImageSequenceClip, AudioFileClip
from tqdm import tqdm
# (Include the create_frame function from above, together with the imports it needs)

# 1. Load Audio
audio_path = librosa.example('nutcracker')
y, sr = librosa.load(audio_path)

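# 2. Analyze Audio
# Pick the hop length so that one STFT column corresponds to one video frame.
# With the default hop (512 samples) the columns arrive at about 43 per second,
# and a 30 fps video built from them would drift out of sync with the audio.
fps = 30
hop_length = sr // fps  # 735 samples at sr=22050
S = np.abs(librosa.stft(y, hop_length=hop_length))

# 3. Generate Frames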

video_frames = []
for i in tqdm(range(0, S.shape[1])):
    # This is a simplified example; a real visualizer would be more complex
    # and might not map 1-to-1 with STFT frames.
    frame_data = S[:, i]
    frame_image = create_frame(frame_data)
    video_frames.append(frame_image)

# 4. Create Video
video_clip = ImageSequenceClip(video_frames, fps=fps)
audio_clip = AudioFileClip(audio_path)
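# Trim the audio to the shorter of the two durations so subclip never reads past the end
final_clip = video_clip.set_audio(audio_clip.subclip(0, min(audio_clip.duration, video_clip.duration)))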

final_clip.write_videofile('animusic.mp4', codec='libx264', audio_codec='aac', fps=fps)
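
When the STFT hop length is not tied to the video frame rate, each video frame has to look up the spectrogram column nearest to its timestamp. The sketch below shows that index arithmetic with NumPy only. The helper stft_columns_for_video is hypothetical, and librosa.time_to_frames performs a similar time-to-column conversion.

```python
import numpy as np

def stft_columns_for_video(n_columns, sr, hop_length, fps=30):
    """For each video frame, return the index of the STFT column to draw."""
    duration = n_columns * hop_length / sr        # approximate audio length
    n_video_frames = int(np.ceil(duration * fps))
    frame_times = np.arange(n_video_frames) / fps
    # With centered frames (librosa's default), column k sits at
    # k * hop_length / sr seconds, so invert that and round to the nearest k
    cols = np.round(frame_times * sr / hop_length).astype(int)
    return np.clip(cols, 0, n_columns - 1)

# Example: 431 columns at hop 512 and sr 22050 is roughly 10 seconds of audio
cols = stft_columns_for_video(n_columns=431, sr=22050, hop_length=512, fps=30)
```

In the frame loop, this would replace the direct index with something like create_frame(S[:, cols[i]]) for i in range(len(cols)).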