🧬 Detect, Visualize & Use Valence & Arousal #

This guide provides a complete, practical blueprint for detecting musical emotion (Valence & Arousal), visualizing it, and using it to drive intelligent video synchronization.

🔬 Part 1: Detection with Python #

Forget training your own models. We can use Essentia, an audio analysis library that ships with pre-trained mood classifiers, and combine their outputs into Valence and Arousal estimates. This is a direct path to getting emotional data from a song.

The Workflow #

Setup Your Environment: It’s highly recommended to use a Python virtual environment.

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

Install the Correct Essentia Package & Dependencies: Model inference requires the essentia-tensorflow package rather than the plain essentia package; matplotlib is needed for visualization.
```
# Install the required packages
pip install essentia-tensorflow matplotlib
```

Download Pre-trained Models: Create a directory for your project and download the models into it.

mkdir emotion_analyzer && cd emotion_analyzer
wget https://essentia.upf.edu/models/classifiers/mood_acoustic/mood_acoustic-musicnn-msd-1.pb
wget https://essentia.upf.edu/models/classifiers/mood_aggressive/mood_aggressive-musicnn-msd-1.pb
wget https://essentia.upf.edu/models/classifiers/mood_electronic/mood_electronic-musicnn-msd-1.pb
wget https://essentia.upf.edu/models/classifiers/mood_happy/mood_happy-musicnn-msd-1.pb
wget https://essentia.upf.edu/models/classifiers/mood_party/mood_party-musicnn-msd-1.pb
wget https://essentia.upf.edu/models/classifiers/mood_relaxed/mood_relaxed-musicnn-msd-1.pb
wget https://essentia.upf.edu/models/classifiers/mood_sad/mood_sad-musicnn-msd-1.pb

Create the Analysis Script: This Python script (analyze_emotion.py) loads a song, analyzes it using the models, and saves the emotional journey to a JSON file.

# analyze_emotion.py
import essentia
import essentia.standard as es
import numpy as np
import json
import sys

# --- Models and their corresponding V-A weights ---
MODELS = {
    'mood_happy':       (1.0, 0.2),
    'mood_sad':         (-1.0, -0.5),
    'mood_aggressive':  (-0.3, 1.0),
    'mood_electronic':  (0.0, 0.8),
    'mood_relaxed':     (0.2, -1.0),
    'mood_acoustic':    (0.1, -0.8),
    'mood_party':       (0.6, 0.9)
}

def get_song_emotion_journey(audio_file):
    """
    Analyzes a song and returns its emotional journey over time.
    """
    # 1. Load audio at the correct sample rate for the models
    audio = es.MonoLoader(filename=audio_file, sampleRate=16000)()

    # 2. Get the activation journey for each mood
    predictions = {}
    for name in MODELS.keys():
        model_path = f"{name}-musicnn-msd-1.pb"
        # The TensorflowPredictMusiCNN algorithm handles framing internally
        model = es.TensorflowPredictMusiCNN(graphFilename=model_path)
        activations = model(audio)
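        # The model returns a (num_frames, 2) array. Column 1 is treated as the mood's activation here,
        # but class order can differ between models: check the "classes" list in each model's metadata .json.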
        predictions[name] = activations[:, 1]

    # 3. Calculate V-A proxies from mood activations
    num_frames = len(predictions['mood_happy'])
    valence_journey = np.zeros(num_frames)
    arousal_journey = np.zeros(num_frames)

    for name, (v_weight, a_weight) in MODELS.items():
        activations = predictions[name]
        valence_journey += activations * v_weight
        arousal_journey += activations * a_weight

    return valence_journey, arousal_journey

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: python analyze_emotion.py <path_to_audio_file>")
        sys.exit(1)

    song_path = sys.argv[1]
    v, a = get_song_emotion_journey(song_path)

    output_data = {
        'valence': v.tolist(),
        'arousal': a.tolist()
    }

    with open('emotion_data.json', 'w') as f:
        json.dump(output_data, f, indent=4)

    print("Emotion data saved to emotion_data.json")
    print(f"Generated {len(v)} data points.")
    print(f"Average Valence: {np.mean(v):.2f}")
    print(f"Average Arousal: {np.mean(a):.2f}")
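
The arrays saved above hold one value per model frame, not per second, so syncing to video needs a timestamp for each value. The following is a minimal sketch of that conversion. `PATCH_HOP_SECONDS` is an assumption, not something the script reads from Essentia: it presumes a hop of 93 spectrogram frames of 256 samples each at 16 kHz, which should be checked against the Essentia documentation for your version.

```python
def frame_times(num_frames, hop_seconds):
    """Return the start time, in seconds, of each emotion frame."""
    return [i * hop_seconds for i in range(num_frames)]

# Assumed value (verify for your setup): 93 frames * 256 samples / 16000 Hz.
PATCH_HOP_SECONDS = 93 * 256 / 16000

times = frame_times(4, PATCH_HOP_SECONDS)
print([round(t, 3) for t in times])
```

With real data, pass `len(valence)` as `num_frames` and pair each timestamp with the matching Valence and Arousal values.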

🎨 Part 2: Visualization #

Once you have the emotion_data.json file, you can visualize the results to better understand the song’s emotional arc.

Create the Visualization Script: This Python script (visualize_emotion.py) reads the JSON data and plots the Valence and Arousal values over time using matplotlib.

# visualize_emotion.py
import json
import matplotlib.pyplot as plt
import numpy as np
import sys

def visualize_emotion(data_file, output_image):
    """
    Visualizes the emotional journey from a JSON file.
    """
    with open(data_file, 'r') as f:
        data = json.load(f)

    valence = np.array(data['valence'])
    arousal = np.array(data['arousal'])
    time = np.arange(len(valence))

    fig, ax1 = plt.subplots(figsize=(15, 7))

    # Plot Valence
    color = 'tab:blue'
    ax1.set_xlabel('Time (frames)')
    ax1.set_ylabel('Valence (Pleasure)', color=color)
    ax1.plot(time, valence, color=color, label='Valence')
    ax1.tick_params(axis='y', labelcolor=color)
    ax1.grid(True, linestyle='--', alpha=0.6)

    # Create a second y-axis for Arousal
    ax2 = ax1.twinx()
    color = 'tab:red'
    ax2.set_ylabel('Arousal (Energy)', color=color)
    ax2.plot(time, arousal, color=color, label='Arousal')
    ax2.tick_params(axis='y', labelcolor=color)

    fig.suptitle('Emotional Journey of the Song', fontsize=16)
    fig.tight_layout()
    plt.savefig(output_image)
    print(f"Visualization saved to {output_image}")

if __name__ == '__main__':
    if len(sys.argv) != 2:
        print("Usage: python visualize_emotion.py <path_to_json_file>")
        sys.exit(1)

    data_path = sys.argv[1]
    visualize_emotion(data_path, 'emotion_visualization.png')

Run the Full Pipeline:

# Make sure you have a test audio file (e.g., my_song.mp3) in the same directory

# 1. Analyze the song
python analyze_emotion.py my_song.mp3

# 2. Visualize the results
python visualize_emotion.py emotion_data.json

Example Output #

This will produce an emotion_visualization.png file that looks something like this, clearly showing the interplay between the song’s pleasure (Valence) and energy (Arousal) over time.

[Figure: Emotional Journey Visualization]

🎨 Part 2: Visualization Strategies #

Raw numbers aren’t intuitive. Visualizing the V-A data is key to understanding a song’s emotional journey.

The V-A Plane (Thayer’s Model) #

This is the classic 2D emotional map. It’s the best way to visualize the overall mood.

Top-Right (High V, High A): 🤩 Excited, Euphoric
Bottom-Right (High V, Low A): 😊 Content, Relaxed
Top-Left (Low V, High A): 😠 Anxious, Angry
Bottom-Left (Low V, Low A): 😔 Sad, Depressed
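
As a rough illustration of the four quadrants, the sketch below labels each frame and reports how much of a song falls in each one. It assumes Valence and Arousal are centered on 0 (adjust the hypothetical `center` parameter otherwise); the label strings are shorthand for the quadrants listed above.

```python
from collections import Counter

def va_quadrant(valence, arousal, center=0.0):
    """Map one (valence, arousal) pair to a quadrant label."""
    high_v = valence >= center
    high_a = arousal >= center
    if high_v and high_a:
        return "excited"   # top-right
    if high_v:
        return "content"   # bottom-right
    if high_a:
        return "anxious"   # top-left
    return "sad"           # bottom-left

def quadrant_shares(valences, arousals, center=0.0):
    """Return the fraction of frames that fall in each quadrant."""
    counts = Counter(va_quadrant(v, a, center) for v, a in zip(valences, arousals))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

print(quadrant_shares([0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]))
```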

Dynamic Visualizations (The Emotional Journey) #

To see how emotion changes over time, you need to analyze the song in short segments (e.g., every 1 second) and plot the results.

📈 Timeline Plot: Two simple line graphs plotted over the song’s duration. One line for Valence, one for Arousal. This clearly shows emotional shifts, like the build-up to a chorus.
🌈 Color-Coded Timeline: A single horizontal bar that represents the song. The bar’s color changes over time based on the V-A values. This gives an at-a-glance emotional summary.
- Mapping Idea: Map the 2D V-A plane to a color wheel (e.g., HSV/HSL color space). Arousal can be Lightness and Valence can be Hue.
✨ Particle System: A dynamic screensaver-like visualization.
- Arousal controls the speed and quantity of particles.
- Valence controls their color and direction (e.g., upward for positive, downward for negative).
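
The color mapping idea above could be sketched as follows, using Python's standard `colorsys` module. The specific ranges are assumptions chosen for illustration: inputs are clamped to -1..1, Valence sweeps the hue from blue (240°) to yellow (60°), and Arousal sets the lightness between 0.25 and 0.75.

```python
import colorsys

def clamp(x, lo=-1.0, hi=1.0):
    """Limit x to the range lo..hi."""
    return max(lo, min(hi, x))

def va_to_rgb(valence, arousal):
    """Map a (valence, arousal) pair to an (r, g, b) tuple of 0..255 integers."""
    v = (clamp(valence) + 1.0) / 2.0      # rescale -1..1 to 0..1
    a = (clamp(arousal) + 1.0) / 2.0
    hue = (240.0 - 180.0 * v) / 360.0     # 240 degrees (blue) to 60 degrees (yellow)
    lightness = 0.25 + 0.5 * a            # calmer frames are darker
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)

print(va_to_rgb(-1.0, 0.0))  # lowest valence, neutral arousal
print(va_to_rgb(1.0, 1.0))   # highest valence and arousal
```

Calling `va_to_rgb` once per frame gives the sequence of colors for the color-coded timeline bar.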

🗺️ Part 3: Creative Mapping to Video #

This is the final step: using the detected V-A data to select video clips.

Strategy 1: Direct Distance Mapping #

This is the foundational approach. For each moment in the song, find the video clip with the “closest” emotional fingerprint.

Analyze Your Library: First, run your V-A detection on every video clip in your library (for example, on each clip's audio track) to get its average (clip_valence, clip_arousal) score.
Find the Closest Match: For each segment of the song, calculate the Euclidean distance between the song’s V-A and every clip’s V-A. Pick the clip with the smallest distance.
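
A minimal sketch of the closest-match step, assuming the clip scores are already available as a dictionary. The clip names and scores in `clip_library` are made up for illustration.

```python
import math

def closest_clip(song_va, clips):
    """Return the name of the clip whose (valence, arousal) is nearest to song_va."""
    return min(clips, key=lambda name: math.dist(song_va, clips[name]))

# Hypothetical library: clip name -> average (valence, arousal).
clip_library = {
    "sunrise.mp4": (0.7, 0.3),
    "storm.mp4": (-0.5, 0.8),
    "rain_window.mp4": (-0.6, -0.6),
}

print(closest_clip((0.6, 0.4), clip_library))
```

`math.dist` computes the Euclidean distance and requires Python 3.8 or later. For a large library, the same search can be vectorized with NumPy.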

Strategy 2: Quadrant-Based Pools #

This adds variety and prevents the same “best” clip from being used over and over.

Create Pools: Assign each of your video clips to one of the four V-A quadrants (Excited, Calm, Anxious, Sad).
Map to the Zone: As the song’s V-A moves into a quadrant, randomly select a video from that quadrant’s pool.
Benefit: This feels more organic and less robotic than a pure distance calculation.
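
One way the pools could be built and sampled is sketched below. It assumes V-A values are centered on 0, and the clip names and scores are hypothetical. Passing a seeded `random.Random` makes the selection reproducible while testing.

```python
import random

def quadrant(valence, arousal):
    """Return the pool name for a (valence, arousal) pair centered on 0."""
    if valence >= 0:
        return "excited" if arousal >= 0 else "calm"
    return "anxious" if arousal >= 0 else "sad"

def build_pools(clips):
    """Group clip names by the quadrant of their average V-A score."""
    pools = {"excited": [], "calm": [], "anxious": [], "sad": []}
    for name, (v, a) in clips.items():
        pools[quadrant(v, a)].append(name)
    return pools

def pick_clip(pools, valence, arousal, rng=random):
    """Pick a random clip from the matching pool, or None if that pool is empty."""
    pool = pools[quadrant(valence, arousal)]
    return rng.choice(pool) if pool else None

# Hypothetical library: clip name -> average (valence, arousal).
pools = build_pools({
    "beach_party.mp4": (0.8, 0.7),
    "fireworks.mp4": (0.6, 0.9),
    "lake_dawn.mp4": (0.4, -0.6),
    "empty_street.mp4": (-0.5, -0.7),
})
print(pick_clip(pools, 0.5, 0.5, rng=random.Random(0)))
```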

Strategy 3: Threshold-Based Triggers #

This strategy focuses on reacting to significant emotional changes.

Set Thresholds: Define what counts as “High Arousal” or “Low Valence” (e.g., Arousal > 0.7).
Trigger Events: When the song’s V-A crosses a threshold, trigger a specific video event.
- Example: If Arousal crosses the “High” threshold for the first time (the chorus hits), trigger a pre-defined, high-impact video clip.
- Example: If Valence drops below the “Low” threshold, switch to a monochrome or slow-motion video effect.
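
The threshold idea can be sketched as a search for upward crossings. One assumption worth flagging: the weighted sums produced in Part 1 are not confined to 0..1, so this sketch first rescales the series with a min-max normalization (my choice, not part of the original pipeline) so that a threshold like 0.7 is meaningful.

```python
def normalize(values):
    """Min-max scale a series to the 0..1 range (all zeros if the series is flat)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def upward_crossings(values, threshold):
    """Return the indices where the series rises from at or below the threshold to above it."""
    return [i for i in range(1, len(values))
            if values[i - 1] <= threshold < values[i]]

def first_crossing(values, threshold):
    """Return the first upward crossing index, or None if there is none."""
    crossings = upward_crossings(values, threshold)
    return crossings[0] if crossings else None

# Made-up arousal values for illustration.
arousal = normalize([0.2, 0.4, 0.9, 0.8, 0.3, 0.95])
print(upward_crossings(arousal, 0.7))
print(first_crossing(arousal, 0.7))
```

`first_crossing` corresponds to the "first time the chorus hits" example; a downward version for Valence follows the same pattern with the comparison reversed.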

Strategy 4: Derivative Mapping (Rate of Change) #

This is a more advanced, subtle technique that maps the change in emotion.

Calculate the Derivative: Instead of looking at the V-A value, look at how fast it’s changing between segments.
Map the Change:
- A rapid increase in Arousal could map to a fast camera zoom, a quick fade-in, or a build-up of visual effects.
- A sudden drop in Valence could trigger an instant cut to a darker or more abstract scene.
- A period of no change could map to a long, static shot.
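
A rough sketch of derivative mapping using differences between neighboring segments. The `fast` and `still` cutoffs and the label names are hypothetical values chosen for illustration and would need tuning against real data.

```python
def rate_of_change(values, seconds_per_step=1.0):
    """Return the change per second between each pair of neighboring segments."""
    return [(b - a) / seconds_per_step for a, b in zip(values, values[1:])]

def describe_change(delta, fast=0.3, still=0.05):
    """Turn a rate of change into a coarse label for driving video effects."""
    if abs(delta) < still:
        return "hold"    # e.g. a long, static shot
    if delta >= fast:
        return "surge"   # e.g. a fast zoom or effect build-up
    if delta <= -fast:
        return "drop"    # e.g. an instant cut to a darker scene
    return "drift"       # gradual change, no special event

# Made-up arousal values, one per second.
arousal = [0.1, 0.1, 0.6, 0.2]
deltas = rate_of_change(arousal)
print([describe_change(d) for d in deltas])
```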