🧬 Detect, Visualize & Use Valence & Arousal

This guide provides a complete, practical blueprint for detecting musical emotion (Valence & Arousal), visualizing it, and using it to drive intelligent video synchronization.


🔬 Part 1: Detection with Python #

Forget training your own models. We can use Essentia, an audio analysis library that ships pre-trained mood classifiers. This is the most direct path to getting emotional data from a song; the script below combines the classifier outputs into Valence and Arousal proxies.

The Workflow #

  1. Setup Your Environment: It’s highly recommended to use a Python virtual environment.

    # Create and activate a virtual environment
    python3 -m venv .venv
    source .venv/bin/activate
    
  2. Install the Correct Essentia Build & Dependencies: You need the essentia-tensorflow package, which includes the TensorFlow algorithms used for model inference, plus matplotlib for visualization.

    # Install the required packages
    pip install essentia-tensorflow matplotlib
    
  3. Download Pre-trained Models: Create a directory for your project and download the models into it.

    mkdir emotion_analyzer && cd emotion_analyzer
    wget https://essentia.upf.edu/models/classifiers/mood_acoustic/mood_acoustic-musicnn-msd-1.pb
    wget https://essentia.upf.edu/models/classifiers/mood_aggressive/mood_aggressive-musicnn-msd-1.pb
    wget https://essentia.upf.edu/models/classifiers/mood_electronic/mood_electronic-musicnn-msd-1.pb
    wget https://essentia.upf.edu/models/classifiers/mood_happy/mood_happy-musicnn-msd-1.pb
    wget https://essentia.upf.edu/models/classifiers/mood_party/mood_party-musicnn-msd-1.pb
    wget https://essentia.upf.edu/models/classifiers/mood_relaxed/mood_relaxed-musicnn-msd-1.pb
    wget https://essentia.upf.edu/models/classifiers/mood_sad/mood_sad-musicnn-msd-1.pb
    
  4. Create the Analysis Script: This Python script (analyze_emotion.py) loads a song, analyzes it using the models, and saves the emotional journey to a JSON file.

    # analyze_emotion.py
    import essentia
    import essentia.standard as es
    import numpy as np
    import json
    import sys
    
    # --- Models and their corresponding V-A weights ---
    MODELS = {
        'mood_happy':       (1.0, 0.2),
        'mood_sad':         (-1.0, -0.5),
        'mood_aggressive':  (-0.3, 1.0),
        'mood_electronic':  (0.0, 0.8),
        'mood_relaxed':     (0.2, -1.0),
        'mood_acoustic':    (0.1, -0.8),
        'mood_party':       (0.6, 0.9)
    }
    
    def get_song_emotion_journey(audio_file):
        """
        Analyzes a song and returns its emotional journey over time.
        """
        # 1. Load audio at the correct sample rate for the models
        audio = es.MonoLoader(filename=audio_file, sampleRate=16000)()
    
        # 2. Get the activation journey for each mood
        predictions = {}
        for name in MODELS:
            model_path = f"{name}-musicnn-msd-1.pb"
            # TensorflowPredictMusiCNN handles framing internally
            model = es.TensorflowPredictMusiCNN(graphFilename=model_path)
            activations = model(audio)
            # The model returns a (num_frames, 2) array with one column per class.
            # NOTE: this assumes column 1 is the positive class for every model.
            # The class order is listed in each model's .json metadata file and
            # may differ between models, so verify it before trusting the sign.
            predictions[name] = activations[:, 1]
    
        # 3. Calculate V-A proxies from mood activations
        num_frames = len(predictions['mood_happy'])
        valence_journey = np.zeros(num_frames)
        arousal_journey = np.zeros(num_frames)
    
        for name, (v_weight, a_weight) in MODELS.items():
            activations = predictions[name]
            valence_journey += activations * v_weight
            arousal_journey += activations * a_weight
    
        return valence_journey, arousal_journey
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            print("Usage: python analyze_emotion.py <path_to_audio_file>")
            sys.exit(1)
    
        song_path = sys.argv[1]
        v, a = get_song_emotion_journey(song_path)
    
        output_data = {
            'valence': v.tolist(),
            'arousal': a.tolist()
        }
    
        with open('emotion_data.json', 'w') as f:
            json.dump(output_data, f, indent=4)
    
        print("Emotion data saved to emotion_data.json")
        print(f"Generated {len(v)} data points.")
        print(f"Average Valence: {np.mean(v):.2f}")
        print(f"Average Arousal: {np.mean(a):.2f}")
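
One caveat about the script above: the weighted sums are not confined to [-1, 1]. With activations between 0 and 1, the MODELS weights let valence range from about -1.3 to 1.9 and arousal from about -2.3 to 2.9. The following is a minimal sketch of one possible rescaling, not part of the original script; `rescale` is a hypothetical helper, and dividing each side by its largest reachable sum is an assumption about what a sensible normalization looks like.

```python
import numpy as np

# (valence, arousal) weights copied from the MODELS table in analyze_emotion.py
MODELS = {
    'mood_happy':       (1.0, 0.2),
    'mood_sad':         (-1.0, -0.5),
    'mood_aggressive':  (-0.3, 1.0),
    'mood_electronic':  (0.0, 0.8),
    'mood_relaxed':     (0.2, -1.0),
    'mood_acoustic':    (0.1, -0.8),
    'mood_party':       (0.6, 0.9),
}

def rescale(journey, weights):
    """Map a weighted-sum journey into [-1, 1] (hypothetical helper)."""
    journey = np.asarray(journey, dtype=float)
    # Largest sums reachable when every activation lies in [0, 1]
    max_pos = sum(w for w in weights if w > 0) or 1.0
    max_neg = -sum(w for w in weights if w < 0) or 1.0
    # Positive values are scaled by the positive bound, negative by the negative bound
    return np.where(journey >= 0, journey / max_pos, journey / max_neg)

v_weights = [v for v, _ in MODELS.values()]
a_weights = [a for _, a in MODELS.values()]
```

Calling `rescale(valence_journey, v_weights)` and `rescale(arousal_journey, a_weights)` before saving the JSON would make fixed thresholds such as "Arousal > 0.7" easier to interpret.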
    

🎨 Part 2: Visualization #

Once you have the emotion_data.json file, you can visualize the results to better understand the song’s emotional arc.

  1. Create the Visualization Script: This Python script (visualize_emotion.py) reads the JSON data and plots the Valence and Arousal values over time using matplotlib.

    # visualize_emotion.py
    import json
    import matplotlib.pyplot as plt
    import numpy as np
    import sys
    
    def visualize_emotion(data_file, output_image):
        """
        Visualizes the emotional journey from a JSON file.
        """
        with open(data_file, 'r') as f:
            data = json.load(f)
    
        valence = np.array(data['valence'])
        arousal = np.array(data['arousal'])
        time = np.arange(len(valence))
    
        fig, ax1 = plt.subplots(figsize=(15, 7))
    
        # Plot Valence
        color = 'tab:blue'
        ax1.set_xlabel('Time (frames)')
        ax1.set_ylabel('Valence (Pleasure)', color=color)
        ax1.plot(time, valence, color=color, label='Valence')
        ax1.tick_params(axis='y', labelcolor=color)
        ax1.grid(True, linestyle='--', alpha=0.6)
    
        # Create a second y-axis for Arousal
        ax2 = ax1.twinx()
        color = 'tab:red'
        ax2.set_ylabel('Arousal (Energy)', color=color)
        ax2.plot(time, arousal, color=color, label='Arousal')
        ax2.tick_params(axis='y', labelcolor=color)
    
        fig.suptitle('Emotional Journey of the Song', fontsize=16)
        fig.tight_layout()
        plt.savefig(output_image)
        print(f"Visualization saved to {output_image}")
    
    if __name__ == '__main__':
        if len(sys.argv) != 2:
            print("Usage: python visualize_emotion.py <path_to_json_file>")
            sys.exit(1)
    
        data_path = sys.argv[1]
        visualize_emotion(data_path, 'emotion_visualization.png')
    
  2. Run the Full Pipeline:

    # Make sure you have a test audio file (e.g., my_song.mp3) in the same directory
    
    # 1. Analyze the song
    python analyze_emotion.py my_song.mp3
    
    # 2. Visualize the results
    python visualize_emotion.py emotion_data.json
    

Example Output #

This will produce an emotion_visualization.png file that looks something like this, clearly showing the interplay between the song’s pleasure (Valence) and energy (Arousal) over time.

[Figure: Emotional Journey Visualization]

🎨 Part 2: Visualization Strategies #

Raw numbers aren’t intuitive. Visualizing the V-A data is key to understanding a song’s emotional journey.

The V-A Plane (Thayer’s Model) #

This is the classic 2D emotional map. It’s the best way to visualize the overall mood.

  • Top-Right (High V, High A): 🤩 Excited, Euphoric
  • Bottom-Right (High V, Low A): 😊 Content, Relaxed
  • Top-Left (Low V, High A): 😠 Anxious, Angry
  • Bottom-Left (Low V, Low A): 😔 Sad, Depressed
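
As a rough illustration of the four quadrants, a single (valence, arousal) point can be turned into a label like this. The function name `va_quadrant` and the choice of 0 as the boundary between "high" and "low" are assumptions for the sketch; the guide itself does not fix where the axes cross.

```python
def va_quadrant(valence, arousal, centre=0.0):
    """Return the quadrant label for one point on the V-A plane.

    Assumes both axes are centred on `centre` (0.0 by default).
    """
    if valence >= centre:
        return "Excited" if arousal >= centre else "Content"
    return "Anxious" if arousal >= centre else "Sad"
```

Applied to the mean valence and mean arousal of a song, this gives a one-word summary of its overall mood.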

Dynamic Visualizations (The Emotional Journey) #

To see how emotion changes over time, you need to analyze the song in short segments (e.g., every 1 second) and plot the results.
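
If your detector emits values at some other spacing, one way to get a value per second is to interpolate onto a regular grid. The sketch below is an assumption-laden helper, not part of the original scripts: `resample_to_grid` is a hypothetical name, and it assumes the frames are evenly spaced across the song and that you know the song's duration in seconds.

```python
import numpy as np

def resample_to_grid(values, duration_s, step_s=1.0):
    """Interpolate per-frame values onto a regular time grid.

    Assumption: frame i is centred at (i + 0.5) * duration_s / n seconds.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    centres = (np.arange(n) + 0.5) * duration_s / n
    grid = np.arange(0.0, duration_s, step_s)
    # np.interp holds the first/last value constant outside the frame centres
    return grid, np.interp(grid, centres, values)
```

The returned `grid` can also serve as an x-axis in seconds instead of frame indices.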

  • 📈 Timeline Plot: Two simple line graphs plotted over the song’s duration. One line for Valence, one for Arousal. This clearly shows emotional shifts, like the build-up to a chorus.

  • 🌈 Color-Coded Timeline: A single horizontal bar that represents the song. The bar’s color changes over time based on the V-A values. This gives an at-a-glance emotional summary.

    • Mapping Idea: Map the 2D V-A plane to a color wheel (e.g., HSV/HSL color space). Arousal can be Lightness and Valence can be Hue.
  • ✨ Particle System: A dynamic screensaver-like visualization.

    • Arousal controls the speed and quantity of particles.
    • Valence controls their color and direction (e.g., upward for positive, downward for negative).
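
The color mapping idea above (Valence as Hue, Arousal as Lightness) could be sketched with the standard-library colorsys module. The specific hue range (blue for negative valence through to yellow for positive), the lightness range, and the assumption that both inputs lie in [-1, 1] are illustrative choices, not something the guide prescribes.

```python
import colorsys

def va_to_rgb(valence, arousal):
    """Map a V-A point to an (r, g, b) tuple with components in [0, 1].

    Assumes valence and arousal are roughly in [-1, 1]; values outside
    that range are clamped.
    """
    v = max(-1.0, min(1.0, valence))
    a = max(-1.0, min(1.0, arousal))
    # Hue: 2/3 (blue) at valence -1, sliding to 1/6 (yellow) at valence +1
    hue = 2 / 3 - (v + 1) / 2 * 0.5
    # Lightness: 0.25 (dark) at arousal -1, up to 0.75 (bright) at arousal +1
    lightness = 0.25 + 0.5 * (a + 1) / 2
    return colorsys.hls_to_rgb(hue, lightness, 1.0)
```

Computing one color per segment and drawing them side by side gives the color-coded timeline; the same function could tint the particles in a particle system.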

πŸ—ΊοΈ Part 3: Creative Mapping to Video #

This is the final step: using the detected V-A data to select video clips.

Strategy 1: Direct Distance Mapping #

This is the foundational approach. For each moment in the song, find the video clip with the “closest” emotional fingerprint.

  1. Analyze Your Library: First, run your V-A detection on every video clip in your library to get its average (clip_valence, clip_arousal) score.
  2. Find the Closest Match: For each segment of the song, calculate the Euclidean distance between the song’s V-A and every clip’s V-A. Pick the clip with the smallest distance.
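
The two steps above can be sketched with NumPy as follows. The function name `closest_clip` and the example scores are hypothetical; the sketch assumes each clip has already been reduced to a single (clip_valence, clip_arousal) pair on the same scale as the song data.

```python
import numpy as np

def closest_clip(song_va, clip_vas):
    """Return the index of the clip nearest to song_va in V-A space."""
    clips = np.asarray(clip_vas, dtype=float)
    target = np.asarray(song_va, dtype=float)
    # Euclidean distance from the song segment to every clip
    distances = np.linalg.norm(clips - target, axis=1)
    return int(np.argmin(distances))

# Hypothetical library: one (valence, arousal) pair per clip
clip_vas = [(0.8, 0.7), (-0.6, -0.4), (0.1, -0.9)]
```

Calling `closest_clip` once per song segment yields a sequence of clip indices. Identical neighbouring segments will pick the same clip repeatedly, which is the limitation Strategy 2 addresses.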

Strategy 2: Quadrant-Based Pools #

This adds variety and prevents the same “best” clip from being used over and over.

  1. Create Pools: Assign each of your video clips to one of the four V-A quadrants (Excited, Calm, Anxious, Sad).
  2. Map to the Zone: As the song’s V-A moves into a quadrant, randomly select a video from that quadrant’s pool.
  3. Benefit: This feels more organic and less robotic than a pure distance calculation.
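
A minimal sketch of the pooling idea, assuming both axes are centred on 0 and that clips are identified by file name. All names here (`quadrant`, `build_pools`, `pick_clip`, and the example clips) are hypothetical.

```python
import random
from collections import defaultdict

def quadrant(valence, arousal):
    """Name the V-A quadrant, assuming both axes are centred on 0."""
    if valence >= 0:
        return "Excited" if arousal >= 0 else "Calm"
    return "Anxious" if arousal >= 0 else "Sad"

def build_pools(clip_scores):
    """Group clip names by quadrant; clip_scores maps name -> (valence, arousal)."""
    pools = defaultdict(list)
    for name, (v, a) in clip_scores.items():
        pools[quadrant(v, a)].append(name)
    return pools

def pick_clip(pools, valence, arousal, rng=random):
    """Pick a random clip from the quadrant the song is currently in."""
    candidates = pools.get(quadrant(valence, arousal), [])
    return rng.choice(candidates) if candidates else None

# Hypothetical library
clip_scores = {
    "fireworks.mp4": (0.8, 0.9),
    "dancing.mp4": (0.6, 0.7),
    "beach.mp4": (0.5, -0.6),
    "rain.mp4": (-0.6, -0.5),
}
pools = build_pools(clip_scores)
```

`pick_clip` returns None when a quadrant has no clips, so a real pipeline would need a fallback, for example the distance-based match from Strategy 1.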

Strategy 3: Threshold-Based Triggers #

This strategy focuses on reacting to significant emotional changes.

  1. Set Thresholds: Define what counts as “High Arousal” or “Low Valence” (e.g., Arousal > 0.7).
  2. Trigger Events: When the song’s V-A crosses a threshold, trigger a specific video event.
    • Example: If Arousal crosses the “High” threshold for the first time (the chorus hits), trigger a pre-defined, high-impact video clip.
    • Example: If Valence drops below the “Low” threshold, switch to a monochrome or slow-motion video effect.
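
Detecting the moment a series crosses a threshold can be sketched as below. `upward_crossings` is a hypothetical helper, and the 0.7 threshold is the example value from the text, which only makes sense if your arousal values are on a comparable scale.

```python
import numpy as np

def upward_crossings(values, threshold):
    """Return the indices where the series rises from <= threshold to > threshold."""
    above = np.asarray(values, dtype=float) > threshold
    # A crossing happens where the previous sample is not above and the current one is
    return (np.flatnonzero(~above[:-1] & above[1:]) + 1).tolist()

arousal = [0.2, 0.5, 0.8, 0.9, 0.6, 0.75]
crossings = upward_crossings(arousal, 0.7)
```

The first element of `crossings` marks the first time arousal goes "High", which is where a one-off, high-impact clip would be triggered. Downward crossings for Valence can be found by negating both the series and the threshold.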

Strategy 4: Derivative Mapping (Rate of Change) #

This is a more advanced, subtle technique that maps the change in emotion.

  1. Calculate the Derivative: Instead of looking at the V-A value, look at how fast it’s changing between segments.
  2. Map the Change:
    • A rapid increase in Arousal could map to a fast camera zoom, a quick fade-in, or a build-up of visual effects.
    • A sudden drop in Valence could trigger an instant cut to a darker or more abstract scene.
    • A period of no change could map to a long, static shot.
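
A simple way to approximate the derivative is the difference between consecutive segments. In this sketch, `label_changes` and its labels are hypothetical, and the `fast` cutoff of 0.2 per segment is an arbitrary illustrative value that would need tuning against real data.

```python
import numpy as np

def rate_of_change(values, step_s=1.0):
    """Finite-difference derivative between consecutive segments."""
    return np.diff(np.asarray(values, dtype=float)) / step_s

def label_changes(values, step_s=1.0, fast=0.2):
    """Label each transition as a build-up, a drop, or steady."""
    labels = []
    for rate in rate_of_change(values, step_s):
        if rate > fast:
            labels.append("build-up")   # e.g. fast zoom or quick fade-in
        elif rate < -fast:
            labels.append("drop")       # e.g. hard cut to a darker scene
        else:
            labels.append("steady")     # e.g. hold a long, static shot
    return labels

arousal = [0.1, 0.1, 0.5, 0.45, 0.0]
changes = label_changes(arousal)
```

The result has one label per transition, so it is one element shorter than the input series.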