Detect, Visualize & Use Valence & Arousal #
This guide provides a complete, practical blueprint for detecting musical emotion (Valence & Arousal), visualizing it, and using it to drive intelligent video synchronization.
Part 1: Detection with Python #
There is no need to train your own models. Essentia, an open-source audio analysis library, ships pre-trained mood classifiers, and the script below combines their outputs into Valence and Arousal proxies. This is the most direct path to getting emotional data from a song.
The Workflow #
- Set Up Your Environment: A Python virtual environment is highly recommended.

      # Create and activate a virtual environment
      python3 -m venv .venv
      source .venv/bin/activate

- Install the Correct Essentia Build & Dependencies: You need the essentia-tensorflow package (not plain essentia) for model inference, plus matplotlib for visualization.

      # Install the required packages
      pip install essentia-tensorflow matplotlib

- Download Pre-trained Models: Create a directory for your project and download the models into it.

      mkdir emotion_analyzer && cd emotion_analyzer
      wget https://essentia.upf.edu/models/classifiers/mood_acoustic/mood_acoustic-musicnn-msd-1.pb
      wget https://essentia.upf.edu/models/classifiers/mood_aggressive/mood_aggressive-musicnn-msd-1.pb
      wget https://essentia.upf.edu/models/classifiers/mood_electronic/mood_electronic-musicnn-msd-1.pb
      wget https://essentia.upf.edu/models/classifiers/mood_happy/mood_happy-musicnn-msd-1.pb
      wget https://essentia.upf.edu/models/classifiers/mood_party/mood_party-musicnn-msd-1.pb
      wget https://essentia.upf.edu/models/classifiers/mood_relaxed/mood_relaxed-musicnn-msd-1.pb
      wget https://essentia.upf.edu/models/classifiers/mood_sad/mood_sad-musicnn-msd-1.pb

- Create the Analysis Script: This Python script (analyze_emotion.py) loads a song, analyzes it using the models, and saves the emotional journey to a JSON file.

      # analyze_emotion.py
      import essentia
      import essentia.standard as es
      import numpy as np
      import json
      import sys

      # --- Models and their corresponding (valence, arousal) weights ---
      MODELS = {
          'mood_happy': (1.0, 0.2),
          'mood_sad': (-1.0, -0.5),
          'mood_aggressive': (-0.3, 1.0),
          'mood_electronic': (0.0, 0.8),
          'mood_relaxed': (0.2, -1.0),
          'mood_acoustic': (0.1, -0.8),
          'mood_party': (0.6, 0.9)
      }


      def get_song_emotion_journey(audio_file):
          """Analyzes a song and returns its emotional journey over time."""
          # 1. Load audio at the correct sample rate for the models
          audio = es.MonoLoader(filename=audio_file, sampleRate=16000)()

          # 2. Get the activation journey for each mood
          predictions = {}
          for name in MODELS.keys():
              model_path = f"{name}-musicnn-msd-1.pb"
              # The TensorflowPredictMusiCNN algorithm handles framing internally
              model = es.TensorflowPredictMusiCNN(graphFilename=model_path)
              activations = model(audio)
              # The model returns a (num_frames, 2) array. This script reads the
              # second column as the activation probability. Class order can differ
              # between models, so check the "classes" list in each model's
              # metadata file and adjust the column index if needed.
              predictions[name] = activations[:, 1]

          # 3. Calculate V-A proxies from mood activations
          num_frames = len(predictions['mood_happy'])
          valence_journey = np.zeros(num_frames)
          arousal_journey = np.zeros(num_frames)
          for name, (v_weight, a_weight) in MODELS.items():
              activations = predictions[name]
              valence_journey += activations * v_weight
              arousal_journey += activations * a_weight

          return valence_journey, arousal_journey


      if __name__ == '__main__':
          if len(sys.argv) != 2:
              print("Usage: python analyze_emotion.py <path_to_audio_file>")
              sys.exit(1)

          song_path = sys.argv[1]
          v, a = get_song_emotion_journey(song_path)

          output_data = {
              'valence': v.tolist(),
              'arousal': a.tolist()
          }
          with open('emotion_data.json', 'w') as f:
              json.dump(output_data, f, indent=4)

          print("Emotion data saved to emotion_data.json")
          print(f"Generated {len(v)} data points.")
          print(f"Average Valence: {np.mean(v):.2f}")
          print(f"Average Arousal: {np.mean(a):.2f}")
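To make step 3 of the analysis script concrete, here is a minimal sketch of the weighted sum on made-up activation values. Only two of the seven moods are used, with the weights copied from the MODELS table; the activation numbers are illustrative and not real model output.

```python
import numpy as np

# (valence_weight, arousal_weight) pairs copied from the MODELS table
WEIGHTS = {
    'mood_happy': (1.0, 0.2),
    'mood_sad': (-1.0, -0.5),
}

# Made-up activations for three frames (illustrative, not real model output)
activations = {
    'mood_happy': np.array([0.8, 0.5, 0.0]),
    'mood_sad': np.array([0.2, 0.5, 1.0]),
}

valence = np.zeros(3)
arousal = np.zeros(3)
for name, (v_weight, a_weight) in WEIGHTS.items():
    # Each mood pushes valence and arousal in the direction of its weights
    valence += activations[name] * v_weight
    arousal += activations[name] * a_weight

print("valence:", valence)
print("arousal:", arousal)
```

The first frame is mostly happy and lands at positive valence, while the last frame is fully sad and lands at negative valence and arousal. Because the weights are simply added, the full seven-mood sum is not confined to the range -1 to 1, so you may want to rescale the values before using fixed thresholds.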
Part 2: Visualization #
Once you have the emotion_data.json file, you can visualize the results to better understand the song’s emotional arc.
- Create the Visualization Script: This Python script (visualize_emotion.py) reads the JSON data and plots the Valence and Arousal values over time using matplotlib.

      # visualize_emotion.py
      import json
      import matplotlib.pyplot as plt
      import numpy as np
      import sys


      def visualize_emotion(data_file, output_image):
          """Visualizes the emotional journey from a JSON file."""
          with open(data_file, 'r') as f:
              data = json.load(f)

          valence = np.array(data['valence'])
          arousal = np.array(data['arousal'])
          time = np.arange(len(valence))

          fig, ax1 = plt.subplots(figsize=(15, 7))

          # Plot Valence
          color = 'tab:blue'
          ax1.set_xlabel('Time (frames)')
          ax1.set_ylabel('Valence (Pleasure)', color=color)
          ax1.plot(time, valence, color=color, label='Valence')
          ax1.tick_params(axis='y', labelcolor=color)
          ax1.grid(True, linestyle='--', alpha=0.6)

          # Create a second y-axis for Arousal
          ax2 = ax1.twinx()
          color = 'tab:red'
          ax2.set_ylabel('Arousal (Energy)', color=color)
          ax2.plot(time, arousal, color=color, label='Arousal')
          ax2.tick_params(axis='y', labelcolor=color)

          fig.suptitle('Emotional Journey of the Song', fontsize=16)
          fig.tight_layout()
          plt.savefig(output_image)
          print(f"Visualization saved to {output_image}")


      if __name__ == '__main__':
          if len(sys.argv) != 2:
              print("Usage: python visualize_emotion.py <path_to_json_file>")
              sys.exit(1)

          data_path = sys.argv[1]
          visualize_emotion(data_path, 'emotion_visualization.png')

- Run the Full Pipeline:

      # Make sure you have a test audio file (e.g., my_song.mp3) in the same directory

      # 1. Analyze the song
      python analyze_emotion.py my_song.mp3

      # 2. Visualize the results
      python visualize_emotion.py emotion_data.json
Example Output #
This produces an emotion_visualization.png file that plots the song’s pleasure (Valence) and energy (Arousal) on a shared time axis, so you can see how the two interact over the course of the song.
Part 2: Visualization Strategies #
Raw numbers aren’t intuitive. Visualizing the V-A data is key to understanding a song’s emotional journey.
The V-A Plane (Thayer’s Model) #
This is the classic 2D emotional map. It’s the best way to visualize the overall mood.
- Top-Right (High V, High A): Excited, Euphoric
- Bottom-Right (High V, Low A): Content, Relaxed
- Top-Left (Low V, High A): Anxious, Angry
- Bottom-Left (Low V, Low A): Sad, Depressed
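The four quadrants can be expressed as a small lookup. This is a minimal sketch that assumes the midpoint of each axis sits at 0; the function name va_quadrant and the v_mid and a_mid parameters are hypothetical, so adjust the midpoints to the actual range of your data.

```python
def va_quadrant(valence, arousal, v_mid=0.0, a_mid=0.0):
    """Return the quadrant label for one (valence, arousal) point."""
    high_v = valence >= v_mid
    high_a = arousal >= a_mid
    if high_v and high_a:
        return "Excited, Euphoric"   # top-right
    if high_v:
        return "Content, Relaxed"    # bottom-right
    if high_a:
        return "Anxious, Angry"      # top-left
    return "Sad, Depressed"          # bottom-left


print(va_quadrant(0.6, 0.7))
print(va_quadrant(-0.4, -0.2))
```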
Dynamic Visualizations (The Emotional Journey) #
To see how emotion changes over time, you need to analyze the song in short segments (e.g., every 1 second) and plot the results.
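One simple way to get per-segment values from frame-level arrays is to average consecutive frames. The sketch below assumes a hypothetical frames_per_segment parameter; the right value depends on how often your analysis emits a data point, which you would need to check for your own setup.

```python
import numpy as np


def average_into_segments(values, frames_per_segment):
    """Average consecutive frames into segments; the last one may be shorter."""
    values = np.asarray(values, dtype=float)
    starts = range(0, len(values), frames_per_segment)
    return np.array([values[s:s + frames_per_segment].mean() for s in starts])


# Illustrative arousal values for five frames
arousal = [0.1, 0.3, 0.5, 0.7, 0.9]
segments = average_into_segments(arousal, frames_per_segment=2)
print(segments)
```

Here five frames become three segments: two full pairs and one leftover frame that forms its own segment.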
- Timeline Plot: Two simple line graphs plotted over the song’s duration. One line for Valence, one for Arousal. This clearly shows emotional shifts, like the build-up to a chorus.
- Color-Coded Timeline: A single horizontal bar that represents the song. The bar’s color changes over time based on the V-A values. This gives an at-a-glance emotional summary.
  - Mapping Idea: Map the 2D V-A plane to a color wheel (e.g., HSV/HSL color space). Arousal can be Lightness and Valence can be Hue.
- Particle System: A dynamic screensaver-like visualization. Arousal controls the speed and quantity of particles. Valence controls their color and direction (e.g., upward for positive, downward for negative).
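The color mapping idea for the Color-Coded Timeline can be sketched with the standard library's colorsys module. The hue range (blue for low Valence, yellow for high), the lightness range, and the assumed input range of -1 to 1 are all illustrative choices rather than part of the original method; inputs outside the range are clamped.

```python
import colorsys


def va_to_rgb(valence, arousal, v_range=(-1.0, 1.0), a_range=(-1.0, 1.0)):
    """Map one (valence, arousal) point to an (r, g, b) tuple in 0..1."""

    def normalize(x, lo, hi):
        # Clamp to [lo, hi], then rescale to 0..1
        return (min(max(x, lo), hi) - lo) / (hi - lo)

    # Valence -> hue: 240 degrees (blue) at the low end, 60 degrees (yellow) at the high end
    hue = 2 / 3 - 0.5 * normalize(valence, *v_range)
    # Arousal -> lightness: 0.25 (dark) at the low end, 0.75 (bright) at the high end
    lightness = 0.25 + 0.5 * normalize(arousal, *a_range)
    return colorsys.hls_to_rgb(hue, lightness, 1.0)


# One color per segment, built from illustrative values
valence = [-1.0, 0.0, 1.0]
arousal = [0.0, 0.5, 1.0]
colors = [va_to_rgb(v, a) for v, a in zip(valence, arousal)]
print(colors)
```

The resulting list holds one color per segment, which can then be drawn side by side as a horizontal strip in whatever plotting tool you use.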
Part 3: Creative Mapping to Video #
This is the final step: using the detected V-A data to select video clips.
Strategy 1: Direct Distance Mapping #
This is the foundational approach. For each moment in the song, find the video clip with the “closest” emotional fingerprint.
- Analyze Your Library: First, run your V-A detection on every video clip in your library to get its average (clip_valence, clip_arousal) score.
- Find the Closest Match: For each segment of the song, calculate the Euclidean distance between the song’s V-A and every clip’s V-A. Pick the clip with the smallest distance.
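The closest-match step can be sketched with NumPy as follows. The clip filenames and scores are hypothetical placeholders, and closest_clip is an illustrative helper name.

```python
import numpy as np


def closest_clip(song_va, clip_names, clip_va):
    """Return the name of the clip whose (valence, arousal) is nearest to song_va."""
    song_va = np.asarray(song_va, dtype=float)
    clip_va = np.asarray(clip_va, dtype=float)
    # Euclidean distance from the song segment to every clip at once
    distances = np.linalg.norm(clip_va - song_va, axis=1)
    return clip_names[int(np.argmin(distances))]


# Hypothetical library: one average (clip_valence, clip_arousal) pair per clip
clip_names = ["sunrise.mp4", "storm.mp4", "rain_window.mp4"]
clip_va = [(0.7, 0.3), (-0.4, 0.9), (-0.6, -0.5)]

print(closest_clip((0.5, 0.4), clip_names, clip_va))  # sunrise.mp4
```

Calling closest_clip once per song segment yields the clip sequence for the whole song.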
Strategy 2: Quadrant-Based Pools #
This adds variety and prevents the same “best” clip from being used over and over.
- Create Pools: Assign each of your video clips to one of the four V-A quadrants (Excited, Calm, Anxious, Sad).
- Map to the Zone: As the song’s V-A moves into a quadrant, randomly select a video from that quadrant’s pool.
- Benefit: This feels more organic and less robotic than a pure distance calculation.
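A minimal sketch of quadrant pools, assuming the quadrant boundaries sit at 0 on both axes. The pool contents are hypothetical filenames, and quadrant_name and pick_clip are illustrative helper names.

```python
import random


def quadrant_name(valence, arousal):
    """Return the pool key for one (valence, arousal) point, splitting at 0."""
    if valence >= 0:
        return "excited" if arousal >= 0 else "calm"
    return "anxious" if arousal >= 0 else "sad"


def pick_clip(valence, arousal, pools, rng=random):
    """Randomly pick a clip from the pool of the quadrant the point falls in."""
    return rng.choice(pools[quadrant_name(valence, arousal)])


# Hypothetical pools: every clip is assigned to exactly one quadrant
pools = {
    "excited": ["fireworks.mp4", "crowd.mp4"],
    "calm": ["lake.mp4", "clouds.mp4"],
    "anxious": ["storm.mp4"],
    "sad": ["rain_window.mp4", "empty_street.mp4"],
}

print(pick_clip(0.4, -0.6, pools))  # one of the "calm" clips
```

Passing a seeded random.Random instance as rng makes the selection reproducible, which helps when re-rendering the same edit.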
Strategy 3: Threshold-Based Triggers #
This strategy focuses on reacting to significant emotional changes.
- Set Thresholds: Define what counts as “High Arousal” or “Low Valence” (e.g., Arousal > 0.7).
- Trigger Events: When the song’s V-A crosses a threshold, trigger a specific video event.
  - Example: If Arousal crosses the “High” threshold for the first time (the chorus hits), trigger a pre-defined, high-impact video clip.
  - Example: If Valence drops below the “Low” threshold, switch to a monochrome or slow-motion video effect.
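Detecting the moment a value crosses a threshold, rather than every segment that sits above it, can be sketched like this. The arousal values are illustrative, the 0.7 threshold follows the example above, and upward_crossings is a hypothetical helper name.

```python
import numpy as np


def upward_crossings(values, threshold):
    """Return the indices where the series rises from <= threshold to > threshold."""
    values = np.asarray(values, dtype=float)
    above = values > threshold
    # A crossing occurs at index i when segment i-1 was not above and segment i is
    return np.flatnonzero(~above[:-1] & above[1:]) + 1


# Illustrative per-segment arousal values
arousal = [0.2, 0.5, 0.8, 0.9, 0.6, 0.75]
crossings = upward_crossings(arousal, threshold=0.7)
print(crossings)  # [2 5]

if len(crossings) > 0:
    # The first crossing is where a one-off, high-impact clip would be triggered
    print("First high-arousal moment at segment", crossings[0])
```

A downward trigger, such as Valence dropping below a “Low” threshold, works the same way with the comparison reversed.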
Strategy 4: Derivative Mapping (Rate of Change) #
This is a more advanced, subtle technique that maps the change in emotion.
- Calculate the Derivative: Instead of looking at the V-A value, look at how fast it’s changing between segments.
- Map the Change:
  - A rapid increase in Arousal could map to a fast camera zoom, a quick fade-in, or a build-up of visual effects.
  - A sudden drop in Valence could trigger an instant cut to a darker or more abstract scene.
  - A period of no change could map to a long, static shot.
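The rate of change can be approximated with a first difference between consecutive segments. In this sketch the fast and still thresholds, the labels, and the describe_changes helper are all illustrative assumptions that would need tuning against real data.

```python
import numpy as np


def describe_changes(values, seconds_per_segment=1.0, fast=0.3, still=0.05):
    """Return the per-second rate of change and a coarse label for each step."""
    values = np.asarray(values, dtype=float)
    # First difference between consecutive segments, scaled to units per second
    rate = np.diff(values) / seconds_per_segment
    labels = []
    for r in rate:
        if r >= fast:
            labels.append("rapid rise")
        elif r <= -fast:
            labels.append("sudden drop")
        elif abs(r) <= still:
            labels.append("steady")
        else:
            labels.append("gradual change")
    return rate, labels


# Illustrative per-segment arousal values
arousal = [0.2, 0.2, 0.6, 0.7, 0.3]
rate, labels = describe_changes(arousal)
print(labels)  # ['steady', 'rapid rise', 'gradual change', 'sudden drop']
```

Each label can then be tied to a visual treatment, for example a fast zoom for "rapid rise" and a long static shot for "steady".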