Python Data Analysis to Automatically Detect and Mute Television Commercials
Project Evolution
Initially I made a Raspberry Pi infrared remote control for a television sound bar. Then I added a Flask web service to make it a network-enabled Raspberry Pi TV remote control. Next, in "Raspberry Pi Mute TV Commercials Automatically", I stored pre-programmed television commercial times. At the scheduled times, the system made requests to the remote control service. This approach fails if a show (for example, a live show) varies the times at which it plays commercials.
Automatic Commercial Detection
I considered several ways to automatically detect commercials, and then wrote tv_commercial_silencer. It uses audio analysis, specifically "acoustic fingerprinting" similar to that used by Shazam or SoundHound. Then it makes requests to the remote control service to control the sound.
Ways to Automatically Detect Commercials
- supervised machine learning
- closed-caption subtitle text
- video analysis
- audio analysis
supervised machine learning
Record hundreds or thousands of short video files. One or more humans categorize each video as “commercial” or “not commercial”. Use the files and categorization as training data for supervised machine learning. The computer can “watch” the video (perhaps by sampling images) and “listen” to the audio. Then the model can generalize and categorize any video as commercial or not commercial.
Someone told me they thought it would be difficult to train a model to be fast and accurate. I realized perhaps this approach was unnecessarily ambitious! I generally watch a small number of television shows. The program doesn’t need to be able to recognize ANY television commercial, just the ones I’ve been seeing.
closed-caption subtitle text
Create a set of words or phrases from commercials (e.g. ‘Lexus dealer’, ‘ask your doctor if’). Parse the text stream and check phrases or words against the pre-defined set. I think the text stream may be easier to access from NTSC television broadcasts than from cable HDMI. CCExtractor may be able to extract text from pre-recorded video. I don’t know if the captions need to be visible in the video.
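The phrase check described above can be sketched in a few lines. This is only an illustration, not code from tv_commercial_silencer: the names `normalize` and `contains_commercial_phrase`, the phrase set, and the sample captions are made up, and it assumes the caption text has already been extracted into plain strings.

```python
import re

# Hypothetical phrase set; a real one would be built from observed commercials.
COMMERCIAL_PHRASES = {"lexus dealer", "ask your doctor if"}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def contains_commercial_phrase(caption, phrases=COMMERCIAL_PHRASES):
    """Return the first phrase (in sorted order) found in caption, or None."""
    # Pad with spaces so phrases only match on whole-word boundaries.
    normalized = f" {normalize(caption)} "
    for phrase in sorted(phrases):
        if f" {normalize(phrase)} " in normalized:
            return phrase
    return None


print(contains_commercial_phrase("Ask your doctor if Chantix is right for you."))
print(contains_commercial_phrase("And now back to the game"))
```

Matching on normalized whole words keeps the check cheap enough to run on every caption line, at the cost of missing paraphrases or misspelled captions.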
video analysis
Projects like Comskip analyze video files for things like black frames. I think currently it requires a pre-recorded file as input.
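To make the black-frame idea concrete, here is a rough sketch. It does not reproduce Comskip's actual heuristics. It assumes frames have already been decoded into grayscale pixel values (lists of rows, 0 to 255), and the function names and thresholds `brightness_max` and `min_run` are invented for illustration.

```python
def mean_brightness(frame):
    """Average pixel value of a grayscale frame given as a list of rows."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count if count else 0.0


def black_frame_runs(frames, brightness_max=16, min_run=2):
    """Return (start, end) index pairs for runs of consecutive dark frames.

    end is exclusive. brightness_max and min_run are illustrative thresholds.
    """
    runs = []
    start = None
    for index, frame in enumerate(frames):
        if mean_brightness(frame) <= brightness_max:
            if start is None:
                start = index
        else:
            if start is not None and index - start >= min_run:
                runs.append((start, index))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and len(frames) - start >= min_run:
        runs.append((start, len(frames)))
    return runs


bright = [[200, 180], [190, 210]]
dark = [[0, 5], [3, 2]]
print(black_frame_runs([bright, dark, dark, bright, dark, bright]))
```

Runs of dark frames are only candidate boundaries between program and commercial; on their own they would also fire on ordinary scene fades.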
audio analysis
Shazam and SoundHound listen to microphone input to identify an audible song. First they process each of many songs into “acoustic fingerprints” and store them in a database. The acoustic fingerprint is based on a spectrogram of sound intensity at different frequencies vs time. Then to identify one song from microphone input they calculate its fingerprint and check for a match in the database.
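The store-then-match idea can be sketched as follows. This is a simplified illustration in the spirit of the approach described in the references, not Dejavu's or Shazam's actual code. It assumes the spectrogram peaks have already been extracted as (time, frequency) pairs; the peak values, `fan_out`, and all function names are made up.

```python
from collections import Counter, defaultdict


def pair_hashes(peaks, fan_out=3):
    """Yield ((freq1, freq2, time_delta), time1) for nearby peak pairs.

    peaks is a list of (time, frequency) tuples.
    """
    ordered = sorted(peaks)
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map each hash to the (track_name, time) places where it occurs."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for key, t1 in pair_hashes(peaks):
            index[key].append((name, t1))
    return index


def best_match(index, sample_peaks):
    """Return (track_name, offset, votes) for the best-aligned track, or None."""
    votes = Counter()
    for key, sample_time in pair_hashes(sample_peaks):
        for name, track_time in index.get(key, []):
            # Matching hashes from the same track agree on one time offset.
            votes[(name, track_time - sample_time)] += 1
    if not votes:
        return None
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


# Made-up peak data for two stored commercials.
tracks = {
    "chantix": [(0, 100), (1, 250), (2, 180), (3, 400), (4, 320), (5, 150), (6, 275)],
    "google-help-cooper": [(0, 120), (1, 300), (2, 90), (3, 210), (4, 330), (5, 60), (6, 410)],
}
index = build_index(tracks)
# A sample that starts 2 time units into "chantix".
print(best_match(index, [(0, 180), (1, 400), (2, 320), (3, 150)]))
```

Counting votes per (track, offset) rather than per track is what makes the match robust: random hash collisions scatter across many offsets, while a true match piles up on one.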
Dejavu
This wonderful library implements audio fingerprinting and recognition in Python. It has recognizers for recorded files and for live microphone input. The original is worldveil/dejavu; forks include DataWookie/dejavu and bcollazo/dejavu. As of 2019-03, bcollazo/dejavu appeared to be the most active fork.
Implementation- Detecting Commercials via Audio Analysis
I wrote tv_commercial_silencer, which uses Dejavu. https://github.com/beepscore/tv_commercial_silencer
I added audio_recognizer.py with methods like recognize_audio_from_microphone:
def recognize_audio_from_microphone(self, djv, seconds=5):
    """
    method samples 'seconds' number of seconds
    :param djv: a dejavu instance, preconfigured by having run fingerprint_directory
    :param seconds: number of seconds to recognize audio
    :return: match_dict if confidence is >= confidence_minimum, else None
    """
    match_dict = djv.recognize(MicrophoneRecognizer, seconds=seconds)
    logger.debug('match_dict: {}'.format(match_dict))
    # example output
    # 2019-04-22 17:47:34 DEBUG recognize_audio_from_microphone line:79 match_dict:
    # {'song_id': 4, 'song_name': 'google-help-cooper', 'confidence': 146,
    # 'offset': 17, 'offset_seconds': 0.78948, 'file_sha1': '5b2709b5d22011c18f9a7b6ab7f04f0e89da4d41'}
    if match_dict is None:
        # "Nothing recognized -- did you play the song out loud so your mic could hear it? :)"
        return None
    else:
        # use confidence_minimum to help avoid false positives,
        # e.g. avoid algorithm accidentally matching to background noise with confidence ~ 10
        # by manual observation of logs, 100 seems overly conservative
        confidence_minimum = 40
        confidence = match_dict.get('confidence')
        if confidence is not None and confidence >= confidence_minimum:
            commercial_name = match_dict.get('song_name')
            duration_seconds = media_duration_dict.get(commercial_name)
            offset_seconds = match_dict.get('offset_seconds')
            duration_remaining_seconds = AudioRecognizer.time_remaining_seconds(duration_seconds,
                                                                                offset_seconds, seconds)
            duration_remaining_seconds_min = 8
            if duration_remaining_seconds >= duration_remaining_seconds_min:
                # duration_remaining_seconds is long enough for tv service
                # to emulate multiple remote control button presses.
                logger.debug('is_call_tv_service_enabled: {}'.format(self.is_call_tv_service_enabled))
                if self.is_call_tv_service_enabled:
                    # Don't call mute, too easy for app to get toggle confused
                    tv_service.volume_decrease_increase(duration_seconds=duration_remaining_seconds)
                    # disable calling tv service again until duration_remaining_seconds has elapsed
                    # this prevents multiple calls for one commercial
                    self.is_call_tv_service_enabled = False
                    # at run_date, scheduler will re-enable calling tv service
                    run_date = datetime.now() + timedelta(seconds=duration_remaining_seconds)
                    # add_job, implicitly create the trigger
                    # args is for function enable_call_tv_service
                    self.background_scheduler.add_job(self.enable_call_tv_service, 'date',
                                                      run_date=run_date, args=[True])
            return match_dict
        return None
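The listing above calls AudioRecognizer.time_remaining_seconds without showing it. Below is one plausible version, assumed here rather than copied from the repository. It takes offset_seconds to be the position within the commercial where the sampled audio begins, and subtracts the time spent listening. The parameter name `sample_seconds`, the handling of a missing duration, and the 60-second duration in the example are also assumptions.

```python
def time_remaining_seconds(duration_seconds, offset_seconds, sample_seconds):
    """Estimate seconds left in a commercial once recognition finishes.

    Assumes offset_seconds marks where the sampled audio begins within the
    commercial and that sample_seconds were spent listening. Returns 0 when
    the duration or offset is unknown, or when the estimate would be negative.
    """
    if duration_seconds is None or offset_seconds is None:
        return 0
    remaining = duration_seconds - offset_seconds - sample_seconds
    return max(0, remaining)


# Hypothetical 60 second commercial matched at the offset from the example log.
print(time_remaining_seconds(60, 24.38095, 5))
```

Returning 0 for an unknown duration means the comparison against duration_remaining_seconds_min simply fails, so the volume is left alone when the commercial's length isn't in media_duration_dict.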
acquiring audio
How to Record Audio to a Computer Using an Audio Interface describes recording and monitoring television audio. I stored the files locally in directory data/commercial_mp3. The project .gitignore ignores the data directory, so the recordings aren’t committed to the repository.
first test- play mp3 file through computer speakers, use computer microphone as input
I selected the computer microphone as input. I used QuickTime to loop play one of the commercial mp3 files through the computer speakers. Then while the commercial was playing I ran:
python3 audio_recognizer.py
It successfully recognizes commercials! Example output:
2019-04-20 21:01:40 DEBUG recognize_audio_from_microphone line:79
From mic with 5 seconds we recognized: {"song_id": 13, "song_name": "chantix", "confidence": 376,
"offset": 525, "offset_seconds": 24.38095,
"file_sha1": "7050797273712b325559706c4d6878594238583866486d4b4371493d0a"}
second test- play recorded television video, use audio interface as input
I selected the Scarlett 2i4 USB audio interface as the sound input. I played a recorded show on the television. The program successfully recognized multiple different commercials and muted the audio.
TODO:
Consider running commercial detection on Raspberry Pi
Currently the commercial detection code is running on macOS. See if a Raspberry Pi is fast enough to run commercial detection. Could run code on the same Raspberry Pi running remy_python, or on a separate Raspberry Pi.
References
tv_commercial_silencer
https://github.com/beepscore/tv_commercial_silencer
remy_python
A Raspberry Pi infrared remote control. The Python app has three parts: Functions to send commands to the infrared transmitter. A Flask web service to accept television command requests (e.g. volume decrease, volume increase). A scheduler that automatically sends remote control commands at programmed times (e.g. mute during TV commercials). https://github.com/beepscore/remy_python
audio analysis
Audio Fingerprinting with Python and Numpy
https://willdrevo.com/fingerprinting-and-audio-recognition-with-python/
Acoustic Fingerprint
https://en.wikipedia.org/wiki/Acoustic_fingerprint
An Industrial-Strength Audio Search Algorithm
https://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
Dejavu
https://github.com/beepscore/dejavu/tree/dejavu
Audio interface
How to Record Audio to a Computer Using an Audio Interface
Focusrite Scarlett 2i4
https://focusrite.com/usb-audio-interface/scarlett/scarlett-2i4
Recording multiple microphones in python
Discusses using PyAudio and Scarlett audio interface https://stackoverflow.com/questions/25620285/recording-multiple-microphones-in-python
Convert multi-channel PyAudio into NumPy array
https://stackoverflow.com/questions/22636499/convert-multi-channel-pyaudio-into-numpy-array
Closed captioning
Enough Already by Matt Richardson
https://makezine.com/2011/08/16/enough-already-the-arduino-solution-to-overexposed-celebs/
Video Experimenter Shield for Arduino
https://nootropicdesign.com/ve/
CCExtractor
https://www.ccextractor.org/start
Cool external projects that use subtitles or do sorcery with a video stream
https://www.ccextractor.org/public:general:coollinkswithsubfs
Supervised machine learning
https://en.wikipedia.org/wiki/Supervised_learning
video analysis
Comskip
https://www.kaashoek.com/comskip/
https://github.com/erikkaashoek/Comskip
How to Automatically Skip Commercials in NextPVR with Comskip
https://www.howtogeek.com/251405/how-to-automatically-skip-commercials-in-nextpvr-with-comskip/