television, sound bar, audio interface, computer, raspberry pi — Television Commercial Muting System

Project Evolution

Initially I made a Raspberry Pi infrared remote control for a television sound bar. Then I added a Flask web service to make a network enabled Raspberry Pi tv remote control. Next, Raspberry Pi Mute TV Commercials Automatically added a stored schedule of pre-programmed television commercial times. At the scheduled times, the system made requests to the remote control service. This approach fails if a show (perhaps a live show) varies the times at which it plays commercials.

Automatic Commercial Detection

I considered several ways to automatically detect commercials, and then wrote tv_commercial_silencer. It uses audio analysis, specifically "acoustic fingerprinting" similar to that used by Shazam or SoundHound. When it recognizes a commercial, it makes requests to the remote control service to control the sound.

Ways to Automatically Detect Commercials

supervised machine learning
closed-caption subtitle text
video analysis
audio analysis

supervised machine learning

Record hundreds or thousands of short video files. One or more humans categorize each video as “commercial” or “not commercial”. Use the files and categorization as training data for supervised machine learning. The computer can “watch” the video (perhaps by sampling images) and “listen” to the audio. Then the model can generalize and categorize any video as commercial or not commercial.

Someone told me they thought it would be difficult to train a model to be fast and accurate. I realized perhaps this approach was unnecessarily ambitious! I generally watch a small number of television shows. The program doesn’t need to be able to recognize ANY television commercial, just the ones I’ve been seeing.

closed-caption subtitle text

Create a set of words or phrases from commercials (e.g. ‘Lexus dealer’, ‘ask your doctor if’). Parse the text stream and check phrases or words against the pre-defined set. I think the text stream may be easier to access from NTSC television broadcasts than from cable HDMI. CCExtractor may be able to extract text from pre-recorded video. I don’t know if the captions need to be visible in the video.
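
The phrase check described above can be sketched in a few lines. This is only an illustration, assuming the caption text has already been extracted as a plain string. The names COMMERCIAL_PHRASES, normalize and find_commercial_phrases are hypothetical and are not part of tv_commercial_silencer or CCExtractor.

```python
import re

# Hypothetical phrase set, seeded with the two example phrases from the text.
COMMERCIAL_PHRASES = {"lexus dealer", "ask your doctor if"}


def normalize(text):
    """Lowercase the text and drop punctuation so matching is forgiving."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_commercial_phrases(caption_text, phrases=COMMERCIAL_PHRASES):
    """Return a sorted list of the known phrases found in caption_text."""
    # Pad with spaces so phrases only match on whole-word boundaries.
    padded = " " + normalize(caption_text) + " "
    return sorted(p for p in phrases if " " + normalize(p) + " " in padded)


if __name__ == "__main__":
    print(find_commercial_phrases("Ask your doctor if Chantix is right for you."))
```

A real caption stream arrives in small pieces, so a practical version would probably need to keep a rolling window of recent text before checking it.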

video analysis

Projects like Comskip analyze video files for things like black frames. I think currently it requires a pre-recorded file as input.
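
To illustrate the black frame idea only, here is a rough sketch. It assumes each frame is already available as rows of grayscale pixel values from 0 to 255, and the threshold of 16 is an arbitrary assumption. It is not Comskip's algorithm, and the function names are hypothetical.

```python
def mean_luminance(frame):
    """Average pixel value of a grayscale frame given as rows of 0-255 ints."""
    pixels = [pixel for row in frame for pixel in row]
    return sum(pixels) / len(pixels)


def black_frame_indices(frames, threshold=16):
    """Return the indices of frames whose mean luminance is below threshold."""
    return [i for i, frame in enumerate(frames) if mean_luminance(frame) < threshold]


if __name__ == "__main__":
    frames = [
        [[200, 180], [190, 210]],  # bright frame from the show
        [[0, 2], [1, 3]],          # nearly black
        [[5, 5], [5, 5]],          # nearly black
        [[120, 130], [125, 128]],  # bright frame from a commercial
    ]
    print(black_frame_indices(frames))
```

A run of black frames might mark a boundary between the show and a commercial, but black frames also occur inside ordinary scenes, so this signal alone would likely give false positives.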

audio analysis

Shazam and SoundHound listen to microphone input to identify an audible song. First they process each of many songs into “acoustic fingerprints” and store them in a database. The acoustic fingerprint is based on a spectrogram of sound intensity at different frequencies vs time. Then to identify one song from microphone input they calculate its fingerprint and check for a match in the database.
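
The database lookup step can be sketched as follows. This is a simplified illustration, assuming the spectrogram work is already done and each recording has been reduced to (hash, offset) pairs. The names build_index and best_match are hypothetical, and this is not a description of how Shazam, SoundHound or Dejavu are actually implemented.

```python
from collections import Counter, defaultdict


def build_index(fingerprints_by_name):
    """Map each hash to the (name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, fingerprints in fingerprints_by_name.items():
        for hash_value, offset in fingerprints:
            index[hash_value].append((name, offset))
    return index


def best_match(index, sample_fingerprints):
    """Return (name, aligned_hash_count, offset_difference), or None if no hash matches."""
    votes = Counter()
    for hash_value, sample_offset in sample_fingerprints:
        for name, stored_offset in index.get(hash_value, []):
            # Hashes from a true match share one consistent offset difference.
            votes[(name, stored_offset - sample_offset)] += 1
    if not votes:
        return None
    (name, offset_difference), count = votes.most_common(1)[0]
    return name, count, offset_difference


if __name__ == "__main__":
    index = build_index({
        "chantix": [("a", 0), ("b", 1), ("c", 2), ("d", 3), ("e", 4)],
        "google-help-cooper": [("x", 0), ("b", 5), ("y", 6)],
    })
    print(best_match(index, [("c", 0), ("d", 1), ("e", 2), ("b", 3)]))
```

Voting on the offset difference is what lets a short sample taken from the middle of a recording still line up with the stored fingerprints. A stray hash that matches at an inconsistent offset only adds a single vote.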

Dejavu

This wonderful library implements audio fingerprinting and recognition in Python. It has recognizers for recorded files and for live microphone input. The original repository is worldveil/dejavu. Forks include DataWookie/dejavu and bcollazo/dejavu. As of March 2019, bcollazo/dejavu appeared to be the most active fork.

Implementation- Detecting Commercials via Audio Analysis

I wrote tv_commercial_silencer, which uses Dejavu. https://github.com/beepscore/tv_commercial_silencer

I added audio_recognizer.py with methods like recognize_audio_from_microphone:

def recognize_audio_from_microphone(self, djv, seconds=5):
    """
    method samples 'seconds' number of seconds
    :param djv: a dejavu instance, preconfigured by having run fingerprint_directory
    :param seconds: number of seconds to recognize audio
    :return: match_dict if confidence is >= confidence_minimum, else None
    """
    match_dict = djv.recognize(MicrophoneRecognizer, seconds=seconds)
    logger.debug('match_dict: {}'.format(match_dict))
    # example output
    # 2019-04-22 17:47:34 DEBUG    recognize_audio_from_microphone line:79 match_dict:
    # {'song_id': 4, 'song_name': 'google-help-cooper', 'confidence': 146,
    # 'offset': 17, 'offset_seconds': 0.78948, 'file_sha1': '5b2709b5d22011c18f9a7b6ab7f04f0e89da4d41'}

    if match_dict is None:
        # "Nothing recognized -- did you play the song out loud so your mic could hear it? :)"
        return None

    else:
        # use confidence_minimum to help avoid false positives,
        # e.g. avoid algorithm accidentally matching to background noise with confidence ~ 10
        # by manual observation of logs, 100 seems overly conservative
        confidence_minimum = 40
        confidence = match_dict.get('confidence')

        if confidence is not None and confidence >= confidence_minimum:
            commercial_name = match_dict.get('song_name')
            duration_seconds = media_duration_dict.get(commercial_name)
            offset_seconds = match_dict.get('offset_seconds')
            duration_remaining_seconds = AudioRecognizer.time_remaining_seconds(duration_seconds,
                                                                                offset_seconds, seconds)

            duration_remaining_seconds_min = 8
            if duration_remaining_seconds >= duration_remaining_seconds_min:
                # duration_remaining_seconds is long enough for tv service
                # to emulate multiple remote control button presses.

                logger.debug('is_call_tv_service_enabled: {}'.format(self.is_call_tv_service_enabled))

                if self.is_call_tv_service_enabled:
                    # Don't call mute, too easy for app to get toggle confused
                    tv_service.volume_decrease_increase(duration_seconds=duration_remaining_seconds)

                    # disable calling tv service again until duration_remaining_seconds has elapsed
                    # this prevents multiple calls for one commercial
                    self.is_call_tv_service_enabled = False
                    # at run_date, scheduler will re-enable calling tv service
                    run_date = datetime.now() + timedelta(seconds=duration_remaining_seconds)
                    # add_job, implicitly create the trigger
                    # args is for function enable_call_tv_service
                    self.background_scheduler.add_job(self.enable_call_tv_service, 'date',
                                                      run_date=run_date, args=[True])

            return match_dict

    return None
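
The excerpt calls AudioRecognizer.time_remaining_seconds but does not show it. Below is a minimal sketch of what such a helper might compute. It assumes offset_seconds is the position in the commercial where the sampled audio began, and that the sample lasted record_seconds. The function name and body are hypothetical and are not copied from the repository.

```python
def estimate_time_remaining_seconds(duration_seconds, offset_seconds, record_seconds):
    """
    Hypothetical helper: estimate seconds left in a commercial after recognition.
    :param duration_seconds: total length of the recorded commercial
    :param offset_seconds: assumed position in the commercial where sampling began
    :param record_seconds: length of the microphone sample
    :return: estimated seconds remaining, never negative
    """
    elapsed_seconds = offset_seconds + record_seconds
    return max(0.0, duration_seconds - elapsed_seconds)


if __name__ == "__main__":
    # Illustrative values: a 60 second commercial matched about 24 seconds in.
    print(estimate_time_remaining_seconds(60, 24.38095, 5))
```

Clamping at zero keeps a late match near the end of a commercial from producing a negative duration. Such a match then falls below duration_remaining_seconds_min and is skipped.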

acquiring audio

How to Record Audio to a Computer Using an Audio Interface describes recording and monitoring television audio. I stored the files locally in directory data/commercial_mp3. The project .gitignore ignores the data directory, so the recordings aren’t committed to the repository.

first test- play mp3 file through computer speakers, use computer microphone as input

I selected the computer microphone as input. I used QuickTime to play one of the commercial mp3 files in a loop through the computer speakers. Then while the commercial was playing I ran:

python3 audio_recognizer.py

It successfully recognized commercials! Example output:

2019-04-20 21:01:40 DEBUG    recognize_audio_from_microphone line:79
    From mic with 5 seconds we recognized: {"song_id": 13, "song_name": "chantix", "confidence": 376,
    "offset": 525, "offset_seconds": 24.38095,
    "file_sha1": "7050797273712b325559706c4d6878594238583866486d4b4371493d0a"}

second test- play recorded television video, use audio interface as input

I selected the Scarlett 2i4 USB audio interface as the sound input. I played a recorded show on the television. The program successfully recognized multiple different commercials and muted the audio.

TODO:

Consider running commercial detection on a Raspberry Pi

Currently the commercial detection code runs on macOS. See if a Raspberry Pi is fast enough to run commercial detection. The code could run on the same Raspberry Pi that runs remy_python, or on a separate Raspberry Pi.

References

tv_commercial_silencer

https://github.com/beepscore/tv_commercial_silencer

remy_python

A Raspberry Pi infrared remote control. The Python app has three parts:

Functions to send commands to the infrared transmitter.
A Flask web service to accept television command requests (e.g. volume decrease, volume increase).
A scheduler that automatically sends remote control commands at programmed times (e.g. mute during TV commercials).

https://github.com/beepscore/remy_python