Splitting video with ffmpeg and Python

I had a project to build a simple website that splits an uploaded video into parts of equal duration (except the last one, if the division leaves a remainder). Almost everyone on the internet suggests ffmpeg, which is widely considered the best open-source Swiss Army knife for video manipulation.

After hours of browsing and trial and error, I found two good solutions for doing this with ffmpeg. Each has its own advantages and disadvantages. I used Python to “glue” the whole process together because of its simplicity.

Solution #1

The first solution, which is suggested by a helpful Stack Overflow answer, uses this command as the base:

ffmpeg -ss {start_time} -t {duration} -i {input_path} {output_path}

For example, the command below extracts the part of the video from the 10th second to the 15th second (a duration of 5 seconds):

ffmpeg -ss 10 -t 5 -i "video.mp4" "video_1.mp4"

Most references on the internet suggest adding the -c copy option, which skips re-encoding and makes the process much faster. The trade-off is that the duration of the extracted video is not always precise (it can be 1-5 seconds shorter or longer). Without -c copy, on the other hand, the process is very slow (due to re-encoding) but the result is very precise.
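
To make the trade-off easy to switch on and off, the command can be assembled in Python with an optional copy flag. This is only a rough sketch: build_cut_command is a helper name I made up for illustration (it is not part of ffmpeg), and it only builds the argument list without running anything.

```python
def build_cut_command(input_path, start_time, duration, output_path, copy=False):
    """Return the ffmpeg argument list for extracting one chunk.

    build_cut_command is a hypothetical helper name, not part of ffmpeg.
    """
    cmd = ["ffmpeg", "-ss", str(start_time), "-t", str(duration), "-i", input_path]
    if copy:
        # stream copy: no re-encoding, so faster but with less precise cut points
        cmd += ["-c", "copy"]
    cmd.append(output_path)
    return cmd


if __name__ == "__main__":
    # slow but precise (re-encodes)
    print(build_cut_command("video.mp4", 10, 5, "video_1.mp4"))
    # fast but imprecise (stream copy)
    print(build_cut_command("video.mp4", 10, 5, "video_1.mp4", copy=True))
```

The resulting list can be handed directly to subprocess.check_call, which avoids any shell quoting of the file names.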

This is how I use the first solution in Python code:

import re
import math
from subprocess import check_call, PIPE, Popen
import shlex

# raw string so that \d and \. reach the regex engine unchanged
re_metadata = re.compile(r'Duration: (\d{2}):(\d{2}):(\d{2})\.\d+,.*\n.* (\d+(\.\d+)?) fps')

def get_metadata(filename):
    '''
    Get video metadata using ffmpeg
    '''
    p1 = Popen(["ffmpeg", "-hide_banner", "-i", filename], stderr=PIPE, universal_newlines=True)
    output = p1.communicate()[1]
    matches = re_metadata.search(output)
    if matches:
        video_length = int(matches.group(1)) * 3600 + int(matches.group(2)) * 60 + int(matches.group(3))
        video_fps = float(matches.group(4))
        # print('video_length = {}\nvideo_fps = {}'.format(video_length, video_fps))
    else:
        raise Exception("Can't parse required metadata")
    return video_length, video_fps

def split_cut(filename, n, by='size'):
    '''
    Split video by cutting and re-encoding: accurate but very slow
    Adding "-c copy" speeds up the process but causes imprecise chunk durations
    Reference: https://stackoverflow.com/a/28884437/1862500
    '''
    assert n > 0
    assert by in ['size', 'count']
    split_size = n if by == 'size' else None
    split_count = n if by == 'count' else None
    
    # parse meta data
    video_length, video_fps = get_metadata(filename)

    # calculate split_count
    if split_size:
        split_count = math.ceil(video_length / split_size)
        if split_count == 1:        
            raise Exception("Video length is less than the target split_size.")    
    else:  # by == 'count'
        # keep the fractional part so the chunks cover the whole video
        split_size = video_length / split_count

    # the base name and extension are the same for every chunk
    pth, ext = filename.rsplit(".", 1)
    output = []
    for i in range(split_count):
        split_start = split_size * i
        output_path = '{}-{}.{}'.format(pth, i+1, ext)
        # pass the arguments as a list so file names containing spaces or quotes need no shell quoting
        cmd = [
            'ffmpeg', '-hide_banner', '-loglevel', 'panic',
            '-ss', str(split_start),
            '-t', str(split_size),
            '-i', filename,
            '-y', output_path,
        ]
        # print(cmd)
        check_call(cmd)
        output.append(output_path)
    return output

The idea is simply to calculate the exact start time of each video chunk and call ffmpeg once per chunk.
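
The start-time arithmetic can be looked at on its own, without touching ffmpeg. Below is a small sketch of it; plan_chunks is a name I made up for this illustration, and trimming the last chunk's duration is my own addition (the function above simply passes the full split_size every time and lets ffmpeg stop at the end of the file).

```python
import math


def plan_chunks(video_length, split_size):
    """Return (start, duration) pairs, in seconds, covering the whole video."""
    split_count = math.ceil(video_length / split_size)
    chunks = []
    for i in range(split_count):
        start = split_size * i
        # the last chunk is shorter when the division leaves a remainder
        duration = min(split_size, video_length - start)
        chunks.append((start, duration))
    return chunks


if __name__ == "__main__":
    # a 25-second video cut into 10-second chunks
    print(plan_chunks(25, 10))
```

For a 25-second video and 10-second chunks this gives three chunks starting at 0, 10 and 20 seconds, with the last one only 5 seconds long.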

Solution #2

The second solution, which I found in a great Medium post, uses this command to automatically segment a video:

ffmpeg -i {input_path} -c copy -map 0 -segment_time {duration} -reset_timestamps 1 -g {frame_group} -sc_threshold 0 -force_key_frames "expr:gte(t,n_forced*{duration})" -f segment "{output_path}-%d.{output_extension}"

As you may have noticed, this command also uses -c copy, which makes it very fast. On top of that, it splits the video into multiple parts in a single run, so we don’t need to call it multiple times from the Python script. The catch is that even when the -c copy option is removed, the precision somehow stays low. I’m guessing this is because, without re-encoding, cuts can only be made at existing key frames. The command also seems to try to force new key frames on the fly, but the result doesn’t seem to be any different.
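
To make the key-frame guess a bit more concrete, here is a toy model of it. It is not ffmpeg's actual logic: it just assumes that a cut can only land on the first key frame at or after the requested time, and the key-frame timestamps in the example are made up. The function name snap_to_keyframes is hypothetical as well.

```python
from bisect import bisect_left


def snap_to_keyframes(cut_times, keyframe_times):
    """For each requested cut, return the first key frame at or after it.

    keyframe_times must be sorted in ascending order. None is returned
    for a cut that lies after the last key frame.
    """
    snapped = []
    for t in cut_times:
        i = bisect_left(keyframe_times, t)
        snapped.append(keyframe_times[i] if i < len(keyframe_times) else None)
    return snapped


if __name__ == "__main__":
    # made-up key frames every 4 seconds, requested cuts every 5 seconds
    print(snap_to_keyframes([5, 10, 15], [0, 4, 8, 12, 16]))
```

Under this assumption, requested cuts at 5, 10 and 15 seconds end up at 8, 12 and 16 seconds, which is the kind of drift of a few seconds described above.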

Here is the Python code to use the second solution:

def split_segment(filename, n, by='size'):
    '''
    Split video using segment: very fast but sometimes inaccurate
    Reference https://medium.com/@taylorjdawson/splitting-a-video-with-ffmpeg-the-great-mystical-magical-video-tool-%EF%B8%8F-1b31385221bd
    '''
    assert n > 0
    assert by in ['size', 'count']
    split_size = n if by == 'size' else None
    split_count = n if by == 'count' else None
    
    # parse meta data
    video_length, video_fps = get_metadata(filename)

    # calculate split_count
    if split_size:
        split_count = math.ceil(video_length / split_size)
        if split_count == 1:        
            raise Exception("Video length is less than the target split_size.")    
    else:  # by == 'count'
        # keep the fractional part so the segments cover the whole video
        split_size = video_length / split_count

    pth, ext = filename.rsplit(".", 1)
    # pass the arguments as a list so file names containing spaces or quotes need no shell quoting
    cmd = [
        'ffmpeg', '-hide_banner', '-loglevel', 'panic',
        '-i', filename,
        '-c', 'copy', '-map', '0',
        '-segment_time', str(split_size),
        '-reset_timestamps', '1',
        '-g', str(round(split_size * video_fps)),
        '-sc_threshold', '0',
        '-force_key_frames', 'expr:gte(t,n_forced*{})'.format(split_size),
        '-f', 'segment',
        '-y', '{}-%d.{}'.format(pth, ext),
    ]
    check_call(cmd)

    # return list of output (index start from 0)
    return ['{}-{}.{}'.format(pth, i, ext) for i in range(split_count)]

Since this method handles the splitting internally, we just need to compute the proper parameters to pass to the command. The only extra computation compared to the first method is frame_group = round(split_size*video_fps).
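
Here is that parameter computation in isolation, as a minimal sketch. The helper names frame_group and build_segment_command are my own for this illustration; the function only builds the argument list for the command shown earlier and does not run ffmpeg.

```python
def frame_group(split_size, video_fps):
    """Number of frames in one segment, rounded to the nearest whole frame."""
    return round(split_size * video_fps)


def build_segment_command(input_path, split_size, video_fps):
    """Return the ffmpeg argument list for the segment-based split."""
    stem, ext = input_path.rsplit(".", 1)
    return [
        "ffmpeg", "-i", input_path,
        "-c", "copy", "-map", "0",
        "-segment_time", str(split_size),
        "-reset_timestamps", "1",
        "-g", str(frame_group(split_size, video_fps)),
        "-sc_threshold", "0",
        "-force_key_frames", "expr:gte(t,n_forced*{})".format(split_size),
        "-f", "segment",
        # ffmpeg replaces %d with the segment index, starting from 0
        "{}-%d.{}".format(stem, ext),
    ]


if __name__ == "__main__":
    # 10-second segments of a 29.97 fps video
    print(build_segment_command("video.mp4", 10, 29.97))
```

For 10-second segments of a 29.97 fps video, frame_group works out to 300 frames.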

Conclusions

So those are the two solutions that I found and used in my last video-splitting project. The first one is precise but slow, while the second one is fast but not precise. I decided to implement both and let the user choose whichever they need. I hope this post helps anyone working on a similar project.

I’m still very inexperienced in this video manipulation business, so if you know a better solution, please let me know in the comments. I’d be very interested to test it.

Cheers!
