A match in the FIRST Robotics Competition (FRC) is a 2.5-minute game in which two alliances of three teams each use their robots to score points and complete objectives. Knowing the strengths and strategies of other teams is a vital part of coordinating match strategy and drafting alliances for playoffs. AutoScout aims to be a tool that tracks the path and scored game elements of each team's robot.
Each match is recorded and livestreamed, and that livestream shows the score and the playing field from three different camera angles. My hypothesis was that this video feed alone was enough to accurately track the position of each FRC robot throughout a match.
I initially tried OpenCV's built-in trackers, like CSRT and KCF. Although they processed quickly and worked quite well while the robot was in the open, they required manual initialization and lost the robot after even minimal obstruction. So I took a different approach that could take advantage of all three cameras: detect robots in every frame with a YOLO model, then combine those detections into full paths with an algorithm.
To create a YOLO model, I needed a dataset of annotated images. I wrote a script that downloads random match videos using the TBA API and yt-dlp and samples frames from them, then I uploaded those frames to Roboflow, where I annotated ~200 images. After that, I trained a YOLOv8 model in a Google Colab notebook and downloaded the weights to my computer.
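The dataset-and-training step above can be sketched roughly as the following command invocations (the video URL and file names are placeholders, and epoch/image-size values are illustrative, not the ones actually used):

```shell
# download one match video (URL is a placeholder)
yt-dlp -f mp4 -o match.mp4 "https://www.youtube.com/watch?v=VIDEO_ID"

# sample one frame every 10 seconds for annotation
ffmpeg -i match.mp4 -vf fps=1/10 frames/frame_%04d.jpg

# after annotating in Roboflow and exporting a YOLO dataset,
# train YOLOv8 with the Ultralytics CLI (e.g. in a Colab notebook)
yolo detect train data=dataset/data.yaml model=yolov8n.pt epochs=100 imgsz=640
```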
Next, I began working on the tracking part. After detecting all the robots on screen, I projected those points onto a top-down map of the field so that all robot positions shared the same coordinate space. I marked four reference points on each camera feed so I could use a perspective transform to map points from the camera perspective to the top-down field.
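A minimal sketch of that projection step, using only NumPy (in practice OpenCV's `cv2.getPerspectiveTransform` computes the same 3x3 homography; the four camera pixels and the field dimensions below are made-up examples, not the real calibration):

```python
import numpy as np

def perspective_matrix(src, dst):
    # Solve for the homography H mapping src -> dst from four point pairs,
    # the same matrix cv2.getPerspectiveTransform would return.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    # Apply H to Nx2 points and divide out the homogeneous coordinate.
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical calibration: four clicked pixels on one camera feed mapped
# to the matching corners of a 16.5 m x 8.2 m top-down field.
camera_pts = [(120, 400), (1180, 390), (900, 150), (350, 160)]
field_pts = [(0.0, 0.0), (16.5, 0.0), (16.5, 8.2), (0.0, 8.2)]
H = perspective_matrix(camera_pts, field_pts)
```

Once `H` is computed for a camera, every detection center from that camera can be pushed through `project` to land in field coordinates.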
Then, I wrote a set of rules to decide on the actual positions of robots by looking for clusters of points: it grouped pairs of detections from the side-view camera and the full-field camera, and also handled edge cases like double detections. This outputted a list of fairly accurate robot positions.
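The grouping rule can be sketched like this (a simplified stand-in for the real rule set: pair each side-camera point with the nearest unclaimed full-field point within a radius, average the pair, and pass unmatched points through; the radius is an assumed parameter):

```python
import math

def merge_detections(side_pts, full_pts, radius=0.5):
    # Pair points from the two cameras that land close together in
    # field coordinates, averaging each matched pair into one position.
    merged, used = [], set()
    for sx, sy in side_pts:
        best, best_d = None, radius
        for i, (fx, fy) in enumerate(full_pts):
            d = math.hypot(sx - fx, sy - fy)
            if i not in used and d < best_d:
                best, best_d = i, d
        if best is None:
            merged.append((sx, sy))                        # no partner: keep as-is
        else:
            used.add(best)
            fx, fy = full_pts[best]
            merged.append(((sx + fx) / 2, (sy + fy) / 2))  # average the pair
    merged += [p for i, p in enumerate(full_pts) if i not in used]
    return merged
```

Averaging matched pairs also collapses the "double detection" case, since both copies of a robot fall inside the pairing radius.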
To turn detections into tracks, I looped through each frame of the video and assigned each new detection to the track whose most recent detection was closest.
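That greedy frame-to-frame association can be sketched as follows (a minimal version: each detection claims the nearest still-unclaimed track, and the distance cutoff is an assumed parameter):

```python
import math

def assign_greedy(tracks, detections, max_dist=1.0):
    # tracks: {track_id: [positions...]}; each detection joins the track
    # whose last position is nearest, or starts a new track if none is close.
    claimed = set()
    for det in detections:
        best, best_d = None, max_dist
        for tid, path in tracks.items():
            if tid in claimed:
                continue
            d = math.dist(path[-1], det)
            if d < best_d:
                best, best_d = tid, d
        if best is None:
            best = len(tracks)        # no nearby track: start a new one
            tracks[best] = []
        claimed.add(best)
        tracks[best].append(det)
    return tracks
```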
However, there were moments when a robot was completely obstructed from all cameras. So instead of trying to recover each robot's full track for the entire match, I built small fragments of paths: if a tracklet went several frames without a detection, it was terminated. Afterwards, I made a second pass that stitched the tracklets together; when a tracklet ended, I looked for the next tracklet that started nearby. This let me get tracks for all six robots throughout an entire match.
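The stitching pass might look something like this (a sketch under assumptions: each tracklet is a dict with start/end frames and positions, and the gap and jump thresholds are illustrative):

```python
import math

def stitch(tracklets, max_gap=30, max_jump=2.0):
    # Chain each tracklet onto an earlier one that ends shortly before it
    # starts, at a nearby position; otherwise it begins a new chain.
    tracklets = sorted(tracklets, key=lambda t: t["start"])
    chains = []
    for t in tracklets:
        linked = False
        for chain in chains:
            tail = chain[-1]
            gap = t["start"] - tail["end"]
            jump = math.dist(tail["end_pos"], t["start_pos"])
            if 0 < gap <= max_gap and jump <= max_jump:
                chain.append(t)
                linked = True
                break
        if not linked:
            chains.append([t])
    return chains
```

With six chains surviving a full match, each chain corresponds to one robot's path, with the gaps between tracklets marking the obstructed frames.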
This worked quite well for some matches, but it struggled when there were frequent false-positive detections or the robots were obstructed for long periods of time. In those matches it would lose a robot and be unable to recover. I set out to make a few improvements:
First, I made frame-to-frame tracking more reliable by implementing a Kalman filter with the Hungarian matching algorithm (instead of greedy closest-pair matching).
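A minimal sketch of that combination, assuming SciPy is available: a constant-velocity Kalman filter per track, with SciPy's Hungarian solver (`linear_sum_assignment`) matching detections to predicted positions. The noise constants here are illustrative tuning values, not the project's actual ones:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

class KalmanTrack:
    def __init__(self, pos, dt=1.0):
        self.x = np.array([pos[0], pos[1], 0.0, 0.0])  # state: [x, y, vx, vy]
        self.P = np.eye(4)                             # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt               # constant-velocity model
        self.H = np.eye(2, 4)                          # we observe position only
        self.Q = np.eye(4) * 0.01                      # process noise (assumed)
        self.R = np.eye(2) * 0.1                       # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x            # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def match(tracks, detections):
    # Cost matrix of distances between predicted and detected positions,
    # solved globally by the Hungarian algorithm rather than greedily.
    preds = np.array([t.predict() for t in tracks])
    dets = np.asarray(detections, float)
    cost = np.linalg.norm(preds[:, None, :] - dets[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    for r, c in zip(rows, cols):
        tracks[r].update(dets[c])
    return dict(zip(rows, cols))
```

Unlike greedy matching, the Hungarian solver minimizes the total assignment cost, so two robots passing close to each other cannot both grab the same detection.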
I also made improvements to how tracklets were stitched together. In the existing system, tracks sometimes got lost chasing small false positives while discarding the long, real tracklets. To fix this, I reframed stitching as minimizing a loss function in which big jumps in robot position, or leaving long tracklets unused, are penalized. Then I ran a solver to determine the best assignment of tracklets to full-match robot tracks. Finally, I added some smoothing over the missing frames and got results like this.
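The two penalty terms of such a loss might be sketched like this (the weights are hypothetical, and the tracklet dicts follow the same start/end shape as before; a solver would then search for the linking that minimizes the total of these costs):

```python
import math

JUMP_WEIGHT = 1.0   # assumed weight on positional jumps between linked tracklets
SKIP_WEIGHT = 0.5   # assumed per-frame penalty for leaving a tracklet unused

def link_cost(a, b):
    # Cost of continuing a robot's track from tracklet a into tracklet b.
    if b["start"] <= a["end"]:
        return float("inf")    # overlapping tracklets can't be the same robot
    return JUMP_WEIGHT * math.dist(a["end_pos"], b["start_pos"])

def skip_cost(t):
    # Dropping a tracklet costs its length, so long real tracklets
    # are kept over short false-positive blips.
    return SKIP_WEIGHT * (t["end"] - t["start"])
```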
I wanted to make this more polished, so I built a Tkinter interface around a processing pipeline, so matches could be processed in a queue. I also experimented with reading bumper numbers with EasyOCR and determining when robots scored game elements by detecting score increments.
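The score-increment idea reduces to a simple scan over OCR'd score readings (a sketch; the `(frame, score)` input format and the one-point threshold are assumptions, with the scores imagined to come from something like EasyOCR on the overlay):

```python
def score_events(scores, min_jump=1):
    # scores: list of (frame, score) pairs read off the stream's score overlay.
    # Whenever the score jumps, record (frame, points gained) as a likely
    # scored game element.
    events = []
    for (f0, s0), (f1, s1) in zip(scores, scores[1:]):
        if s1 - s0 >= min_jump:
            events.append((f1, s1 - s0))
    return events
```

Matching each event's timestamp against the robot nearest the scoring location at that frame is then what links scored game elements back to teams.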