Object detection is one of the most important technological developments in recent history. It is a computer vision technique that detects and identifies objects by passing visual input data through machine learning algorithms. This AI-powered technology is highly versatile in today's heavily digitalized world.
Object detection is currently used in several applications including:
- Autonomous vehicles
- Surveillance and security
- Automation
- Advanced robotics
- Military and defence
There are several approaches to building an object detection model or system. In this article, we are going to discuss building a YOLO-based object detection model.
Object classification:
First, we must understand what object classification means. Object classification is the process of identifying and categorizing objects based on visual data patterns. It is generally performed with machine learning: a model is trained on a labeled dataset of images and their corresponding class labels. Once trained, the model can classify new images by assigning a class label based on the features it has learned. Examples of object classification include recognizing traffic signs or identifying plant species in an image.
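To make this concrete, here is a minimal classification sketch using the ultralytics package that we install later in this article; it assumes the package is available, uses the pretrained yolov8n-cls.pt classification weights, and refers to a placeholder image path.

from ultralytics import YOLO

# load a small pretrained classification model (the weights are downloaded
# automatically if they are not already on disk)
model = YOLO('yolov8n-cls.pt')

# classify a single image ('my_image.jpg' is a placeholder path)
results = model('my_image.jpg')

# print the most likely class label and its probability
probs = results[0].probs
print(results[0].names[probs.top1], float(probs.top1conf))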
Object detection:
Object detection is a related concept. It is a computer vision process that not only classifies the objects in a given visual input frame but also locates each of them, typically by predicting a bounding box around every detected object.
YOLO algorithm:
You Only Look Once, commonly known as YOLO, is an algorithm widely used for real-time object detection. Its high accuracy and fast inference speed are the main reasons for its popularity.
The basic process of the YOLO algorithm is as follows. When an image (visual input) is sent to the YOLO algorithm, it divides the image into a grid of smaller cells. Each cell in the grid predicts the likelihood of an object’s presence along with the coordinates of its bounding box, and it also identifies the object’s class. In contrast to two-stage object detectors like R-CNN and its derivatives, YOLO analyses the complete image in a single pass, resulting in enhanced speed and efficiency.
YOLO is extensively utilized across a range of applications, including autonomous vehicles and monitoring systems, and it plays a significant role in real-time tasks such as live video analysis and surveillance.
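Before moving to the real-time model, a minimal single-image detection sketch shows what YOLO returns for each detected object; it assumes the ultralytics package (installed in the next section) and a placeholder image called street.jpg.

from ultralytics import YOLO

# load a small pretrained detection model
model = YOLO('yolov8n.pt')

# run detection on one image ('street.jpg' is a placeholder path)
results = model('street.jpg')

# every detection carries a bounding box, a class id and a confidence score
for box in results[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    print(results[0].names[int(box.cls[0])], float(box.conf[0]), x1, y1, x2, y2)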
Installing YOLO:
To install YOLO, the following command can be used.
pip install ultralytics
YOLO models are provided through the ultralytics Python package. There are several versions of YOLO; the ultralytics package covers recent releases such as YOLOv8, while older versions (for example YOLOv7) are distributed through their own repositories rather than a single pip package. The recent versions offer better accuracy and speed than earlier ones, so installing ultralytics is the simplest way to get a current, well-supported model.
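After installation, a quick check such as the following sketch confirms that the package can be imported and reports its version (the checks() helper is available in recent ultralytics releases).

import ultralytics

# print the installed ultralytics version
print(ultralytics.__version__)

# optional: print environment details (Python, torch, GPU availability)
ultralytics.checks()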
Other requirements:
To develop a detection model, we need the following things on our computer.
First, we need Python installed. Download the installer from the official Python website and run it.
To write and execute the code we will also need an IDE (integrated development environment). A common choice is VS Code, which can be downloaded from its official website.
For building a model that can detect objects in real time, we also need the OpenCV Python package, which is imported in code as cv2. The math module used later is part of Python's standard library, so it does not need to be installed separately. Run the following command in the terminal.
pip install opencv-python
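Once installed, OpenCV can be verified with a short check like this.

import cv2

# confirm that OpenCV was installed correctly and print its version
print(cv2.__version__)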
Building the model:
Now we are ready to build the object detection model.
Open your IDE, create a Python file named 'main.py', and save it in a folder of your choice.
Open the Python file and import the following libraries first.
import cv2
import math
from ultralytics import YOLO
# the YOLO class below can load weights for any supported model version
# (e.g. yolov8n, yolov8s), so no version-specific import is needed
Then we have to feed the input visual data into the algorithm. In this model we are using the computer's webcam as the input device, so we have to add the following lines to the main.py file. Here, the default camera is used.
# step 2: video capturing
# starting the webcam
cap = cv2.VideoCapture(0)  # the default camera (index 0) is used to capture frames
# resolution 640x480
cap.set(3, 640)  # frame width
cap.set(4, 480)  # frame height
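If you want to confirm the camera works before adding detection, a standalone sketch like the one below simply reads and displays frames until 'q' is pressed.

import cv2

# quick webcam test without any detection
cap = cv2.VideoCapture(0)
cap.set(3, 640)   # frame width
cap.set(4, 480)   # frame height

while True:
    success, img = cap.read()
    if not success:
        break
    cv2.imshow('WebCam test', img)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()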
Then we must load a pre-trained YOLO weights file as our model.
# loading the pre-trained model:
model = YOLO('./yolo-Weights/yolov8n.pt')
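If the weights file is not already at that path, ultralytics can also download the official pretrained weights automatically when given just the file name, and the loaded model exposes its class names, as in this small sketch.

# passing just the file name lets ultralytics download the weights automatically
model = YOLO('yolov8n.pt')

# the model carries its class names as a dictionary mapping class id to label
print(model.names)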
Then we must define the classes of objects we want the model to identify and detect.
#defining the object classes:
classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
"dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
"handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
"baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
"fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
"diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
"teddy bear", "hair drier", "toothbrush"
]
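As an aside, these are the 80 COCO class labels, and the same list can be read from the loaded model instead of being typed by hand, for example:

# derive the class list from the model itself (equivalent to the list above)
classNames = list(model.names.values())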
Now we have loaded the model and defined the object classes, so we can build the detection loop.
# building the loop:
while True:
    success, img = cap.read()
    results = model(img, stream=True)

    # coordinates
    for r in results:
        boxes = r.boxes
        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            # converting to integers
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            # drawing the box on the frame
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)

            # confidence
            confidence = math.ceil(box.conf[0] * 100) / 100
            print("Confidence --->", confidence)

            # class name
            cls = int(box.cls[0])

            # object details
            org = (x1, y1)
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            color = (255, 0, 0)
            thickness = 2
            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)

    cv2.imshow('WebCam', img)
    if cv2.waitKey(1) == ord('q'):
        break
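As an optional refinement, the confidence score can be shown in the label and low-confidence detections can be skipped. The following sketch shows how the body of the inner for box in boxes: loop could be rearranged for that; the 0.5 threshold is an arbitrary choice, not part of the original code.

# compute the confidence first, so low-confidence detections are skipped
# before anything is drawn
confidence = math.ceil(box.conf[0] * 100) / 100
if confidence < 0.5:
    continue

x1, y1, x2, y2 = map(int, box.xyxy[0])
cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)

# show the class name together with the confidence score
cls = int(box.cls[0])
label = f"{classNames[cls]} {confidence:.2f}"
cv2.putText(img, label, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)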
Now we have built our detection model. The only thing left is to release the video capture once the loop ends.
cap.release()
Then we can close the detection window by adding the following line at the end.
cv2.destroyAllWindows()
Now we have completely built the object detection model, and we can execute the Python file (main.py) to start it, for example by running python main.py in a terminal.
The algorithm will start the webcam, and you will see the live video feed while the model continuously tries to detect and classify objects.
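The same loop also works on a recorded video instead of the webcam. The sketch below reads frames from a placeholder file called input.mp4 and stops when the video ends.

import cv2
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
cap = cv2.VideoCapture('input.mp4')  # placeholder video file path

while True:
    success, img = cap.read()
    if not success:  # stop when the video ends
        break
    results = model(img, stream=True)
    for r in results:
        for box in r.boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
    cv2.imshow('Video', img)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()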
The model analyses the live feed frame by frame in this way, drawing a box and label for every object it detects in each frame of the live video.
To download the full code with the weights files, you can click on this link and download the source file.