
Artificial Intelligence (AI) is one of the major technological revolutions in modern history, and arguably among the most important developments humans have achieved. It is a scientific discipline focused on creating machines capable of learning, reasoning, and performing tasks that typically require human intelligence. The field draws on several areas of study, including computer science, data analysis, linguistics, and neuroscience, and it spans subfields such as machine learning, deep learning, reinforcement learning, and computer vision.
Introduction:
Computer vision is one of the major subfields of AI. As the name suggests, its purpose is to give computers the ability to understand their surroundings and the physical world using their own computational power, without human intervention, by analysing visual data (images and videos) with decision-making algorithms.
Computer vision is itself a collective field with several subfields, including:
- image processing
- object tracking / detection / identification
- facial recognition
- anomaly detection
- motion detection / estimation
- sequence building / reconstruction
- virtual reality
- 3D reconstruction / modelling
After training, the computer can apply this acquired knowledge to recognize and categorize objects in unfamiliar images and videos, and the precision of these classifications improves progressively with additional training and exposure to more diverse data. Beyond classical machine learning techniques, computer vision often employs deep learning, which trains artificial neural networks on extensive datasets to identify patterns and features in a way loosely inspired by the human brain.
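To make this training idea concrete, here is a minimal sketch that fits a deliberately simple classifier to randomly generated stand-in images and labels. The dummy tensors, the five-class setup, and the use of PyTorch are illustrative assumptions, not part of any particular computer vision system.

```python
# A minimal sketch of the "train, then improve with more data" idea, assuming
# dummy image tensors and labels in place of a real labelled dataset.
import torch
from torch import nn

# Hypothetical stand-in data: 32 RGB images (64x64) with one of 5 labels each.
images = torch.randn(32, 3, 64, 64)
labels = torch.randint(0, 5, (32,))

# A deliberately simple classifier: flatten the pixels, then one linear layer.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Each pass over the data nudges the model's predictions toward the labels;
# more (and more diverse) labelled data generally improves this further.
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```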
Background:
The history of modern computer vision dates back to the beginning of the Cold War era. After World War II, numerous experiments and research projects on human brain function sought to understand how the brain processes visual information, leading to the development of image-scanning technology in 1959. In the 1960s, the field of artificial intelligence began to take shape within academia, and advances in computer technology enabled the conversion of two-dimensional images into three-dimensional representations. During the 1970s, optical character recognition technology was developed. Then, in the 1980s, neuroscientist David Marr proposed that vision operates in a hierarchical manner and introduced algorithms that enable machines to identify edges, corners, curves, and other fundamental shapes.
In the 1990s and 2000s, applications for real-time facial recognition emerged, accompanied by the standardization of tagging and annotating visual datasets. The introduction of the ImageNet dataset in 2010 marked a significant milestone: it comprised millions of labelled images spanning a thousand object categories, establishing a crucial foundation for the convolutional neural networks (CNNs) and deep learning models that are prevalent today. The year 2012 brought a further breakthrough with the AlexNet model, which achieved a dramatic reduction in the image-recognition error rate compared with previous approaches. These advancements have enabled the extensive application of computer vision across many fields of contemporary society.
Importance of Computer Vision:
While visual information processing technology has existed for some time, it traditionally depended heavily on human intervention, resulting in a process that was both labour-intensive and prone to inaccuracies. For example, earlier facial recognition systems required developers to manually label thousands of images with essential data points, such as the width of the nose bridge and the distance between the eyes. Automating these tasks required substantial computing resources, as image data is inherently unstructured and difficult for computers to process. As a result, vision applications were expensive and out of reach for most organizations. However, recent advancements in the field, coupled with a significant increase in computational power, have improved both the scale and accuracy of image data processing. Cloud computing resources now make computer vision systems accessible to organizations of all sizes. The technology can be applied to a variety of tasks, including identity verification, content moderation, streaming video analysis, fault detection, and more.
Computer Vision Workflow:
There are three major stages in a computer vision workflow (a minimal sketch of these stages follows the list):
- acquiring images and videos from a source
- processing and analysing the visual data
- understanding the input
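The sketch below walks through these three stages in Python. It assumes a hypothetical local image file (`example.jpg`) and the torchvision library with an arbitrarily chosen pretrained ResNet-18 classifier; it is an illustration of the workflow, not a prescribed implementation.

```python
# A minimal sketch of the three-stage workflow, assuming a local image file
# ("example.jpg") and the torchvision library with a pretrained classifier.
from PIL import Image
import torch
from torchvision import models, transforms

# Stage 1: acquire an image from a source (here, a hypothetical local file).
image = Image.open("example.jpg").convert("RGB")

# Stage 2: process/analyse the image (resize, convert to a tensor, normalize).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

# Stage 3: understand the input (predict a class with a pretrained CNN).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
with torch.no_grad():
    logits = model(batch)
predicted_class = logits.argmax(dim=1).item()
print(f"Predicted ImageNet class index: {predicted_class}")
```

In practice, the acquisition stage could just as easily be a camera feed or video stream, with the same processing and prediction stages applied frame by frame.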
To build a computer vision algorithm, modern systems employ deep learning, a subset of machine learning. Many successful methods in contemporary computer vision are based on convolutional neural networks (CNNs): layered neural networks that enable computers to understand the context of visual data extracted from images. Given enough data, the computer learns to differentiate between images. As image data passes through the model, the CNN evaluates it, helping machine learning and deep learning models interpret images by breaking them down into labelled pixels, a technique referred to as image annotation. The AI model uses these labels to perform convolutions and make predictions about its visual input, iteratively refining its predictions until they correspond with the expected results.
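To make the idea of stacked convolutions more concrete, here is a minimal CNN sketch in PyTorch. The layer sizes, the 224x224 input, and the ten-class output are arbitrary assumptions chosen only to show how convolutional layers turn raw pixels into class scores.

```python
# A minimal sketch of a convolutional neural network, shown only to illustrate
# how stacked convolutions turn raw pixels into a class prediction; the layer
# sizes and the 10-class output are arbitrary assumptions.
import torch
from torch import nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # detect simple edges and textures
            nn.ReLU(),
            nn.MaxPool2d(2),                              # downsample the feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combine into higher-level patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)       # convolutions extract visual features
        x = torch.flatten(x, 1)    # flatten feature maps for the linear layer
        return self.classifier(x)  # scores for each candidate label

# Example: a random batch standing in for four 224x224 RGB images.
dummy_images = torch.randn(4, 3, 224, 224)
model = TinyCNN()
print(model(dummy_images).shape)  # torch.Size([4, 10])
```

Deeper networks stack many more such convolutional layers, which is what allows them to progress from edges and textures to whole objects.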
Computer vision is an important subfield of Artificial Intelligence and a major contributor to several modern fields of science and technology. In this article, we presented an introduction to computer vision; upcoming articles will analyse the topic in greater depth.