Artificial Intelligence (AI) is arguably the most important technological revolution of this century, so it is worth understanding what it involves.
AI spans several subfields, such as machine learning, robotics, neural network architectures, speech recognition, and computer vision. In this short article, let's look at a simple overview of computer vision.
Computer vision is an area of computer science that aims to enable computers to recognize and understand objects and people in images and videos. Like other branches of artificial intelligence, computer vision seeks to perform and automate tasks that replicate human capabilities. In this case, it tries to emulate both how humans see and how they interpret what they see.
The basic idea of computer vision is to give computers the ability to see the physical world through visual input data.
History:
The history of computer vision, as a branch of AI, is more recent than we often realize, going back roughly 60 years. In 1959, a series of experiments on how the human brain processes visual information led to the development of early image-scanning technology. Around the same time, artificial intelligence began to emerge as a major area of technological research.
Then, in the 1970s, the first optical character recognition systems were built. In the 1980s, neuroscientist David Marr demonstrated that vision operates in a hierarchical manner and proposed algorithms for machines to identify edges, corners, curves, and other fundamental shapes. Around the same time, Japanese scientist Kunihiko Fukushima designed the Neocognitron, a network of cells used for recognizing patterns. From the 1990s onward, as technology advanced rapidly, facial recognition systems first came to public attention, and development has continued ever since, especially through Convolutional Neural Networks (CNNs).
Today, computer vision plays a vital part in modern approaches to Artificial Intelligence.
Basic working mechanism:
As mentioned before, the basic concept of computer vision is to give computers the ability to see the world. The first step is to collect input data. Most of this data consists of images or videos, which can be either pre-recorded or captured in real time. Imagine, for example, that you need the computer to recognize photos of a specific object, such as a bird, out of thousands of photos of various objects.
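To make "visual input data" concrete, here is a minimal sketch in Python (assuming the OpenCV library is installed; the file name bird.jpg is only a placeholder) that loads a pre-recorded image and grabs a single frame from a live camera.

```python
# Minimal sketch of the two common sources of visual input:
# a pre-recorded image file and a real-time camera stream.
import cv2

# Pre-recorded input: load an image from disk as a NumPy array (BGR channel order).
image = cv2.imread("bird.jpg")  # hypothetical file name
if image is not None:
    print("Image shape (height, width, channels):", image.shape)

# Real-time input: grab one frame from the default camera (device index 0).
camera = cv2.VideoCapture(0)
ok, frame = camera.read()
if ok:
    print("Captured a live frame of shape:", frame.shape)
camera.release()
```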
For this, we use a subfield of machine learning called deep learning and its fundamental architecture, the neural network. The basic idea behind machine learning algorithms is to analyse data for patterns and make decisions based on them; an algorithm is simply a list of instructions to follow to achieve the best possible outcome. Deep learning algorithms employ models that allow a computer to learn the context of visual information on its own. When a sufficient amount of data is fed to the model, the computer analyses it and autonomously learns to differentiate between images. In other words, the algorithms enable the machine to teach itself, rather than being programmed by a human to identify each image.
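As a rough illustration of this learning process, the sketch below (using TensorFlow/Keras purely as an example framework, with the public CIFAR-10 dataset standing in for "thousands of photos of various objects") trains a very small network to tell bird images apart from everything else.

```python
# Minimal sketch: show a model enough labelled images and it learns,
# on its own, to differentiate between the classes.
import tensorflow as tf

# CIFAR-10: 50,000 small labelled images; class index 2 is "bird".
(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
y_train = (y_train == 2).astype("float32")  # 1 = bird, 0 = anything else

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability the image is a bird
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, batch_size=64)  # the model "learns" from examples
```

This tiny fully connected network is only meant to show the idea of learning from examples; real computer vision systems rely on the convolutional networks described next.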
A CNN helps a machine learning or deep learning model "see" by breaking images down into pixels that are assigned tags or labels. It uses these labels to perform convolutions (a mathematical operation that combines two functions to produce a third) and to make predictions about what it is "observing". The network runs convolutions and checks the accuracy of its predictions over many iterations until the predictions begin to match reality. As a result, it can recognize or "see" images in a manner reminiscent of human perception.
This essentially mimics the way humans see and identify the objects around them.
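The convolution operation itself can be illustrated in a few lines. The sketch below (assuming NumPy and SciPy are available) applies one hand-made edge-detection kernel to a tiny image; a CNN does essentially the same thing, except with many kernels whose values are learned from data rather than written by hand.

```python
# Minimal sketch of a single convolution: slide a small kernel over the
# image and combine the two to produce a new "feature map".
import numpy as np
from scipy.signal import convolve2d

# Tiny 5x5 grayscale "image" with a bright vertical stripe in the middle.
image = np.array([
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
], dtype=float)

# A simple vertical-edge kernel (a learned CNN filter plays this role).
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = convolve2d(image, kernel, mode="valid")
print(feature_map)  # large-magnitude values where the kernel "sees" a vertical edge
```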
Applications of Computer Vision:
Artificial Intelligence and its subfields are advancing rapidly, and computer vision is already used in applications across many sectors, including the military, public safety, entertainment, transportation, healthcare, and more. A significant factor behind this expansion is the influx of visual data generated by smartphones, security cameras, traffic monitors, and other devices with visual capabilities. This data has the potential to greatly impact operations in various sectors, yet it remains largely untapped.
The following is an example of a major application that uses computer vision.
- Google Translate: this application leverages a smartphone camera along with computer vision techniques to examine and translate text found in images, such as signs or documents written in different languages.
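As a rough sketch of the "examine text found in images" step such an app performs (assuming the pytesseract and Pillow libraries plus the Tesseract OCR engine are installed; street_sign.jpg is a hypothetical file), the text-extraction part might look like this, with the actual translation handled by a separate service:

```python
# Minimal sketch of extracting text from a photo before translating it.
from PIL import Image
import pytesseract

sign = Image.open("street_sign.jpg")              # hypothetical photo of a sign
extracted_text = pytesseract.image_to_string(sign)  # OCR step
print("Text recognised in the image:", extracted_text)
# The translation step itself would be handled separately and is not shown here.
```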
Here are some real-world examples of computer vision-based applications.
- Image classification: the ability to assign an image to the correct category based on patterns learned from training data (see the sketch after this list).
- Optical character recognition: OCR is a major application area of computer vision; it detects text in images, such as scanned documents or photographed signs, and converts it into machine-readable text.
- 3D modelling: using computer vision to analyse multiple images of an object or environment and construct a 3D model of it.
- Motion capture: using computer vision to capture and analyse the movement of actors or other objects, typically for use in animation or virtual reality applications.
- Object tracking and detection: combining image classification with computer vision to identify particular objects and track them over a period of time.
- Biometrics: using computer vision to analyse and recognize unique physical characteristics, such as fingerprints, for identity verification and other applications.
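As a small illustration of the image-classification item above, here is a hedged sketch that uses a pre-trained model from torchvision (assuming PyTorch and torchvision are installed; photo.jpg is only a placeholder file name).

```python
# Minimal sketch of image classification with a pre-trained network.
import torch
from torchvision import models
from PIL import Image

# Load a network already trained on ImageNet and put it in inference mode.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# Preprocess the image the way the model expects, then add a batch dimension.
preprocess = weights.transforms()
image = preprocess(Image.open("photo.jpg")).unsqueeze(0)

# Predict the most likely ImageNet class label.
with torch.no_grad():
    scores = model(image)
label = weights.meta["categories"][scores.argmax().item()]
print("Predicted class:", label)
```

In practice, real applications would usually fine-tune such a pre-trained model on their own labelled data rather than relying on the generic ImageNet categories.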
That is the basic concept of computer vision. There are many tutorials available on applying computer vision and building modern technological devices with it.
The following are some of the major projects I have done in the area of computer vision and machine learning.
- Facial recognition algorithm using computer vision:
- Facial expression/emotion identification algorithm: