Computer vision, or image recognition, is the process of detecting objects and other features in a digital image, and it is set to radically transform the way we live and work. The technology is finding applications in widely different domains, and the long strides made in machine learning have greatly improved the image detection capabilities of modern digital devices.
Machine learning is learning from data. In the image detection context, thousands of images are fed to the learning system, which analyses their features to build an image classification model. This means that by feeding millions of images of the letters of a language’s alphabet, we can build a system that recognises text in images. The availability of free image databases like ImageNet (http://www.image-net.org/), which contains millions of images tagged with keywords describing their content, further facilitates this process.
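As a toy illustration of this learn-from-labelled-images idea, the following sketch (using Python and scikit-learn, our choice here rather than a tool named in this column) trains a simple classifier on a small dataset of handwritten digit images and checks how well it recognises digits it has never seen:

```python
# A minimal sketch of "learning from data" for image recognition:
# the scikit-learn digits dataset holds about 1,800 8x8 images of
# handwritten digits; we train a classifier on a labelled portion
# and measure how well it recognises the held-out images.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                      # images come pre-flattened in .data
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = SVC(gamma=0.001)                    # a support-vector classifier
model.fit(X_train, y_train)                 # "learn" from the labelled images

accuracy = model.score(X_test, y_test)      # fraction of unseen digits recognised
print(f"accuracy on unseen images: {accuracy:.2f}")
```

Commercial systems work the same way in principle, only with millions of images and far larger models.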
To get a feel for the progress made on the image detection front, take a look at the online ‘Image Identification Project’ (https://www.imageidentify.com/) from Wolfram Alpha. The application identifies the content of an image when you simply drag and drop the picture into its input box.
Explore what Facebook sees in your photographs
In an earlier column (http://corporateethos.com/opinion/deep-learning-projects-galore/), we pointed out how Facebook is able to recognise the different objects in a photograph. For instance, if you upload a picture of a large garden with lots of flowers, trees and sky, Facebook will immediately identify these objects in the digital image. When you upload an image, Facebook uses its computer vision technology, AAT (Automatic Alt Text), to identify faces, objects and themes in the photograph, and automatically adds these tags to it (https://research.fb.com/publications/automatic-alt-text-computer-generated-image-descriptions-for-blind-users-on-a-social-network-service/).
One obvious advantage is that screen readers can read out these tags (which are in text form), helping blind users recognise the objects in an image. It also means that, without your knowledge, Facebook can quietly identify your interests. If you are sceptical or just curious, the Chrome extension ‘Show Facebook Computer Vision Tags’ (https://github.com/ageitgey/show-facebook-computer-vision-tags) will reveal how well Facebook recognises the different objects in an image. Don’t worry if you are a Firefox user; the extension is available for that browser too.
To get a feel for the kind of information Facebook extracts from uploaded images, simply install the extension and then visit a Facebook page with lots of images. Once it is installed, every photograph you see on any Facebook page will automatically be overlaid with its tags (see the figure shown above). Of course, this technology is not restricted to Facebook alone; almost all similar services (like Google, Amazon, etc.) use it to extract information from images.
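Facebook’s AAT model itself is not public, but the tagging step such systems perform (run the image through a classifier, keep the confident concepts, and build an alt-text string) can be sketched in a few lines of Python; the concept names and confidence scores below are made up for illustration:

```python
# A sketch of the tagging step behind alt-text systems like Facebook's AAT.
# The real classifier is private; here we start from a hypothetical dict of
# per-concept confidence scores and keep only the confident ones as tags.
def tags_from_scores(scores, threshold=0.8):
    """Return concept names whose confidence meets the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [concept for concept, p in ranked if p >= threshold]

def alt_text(scores):
    """Format the confident tags as a screen-reader-friendly string."""
    tags = tags_from_scores(scores)
    return "Image may contain: " + ", ".join(tags) if tags else "No tags"

# hypothetical classifier output for a garden photograph
scores = {"tree": 0.97, "flower": 0.95, "sky": 0.91, "person": 0.40}
print(alt_text(scores))   # Image may contain: tree, flower, sky
```

The threshold is the interesting design choice: set it too low and the alt text fills with wrong guesses, too high and useful tags are dropped.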
Applications in other domains
This ability to build a machine learning model that identifies patterns in images could even be used to detect malignant diseases like skin cancer. To detect skin cancer, a doctor generally examines your skin to determine whether any changes in it are likely to be due to malignant cells. Researchers are now trying to apply machine learning techniques to develop a classification system that can detect skin cancer from images of the patient’s skin. A study published in Nature (http://www.nature.com/nature/journal/v542/n7639/full/nature21056.html) describes an attempt in this direction.
Yet another domain in which image-based machine learning shows great promise is the insurance industry (http://venturebeat.com/2017/02/04/how-ai-is-changing-the-way-we-assess-vehicle-repair/). The idea is to create a model that can automatically assess vehicle damage using photographs taken at the accident spot. The customer can take photographs of the vehicle from all sides and upload them directly to the insurer’s image classification model, which in turn will assess the damage.
Computer vision in machine translation
It is common knowledge that Google Translate lets you translate text between many languages. The tool can be used on multiple platforms, including the browser and mobile apps. Apart from text and web pages, Google Translate can also translate real-time speech.
This means that if you know English and your friend knows only Hindi, you don’t need to worry anymore. Simply invoke the Google Translate app; it will automatically translate your friend’s spoken words into English.
Apart from objects such as people, trees and animals, computer vision technology can also identify letters in images. The letters thus identified can be converted into text and translated into another language. Linking this visual recognition capability with a language translation service like Google Translate has tremendous implications.
Now, with developments in image recognition technology, you can use the Google Translate app to translate images of text in different languages. It can recognise and extract characters and text in multiple languages from an image. For instance, if you have an image of Japanese text, simply pointing your camera at it gets the content translated into another language on the fly. Though this feature is currently available only for a few select languages (like English, Japanese and Dutch), in the not too distant future Google may cover other languages too.
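Conceptually, the app chains two steps: OCR on the camera frame, then machine translation of the recognised text. Here is a minimal Python sketch of that pipeline; both back ends are stand-ins (a real app would call an OCR engine such as Tesseract and a translation service), and the one-entry dictionary and the photo structure are purely hypothetical:

```python
# A sketch of the visual-translation pipeline: OCR extracts text from the
# camera image, then a translation back end renders it in the target
# language. Both functions below are illustrative stand-ins.
def ocr(image):
    """Stand-in OCR: pretend these characters were recognised in the image."""
    return image["recognised_text"]        # a real engine would read pixels

def translate(text, source, target):
    """Stand-in translator backed by a tiny hard-coded dictionary."""
    dictionary = {("ja", "en"): {"こんにちは": "Hello"}}
    return dictionary[(source, target)].get(text, text)  # unknown text passes through

def visual_translate(image, source, target):
    """Chain the two steps, exactly as the camera-translation feature does."""
    return translate(ocr(image), source, target)

photo = {"recognised_text": "こんにちは"}    # hypothetical camera frame
print(visual_translate(photo, "ja", "en"))   # Hello
```

The pipeline shape is the point here: the OCR and translation stages are independent, which is why improvements to either one immediately benefit the combined feature.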
Before winding up, let us point out a fun project that uses face recognition technology. All of us tend to guess a person’s age simply by looking at her face. Would you like to see how a web application predicts the age of a person from her photograph? If so, just access Microsoft’s “How old do I look?” project (http://how-old.net/) and upload the person’s photograph. Of course, it may not always be accurate, but in this regard, we make mistakes too, don’t we?