Cutting-edge technology gives new vision for old problems

Professor Roberto Manduchi from the University of California, Santa Cruz, is employing artificial intelligence and augmented reality to develop cutting-edge technology that helps blind and visually impaired people access information.

Loss of vision impacts the vast majority of activities in a person’s daily life, rendering the world less accessible. One major impact is not being able to read, which has a huge range of consequences as so much of our information is communicated through text. Roberto Manduchi is a professor of computer engineering at the University of California, Santa Cruz, who is at the forefront of the development of assistive computer vision technology. His work focusses on providing those with visual impairments better tools for accessing text.

The outcomes of his research have the potential to be life changing for many people. In the United States, over 25 million people are affected by visual impairments, ranging from difficulty seeing despite the aid of glasses to total blindness. Of these, 1.3 million are legally blind, and 290,000 of those are completely without sight. Access to printed information is vital if an individual is to live an active and productive life with a high degree of independence. As well as books, magazines and webpages, the text on essential domestic items such as grocery packaging labels, medicine packets and utility bills needs to be deciphered. Text also communicates essential information on signposts, office door numbers and restaurant menus.

Difficult to see the solution
Computer vision is a form of artificial intelligence that aims to mimic human vision so computers can “see” with technology that acts as an “artificial eye”. Manduchi’s recent work has focused on the use of optical character recognition (OCR) technology. OCR is not new: the first prototype “reading machine”, designed specifically for use by blind people, was demonstrated in 1946. The earliest commercially available OCR machines required the text to be imaged on a large scanner. While businesses were quick to implement OCR widely for automatic text processing, the large size and immobility of these systems made them of limited day-to-day use for visually-impaired users.

The introduction of smartphones with high-resolution cameras, together with their increasing processing power, has yielded mobile applications designed to assist visually-impaired users. However, although computer vision technology has been incorporated into some of these applications, it is still in its infancy. The first mobile OCR system was released in 2005, with several more released since then, but current applications still present many limitations and difficulties for a blind user.

Figure 1. Requiring all four corners of the document to be visible can be a good strategy in many cases (a), but it fails when the edges are not visible (e.g., white paper on a white background (b)) and can be too restrictive (an image could be OCR-readable even if not all corners are visible (c)). Screenshots from the iOS Prizmo app. The yellow quadrilateral seen in (a) indicates that the image was captured for OCR processing.

Assistive technology for those with visual impairments must be user-centred to work successfully. The current lack of well-designed, efficient systems is, in part, due to many previous attempts focusing more on the technological aspects of development than on the needs of the target user. Manduchi has learnt from the failures of previous decades and is now taking a user-driven research and development approach.

Currently available OCR apps rely on the user’s ability to capture a well-framed, well-resolved image of the printed text. In principle, a user can point a phone camera at a text document and have it read out loud in a few seconds. However, the software first needs to capture an image that it is able to interpret, which requires aiming the camera suitably at the text to be decoded. This can be extremely difficult for a visually impaired user. The camera cannot be so far away that resolution is lost, or so close that part of the text falls outside the field of view. In addition, the camera needs to be orientated correctly. Even sighted people can struggle to capture the right image, so a solution is necessary before a truly effective technology can be created.
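The three capture conditions described above (distance, framing and orientation) can be illustrated with a minimal sketch. This is not the method used by any particular app: the `TextBox` structure, the function name `capture_problems` and all threshold values are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Hypothetical description of the text detected in one camera frame."""
    left: float            # bounding box of the text, in pixels
    top: float
    right: float
    bottom: float
    tilt_deg: float        # rotation of the text lines relative to the frame
    line_height_px: float  # height of a typical text line, in pixels


def capture_problems(box: TextBox, frame_w: float, frame_h: float,
                     min_line_px: float = 20.0,
                     max_tilt_deg: float = 15.0,
                     margin_px: float = 10.0) -> List[str]:
    """Return the reasons a frame is unlikely to be OCR-readable.

    All thresholds are illustrative assumptions, not published values.
    """
    problems = []
    # Too far away: text lines are too small to resolve.
    if box.line_height_px < min_line_px:
        problems.append("too far")
    # Too close or badly aimed: part of the text falls outside the frame.
    if (box.left < margin_px or box.top < margin_px
            or box.right > frame_w - margin_px
            or box.bottom > frame_h - margin_px):
        problems.append("text cut off")
    # Wrong orientation: text lines are strongly rotated.
    if abs(box.tilt_deg) > max_tilt_deg:
        problems.append("tilted")
    return problems
```

An empty list means that, under these assumed thresholds, the frame satisfies all three conditions at once, which is what a user without sight has to achieve by aiming the camera.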


Adapting technology
Professor Manduchi has been working on solving this problem, with the aim of creating a system with a user-friendly interface and rapid processing time that would enable “access to printed matter for blind people anywhere, anytime”. With the aid of his student, Michael Cutter, he has designed and tested a prototype iOS app that helps the user take an OCR-readable image. To achieve this, they tested two different modalities: auto-shot and guidance-based. In auto-shot mode, the system constantly scans the camera feed and takes a picture of the target text once it detects that the conditions for resolution and framing are satisfactory. This is the method currently in use in most apps. Guidance-based systems employ synthetic speech to direct the user to position the camera so that the shot can be read by the OCR software. Very few existing apps currently incorporate this kind of guidance.
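As a rough illustration of how the two modalities can work together, the sketch below turns per-frame measurements into spoken-style hints (guidance) and stops at the first frame that needs no hint (auto-shot). It is a simplified assumption, not the algorithm used in the prototype app: the `Frame` tuple, the function names, the hint wording, the thresholds and the convention that a positive offset means "move right" or "move down" are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical per-frame measurement:
#   dx, dy : offset of the document centre from the image centre,
#            as a fraction of the frame size (-1.0 to 1.0)
#   cover  : fraction of the frame covered by the document (0.0 to 1.0)
Frame = Tuple[float, float, float]


def guidance_hint(frame: Frame, centre_tol: float = 0.1,
                  min_cover: float = 0.5,
                  max_cover: float = 0.9) -> Optional[str]:
    """Return one hint for the user, or None if the frame looks ready."""
    dx, dy, cover = frame
    if cover > max_cover:
        return "move farther"
    if cover < min_cover:
        return "move closer"
    if abs(dx) > centre_tol:
        return "move right" if dx > 0 else "move left"
    if abs(dy) > centre_tol:
        return "move down" if dy > 0 else "move up"
    return None


def capture_with_guidance(frames: Iterable[Frame]
                          ) -> Tuple[Optional[int], List[str]]:
    """Scan frames, collect hints, and auto-shoot the first ready frame.

    Returns the index of the captured frame (None if no frame was ready)
    and the hints that would have been spoken, without repeating a hint
    on consecutive frames.
    """
    spoken: List[str] = []
    for index, frame in enumerate(frames):
        hint = guidance_hint(frame)
        if hint is None:
            return index, spoken      # auto-shot: conditions are satisfied
        if not spoken or spoken[-1] != hint:
            spoken.append(hint)       # guidance: tell the user what to change
    return None, spoken
```

With auto-shot alone, the loop would simply wait for a ready frame. Adding the hints gives the user a way to reach one deliberately rather than by trial and error.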

Their NIH-funded study, with blind participants testing the different approaches, found that a combination of both systems gave the fastest and easiest assistance to the user. The new algorithm for guidance combined with auto-shot was on average 3 times faster than auto-shot alone. Manduchi is now planning on further developing the app based on the results and feedback from visually-impaired users to enable on-the-go OCR of printed documents.

Surprisingly, not only did users find the system much more efficient than existing ones, they also found that after using the app they had learnt to take better images without feedback from the system. Manduchi suggests that users increasingly build up a “mental map” that helps them coordinate the camera in relation to the documents they are imaging.

Smart reading glasses
Currently, there is also no system capable of carrying out mobile OCR of text in complex environments. Therefore, the next challenge Professor Manduchi is embarking on is to incorporate the new OCR technology and guidance system into Augmented Reality (AR) glasses for use in domestic surroundings. This system will be designed to magnify text that the person with limited vision would otherwise be unable to see. Text will be automatically recognised and presented magnified, scrolling across the user’s field of vision. The aim is to enable comfortable reading of not only printed material with complex backgrounds but also text on the screens of appliances and TV subtitles. Importantly, the glasses will also identify text even if the person wouldn’t detect it themselves without magnification.

Another project he is working on, funded by Research to Prevent Blindness, is to improve technology that magnifies text on computer screens. Current systems require continual scrolling using the mouse or trackpad. His research aims to combine eye movement recognition software with the built-in camera in computers to magnify text, in real time, in response to where the person fixes their gaze.
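The core geometric step of gaze-driven magnification can be sketched as follows: given an estimated gaze point, choose the screen region to enlarge so that it stays centred on the gaze where possible and never extends past the screen edges. This is only an illustrative assumption about how such a system might be structured. The function name and parameters are hypothetical, and estimating the gaze point from the camera image is assumed to happen elsewhere.

```python
from typing import Tuple


def magnifier_source_rect(gaze_x: float, gaze_y: float,
                          screen_w: float, screen_h: float,
                          zoom: float = 2.0
                          ) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of the screen region to magnify.

    The region is 1/zoom of the screen in each dimension, centred on the
    gaze point and clamped so that it stays entirely on screen.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be at least 1.0")
    src_w = screen_w / zoom
    src_h = screen_h / zoom
    # Centre on the gaze point, then clamp to the screen bounds.
    left = min(max(gaze_x - src_w / 2.0, 0.0), screen_w - src_w)
    top = min(max(gaze_y - src_h / 2.0, 0.0), screen_h - src_h)
    return left, top, src_w, src_h
```

In practice a gaze estimate is likely to jitter from frame to frame, so a real system would probably smooth the gaze point over time before applying a step like this one.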

With researchers now recognising the importance of user-driven research and design, the potential of technology is increasingly being unlocked to develop tools that enable people with visual impairments to have more independent access to information in the world around them.

Could you explain what motivated you to apply your expertise in computer engineering to the development of assistive technology for those with visual impairments?
I have always been interested in applications of technology for social good, in particular to support the quality of life of people living with disabilities. Computer vision technology, whose goal is to mimic the functionality of human visual systems, has concrete opportunities to help those who cannot see, or who cannot see well.

Since the successful testing of your new mobile OCR system, have you taken its development further? If so, how?
We have conducted several experiments with improved mobile OCR functionalities that have shown the promise of this new approach. We are now looking at opportunities to integrate this technology into existing commercial systems.

Your research showed that after using the app users learnt to take better images, without feedback from the system. Do you think there is a way that future systems could be designed to make use of this type of learning?
This is, in my opinion, one of the most interesting results of our research. It suggests that our system can be used for self-training, meaning that after using it for a while, users may learn the proprioceptive ability that is necessary to take a good picture of a document without sight. The idea that a sensorial system could provide important missing feedback to blind users and support learning of tasks that normally require visual input is very exciting. It could be applied in a number of domains, including mobility, orientation, and tactile information access.

What do you feel is currently the most exciting aspect of your research?
Although I love developing technology and working in the lab with my students, the most rewarding aspect of this research has certainly been interacting with blind participants, who very patiently test our prototypes and give us critical feedback and advice. This continuous interaction also helps us to focus on the most pressing problems faced by this community, and to appreciate the wide range of abilities that these individuals have, in spite of their blindness.

How do you see your research employing augmented reality technology developing over the next 5 years?
There is a lot of hype about augmented and virtual reality these days. I believe that wearable camera and augmented reality systems have tremendous opportunities for assistive technology. Realistically, though, these opportunities may only be realised if these systems become mainstream in the computer market, and thus well supported and used by everybody – not just by people with visual impairment.

Research Objectives
Professor Manduchi’s research focuses on assistive technology for persons with visual impairments. Specifically, he is exploring the use of mobile computer vision and wearable sensors for increased spatial awareness and information access. Manduchi is currently collaborating with SKERI, FBK, IBM and CICATA-IPN. He is also a consultant with Aquifi, is on the scientific advisory board of Aira, and served on the BNVT Study Section of NIH.

Funding

  • National Institutes of Health (NIH)
  • Research to Prevent Blindness Foundation
  • National Science Foundation (NSF)

Collaborators
Dr. Michael Cutter, who developed the assisted OCR system during his time as a PhD student in Professor Manduchi’s group.

BIO
Roberto Manduchi is a Professor of Computer Engineering at the University of California, Santa Cruz, which he joined in 2001. Previously, he held positions at Apple, Inc. and at the NASA Jet Propulsion Laboratory. He holds a “Dottorato di ricerca” (doctorate) in electrical engineering from the University of Padova, Italy.

Contact
University of California Santa Cruz
Baskin School of Engineering
CA 95064
USA

E: [email protected]
T: +1 831 459-1479
W: www.soe.ucsc.edu/people/manduchi

LinkedIn: Roberto Manduchi

Creative Commons Licence

(CC BY-NC-ND 4.0) This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.