The Minds of the New Machines
By T.J. Becker | Illustration by Jonathan Bartlett | Published March 8, 2018
Machine learning has been around for decades, but the advent of big data and more powerful computers has increased its impact significantly — moving machine learning beyond pattern recognition and natural language processing into a broad array of scientific disciplines.
A subcategory of artificial intelligence, machine learning deals with the construction of algorithms that enable computers to learn from and react to data rather than following explicitly programmed instructions. “Machine-learning algorithms build a model based on inputs and then use that model to make other hypotheses, predictions, or decisions,” explained Irfan Essa, professor and associate dean in Georgia Tech’s College of Computing who also directs the Institute’s Center for Machine Learning.
Established in June 2016, the Center for Machine Learning comprises researchers from six colleges and 13 schools at Georgia Tech, a number that keeps growing. “Among our goals is to better coordinate research efforts across campus, serve as a home for machine learning leaders, and train the next generation of leaders,” Essa said, referring to Georgia Tech’s new Ph.D. program in machine learning.
Within the center, researchers are striving to advance both basic and applied science. “For example, one foundational goal is to really understand deep learning at its core,” Essa said. “We want to develop new theories and innovative algorithms, rather than just using deep learning as a black box for inputs and outputs.” And on the applied research front, the center has seven focal areas: health care, education, logistics, social networks, the financial sector, information security, and robotics.
Automating the art of interruption
Today’s robots are heavily programmed and don’t learn very much. This can work well enough in factories, but if robots are going to operate in offices, schools, and homes, they must be able to adapt to particular environments and to the needs of particular users, says Sonia Chernova, an assistant professor in Georgia Tech’s School of Interactive Computing, who specializes in machine learning and human-robot interactions.
Among projects in Chernova’s lab, graduate students are investigating interruptibility, an area where little robotics research has been conducted.
Interactive robots are more effective if they can occasionally ask for direction or verify whether something should be done, Chernova points out. “Suppose you have a robot that cleans houses. Not everyone has the same type of house — and not everyone wants their house cleaned the same way.”
“Yet while humans are amazingly good at knowing when to interrupt, robots can be pretty rude. If robots perceive a human in the environment, they will approach, regardless of what the person is doing. And that interruption can take a toll if the person is engaged in a challenging task.”
With that in mind, the researchers are looking at when robots should interrupt a human and whom they should interrupt if there are multiple people nearby.
To collect training and testing data for the project, the researchers asked five people to engage in a series of activities such as drinking coffee, talking, or working on laptops. During each instance, the robot moved through a series of waypoints that enabled it to observe the group from different perspectives. Based on such social cues as body position and face gaze, along with sounds and what kind of objects the humans were holding, the robot determined on a scale of one to four how interruptible people were.
Four different types of temporal machine-learning models were tested: three types of conditional random fields (CRFs) and a hidden Markov model (HMM). Of these, the researchers showed that latent dynamic CRFs (LDCRFs) did the best job of predicting interruptibility. CRFs were expected to have the upper hand because they are discriminative models, which tend to perform better on classification tasks.
“Yet we were surprised how badly the HMMs did, because they have been used in a lot of prior work,” said graduate student Siddhartha Banerjee, who led the study. “The LDCRFs, which have hidden states, are useful in modeling unknown dependencies between what you observe and what you are trying to classify,” he explained. “So they can help you classify a sub-sequence of whatever movements are observed, and based on the classification of the sequences, you classify the actual interruptibility state.”
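To make the generative-versus-discriminative contrast a little more concrete, here is a minimal sketch of the generative (HMM) route: one small HMM per interruptibility level, with a new observation sequence assigned to whichever model gives it the higher likelihood. The two levels, the two observation symbols, and every probability below are invented for illustration; this is not the researchers' model or data, and it does not implement a CRF.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM (scaled forward algorithm)."""
    n = len(start)
    # alpha[s]: probability of the observations so far, ending in hidden state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        log_lik += math.log(scale)          # accumulate the scaling factors
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    return log_lik + math.log(sum(alpha))

# Hypothetical observation symbols: 0 = gaze on task, 1 = gaze toward the robot.
# Each model is (start probabilities, transition matrix, emission matrix).
MODELS = {
    "busy": ([0.8, 0.2], [[0.9, 0.1], [0.5, 0.5]], [[0.9, 0.1], [0.6, 0.4]]),
    "interruptible": ([0.3, 0.7], [[0.5, 0.5], [0.1, 0.9]], [[0.6, 0.4], [0.1, 0.9]]),
}

def classify(obs):
    """Pick the class whose HMM assigns the sequence the highest likelihood."""
    return max(MODELS, key=lambda name: forward_log_likelihood(obs, *MODELS[name]))
```

A discriminative model such as a CRF would instead learn the mapping from observations to labels directly, rather than modeling how each class generates its observations.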
Another novel contribution was the use of object labels, which enabled the robot to identify objects either being used by a person or located nearby. These labels boosted performance, improving accuracy by up to 10 percent. “If a person is holding a cell phone versus a cup of coffee, it can help suggest what activity they are engaged in and how interruptible they might be,” Chernova said. “This may sound intuitive, but it’s never been demonstrated before.”
Moving forward, the researchers are trying to gauge the impact of having interruption-savvy robots around. Among other variables, they are measuring how much better these interactive robots perform and how much longer people need to complete their tasks after being interrupted.
Funded by NASA, the research could have implications in joint human-robot space missions. “Figuring out when to interrupt a busy astronaut is an important consideration,” Chernova pointed out.
Smarter, safer cars
Operating safely in the external environment is especially critical for robots on wheels, and researchers in the School of Interactive Computing (IC) and the Daniel Guggenheim School of Aerospace Engineering (AE) are investigating advanced perception and control for autonomous cars.
In one project, Assistant Professor Byron Boots (IC), Professor James Rehg (IC), Assistant Professor Evangelos Theodorou (AE), and graduate students Grady Williams, Nolan Wagener, Brian Goldfain, and Paul Drews are developing technologies that enable robotic cars to perform better on rough terrain.
The small autonomous cars, known as AutoRally Robots, are about three feet long and equipped with electric motors, onboard computing resources, cameras, GPS, and a variety of sensors. “They have to be able to see the environment, calculate information about how they relate to the environment, and then make a sequence of decisions extremely quickly in order to drive as fast as possible and stay upright,” said Boots, director of Georgia Tech’s Robot Learning Lab. No easy feat, because the surface of the dirt test track causes the cars to drift and slide and have a lot of nonlinear interactions, which are tough to model.
The team combined traditional machine-learning algorithms with a control system that enabled the robotic cars to continuously sample data from sensors and generate optimal trajectories for both performance and control. “The algorithm allows the robot to predict the future,” explained Williams, lead author on two published papers. “Assessing its state, the robot tries to predict what will happen if it tries a set of actions — something it does millions of times per second.”
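The idea of predicting what will happen under many candidate action sequences can be sketched in a few lines. The toy below is a simplified, sampling-based planner in the spirit of what Williams describes, not the team's algorithm: the one-dimensional "car," the cost function, the sample counts, and the function names are all assumptions made for illustration.

```python
import math
import random

def rollout_cost(x, actions, target, dt):
    """Simulate a candidate action sequence and total up its cost."""
    cost = 0.0
    for u in actions:
        x = x + u * dt               # toy forward model: position changes by u * dt
        cost += (x - target) ** 2    # penalize distance from the target
    return cost

def plan_first_action(x, target, rng, n_samples=200, horizon=10, dt=0.1, temperature=1.0):
    """Sample many action sequences and blend their first actions, favoring low cost."""
    samples = [[rng.uniform(-1.0, 1.0) for _ in range(horizon)] for _ in range(n_samples)]
    costs = [rollout_cost(x, s, target, dt) for s in samples]
    best = min(costs)
    # Exponential weighting: cheaper rollouts get more say in the chosen action.
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return sum(w * s[0] for w, s in zip(weights, samples)) / total

def drive(x0=0.0, target=1.0, steps=60, dt=0.1, seed=0):
    """Replan at every step, apply only the first action, then repeat."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        u = plan_first_action(x, target, rng, dt=dt)
        x = x + u * dt
    return x
```

Calling drive() steers the toy car from 0 toward the target at 1. A real system would evaluate far more rollouts in parallel, with a much richer vehicle model and cost.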
In a more recent advance, the team replaced the physics-based model (a linear regression approach) with multi-layer neural networks to handle the prediction. “This is a significant step forward because it enables a purely data-driven approach to model learning,” said Williams. “Instead of writing down equations of motions, we apply a deep neural network, and it learns the forward model from data.”
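As a rough illustration of what learning the forward model from data can mean, the sketch below trains a very small neural network to predict the next state from the current state and action. Everything here is assumed for the example: the made-up one-dimensional dynamics, the network size, and the training settings bear no relation to the AutoRally models.

```python
import math
import random

def true_dynamics(x, u):
    """Stand-in for the real vehicle: an invented nonlinear response."""
    return x + 0.1 * u - 0.05 * x * abs(x)

class TinyForwardModel:
    """One-hidden-layer network predicting the next state from (state, action)."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _hidden(self, x, u):
        return [math.tanh(w[0] * x + w[1] * u + b) for w, b in zip(self.w1, self.b1)]

    def predict(self, x, u):
        h = self._hidden(x, u)
        return sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train_step(self, x, u, target, lr=0.05):
        """One stochastic gradient step on the squared prediction error."""
        h = self._hidden(x, u)
        err = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2 - target
        for j, hj in enumerate(h):
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * err * hj
            self.w1[j][0] -= lr * grad_pre * x
            self.w1[j][1] -= lr * grad_pre * u
            self.b1[j] -= lr * grad_pre
        self.b2 -= lr * err

def mean_squared_error(model, data):
    return sum((model.predict(x, u) - y) ** 2 for x, u, y in data) / len(data)

def fit_forward_model(n_points=200, epochs=50, seed=0):
    """Collect (state, action, next state) samples and fit the network to them."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((x, u, true_dynamics(x, u)))
    model = TinyForwardModel(seed=seed)
    before = mean_squared_error(model, data)
    for _ in range(epochs):
        for x, u, y in data:
            model.train_step(x, u, y)
    return model, before, mean_squared_error(model, data)
```

The network never sees the equation inside true_dynamics; it only sees examples of what happened, which is the point of a data-driven forward model.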
With this approach, the researchers achieved better results, increasing speed without sacrificing control. The team’s most recent paper was presented in 2017 at the IEEE’s International Conference on Robotics and Automation in Singapore; out of 900 submissions, it was one of five nominees for the best paper award.
The researchers are now upping the ante as they test the AutoRally cars racing against each other instead of driving solo on the track. “This becomes a much more complicated problem because not only does the car have to reason with itself and its own interactions with the environment, but also about other agents in the same environment,” said Boots.
Sponsored by the National Science Foundation (NSF), the research has applications both in the automotive industry and the military.
“When it comes to machine learning, there hasn’t been a huge amount of work done on how to apply artificial intelligence to physical systems that must go out in the world and learn to act in an optimal way,” Boots added. “We’re trying to use machine learning to take robotics to the next level.”
Mind over matter — sort of
Another relatively new sphere for machine learning is brain-machine interfaces (BMIs), a field where researchers are trying to restore movement for people who are paralyzed due to spinal cord injuries or diseases like amyotrophic lateral sclerosis (ALS).
“When paralyzed people think about moving, nothing happens because there is an interruption between their brains and bodies,” explained Chethan Pandarinath, a new assistant professor in Georgia Tech’s and Emory University’s joint Wallace H. Coulter Department of Biomedical Engineering. “But they can still think about movement, which generates electrical activity in the brain as millions of neurons start talking to each other. Our goal is to decode this activity and do something useful for people, such as restoring their ability to drink by being able to control a robotic arm and pick up a glass of water.”
In an ongoing multi-institutional clinical trial known as BrainGate, Pandarinath and other researchers have created a high-performance BMI for communications that enables people with paralysis to type words and messages. To develop the system, small electrode arrays the size of baby aspirins were implanted into the brains of three research participants with paralysis. These electrodes allowed the system to record and decode electrical activity in real time as the individuals thought about moving, which allowed them to move and click a cursor on a computer screen. This sort of cerebral typing produced six to eight words per minute — three times faster than any previous system.
The key to BrainGate’s success is running two algorithms in parallel. A Kalman filter decodes what direction the cursor should move, while a hidden Markov model decodes whether the person wants to stop and click or continue to move the cursor. Results were extremely accurate, with people clicking on their intended target more than 98 percent of the time.
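For readers curious about what a Kalman filter does, here is a minimal one-dimensional sketch: it blends each noisy measurement with its running estimate, trusting the measurement more when its own uncertainty is high. The actual BrainGate decoder works on activity from many electrodes and a two-dimensional cursor; the function name, noise values, and scalar state below are assumptions for illustration, and the click-detecting HMM is not shown.

```python
def kalman_1d(measurements, process_var=1e-3, meas_var=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a quantity assumed to change slowly over time."""
    x, p = x0, p0                   # current estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var         # predict: uncertainty grows between samples
        k = p / (p + meas_var)      # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)         # update the estimate toward the measurement
        p = (1.0 - k) * p           # uncertainty shrinks after the update
        estimates.append(x)
    return estimates

# Example: a noisy signal hovering around an intended velocity of 1.0.
noisy = [1.2, 0.8] * 10
smoothed = kalman_1d(noisy)
```

Here the smoothed estimates settle near 1.0 and swing less than the raw measurements do.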
Although a big step forward, the point-and-click control of the computer cursor is fairly simple compared to all the things people do when they move, Pandarinath points out. “Our goal is to accurately manipulate more complex devices under brain control, such as robotic arms or ultimately the person’s own arm.”
Deep-learning techniques may pave the way for such devices. In collaboration with research scientist David Sussillo at Google Brain, Pandarinath’s team is trying to understand how recurrent neural networks (deep-learning algorithms used in language translation and speech processing) could be adapted for neuroscience.
“With our current electrode arrays we’re only recording 100 to 200 neurons, which is a small fraction of all the millions of neurons involved when people think about moving,” Pandarinath said. “Yet from this tiny snapshot, deep learning tools might help us model what the entire network is doing, which could help us restore movement.”
Paint by networks
Deep learning, a machine-learning technique loosely modeled on the human brain that uses layered, hierarchical algorithms rather than shallow ones, has been getting a lot of hype. Although these large neural networks are hardly a new phenomenon, they are becoming much more mainstream thanks to more affordable, scalable computing power.
“It’s what almost all computer vision projects are built on, and it’s replacing parts of computer graphics,” noted James Hays, an associate professor in IC. “It’s the better machine-learning tool for interpreting images.”
“Deep learning means you don’t need to hand craft feature extraction anymore,” he explained. “Instead, part of the learning process is for the machine to automatically figure out the right representation for your data.”
Leveraging deep-learning tools, researchers in Hays’ lab are advancing image synthesis with a real-time technology that produces realistic color images from rough sketches. Dubbed “Scribbler,” the project is funded by NSF and Adobe, and its secret sauce involves generative adversarial networks (GANs).
“GANs are a deep-learning tool that emerged in the last couple of years to train neural networks to create images,” said Hays, noting that most deep-learning techniques are for classifying images rather than creating them. With GANs, the learning process becomes somewhat of a game: While one network uses training data to produce images, another deep-learning network, known as a discriminator network, is trying to tell if the outputs from the first network are real or fake.
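The game between the two networks can be written down as a pair of loss functions. The sketch below uses the standard GAN losses (with the common non-saturating form for the generator) and operates on plain lists of discriminator outputs rather than real networks; the function names are illustrative, and Scribbler's actual training objective may include additional terms.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0.

    d_real, d_fake: discriminator outputs in (0, 1), read as 'probability real'.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring generated images near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that tells real from fake has a lower loss than one that guesses.
sharp = discriminator_loss([0.9, 0.8], [0.1, 0.2])
guessing = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

Training alternates between lowering each loss in turn, so each network's progress makes the other's job harder.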
The research team, which includes Patsorn Sangkloy, Jingwan Lu, Chen Fang, and Fisher Yu, is one of the first to demonstrate that an adversarial deep architecture can generate realistic images from imperfect sketches and sparse color strokes. “The networks fill in details and shading in a convincing way,” Hays explained. “For example, if you put a little red on top of an object, the network will take that suggestion and make the object red in an intelligent way, meaning it will propagate that color where it’s appropriate and not bleed through to other objects.”
The researchers are also working on texture suggestions for Scribbler: Users can drag a swatch of leather or fabric over an object within a sketch, and the network will synthesize that texture into the final image.
Scribbler’s potential applications include architectural previews, helping police artists make their sketches more realistic, and improving product browsing for online shoppers. But ultimately, Hays envisions even broader use: image-synthesis tools for non-experts. “Granted, current computer graphics technology can produce beautiful images, but it’s extremely challenging, and only accessible to trained people,” Hays said. “We want to make this kind of image creation as easy as text creation.”
Having demonstrated Scribbler can work in three domains (rooms, cars, and faces), the researchers are now trying to scale the technology for other categories and less accurate sketches. “Right now, we need pretty accurate input sketches to produce a realistic image,” Hays said. “The goal is for Scribbler to take bad drawings and produce good photos.”
In a related project, Hays’ team is using crowdsourcing to build a large database of paired sketches and photos and then create a deep learning system to retrieve the best photos that match what someone has drawn. The first and largest of its kind, this database could be used as training data for image synthesis — important because deep learning is data hungry, Hays noted. “It requires a lot of computation and training data before it’s an advantageous strategy.”
Cross-pollination at work
Collaboration is a hallmark of Georgia Tech’s machine-learning community, researchers say, which is important in moving theoretical advances into the real world.
Wenjing Liao, a recent recruit from Johns Hopkins University, says she was attracted by Georgia Tech’s strength in science and engineering as well as its interdisciplinary environment. As an assistant professor in the School of Mathematics, Liao is pioneering multiscale techniques to reduce high-dimensional data to low-dimensional objects, which would improve accuracy, save storage space, and reduce computing time in machine learning. “Many interesting machine learning problems arise in biology, electrical engineering, and chemical engineering,” she said. “I knew I would find interesting applications for my theory by talking to people in other departments here — and the needs in applications will inspire me to build good models for theoretical analysis.”
“The Center for Machine Learning has heavy involvement from many, many departments,” said Mark Davenport, associate professor in Georgia Tech’s School of Electrical and Computer Engineering (ECE). “It includes not only dedicated machine-learning researchers but also researchers from other areas of science and engineering who realize machine learning can help solve a problem they want to look at. I think this gives our machine learning initiative a unique flavor — it’s a very fun, collaborative environment.”
An expert in signal processing and machine learning, Davenport spends a lot of time investigating the mathematics of algorithms to understand why they work and how well they work, and identify practical shortcomings. “There are usually significant gaps between the abstract problem formulation and what you encounter in a real-world application,” he said.
He points to an activity tracker for wheelchair users, a project led by Sharon Sonenblum, a senior research scientist in the George W. Woodruff School of Mechanical Engineering (ME), and Stephen Sprigle, a professor in ME and the School of Industrial Design. The researchers are trying to prevent pressure ulcers, which are a leading cause of death for people with spinal cord injuries. Their inexpensive sensor system (think Fitbit) can be installed beneath seat cushions of wheelchairs to monitor movement and weight-shifting in real time. Powered by signal processing and machine-learning classification algorithms, the system will remind people to move, analyze their daily activity, and provide suggestions for improving their activity levels.
Yet designing machine-learning algorithms that are robust outside of the lab can be tricky. For example, during field-testing of an alpha version of the device, data seemed to indicate that one person was getting in and out of his wheelchair at odd hours of the night. As it turned out, instead of an insomniac human, the sensor was detecting a cat jumping into the chair.
Having solved such hiccups, the researchers have refined the activity tracker and are getting ready to conduct longer, more extensive testing.
“What’s interesting about this phase is we’ll be able to ask scientific questions that couldn’t be answered before, such as what kind of behavior actually helps prevent pressure ulcers from forming,” Davenport said. “The data we collect will give doctors a better idea of how to really help their patients.”
In another project, Davenport, ECE Professor David Anderson, and graduate students are working with researchers at the University of Utah to develop an audio-based monitoring system for construction sites. The NSF-funded project would track the location of key equipment across time, recognize what activities it was engaged in, and predict future actions.
“What’s especially challenging here is dealing with the combination of different machines operating at the same time, identifying what they’re doing, and then summarizing it into one coherent picture that is useful for a human project manager,” Davenport said. To achieve this, the researchers are using a combination of techniques from machine learning, such as matrix factorization, to separate out the underlying acoustical components, along with signal processing methods that leverage the presence of multiple microphones. Initial results have been promising, and researchers are now combining multiple sensors and machines. If successful, the audio-monitoring system would be an advance in gleaning information that could make the construction industry more efficient and reduce costs.
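One common flavor of matrix factorization for pulling apart overlapping sounds is non-negative matrix factorization (NMF), which splits a spectrogram-like matrix into a few spectral templates and their activity over time. The article does not say which factorization the team uses, so the sketch below, with its classic multiplicative update rules and a tiny invented two-machine example, is an assumption offered only to show the idea.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w * h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (freq x time) into w (freq x rank) and h (rank x time)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(rows)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rank)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(iterations):
        # Multiplicative updates keep every entry non-negative.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(cols)] for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(rows)]
    return w, h, initial_error

# Invented example: two machines with different spectra, active at different times.
templates = [[1.0, 0.0], [0.5, 0.0], [0.0, 0.8], [0.0, 1.0]]      # 4 frequency bins
activity = [[1.0, 1.0, 0.0, 0.0, 1.0, 0.5], [0.0, 1.0, 1.0, 0.0, 0.0, 0.5]]
mixture = matmul(templates, activity)
w, h, initial_error = nmf(mixture, rank=2)
```

Each column of w can then be read as one machine's sound signature, and each row of h as when that machine was running.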
A fundamental problem for identifying anomalies in data or events, change-point detection has been a longstanding topic in statistics and signal processing circles. Yet there is a gap between theory and real-world applications, partly due to the complex nature of big data.
That’s something Yao Xie is trying to change. The Harold R. and Mary Anne Nash Assistant Professor in Georgia Tech’s H. Milton Stewart School of Industrial and Systems Engineering, Xie is not only developing theoretical foundations but also algorithms for new types of change-point detection problems related to modern applications. For example, as the recipient of an NSF Career Award, Xie is detecting changes in streaming data over dynamic networks.
And in another NSF-funded project, Xie is collaborating with researchers at the University of Utah on a seismic sensor network that could shed new light on subsurface imaging. The researchers have deployed hundreds of geophones around the Old Faithful geyser in Yellowstone National Park, and Xie’s team has developed algorithms to process data gleaned from this sensor network about the geyser’s eruptions in real time. “We’re able to see interesting features, such as the moment of change and the direction of the change to happen,” Xie said. “In addition to better understanding the geophysical dynamics of the geyser, the algorithms could be applied to other scenarios, such as monitoring the health of bridges and buildings in Smart Cities applications.”
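For a sense of what change-point detection looks like in its simplest textbook form, the sketch below implements a one-sided CUSUM detector, which accumulates evidence that a stream's average has shifted upward and raises an alarm once that evidence crosses a threshold. This is a classic method, not one of Xie's algorithms, and the baseline, drift, and threshold values are assumptions chosen for the example.

```python
def cusum_alarm(stream, baseline=0.0, drift=0.5, threshold=3.0):
    """Return the index at which an upward shift is flagged, or None if none is."""
    evidence = 0.0
    for i, x in enumerate(stream):
        # Add how far the sample sits above baseline (minus a slack term);
        # never let the accumulated evidence drop below zero.
        evidence = max(0.0, evidence + (x - baseline - drift))
        if evidence > threshold:
            return i
    return None

# A stream whose level jumps from 0 to 1 at index 20.
stream = [0.0] * 20 + [1.0] * 20
alarm_index = cusum_alarm(stream)   # flagged a few samples after the jump
```

The gap between the true change and the alarm is the detection delay; lowering the threshold shortens it at the price of more false alarms on noisy data.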
In other projects, Xie is working with:
- The Atlanta Police Department to develop machine-learning algorithms that can help process police reports more efficiently and detect hidden correlations among different crime incidents. That’s no easy undertaking, because the reports are written by different officers, usually in haste, and there’s no standard format. The researchers have already demonstrated promising results, using hand-picked data for which results are already known, and now are scaling the algorithms to handle continuous streaming data.
- Researchers in Georgia Tech’s School of Materials Science and Engineering to detect change points in a sequence of diffraction patterns for materials that are generated in real time by electron beams. “In this context, change-point is linked to phase transition of the material,” Xie said. “Being able to detect such transition quickly in-situ is critical in relating the material property to its corrosion susceptibility.”
Within the flooded information environment, decision-makers need better tools for monitoring and identifying suspicious activity. In response, researchers at the Georgia Tech Research Institute (GTRI) are creating a high-performance system that automatically detects real-world events across multiple news sources — and ultimately could be used to forecast future behavior.
Known as Multi-Source Anticipatory Intelligence, or MANIC for short, the system is designed to combine a variety of content (text, photos, satellite imagery, readers’ comments, etc.) from news sources, extract information from it, and analyze the connectivity structure of the articles, explained Zsolt Kira, branch chief of machine learning and analytics at GTRI. The system would not only be a boon for fake news detection but also could advance geopolitical analysis.
Led by Erica Briscoe, chief scientist in GTRI’s Aerospace, Transportation & Advanced Systems Laboratory, the team is using deep-learning techniques in tandem with factor graphs, a technique for modeling probability distributions. The goal is twofold: to track changes in news over time, and to enable one method to inform the other. The bottom-level, deep-learning-based neural network takes raw data and extracts pieces of information, while the top-level factor graph has the higher-level task of depicting the sought-after inference, such as fake news, Kira explained. “By enabling one method to inform the other, MANIC will be able to optimize and update the low-level, deep-learning-based processing so that it can do a better job of processing at the graph level.”
In another machine-learning project, GTRI researchers are trying to improve autonomous vehicles’ ability to detect pedestrians. In lieu of a single modality neural network architecture, they are developing fusion models for multiple sensors (in this case color images and LIDAR). Yet there’s a trade-off between the size of the neural networks and the accuracy, depending on where the two different modalities are fused, Kira explained.
In past work, people have explored two extremes: combining different sensor modalities from the get-go and then running answers through a deep learning network, and processing individual modalities and then combining those answers. “Early fusion gives you leaner, smaller networks, but they are less accurate,” Kira said. “The second approach is more accurate but involves much larger networks that are harder to run on smaller processors.”
The GTRI team has demonstrated a middle-ground approach. In a nutshell, a little processing is done on the individual modalities; then the modalities are combined and the remaining processing is done. This provides optimal speed without sacrificing accuracy, Kira said. More good news: the fusion method eliminates the need for pre-training these networks on very large datasets.
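The size side of this trade-off can be illustrated with simple arithmetic. The sketch below counts the weights in a stack of fully connected layers when two sensor streams are merged early, in the middle, or late. The layer sizes and function names are hypothetical, the real networks are not built this way, and the count says nothing about accuracy; it only shows why fusing later tends to mean a bigger network.

```python
def dense_params(n_in, n_out):
    """Weights plus biases in one fully connected layer."""
    return n_in * n_out + n_out

def fusion_param_count(input_dim, width, depth, fuse_at):
    """Parameter count for a two-sensor network that merges after `fuse_at` layers.

    fuse_at = 0      -> early fusion (raw inputs are concatenated)
    fuse_at = depth  -> late fusion (each sensor gets its own full network)
    anything between -> middle fusion
    """
    total = 0
    n_in = input_dim
    # Separate branches, one per sensor, before the merge point.
    for _ in range(fuse_at):
        total += 2 * dense_params(n_in, width)
        n_in = width
    # After the merge, a single shared trunk sees both streams concatenated.
    n_in = 2 * n_in
    for _ in range(depth - fuse_at):
        total += dense_params(n_in, width)
        n_in = width
    # Final output: a single pedestrian / no-pedestrian score.
    return total + dense_params(n_in, 1)

early = fusion_param_count(input_dim=64, width=32, depth=4, fuse_at=0)
middle = fusion_param_count(input_dim=64, width=32, depth=4, fuse_at=1)
late = fusion_param_count(input_dim=64, width=32, depth=4, fuse_at=4)
```

With these made-up sizes, the early-fusion network is the smallest, the late-fusion network the largest, and the middle option falls in between.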
GTRI bolsters Georgia Tech’s bench strength in machine learning with its “soup-to-nuts” capabilities, points out Essa. “GTRI can do everything from basic research to actual deliverables for government and industry partners.”
“We are in the era of aggregation where we have many technologies to collect and aggregate data,” Essa added. “We need to move towards an era of sense-making, where we find inferences from these data — and machine learning is the key to this transition.”
T.J. Becker is a freelance writer based in Michigan. She writes about business and technology issues.