Posted on Leave a comment

Nvidia’s Next GPU Shows That Transformers Are Transforming AI

The neural network behind big language processors is creeping into other corners of AI

Transformers, the type of neural network behind OpenAI’s GPT-3 and other big natural-language processors, are quickly becoming some of the most important in industry, and they are likely to spread to other—perhaps all—areas of AI. Nvidia’s new Hopper H100 is proof that the leading maker of chips for accelerating AI is a believer. Among the many architectural changes that distinguish the H100 from its predecessor, the A100, is a “transformer engine.” Not a distinct part of the new hardware exactly, it’s a way of dynamically changing the precision of the calculations in the cores to speed up the training of transformer neural networks.

“One of the big trends in AI is the emergence of transformers,” says Dave Salvator, senior product manager for AI inference and cloud at Nvidia. Transformers quickly took over language AI, because their networks pay “attention” to multiple sentences, enabling them to grasp context and antecedents. (The T in the benchmark language model BERT stands for “transformer” as it does in the occasionally insulting GPT-3.)

“We are trending very quickly toward trillion parameter models” —Dave Salvator, Nvidia

But more recently, researchers have been seeing an advantage to applying that same sense of attention to vision and other models dominated by convolutional neural networks. Salvator notes that more than two-thirds of papers about neural networks in the last two years dealt with transformers or their derivatives. “The number of challenges transformers can take on continues to grow,” he says.

However, transformers are among the biggest neural-network models in terms of the number of parameters involved. And they are growing much faster than other models. “We are trending very quickly toward trillion-parameter models,” says Salvator. Nvidia’s analysis shows the training needs of transformer models growing 275-fold every two years, while the trend for all other models is 8-fold growth every two years. Bigger models need more computational resources especially for training, but also for operating in real time as they often need to do. Nvidia developed the transformer engine to help keep up.

 The computational needs of transformers are growing more rapidly than those of other forms of AI. Those are growing really fast, too, of course.NVIDIA

The transformer engine is really software combined with new hardware capabilities in Hopper’s tensor cores. These are the units dedicated to carrying out machine learning’s bread-and-butter calculation—matrix multiply and accumulate. Hopper has tensor cores capable of computing with floating-point numbers of a variety of precision—from 64-bit down to 8-bit. The A100’s cores were designed for floating-point numbers only as short as 16 bits. But the trend in AI computing has been toward developing neural nets that lean on the lowest precision that will still yield an accurate result. The smaller formats compute faster and more efficiently, and they require less memory and memory bandwidth. The addition of 8-bit floating-point units in the H100 leads to a significant speedup—double the throughput compared to its 16-bit units.

The transformer engine’s secret sauce is its ability to dynamically choose what precision is needed for each layer in the neural network at each step in training a neural network. The least-precise units, the 8-bit floating point, can speed through their computations, but then produce 16-bit or 32-bit sums for the next layer if that’s the precision needed there. The Hopper goes a step further, though. Its 8-bit floating-point units can do their matrix math with either of two forms of 8-bit numbers.

To understand why that’s helpful, you might need a quick lesson in the structure of floating-point numbers. This format represents numbers using some of the bits for the exponent, some for the mantissa, and one for the sign. The more bits you have representing the exponent, the greater the range of numbers you can express. The more bits in the mantissa, the greater the precision of those numbers. The standard 16-bit floating-point format (IEEE 754-2008) demands 5 bits of exponent and 10 bits of mantissa, along with the sign bit. Seeking to reduce data-storage requirements and speed machine learning, makers of AI accelerators recently adopted bfloat-16, which trades three bits of mantissa for an added exponent, giving it the same range as a 32-bit number.

Nvidia has taken that trade-off further. “One of the unique things we found when you get to [8-bit] is that there really isn’t a one size fits all format that we were confident would work,” says Jonah Alben, Nvidia’s senior vice president of GPU engineering. So Hopper’s 8-bit units can work with either 5 bits of exponent and two of mantissa (E5M2) when range is important or 4 bits of exponent and three of mantissa (E4M3) when precision is key. The transformer engine orchestrates what’s needed on the fly to speed training. We “embody our experience testing transformers into this so that it knows how to make the right decisions,” says Alben.

In practice, this usually means using different types of floating-point formats for the different parts of a training task. Generally, training a neural network involves exposing it to lots of data (forward inferencing), measuring how bad the network is at doing its task on that data, and then adjusting the network parameters, layer-by-layer backwards through the network to improve it (back propagation). Wash, rinse, repeat. Generally, back propagation needs greater precision, so the E4M3 format might be favored there, while the inferencing (forward) step favors the E5M3’s range.

Nvidia is not alone in pursuing this approach. At the IEEE/ACM International Symposium on Computer Architecture in 2021IBM researchers presented an accelerator called RaPiD that used the E5M2/E4M3 scheme for training, as well. A system of four such chips delivered training speedups between 10 and 100 percent, depending on the neural network involved.

Nvidia’s Hopper will be available in the third quarter of 2022.

Posted on Leave a comment

An extreme form of encryption could solve big data’s privacy problem

Fully homomorphic encryption allows us to run analysis on data without ever seeing the contents. It could help us reap the full benefits of big data, from fighting financial fraud to catching diseases early

LIKE any doctor, Jacques Fellay wants to give his patients the best care possible. But his instrument of choice is no scalpel or stethoscope, it is far more powerful than that. Hidden inside each of us are genetic markers that can tell doctors like Fellay which individuals are susceptible to diseases such as AIDS, hepatitis and more. If he can learn to read these clues, then Fellay would have advance warning of who requires early treatment.

This could be life-saving. The trouble is, teasing out the relationships between genetic markers and diseases requires an awful lot of data, more than any one hospital has on its own. You might think hospitals could pool their information, but it isn’t so simple. Genetic data contains all sorts of sensitive details about people that could lead to embarrassment, discrimination or worse. Ethical worries of this sort are a serious roadblock for Fellay, who is based at Lausanne University Hospital in Switzerland. “We have the technology, we have the ideas,” he says. “But putting together a large enough data set is more often than not the limiting factor.”

Fellay’s concerns are a microcosm of one of the world’s biggest technological problems. The inability to safely share data hampers progress in all kinds of other spheres too, from detecting financial crime to responding to disasters and governing nations effectively. Now, a new kind of encryption is making it possible to wring the juice out of data without anyone ever actually seeing it. This could help end big data’s big privacy problem – and Fellay’s patients could be some of the first to benefit.

It was more than 15 years ago that we first heard that “data is the new oil”, a phrase coined by the British mathematician and marketing expert Clive Humby. Today, we are used to the idea that personal data is valuable. Companies like Meta, which owns Facebook, and Google’s owner Alphabet grew into multibillion-dollar behemoths by collecting information about us and using it to sell targeted advertising.

Data could do good for all of us too. Fellay’s work is one example of how medical data might be used to make us healthier. Plus, Meta shares anonymised user data with aid organisations to help plan responses to floods and wildfires, in a project called Disaster Maps. And in the US, around 1400 colleges analyse academic records to spot students who are likely to drop out and provide them with extra support. These are just a few examples out of many – data is a currency that helps make the modern world go around.

Getting such insights often means publishing or sharing the data. That way, more people can look at it and conduct analyses, potentially drawing out unforeseen conclusions. Those who collect the data often don’t have the skills or advanced AI tools to make the best use of it, either, so it pays to share it with firms or organisations that do. Even if no outside analysis is happening, the data has to be kept somewhere, which often means on a cloud storage server, owned by an external company.

You can’t share raw data unthinkingly. It will typically contain sensitive personal details, anything from names and addresses to voting records and medical information. There is an obligation to keep this information private, not just because it is the right thing to do, but because of stringent privacy laws, such as the European Union’s General Data Protection Regulation (GDPR). Breaches can see big fines.

Over the past few decades, we have come up with ways of trying to preserve people’s privacy while sharing data. The traditional approach is to remove information that could identify someone or make these details less precise, says privacy expert Yves-Alexandre de Montjoye at Imperial College London. You might replace dates of birth with an age bracket, for example. But that is no longer enough. “It was OK in the 90s, but it doesn’t really work any more,” says de Montjoye. There is an enormous amount of information available about people online, so even seemingly insignificant nuggets can be cross-referenced with public information to identify individuals.

One significant case of reidentification from 2021 involves apparently anonymised data sold to a data broker by the dating app Grindr, which is used by gay people among others. A media outlet called The Pillar obtained it and correlated the location pings of a particular mobile phone represented in the data with the known movements of a high-ranking US priest, showing that the phone popped up regularly near his home and at the locations of multiple meetings he had attended. The implication was that this priest had used Grindr, and a scandal ensued because Catholic priests are required to abstain from sexual relationships and the church considers homosexual activity a sin.

A more sophisticated way of maintaining people’s privacy has emerged recently, called differential privacy. In this approach, the manager of a database never shares the whole thing. Instead, they allow people to ask questions about the statistical properties of the data – for example, “what proportion of people have cancer?” – and provide answers. Yet if enough clever questions are asked, this can still lead to private details being triangulated. So the database manager also uses statistical techniques to inject errors into the answers, for example recording the wrong cancer status for some people when totting up totals. Done carefully, this doesn’t affect the statistical validity of the data, but it does make it much harder to identify individuals. The US Census Bureau adopted this method when the time came to release statistics based on its 2020 census.

Trust no one

Still, differential privacy has its limits. It only provides statistical patterns and can’t flag up specific records – for instance to highlight someone at risk of disease, as Fellay would like to do. And while the idea is “beautiful”, says de Montjoye, getting it to work in practice is hard.

There is a completely different and more extreme solution, however, one with origins going back 40 years. What if you could encrypt and share data in such a way that others could analyse it and perform calculations on it, but never actually see it? It would be a bit like placing a precious gemstone in a glovebox, the chambers in labs used for handling hazardous material. You could invite people to put their arms into the gloves and handle the gem. But they wouldn’t have free access and could never steal anything.

This was the thought that occurred to Ronald Rivest, Len Adleman and Michael Dertouzos at the Massachusetts Institute of Technology in 1978. They devised a theoretical way of making the equivalent of a secure glovebox to protect data. It rested on a mathematical idea called a homomorphism, which refers to the ability to map data from one form to another without changing its underlying structure. Much of this hinges on using algebra to represent the same numbers in different ways.

Imagine you want to share a database with an AI analytics company, but it contains private information. The AI firm won’t give you the algorithm it uses to analyse data because it is commercially sensitive. So, to get around this, you homomorphically encrypt the data and send it to the company. It has no key to decrypt the data. But the firm can analyse the data and get a result, which itself is encrypted. Although the firm has no idea what it means, it can send it back to you. Crucially, you can now simply decrypt the result and it will make total sense.

“The promise is massive,” says Tom Rondeau at the US Defense Advanced Research Projects Agency (DARPA), which is one of many organisations investigating the technology. “It’s almost hard to put a bound to what we can do if we have this kind of technology.”

In the 30 years since the method was proposed, researchers devised homomorphic encryption schemes that allowed them to carry out a restricted set of operations, for instance only additions or multiplications. Yet fully homomorphic encryption, or FHE, which would let you run any program on the encrypted data, remained elusive. “FHE was what we thought of as being the holy grail in those days,” says Marten van Dijk at CWI, the national research institute for mathematics and computer science in the Netherlands. “It was kind of unimaginable.”

One approach to homomorphic encryption at the time involved an idea called lattice cryptography. This encrypts ordinary numbers by mapping them onto a grid with many more dimensions than the standard two. It worked – but only up to a point. Each computation ended up adding randomness to the data. As a result, doing anything more than a simple computation led to so much randomness building up that the answer became unreadable.

In 2009, Craig Gentry, then a PhD student at Stanford University in California, made a breakthroughHis brilliant solution was to periodically remove this randomness by decrypting the data under a secondary covering of encryption. If that sounds paradoxical, imagine that glovebox with the gem inside. Gentry’s scheme was like putting one glovebox inside another, so that the first one could be opened while still encased in a layer of security. This provided a workable FHE scheme for the first time.

Workable, but still slow: computations on the FHE-encrypted data could take millions of times longer than identical ones on raw data. Gentry went on to work at IBM, and over the next decade, he and others toiled to make the process quicker by improving the underlying mathematics. But lately the focus has shifted, says Michael Osborne at IBM Research in Zurich, Switzerland. There is a growing realisation that massive speed enhancements can be achieved by optimising the way cryptography is applied for specific uses. “We’re getting orders of magnitudes improvements,” says Osborne.

IBM now has a suite of FHE tools that can run AI and other analyses on encrypted data. Its researchers have shown they can detect fraudulent transactions in encrypted credit card data using an artificial neural network that can crunch 4000 records per second. They also demonstrated that they could use the same kind of analysis to scour the encrypted CT scans of more than 1500 people’s lungs to detect signs of covid-19 infection.

Also in the works are real-world, proof-of-concept projects with a variety of customers. In 2020, IBM revealed the results of a pilot study conducted with the Brazilian bank Banco Bradesco. Privacy concerns and regulations often prevent banks from sharing sensitive data either internally or externally. But in the study, IBM showed it could use machine learning to analyse encrypted financial transactions from the bank’s customers to predict if they were likely to take out a loan. The system was able to make predictions for more than 16,500 customers in 10 seconds and it performed just as accurately as the same analysis performed on unencrypted data.

Suspicious activity

Other companies are keen on this extreme form of encryption too. Computer scientist Shafi Goldwasser, a co-founder of privacy technology start-up Duality, says the firm is achieving significantly faster speeds by helping customers better structure their data and tailoring tools to their problems. Duality’s encryption tech has already been integrated into the software systems that technology giant Oracle uses to detect financial crimes, where it is assisting banks in sharing data to detect suspicious activity.

Still, for most applications, FHE processing remains at least 100,000 times slower compared with unencrypted data, says Rondeau. This is why, in 2020, DARPA launched a programme called Data Protection in Virtual Environments to create specialised chips designed to run FHE. Lattice-encrypted data comes in much larger chunks than normal chips are used to dealing with. So several research teams involved in the project, including one led by Duality, are investigating ways to alter circuits to efficiently process, store and move this kind of data. The goal is to analyse any FHE-encrypted data just 10 times slower than usual, says Rondeau, who is managing the programme.

Even if it were lightning fast, FHE wouldn’t be flawless. Van Dijk says it doesn’t work well with certain kinds of program, such as those that contain branching logic made up of “if this, do that” operations. Meanwhile, information security researcher Martin Albrecht at Royal Holloway, University of London, points out that the justification for FHE is based on the need to share data so it can be analysed. But a lot of routine data analysis isn’t that complicated – doing it yourself might sometimes be simpler than getting to grips with FHE.

For his part, de Montjoye is a proponent of privacy engineering: not relying on one technology to protect people’s data, but combining several approaches in a defensive package. FHE is a great addition to that toolbox, he reckons, but not a standalone winner.

This is exactly the approach that Fellay and his colleagues have taken to smooth the sharing of medical data. Fellay worked with computer scientists at the Swiss Federal Institute of Technology in Lausanne who created a scheme combining FHE with another privacy-preserving tactic called secure multiparty computation (SMC). This sees the different organisations join up chunks of their data in such a way that none of the private details from any organisation can be retrieved.

In a paper published in October 2021, the team used a combination of FHE and SMC to securely pool data from multiple sources and use it to predict the efficacy of cancer treatments or identify specific variations in people’s genomes that predict the progression of HIV infection. The trial was so successful that the team has now deployed the technology to allow Switzerland’s five university hospitals to share patient data, both for medical research and to help doctors personalise treatments. “We’re implementing it in real life,” says Fellay, “making the data of the Swiss hospitals shareable to answer any research question as long as the data exists.”

If data is the new oil, then it seems the world’s thirst for it isn’t letting up. FHE could be akin to a new mining technology, one that will open up some of the most valuable but currently inaccessible deposits. Its slow speed may be a stumbling block. But, as Goldwasser says, comparing the technology with completely unencrypted processing makes no sense. “If you believe that security is not a plus, but it’s a must,” she says, “then in some sense there is no overhead.”

NewScientist

6 April 2022

By Edd Gent