Presented by SambaNova Systems
To stay on top of cutting-edge AI innovation, it’s time to upgrade your technology stack. In this VB Live event, you’ll learn how innovations in NLP, visual AI, recommendation models, and scientific computing are pushing computer architecture to the cutting edge.
Access free on demand here.
Innovations in AI and machine learning demand more compute power than ever, but today’s chips can’t keep up. Moore’s Law, the observation that the number of transistors on a chip doubles roughly every two years, making chips steadily smaller, cheaper, and more powerful, has hit a wall. The question becomes how to leverage AI innovations cost-effectively while keeping pace with the growing demand for compute power.
Beyond Moore’s Law
“The decline in Moore’s Law, the transistor and power issues, is starting to make itself felt in the industry,” says Alan Lee, corporate vice president and head of research and advanced development at AMD. “The new technologies we’re investigating center around modular technologies and 3D stacking, and heterogeneous systems.”
Rather than thinking in terms of traditional multicore designs, the focus is on how different kinds of compute units can be combined. Whether that means stacking them, placing them on the same die, or putting them in the same multi-chip module or system, the goal is to bring together the types of compute needed for AI, ML, HPC, and other kinds of computational science, in the right ratios to achieve both performance and efficiency.
“The key as Moore’s Law slows down is to become more efficient,” says Kunle Olukotun, co-founder and chief technologist at SambaNova Systems. “We all know that efficiency comes from specialization. But in the world of machine learning, you can’t just take an algorithm and cast it into silicon.”
The key is getting efficiency while maintaining the flexibility required to support innovation in machine learning algorithms. Machine learning application developers keep changing their algorithms, and keeping up with them requires a substrate that provides both efficiency and flexibility.
“What you need is an architecture which is more focused on how you support the execution requirements of the data flow within the application,” he says.
The characteristics of ML applications are unique, he points out: they consist of a set of kernels connected by different communication patterns, depending on the computational graph of the particular algorithm. Exploiting those characteristics requires an architecture that supports this structure natively and provides very efficient data flow execution from the chip level through the node level to the data center level.
The state of AI and ML innovation
Machine learning models have evolved over the last few years from convolutional models to recurrent neural net models to matrix-multiply-dominated models, and from dense models to sparse models, says Olukotun. That evolution shows no signs of stopping, though matrix multiply will always be a core component. The challenge is to keep putting those pieces together while flexibly supporting that kind of innovation.
“We’re in an evolutionary stage now,” Lee agrees. “We hit the next big plateau in ML, and the closer you can get to mapping or modeling a particular type of neural network or an equivalence class in neural networks, the more performance you’re going to see.”
He adds that we also have to keep in mind that there’s a large body of scientific and industrial exploration — hundreds of years of work on algorithms that also need to be brought to bear on similar problems.
“Many ML problems can inform high-performance computing problems and vice versa,” he says. “It’s certainly important to push the boundaries in specific areas, but it’s also important not to forget the past and realize that mathematical models in many cases can inform and be informed by this new branch of science enabled by big data, higher performance machines, and new ML algorithms.”
Olukotun points to the interaction between high performance computing and ML. Right now the scientists doing traditional simulation and engineering computations are reaching the limits of what they can do within a particular time frame, whether it’s simulating materials or trying to understand how turbulent flow works in jet engines. They’re looking for the marriage of ML and traditional simulation modeling.
The next game changer for the AI computing world
“One of the most difficult problems is trying to identify, from those thousands and thousands of ideas, which will take the industry in a brand new and extraordinarily profitable direction,” Lee says. “It’s very easy to dismiss an idea, not realizing that changes in the technology, changes in the optimization, changes in compilers, can shake up the game in fantastic ways.”
For Olukotun, the next innovation will be around natively executing the global data flow of large models as ML algorithms evolve.
“If you look at current architectures, they’re focused on dense matrix multiply units, but they’re not thinking about sparsity or how these kernels communicate,” he says. “And so if you can capture this data flow on chip, you can get much more efficient execution of the whole computational graph. You don’t spend a lot of your time shuffling data between the chip and the off-chip high-bandwidth memory, as you do in traditional GPU architectures.”
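To make the point about data movement concrete, here is a minimal sketch in Python. It is a toy model, not a description of any vendor's hardware: the kernel names, the tensor sizes, and the `off_chip_traffic` helper are all hypothetical, and the cost model simply assumes that every intermediate result kept off chip is written once and read once.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical computational graph: (producer kernel, consumer kernel) -> bytes passed.
edges = {
    ("embed", "attention"): 4_000_000,
    ("attention", "mlp"): 4_000_000,
    ("embed", "mlp"): 4_000_000,   # a skip connection
    ("mlp", "output"): 1_000_000,
}

def execution_order(edges):
    """Return the kernels in an order that respects every producer -> consumer edge."""
    sorter = TopologicalSorter()
    for producer, consumer in edges:
        sorter.add(consumer, producer)  # consumer depends on producer
    return list(sorter.static_order())

def off_chip_traffic(edges, on_chip_edges=frozenset()):
    """Bytes moved through off-chip memory under the toy cost model.

    Each edge that is not kept on chip costs one write by the producer
    and one read by the consumer.
    """
    return sum(2 * nbytes for edge, nbytes in edges.items()
               if edge not in on_chip_edges)

# Kernel-by-kernel execution: every intermediate result goes off chip.
kernel_by_kernel = off_chip_traffic(edges)

# Dataflow-style execution (assumed): the first three edges stay on chip.
on_chip = frozenset({("embed", "attention"), ("attention", "mlp"), ("embed", "mlp")})
dataflow = off_chip_traffic(edges, on_chip)

print(execution_order(edges))
print(kernel_by_kernel, dataflow)
```

In this toy model, the only thing that changes between the two runs is which edges of the graph are allowed to stay on chip, which is the quantity Olukotun's argument turns on.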
Matrix multiplication is important, Lee agrees. AMD’s new CDNA technology can do matrix fused multiply-adds on a variety of operand sizes. What is critical is knowing how those operations fit together given how dense or sparse the problem is, and being able to exploit that, whether through libraries or compilers.
“All of these elements, including the matrix multiply-adds have been around for a while, but understanding different operand sizes and sparsity, and combining them in new ways is one of the largest trends in AI and ML today,” he says.
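As a rough illustration of how operand size and sparsity interact in a matrix multiply-add, the sketch below computes D = A x B + C in plain Python. It is not AMD's CDNA interface or any real library API: `to_half` and `matrix_fma` are hypothetical helper names, inputs are rounded to half precision while accumulation happens in Python's double-precision floats, and sparsity is handled by simply skipping zero entries of A.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def matrix_fma(a, b, c, skip_zeros=False):
    """Return (d, macs) where d = a @ b + c and macs counts multiply-adds done.

    With skip_zeros=True, zero entries of `a` are skipped, which models the
    work a sparsity-aware unit could avoid.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [row[:] for row in c]  # start from C, accumulate in wider precision
    macs = 0
    for i in range(rows):
        for k in range(inner):
            a_ik = a[i][k]
            if skip_zeros and a_ik == 0.0:
                continue
            for j in range(cols):
                d[i][j] += a_ik * b[k][j]
                macs += 1
    return d, macs

# Hypothetical operands: A is half zeros, inputs are narrowed to half precision.
a = [[to_half(1.0), 0.0], [0.0, to_half(2.0)]]
b = [[to_half(1.0), to_half(2.0)], [to_half(3.0), to_half(4.0)]]
c = [[0.5, 0.5], [0.5, 0.5]]

dense_result, dense_macs = matrix_fma(a, b, c)
sparse_result, sparse_macs = matrix_fma(a, b, c, skip_zeros=True)
print(dense_result, dense_macs, sparse_macs)
```

Both calls produce the same result matrix, but the sparsity-aware path performs half as many multiply-adds on this input; narrowing the operands trades some precision for smaller data, as `to_half(0.1)` shows when compared with `0.1`.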
For more insight into the future of computer architecture, the state of ML, and how successful companies can begin to leverage new technologies by evolving old tech ecosystems, access this VB Live event now.
Why multicore architecture is on its last legs, and how new, advanced computer architectures are changing the game
How to implement state-of-the-art converged training and inference solutions
New ways to accelerate data analytics and scientific computing applications in the same accelerator
Alan Lee, Corporate Vice President and Head of Research and Advanced Development, AMD
Kunle Olukotun, Co-founder and Chief Technologist, SambaNova Systems
Naveen Rao, Investor, Adviser & AI Expert (moderator)