Tushar Krishna - DNN-Dataflow-Hardware Co-Design for Enabling Pervasive General-Purpose AI
From Katie Gentilello on November 16th, 2018
The development of supervised-learning-based deep learning (DL) solutions today is mostly open loop. A typical DL
model is created by a team of experts who hand-tune the neural network (NN) topology over multiple iterations, often
by trial and error, and is then trained on gargantuan amounts of labeled data for weeks at a time to obtain a set of
weights. The resulting trained model is then deployed in the cloud or at the edge on inference accelerators
(such as GPUs, FPGAs, or ASICs). This form of DL breaks down in the absence of labeled data, if the model for
the task at hand is unknown, or if the problem keeps changing. An AI system for continuous learning needs to
have the ability to constantly interact with the environment and add and remove connections within the NN
autonomously, just like our brains do.
In this talk, we will briefly present our research efforts towards enabling general-purpose AI.
First, we will present GeneSys, a HW-SW prototype of an Evolutionary Algorithm (EA)-based learning system
that comprises a closed-loop learning engine called EvE and an inference engine called ADAM. EvE is a genetic
algorithm accelerator that can "evolve" the topology and weights of NNs completely in hardware for the task at
hand, without requiring hand-optimization or backpropagation training. ADAM continuously interacts with the
environment and is optimized for efficiently running the irregular NNs generated by EvE, which today's suite of
DL accelerators and GPUs are not optimized to handle.
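
As a rough software illustration of learning without backpropagation, the sketch below evolves the weights of a tiny
fixed-topology network on the XOR task using selection, crossover, and mutation. It is not the EvE or ADAM design:
the task, the network shape, the genome layout, and every name in it (forward, fitness, evolve) are assumptions made
for illustration, and unlike the system described above it evolves weights only, not the topology.

```python
import math
import random

# Hypothetical toy task (not from the talk): learn XOR without gradients.
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_HIDDEN = 2
# Assumed genome layout: (w_a, w_b, bias) per hidden unit,
# then one output weight per hidden unit, then the output bias.
GENOME_LEN = 3 * N_HIDDEN + N_HIDDEN + 1


def forward(genome, inputs):
    """Run the fixed 2-input, N_HIDDEN-hidden, 1-output network."""
    hidden = []
    for h in range(N_HIDDEN):
        w_a, w_b, bias = genome[3 * h:3 * h + 3]
        hidden.append(math.tanh(w_a * inputs[0] + w_b * inputs[1] + bias))
    out_weights = genome[3 * N_HIDDEN:3 * N_HIDDEN + N_HIDDEN]
    z = sum(w * h for w, h in zip(out_weights, hidden)) + genome[-1]
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fitness(genome):
    """Negative squared error over all cases; higher is better, 0 is perfect."""
    return -sum((forward(genome, x) - y) ** 2 for x, y in XOR_CASES)


def evolve(generations=200, pop_size=40, sigma=0.5, seed=0):
    """Elitist genetic algorithm: keep the top quarter, breed the rest."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    n_elite = max(2, pop_size // 4)
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        elites = population[:n_elite]  # survivors are carried over unchanged
        children = []
        while len(elites) + len(children) < pop_size:
            mother, father = rng.sample(elites, 2)
            # Uniform crossover followed by Gaussian mutation of every gene.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            child = [gene + rng.gauss(0.0, sigma) for gene in child]
            children.append(child)
        population = elites + children
    population.sort(key=fitness, reverse=True)
    return population[0], best_per_generation
```

Calling evolve() returns the best genome found and the best fitness seen in each generation. Because the elites
survive unchanged, that fitness never decreases from one generation to the next, although the sketch makes no
promise that XOR is solved within a given budget.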
Next, we focus on the challenge of mapping a DNN model (developed via supervised or EA-based methods)
efficiently over an accelerator (ASIC/GPU/FPGA). DNNs are essentially multi-dimensional loops, with millions of
parameters and billions of computations. They can be partitioned in myriad ways to map over the compute array.
Each unique mapping, or "dataflow," provides different trade-offs in terms of throughput and energy efficiency, as
it determines overall utilization and data reuse. Moreover, the right dataflow for a DNN depends heavily on the
layer type, the input-activation-to-weight ratio, the accelerator microarchitecture, and its memory hierarchy. We will
present an analytical tool called MAESTRO, which we have been developing in collaboration with NVIDIA, for
formally characterizing the performance and energy impact of dataflows in DNNs today. MAESTRO can be used
at design time, to provide quick first-order metrics when hardware resources (buffers and interconnects) are
being allocated on-chip, and at compile time, when different layers need to be optimally mapped
for high utilization and energy efficiency.
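
To make the idea of a dataflow as a loop ordering concrete, the sketch below counts operand fetches for a 1D
convolution under two loop orders. It is a toy model, not MAESTRO's cost model or API: it assumes a single
processing element with one register per operand type, counts a fetch whenever the index an operand needs differs
from the one currently held, and uses hypothetical names (count_fetches, loop_order, sizes) chosen for illustration.

```python
from itertools import product


def count_fetches(loop_order, sizes):
    """Count MACs and operand fetches for O[x] += W[s] * I[x + s].

    loop_order: loop dimensions, outermost first, e.g. ("s", "x").
    sizes: trip count of each loop, e.g. {"x": 4, "s": 3}.
    Assumed toy model: one register per operand; a fetch happens whenever
    the needed index differs from the index currently held.
    """
    held = {"weight": None, "input": None, "output": None}
    fetches = {"weight": 0, "input": 0, "output": 0}
    macs = 0
    for idx in product(*(range(sizes[dim]) for dim in loop_order)):
        point = dict(zip(loop_order, idx))
        x, s = point["x"], point["s"]
        needed = {"weight": s, "input": x + s, "output": x}
        for operand, index in needed.items():
            if held[operand] != index:
                held[operand] = index
                fetches[operand] += 1
        macs += 1
    return macs, fetches


sizes = {"x": 4, "s": 3}  # 4 outputs, 3 filter taps
# Weight-stationary: filter loop outermost, so each weight is fetched once.
ws_macs, ws_fetches = count_fetches(("s", "x"), sizes)
# Output-stationary: output loop outermost, so each partial sum stays put.
os_macs, os_fetches = count_fetches(("x", "s"), sizes)
```

Both orders perform the same 12 multiply-accumulates, but the weight-stationary order fetches weights only 3 times
(and outputs 12 times), while the output-stationary order fetches outputs only 4 times (and weights 12 times). Even
in this toy model, the loop order alone decides which operand gets reused.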
Finally, we will present the microarchitecture of an open-source DNN accelerator called MAERI that is equipped
to adaptively change the dataflow depending on the DNN layer currently being mapped by leveraging a
runtime-reconfigurable interconnection fabric.