
What is High-Efficiency AI?



A desperately congested highway

Millions of ants trying to enter their anthill all at once through a single hole: this is the kind of bottleneck that the exponentially growing quantity of data creates in existing computing architectures. Known as the Von Neumann bottleneck, the problem stems from an architecture that forces program memory and data memory to share the same bus, which limits throughput and processing speed on large amounts of data. Besides caching, parallel computing is one of the common workarounds, but it comes at extra cost in execution time and power consumption.

It looks like our computing infrastructure must be completely rethought to cope with the data tsunami. Quantum computing is one promising path for the future. By leveraging properties of quantum theory such as the superposition of particles, quantum computers are expected to perform the orders-of-magnitude more parallel calculations necessary to survive in the zettabyte age. At least, this is what theory and lab experiments promise. We will have to arm ourselves with patience before quantum computing hardware is installed in real-world environments: even the most optimistic experts estimate this is still 10 years away.

This is why other, more prosaic approaches are needed. Hardware acceleration is booming, with a plethora of AI-optimized chips designed to process large blocks of data in parallel. Major tech companies such as Microsoft, Google, Amazon, Apple, and Tesla are working on their own AI processors; see Google's TPU (Tensor Processing Unit), developed specifically to speed up neural network machine learning, in particular for its own TensorFlow software. All in all, these are important steps toward taming the data flood. But improving the hardware is like solving only half of an equation with two unknowns. One needs to look at optimizing the software, too.

Taking lessons from nature

Why does the human brain use a mere 20 watts to reason, analyze, deduce, and predict, while an AI system like IBM Watson needs 1,000 times more power to perform complex operations? What makes our brains so much more efficient at processing information? Despite all the advances in neuroscience, we still don't know. We can replicate the structure of the brain in a processor (the so-called neuromorphic chips), but we cannot emulate how it works.

One theory, developed by Jeff Hawkins, the founder of Numenta (as well as Palm Computing and Handspring, where he was one of the inventors of the PalmPilot and Treo), holds that the brain uses a single representation format to process any kind of information, be it sound, image, or language. He calls this representation a Sparse Distributed Representation (SDR) and describes its advantages in terms of efficiency and resiliency in his book On Intelligence.

Semantic Folding is a method for natural language understanding based on SDRs, focusing on the representation itself rather than on statistics. In this method, text is converted into semantic fingerprints: binary 2D vectors, sparsely filled with active bits, distributed in such a way that bits representing similar meanings are placed close to each other. Rendered as an image, a semantic fingerprint looks similar to a brain imaging picture. When people are exposed to the same concepts, fMRI shows the same areas of their neocortex being activated. In other words, people have similar representations of similar concepts in their brains.
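To make the idea concrete, here is a minimal sketch of a semantic fingerprint as a sparse binary grid. The grid size and bit positions are invented for illustration; they are not the parameters or API of any real Semantic Folding implementation.

```python
# A minimal sketch of a semantic fingerprint as a sparse binary 2D grid.
# Grid size, sparsity, and bit positions are invented for illustration only.
import numpy as np

GRID_SIZE = 128  # hypothetical 128 x 128 semantic space

def make_fingerprint(active_positions):
    """Build a binary fingerprint with 1s at the given (row, col) positions."""
    fp = np.zeros((GRID_SIZE, GRID_SIZE), dtype=np.uint8)
    for row, col in active_positions:
        fp[row, col] = 1
    return fp

# Two hypothetical terms whose fingerprints partly overlap because their
# meanings are related; the positions themselves are arbitrary.
fp_jaguar = make_fingerprint({(10, 12), (10, 13), (42, 7), (90, 55)})
fp_puma   = make_fingerprint({(10, 12), (10, 14), (42, 7), (77, 3)})
```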

In Semantic Folding, analogies can be computed with simple Boolean operations on text. By comparing the overlap of their semantic fingerprints, terms can be disambiguated immediately. The beauty of this approach is that building a use-case-specific language model requires ten times less training material than training a custom model on top of BERT or GPT-3. This is because the text analysis happens at the semantic fingerprint level and leverages analogy. In other words, the accuracy of a Semantic Folding-based system is not affected by “unknown” words (words that are not contained in the training set), because the system infers the meaning of a document from its similarity to other documents.
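Continuing the invented sketch above, overlap-based comparison reduces to a bitwise AND and a count, with no floating-point arithmetic involved:

```python
# Similarity as the number of active bits two fingerprints share.
# This continues the toy fingerprints defined in the previous sketch and is
# only meant to illustrate the principle of overlap-based comparison.
import numpy as np

def overlap(fp_a, fp_b):
    """Count the active bits shared by two binary fingerprints."""
    return int(np.sum(fp_a & fp_b))

def similarity(fp_a, fp_b):
    """Normalize the overlap by the total number of distinct active bits."""
    union = int(np.sum(fp_a | fp_b))
    return overlap(fp_a, fp_b) / union if union else 0.0

print(overlap(fp_jaguar, fp_puma))               # 2 shared bits in this toy example
print(round(similarity(fp_jaguar, fp_puma), 2))  # 2 / 6 = 0.33
```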

The fact that semantic fingerprints are sparse binary vectors (compared to the dense floating-point vectors used in Transformer models) results in immediate efficiency gains in terms of computing power: a model can be computed in one to two hours on a laptop. This is essential for applied AI in a business context. And it is essential if we want to reduce the carbon footprint of AI.
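A back-of-envelope comparison, using invented sizes rather than benchmarks, hints at where the gains come from: a sparse binary fingerprint can be stored and compared far more cheaply than a dense floating-point embedding.

```python
# Back-of-envelope storage comparison with invented sizes (not benchmarks).
dense_dims   = 768                      # typical dimensionality of a dense Transformer embedding
dense_bytes  = dense_dims * 4           # float32 -> 3,072 bytes per vector

grid_cells   = 128 * 128                # the hypothetical grid from the sketch above
active_bits  = int(grid_cells * 0.02)   # assume roughly 2% of positions are active
sparse_bytes = active_bits * 2          # store each active position as a uint16 index

print(dense_bytes, sparse_bytes)        # 3072 vs roughly 654 bytes
# Comparing two sparse fingerprints is an integer intersection of index sets,
# whereas comparing dense embeddings requires a floating-point dot product.
```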

There is no way around high-efficiency AI. Reinventing computing architectures is a first step. Replacing brute force and billion-data AI models with more efficient software is next.


