Apple is working with Nvidia to dramatically improve the performance of its large language models (LLMs). The collaboration relies on a new text generation technique that significantly speeds up artificial intelligence programs.
Earlier this year, Apple introduced the ReDrafter approach and released it as open source. The method speeds up text generation by combining the Beam Search and Dynamic Tree Attention techniques.
Beam Search examines multiple candidate text sequences simultaneously to find the best result, while Dynamic Tree Attention organizes those candidates and removes redundant overlaps between them to improve performance.
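To make the two ideas concrete, here is a minimal Python sketch. It is not Apple's or Nvidia's implementation: the toy probability table `TOY_MODEL` and the helper names `beam_search` and `count_prefix_overlap` are hypothetical, chosen only for illustration. The first function keeps the few highest-scoring candidate sequences at each step, and the second counts how many token positions the surviving candidates share, which is the kind of overlap a tree-style attention scheme can avoid recomputing.

```python
import math

# Hypothetical toy "model": next-token probabilities given the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.4, "ran": 0.6},
}


def beam_search(start, steps, beam_width):
    """Keep the `beam_width` best sequences (by log-probability) at each step."""
    beams = [((start,), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Extend every surviving sequence by every possible next token.
            for token, prob in TOY_MODEL.get(seq[-1], {}).items():
                candidates.append((seq + (token,), score + math.log(prob)))
        if not candidates:
            break
        # Highest cumulative log-probability first, then prune to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def count_prefix_overlap(sequences):
    """Return (total token positions, unique prefixes) across all sequences."""
    total = sum(len(seq) for seq in sequences)
    unique = {seq[:i] for seq in sequences for i in range(1, len(seq) + 1)}
    return total, len(unique)


beams = beam_search("<s>", steps=3, beam_width=3)
for seq, score in beams:
    print(" ".join(seq), round(math.exp(score), 3))

total, unique = count_prefix_overlap([seq for seq, _ in beams])
print(f"{total} token positions, {unique} unique prefixes")
```

In this toy run the three surviving candidates contain 12 token positions but only 9 distinct prefixes, so a quarter of the work would be redundant if each candidate were processed independently.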
Apple has now integrated ReDrafter into Nvidia’s TensorRT-LLM framework, which is designed to optimize the execution of large language models on Nvidia GPUs. According to Apple, this integration increases token generation speed by 2.7 times.
Apple says the performance gain not only significantly reduces latency for users but also eases the load on GPUs and lowers energy consumption.