Inference of Meta's LLaMA model (and others) in pure C/C++ with
minimal setup and state-of-the-art performance on a wide range
of hardware
