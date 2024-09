Introducing Cerebras Inference

‣ Llama3.1-70B at 450 tokens/s – 20x faster than GPUs

‣ 60¢ per million tokens – a fifth the price of hyperscalers

‣ Full 16-bit precision for full model accuracy

‣ Generous rate limits for devs

Try now: https://t.co/50vsHCl8LM pic.twitter.com/hD2TBmzAkw