New startup with a new processor for inference

Cerebras Systems, known for its innovative Wafer Scale Engine (WSE), has received a mix of feedback regarding its processors, particularly compared to traditional GPUs like those from Nvidia.

Cerebras is an American artificial intelligence (AI) company that specializes in building computer systems for complex AI deep-learning applications. On Tuesday, the company launched a tool for AI developers that lets them access the startup's outsized chips to run applications, offering what it says is a much cheaper option than industry-standard Nvidia processors.

We couldn't find publicly available reviews or comments from users with hands-on experience with Cerebras Inference hardware. You can try the tool yourself here.

Startup Claims

  • Speed: 1,800 tokens/sec for Llama 3.1-8B and 450 tokens/sec for Llama 3.1-70B, which the company says is 20x faster than NVIDIA GPU-based hyperscale clouds.
  • Price: Cerebras Inference offers what the company calls the industry’s best price-performance, at 10c per million tokens for Llama 3.1-8B and 60c per million tokens for Llama 3.1-70B.
  • Accuracy: Cerebras Inference uses native 16-bit weights for all models, ensuring the highest accuracy responses.
  • Access: Cerebras Inference is open to everyone today via chat and API access.
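The claimed per-token prices make workload costs easy to estimate. A minimal sketch, treating the article's quoted prices as given (the model names in the lookup table are informal labels, not official API identifiers):

```python
# Claimed Cerebras Inference pricing from the article, in USD per
# million tokens. These are the startup's own figures, not measurements.
PRICE_PER_MTOK = {
    "llama3.1-8b": 0.10,   # 10c per million tokens
    "llama3.1-70b": 0.60,  # 60c per million tokens
}

def inference_cost(tokens: int, model: str) -> float:
    """Estimated cost in USD of processing `tokens` tokens with `model`."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]

# Example: a workload of 2 billion tokens per month on the 70B model.
monthly = inference_cost(2_000_000_000, "llama3.1-70b")
print(f"${monthly:,.2f}")  # $1,200.00
```

At these rates, even billion-token workloads stay in the low thousands of dollars per month, which is the core of the price-performance claim.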

High Cost vs. Performance: Cerebras's processors, particularly the WSE, are significantly more expensive than Nvidia's offerings, costing around ten times more than an H100 GPU. Despite this, Cerebras claims its chips deliver superior performance on specific AI workloads, executing tasks up to 20 times faster than Nvidia GPUs while charging a fraction of the per-token cost for certain applications.
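The two claims are not contradictory: a chip can cost more per unit yet cost less per token if its throughput advantage outpaces its price premium. A back-of-the-envelope sketch, taking both claimed figures at face value and ignoring power, cooling, and utilization:

```python
# Relative figures claimed in the article (not independently verified).
wse_cost_multiple = 10.0  # WSE hardware cost relative to one H100 GPU
wse_speedup = 20.0        # WSE inference throughput relative to GPU clouds

# Hardware cost per token scales as (relative cost) / (relative throughput).
relative_cost_per_token = wse_cost_multiple / wse_speedup
print(relative_cost_per_token)  # 0.5 -> roughly half the hardware cost per token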

Specialized Use Cases: The WSE is designed specifically for deep learning, boasting over 1.2 trillion transistors and an optimized memory architecture. This makes it particularly effective for large-scale AI models, reducing training times from months to mere minutes in some cases. However, this specialization means it may not be suitable for all types of workloads, leading some users to prefer more versatile GPUs.

Cooling and Manufacturing Issues: There seem to be significant challenges related to the thermal management of such large chips. The WSE requires advanced cooling solutions due to its high power density, which can complicate its deployment in data centers. There are ongoing discussions about the feasibility of using alternative cooling methods, such as pressurized gases, to enhance performance without overheating.

Inference in the context of AI refers to the process of applying a trained machine learning model to new, unseen data in order to make predictions or decisions.
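A minimal toy illustration of that distinction: during inference the model's parameters are fixed, and the system only applies them to new inputs (here a one-variable linear model whose weights are assumed to have been learned earlier during training):

```python
# Toy trained model: y = w*x + b, with parameters hypothetically
# learned during a prior training phase and now frozen.
w, b = 2.0, 1.0

def predict(x: float) -> float:
    """Inference step: no learning happens here, just application
    of the trained parameters to unseen data."""
    return w * x + b

print(predict(3.0))  # 7.0 -- prediction for an input the model never saw
```

Production inference services like the one described above do exactly this at scale: billions of fixed parameters applied to each incoming token.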
