Rafay Launches Serverless Inference Offering to Accelerate Enterprise AI Adoption

Rafay Systems, a leader in cloud-native and AI infrastructure orchestration and management, announced the general availability of its Serverless Inference offering, a token-metered API for running open-source and privately trained or tuned LLMs. Many NVIDIA Cloud Providers (NCPs) and GPU Clouds already use the Rafay Platform to deliver a multi-tenant, Platform-as-a-Service experience to their customers, complete with self-service consumption of compute and AI applications. These NCPs and GPU Clouds can now deliver Serverless Inference as a turnkey service at no additional cost, enabling their customers to build and scale AI applications quickly, without bearing the cost and complexity of building automation, governance, and controls for GPU-based infrastructure.
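To make the "token-metered API" concrete, the sketch below shows what a request to an OpenAI-compatible serverless inference endpoint and a usage-based billing calculation might look like. The endpoint URL, model name, usage fields, and per-token prices are illustrative assumptions, not details published by Rafay.

```python
import json

# Hypothetical endpoint of a GPU Cloud's serverless inference service.
ENDPOINT = "https://inference.example-gpu-cloud.com/v1/chat/completions"  # placeholder


def build_request(model: str, prompt: str) -> str:
    """Serialize an OpenAI-style chat-completion request body as JSON."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })


def metered_cost(usage: dict, price_in: float, price_out: float) -> float:
    """Estimate the bill from the token-usage block such APIs return."""
    return (usage["prompt_tokens"] * price_in
            + usage["completion_tokens"] * price_out)


body = build_request("llama-3-8b-instruct", "Summarize our Q3 report.")

# Example usage block in the shape returned by OpenAI-compatible APIs;
# the counts and per-token prices here are made up for illustration.
usage = {"prompt_tokens": 120, "completion_tokens": 380}
cost = metered_cost(usage, price_in=0.00000015, price_out=0.0000006)
```

In a token-metered model like this, the provider bills on the `usage` counts returned with each response rather than on reserved GPU time, which is what enables the on-demand billing data mentioned above.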

The global AI inference market is projected to reach $106 billion in 2025 and $254 billion by 2030. Rafay's Serverless Inference empowers GPU Cloud Providers (GPU Clouds) and NCPs to tap into the booming GenAI market by eliminating key adoption barriers: it automates the provisioning and segmentation of complex infrastructure, enables developer self-service, lets providers rapidly launch new GenAI models as a service, generates billing data for on-demand usage, and more.

“Having spent the last year experimenting with GenAI, many enterprises are now focused on building agentic AI applications that augment and enhance their business offerings. The ability to rapidly consume GenAI models through inference endpoints is key to faster development of GenAI capabilities. This is where Rafay’s NCP and GPU Cloud partners have a material advantage,” said Haseeb Budhani, CEO and co-founder of Rafay Systems.

“With our new Serverless Inference offering, available for free to NCPs and GPU Clouds, our customers and partners can now deliver an Amazon Bedrock-like service to their customers, enabling access to the latest GenAI models in a scalable, secure, and cost-effective manner. Developers and enterprises can now integrate GenAI workflows into their applications in minutes, not months, without the pain of infrastructure management. This offering advances our company’s vision to help NCPs and GPU Clouds evolve from operating GPU-as-a-Service businesses to AI-as-a-Service businesses.”

SOURCE: Businesswire