Datadog’s Platform Expands to Support Monitoring and Troubleshooting of Generative AI Applications

Datadog, Inc., the monitoring and security platform for cloud applications, announced new capabilities that help customers monitor and troubleshoot issues in their generative AI-based applications.

Generative AI-based features such as AI assistants and copilots are quickly becoming an important part of software product roadmaps. While these emerging capabilities hold a lot of promise, deploying them in customer-facing applications brings many challenges, including cost, availability and accuracy.

The tech stacks used in generative AI are evolving quickly, as new application frameworks, models, vector databases, service chains and supporting technologies see rapid adoption. To keep up, organizations require observability solutions that can adapt and evolve along with their AI stacks.

Datadog announced a broad set of generative AI observability capabilities to help teams deploy LLM-based applications to production with confidence and troubleshoot their health, cost and accuracy in real time.

These capabilities include integrations for the end-to-end AI stack:

  • AI infrastructure and compute: NVIDIA, CoreWeave, AWS, Azure and Google Cloud
  • Embeddings and data management: Weaviate, Pinecone and Airbyte
  • Model serving and deployment: TorchServe, Vertex AI and Amazon SageMaker
  • Model layer: OpenAI and Azure OpenAI (see the instrumentation sketch after this list)
  • Orchestration framework: LangChain
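
As an illustration of the model-layer integration above, the sketch below shows how an OpenAI call might be instrumented from Python. It is a minimal sketch, assuming Datadog's Python tracer (ddtrace) with its OpenAI integration is installed and a Datadog Agent is reachable; the setup details are assumptions for illustration, not taken from the announcement.

```python
# Minimal sketch (assumed setup): ddtrace's OpenAI integration traces each
# completion call so latency, errors and token usage can surface in Datadog.
import openai
from ddtrace import patch

# Enable the OpenAI instrumentation before the client is used.
patch(openai=True)

openai.api_key = "sk-..."  # placeholder; read from a secret store in practice

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize today's error logs."}],
)
print(response["choices"][0]["message"]["content"])
```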

Additionally, Datadog released in beta a complete solution for LLM observability, which brings together data from applications, models and various integrations to help engineers quickly detect and resolve real-world application problems such as model cost spikes, performance degradations, drift and hallucinations, helping ensure positive end-user experiences.

LLM observability includes:

  • Model catalog: Monitor and alert on model usage, costs and API performance.
  • Model performance: Identify model performance issues based on different data characteristics provided out of the box, such as prompt and response lengths, API latencies and token counts.
  • Model drift: Categorize prompts and responses into clusters, enabling performance tracking and drift detection over time (a conceptual sketch follows this list).
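
The drift capability above is described as clustering prompts and responses. The sketch below illustrates that general idea with a hypothetical example, not Datadog's actual implementation: vectorize a baseline set of prompts, cluster them, and compare how current traffic distributes across the same clusters.

```python
# Hypothetical drift-detection sketch: cluster baseline prompts, then compare
# the cluster distribution of current prompts against the baseline.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_distribution(labels, n_clusters):
    """Return the fraction of prompts falling into each cluster."""
    counts = Counter(labels)
    return np.array([counts.get(i, 0) for i in range(n_clusters)]) / len(labels)

# Baseline prompts (e.g., last month) and current prompts (e.g., today).
baseline_prompts = ["summarize this invoice", "summarize the contract",
                    "translate to French", "translate to German"]
current_prompts = ["write a poem about winter", "write a haiku",
                   "translate to French", "summarize this invoice"]

# In production these would likely be LLM embeddings; TF-IDF keeps the sketch self-contained.
vectorizer = TfidfVectorizer().fit(baseline_prompts + current_prompts)
n_clusters = 2
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
kmeans.fit(vectorizer.transform(baseline_prompts))

baseline_dist = cluster_distribution(kmeans.labels_, n_clusters)
current_dist = cluster_distribution(
    kmeans.predict(vectorizer.transform(current_prompts)), n_clusters)

# A large shift in cluster proportions is one simple signal of prompt drift.
drift_score = np.abs(baseline_dist - current_dist).sum() / 2  # total variation distance
print(f"baseline: {baseline_dist}, current: {current_dist}, drift: {drift_score:.2f}")
```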

“It’s essential for teams to measure the time and resources they are investing in their AI models, especially as tech stacks continue to modernize,” said Yrieix Garnier, VP of Product at Datadog. “These latest LLM monitoring capabilities and integrations for the AI stack will help organizations monitor and improve their LLM-based applications and capabilities while also making them more cost efficient.”

SOURCE: PRNewswire