SPONSORED

Your phone buzzes at 2 AM. The website is down. Slack has become a wall of red alerts, and customers are already tweeting. You stare at the screen, still half-asleep, trying to figure out where to even begin looking.
This is a ritual that site reliability engineers (SREs) know too well. These are the folks who must keep online services running at all costs, and when those services go down, stress levels soar. Recovery is a race against time, yet most teams burn the first hour just gathering evidence before the actual troubleshooting begins.
“The first five minutes is panic,” says Goutham Rao, chief executive officer and co-founder of NeuBird. “The next 25 minutes is assembling the crew to say that we have a proxy error. Get on Slack, get on phone calls, call people.” War rooms get spun up. Bridge calls convened. Fingers pointed between teams while the outage clock keeps ticking.
Rao knows the pain first-hand. The serial entrepreneur once had to fly from San Francisco to Amsterdam to fix his own bug in a dark datacenter because the customer wouldn’t allow remote access. The downtime was essentially the flight time. He decided there had to be a better way, and so NeuBird was born.
The startup, backed by Microsoft and partnered with AWS, aims to make this whole dance unnecessary. Its product, Hawkeye, is an AI-powered SRE that runs the investigation while your team is still rubbing the sleep from their eyes. Rao emphasizes that this isn’t another chatbot for querying logs. It’s an agentic system that forms hypotheses, tests them against your telemetry, and tells you what actually broke.
Why Cloud Operations Hit A Breaking Point
SRE automation has been a long time coming, says Rao. The architecture that makes modern software possible is also what makes it so maddening to debug. Service-oriented architectures became the industry standard over the past two decades because they let teams build faster. However, they also create a tangled mesh of interdependencies that few fully understand. The result is a complex system where pulling a thread in one service can unravel another thousands of miles away.
Here’s a scenario Rao describes: your website times out. Intuitively, it looks like a problem with the UI or web application layer. You’d think something is wrong with the front end. But the real problem turns out to be a database running out of resources three layers down.
“The root cause of why your website is running slow is not because of anything related to your web app or your compute. It’s because you’re running out of capacity,” he explains. “Who would have thought this? And it takes a long time for people to be able to connect these dots.”
The tools meant to help have created their own problems. AWS environments now generate millions of telemetry data points across thousands of resources. You can instrument everything, but more visibility often just means less clarity. This problem is often called the observability paradox.
According to AWS, 70 percent of alerts require manual correlation across multiple services. Engineers typically spend three to four hours investigating complex incidents, and that’s before anyone starts fixing anything.
Rao is quick to point out this isn’t about replacing people. “It’s not about do the same with fewer people,” he says. “That’s never been the case in any innovation cycle. It always is do more with what you have.”
What Makes Agentic AI Different
The AIOps market is crowded with tools that slap chatbot interfaces onto log queries and call it innovation. Hawkeye is doing something structurally different, and the distinction matters if you’re going to trust it with your production environment.
Most enterprise AI products use retrieval augmented generation (RAG). You vectorize documents, retrieve the relevant chunks at question time, and feed them to an LLM as context. That approach works fine for corporate knowledge bases and policy documents, but it collapses in a heap if you try to use it for IT telemetry.
“You can’t copy all of your IT telemetry into ChatGPT and say ‘help me’,” Rao explains. “That doesn’t work.” The data is a constantly changing morass of logs, traces, configuration data, and time-series metrics captured at millisecond granularity. You can’t dump all of that into a prompt window and expect useful results.
Agentic systems flip the approach. Instead of feeding content to the LLM and asking questions, you tell the LLM to figure out what information it actually needs, then surgically extract it from your data sources. The LLM generates investigation programs rather than prose answers.
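In rough pseudocode, that flipped loop might look something like this. This is a minimal sketch of the general pattern, not Hawkeye’s actual internals; the function names, the stub “model,” and the telemetry sources are all invented for illustration:

```python
from dataclasses import dataclass

# Sketch of an agentic investigation loop: the model decides which slice of
# telemetry it needs next, a tool layer fetches only that slice, and the loop
# repeats until a hypothesis is confirmed or the system admits uncertainty.

@dataclass
class Plan:
    action: str            # "query" (fetch more evidence) or "conclude"
    source: str = ""       # telemetry source to query
    hypothesis: str = ""   # conclusion, when action == "conclude"

def fake_llm(context: str) -> Plan:
    """Stand-in for the model: asks for database metrics before concluding."""
    if "rds_metrics" in context:
        return Plan("conclude", hypothesis="database out of connections")
    return Plan("query", source="rds_metrics")

TELEMETRY = {"rds_metrics": "connections 100/100 (saturated)"}

def investigate(incident: str, llm, telemetry: dict, max_rounds: int = 5) -> str:
    """Bounded hypothesis-test loop: fetch only the evidence the model asks for."""
    context = f"Incident: {incident}"
    for _ in range(max_rounds):
        plan = llm(context)
        if plan.action == "conclude":
            return plan.hypothesis
        # Surgically extract just the requested slice of telemetry.
        context += f"\n{plan.source}: {telemetry.get(plan.source, 'no data')}"
    return "inconclusive"  # admit uncertainty rather than guess

print(investigate("website timeouts", fake_llm, TELEMETRY))
```

The key design point is that the model’s output is a next query, not a final answer, so every conclusion is grounded in evidence actually pulled from the environment.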
This is where context engineering becomes more important than prompt engineering. Rao uses a medical analogy to explain the difference: even the best doctor in the world cannot diagnose you accurately if you cannot describe your symptoms properly.
“The problem with LLMs is you can ask it a question and you’ll always get an answer,” he says. “That’s a problem for production systems, because you don’t want to mislead a person.” Give an LLM the wrong context and it will confidently solve the wrong problem. The trick is making sure it asks the right questions of the right data sources before it starts reasoning.
A System That Learns And Writes Its Own Instructions
Underneath Hawkeye sits something NeuBird calls the Raven AI Expression Language (RAEL). It’s a structured grammar that lets LLMs create verifiable investigation programs rather than natural language responses. These programs can be validated and compiled, which eliminates hallucinations in the investigation steps themselves.
“For us an agentic system is a combination of an expert system with the cognitive capabilities that exist in Gen AI,” Rao explains. The system marries expert system reliability with generative AI creativity. This makes it structured enough to be trustworthy, but flexible enough to handle novel situations.
The ability to codify investigation techniques enables engineers to shape how investigations run over time. Tell Hawkeye in plain English to pay more attention to networking next time, and the underlying RAEL grammar (which the LLM itself creates) morphs accordingly. You’re coaching a cognitive system, not configuring a static rules engine.
One customer discovered this capability when Hawkeye couldn’t explain a sudden drop in DNS requests. The root cause was an external Cloudflare outage that Hawkeye had no visibility into. The customer responded by adding Cloudflare status checks to future investigations. The system learns.
An Army Of LLMs
Hawkeye doesn’t run on a single LLM, either. NeuBird uses what Rao calls a squadron of models. Some are better suited for time-series analysis, and others for parsing JSON structures. The current mix includes Anthropic’s Claude and various GPT models, though the architecture is designed to swap them as the market evolves. Enterprises can also bring their own Bedrock models, burning down committed cloud spend while using Hawkeye’s investigation framework.
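The squadron idea reduces to routing each sub-task to whichever model handles it best. A minimal sketch of that pattern, with an invented task-to-model mapping (NeuBird’s actual routing logic is not public):

```python
# Route each kind of sub-task to a model suited for it, with a safe default.
MODEL_ROUTES = {
    "time_series": "model-a",  # e.g. a model tuned for numeric trend analysis
    "parse_json":  "model-b",  # e.g. a model strong at structured parsing
    "summarize":   "model-c",
}

def route(task_kind: str) -> str:
    """Pick a model for a sub-task, falling back to a general-purpose one."""
    return MODEL_ROUTES.get(task_kind, "model-default")
```

Keeping the mapping in one place is also what makes the architecture swappable: as the model market shifts, only the routing table changes, not the investigation logic.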
The platform connects natively to AWS services including CloudWatch, EKS, Lambda, RDS, and S3, though it also works with Azure and on-premises environments. Standard observability stacks like Dynatrace, Splunk, and Prometheus are supported out of the box. For organizations running homegrown tooling, the Model Context Protocol (MCP) provides a bridge to proprietary systems.
Security will be a big concern for potential users. Hawkeye operates with read-only access and stores no telemetry data. It persists only metadata that fingerprints your environment, such as how many EC2 instances you have or which Kubernetes clusters exist. For organizations that need additional isolation, there is a fully in-VPC (virtual private cloud) deployment option: all processing happens inside the customer’s VPC, and data never leaves their AWS environment.
Keeping Your Hands On The Wheel
Hawkeye stops at recommendations. It won’t automatically execute fixes, and that’s deliberate. “We purposely limit it from taking actions,” Rao explains, arguing that agentic systems are, for many, a little like self-driving cars: a cool concept, but still too new for most people to take their hands completely off the wheel. That said, for customers willing to automate repetitive actions, NeuBird provides an option to do so.
Genuinely benign actions, such as toggling feature flags, are OK. In that example, the flag itself has already been tested and the consequences are well understood. But writing code or patching Helm charts? Not yet.
The concern is that a 95 percent success rate with spectacular five percent failure could poison the well for agentic systems entirely. Better to keep a human in the loop for now and build trust gradually.
When Hawkeye cannot solve a problem, it says so. The system fact-checks its conclusions against actual telemetry, so the worst case is admitting uncertainty rather than confidently pointing you in the wrong direction. It also has an interesting behind-the-scenes feature that helps to hone its results: it uses competing LLMs to argue with one another about their findings. This dialectic leads to better results that have been sanity checked.
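The cross-checking behavior boils down to a simple rule: only report a conclusion that independent models converge on. A minimal sketch of that idea, with model behavior stubbed out for illustration (this is the general pattern, not Hawkeye’s implementation):

```python
# Only report a root cause when independent models agree on it; otherwise
# surface the disagreement instead of guessing.
def cross_check(question: str, models) -> str:
    answers = {m(question) for m in models}  # collect each model's answer
    if len(answers) == 1:
        return answers.pop()                 # consensus: report the finding
    return "uncertain: models disagree"      # worst case is admitted uncertainty

# Usage with stub "models":
agree = cross_check("why is the site slow?",
                    [lambda q: "db saturation", lambda q: "db saturation"])
split = cross_check("why is the site slow?",
                    [lambda q: "db saturation", lambda q: "cache eviction"])
```

The failure mode this guards against is exactly the one Rao warns about: a single model will always produce an answer, so agreement between independent reasoners becomes the sanity check.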
Hawkeye’s dashboard generates reports showing estimated time saved per investigation. Model Rocket, a custom technology solutions provider running a complex environment spanning Lambda, RDS, ElastiCache, and EKS, cut mean time to recovery by over 90 percent after deploying the platform.
The Cognitive Shift
NeuBird sits in an enviable position. Microsoft is a backer, and the company participates in Redmond’s elite Pegasus program, which provides access to enterprise customers including Adobe, Autodesk, and Chevron. On the AWS side, NeuBird was selected for numerous AWS programs, including the Generative AI Accelerator, and holds Generative AI Competency Partner status, with Hawkeye available on the AWS Marketplace.
Part of what got it there is its understanding that agentic AI isn’t software you configure once and forget about. “You have to treat it like a cognitive being, a cognitive system, because that’s what it’s rooted in,” Rao says. “Coach it, work with it, give it feedback, collaborate with it. It’s not a binary system.”
The 2 AM phone calls for SREs aren’t going away. Infrastructure will always break in creative ways at inconvenient hours. But if NeuBird’s bet pays off, by the time you get to your desk with slippers and coffee at the ready, Hawkeye will be well on its way to delivering a root cause analysis.
Sponsored by NeuBird.ai.