CloudCruise, traced through BADGER: a guide to the architecture and where it stops.
CloudCruise is a Y Combinator W24 company building a developer platform for browser agents. The product surface is plain-English workflow authoring, a managed VM browser fleet, and a maintenance agent that watches every run. Underneath, the actual machinery is a directed-graph DSL called BADGER, published under MIT. Reading BADGER tells you a lot about what the platform can do, and one specific thing about what it cannot. This page walks the architecture in five layers, then names the one architectural boundary that decides whether CloudCruise is even a candidate for your workflow.
The product, in one paragraph the docs would actually agree with
You describe a web workflow in plain English. The platform's coding agent compiles it into a BADGER graph: a YAML-shaped file where each node is a typed browser action and each edge is a control-flow path. The graph runs on managed VM Chromium with deterministic execution, smart queueing, and per-credential rate limits. When something breaks, a separate maintenance agent classifies the failure, decides on a remediation (retry, fall back to a vision selector, escalate to a human), and in some cases patches the graph file so the same break does not recur. You trigger workflows over an API.
The public reliability claims, from the home page at the time of writing, are 99.9 percent session availability, an average retrieval time under two seconds, and one hundred percent uptime across a three-month production window. The company closed a five million dollar round in March 2026, anchored on healthcare browser automation. The founders, Adrian Ziegler, Felix Martin Eckert, and Vere, came out of Stanford and Google. Litigating those numbers is not this guide's job. The numbers are the surface; the architecture is what tells you whether the surface fits your workflow.
Layer one
BADGER: every workflow is a typed graph of browser actions.
The first thing to know is that there is no script. A BADGER workflow is a graph file. The README at github.com/CloudCruise/BADGER calls out the structure directly: nodes are clearly defined browser actions (NAVIGATE, CLICK, INPUT_TEXT, BOOL_CONDITION, EXTRACT_DATAMODEL, and so on); edges are explicit control flow between nodes, including conditions and loops. The unit of authoring is a node; the unit of replay is the graph.
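To make the node-and-edge shape concrete, here is a hypothetical BADGER-style graph written as a Python dict. The field names (id, type, selector_strategy, params, edges, condition) and the template syntax are guesses from the README's description, not the repo's actual schema; treat this as an illustration of the shape, not the format.

```python
# Hypothetical graph: log into a payer portal and extract a claim status.
# All field names are assumptions for illustration, not BADGER's real schema.
workflow = {
    "inputs": {"portal_url": "string", "member_id": "string"},
    "outputs": {"claim_status": "string"},
    "nodes": [
        {"id": "open_portal", "type": "NAVIGATE",
         "params": {"url": "{{portal_url}}"}},
        {"id": "fill_member_id", "type": "INPUT_TEXT",
         "selector_strategy": "STATIC",
         "params": {"xpath": "//input[@name='member-id']",
                    "text": "{{member_id}}"}},
        {"id": "is_logged_in", "type": "BOOL_CONDITION",
         "params": {"check": "page contains 'Claims'"}},
        {"id": "read_status", "type": "EXTRACT_DATAMODEL",
         "selector_strategy": "LLM_DOM",
         "params": {"model": {"claim_status": "string"}}},
    ],
    "edges": [
        {"from": "open_portal", "to": "fill_member_id"},
        {"from": "fill_member_id", "to": "is_logged_in"},
        # Conditional edge: only proceed to extraction when the check passed.
        {"from": "is_logged_in", "to": "read_status", "condition": True},
    ],
}
```

The point of the shape is that every step is data: a typed node with declared parameters, connected by edges a runtime can walk deterministically.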
That shape carries a real architectural commitment. A graph file can be diffed in a pull request. A graph file can be statically validated against an input and output schema before a run starts. A graph file can be rewritten by a different agent without breaking everything that referenced it. None of those properties hold for a thousand-line Playwright script. They are the reason most of the new RPA platforms in this generation, including the one this site sells, store workflows as files rather than as recordings.
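As a sketch of what "statically validated before a run starts" can mean, the check below walks a graph dict of the hypothetical shape above (node and edge field names are assumptions, not BADGER's schema) and reports structural errors with no browser involved.

```python
# Minimal static validation of a graph file: a sketch under assumed field
# names, not BADGER's actual validator.
KNOWN_TYPES = {"NAVIGATE", "CLICK", "INPUT_TEXT",
               "BOOL_CONDITION", "EXTRACT_DATAMODEL"}

def validate(graph: dict) -> list[str]:
    """Return a list of structural errors; an empty list means the graph is runnable."""
    errors = []
    node_ids = {n["id"] for n in graph["nodes"]}
    # Every node must be a known action type.
    for n in graph["nodes"]:
        if n["type"] not in KNOWN_TYPES:
            errors.append(f"node {n['id']}: unknown type {n['type']}")
    # Every edge must reference declared nodes.
    for e in graph["edges"]:
        for end in ("from", "to"):
            if e[end] not in node_ids:
                errors.append(f"edge references undeclared node {e[end]}")
    # Every node after the entry node must be the target of some edge.
    targets = {e["to"] for e in graph["edges"]}
    for n in graph["nodes"][1:]:
        if n["id"] not in targets:
            errors.append(f"node {n['id']} is unreachable")
    return errors
```

None of these checks are possible against a recording or an opaque script; they fall out of the graph being a declarative file.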
BADGER also draws an explicit, firm line, in the README's own words, between where models help and where they do not. Models help during authoring (turn English into a graph) and during repair (classify a failure, propose a patch). Models do not run the steady-state replay. The runtime walks the graph and executes typed actions; that is it. Most enterprise compliance teams will not accept a model on the runtime path because it makes the bot non-reproducible. BADGER's authoring-AI plus deterministic-replay split is the design choice that lets a browser bot pass a SOC 2 audit and still benefit from frontier models when writing the workflow.
Layer two
Five execution strategies, picked per node, in one ladder.
The interesting part of BADGER is not the node taxonomy. Most browser DSLs have a NAVIGATE and a CLICK. The interesting part is that every node carries a selector_strategy field that picks one of five ways to find the target element. The CloudCruise docs at docs.cloudcruise.com list them under the workflow editor overview. The author picks the strategy when the node is written; the runtime can fall back to a different strategy on retry; the maintenance agent can rewrite the strategy when a node breaks repeatedly.
Five strategies, in order from cheapest to most flexible
STATIC
Pin an explicit XPath selector on the node. Deterministic targeting. The fastest mode and the one that keeps a workflow reproducible across runs, until the page ships a class-name change and the selector breaks.
LLM_DOM
The runtime hands the trimmed DOM to a model and asks it to pick the element that matches the node's intent. Pays off when a vendor portal reorders fields between releases. Costs an inference call per use.
LLM_VISION
When the DOM is misleading (rendered iframes, canvas controls, late-binding components), the model looks at a screenshot and points at the element. The repo notes vision retries up to fifteen times before the node fails.
COORDINATES
Pin a literal x and y on the node. The escape hatch for the case where neither selector nor model can describe the target. Fragile to viewport changes; useful for the last five percent.
PROMPT
The node is a free-form instruction the model interprets at runtime against the current run context. Closer to a full agent than a step. The graph still constrains the surrounding flow.
Read the ladder top to bottom and the design becomes obvious. STATIC is the unit-economics floor: a deterministic XPath, no inference cost, runs in milliseconds. LLM_DOM is the first point where the model enters the loop, but only the trimmed DOM, which the runtime can pre-process to a few kilobytes. LLM_VISION is the slowest and most expensive node type, and the README admits it: vision retries up to fifteen times before a step fails. COORDINATES is the escape hatch when neither selector nor model can describe the target. PROMPT is the agent-shaped escape hatch when the step is genuinely free-form and the graph is willing to spend a model call on it.
What this ladder buys you is graceful degradation. The 2003-era selector-only RPA tools had one strategy, an XPath, and one failure mode, the run stopped. A five-strategy ladder absorbs small UI shifts (LLM_DOM picks the field even though the id changed), medium ones (LLM_VISION picks the button even though the DOM is canvas), and the long-tail of unrepresentable cases (COORDINATES on the last five percent). The price is that every node now has to declare its level of paranoia up front, and the maintenance agent has to know how to walk the ladder when something breaks.
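The walk down the ladder can be sketched in a few lines. The fallback order and the per-strategy retry counts here are assumptions drawn from the descriptions above (the repo only states that vision retries up to fifteen times); the resolver functions are stand-ins for the real strategy implementations, and PROMPT is omitted because it is a free-form agent step rather than an element lookup.

```python
# Sketch of a selector-ladder walk on retry. The fallback-in-declared-order
# behavior is an assumption, not confirmed by the BADGER repo.
LADDER = ["STATIC", "LLM_DOM", "LLM_VISION", "COORDINATES"]

def find_element(node, resolvers, max_vision_retries=15):
    """Try the node's declared strategy, then each cheaper-to-costlier fallback."""
    start = LADDER.index(node["selector_strategy"])
    for strategy in LADDER[start:]:
        # Vision is the only strategy documented to retry before giving up.
        attempts = max_vision_retries if strategy == "LLM_VISION" else 1
        for _ in range(attempts):
            element = resolvers[strategy](node)
            if element is not None:
                return strategy, element
    raise LookupError(f"node {node['id']}: all strategies exhausted")
```

With stub resolvers where STATIC misses and LLM_DOM hits, the function returns the element along with the strategy that found it, which is exactly the signal a maintenance agent needs to decide whether the node's declared strategy should be rewritten.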
Layer three
The maintenance agent, where the second model lives.
The selector ladder absorbs predictable failure. Unpredictable failure (a portal that ships a new auth screen, a payer that adds a captcha, a vendor that switches to a single-page app framework) needs a different shape of help. The CloudCruise home page calls this the maintenance agent, and the docs describe it as a separate concern: classify the error, repair or retry, escalate when neither works. It is the place where the second model lives, and the place where the most interesting product judgments happen.
A run that falls through the ladder, then to the maintenance agent
The diagram above is the canonical happy-and-unhappy path. A STATIC node runs and matches. An LLM_DOM node runs and the element is not there. The runtime escalates to the maintenance agent, which classifies the failure (the field id changed in the latest portal release), proposes a patch (rewrite the node to a new XPath, or change the selector_strategy to LLM_VISION), re-runs from the failed node, and writes the patch back to the graph file so the next run never sees the failure. That is the self-healing claim, and it is a real one.
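The classify-then-patch loop the docs describe can be sketched as a single remediation function. The failure classes and the remediation table here are illustrative guesses; the real classifier is a closed-source model, and the real patch writes back to the graph file.

```python
# Sketch of a maintenance-agent remediation step, under assumed failure
# classes. Returns a patched copy of the node; never mutates the original.
def remediate(node: dict, failure: str) -> dict:
    patched = dict(node)
    if failure == "selector_miss" and node["selector_strategy"] == "STATIC":
        # The id or class changed: re-resolve against the trimmed DOM.
        patched["selector_strategy"] = "LLM_DOM"
    elif failure == "selector_miss":
        # The DOM itself is misleading: fall back to a screenshot.
        patched["selector_strategy"] = "LLM_VISION"
    elif failure in ("captcha", "two_factor"):
        # No automated remediation exists: queue a human.
        patched["escalate_to_human"] = True
    return patched
```

The important property is that the output is the same data structure as the input: a patched node is just a node, so the next run replays it deterministically with no model on the path.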
The honest limit on this loop is what the maintenance agent can see. It can see the DOM. It can see a screenshot. It can see the network responses the page made. It cannot see a modal that a different Windows process popped up over the browser; it cannot see the system print dialog that an export action triggered; and it cannot see the desktop SAP GUI window that opened when the user clicked a deep link in an EHR. Those are not browser events; they happen in processes the BADGER runtime has no handle on. The maintenance agent is sharp inside its window, and silent outside it.
“Five execution strategies (STATIC, LLM_DOM, LLM_VISION, COORDINATES, PROMPT) on top of the browser DOM, plus a separate maintenance agent that patches the graph when it breaks, is the modern shape of browser RPA. Reading BADGER as a public spec is the cleanest way to see that shape laid out.”
github.com/CloudCruise/BADGER (MIT) and docs.cloudcruise.com workflow editor overview
The boundary
One architectural sentence decides whether CloudCruise can run your workflow.
Every BADGER selector strategy reads the same input surface. STATIC reads document.evaluate. LLM_DOM reads a trimmed copy of document.documentElement. LLM_VISION reads a screenshot of the rendered viewport. PROMPT reads the run context, which is built from prior nodes, which were built from the same DOM. COORDINATES reads literal pixels inside the same browser window. Five strategies, one surface: the browser tab. That is the genius and the limit of the design.
For a workflow that lives entirely inside Chromium, that surface is enough. Vendor portal logins, insurance form fills, payer claim status checks, lead enrichment, document downloads, the long tail of B2B SaaS interfaces, all of those are a tab. The BADGER runtime has the right vocabulary and the maintenance agent has enough signal to keep the runs green. CloudCruise is sharp on this surface and the public customer stories (healthcare browser automation in particular) reflect it.
For a workflow that crosses out of the tab, the surface stops being enough. A SAP GUI window is not a DOM. A Jack Henry green-screen is not a DOM. An Oracle Forms session is not a DOM. A Hyperspace patient chart in Epic is not a DOM, even though the surrounding Citrix or browser shell may be. An Excel sheet that the user edits in place is not a DOM. None of these systems expose a document.querySelector to read against, and none of them render through the screenshot pipeline a browser-managed VM understands. The input surface is the Windows operating system itself, and the way to read it is the OS-level accessibility tree.
The same pattern, different surface
How the same architecture looks when the input surface is UIAutomation, not the DOM.
This is the part that is harder to read about, because most comparison pages stop at the marketing claim that one tool is for browsers and another is for desktop. The architectural sentence is more interesting. Mediar's open-source SDK, Terminator (MIT), uses the same overall shape as BADGER: a workflow file, an AI authoring stage, and a deterministic replay engine that walks a small ordered list of match strategies before it gives up. The difference is what each strategy reads.
In apps/desktop/src-tauri/src/focus_state.rs between lines 168 and 196, the restore_focus_state function tries four strategies in order. Strategy one matches on the recorded automation or accessibility id. Strategy two matches on the parent window plus the element bounds. Strategy three matches on visible text content. Strategy four falls back to focusing the parent window so the next step retries from a known anchor. None of the four strategies call a model. All four read the live Windows UI Automation tree, which is the same tree screen readers use to describe a Windows application to a blind user.
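For readers who do not want to open the Rust file, the four-strategy order can be paraphrased into Python. This mirrors only the cascade order described above, not the original code; the snapshot fields and the methods on the tree object are hypothetical stand-ins for the real UI Automation calls.

```python
# Paraphrase of the restore_focus_state cascade (original is Rust).
# `tree` is a hypothetical handle on the live UI Automation tree;
# `snapshot` holds what was recorded for the element at capture time.
def restore_focus(snapshot: dict, tree):
    # Strategy 1: the recorded automation / accessibility id.
    el = tree.find_by_automation_id(snapshot["automation_id"])
    if el is not None:
        return el
    # Strategy 2: the parent window plus the recorded element bounds.
    el = tree.find_in_window(snapshot["window"], bounds=snapshot["bounds"])
    if el is not None:
        return el
    # Strategy 3: visible text content.
    el = tree.find_by_text(snapshot["window"], snapshot["text"])
    if el is not None:
        return el
    # Strategy 4: focus the parent window as a known anchor
    # so the next step retries from there. No model call anywhere.
    return tree.focus_window(snapshot["window"])
```

Note what is absent: there is no inference call in any branch, which is the same models-author-but-do-not-replay commitment BADGER makes, applied to a different surface.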
That tree exposes role, name, automation id, bounding box, and parent chain for every visible control in every running Windows process. SAP GUI exposes it. Jack Henry exposes it. Oracle Forms exposes it. Epic Hyperspace exposes it. Chrome and Edge expose it as well, which means the same runtime can also drive a browser tab, but through the OS surface rather than through the DOM. Five strategies became four because the data the strategies read is more uniform: a node in UIAutomation has a stable set of attributes that a node in a DOM does not.
The takeaway is not that one architecture is better. It is that they are doing the same job on different surfaces, and the choice of surface is a choice of which legacy systems you can touch. Nothing about CloudCruise's ladder is wrong; it is the right ladder for the surface it reads. Nothing about Mediar's ladder is more sophisticated; it is the right ladder for the surface it reads. Pick the tool by the surface your workflow actually lives on, not by the strategy count.
Two short lists, instead of a comparison table
The right way to use this guide is not as a versus page; it is as a triage. If your workflow lives on the surface CloudCruise reads, CloudCruise is a good answer. If it does not, no amount of self-healing on the wrong surface will help. Two checklists are enough.
When CloudCruise is the right answer
- the workflow lives entirely inside Chromium
- your buyer's compliance team accepts a managed VM browser fleet
- the site already returns structured data in network responses
- you want plain-English authoring and a managed concurrency layer
- a maintenance ticket on a broken selector is acceptable downtime
When the input surface is wrong and you need a different runtime
- the data lives in SAP GUI, Oracle Forms, or any thick-client Windows app
- the user is on a Jack Henry, FIS, or Fiserv green-screen
- the workflow opens an Excel file the user edits in place
- an Epic or Cerner Hyperspace window is the system of record
- your security team requires the bot to run on the user's own desktop, not a fleet VM
The simpler form of the same triage: if a screen reader can read the system through a browser tab, CloudCruise is in scope. If the screen reader has to attach to a separate Windows process to read the system, you need a runtime that reads UIAutomation directly. Both are real categories. Both have a place. Buying one when you needed the other is the most common reason RPA rollouts stall.
Where this guide ends and your evaluation begins
A useful evaluation of CloudCruise asks four questions, in order. First: how much of the workflow is a browser tab, in minutes per run? Second: how often does the underlying portal ship a release that breaks selectors, in releases per quarter? Third: what is the cost of an outage between maintenance-agent patches, in dollars per minute? Fourth: which compliance frame does the bot need to fit, and does a managed VM browser fleet satisfy it? The answers map directly onto the architecture in the layers above. The five-strategy ladder is sized to question two. The maintenance agent is sized to question three. The managed VM fleet is the answer to question four. Question one is the input-surface question; if that share is small, the rest of the answers do not matter.
The point of this page is not to talk you out of CloudCruise. It is to give you a concrete enough picture of what CloudCruise is for that you do not buy it for the wrong workflow, and to give you a vocabulary (input surface, selector ladder, maintenance agent) that you can carry to any other automation vendor and ask the same four questions. The architecture is the only thing that does not lie.
Bring a workflow that lives outside the browser tab.
If your buyer is on SAP GUI, Jack Henry, Oracle Forms, or a Hyperspace window in Epic, those are the workflows that need a different input surface. We will record one live in the call and show the four-strategy cascade replay against your environment.
Frequently asked questions
What is CloudCruise, in one paragraph?
CloudCruise is a Y Combinator W24 company building a developer platform for browser agents. You describe a web workflow in plain English, the platform's coding agent compiles it into a directed graph called BADGER, and a managed VM browser fleet runs that graph on a schedule or via API call. A second agent watches for failures and patches them. They raised a $5M round in March 2026 and have made healthcare payer portals and EHR web apps their public focus area.
What does BADGER stand for?
Browser Automation Directed Graph Engine Ruleset. The repository is at github.com/CloudCruise/BADGER under the MIT license. It is a workflow DSL built to replace ad-hoc Playwright scripts with explicit graphs of nodes (browser actions like NAVIGATE, CLICK, INPUT_TEXT, BOOL_CONDITION, EXTRACT_DATAMODEL) and edges (control-flow paths). The DSL is the declarative layer; the runtime that executes a BADGER graph is the closed-source product.
How many ways can a BADGER node find an element?
Five execution strategies, configurable per node. STATIC for an explicit XPath. LLM_DOM for an instruction the model resolves against the trimmed DOM. LLM_VISION for screenshot-based picking with up to fifteen retries. COORDINATES for literal x and y. PROMPT for a free-form instruction the model interprets against the current run context. Selector cascade plus model assist is the modern shape for browser RPA, and BADGER is one of the cleanest public expressions of it.
Is CloudCruise a UiPath replacement?
Only for the part of UiPath's surface that is web-based. UiPath ships a Windows desktop runtime, a Citrix runtime, a mainframe terminal connector, and a browser runtime. CloudCruise replaces the browser runtime well and does not replace the rest. If your UiPath estate is mostly SAP GUI, Oracle Forms, or banking core green-screens, swapping it for CloudCruise leaves the harder workflows uncovered. If your UiPath estate is mostly vendor portal logins and form fills, CloudCruise is a credible swap.
Can BADGER drive a desktop application?
The repository's public surface area is browser-only. There is a use_native_actions option on a node that lets the runtime issue OS-level mouse and keyboard events instead of synthetic DOM events, but that is still inside the context of a browser tab. Driving a separate Windows process (a SAP GUI window, an Excel sheet, a Jack Henry session) requires a different runtime that reads the OS accessibility surface, not a DOM. That is the architectural boundary the rest of this page is about.
How does CloudCruise's self-healing differ from a model-in-the-loop bot?
Self-healing here means the runtime classifies the failure (selector miss, network error, captcha, two-factor prompt), picks a remediation (re-resolve via LLM_DOM, escalate to LLM_VISION, queue a human, retry), and patches the graph file when the fix recurs. The model is in the authoring loop and in the failure-recovery loop, not in the steady-state replay path. This is the same split most modern RPA systems land on: AI to write and to repair, deterministic execution to run.
Where does CloudCruise stop and where does Mediar take over?
CloudCruise stops at the edge of the browser tab. If the workflow is a payer portal login, an insurance form fill, or a download from a vendor extranet, CloudCruise is in scope and well-suited. The moment the workflow opens a desktop SAP GUI screen, a Hyperspace window in Epic, an Oracle Forms session, or a Jack Henry green-screen, the DOM stops being a meaningful representation and a different surface is needed. Mediar reads the Windows UI Automation accessibility tree, which exposes role, name, automation id, and bounds for every visible control across every running Windows process. It is the same idea (a workflow file plus a selector cascade plus AI authoring) implemented against a different input surface, which is why it can drive thick-client apps that no browser-only tool can.
Is the BADGER repo something I can use on my own?
The DSL spec, the node taxonomy, and a reference Playwright runner are public under MIT. You can read the repo to see what categories of action a real production browser-RPA system thinks are worth being first-class. You will not get the managed VM fleet, the maintenance agent, the credential vault, or the self-healing classifier without becoming a customer. Treat the repo as an architecture document; treat the platform as the product.
More from the Mediar topic series
Keep reading
What robotic process automation actually is, traced through the source
A six-event capture filter, a four-stage synthesis pipeline, and a four-strategy replay cascade. The mechanical answer to the question, with the open-source files that implement each layer.
RPA agent UI input layer: accessibility tree versus pixels
The choice of input surface is the most consequential architectural decision an RPA agent makes. Walks the tree-versus-pixel split and what each gives up.
Mediar, the company: the founders, the funding, and the open-source SDK
Background on the Y Combinator backed company at mediar.ai, the open-source Terminator SDK, and how the open-source pieces fit into the commercial product.