A walkthrough, not a category page

One journal entry, four fields, no LLM in the runtime: what SAP data entry automation actually looks like at the field level.

The pages that come up for this topic mostly describe one of three things: SAP Build Process Automation, SAP GUI Scripting (VBScript over the GUI scripting API), or bulk Excel loaders like Process Runner and Winshuttle. Each of those is a real product, and on the right-shaped problem each is the right answer. This page is about a fourth option, which is what most of the existing material skips: an AI agent that watches an operator post one journal entry once, writes a TypeScript workflow file, and then replays it deterministically through the Windows accessibility tree. To make that concrete, I am going to trace one specific workflow we ship for an F&B chain, field by field, with the file and line numbers a reviewer can open.

Matthew Diakonov
12 min

The workflow we are going to trace

The Mediar workflow store is a Postgres table; each row is a numbered workflow with a TypeScript file attached. Workflow id 164 is named “Imperial Treasure SAP Journal Entry”, and it is the one I will reference for the rest of this page. Its shape is checked into the executor’s integration test at crates/executor/tests/integration_test_typescript.rs, so a reviewer can read the same struct the runtime reads.

The workflow declares a zod input schema with four string fields: company_code, journal_entry_type, document_date, posting_date. Those four are the contract between whatever system supplies the journal and the SAP-side post. They are also exactly the four fields that have to be correct on the F-02 header before the line items grid will accept input.

apps/web/workflows/imperial-treasure/src/terminator.ts

The fact that this is a real shipped workflow, not a demo, matters for one reason: it makes everything else on this page checkable. The input schema is the audit handle. If a reviewer redlines a field, it changes here. If the upstream system passes a fifth field, the zod parse fails before the runtime ever opens SAP GUI (this requires the schema to be declared strict, because by default zod strips unknown keys rather than rejecting them).
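To make the contract concrete, here is a minimal, dependency-free sketch of what a strict four-field parse does. It is a hand-rolled stand-in, not the shipped schema: the real file presumably declares the same fields with zod, and the names `parseJournalHeader`, `JournalHeader`, and `ParseResult` are hypothetical.

```typescript
// Hypothetical stand-in for the workflow's zod input schema.
// It mimics a strict object schema: four required strings, nothing else.
const FIELD_NAMES = [
  "company_code",
  "journal_entry_type",
  "document_date",
  "posting_date",
] as const;

type FieldName = (typeof FIELD_NAMES)[number];
type JournalHeader = Record<FieldName, string>;

type ParseResult =
  | { ok: true; value: JournalHeader }
  | { ok: false; errors: string[] };

function parseJournalHeader(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, errors: ["input must be an object"] };
  }
  const record = input as Record<string, unknown>;
  const allowed = new Set<string>(FIELD_NAMES);
  const errors: string[] = [];

  // Strict mode: an unknown key is an error, not something to strip silently.
  for (const key of Object.keys(record)) {
    if (!allowed.has(key)) errors.push(`unexpected field: ${key}`);
  }
  // Every declared field must be present and must be a string.
  for (const name of FIELD_NAMES) {
    if (typeof record[name] !== "string") errors.push(`${name} must be a string`);
  }
  if (errors.length > 0) return { ok: false, errors };

  return {
    ok: true,
    value: {
      company_code: record["company_code"] as string,
      journal_entry_type: record["journal_entry_type"] as string,
      document_date: record["document_date"] as string,
      posting_date: record["posting_date"] as string,
    },
  };
}
```

The point of failing at parse time is ordering: a malformed payload is rejected before any SAP session exists, so there is nothing to roll back.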

Where most existing guides stop

To set up the contrast honestly: the four shapes of content that come up for this topic each address something real, and three of them are incompatible with the workflow above.

SAP Build Process Automation

Vendor-side, inside SAP BTP

Lives inside SAP's Business Technology Platform and is the right answer if your team is already on BTP and your bottleneck is BTP-side workflow orchestration. It is not designed to drive a non-BTP desktop app, and it carries SAP's pricing model.

SAP GUI Scripting

VBScript over a COM interface

Has to be enabled server-side (sapgui/user_scripting) and on the client. Works only inside SAP GUI for Windows. Famous for breaking on patch level changes. Useful for one-off scripts a technical user maintains; not useful when the workflow needs to also touch Excel, a PDF, or a non-SAP app.

Bulk Excel loaders

Process Runner, Winshuttle, LSMW

Map a sheet onto a transaction and submit in batch. Excellent for migrations and large monthly loads. Awkward for ad-hoc, event-driven posts (a single PDF arrives, post it now), and they typically depend on the same SAP GUI Scripting surface under the hood.

Selector-based RPA

UiPath, Power Automate, Automation Anywhere

Drives SAP GUI through a recorded selector tree. The mature category. Production-grade, but the implementation cycle is weeks to months for a single F-02 flow because the selectors have to be tuned by hand, and re-tuned every time SAP repaints a control.

What none of the four describes well is the field-level mechanics of a recording-driven AI workflow. That is the hole this page is trying to fill, with one specific workflow as the evidence.

The runtime, in five steps

From the moment the upstream PDF lands to the moment the document number comes back, the workflow does five distinct things. None of them involves an LLM call. All of them are reading or writing through the Windows UI Automation tree.

Imperial Treasure SAP Journal Entry, end to end

1. Read the trigger

A new journal-entry PDF lands in OneDrive. The recording captured what to do with it: open Excel, validate the totals, then start the SAP post.

The trigger step is just file-watch metadata. No SAP work yet; this is the boundary where AI-recorded automation picks up the document and the deterministic replay takes over.

2. Launch saplogon.exe and route to F-02

The runtime starts SAP GUI for Windows, signs into the recorded system, and types /nF-02 into the OK code field. Each of those is a click_element or type_into_element MCP call.

saplogon.exe is the canonical Windows SAP launcher. Once the session is live, the OK code field appears in the UI Automation tree as an Edit node with the role-based locator name:OK Code|role:Edit.

3. Type the four header fields

company_code, journal_entry_type, document_date, posting_date. Each one is one type_into_element call against the matching accessibility node, with clear_before_typing set so a stale value does not concatenate with the new one.

The four fields are the exact zod inputs the workflow declares. They are what makes the file an honest contract between the recording and any upstream system that posts to it.

4. Click into the line items table and write each row

G/L account, debit/credit indicator, amount, cost center, text. The recording replays the same tab order it captured, and each cell is a type_into_element against the row’s accessibility node.

SAP’s line item grid renders each cell as a UI Automation child of the row, which is why the runtime can address them without pixel coordinates. A line that spans 12 columns is 12 MCP calls, not one screenshot diff.

5. Save with Ctrl+S, read the document number

The runtime emits a key press, waits for the status bar to update, and reads the new document number out of the status-bar accessibility node. That string is the workflow output.

The output is what gets written back into the upstream system: a Mercury booking, a Snowflake row, a Slack confirmation. SAP does not need to know any of that. The workflow file does.
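Steps 3 and 4 can be sketched as a pure function from inputs to a list of tool calls. This is an illustration under assumptions, not the emitted workflow file: `clear_before_typing` and the `name:…|role:Edit` selector form come from the text above, while `text_to_type`, `timeout_ms`, the field labels, and the cell selector are hypothetical placeholders.

```typescript
// Illustrative shapes only. Argument names other than clear_before_typing
// are assumptions, not the documented MCP schema.
interface TypeIntoElementCall {
  tool: "type_into_element";
  args: {
    selector: string;
    text_to_type: string;
    clear_before_typing: boolean;
    timeout_ms: number;
  };
}

interface HeaderInput {
  company_code: string;
  journal_entry_type: string;
  document_date: string;
  posting_date: string;
}

// Assumed accessible names of the four F-02 header controls.
const HEADER_FIELDS: Array<[keyof HeaderInput, string]> = [
  ["company_code", "Company Code"],
  ["journal_entry_type", "Type"],
  ["document_date", "Document Date"],
  ["posting_date", "Posting Date"],
];

function typeInto(selector: string, text: string): TypeIntoElementCall {
  return {
    tool: "type_into_element",
    // Clearing first keeps a stale value from concatenating with the new one.
    args: { selector, text_to_type: text, clear_before_typing: true, timeout_ms: 5000 },
  };
}

// Step 3: one call per header field, in a fixed order.
function headerCalls(input: HeaderInput): TypeIntoElementCall[] {
  return HEADER_FIELDS.map(([field, label]) =>
    typeInto(`name:${label}|role:Edit`, input[field]),
  );
}

// Step 4: one call per cell, so a row with N columns yields N calls.
function lineItemCalls(rows: string[][]): TypeIntoElementCall[] {
  const calls: TypeIntoElementCall[] = [];
  rows.forEach((cells, rowIndex) => {
    cells.forEach((cell, colIndex) => {
      // Hypothetical cell selector; a recorded selector will look different.
      calls.push(typeInto(`name:Row ${rowIndex} Column ${colIndex}|role:Edit`, cell));
    });
  });
  return calls;
}
```

Because both functions are pure, the same inputs always produce the same call list, which is the property the rest of this page leans on.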

The reason the trace is worth reading carefully is that the interesting work happens at step 3, when the runtime types into a field. That single primitive is what makes the rest of the workflow either reliable or fragile, and it is the part the existing material on this topic does not describe.

The primitive that types into a SAP field

In the Mediar codebase the MCP tool that types text into a control is called type_into_element. It is emitted from apps/desktop/src-tauri/src/mcp_converter.rs (lines 2298 and 2460). The arguments it takes are deliberately narrow: a string to type, a flag for whether to clear the field first, a timeout, and a structured locator. There is no pixel coordinate, no image template, and no model call. The locator is the handle into the accessibility tree.

apps/desktop/src-tauri/src/mcp_converter.rs (paraphrased)

When that step runs against a live SAP GUI session, the runtime resolves the selector to a node in the live UI Automation tree, checks whether the node supports the UI Automation ValuePattern, and either calls ValuePattern.SetValue directly or falls back to dispatching keystrokes to the control when it does not. For an F-02 header field, the ValuePattern path is the common one, which is why the typing feels instant on screen rather than character by character.
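The set-value-or-keystrokes decision can be sketched against an in-memory stand-in for a UI Automation node. In UI Automation the control pattern that exposes SetValue is ValuePattern; everything else here (`UiNode`, `makeNode`, `typeIntoNode`) is hypothetical and is not the Terminator API.

```typescript
// Simplified stand-in for a UI Automation node, for illustration only.
interface UiNode {
  value: string;
  // Present only when the control supports UIA's ValuePattern.
  setValue?: (text: string) => void;
  // Simulates one keystroke reaching the control.
  sendKey: (ch: string) => void;
}

type TypingPath = "set_value" | "keystrokes";

function makeNode(initial: string, supportsValuePattern: boolean): UiNode {
  const node: UiNode = {
    value: initial,
    sendKey: (ch) => {
      node.value += ch;
    },
  };
  if (supportsValuePattern) {
    node.setValue = (text) => {
      node.value = text;
    };
  }
  return node;
}

function typeIntoNode(node: UiNode, text: string, clearBeforeTyping: boolean): TypingPath {
  if (node.setValue) {
    // SetValue replaces the whole content in one call, so "append" has to be
    // expressed by prefixing the existing value.
    node.setValue(clearBeforeTyping ? text : node.value + text);
    return "set_value";
  }
  // Fallback: clear (a real runtime would select-all and delete), then type
  // one character at a time.
  if (clearBeforeTyping) node.value = "";
  text.split("").forEach((ch) => node.sendKey(ch));
  return "keystrokes";
}
```

Both paths end in the same field value; the difference is only how many events the control sees, which is why the set-value path looks instant.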

0 LLM call sites in crates/executor (verifiable via ripgrep on github.com/mediar-ai/terminator)

The production executor crate has zero references to openai, anthropic, gemini, or any other inference SDK. The model runs once during recording. The runtime is deterministic Rust calling Windows UI Automation.

What happens when SAP repaints a screen

Selector-based RPA in SAP GUI is a maintenance treadmill because the selectors are tied to the control hierarchy SAP renders, and SAP ships support packs every few months that quietly reorder children, rename labels, or insert a wrapper element. The Mediar runtime handles that case by walking through several locator strategies in sequence before failing.

Locator resolution, in order

  1. Recorded automation id

    If the field still exposes the same UIA AutomationId from the recording, the runtime hits it on the first try.

  2. Window handle plus bounds

    If the AutomationId moved, the runtime falls back to the same handle and bounding box from the recording session.

  3. Visible text label

    If neither matches, it walks the live tree for a control whose accessible name matches the recorded label (e.g. 'Document Date').

  4. Parent window, last try

    Final fallback: locate the parent window by title and let the next step retry. Three strategies have already failed before this one fires.

Three of those four strategies are independent of absolute pixel position, which is why the same recording survives a routine SAP patch level change without a re-recording. The fourth exists to surface the failure cleanly, not to recover invisibly: when all four miss, the runtime stops the sequence and reports the unresolved step to the dashboard. That posture is the right one for SAP, where a half-posted journal is worse than no journal.
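The ordered fallback can be sketched as a small resolver over a flat snapshot of the live tree. This is a simplified model under assumptions: the real resolver is Rust, and the types and names below (`RecordedLocator`, `LiveNode`, `resolveLocator`) are hypothetical.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }

// What the recording stored for one field (hypothetical shape).
interface RecordedLocator {
  automationId: string;
  windowHandle: number;
  windowTitle: string;
  bounds: Bounds;
  label: string;
}

// One node in a flat snapshot of the live accessibility tree.
interface LiveNode {
  automationId: string;
  windowHandle: number;
  windowTitle: string;
  bounds: Bounds;
  name: string;
}

type FieldStrategy = "automation_id" | "handle_and_bounds" | "label";

type Resolution =
  | { kind: "field"; strategy: FieldStrategy; node: LiveNode }
  | { kind: "window_only"; windowTitle: string } // strategy 4: retry from the window
  | { kind: "unresolved" };                      // all four missed: stop and report

const sameBounds = (a: Bounds, b: Bounds): boolean =>
  a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;

function resolveLocator(rec: RecordedLocator, tree: LiveNode[]): Resolution {
  const strategies: Array<[FieldStrategy, (n: LiveNode) => boolean]> = [
    ["automation_id", (n) => n.automationId === rec.automationId],
    ["handle_and_bounds", (n) => n.windowHandle === rec.windowHandle && sameBounds(n.bounds, rec.bounds)],
    ["label", (n) => n.name === rec.label],
  ];
  // Try each field-level strategy in order and return on the first hit.
  for (const [strategy, matches] of strategies) {
    const node = tree.find(matches);
    if (node) return { kind: "field", strategy, node };
  }
  // Last try: the parent window exists, but the field itself was not found.
  if (tree.some((n) => n.windowTitle === rec.windowTitle)) {
    return { kind: "window_only", windowTitle: rec.windowTitle };
  }
  return { kind: "unresolved" };
}
```

Returning which strategy matched, rather than just the node, gives the dashboard something to show when a recording starts drifting toward the later fallbacks.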

Why the workflow file is the audit artifact

The reason finance and audit teams have been slow to sign off on model-driven RPA in SAP is straightforward: a frontier model deciding which control to click on each run is not a deterministic system. Two identical PDFs can produce two different posts. That is fine for a chatbot, and it is unacceptable for a journal entry that flows into a regulated G/L.

The shape that does work is the one this page has been describing: the model writes the workflow once, the workflow is checked into source control, and the runtime is deterministic. The reviewer reads the TypeScript file, the reviewer signs off the file, the runtime replays the file. The audit shape is closer to a SQL stored procedure than to an autonomous agent.

We moved an LG-customer F&B chain from UiPath to Mediar; their CFO told the board they're now saving 70% on costs.
Mediar customer reference
F&B chain on SAP B1, internal note

The shape of the loop, in numbers

4 fields the workflow input schema requires (company_code, journal_entry_type, document_date, posting_date)
4 locator strategies the runtime walks before it fails an unresolved field
0 LLM call sites in the production executor crate (run ripgrep yourself to verify)
1 execute_sequence MCP step that the executor wraps the entire workflow in

Counts are taken straight out of the codebase. Four input fields are declared in the zod schema for workflow id 164 in crates/executor/tests/integration_test_typescript.rs. Four locator strategies live in apps/desktop/src-tauri/src/focus_state.rs. The executor crate has no LLM dependency in Cargo.toml and grep returns nothing for openai, anthropic, or gemini in the source tree. The single execute_sequence wrapper is what build_typescript_workflow_sequence emits; it is also what the integration test asserts against.

When this is not the right fit, honestly

A few cases where one of the other categories above is a better answer than what this page describes.

You already have a green BAPI for the transaction. If your SAP team has exposed the right RFC, you have authorization, and the workflow does not need to span beyond SAP, call the BAPI. Front-end automation is what you reach for when one of those conditions is missing, not when all three are met.

The job is one big monthly migration. A 40,000-row chart-of-accounts load is exactly what LSMW and Process Runner are built for. Recording-driven automation is built for the ongoing, event-driven case (a PDF arrives, post it; a row appears in a queue, post it), not the one-time bulk load.

You need a model in the loop on each run. If the workflow legitimately requires reasoning at execution time (extract a free-text comment, decide which G/L account it implies, apply judgment), the deterministic runtime is the wrong shape. The right shape is to keep the AI step at recording time and let a human add the reasoning step in front of the queue, so the SAP-side post is still deterministic.

What you would need to ship something like this

If you wanted to build the same shape on top of the open-source primitive instead of the cloud product, the path is short. Terminator (github.com/mediar-ai/terminator, MIT) gives you the UI Automation calls and the locator resolver. The MCP tools that wrap them (type_into_element, click_element, set_value, get_text) are documented in that repo. You provide the recording surface, the orchestration, and the workflow store. That is the path teams pick when they already have an automation platform and want to plug in a SAP-aware desktop primitive without buying another vendor.

The shorter path is to bring one transaction (F-02, FB50, MIRO, VA01, OB52, your call) and let us record it on a session call. The output is the same TypeScript file shape this page has been describing, and it runs against your test client the same hour it is recorded.

Record one F-02 post live and read the file the AI emits

Bring one journal entry. We will record it against your test SAP system on the call, hand you the TypeScript workflow file, and run the deterministic replay back. You leave with a checked-in artifact, not a slide deck.

Frequently asked questions

Does Mediar use SAP GUI Scripting under the hood?

No. SAP GUI Scripting is a VBScript surface SAP exposes through a COM interface; it works only inside SAP GUI for Windows, has to be enabled by Basis on the server and the client (sapgui/user_scripting), and is notoriously brittle across UI patch levels. Mediar reads SAP GUI through the Windows UI Automation accessibility tree, the same surface a screen reader uses. That tree is supplied by the OS, not by SAP, so the same primitive that types into F-02 also types into a Jack Henry green-screen, an Oracle EBS form, or a Win32 desktop app. Nothing in the runtime calls the SAP GUI Scripting API.

Why not BAPI / RFC?

BAPI and RFC are the right answer when you have SAP-side authorization to expose them, a developer to write the ABAP wrapper, and a license that permits external API calls at the volume you need. Plenty of SAP customers do not. Common blockers we see: the BAPI for the transaction does not exist (true for a surprising number of mid-market F&B and retail use cases), the customer is on a hosted instance and cannot get an RFC user provisioned, or the workflow spans SAP plus a non-SAP step (open the PDF, check a value in Excel, then post the journal) and BAPI alone does not cover the boundary. Front-end automation through the accessibility tree handles all three cases without an SAP-side change.

What does the workflow file actually look like?

It is a TypeScript file. The recording pass writes a `createWorkflow` call with a zod input schema and a list of steps. Each step is an MCP tool call: `type_into_element`, `click_element`, `set_value`, etc. For the Imperial Treasure journal entry the input schema has four fields: company_code, journal_entry_type, document_date, posting_date. The integration test for the executor exercises exactly that shape against workflow id 164. The file is the audit artifact. A reviewer can diff it the way they would diff a stored procedure, redline a step, and re-run the deterministic replay against the test SAP instance.

What happens when SAP repaints F-02 after a support pack?

The runtime resolves each field by walking through several locator strategies before giving up. The recorded automation id is tried first, then the window handle plus bounds, then the visible text label of the control, then the parent window as a fallback. Three of those four strategies do not depend on absolute pixel position, so the kinds of UI tweak that SAP support packs ship (a field shifts a row, a screen variant rearranges a tab) usually resolve through one of the first three strategies. Only when all of them miss does the runtime mark that step for re-recording and surface it in the dashboard.

Is there a model deciding what to click while the workflow runs?

No, and this is the architectural bet that lets a regulated finance team sign the workflow off. A grep of the executor crate (crates/executor in github.com/mediar-ai/terminator) finds zero references to openai, anthropic, gemini, or any other inference SDK. The model runs once, during the offline recording-processing pass, where it reads the captured event stream and writes the TypeScript file. After that file is checked in, the runtime is deterministic Rust that walks the accessibility tree and emits MCP tool calls. Two identical inputs produce two identical action sequences.

Can it post a journal entry in F-02, or only in FB50?

Both, and any other transaction in the same family. The runtime does not encode the SAP transaction code; it encodes the screen flow. The recording captures the operator typing /nF-02 into the OK code field, tabbing through company code, document date, posting date, document type, and the line items, pressing save, and observing the resulting document number. If the same operator records FB50 instead, the resulting workflow file has different selectors but the same MCP primitives. Switching transaction codes is a re-recording, not a code change.

How does the document number come back into our system?

The last step of the recording is usually a read: SAP renders the new document number in a status bar after the post completes. The MCP tool called during replay is `get_text` against the status bar element, and the value is returned through the `execute_sequence` step result. From there the workflow file can write it to a database, a Slack channel, or back into the upstream system that supplied the journal data (a POS export, a OneDrive PDF, an Excel sheet). The OneDrive-to-SAP flow we run for one F&B chain follows that pattern end to end.
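Once `get_text` returns the status bar string, the document number still has to be pulled out of it. A minimal sketch, assuming an English-language message of the form "Document 100000123 was posted in company code 1000"; the exact wording depends on logon language and transaction, and `parseDocumentNumber` is a hypothetical helper.

```typescript
// Assumes an English status message of the form
// "Document <digits> was posted in company code <code>".
// Returns null when the text does not look like a successful post.
function parseDocumentNumber(statusText: string): string | null {
  const match = /Document\s+(\d+)\s+was posted/i.exec(statusText);
  return match ? (match[1] ?? null) : null;
}
```

Returning null instead of guessing matters here: if the status bar holds an error message, the workflow should fail the step rather than write a bogus document number upstream.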

What if a popup we have never seen appears mid-flow?

The runtime emits a structured failure into the execution trace: the parent window title, the surfaced text, and the unresolved step id. Two things happen next. First, the executor stops the sequence (stop_on_error defaults to true for SAP workflows, since posting half a journal is worse than not posting it). Second, the dashboard surfaces the failure to a human, and the recording app can be opened directly on the failing step to capture the new branch. The runtime never silently retries with a guessed click, because that is the failure mode that turns into a data-quality incident.
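The stop-on-first-failure behavior can be sketched as a small sequence runner. This is a model of the idea, not the executor's code: `stop_on_error` is named in the text above, while `Step`, `StepResult`, and `executeSequence` are hypothetical.

```typescript
// A step either succeeds or reports what it saw when it could not proceed.
type StepResult =
  | { ok: true }
  | { ok: false; windowTitle: string; surfacedText: string };

interface Step {
  id: string;
  run: () => StepResult;
}

interface StepFailure {
  stepId: string;
  windowTitle: string;
  surfacedText: string;
}

interface SequenceResult {
  completed: string[];
  failure: StepFailure | null;
}

function executeSequence(steps: Step[], stopOnError: boolean = true): SequenceResult {
  const completed: string[] = [];
  let failure: StepFailure | null = null;

  for (const step of steps) {
    const result = step.run();
    if (result.ok) {
      completed.push(step.id);
      continue;
    }
    // Record the first failure as a structured trace entry.
    if (failure === null) {
      failure = {
        stepId: step.id,
        windowTitle: result.windowTitle,
        surfacedText: result.surfacedText,
      };
    }
    // With stop_on_error, nothing after the failing step runs, so an
    // unexpected popup can never be followed by a save.
    if (stopOnError) break;
  }
  return { completed, failure };
}
```

The failure record carries the same three facts the trace is described as containing: parent window title, surfaced text, and the unresolved step id.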

Does this work on SAP S/4HANA Fiori, or only the classic GUI?

Both, but the surface is different. Classic SAP GUI for Windows exposes Win32 controls into the UI Automation tree; that is the cleanest case. SAP Fiori is a web app, so the runtime reads it through Chrome's accessibility tree (the same one DevTools surfaces under the Accessibility panel) and the same `type_into_element` primitive works. Hybrid environments (Fiori for the launchpad, classic GUI for the long tail of transactions like F-02 and OB52) are the common case in the field, and a single Mediar workflow can cross between them in one execution.

Is the runtime open source?

The execution layer is. The Terminator SDK that performs the UI Automation calls and the locator resolution lives at github.com/mediar-ai/terminator under MIT, and the MCP tools (`type_into_element`, `click_element`, `set_value`, `get_text`) are documented there. The orchestration layer, the recording pipeline, and the no-code workflow builder are commercial. A team that wants to wire SAP data-entry primitives into their own queue can build directly on Terminator without paying for the cloud product.

How much does an SAP journal entry cost to run?

Pricing is $0.75 per minute of runtime. A four-field journal entry in F-02 typically lands somewhere between 25 and 60 seconds depending on network, screen variant, and how many line items are on the document. At 40 seconds per post, 200 posts a day is 8,000 seconds, or about 133 minutes of runtime, which works out to roughly $100 a day in runtime cost. The $10K turn-key program fee converts to credits with a bonus, so it is effectively prepaid usage rather than a license.
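The arithmetic is simple enough to check in a few lines. The per-minute price and the 25 to 60 second range come from the answer above; the function name is just for illustration.

```typescript
const PRICE_PER_MINUTE_USD = 0.75;

// Daily runtime cost = total seconds per day, converted to minutes, times price.
function dailyRuntimeCostUsd(secondsPerPost: number, postsPerDay: number): number {
  return (secondsPerPost * postsPerDay * PRICE_PER_MINUTE_USD) / 60;
}

// 200 posts a day at the fast, typical, and slow ends of the range.
const fast = dailyRuntimeCostUsd(25, 200);    // 62.5
const typical = dailyRuntimeCostUsd(40, 200); // 100
const slow = dailyRuntimeCostUsd(60, 200);    // 150
```

So the same 200 posts cost between about $62 and $150 a day depending on how long each one takes, which is why line-item count is the number to estimate first.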