Matching every German tender to a company in 2 seconds

The problem

A small or mid-sized company that could win public contracts faces a recurring problem. Among the thousands of tenders published in a single week, a handful are a close fit. Most of them are never found.

They are scattered across dozens of portals, written in bureaucratic language that rarely matches the words a company uses for its own work, and buried under thousands of contracts that are irrelevant. Finding the few that fit means reading past hundreds that do not. The rational response is to give up before starting: the work is there, the budget is there, and it goes to whoever happened to be looking.

The scale makes this concrete. A representative German corpus holds 25,745 tenders (25,393 of them from the EU TED database), of which 13,767 are active contract notices. The question is always the same: which of these fit a given company? That is hard for one company on one day. It becomes a different kind of problem when the answer has to be produced for every company, every day, automatically.

The requirement

The ordering of the work matters more than it first appears. Before investing in reading a tender, checking its deadline, pulling the documents, and deciding whether to bid, only one thing is needed: a fast, honest answer to the question "what is even relevant to look at?"

Most existing tools invert this. They place a late, expensive step right at the beginning. Qualifying a tender in depth is a step-four problem; it should not be the first thing touched. The first step has to be relevance, and it has to be cheap.

There is a capacity constraint too. Most companies never seriously engage with public tenders because they lack the people for it. A tool that then needs three more people to operate has not solved the problem; it has added another one.

So the requirement follows directly. To even decide whether public tenders are worth the effort, the very first step in the pipeline has to be extremely fast, already selective, and return curated results. Not a search box, but a short, trustworthy shortlist that can be judged at a glance.

The pipeline

From 25,745 tenders to a match

Each stage cheaply takes work off the next: first a deterministic CPV pre-filter, then a language model as the judge.

Why the obvious approaches do not cut it

The obvious approaches are worth ruling out first. Each falls short for a specific reason.

Keyword search. Searching for the relevant words finds the obvious matches, misses everything named differently, and floods the results with hits where the word appears but the contract is something else entirely. A search for "lighting" returns stage lighting, traffic signals, and a study on light pollution.

CPV codes. Every tender is tagged with one or more CPV codes (Common Procurement Vocabulary), the EU's standard catalogue of "what is being bought." This is genuinely useful: it is structured, language independent, and every notice carries it. But as the only tool it is far too blunt. A code names the rough category, not whether a company can deliver the job. Thousands of unrelated contracts share a single code, the people writing tenders pick codes inconsistently, and a company that does three things carries codes that overlap with half the catalogue. There is simply too much fuzziness in it to decide a match on the code alone.

Vector search. The clever-sounding fix turns each company description and each tender into an embedding: a long list of numbers, a point in a high-dimensional space, arranged so that texts with similar meaning land close together. A company is then matched to the tenders nearest to it. It feels right. In testing it works badly, for one structural reason.

Semantic closeness is not the same as fitness to deliver. Consider a company that manufactures LED street luminaires. In embedding space it lands right on top of a tender for the three-year maintenance of municipal street lighting: both are saturated with the same words, street and lighting and LED and luminaire and municipal. The model sees near-identical text and calls it a match. But one wants a product on a pallet and the other wants a crew with ladders on call for three years. Same neighbourhood, opposite business.
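A toy sketch makes the failure visible. It uses plain word counts as a crude stand-in for embeddings (an assumption for illustration only; real embedding models differ in detail), and all four texts are invented examples.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented example texts, not taken from any real tender.
company = "manufacturer of led street lighting luminaires for municipal roads"
maintenance = "three year maintenance of municipal led street lighting luminaires"
supply = "supply and delivery of led luminaires for municipal street lighting"
unrelated = "catering services for a school canteen"

for name, text in [("maintenance", maintenance), ("supply", supply), ("unrelated", unrelated)]:
    print(f"{name:12s} {cosine(company, text):.2f}")
```

The maintenance contract and the supply contract score almost the same against the manufacturer, while only the unrelated tender is clearly separated. Similarity of wording says nothing about which of the two the company can deliver.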

Figure (illustrative): Close in vector space, opposite in practice. A 2-D sketch of a high-dimensional space. The three tenders crowding the manufacturer are near-identical to a vector search. Only one is a contract the company can actually win.

Does a re-ranker fix this?

The usual rescue is a hybrid: retrieve with vector search, then run a re-ranker (a second, heavier model that re-scores the shortlist and drops the bad ones). It helps, but only with half the problem.

A re-ranker can clean up false positives: candidates that were retrieved but do not actually fit. The candidate was shown to it, so it can throw it out. What it can never do is recover a false negative: a tender that fits but was never retrieved in the first place. If fuzzy embeddings did not pull it onto the shortlist, the re-ranker never sees it, and it is gone.

That asymmetry is the whole game. False positives are annoying but cheap: a human glances and moves on. False negatives are silent and expensive: a contract that could have been won, never known to exist. Vector search, fuzzy by nature, produces exactly the kind of error a re-ranker cannot fix.
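The asymmetry can be shown with plain sets. This is a minimal sketch with hypothetical tender IDs and an idealised re-ranker that knows the ground truth, which is the best case a real re-ranker could hope for.

```python
# Hypothetical ground truth and retrieval result, for illustration only.
relevant = {"T1", "T2", "T3"}    # tenders that truly fit the company
retrieved = {"T1", "T4", "T5"}   # what a fuzzy retriever put on the shortlist

def rerank(shortlist, keep):
    """A re-ranker can only keep or drop candidates it was shown."""
    return {t for t in shortlist if keep(t)}

# Best case: a perfect re-ranker that knows exactly which tenders fit.
final = rerank(retrieved, lambda t: t in relevant)

false_positives = final - relevant   # retrieved but wrong: all removed
false_negatives = relevant - final   # fitting but never retrieved: still missing

print("final shortlist:", sorted(final))
print("false positives:", sorted(false_positives))
print("false negatives:", sorted(false_negatives))
```

Even with a perfect second stage, T2 and T3 stay lost, because they never reached it.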

Figure (illustrative): Two ways to be wrong. A re-ranker drags the result left (fewer false positives) but never down: the matches vector search failed to retrieve stay missed. The CPV pre-filter plus an LLM that reads every survivor lands in the bottom-left.

The approach: a filter that cannot miss, then a judge

The pipeline described here is built around that one asymmetry. The CPV pre-filter is deterministic, so it does not "miss" the way embeddings do: if a tender carries a matching code, it survives, full stop. A language model then reads every survivor and judges it one by one. Nothing relevant is dropped on a quiet similarity score. As long as the CPV code is right, there are no false negatives left to recover, because nothing was thrown away unseen.

The design follows from there: a deterministic filter that cannot miss, a model fast enough to read every survivor, and a caching technique that makes reading them nearly free. The next three steps are exactly that.

Step 1: The CPV pre-filter, the 80/20

This is the fast, selective first step that everything else depends on. CPV codes are 8 digits long and hierarchical: each extra digit is one more level of detail, and each level cuts the field. Follow one branch down, for a company that lays cable:

45 00 00 00   Construction work            division   ~12,768 candidates
45 31 00 00   Electrical installation      class        ~3,461 candidates
45 31 10 00   Wiring and fitting work       sub-cat        ~808 candidates
45 31 43 00   Cabling infrastructure        leaf           ~788 candidates

The deeper the code goes, the fewer tenders survive, but the gains shrink fast. Across real company profiles (about 18 CPV codes per firm on average), the full funnel looks like this:

CPV digits	Avg candidates/firm	Share of corpus (of the 25,393 TED tenders)
2 (division)	12,768	50.3%
3 (group)	6,860	27.0%
4 (class)	3,461	13.6%
5 (category)	1,048	4.1%
6 (sub-category)	808	3.2%
8 (leaf)	788	3.1%

The first three to four digits do the heavy lifting in absolute numbers. The biggest relative cut is at 4 to 5 digits, from 3,461 down to 1,048 candidates, a drop of 70%. From the sixth digit on, the funnel has basically run dry: going from 6 to 8 digits removes only 20 more candidates while introducing a real risk, namely excluding genuine matches that share the category but carry a different leaf.
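The per-level cuts can be recomputed directly from the funnel table. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
# (CPV digits, average candidates per firm), copied from the funnel table.
funnel = [(2, 12768), (3, 6860), (4, 3461), (5, 1048), (6, 808), (8, 788)]

drops = []
for (d0, n0), (d1, n1) in zip(funnel, funnel[1:]):
    removed = n0 - n1            # candidates cut by going one level deeper
    relative = removed / n0      # share of the previous level that is cut
    drops.append((d0, d1, removed, relative))
    print(f"{d0} -> {d1} digits: -{removed:>5,} candidates ({relative:.1%})")

# The step with the largest relative cut.
biggest = max(drops, key=lambda d: d[3])
print("largest relative cut:", biggest[0], "->", biggest[1], "digits")
```

The largest relative cut is the step from 4 to 5 digits, and the final step from 6 to 8 digits removes only 20 candidates.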

Figure: CPV selectivity per digit (diminishing returns). The candidate pool drops steeply through 5 to 6 digits, then barely moves. Filtering finer just trims a few more while risking real matches.

That gives a sweet spot at 5 to 6 digits. Filtering coarser lets too much junk through, which multiplies the cost of the next stage; filtering finer saves almost nothing while throwing away good hits. In production the filter runs at 6 digits: 25,745 tenders become about 808 candidates on average, 97% gone, deterministically and at essentially no cost.
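A prefix filter of this kind fits in a few lines. The following is a minimal sketch, not the production code: the `Tender` type, the `cpv_prefilter` function, and the sample data are hypothetical, and CPV codes are assumed to be stored as 8-digit strings without the check digit.

```python
from dataclasses import dataclass

@dataclass
class Tender:
    id: str
    cpv_codes: list[str]   # 8-digit CPV codes as strings (assumed format)

def cpv_prefilter(tenders, company_codes, digits=6):
    """Keep every tender that shares a CPV prefix of `digits` length with the company."""
    prefixes = {code[:digits] for code in company_codes}
    return [t for t in tenders
            if any(code[:digits] in prefixes for code in t.cpv_codes)]

# Hypothetical sample data for a company that lays cable.
company_codes = ["45314300"]
tenders = [
    Tender("A", ["45314310"]),   # same 6-digit prefix, different leaf
    Tender("B", ["45311000"]),   # same class (4 digits), different category
    Tender("C", ["90000000"]),   # unrelated division
]

for digits in (4, 6, 8):
    kept = [t.id for t in cpv_prefilter(tenders, company_codes, digits)]
    print(digits, "digits ->", kept)
```

In this toy data the 8-digit filter drops tender A even though it sits in the same sub-category, which is the kind of loss the 6-digit setting is meant to avoid.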

Step 2: The diffusion LLM as the judge

The remaining ~800 candidates need real judgement, exactly the "does this firm deliver this, or does the contract want something else?" question that vectors fail at. That is a job for a language model.

Classic autoregressive language models produce their answer token by token, sequentially. For this task that is doubly awkward: latency sensitive (each token has to be waited for) and slow in aggregate when hundreds of verdicts are needed.

A diffusion language model works differently. It does not generate word by word; it refines the whole answer in a few parallel passes, like an image being sharpened out of noise. The result is dramatically higher throughput and low latency. Concretely, with the diffusion LLM used here: 0.24 s latency per request, 427 tokens/s throughput.

Figure: How it generates, token by token vs all at once (both panels produce the sentence "This tender fits the company"). Top: a normal model writes one token at a time, left to right. Bottom: a diffusion model starts with every token at once as noise and crystallises them into place in a few fast passes, so the answer lands almost immediately.

Rather than 800 sequential requests, the judge processes 100 tenders per wave in parallel. For ~800 candidates that is roughly eight waves, about 2 seconds for a complete company against every relevant tender in Germany.
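The wave arithmetic is easy to check. The sketch below assumes that a wave of 100 parallel requests completes in roughly one request latency (0.24 s); that is a simplification, since real waves also depend on rate limits and network overhead.

```python
import math

WAVE_SIZE = 100      # tenders judged in parallel per wave (from the text)
LATENCY_S = 0.24     # latency per request (from the text)

def wall_clock(candidates: int) -> tuple[int, float]:
    """Number of waves and estimated total time, assuming one latency per wave."""
    waves = math.ceil(candidates / WAVE_SIZE)
    return waves, waves * LATENCY_S

for n in (800, 808):
    waves, seconds = wall_clock(n)
    print(f"{n} candidates: {waves} waves, about {seconds:.2f} s")
```

Under that assumption, 800 candidates need eight waves and the average pool of 808 needs a ninth, which lines up with the roughly 2 seconds quoted.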

Why diffusion, and not just a fast model fired in parallel?

A fair objection: any fast autoregressive model, fired at all 800 judgements at once, would hit roughly the same wall-clock. The decisive difference is early stopping.

One use case builds a free preview for a company: its 20 best-matching tenders, shown in seconds. That does not need all 800 verdicts, only the strongest handful. So the judging runs in fast batches and stops the moment a batch has produced enough strong matches. Firing all 800 at once would compute every verdict even when the first wave already answered the question. Because each diffusion batch is so cheap and low latency, this batch-then-early-stop loop stays almost instant: the preview is usually ready after a wave or two, not after grinding through everything.
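The batch-then-early-stop loop might look roughly like the sketch below. Everything in it is an assumption for illustration: `judge` is a hypothetical stand-in for one LLM verdict, and the score threshold and sample data are invented.

```python
import asyncio

async def judge(tender: dict) -> float:
    """Hypothetical stand-in for one LLM verdict; returns a fit score in [0, 1]."""
    await asyncio.sleep(0)           # placeholder for the real network call
    return tender["score"]

async def preview(candidates, wanted=20, wave_size=100, threshold=0.8):
    """Judge wave by wave and stop as soon as enough strong matches exist."""
    strong, waves = [], 0
    for start in range(0, len(candidates), wave_size):
        wave = candidates[start:start + wave_size]
        scores = await asyncio.gather(*(judge(t) for t in wave))  # one parallel wave
        waves += 1
        strong += [(s, t) for s, t in zip(scores, wave) if s >= threshold]
        if len(strong) >= wanted:    # early stop: the preview is already full
            break
    strong.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in strong[:wanted]], waves

# Invented data: 808 candidates, every fourth one a strong match.
candidates = [{"id": i, "score": 0.9 if i % 4 == 0 else 0.1} for i in range(808)]
top, waves_used = asyncio.run(preview(candidates))
print(len(top), "matches after", waves_used, "wave(s)")
```

With this data the first wave already contains 25 strong matches, so the remaining 708 candidates are never judged.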

Step 3: Prompt caching, why calls 2 to 800 cost almost nothing

Consider the cost of those 800 calls. Each call is built from two parts. The first is long and never changes: the instruction plus the company's full profile, its service groups and references, about 1,575 tokens of stable text. The second is short and changes every time: the one tender being judged, about 393 tokens. So roughly 80% of every call is identical to the last one.

Modern LLM providers expose exactly this with prompt caching. The first time the model processes that long prefix it does the full work and stores the result. Every later call that starts with the same prefix reuses the stored work instead of recomputing it, and is charged roughly one tenth of the normal input price for those tokens.
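Caching of this kind only works if the stable part comes first and is identical from call to call. The sketch below illustrates that ordering; the instruction text, the `build_prompt` helper, and the sample strings are hypothetical, and the exact request format depends on the provider.

```python
import os

# Hypothetical instruction text; the real prompt is not shown in this article.
INSTRUCTION = "Decide whether the company can deliver the tender. Answer FIT or NO_FIT."

def build_prompt(company_profile: str, tender_text: str) -> str:
    """Stable prefix first (instruction + profile), the changing tender last."""
    prefix = f"{INSTRUCTION}\n\nCOMPANY PROFILE\n{company_profile}\n\n"
    return prefix + f"TENDER\n{tender_text}\n"

profile = "Cable laying and electrical installation for municipal clients."
p1 = build_prompt(profile, "Maintenance of street lighting, three years.")
p2 = build_prompt(profile, "Supply of LED luminaires for a ring road.")

# The shared leading text is what a prefix cache can reuse across calls.
shared = os.path.commonprefix([p1, p2])
print(f"{len(shared)} of {len(p1)} characters are shared with the previous call")
```

Putting the tender before the profile would break the shared prefix on the first character, and nothing could be reused.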

This fits the workload almost perfectly. When one company is matched against every tender, the 1,575-token company prefix is byte for byte identical across all 800 calls. It is paid for in full exactly once, on call 1. Calls 2 through 800 read it from cache at a tenth, and only the ~393 fresh tender tokens cost full price.
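The effect on input cost follows from the token counts above. The sketch below works in units of "full-price input tokens" and assumes cached tokens are billed at exactly one tenth, with no extra charge for writing the cache; actual provider pricing may differ.

```python
PREFIX_TOKENS = 1575    # instruction + company profile (stable)
TENDER_TOKENS = 393     # the one tender being judged (changes per call)
CALLS = 800
CACHED_RATE = 0.10      # assumption: cached input billed at one tenth

prefix_share = PREFIX_TOKENS / (PREFIX_TOKENS + TENDER_TOKENS)

# Without caching, every call pays for the full prompt.
uncached = CALLS * (PREFIX_TOKENS + TENDER_TOKENS)

# With caching, call 1 pays in full; calls 2..800 pay a tenth for the prefix.
cached = (PREFIX_TOKENS + TENDER_TOKENS) + (CALLS - 1) * (
    PREFIX_TOKENS * CACHED_RATE + TENDER_TOKENS
)

saving = 1 - cached / uncached
print(f"prefix share of each call: {prefix_share:.0%}")
print(f"input cost saving over {CALLS} calls: {saving:.0%}")
```

Under these assumptions the prefix makes up about 80% of each call and caching removes about 72% of the input cost.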

Figure (prompt caching): One company profile, paid for once. The 1,575-token company profile is identical in every call. It is billed in full once, then read from cache 799 times.

Multiply that across a full run and the prefix, 80% of the input, almost vanishes from the bill after the first call.

Figure (prompt caching): Cost of one full run (1 firm x 25,745). Caching the company prefix cuts input cost by 72% and total cost by 42% (output tokens are unchanged, so the total saving is smaller than the input saving).

The result: human vs machine

The two approaches side by side, on the same task, one company against its ~808 pre-filtered candidates:

	Human	Diffusion LLM (+ cache)
Time	13.5 hours (1.7 working days)	~2.2 seconds
Cost	~187 EUR (minimum wage 13.90 EUR/h)	~0.40 USD (~0.37 EUR)

Figure: Time for one company (~808 candidates). About 22,000x faster. The diffusion-LLM bar is there; it is just far too short to see next to a human.

Figure: Cost for one company (~808 candidates). About 500x cheaper. Same story: the diffusion-LLM bar barely registers next to a human at minimum wage (13.90 EUR/h).

That is about 22,000x faster and about 500x cheaper, and that is in the fair version where the human also gets the CPV pre-filter. Without it, a human would have to check all 25,745 tenders: 429 hours, about 54 working days, about 5,964 EUR, per company, per run.
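These figures can be reproduced with a few lines of arithmetic. The one assumption not stated in the text is the reading speed: one minute per tender, which is what the quoted hours imply (25,745 tenders in about 429 hours).

```python
MINUTES_PER_TENDER = 1.0   # assumption inferred from the quoted hours
WAGE_EUR_PER_H = 13.90     # minimum wage used in the comparison
HOURS_PER_DAY = 8          # assumed length of a working day

def human_effort(tenders: int) -> tuple[float, float]:
    """Hours and cost in EUR for a human reading every tender once."""
    hours = tenders * MINUTES_PER_TENDER / 60
    return hours, hours * WAGE_EUR_PER_H

hours, cost = human_effort(808)              # with the CPV pre-filter
full_hours, full_cost = human_effort(25745)  # without it

speedup = hours * 3600 / 2.2                 # machine: about 2.2 seconds
cost_ratio = cost / 0.37                     # machine: about 0.37 EUR

print(f"filtered: {hours:.1f} h, {cost:.0f} EUR")
print(f"full corpus: {full_hours:.0f} h, {full_hours / HOURS_PER_DAY:.0f} days, {full_cost:,.0f} EUR")
print(f"speedup about {speedup:,.0f}x, cost ratio about {cost_ratio:.0f}x")
```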

This is exactly where applied AI pays off. What explodes with customer count for a human stays cheap for the machine. Matching 1,000 companies a day in full against their candidate pool costs about 400 USD and takes minutes. For a human that would be 13,500 working hours a day, simply impossible.

The chain is the point: a CPV pre-filter (97% gone, free, and no false negatives as long as the codes are right), then a diffusion LLM as the judge (parallel, about 2 s), then prompt caching (input cost down 72%). Each step on its own is well known. Together they turn a "cannot be done in real time" problem into a matter of seconds and cents.