Why Banking77 is hard
Noisy labels
Intent data is messy. Banking77 does not give you a clean synthetic path to good-looking results.
SeedFrontier.ai
Banking77 breakthrough
Intent classification result
Noisy labels. 77 intents. Real overlap. SeedFrontier just hit 94.42% accuracy on the official Banking77 test set, 0.59 percentage points above baseline, under a strict full-train protocol with no train/test leakage.
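To put the +0.59pp figure in perspective, here is a small worked calculation. It assumes the official test split has 3,080 queries (the size commonly cited for Banking77) and takes the baseline as 94.42% minus 0.59pp, because the baseline is not reported separately here. Both inputs are rounded, so treat the outputs as approximate.

```python
def delta_summary(new_acc_pct: float, delta_pp: float, test_size: int) -> dict:
    """Express a percentage-point gain as an error-rate change and extra correct predictions."""
    baseline_acc = new_acc_pct - delta_pp        # implied baseline accuracy, in percent
    baseline_err = 100.0 - baseline_acc          # baseline error rate, in percent
    new_err = 100.0 - new_acc_pct                # new error rate, in percent
    return {
        "baseline_acc": baseline_acc,
        "baseline_err": baseline_err,
        "new_err": new_err,
        # Share of the baseline's errors that the new model removes.
        "relative_err_reduction": (baseline_err - new_err) / baseline_err,
        # pp gain times test-set size approximates the number of extra correct queries.
        "extra_correct": round(delta_pp / 100.0 * test_size),
    }


# Assumption: 3,080 queries in the official Banking77 test split.
summary = delta_summary(new_acc_pct=94.42, delta_pp=0.59, test_size=3080)
print(f"implied baseline accuracy: {summary['baseline_acc']:.2f}%")
print(f"error rate: {summary['baseline_err']:.2f}% -> {summary['new_err']:.2f}%")
print(f"relative error reduction: {summary['relative_err_reduction']:.1%}")
print(f"about {summary['extra_correct']} more test queries classified correctly")
```

On these assumptions the gain takes the error rate from about 6.17% to 5.58%, removing roughly one in ten of the baseline's remaining errors (about 18 more correct test queries). Near the top of a benchmark, that relative framing is often more telling than the raw pp delta.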
94.42%
official Banking77 test accuracy
Measured on the official test set
+0.59pp
over baseline
Meaningful lift under the same evaluation protocol as the baseline
~225 ms
inference latency
Runtime kept visible instead of hand-waved away
~68 MiB
model footprint
Compact enough to matter in deployment discussions
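The ~225 ms figure is stated without saying whether it is per query or per batch, or on which hardware. A minimal sketch of how a per-query latency number can be measured follows. `measure_latency_ms` and `toy_predict` are hypothetical names, `predict` stands in for whatever callable wraps the model, and reporting a median plus a nearest-rank 95th percentile is a common convention, not necessarily the one behind the published number.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(predict: Callable[[str], object],
                       queries: Sequence[str],
                       warmup: int = 5,
                       repeats: int = 3) -> dict:
    """Time one-query-at-a-time calls to `predict`; return median and p95 in milliseconds."""
    # Warm-up calls are not timed, so one-off costs (lazy loading, cache fills) don't skew results.
    for q in queries[:warmup]:
        predict(q)

    samples = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            predict(q)
            samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile of the sorted samples.
    p95 = samples[max(0, math.ceil(0.95 * len(samples)) - 1)]
    return {"median_ms": statistics.median(samples), "p95_ms": p95, "n": len(samples)}


# Hypothetical stand-in for a real intent classifier.
def toy_predict(text: str) -> str:
    return "card_arrival" if "card" in text else "other"


stats = measure_latency_ms(toy_predict, ["where is my card?"] * 10 + ["my top-up failed"] * 10)
print(stats)
```

Timing single queries rather than batches keeps the number comparable to an interactive request path; batch throughput would need a separate measurement.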
Result snapshot
Accuracy: 94.42% (official Banking77 test score)
Delta: +0.59pp (improvement over baseline)
Runtime: ~225 ms (inference)
Footprint: ~68 MiB (model size)
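The footprint is quoted in MiB, a binary unit: 1 MiB is 2^20 = 1,048,576 bytes, so ~68 MiB is about 71.3 million bytes, or roughly 71.3 MB in decimal units. Whether the figure means on-disk size or resident memory is not stated; the sketch that follows assumes on-disk size, and `footprint_mib`, `mib_to_mb`, and the demo file name are hypothetical.

```python
import tempfile
from pathlib import Path

MIB = 2 ** 20  # bytes in one mebibyte (binary unit), vs. 10**6 bytes in a megabyte


def footprint_mib(path: str) -> float:
    """On-disk size of a model file, or of every file under a model directory, in MiB."""
    p = Path(path)
    if p.is_file():
        total = p.stat().st_size
    else:
        total = sum(f.stat().st_size for f in p.rglob("*") if f.is_file())
    return total / MIB


def mib_to_mb(mib: float) -> float:
    """Convert mebibytes to decimal megabytes."""
    return mib * MIB / 10 ** 6


print(f"68 MiB = {68 * MIB:,} bytes = {mib_to_mb(68):.1f} MB")

# Demo on a throwaway directory holding a hypothetical 3 MiB weights file.
with tempfile.TemporaryDirectory() as d:
    Path(d, "weights.bin").write_bytes(b"\0" * (3 * MIB))
    print(f"{footprint_mib(d):.1f} MiB")
```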
Why Banking77 is hard
With 77 intents, the label inventory leaves plenty of room for confusion between semantically adjacent classes.
The hard part is not just class count. It is the genuine semantic overlap between user intents.
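One way to make that overlap concrete is to count which pairs of intents a classifier mixes up most often. The sketch that follows does this from gold and predicted labels. The label names follow Banking77's naming style, but the predictions are made-up toy data, not output from SeedFrontier's model, and `top_confusions` is a hypothetical helper. Counting unordered pairs is a design choice so that mistakes in either direction add up.

```python
from collections import Counter
from typing import Sequence


def top_confusions(gold: Sequence[str], pred: Sequence[str], k: int = 5):
    """Return the k most frequent unordered {gold, predicted} pairs among errors."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Sorting each pair merges A->B and B->A mistakes into one count.
    errors = Counter(tuple(sorted((g, p))) for g, p in zip(gold, pred) if g != p)
    return errors.most_common(k)


# Toy labels and predictions, for illustration only.
gold = ["card_arrival", "card_arrival", "card_delivery_estimate", "card_arrival",
        "top_up_failed", "pending_top_up", "top_up_failed"]
pred = ["card_delivery_estimate", "card_delivery_estimate", "card_arrival", "card_arrival",
        "pending_top_up", "top_up_failed", "top_up_failed"]

for (a, b), count in top_confusions(gold, pred):
    print(f"{a} <-> {b}: {count}")
# card_arrival <-> card_delivery_estimate: 3
# pending_top_up <-> top_up_failed: 2
```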
The headline
The point of this page is not to shout “state of the art” in the abstract. It is to show a result that is specific, measurable, and anchored to a known benchmark with visible operating characteristics.
94.42% accuracy on the official Banking77 test set, under a strict full-train protocol with no leakage, with roughly 225 ms inference latency and a model footprint of about 68 MiB.
Protocol
Evaluation discipline is what separates a real benchmark statement from a dressed-up experiment, so it is spelled out here.
The headline number is the real external score, not a cherry-picked split or internal holdout.
The result was produced with a disciplined training setup rather than ad hoc experimentation.
Credibility is made explicit: no contamination, no hidden shortcuts, no soft benchmark framing.
Accuracy is reported alongside inference latency and memory footprint, so the result reads like a deployable system.
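How "no leakage" was verified is not detailed here. As one illustration, a basic sanity check flags any official test query whose normalized text also appears in the training data. This is a minimal sketch under that assumption: `normalize` and `find_overlap` are hypothetical helpers, the example strings are invented, and exact matching after normalization will miss paraphrased near-duplicates, which need a fuzzier comparison.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_overlap(train_texts: Iterable[str], test_texts: Iterable[str]) -> list:
    """Return test queries whose normalized form also occurs in the training texts."""
    train_keys = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in train_keys]


# Invented example strings, for illustration only.
train = ["Where is my card?", "My top-up failed"]
test = ["where is my CARD", "How long does delivery take?"]
print(find_overlap(train, test))  # ['where is my CARD']
```

An empty result is necessary but not sufficient for a clean split; it only rules out verbatim and trivially reformatted duplicates.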
Why this matters
Banking77 is not easy. That is exactly why this result is worth publishing.
Coming next
This page is the clean public frame for the result: the challenge is real, the benchmark is official, the gain is measurable, and the runtime profile stays visible.
A deeper write-up will expand on the architecture, training setup, ablations, and what the result means for SeedFrontier as a product.