The AI layer looks over-provisioned
Enhancing every meeting by pushing the full transcript through a top-tier model is the simplest thing to build and an expensive thing to run. Most of what a meeting summary needs (speaker turns, action-item extraction, basic structuring) does not require a flagship model on every call. Paying frontier prices for commodity work leaves margin on the table, and it caps how generous the free and low-priced tiers can be in a market where acquisition is everything.
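To make the cost argument concrete, here is a minimal sketch of tiered routing: commodity tasks go to a small model and only the remainder goes to a flagship one. Everything in it is an assumption for illustration, not a description of the product's pipeline: the tier names, the task kinds (including the hypothetical "nuanced_synthesis"), the token counts, and especially the per-token prices, which are placeholders rather than any vendor's real pricing.

```python
from dataclasses import dataclass

# Placeholder prices per 1K tokens; invented for illustration, not real vendor pricing.
PRICE_PER_1K_TOKENS = {"small": 0.001, "flagship": 0.03}

# Assumption: these task kinds are commodity work a small model handles well.
COMMODITY_TASKS = {"speaker_turns", "action_items", "structuring"}


@dataclass(frozen=True)
class Task:
    kind: str    # hypothetical task label
    tokens: int  # tokens sent to the model for this task


def pick_tier(task: Task) -> str:
    """Route commodity tasks to the small tier, everything else to flagship."""
    return "small" if task.kind in COMMODITY_TASKS else "flagship"


def cost(tasks, route) -> float:
    """Total cost of a list of tasks under a given routing function."""
    return sum(PRICE_PER_1K_TOKENS[route(t)] * t.tokens / 1000 for t in tasks)


# One hypothetical meeting: three commodity passes plus one harder synthesis step.
meeting = [
    Task("speaker_turns", 8000),
    Task("action_items", 8000),
    Task("structuring", 8000),
    Task("nuanced_synthesis", 2000),
]

all_flagship = cost(meeting, lambda t: "flagship")  # every task on the top tier
tiered = cost(meeting, pick_tier)                   # commodity work on the small tier
```

Under these made-up numbers the tiered route costs a fraction of the all-flagship route, because most of the tokens belong to commodity tasks. The real ratio depends entirely on actual prices and on how much of the workload a smaller model can handle at acceptable quality.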
The moat is narrower than the polish implies
"Bot-free local capture" is a UX choice, not a defensible technology: it's replicable, and the assistants people already use are moving toward native listening. The notetaker market is crowded (Fireflies, Fathom, Otter, TL;DV, and Granola together pull tens of millions of visits), and the products are converging, so the room for differentiation is shrinking.
The memory layer is mostly promise
The contextual-recall vision is the real moat, but cross-meeting, queryable, durable memory is the hard part — and it's the part that's least built. Right now the depth lives in single meetings, not across them.
Platform gaps
No Android (as of early 2026) leaves a meaningful slice of the market — and of many companies' employees — uncovered.