Everyone asks which model we use. Almost nobody asks how we prepare the data.
Walk through any small-business block in Long Island City or Midtown right now and you'll find the same conversation happening inside three-person law firms, twelve-person real estate brokerages, and forty-person accounting practices: "Which AI should we buy?"
It's the wrong question.
We've shipped AI systems for NYC firms where the model choice made a 5% difference to accuracy. Data prep made a 60% difference. According to McKinsey's State of AI 2024 report, companies seeing real value from AI aren't the ones chasing the biggest models — they're the ones rebuilding their data pipelines first. MIT Sloan's 2024 research reached the same conclusion: the gap between AI winners and losers is almost entirely in how data is prepared, not which model is chosen.
This post is partly a technical note and partly a small announcement: EnovaCreations is now building AI integrations for NYC small businesses — document automation, intake systems, internal "ask your data" tools, and workflow automation for professional services firms. Same team, same local focus, a new layer on top of the websites and SEO work we've always done.
We wanted the first post about that expansion to be honest about where the real engineering time actually goes. Because if you're evaluating AI vendors right now — us, a national consultancy, or an in-house hire — the questions you ask will determine whether you end up with a demo that wowed a meeting or a system that still works six months later.
Here's what actually moves the needle.
## The uncomfortable truth
If your AI system is underperforming, nine times out of ten it's not the model. It's that the model is being fed garbage and asked to perform magic.
## 1. Chunking strategy: how you split the document matters more than the model
Large language models have context limits. So every document — a 200-page commercial lease, a folder of patient intake forms, a decade of case law — has to be split into pieces before the AI can work with it. Most tutorials tell you to split by character count. "Cut every 1,000 characters, overlap 200." That works for a blog post. It falls apart on real business documents. Why it matters for your business: If a clause gets split across two chunks, the AI sees half the clause and confidently gives you a wrong answer. For a Flatiron-district law firm reviewing NDAs, that's a compliance issue. For a Forest Hills real estate brokerage screening leases, that's a missed auto-renewal date and a lost year of rent negotiation. What good looks like: Split by semantic section — clause boundaries, form fields, headings — not arbitrary character counts. For a lease, that means each numbered clause is its own chunk. For a patient form, each question/answer pair. For a contract, each section and its defined terms. This is unglamorous. Nobody tweets about their chunking pipeline. It's also the single biggest lever on system accuracy we've found.
## 2. Metadata: tag before you retrieve
Every chunk we create gets tagged with structured metadata: document type, client, date, source file, section heading, jurisdiction if relevant.

Why it matters: When a user asks "what does our standard NDA say about non-solicitation for New York vendors?" — a generic AI system reads every chunk in your library and hopes the most relevant one scores highest. A well-tagged system filters first (document_type = "NDA" AND jurisdiction = "NY") and only then asks the model to compare a handful of clearly relevant chunks.

The result: cheaper, faster, and dramatically more accurate. You're using the model for what it's good at (understanding language) and using traditional database filtering for what it's good at (narrowing a haystack).

Rule of thumb for business owners: If an AI vendor's demo can't answer "show me only documents from 2025 signed by Firm X," they haven't built real metadata infrastructure. That's a red flag.
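A sketch of filter-then-rank retrieval. The toy keyword-overlap score stands in for whatever vector search a real system uses; the field names mirror the ones above but are otherwise illustrative:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    document_type: str
    jurisdiction: str
    year: int

def score(text: str, question: str) -> int:
    # Toy stand-in for vector similarity: count of shared words.
    return len(set(text.lower().split()) & set(question.lower().split()))

def retrieve(chunks, question, *, document_type=None, jurisdiction=None):
    # Narrow the haystack with metadata filters BEFORE any model is involved.
    survivors = [
        c for c in chunks
        if (document_type is None or c.document_type == document_type)
        and (jurisdiction is None or c.jurisdiction == jurisdiction)
    ]
    # Only the survivors get ranked (and, downstream, sent to the model).
    return sorted(survivors, key=lambda c: score(c.text, question), reverse=True)
```

Only the handful of NDA/NY chunks ever reach the ranking step, which is where the cheaper/faster/more-accurate result comes from.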
## 3. Cleaning: PDFs are a nightmare, and an hour here saves weeks later
Real business documents are messy.
- Headers and footers repeated on every page.
- OCR artifacts from scanned contracts ("th1s cl@use shall" instead of "this clause shall").
- Tables rendered as image garbage.
- Signature pages that throw off the whole document structure.
- Mixed languages in cross-border contracts.
- Redacted sections.
We spend real time building cleaning pipelines specific to each client's document types. An hour of careful cleanup at the start saves weeks of "why is the AI hallucinating on this one clause?" later.
For NYC professional services firms, this is especially true. Law firms, real estate brokerages, and accounting practices have document libraries built up over decades — each vintage with its own formatting quirks. Ignoring that history and dumping everything into a generic AI tool is how projects quietly fail in month three.
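One stage of such a cleaning pipeline, stripping headers and footers that repeat across pages, can be sketched like this (page splitting and OCR repair are assumed to happen upstream; the threshold is a tunable guess, not a universal constant):

```python
from collections import Counter

def strip_repeated_lines(pages: list[str], threshold: float = 0.6) -> list[str]:
    """Drop any line that appears on more than `threshold` of pages:
    that's layout furniture (headers/footers), not document content."""
    counts = Counter()
    for page in pages:
        # Count each distinct line once per page.
        for line in set(ln.strip() for ln in page.splitlines()):
            counts[line] += 1
    cutoff = threshold * len(pages)
    return [
        "\n".join(ln for ln in page.splitlines() if counts[ln.strip()] <= cutoff)
        for page in pages
    ]
```

On a scanned lease whose every page repeats the firm name and document title, this keeps the clauses and discards the furniture before chunking ever happens.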
## 4. Query rewriting: users ask messy questions
Here's something AI vendors rarely admit: end users ask terrible questions.
A paralegal types: "that thing about terminating early in the Smith lease."
A partner types: "what did we agree on rent escalation for 2024."
A patient types: "do I need to fast before my appointment next Tuesday."
A great AI system doesn't just throw these raw queries at the model. It preprocesses — rewrites the messy question into a clear one, identifies the entities ("Smith lease" → document_id=LEASE-2024-0341), expands abbreviations, and disambiguates.
Why it matters: A cheap preprocessing step that clarifies what the user is actually asking routinely outperforms an expensive frontier model working on raw input. Small business AI is full of this kind of high-leverage, low-cost engineering that gets skipped because it's not exciting to talk about.
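A stripped-down sketch of that preprocessing step. The alias and abbreviation tables here are hypothetical; in practice they're built during onboarding from the firm's own naming habits:

```python
# Hypothetical lookup tables, built per client during onboarding.
DOCUMENT_ALIASES = {"smith lease": "LEASE-2024-0341"}
ABBREVIATIONS = {"nda": "non-disclosure agreement"}

def preprocess(raw_query: str) -> dict:
    """Resolve known entities and expand abbreviations before the query
    ever reaches an embedding model or LLM."""
    q = raw_query.lower().strip()
    document_id = next(
        (doc_id for alias, doc_id in DOCUMENT_ALIASES.items() if alias in q),
        None,
    )
    expanded = " ".join(ABBREVIATIONS.get(word, word) for word in q.split())
    return {"query": expanded, "document_id": document_id}
```

"That thing about terminating early in the Smith lease" comes out the other side pinned to a specific document ID, so retrieval can filter before it ranks.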
## 5. Evaluation data: 20 real test cases beat reading 500 outputs
Most teams deploy AI systems and "eyeball" whether the outputs look right.
This doesn't scale and it doesn't catch regressions.
What we do instead: sit down with the firm's actual staff — the paralegal, the office manager, the bookkeeper — and write 20 to 40 carefully crafted test cases. Real questions the system will actually face, each with the "right" answer documented.
Every time we change a prompt, swap a model, or update the cleaning pipeline, we re-run those tests. If accuracy drops on case #17, we know before a client ever sees it.
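A minimal harness for that loop, assuming a plain substring check against each documented answer (real suites often grade with rubrics or an LLM judge instead); `answer_fn` stands for whatever end-to-end system is under test:

```python
def run_eval(test_cases: list[dict], answer_fn) -> tuple[float, list[dict]]:
    """Re-run a fixed test set after every prompt, model, or pipeline
    change, and report exactly which cases regressed."""
    failures = []
    for number, case in enumerate(test_cases, start=1):
        answer = answer_fn(case["question"])
        # Naive check: the documented answer must appear in the output.
        if case["expected"].lower() not in answer.lower():
            failures.append({
                "case": number,
                "question": case["question"],
                "got": answer,
            })
    accuracy = 1 - len(failures) / len(test_cases)
    return accuracy, failures
```

When accuracy drops after a change, the failure list names the exact case that broke, before a client ever sees the regression.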
Why small businesses should care: A vendor who doesn't have an evaluation set can't honestly tell you whether their last "improvement" made things better or worse. They're guessing. You deserve better than guessing.
## Where the time actually goes
For a typical AI integration project we build for a small NYC firm, here's the rough time breakdown:
| Phase | Share of project time | Why |
|---|---|---|
| Data prep (chunking, cleaning, metadata) | ~60% | The single biggest lever on accuracy, and also the most client-specific work. |
| Evaluation + iteration | ~15% | Building tests, running experiments, tuning prompts against real staff workflows. |
| Model selection and configuration | ~15% | Important, but a commodity. The gap between "good enough" models is narrowing every quarter. |
| UI and integration into existing tools | ~10% | Plugging into Clio, Follow Up Boss, QuickBooks, Dotloop, whatever stack is already in use. |
Nobody brags about this breakdown on social media. Nobody sells you a "60% data prep transformation." But it's what separates a flashy demo from a system your team actually uses in January, April, and the following December.
## What this means if you're evaluating AI vendors
This is the practical part. If your firm is looking at AI integrations — from us, from a national consultancy, or from an in-house hire — here are the questions that will tell you who actually ships working systems:
- How will you handle our specific document types? If the answer is generic, the system will be generic.
- What does your chunking strategy look like for a 100-page lease (or patient file, or tax return)? If the vendor blinks, they haven't thought about it.
- What metadata will you attach to each chunk, and how will we filter on it later? This is the fast-vs-slow, cheap-vs-expensive question.
- How will you clean scanned PDFs and older documents? Especially critical for firms with a long paper history.
- Can I see your evaluation set for a similar client? They don't have to share the client's data, but they should be able to show the shape of how they test.
- How do you handle a user asking a messy or ambiguous question? The answer should be something more thoughtful than "the model figures it out."
- What's your plan when the model changes under us in six months? Models update. Systems built on top of them need to survive that.
If a vendor can answer these clearly, you're in good hands — whether that's us or someone else. If they pivot every answer back to "our model is the best," that's the red flag.
## The short version for NYC business owners
Which AI model you pick is the wrong first question.
Data prep is where the real leverage lives. Chunking, metadata, cleaning, query rewriting, and evaluation are what separate an AI system that wins a meeting from one that quietly wins back ten hours a week for your team, every week, for years.
This is the work EnovaCreations has been doing quietly behind the scenes for the last year, and it's the work we're now offering formally as an AI integration service for NYC small businesses. If you have a workflow that feels like it should be automated by now — the six-hour paperwork ritual, the intake triage, the document stack on the corner of the desk — we'd like to hear about the specific task. No decks, no "AI readiness score." Just a conversation.
## Frequently Asked Questions
**Is EnovaCreations still doing websites, SEO, and Google Business Profile work?** Yes. Every existing service remains, with the same team and the same pricing. AI integration is an added service, not a replacement.

**Does the model I pick really not matter?** It matters — just a lot less than people think. For most small-business tasks, any of the top three or four models is "good enough." The accuracy, reliability, and cost differences between well-prepared and poorly-prepared data dwarf the differences between models.

**How long does a realistic AI integration project take for a small firm?** Our starting shape is a two-week prototype sprint against one specific task, followed by a four-to-eight-week production build if the prototype proves out. Data prep is the bulk of both phases.

**Do you store or train on our data?** No. We design every integration with your data privacy in mind, use vendor-neutral contracts where possible, and never use client data to train public models. For regulated industries (healthcare, legal), we scope deployments to comply with HIPAA and attorney-client confidentiality norms.

**What's RAG, and is that what we need?** RAG (retrieval-augmented generation) is the umbrella technique for most "ask your documents" AI systems — chunking your documents, retrieving the relevant pieces at query time, and letting the model answer from those pieces rather than from its general training. It's the right pattern for roughly 80% of the small-business AI problems we see. The other 20% is closer to traditional automation with a language model sprinkled in.

**How do we get started?** Book a free 45-minute AI audit from our site. You'll leave with a ranked list of the three highest-leverage automation candidates in your business, with rough time and cost estimates — whether or not you end up working with us.


