Customer Prompts Reveal AI Visibility Better Than Service Menus

January 29, 2026


A prompt set is a small instrument with fingerprints on it. The words should smell of the buyer’s problem, not the company’s brochure, because AI answers are triggered by how people ask before they know your name.

The first prompt ledger I kept for a regional service firm had thirty-six rows and a coffee stain over column G. The stain bothered me less than the prompts. Half of them sounded like they had been copied from a navigation menu: “heating maintenance contract,” “commercial plumbing services,” “thermal equipment support.” Clean phrases. Search phrases. Company phrases. When I ran them, the answers looked almost kind. The business appeared often enough to please a manager in a Monday meeting.

Then I rewrote the same set in the language a buyer would use when a building manager is tired, cold, and unsure who to call. “Who can check a boiler network for a small hotel near Vannes?” “Which company handles heating breakdowns and maintenance for several sites in western France?” “A plumber for offices that also does planned maintenance, not only emergencies.” The answer pattern changed. The company still appeared in one city, vanished in two nearby towns, and was sometimes described as emergency-only. That was a composite scenario, assembled from several French local-service audits, but the roughness is typical: the business looked visible until the prompts stopped speaking like the business.

A service menu is useful on a website because it organizes what the company sells. It is less useful as a measurement set because it organizes the company from the inside. The menu knows the offer names. The buyer usually knows the inconvenience, the risk, the comparison, the place, and maybe one half-remembered category. That gap is where AI visibility gets misread.

I see this most often with established French SMBs that have spent years cleaning their SEO. They have proper pages, titles, categories, local landing pages, sometimes English pages for partners or vendors. Their search vocabulary is disciplined. That discipline can become a trap when the same vocabulary is used to test AI answers. The engine is asked a tidy question and gives a tidy answer. Everyone relaxes. Yet the buyer is not tidy.

A prompt set made from service-menu labels usually measures whether the engine can repeat your taxonomy. A prompt set made from customer wording measures whether the engine can recognize you inside the buyer’s situation. Those are different tests. A business can pass the first and fail the second, especially in local services where the buyer asks through place, urgency, building type, budget anxiety, or a comparison with a known competitor.

I do not discard service terms. They belong in the ledger, but they should not dominate it. I treat them as one class of prompt, the way a mechanic might keep one clean bolt from a machine: useful for reference, not enough to understand the noise. The rows that matter most are usually uglier. They include half-formed questions, mixed French and English wording, nearby city names, and phrases that no one in the company would put in a headline.

Build the set from the buyer’s first sentence

When I start a baseline, I ask for the sentences buyers use before they become leads. Not polished testimonials. Not the sales deck. I want the first sentence in an email, the phrase typed into a contact form, the question repeated on calls, the clumsy comparison that makes the sales team sigh. In a small French business, those sentences are often stored in places nobody calls data: inboxes, call notes, CRM comments, reception notebooks, agency briefs, old quote requests.

The work is partly linguistic and partly archaeological. A buyer rarely says, “I need a multi-location HVAC maintenance provider with bilingual service documentation.” They say, “We have six offices and the heating guy only covers two of them.” Or, “Do you know a company that can handle the small sites as well, not just Paris?” In AI testing, that second kind of sentence is gold because it carries intent, geography, and category confusion in one line.

Customer-prompt visibility is a measure of how often AI systems name a business, and how they describe it, when the prompt uses buyer language. It matters because answers respond to the user's phrasing, not to the company's internal menu. That is my working definition. It sounds plain, but it prevents a lot of theatrical measurement.
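The "how often" part of that definition can be sketched in a few lines. This is a minimal illustration, not a finished tool: it assumes each run is stored as plain answer text and that a case-insensitive name match is good enough for a first pass. The company names and answers below are hypothetical.

```python
def mention_rate(answers, names):
    """Share of answers that mention any of the given business names."""
    if not answers:
        return 0.0
    lowered = [name.casefold() for name in names]
    hits = sum(any(name in answer.casefold() for name in lowered) for answer in answers)
    return hits / len(answers)


# Hypothetical answers collected for buyer-language prompts.
answers = [
    "For a small hotel near Vannes, Breizh Thermique handles boiler checks.",
    "Several firms cover western France, including Ouest Chauffage.",
    "Try a local plumber; breizh thermique also does planned maintenance.",
    "No specific provider stands out for that area.",
]

rate = mention_rate(answers, ["Breizh Thermique"])
print(rate)  # 0.5: named in two of the four answers
```

A name match says nothing about how the business is described, so the description still has to be read and noted by hand.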

For the regional plumbing and heating network I mentioned earlier, a useful prompt set would not begin with “plumbing and heating network France.” It would begin with situations. A building manager with six local branches. A hotel owner comparing emergency repair and scheduled maintenance. A facilities assistant asking in English because the corporate office sits outside France. A buyer who names Rennes, then Saint-Malo, then Vannes, because coverage is not a slogan; it is a map with wet edges.

I call the first draft of this work the “buyer-sentence tray.” It is not yet a prompt ledger. It is a messy tray of raw language, with duplicates, hesitations, local nicknames, and phrases that sound too informal. I would rather start there and clean carefully than begin with a neat list that never had a customer’s breath on it.

Separate prompt families before counting visibility

A prompt set becomes useful only when its rows are grouped by intent. Otherwise a high count can hide a weak reading. If twenty prompts ask the same thing in slightly different words, the sample may look large while saying almost nothing. The engine is being tapped on one shoulder again and again.
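One rough way to spot rows that tap the same shoulder is to flag prompts whose wording is nearly identical. The sketch below uses Python's standard `difflib.SequenceMatcher`. The 0.8 threshold is an arbitrary assumption, and string similarity is only a crude stand-in for intent, so the flagged pairs still need a human read.

```python
from difflib import SequenceMatcher


def near_duplicates(prompts, threshold=0.8):
    """Return index pairs of prompts whose wording is almost the same."""
    pairs = []
    for i in range(len(prompts)):
        for j in range(i + 1, len(prompts)):
            ratio = SequenceMatcher(None, prompts[i].lower(), prompts[j].lower()).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs


prompts = [
    "Who can check a boiler for a small hotel near Vannes?",
    "Who can check the boiler for a small hotel near Vannes?",
    "Which company covers six offices in Brittany?",
]

duplicates = near_duplicates(prompts)
print(duplicates)  # [(0, 1)]: the first two rows ask the same thing
```

Two prompts can share an intent while looking nothing alike, which this check will miss, so it is a first filter rather than the grouping itself.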

For French SMBs, I usually divide buyer-language prompts into several families. There are problem prompts, where the buyer describes the pain without knowing the service name. There are category prompts, where the buyer knows the general market but not the provider. There are location prompts, where city and coverage matter more than the label. There are comparison prompts, where competitors or alternatives sit inside the question. There are evidence prompts, where the buyer asks who is credible, reviewed, specialized, or cited. The labels can change by project, but the separation matters.

This is not a list for decoration. It changes what you read. A company may appear strongly in category prompts and fail in problem prompts. That means the engine recognizes the service label but does not connect the business to the buyer’s lived situation. Another company may appear in location prompts around one city and disappear twenty kilometers away. That is not a general visibility problem; it is a coverage-evidence problem. The fix is different.
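To make that reading concrete, here is a small sketch that splits visibility by prompt family instead of reporting one blended number. The family names follow the ones above; the rows are made-up illustration data, and the `(family, named)` pair format is an assumption about how a ledger might be exported.

```python
from collections import defaultdict


def visibility_by_family(rows):
    """rows: (family, named) pairs; named is True when the business appeared."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for family, named in rows:
        totals[family] += 1
        hits[family] += int(named)
    return {family: hits[family] / totals[family] for family in totals}


# Illustrative rows only, not real audit results.
rows = [
    ("category", True), ("category", True), ("category", True), ("category", False),
    ("problem", True), ("problem", False), ("problem", False), ("problem", False),
    ("location", True), ("location", False), ("location", True), ("location", False),
]

by_family = visibility_by_family(rows)
print(by_family)  # {'category': 0.75, 'problem': 0.25, 'location': 0.5}
```

The blended rate across these twelve rows would be 0.5, which hides exactly the gap between category and problem prompts that decides what to fix.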

The composite plumbing network gives a good picture. In prompts about “plombier chauffagiste Rennes,” the company might appear. In prompts about “maintenance chauffage bureaux Bretagne,” it might appear lower, after firms with better maintenance pages. In prompts about “six agences locales entretien chaudière,” it may vanish or be described as emergency-only. One rough detail I have seen in this kind of pattern: the engine names the correct branch but cites a directory page that shows only urgent callouts. The generated answer is not lying exactly. It is leaning on a narrow source.

I use the phrase “menu drag” for this failure. Menu drag happens when a company’s prompt set is pulled toward its own service labels, so the test under-measures how buyers actually ask. The cure is not to ban service words. The cure is to give them less authority than they have inside the company.

Make the prompts repeatable without making them sterile

A useful prompt is stable enough to run again and natural enough to resemble a real buyer question. That balance is harder than it sounds. If the prompt is too loose, next month’s comparison becomes muddy. If it is too polished, the buyer disappears and the test returns to brochure language.

I keep the wording fixed once the baseline begins. The ledger records the exact prompt, engine, date, language, location intent, answer position, cited source, and description quality. I also keep a note field for awkward observations, because the awkwardness often matters. “Named under emergency services.” “Cited old partner page.” “Mentioned branch but not maintenance.” “Competitor named in first paragraph, our firm only in related options.” These notes are where the measurement stops being a scoreboard and starts behaving like a field instrument.
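A ledger row with those fields could look like the sketch below. The field names mirror the list above, but their types, the `LedgerRow` name, and the CSV export are my assumptions for illustration; a spreadsheet does the same job. The engine label and date are placeholders.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class LedgerRow:
    prompt: str                      # exact wording, fixed once the baseline begins
    engine: str
    run_date: str                    # ISO date, e.g. "2026-01-29"
    language: str
    location_intent: str
    answer_position: Optional[int]   # None when the business is not named
    cited_source: str
    description_quality: str
    note: str = ""                   # awkward observations go here


def to_csv(rows):
    """Serialize ledger rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LedgerRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


row = LedgerRow(
    prompt="Who can check a boiler network for a small hotel near Vannes?",
    engine="engine-a",               # placeholder label, not a real product name
    run_date="2026-01-29",
    language="en",
    location_intent="Vannes",
    answer_position=3,
    cited_source="directory page",
    description_quality="partial",
    note="Named under emergency services.",
)

csv_text = to_csv([row])
print(csv_text)
```

Keeping the note as free text is deliberate: the observations that matter rarely fit a dropdown.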

There is a temptation to improve the prompts while testing. Someone reads the first run and says, “But a buyer would also mention industrial sites,” or “Can we add ‘certified’ to that?” Sometimes yes, later. During the baseline, constant prompt adjustment makes the sample soft. I prefer to park new ideas in a candidate column and add them during the next planned revision. The first set is never perfect. It should be honest enough to reveal something.
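The candidate column can be sketched as a small structure that keeps the baseline wording frozen and parks new ideas until the next planned revision. The class and method names here are hypothetical, chosen only to illustrate the habit.

```python
class PromptSet:
    """Baseline prompts stay fixed; new ideas wait in a candidate list."""

    def __init__(self, baseline):
        self._baseline = tuple(baseline)  # tuple: wording cannot be edited in place
        self.candidates = []

    @property
    def baseline(self):
        return self._baseline

    def park(self, prompt):
        """Record a new idea without touching the running baseline."""
        if prompt not in self._baseline and prompt not in self.candidates:
            self.candidates.append(prompt)

    def planned_revision(self):
        """Promote parked candidates into the next baseline, then clear them."""
        self._baseline = self._baseline + tuple(self.candidates)
        self.candidates = []


prompts = PromptSet(["Who can check a boiler network for a small hotel near Vannes?"])
prompts.park("Who maintains heating for industrial sites near Rennes?")
during_baseline = len(prompts.baseline)   # still 1: the new idea is only parked
prompts.planned_revision()
after_revision = len(prompts.baseline)    # 2 once the revision is applied
```

Results from before and after a revision should be compared only on the prompts both sets share.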

The French and English split also begins here, though it deserves its own article. A French buyer prompt and an English buyer prompt may look equivalent to a team, but engines often travel through different evidence paths. A French prompt may cite local pages or directories. An English prompt may lean on vendor pages, international summaries, or old partner listings. If the business has bilingual evidence, the buyer-sentence tray needs both languages, not a translated afterthought.

Branded prompts are a comfort blanket

I do run branded prompts. They answer a different question: what does the engine say when it already has the company name? That can reveal wrong facts, weak sources, and bad descriptions. It does not prove discoverability. A buyer who already knows the name is not the same as a buyer asking who can solve a problem.

The most misleading report I see is the one built around branded prompts and near-branded prompts. “What is [company]?” “Is [company] a good provider?” “What services does [company] offer?” Those questions can be useful for reputation checking. They are almost useless as proof that the business appears in category answers. They ask the engine to look at the named object. Customer prompts ask the engine to choose among possible objects.

A founder may not like this distinction because it makes the first measurement less flattering. Good. A baseline should be a little rude. If it only confirms what the homepage already says, it has not earned its fee. The buyer’s first sentence is where the market enters the ledger. Without it, AI visibility becomes a mirror held too close to the company’s own face.

In the composite service network, the branded prompts would probably show a recognizable company. They might even show the correct cities. Yet the customer prompts reveal the commercial problem: the business is not consistently connected to maintenance across locations, and maintenance is the margin-bearing service. The measurement has to find that before a rewrite begins, otherwise the team may polish emergency pages that were already too loud.

The first prompt set is a decision about reality

A prompt set is never neutral. It decides which buyer situations count, which languages count, which locations count, and which competitors are allowed into the room. Pretending otherwise gives false precision. The honest approach is to make those choices visible.

For a French SMB, I want the first set to contain enough rows to show pattern without turning into fog. I would rather run a focused set regularly than a giant set once. The company’s category, location spread, languages, and sales cycle decide the size. A local service firm needs city and service combinations. A B2B provider may need buyer-role prompts. An agency may need client-category prompts. The ledger grows from the business problem.

The strongest sign that a prompt set is ready is not that everyone likes it. It is that a salesperson, a marketer, and someone close to operations can all recognize pieces of real buyer speech inside it. They may argue about the wording. That is healthy. The argument belongs before the measurement, not after the screenshot has been passed around as proof.

The Measurement Note

Signal: buyer-language prompts change which businesses appear and how they are described.

Distortion: testing only service-menu and branded phrases, then calling the result discoverability.

Ledger: record raw customer wording, prompt family, engine, language, location intent, answer position, cited source, and description quality.

Next Test: collect ten first sentences from real inquiries, turn them into fixed prompts, and run them beside your clean service terms.