French and English Prompts Often Give Different Verdicts

A bilingual business is not measured in two languages for the sake of elegance. It is measured twice because the machine may walk through two different evidence corridors and return two different versions of the same company.

The first warning sign is usually small. In a composite scenario I use when explaining this to French teams, a 19-person software integrator near Lyon appears confidently in an English AI answer about industrial implementation partners. The answer names the firm, gives it a tolerable description, and cites an English vendor page. Then the same test is run in French. Same engine. Same category. Same rough buyer need. This time the company appears lower, sometimes not at all, and when it does appear the description has a smudge: it sounds like a reseller of software licences, not the team that handles implementation inside factories.

Nothing dramatic has happened. The site has not broken. The brand has not vanished. The problem is quieter than that. The French prompt and the English prompt have opened different cupboards of evidence. One contains vendor directories, partner pages and English ecosystem wording. The other contains French case studies, local service pages, maybe an old directory entry, maybe a trade-media mention with thinner language. If you mix those runs into one cheerful average, the average lies with a clean face.

A language is not only a translation layer

Many French SMBs treat bilingual testing as a courtesy: run the prompt in French because the customers are French, run it in English because some evidence exists in English. That is a start, but it misses the mechanism. The language of the prompt does not merely change the words in the generated answer. It can change the sources selected, the entities compared, the geographic assumptions, and the business category itself.

For a French service company, “intégrateur logiciel industriel près de Lyon” and “industrial software implementation partner near Lyon” are cousins, not twins. The first phrase may push the system toward French business directories, regional case pages, or pages where “intégrateur” is used loosely. The second may pull in vendor ecosystems, partner listings, English documentation pages, and international category language. A human marketer sees one business. The machine sees two trails.

This is why I dislike bilingual AI visibility reports that show one column called “visibility” and leave the prompt language buried in a note. The language is not a footnote. It is a measurement condition.

French-English prompt divergence is the measurable gap between AI answers produced from equivalent buyer prompts in French and English. It arises because each language can route the engine toward different sources, categories and descriptions.

That definition is deliberately plain. It keeps us away from mystical explanations. The gap is not a personality flaw in the model. It is usually evidence routing. The model has more than one path to your company, and the prompt language helps choose the path.

Equivalent prompts are harder than translated prompts

A direct translation often looks neat on a spreadsheet. It also often fails as a test. Buyers do not translate their anxiety cleanly. A French operations manager might ask for a “prestataire pour connecter notre ERP à la production.” An English vendor ecosystem might call the same work “shop-floor systems integration” or “industrial ERP implementation.” These are not decorative differences. They are category doors.

In the Lyon integrator composite, the English prompt “best software integrators for French industrial SMEs” tended to position the company among vendor-adjacent firms. A French version, if translated too literally, became stiff and unnatural: “meilleurs intégrateurs logiciels pour PME industrielles françaises.” A buyer might write that, but often the real question is messier: “qui peut nous aider à déployer un logiciel de production dans une usine près de Lyon.” The second French version changes the evidence path. It introduces help, deployment, production, factory, and location. The answer changes.

This is where I use what I call the paired-prompt test. I do not ask, “What is the French translation?” I ask, “What is the French buyer trying to solve, and what is the English-language evidence likely to call that same problem?” Then I keep the pair close enough to compare, but not so close that one side becomes unnatural.

A paired prompt is not a mirror. It is two footprints made by people walking toward the same purchase from different linguistic ground.

The ledger should show that roughness. I want columns for prompt intent, actual prompt text, language, engine, location signal, business named or not, position, cited source and description quality. If the French prompt has a stronger local phrase than the English one, I note it. If the English prompt uses a vendor term French buyers rarely use, I note that too. Measurement improves when the test admits its own seams.
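One way to pin those columns down is a small record type. What follows is a minimal sketch in Python, not a prescribed schema: the field names follow the columns listed above, while the `notes` field, the 0 to 3 scale for description quality, the engine label and every example value are assumptions invented for illustration, not measured results.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class LedgerRow:
    """One observed answer for one prompt, in one language, on one engine."""
    prompt_intent: str                  # what the buyer is trying to solve
    prompt_text: str                    # exact wording that was submitted
    language: str                       # "fr" or "en"
    engine: str                         # placeholder label, e.g. "engine-a"
    location_signal: Optional[str]      # local phrase in the prompt, if any
    business_named: bool                # did the answer name the company?
    position: Optional[int]             # 1 = first named; None if absent
    cited_source: Optional[str]         # page or source class the answer cited
    description_quality: Optional[int]  # hypothetical 0-3 reviewer score
    notes: str = ""                     # seams in the test, stated openly


# Illustrative values only; nothing here was measured.
row = LedgerRow(
    prompt_intent="find help deploying production software in a factory",
    prompt_text=("qui peut nous aider à déployer un logiciel de production "
                 "dans une usine près de Lyon"),
    language="fr",
    engine="engine-a",
    location_signal="près de Lyon",
    business_named=True,
    position=4,
    cited_source="old directory entry",
    description_quality=1,
    notes="French prompt carries a stronger local phrase than its English pair",
)

record = asdict(row)  # plain dict, ready for a CSV or spreadsheet export
```

Keeping `notes` as a free-text field is a deliberate choice in this sketch: it is where the asymmetries between the two prompts get written down instead of being smoothed away.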

Separate the source gap from the description gap

When a bilingual test gives different verdicts, teams often jump too quickly to rewriting. They see weaker French visibility and conclude, “We need more French pages.” Maybe. But first I want to know which gap I am seeing.

There is a source gap when the French and English prompts cite different classes of pages. The English run may cite vendor directories; the French run may cite the company website or a local business profile. Or the reverse: the French run may lean on old directories while the English answer uses a current partner page. The cure depends on which source is doing the feeding.

There is a description gap when both languages can find the company, but one language describes it badly. In the integrator example, the English answer sometimes understood the company through partner language and got the industrial context roughly right. The French answer sometimes flattened the work into resale or generic IT support. That is not absence. It is a bad description wearing the clothes of visibility.

There is also a position gap. The business appears in both languages, but the English answer places it in the first named group while the French answer buries it after bigger consultancies. Position is not the same as citation, and citation is not the same as accuracy. A business can be named high, cited weakly and described poorly. That is not a victory; it is a noisy observation.

Position already has its own column in the ledger. The remaining differences I usually classify into three ledger rows: source divergence, category divergence and description divergence. Source divergence asks, “Which pages fed the answer?” Category divergence asks, “Which market bucket did the engine put the business into?” Description divergence asks, “What did the answer say the business actually does?”

The classification is boring on purpose. It prevents the meeting from becoming a theatre of opinions. Nobody has to argue whether “the AI likes us more in English.” The ledger shows which part of the answer changed.
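The three rows can be sketched as a simple comparison between one French and one English observation. This is an illustrative Python sketch under a strong assumption: a human reviewer has already read both answers and assigned short labels for source class, category and description. The `Observation` type, the `classify_divergence` function and the example labels are all hypothetical, and the comparison is plain string equality, nothing smarter.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    language: str      # "fr" or "en"
    source_class: str  # reviewer label: which kind of page fed the answer
    category: str      # reviewer label: market bucket the answer used
    description: str   # reviewer label: what the answer said the firm does


def classify_divergence(fr: Observation, en: Observation) -> List[str]:
    """Return the ledger rows on which the two language runs disagree."""
    rows = []
    if fr.source_class != en.source_class:
        rows.append("source divergence")
    if fr.category != en.category:
        rows.append("category divergence")
    if fr.description != en.description:
        rows.append("description divergence")
    return rows


# Labels loosely modelled on the Lyon composite; invented, not measured.
fr_run = Observation("fr", "old directory", "software reseller",
                     "resells software licences")
en_run = Observation("en", "vendor partner page", "implementation partner",
                     "implements software inside factories")

divergences = classify_divergence(fr_run, en_run)
```

In this invented pair all three rows diverge. A pair with identical labels would return an empty list, which is the case where the two languages can reasonably be read together.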

French evidence can be thinner even when French pages exist

A company may have French pages and still have weak French evidence. This sounds unfair until you read the pages closely. Many French SMB sites contain elegant service language but thin extractable facts: few client types, few category phrases, little geographic specificity, no clear distinction between resale, implementation, maintenance, training and support. The page feels acceptable to a human reader who already knows the firm. To an answer engine, it is a polite fog.

English evidence can have the opposite problem. Vendor partner pages may be structured, category-rich and easy to cite, but they describe the business through the vendor’s frame. The firm becomes a “certified partner” or “solution provider,” which may be true and still too narrow. In the composite Lyon case, the English evidence helped the firm appear, but it also risked trapping the description inside vendor language.

That is why bilingual measurement should not automatically prefer the stronger-looking language. If English prompts produce more mentions, the next question is: more mentions as what? A reseller? A consultant? A certified partner? A local implementation specialist? The answer may be useful for one buyer path and harmful for another.

French pages often need more than translation from English. They need their own evidence architecture: buyer problems named in French, service boundaries stated plainly, locations tied to real work, case material described without hiding the category. The aim is not to stuff pages with mechanical phrases. The aim is to make the correct description easier to repeat.

A useful French page gives the machine handles without making the human reader feel they are holding a catalogue.

Do not average the languages too early

The worst bilingual reports are tidy. They show an overall score, a few screenshots, a recommendation to improve content, and a line saying that tests were run in French and English. That hides the one thing the business needed to learn.

For a French SMB with mixed-language evidence, I want two ledgers before I want one score. The French ledger answers one business question: how does the company appear when the likely French buyer asks in their own words? The English ledger answers another: how does the company appear when the evidence ecosystem, vendor language or international category terms are activated? Both matter, but they do not mean the same thing.

Only after several runs do I look for a combined reading. Even then, the combined reading should be conservative. If the business appears reliably in English but fails in French, I would not call it visible for the French market. If it appears in French but only through vague or old sources, I would not call that stable. If it appears in both languages but the descriptions diverge, I treat that as a correction problem before I treat it as a growth problem.

The discipline is simple: keep the languages apart long enough for their errors to become visible.
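A small arithmetic sketch shows why the early average misleads. The Python below uses ten invented run outcomes, five per language; the numbers are hypothetical and chosen only to make the effect visible, and `mention_rate_by_language` is an illustrative helper, not part of any tool.

```python
from collections import defaultdict

# Hypothetical outcomes as (language, business_named). Invented for illustration.
runs = [
    ("en", True), ("en", True), ("en", True), ("en", True), ("en", False),
    ("fr", True), ("fr", False), ("fr", False), ("fr", False), ("fr", False),
]


def mention_rate_by_language(runs):
    """Share of runs that named the business, computed per language."""
    named = defaultdict(int)
    total = defaultdict(int)
    for language, business_named in runs:
        total[language] += 1
        named[language] += int(business_named)
    return {language: named[language] / total[language] for language in total}


per_language = mention_rate_by_language(runs)

# The pooled figure that a single "visibility" column would report.
pooled = sum(business_named for _, business_named in runs) / len(runs)

print(per_language)  # {'en': 0.8, 'fr': 0.2}
print(pooled)        # 0.5
```

The pooled 0.5 describes neither market. The English run names the business four times in five, the French run once in five, and only the separated view says so.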

This also helps agencies. When a client asks why the English answer looks better than the French one, the answer should not be a shrug about models being unpredictable. Show the cited sources. Show the prompt wording. Show the category label the system used. Show whether the company was absent, present, cited, high-positioned or accurately described. The conversation becomes less glamorous and much more useful.

What I test before recommending a rewrite

Before I tell a company to rewrite anything, I want a small but repeated bilingual sample. I start with buyer-intent prompts, not service-menu prompts. I run French and English separately. I keep engines separate as well, because a language difference inside ChatGPT is not automatically the same difference inside Perplexity or Copilot.
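The separation described above can be made mechanical with a run plan that never lets a language or an engine share a row. This is a minimal Python sketch: the prompt phrases are borrowed from earlier in this article, but the way they are paired here is illustrative, the engine labels are placeholders, and `build_run_plan` is a hypothetical helper.

```python
from itertools import product

# Paired prompts: one buyer intent, two wordings. The pairing is illustrative.
paired_prompts = [
    {
        "intent": "deploy production software in a factory near Lyon",
        "fr": ("qui peut nous aider à déployer un logiciel de production "
               "dans une usine près de Lyon"),
        "en": "industrial software implementation partner near Lyon",
    },
    {
        "intent": "connect the ERP to production",
        "fr": "prestataire pour connecter notre ERP à la production",
        "en": "shop-floor systems integration",
    },
]

engines = ["engine-a", "engine-b"]  # placeholder labels, not real product names


def build_run_plan(pairs, engines):
    """One planned run per (prompt pair, engine, language) combination."""
    plan = []
    for pair, engine, language in product(pairs, engines, ("fr", "en")):
        plan.append({
            "intent": pair["intent"],
            "engine": engine,
            "language": language,
            "prompt_text": pair[language],
        })
    return plan


run_plan = build_run_plan(paired_prompts, engines)
```

With two pairs and two engines the plan holds eight runs, four per language. Each run keeps its own intent, engine and exact wording, so a later difference can be traced to one condition at a time.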

Then I read the sources. This is the part people skip because it feels slow. It is also where the answer becomes explainable. If the French answer cites a weak directory, the fix may involve strengthening better French sources and making the official page clearer. If the English answer cites a vendor page that describes the firm too narrowly, the fix may involve improving the company’s own English evidence or clarifying the partner description where possible. If neither language cites anything useful, the problem is larger than translation.

In the Lyon integrator composite, the first recommendation would not be “write more bilingual content.” That is too blunt. I would first separate the French buyer prompts from English vendor-category prompts, score the descriptions, and inspect the cited pages. Only then would I decide whether the site needs clearer implementation language, stronger French case pages, better category statements, or source correction around old partner listings.

The awkward truth is that bilingual AI visibility work often starts with humility. We do not yet know which language carries the cleaner evidence. We do not know whether the better-looking answer is commercially better. We do not know whether a missing French mention is caused by the French site, the prompt set, the source pool or the engine’s routing. The ledger is there because guessing sounds clever for about ten minutes.

After that, someone asks for proof.

The Measurement Note

Signal: French and English prompts cite different evidence for the same business.
Distortion: averaging both languages into one visibility score too early.
Ledger: record prompt intent, exact wording, language, engine, cited source, answer position and description accuracy separately.
Next Test: build five paired buyer prompts in French and English, run them in two engines, and compare sources before recommending any rewrite.