XI · Structure & Data · Emerging

Code-Switching-Aware Agent

also known as Mixed-Language Input Handling, Hinglish-Tolerant Agent, Romanised-Indic Agent

Treat mixed-language input (e.g. Hinglish in Roman script) as the expected shape, and design tokenisation, language tagging, and tool routing to handle it natively without forcing the user to commit to one language.

Context

A team is building a conversational agent for a market where users routinely blend two or more languages inside a single sentence, and often type one of those languages in a script other than its native one. A common example is Hinglish in India, where a user might type "book me a cab from Saket to Connaught Place jaldi": an English sentence frame, Delhi place names, and one Hindi adverb ("jaldi", quickly), all in the Latin alphabet because that is what the phone keyboard offers by default. The agent has to make sense of this mix without asking the user to commit to one language.

Problem

A pipeline that assumes one language per turn fails on this input in several distinct ways. A tokeniser tuned for English may split a Hindi word written in Latin letters into meaningless pieces. A language detector that runs on the whole utterance may flip between turns, or pick the wrong language and route the request to a Natural Language Understanding (NLU) stack that does not handle it. Some systems give up entirely and ask the user to pick one language, which is both a worse experience and a tacit refusal of how bilingual users actually talk. The team is then forced to choose between rejecting natural input and building a parallel pipeline per language pair.

Forces

  • Most off-the-shelf LLMs handle code-switching unevenly.
  • Romanised Indic text (Indic languages typed in Latin script) defeats naïve, script-based language detection.
  • Tools and intents may be in one language while content is in another.
  • Strict monolingual pipelines reject natural input.
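
The second force can be seen in a minimal Python sketch. It assumes a deliberately naive detector, the hypothetical `dominant_script`, which guesses from the most common Unicode script in the utterance. The function name and the sample sentences are illustrative, not taken from any particular library.

```python
import unicodedata

def dominant_script(text: str) -> str:
    """Naive guess: return the most frequent Unicode script among the letters."""
    counts: dict[str, int] = {}
    for ch in text:
        if ch.isalpha():
            # Unicode character names begin with the script,
            # e.g. "LATIN SMALL LETTER A" or "DEVANAGARI LETTER MA".
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "NONE"

# Hindi typed in Devanagari is easy to spot by script...
print(dominant_script("मेरा बिल कब आएगा"))
# ...but Romanised Hindi mixed with English looks like plain Latin text.
print(dominant_script("mera EMI kab due hai bhai?"))
```

The second call reports LATIN, which says nothing about which language each word belongs to. Script alone cannot serve as the language signal for Romanised input.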

Example

A consumer-finance assistant in India keeps mishandling messages like 'mera EMI kab due hai bhai?', which mixes Roman-script Hindi with English. A single-language tokeniser splits 'EMI' awkwardly and the language detector flips between turns, sending requests to the wrong NLU pipeline. The team rebuilds the front of the agent as Code-Switching-Aware: tokenisation treats Latin and Devanagari uniformly, each token gets a language tag, and tool routing uses the mixed signal directly. Users stop being asked to 'please use one language'.
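
A rough sketch of what per-token tagging and routing on the mixed signal might look like for this message, in Python. Everything named here is an assumption for illustration: the toy lexicon `ROMAN_HINDI`, the `INTENT_KEYWORDS` table, and the intent names. A production system would more plausibly use a trained token-level language identifier than a word list.

```python
import unicodedata

# Hypothetical toy lexicon of Romanised Hindi words (illustration only).
ROMAN_HINDI = {"mera", "kab", "hai", "bhai", "jaldi"}

# Hypothetical keyword-to-intent table (illustration only).
INTENT_KEYWORDS = {"emi": "loan_repayment_schedule", "balance": "account_balance"}

def tag_token(token: str) -> str:
    """Tag one token: 'hi' (Devanagari), 'hi-Latn' (Romanised Hindi) or 'en'."""
    if any("DEVANAGARI" in unicodedata.name(ch, "") for ch in token):
        return "hi"
    if token.lower() in ROMAN_HINDI:
        return "hi-Latn"
    return "en"  # fallback for anything the toy lexicon does not know

def tag_utterance(utterance: str) -> list[tuple[str, str]]:
    """Split on whitespace, trim edge punctuation, and tag every token."""
    tokens = [t.strip("?!.,;") for t in utterance.split()]
    return [(t, tag_token(t)) for t in tokens if t]

def route(utterance: str) -> str:
    """Pick an intent from any token, whatever its language tag."""
    for token, _lang in tag_utterance(utterance):
        intent = INTENT_KEYWORDS.get(token.lower())
        if intent:
            return intent
    return "fallback_clarify"

print(tag_utterance("mera EMI kab due hai bhai?"))
print(route("mera EMI kab due hai bhai?"))
```

The router never asks which single language the turn is in. The tags travel with the tokens so that later stages, such as reply generation, can mirror the user's mix.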

Solution

Therefore:

Adopt a three-part discipline. (1) Tokenise across Unicode scripts, Romanised (Latin-script) text included, without assuming a single script per turn. (2) Run language detection at clause level, not utterance level, so mixed-language tagging is preserved. (3) Choose models trained explicitly on code-switched corpora for the relevant language pair; if none is available, prompt-engineer with code-switched few-shot examples. Tool slot extraction (entities like place names, times) must accept either script; normalise after extraction, not before.
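
The slot-extraction rule can be sketched in Python as follows. It is a minimal illustration, assuming a hypothetical gazetteer `PLACE_SURFACE_FORMS` that maps surface forms in either script to made-up canonical IDs. The plain substring matching is a simplification, since a real extractor would need word-boundary handling and fuzzy spelling variants.

```python
import re

# Hypothetical gazetteer: surface forms in either script map to one canonical ID.
PLACE_SURFACE_FORMS = {
    "saket": "SAKET",
    "साकेत": "SAKET",
    "connaught place": "CONNAUGHT_PLACE",
    "कनॉट प्लेस": "CONNAUGHT_PLACE",
}

def extract_places(utterance: str) -> list[tuple[str, int, int]]:
    """Find place mentions in the raw text; return (as_typed, start, end) in text order."""
    spans = []
    for surface in PLACE_SURFACE_FORMS:
        for m in re.finditer(re.escape(surface), utterance, flags=re.IGNORECASE):
            spans.append((m.group(0), m.start(), m.end()))
    return sorted(spans, key=lambda span: span[1])

def normalise(spans: list[tuple[str, int, int]]) -> list[str]:
    """Map extracted surface forms to canonical IDs, only after extraction."""
    return [PLACE_SURFACE_FORMS[text.lower()] for text, _start, _end in spans]

latin = extract_places("book me a cab from Saket to Connaught Place jaldi")
mixed = extract_places("साकेत se Connaught Place jaldi")
print(normalise(latin))
print(normalise(mixed))
```

Both utterances resolve to the same canonical pair. Extraction runs on the text exactly as typed, so character offsets stay valid and no transliteration step can corrupt an entity before it is found.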

What this pattern forbids. The agent may not refuse or downgrade a request because the user mixed languages or scripts in one utterance; mixed-language input is in-spec.

And the patterns that stand alongside it, or against it —

  • complements · Structured Output ★★ · Constrain the model's output to conform to a JSON Schema (or similar typed shape).
  • alternative-to · Translation Layer ★★ · Insert a typed boundary between the agent's clean domain model and a messy or legacy external API.
  • complements · Input/Output Guardrails ★★ · Validate inputs before they reach the model and outputs before they reach the user.
  • complements · Multilingual Voice Agent Stack · Compose a voice agent as a tightly co-located pipeline of speech-to-text, language-aware LLM reasoning, and text-to-speech, where one vendor owns all three so language and dialect propagate cleanly across stages.
  • conflicts-with · Refusal ★★ · Explicitly refuse requests that fall outside the agent's scope, capability, or policy boundaries.
  • complements · Liminal-State Detection · Infer the human's attentional state (just-woke, focused, winding-down, distracted) from message timing and tone, and adapt response shape so the agent meets the person where they actually are.
