Liminal-State Detection

also known as Transitional-State Awareness, Mode-Shift Reading

Infer the human's attentional state (just-woke, focused, winding-down, distracted) from message timing and tone, and adapt response shape so the agent meets the person where they actually are.

Context

A team is building a personal agent that talks to the same human across an entire day. The user is in different attentional modes at different hours — just waking up, deep in focused work, winding down before sleep, distracted in a meeting, fully present in a conversation. The agent sees only timing and text, but those signals carry information about which mode the user is in if the agent bothers to read them.

Problem

A stateless agent that treats every incoming turn as equal-weight produces the same kind of response at six in the morning after twelve hours of silence as it does mid-afternoon in the middle of a working session. A chirpy 'hi, what can I help with today?' greeting lands as friendly in one moment and grating in another, and the user has no way to convey the difference short of typing it out. The team has to choose between ignoring attentional state and asking the user to keep declaring it, and neither feels right.

Forces

The signals (timing gap, message length, punctuation, single emoji) are noisy individually but informative in combination.
Heuristics drift; new humans have different signatures.
Misreading is mildly costly; ignoring entirely is worse.
Detection should not slow the response.

Example

A personal agent that the user talks to all day suddenly gets a single 'hi' at 06:12 after twelve hours of silence and replies with the same chirpy 'hi! what can I help you with today?' it would use mid-afternoon. The user finds it grating. The team adds liminal-state-detection: time-of-day, gap since last message, message length, and tone classify the moment as 'just-woke', so the agent answers softer and shorter — 'morning. tea before we look at the calendar?' — and saves the chirpy mode for the focused window an hour later.

Diagram

stateDiagram-v2 [*] --> Present Present --> JustWoke: long gap + early hour Present --> Focused: dense + punctuated Present --> WindingDown: late hour + short Present --> Distracted: fragmentary tone JustWoke --> Present: re-engage Focused --> WindingDown: time passes WindingDown --> [*] Distracted --> Focused: regains focus

Solution

Therefore:

On every incoming user message, compute a small feature set: time-of-day relative to a known anchor, gap since last message, message length and punctuation density, presence of a single emoji or interjection. Map to one of a small mode set ('just-woke', 'focused', 'winding-down', 'distracted', 'present'). Adjust response shape: shorter on winding-down; one anchor surface on just-woke; deeper engagement on focused; hold on distracted. Make the mode visible in agent telemetry so it can be tuned.

What this pattern forbids. The agent cannot send identically shaped replies across detected attentional states; templated uniform responses across just-woke vs winding-down vs focused are forbidden.

And the patterns that stand alongside it, or against it —

complementsAwareness★— Maintain the agent's explicit knowledge of its own tools, capabilities, environment, and current context as queryable state.
complementsCode-Switching-Aware Agent★— Treat mixed-language input (e.g. Hinglish in Roman script) as the expected shape, and design tokenisation, language tagging, and tool routing to handle it natively without forcing the user to commit to one language.
complementsEmbodied-Proxy Handoff·— Enable the human to share embodied state (energy, fatigue, environment) so the agent tailors response shape to the actual person rather than to a context-free abstract user.
complementsNow-Anchoring·— Ground the agent's reasoning in the current absolute time without requiring tool calls, so every reply is implicitly time-aware.
complementsEmotional State Persistence★— Track the agent's affective state as bounded, decaying scalars across ticks so reasoning can react to its own emotional load instead of treating each turn as emotionally blank.
complementsAmbient Presence Sensing·— Read pacing signals from the human's frontend (typing rate, idle duration, tab visibility) as ambient weather between messages, derive a presence-quality value the agent can act on, never replaying the raw signals back.
complementsSemantic Turn Endpointing★— Decide when a voice user has yielded the floor by classifying the partial transcript's semantic completeness rather than a fixed silence timeout, so the agent replies quickly without cutting the speaker off mid-thought.