fix(amd): make detection thresholds configurable and fix short-greeting voicemail misclassification #5490

Open
octo-patch wants to merge 1 commit into livekit:main from octo-patch:fix/issue-5477-amd-configurable-thresholds

Conversation

@octo-patch
Contributor

Fixes #5477

Problem

The AMD classifier's short-greeting fast path (on_user_speech_ended) emitted a HUMAN verdict unconditionally whenever speech duration was <= HUMAN_SPEECH_THRESHOLD (2.5 s) followed by >= HUMAN_SILENCE_THRESHOLD (0.5 s) of silence -- even when transcript text had already been delivered via push_text() (meaning an LLM classification was already in flight).

This caused voicemail greetings that paused mid-sentence (e.g. ~2.33 s speech / 528 ms silence) to be misclassified as HUMAN regardless of what the transcript said.

Solution

Two changes:

1. Make detection thresholds configurable

HUMAN_SPEECH_THRESHOLD, HUMAN_SILENCE_THRESHOLD, and MACHINE_SILENCE_THRESHOLD are now keyword arguments on both _AMDClassifier and the public AMD class, so callers can tune detection behaviour without patching module-level constants.
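As a rough illustration of this pattern (not the actual livekit code), the sketch below shows module-level defaults exposed as keyword-only constructor arguments. The class name `ClassifierSketch` and the positivity check are hypothetical, and the `MACHINE_SILENCE_THRESHOLD` value of 1.5 s is a placeholder assumption; only the 2.5 s and 0.5 s defaults come from the description above.

```python
# Defaults taken from the PR description, except where noted.
HUMAN_SPEECH_THRESHOLD = 2.5      # seconds of speech
HUMAN_SILENCE_THRESHOLD = 0.5     # seconds of silence
MACHINE_SILENCE_THRESHOLD = 1.5   # ASSUMED placeholder; real value not stated in the PR


class ClassifierSketch:
    """Hypothetical stand-in for a classifier with tunable thresholds."""

    def __init__(
        self,
        *,
        human_speech_threshold: float = HUMAN_SPEECH_THRESHOLD,
        human_silence_threshold: float = HUMAN_SILENCE_THRESHOLD,
        machine_silence_threshold: float = MACHINE_SILENCE_THRESHOLD,
    ) -> None:
        values = {
            "human_speech_threshold": human_speech_threshold,
            "human_silence_threshold": human_silence_threshold,
            "machine_silence_threshold": machine_silence_threshold,
        }
        # Illustrative validation only: reject non-positive durations.
        for name, value in values.items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")
        self.human_speech_threshold = human_speech_threshold
        self.human_silence_threshold = human_silence_threshold
        self.machine_silence_threshold = machine_silence_threshold


# A caller tightens the short-greeting window without touching module constants.
tuned = ClassifierSketch(human_speech_threshold=1.5)
default = ClassifierSketch()
```

Keyword-only arguments keep call sites readable and let the module constants remain as defaults, so existing callers that pass nothing see no change in behaviour.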

2. Fix the short-greeting fast path

When _classify_task is already running at the time on_user_speech_ended is called (indicating that transcript text has arrived), the fast-path HUMAN verdict is skipped. Instead the code falls through to the LLM path using machine_silence_threshold, so the LLM can classify the greeting from the available transcript.
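The branching described above can be sketched as a pure function. This is a simplified model under stated assumptions, not the PR's diff: `choose_path`, `Thresholds`, and the returned labels are hypothetical names, the boolean `classify_task_running` stands in for the `_classify_task is not None` check, and the 1.5 s machine silence value is a placeholder.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Thresholds:
    human_speech: float = 2.5     # from the PR description
    human_silence: float = 0.5    # from the PR description
    machine_silence: float = 1.5  # ASSUMED placeholder value


def choose_path(
    speech_duration: float,
    classify_task_running: bool,
    thresholds: Thresholds = Thresholds(),
) -> Tuple[str, float]:
    """Return (path, silence_to_wait_seconds) once the user stops speaking.

    "fast_path_human": emit HUMAN after the short human silence window.
    "llm": wait for the longer machine silence window and let the
           LLM classify from the transcript.
    """
    is_short_greeting = speech_duration <= thresholds.human_speech
    # The fix: transcript text already in flight disables the fast path.
    if is_short_greeting and not classify_task_running:
        return ("fast_path_human", thresholds.human_silence)
    return ("llm", thresholds.machine_silence)


# The voicemail case from the problem statement: ~2.33 s of speech,
# with transcript text already pushed before speech ended.
voicemail = choose_path(2.33, classify_task_running=True)
# The same short utterance with no transcript yet keeps the fast path.
no_transcript = choose_path(2.33, classify_task_running=False)
```

Under this model the 2.33 s greeting with a running classification task is routed to the LLM path instead of being declared HUMAN after roughly half a second of silence.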

Testing

The fix is consistent with the existing flow: push_text() already creates _classify_task, so detecting _classify_task is not None reliably indicates that transcript evidence is available. No new runtime dependencies are introduced.

…il misclassification

Fixes livekit#5477

Two related changes in the AMD classifier:

1. Expose HUMAN_SPEECH_THRESHOLD, HUMAN_SILENCE_THRESHOLD, and
   MACHINE_SILENCE_THRESHOLD as keyword arguments on both _AMDClassifier
   and the public AMD class.

2. Fix the short-greeting fast path: when speech ends within
   human_speech_threshold, the classifier previously emitted HUMAN
   unconditionally after a brief silence, ignoring transcript text that had
   already arrived via push_text(). When _classify_task is already running,
   the code now falls through to the LLM path so short voicemail greetings
   (e.g. paused mid-sentence at 2.3 s / 528 ms silence) are no longer
   misclassified as HUMAN.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


octo-patch seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

@chenghao-mou chenghao-mou self-assigned this Apr 21, 2026

Development

Successfully merging this pull request may close these issues.

AMD short-greeting heuristic classifies voicemail as HUMAN without invoking LLM

3 participants