Skip to content

fix(udt): declare audio stream before avformat_write_header()#392

Merged
matteius merged 1 commit intoopensensor:mainfrom
origin2000:fix/udt-audio-stream-init
Apr 18, 2026
Merged

fix(udt): declare audio stream before avformat_write_header()#392
matteius merged 1 commit intoopensensor:mainfrom
origin2000:fix/udt-audio-stream-init

Conversation

@origin2000
Copy link
Copy Markdown
Contributor

Fixes #352 — Detection-triggered MP4 recordings contain no audio track despite audio being enabled and the transcoder initialising successfully.


Problem

When ONVIF motion detection triggers a recording via the Unified Detection Thread (UDT), the resulting MP4 file contains only a video track. The log shows the audio transcoder initialising correctly, which makes the root cause non-obvious:

[Detection] [c545d_wide] Audio recording enabled for detection recording
[Detection] [c545d_wide] Successfully initialized audio transcoder from PCM to AAC
[Detection] [c545d_wide] Sample rate: 8000, Channels: 1, Bit rate: 32000

ffprobe on the output file confirms the issue:

1 streams: ['video']   ← audio missing

Root cause

The MP4 container format requires all streams to be declared before avformat_write_header() is called. Once the header is written the stream table is sealed — any stream added afterwards is ignored by the muxer.

The UDT recording path violated this constraint through a lazy-init design:

udt_start_recording()
    mp4_writer_create()          ← writer allocated
    mp4_writer_set_audio(1)      ← has_audio = 1, stream_idx = -1

flush_prebuffer_to_recording()
    flush_packet_callback()
        first video keyframe →  mp4_writer_initialize()
                                    avformat_new_stream()  [video]
                                    avformat_write_header() ← HEADER SEALED (video only)

        first audio packet  →   transcode_audio_packet()  ← transcoder init OK
                                    stream_idx == -1
                                    → packet silently dropped ✗

Every subsequent audio packet — from the pre-buffer flush, live recording, and post-buffer — is transcoded correctly but then discarded because writer->audio.stream_idx is still -1.

This affects only the UDT path (mp4_writer_initialize). The scheduled recording path (record_segment) already handles audio registration correctly before writing the header and is unaffected.


Solution

Declare the audio stream eagerly — before avformat_write_header() — by splitting the work into two steps:

Step 1 — udt_start_recording(): call the stateless transcode_pcm_to_aac() immediately after the writer is created to obtain the AAC AVCodecParameters (including the MPEG-4 Audio Specific Config / extradata). Store the result in a new pending_audio_codecpar field on the writer.

Step 2 — mp4_writer_initialize(): in the video-keyframe branch, check for pending_audio_codecpar and register the audio stream via avformat_new_stream() before avio_open / avformat_write_header() are called. Consume and free the pending params immediately after.

udt_start_recording()
    mp4_writer_create()
    mp4_writer_set_audio(1)
    transcode_pcm_to_aac()       ← stateless, no side-effects
        → pending_audio_codecpar = AAC params + extradata

flush_prebuffer_to_recording()
    flush_packet_callback()
        first video keyframe →  mp4_writer_initialize()
                                    avformat_new_stream()  [video]
                                    avformat_new_stream()  [audio AAC]  ← NEW
                                    avformat_write_header() ← both streams sealed ✓

        first audio packet  →   transcode_audio_packet()
                                    stream_idx == 1
                                    → av_interleaved_write_frame() ✓

transcode_pcm_to_aac() is fully stateless (local alloc/free, no global audio_transcoders[] access), so calling it here does not interfere with the lazy init_audio_transcoder() that runs during packet processing.


Changes

File Change
include/video/mp4_writer.h Add AVCodecParameters *pending_audio_codecpar to struct mp4_writer
src/video/mp4_writer_core.c Initialise field to NULL in mp4_writer_create(); free in mp4_writer_close()
src/video/unified_detection_thread.c Call transcode_pcm_to_aac() eagerly in udt_start_recording()
src/video/mp4_writer_utils.c Register audio stream from pending_audio_codecpar before avformat_write_header() in mp4_writer_initialize()

No changes to record_segment, mp4_writer_thread, mp4_recording_core, or any other recording path.


Testing

Hardware: Tapo C545D Dual-Lens, go2rtc RTSP proxy, Docker.

ffprobe -v quiet -print_format json -show_streams <detection.mp4> | \
  python3 -c "import sys,json; s=json.load(sys.stdin)['streams']; \
  print(f\"{len(s)} streams: {[x['codec_type'] for x in s]}\")"

Before: 1 streams: ['video']
After: 2 streams: ['video', 'audio']

New log entries confirm correct sequencing:

[Detection] [c545d_wide] AAC codec params prepared for MP4 header (eager init)
[Detection] [c545d_wide] Audio stream (AAC) declared at index 1 before header write for c545d_wide

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes detection-triggered (UDT) MP4 recordings missing an audio track by ensuring the audio stream is declared before avformat_write_header() seals the MP4 header.

Changes:

  • Add pending_audio_codecpar to mp4_writer_t to stage audio codec parameters for header-time stream registration.
  • Eagerly prepare AAC codec parameters in udt_start_recording() so UDT can declare the audio stream before writing the MP4 header.
  • Consume pending_audio_codecpar in mp4_writer_initialize() to create the audio stream prior to avio_open/avformat_write_header(), and free it afterward.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/video/unified_detection_thread.c Precomputes audio codec parameters during UDT recording startup for use during MP4 initialization.
src/video/mp4_writer_utils.c Declares an audio stream from staged codec parameters before writing the MP4 header.
src/video/mp4_writer_core.c Initializes and frees the new staged codec-parameters field in writer lifecycle.
include/video/mp4_writer.h Extends mp4_writer_t with pending_audio_codecpar for UDT header-time audio registration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/video/unified_detection_thread.c Outdated
Comment thread src/video/mp4_writer_utils.c Outdated
@origin2000 origin2000 force-pushed the fix/udt-audio-stream-init branch 2 times, most recently from e080ea5 to 71de980 Compare April 18, 2026 20:24
@matteius matteius requested a review from Copilot April 18, 2026 20:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1134 to +1136
a_stream->codecpar->codec_tag = 0;
a_stream->time_base = (AVRational){1, writer->pending_audio_codecpar->sample_rate};
writer->audio.stream_idx = a_stream->index;
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a_stream->time_base is set using writer->pending_audio_codecpar->sample_rate without validating it. If sample_rate is 0/unknown (possible for some pass-through audio codecpar), this creates an invalid {1,0} time base and can make avformat_write_header() fail or produce broken timestamps. Please guard for sample_rate > 0 and fall back to a safe default (or store/carry the input audio time_base from UDT and use that).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to address this one @origin2000

Comment thread src/video/mp4_writer_utils.c
Comment thread include/video/mp4_writer.h Outdated
Comment on lines +67 to +71
// Pending AAC codec parameters set by udt_start_recording() before the first
// packet arrives. mp4_writer_initialize() consumes this to declare the audio
// stream BEFORE avformat_write_header() is called — the only legal window for
// adding streams to an MP4 container. NULL after consumption or on error.
AVCodecParameters *pending_audio_codecpar;
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field comment says "Pending AAC codec parameters", but udt_start_recording() can also populate pending_audio_codecpar with a deep-copy of already MP4-compatible non-AAC params (e.g., native AAC/MP3/etc.). Updating this comment to describe it as "pending audio codec parameters" (not necessarily AAC) will avoid misleading future changes.

Copilot uses AI. Check for mistakes.
Comment on lines +1904 to +1912
// Eagerly determine the output codec parameters so mp4_writer_initialize()
// can declare the audio stream BEFORE avformat_write_header() — the only
// legal window for adding streams to an MP4 container.
//
// The pending params must exactly match what mp4_writer_write_packet() will
// later mux:
// * PCM variants -> transcode to AAC (mp4_writer_write_packet transcodes)
// * MP4-compatible -> deep-copy as-is (packets pass through unchanged)
// * anything else -> disable audio (not supported)
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes a previously silent audio-drop condition, but there’s no automated regression test covering the UDT/detection MP4 path to ensure an audio stream is declared before avformat_write_header() and the output contains both video+audio streams. Since CI runs FFmpeg-linked C tests, consider adding a small regression test that exercises this UDT init path (or mp4_writer_initialize with pending_audio_codecpar) and asserts 2 streams (e.g., via writer->output_ctx->nb_streams or ffprobe).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, but not tested

@origin2000 origin2000 force-pushed the fix/udt-audio-stream-init branch from aa8cc7d to 4fc3d71 Compare April 18, 2026 21:30
Detection-triggered MP4 recordings had no audio track because the audio
stream was registered after avformat_write_header() — illegal for MP4
containers. Fixes opensensor#352.

transcode_pcm_to_aac() is now called eagerly in udt_start_recording()
and the resulting AVCodecParameters stored as pending_audio_codecpar.
mp4_writer_initialize() consumes this before writing the header.

Only the UDT path is affected; record_segment is unchanged.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
} else {
// Incompatible and not PCM: cannot mux or transcode.
is_audio_codec_compatible_with_mp4(ain->codecpar->codec_id, &codec_name);
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_audio_codec_compatible_with_mp4() is called again here only to populate codec_name, but it was already evaluated in the preceding else if condition (and sets codec_name as a side effect even when returning false). Consider removing this redundant call and just log using the existing codec_name value.

Suggested change
is_audio_codec_compatible_with_mp4(ain->codecpar->codec_id, &codec_name);

Copilot uses AI. Check for mistakes.
writer->is_rotating = 0; // Initialize rotation flag
writer->shutdown_component_id = -1; // Initialize to -1 to indicate not registered
writer->pending_audio_codecpar = NULL; // Populated by udt_start_recording(), consumed by mp4_writer_initialize()
writer->pending_audio_time_base = (AVRational){0, 1}; // Set alongside pending_audio_codecpar
Copy link

Copilot AI Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending_audio_time_base is initialized to {0, 1}, which looks “valid” to checks like den > 0 but represents an invalid 0 time base. Consider initializing to {0, 0} (FFmpeg’s common "unknown" rational) or another clearly-invalid sentinel, and ensure callers validate both num/den before using it.

Suggested change
writer->pending_audio_time_base = (AVRational){0, 1}; // Set alongside pending_audio_codecpar
writer->pending_audio_time_base = (AVRational){0, 0}; // Unknown/unset until provided alongside pending_audio_codecpar

Copilot uses AI. Check for mistakes.
@matteius matteius merged commit 16e315b into opensensor:main Apr 18, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Recordings based on motion detection have no audio.

3 participants