Skip to content

rework concurrent upload#610

Open
varmar05 wants to merge 6 commits intodevelopfrom
rework_concurrent_upload
Open

rework concurrent upload#610
varmar05 wants to merge 6 commits intodevelopfrom
rework_concurrent_upload

Conversation

@varmar05
Copy link
Copy Markdown
Collaborator

@varmar05 varmar05 commented Apr 14, 2026

This PR contains two fixes / improvements

  • Replace lockfile with upload last_ping db column
  • Rework concurrent upload using upsert strategy

This is a refactoring that replaces the filesystem-based Toucher/lockfile mechanism with a database-based heartbeat approach for tracking active uploads.
This eliminates the NFS-specific hack (os.access() + os.utime()) that was needed to bust NFS attribute caches in the old approach. The new approach is cleaner and more cloud-native.
Replace the old try/except IntegrityError + cleanup loop pattern with an atomic upsert in Upload.create_upload().
Decouple the upload directory name from the DB primary key via transaction_id. Upload directory is created at this stage, no need to take care for it later.

With the upsert logic, the ObjectDeletedError scenario which happened because a concurrent request could delete a stale upload row mid-operation is eliminated:
  - During push_finish, the heartbeat context manager continuously updates last_ping, keeping the upload fresh
  throughout the operation
  - A concurrent request can only take over an upload whose last_ping has expired
  - Since heartbeat prevents expiry, no other request can claim the row while push_finish is running
  - The upload object therefore stays valid for the full lifetime of the request — ObjectDeletedError becomes
  impossible
@varmar05 varmar05 requested a review from MarcelGeo April 14, 2026 11:14
@coveralls
Copy link
Copy Markdown

coveralls commented Apr 14, 2026

Coverage Report for CI Build 24565333868

Coverage decreased (-0.2%) to 93.085%

Details

  • Coverage decreased (-0.2%) from the base build.
  • Patch coverage: 23 uncovered changes across 3 files (162 of 185 lines covered, 87.57%).
  • 7 coverage regressions across 2 files.

Uncovered Changes

File Changed Covered %
server/mergin/sync/public_api_v2_controller.py 32 22 68.75%
server/mergin/sync/models.py 57 49 85.96%
server/mergin/sync/public_api_controller.py 32 27 84.38%

Coverage Regressions

7 previously-covered lines in 2 files lost coverage.

File Lines Losing Coverage Coverage
server/mergin/sync/models.py 5 92.77%
server/mergin/sync/public_api_controller.py 2 92.44%

Coverage Stats

Coverage Status
Relevant Lines: 9791
Covered Lines: 9114
Line Coverage: 93.09%
Coverage Strength: 0.93 hits per line

💛 - Coveralls

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces filesystem lockfile-based upload coordination with database-driven liveness tracking (upload.last_ping) and introduces a stable external upload identifier (upload.transaction_id), enabling concurrent upload handling via an atomic upsert/takeover strategy.

Changes:

  • Add last_ping and transaction_id columns to upload (with backfills and constraints).
  • Rework upload session creation to use an upsert-based “stale takeover” approach and heartbeat updates instead of lockfiles.
  • Update API/controllers/schemas/tests to use transaction_id in routes and responses.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/migrations/community/e3a7f2b1c94d_add_upload_last_ping.py Adds upload.last_ping and backfills to support DB-based liveness detection.
server/migrations/community/f1d9e4a7b823_add_upload_transaction_id.py Adds upload.transaction_id, backfills from id, enforces NOT NULL + unique index for stable external transaction IDs.
server/mergin/sync/models.py Implements Upload.create_upload() upsert/takeover strategy, upload_dir based on transaction_id, and heartbeat thread updating last_ping.
server/mergin/sync/public_api_controller.py Switches v1 push flow to Upload.create_upload(), replaces lockfile toucher with upload.heartbeat(), and returns/uses transaction_id.
server/mergin/sync/public_api_v2_controller.py Switches v2 create version flow to Upload.create_upload() and uses upload.heartbeat() during heavy work.
server/mergin/sync/permissions.py Replaces get_upload with get_upload_or_fail() lookup by transaction_id.
server/mergin/sync/schemas.py Updates serialized “uploads” lists to expose transaction_id rather than internal id.
server/mergin/sync/utils.py Removes the old lockfile Toucher helper.
server/mergin/sync/public_api.yaml Updates push_finish description to remove lockfile mention.
server/mergin/tests/test_project_controller.py Updates tests to use transaction_id everywhere and adds coverage for stale upload takeover + cleanup.
server/mergin/tests/test_public_api_v2.py Updates concurrency-related test setup to use Upload.create_upload().
server/mergin/tests/test_db_hooks.py Updates upload creation in tests to use Upload.create_upload().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread server/mergin/sync/models.py Outdated
Comment thread server/mergin/sync/models.py Outdated
Comment thread server/mergin/sync/models.py
Comment thread server/mergin/sync/models.py Outdated
Copy link
Copy Markdown
Collaborator

@MarcelGeo MarcelGeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try to run some concurrent push test If this will really help and do not move race conditions to different segments.

Comment thread server/mergin/sync/models.py Outdated
Comment thread server/mergin/sync/models.py Outdated
Comment thread server/mergin/sync/public_api_v2_controller.py
Comment thread server/mergin/sync/public_api_v2_controller.py
- guard against transaction folder errors
- make last_ping tz naive
- set transaction id to uuid type
- remove redundant is_active method
… versions in DB

If os.rename failed for moving uploaded data we would end up in broken project with project version in DB but no actual data.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants