Skip to content

maintenance portlet | Implement fix assets and clean orphan assets polling APIs #35191#35232

Open
hassandotcms wants to merge 21 commits intomainfrom
35200-maintenance-fix-assets-clean-assets
Open

maintenance portlet | Implement fix assets and clean orphan assets polling APIs #35191#35232
hassandotcms wants to merge 21 commits intomainfrom
35200-maintenance-fix-assets-clean-assets

Conversation

@hassandotcms
Copy link
Copy Markdown
Member

Summary

Add 4 new REST endpoints to MaintenanceResource, replacing legacy DWR fix assets and clean assets operations with modern REST APIs for the Maintenance portlet Tools tab.

New endpoints:

  • POST /api/v1/maintenance/_fixAssets — Runs all registered FixTask classes that check for and fix database inconsistencies. Executes synchronously and returns task results when complete.
  • GET /api/v1/maintenance/_fixAssets — Polls current fix assets task results. Used by the UI to check progress while the process is running.
  • POST /api/v1/maintenance/_cleanAssets — Starts a background thread that walks the assets directory structure, checks each directory against the contentlet database, and deletes orphan binary folders. Returns 409 Conflict if already running.
  • GET /api/v1/maintenance/_cleanAssets — Polls clean assets background process status (totalFiles, currentFiles, deleted, running, status).

Implementation details:

  • CleanAssetsStatusView (@Value.Immutable) with typed fields instead of BeanUtils.describe() string map
  • 409 Conflict guard on POST /_cleanAssets prevents IllegalThreadStateException from calling .start() on an already-running thread
  • FixTasksExecutor results returned as List<Map> (raw type from executor, cannot be typed further)
  • ResponseEntityCleanAssetsStatusView typed wrapper for Swagger documentation

Test plan

  • Integration tests pass: ./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false -Dit.test=MaintenanceResourceIntegrationTest
  • POST /_fixAssets as admin returns 200 with task results or null entity
  • GET /_fixAssets returns current task results
  • POST /_cleanAssets starts process and returns initial status with running: true
  • GET /_cleanAssets returns current status with totalFiles, currentFiles, deleted, running, status fields
  • POST /_cleanAssets while already running returns 409 Conflict
  • Non-admin users get 403 on all endpoints

Closes #35200, Closes #35202

…s, and delete pushed assets #35191

Add three new endpoints to MaintenanceResource:
- POST /_searchAndReplace: database-wide find/replace across content tables
- DELETE /_oldVersions: drop versionable objects older than a given date
- DELETE /_pushedAssets: clear all push publishing history

Includes Validated forms, @Value.Immutable response views, full Swagger
annotations, and SecurityLogger entries for all destructive operations.
…ions, and pushed assets #35191

Tests cover:
- _searchAndReplace: admin success, non-admin rejection, null/empty form validation,
  empty replaceString allowed
- _oldVersions: admin success, non-admin rejection, missing date, invalid date formats
- _pushedAssets: admin success, non-admin rejection
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 7, 2026

Claude finished @hassandotcms's task in 4m 21s —— View job


PR Review

  • Gather context
  • Read changed files
  • Analyze for issues
  • Post review

Issues Found

1. CleanAssetsJobProcessor leaks a ScheduledExecutorService on JobProcessingException

CleanAssetsJobProcessor.java:59-72 — The reporter scheduler is created before worker.run(), and cancelled in the finally block. But if the scheduler itself throws during scheduleAtFixedRate (unlikely, but possible), the reporter is never shut down. More importantly, reporter.shutdown() is called but not reporter.awaitTermination(...), so inflight progress callbacks can continue running against a completed/failed job's tracker after the job is done. This is a minor resource concern but real in high-concurrency environments.

2. run() guard in CleanAssetsThread can silently no-op

CleanAssetsThread.java:99run() is guarded by if(!processStatus.isRunning()). If processStatus.running is true when the job processor calls worker.run(), the entire operation silently does nothing but the job reports SUCCESS with deleted=0. This can happen if a previous run crashed between setRunning(true) (line 102) and setRunning(false) (line 169) without getInstance() creating a new instance in between. The job processor calls CleanAssetsThread.getInstance(true, true) which resets processStatus only when !instance.isAlive() — since run() is called directly (not via start()), isAlive() is always false, so a fresh processStatus is always created. This is currently safe, but the correctness depends on a subtle interaction with thread state semantics that would break if start() were ever called on the returned instance.

3. Incomplete double-checked locking in jobHelper()

MaintenanceResource.java (jobHelper() method) — The pattern:

MaintenanceJobHelper local = jobHelper;
if (local == null) {
    local = CDIUtils.getBean(...).orElseThrow(...);
    jobHelper = local;  // no inner null check
}

is missing the inner null check from the standard DCL pattern. Two threads can both see local == null, both call CDIUtils.getBean(), and both write to the volatile field. Since @ApplicationScoped returns the same CDI proxy both times, this is functionally safe — but the intent (single initialization) is not expressed correctly. Should be:

synchronized (this) {
    local = jobHelper;
    if (local == null) {
        jobHelper = local = CDIUtils.getBean(...).orElseThrow(...);
    }
}

Fix this →

4. test_requestFixAssetsJob_whileActive_throwsConflict is likely flaky

MaintenanceResourceIntegrationTest.java:243-250 — The test starts a fix-assets job then immediately tries to start another, expecting a ConflictException. FixTasksExecutor.execute() is CPU-bound and can complete extremely quickly in integration environments with few or no fix tasks to run. If the job transitions from PENDING → SUCCESS before the second getActiveJobs call executes, active.jobs().isEmpty() returns true and no conflict is raised. The test silently passes (no exception), which JUnit will then report as a test failure (expected ConflictException, got none). This is a real flakiness risk on fast CI hardware. A more reliable approach: start a clean-assets job (which is I/O-bound and slower) for the conflict test.

5. getLatestJob uses getJobs not getActiveJobs, so FAILED/ABANDONED state jobs are returned

MaintenanceJobHelper.java:73getJobs(queueName, 1, 1) returns the most recent job across all states including FAILED, ABANDONED_PERMANENTLY, and CANCELED. The comment says "active if one is pending or running, otherwise the most recent completed job" — but a failed/abandoned job is also returned. A client polling this endpoint after a crash would receive a FAILED job and might incorrectly think the system is in a normal done state. This needs explicit documentation or a different query strategy.

6. resultMetadata instance field is mutable and not thread-safe

CleanAssetsJobProcessor.java:42, FixAssetsJobProcessor.java:38 — Both processors initialize private Map<String, Object> resultMetadata = new HashMap<>(). The framework calls getResultMetadata() after process() on the same instance (confirmed by reading JobQueueManagerAPIImpl at lines 953, 1037). The @Dependent scope creates one instance per injection point. Since the framework holds the reference through the job lifecycle, this is safe. However, if process() throws before assigning this.resultMetadata, getResultMetadata() returns Collections.emptyMap() while the framework has already recorded a FAILED state — which is fine, just worth documenting the intentional emptiness.

7. FixTasksExecutor.execute(null)null JobExecutionContext is benign but undocumented

FixAssetsJobProcessor.java:54 — The parameter arg0 is declared but never referenced inside FixTasksExecutor.execute() (confirmed by reading the source). Passing null is safe today, but this is a fragile assumption. A one-line comment explaining this would help future readers avoid a refactoring pitfall. Fix this →

8. @FixMethodOrder(NAME_ASCENDING) creates implicit inter-test ordering

MaintenanceResourceIntegrationTest.java — Alphabetical sort means clean-assets conflict test (test_requestCleanAssetsJob_whileActive_throwsConflict) runs before the basic enqueue-and-complete test. Since both share the same queue, a job lingering from one test (e.g., still in PENDING before the worker picks it up) will cause the next test's POST to get a spurious 409. The awaitJobCompletion in the finally blocks mitigates this, but tests should ideally be independent. At minimum, the comment should flag this ordering dependency.


Not Issues

  • The cluster lock tryClusterLock → checkAndEnqueue pattern correctly handles the three cases: lock acquired + no active job (enqueue), lock acquired + active job (409), lock not acquired (409). The ConflictException propagates correctly as a DotRuntimeException subclass.
  • getActiveJobs correctly includes PENDING state (confirmed in PostgresJobQueue.java:338), so a newly created but not-yet-started job will block a second request.
  • Using CleanAssetsThread.run() directly (not .start()) to execute synchronously on the job queue worker thread is intentional and correct. The thread's NEW state means isAlive() == false, so getInstance(true, true) always resets processStatus.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 7, 2026

Rollback safety analysis complete. Safe To Rollback. Label applied.

@hassandotcms hassandotcms changed the base branch from 35199-maintenance-search-replace-old-versions-pushed-assets to main April 7, 2026 13:09
@github-actions github-actions Bot added the Area : Backend PR changes Java/Maven backend code label Apr 7, 2026
@hassandotcms hassandotcms force-pushed the 35200-maintenance-fix-assets-clean-assets branch from 642aeae to cf2de86 Compare April 7, 2026 14:53
…ts with polling #35191

Add four new endpoints to MaintenanceResource:
- POST /_fixAssets: run all FixTask classes to fix DB inconsistencies
- GET /_fixAssets: poll fix assets task results
- POST /_cleanAssets: start background thread to clean orphan asset dirs
- GET /_cleanAssets: poll clean assets progress status

POST /_cleanAssets returns 409 Conflict if already running.
Includes @Value.Immutable CleanAssetsStatusView, full Swagger annotations,
and integration tests.
@hassandotcms hassandotcms force-pushed the 35200-maintenance-fix-assets-clean-assets branch from cf2de86 to 054bbe1 Compare April 8, 2026 13:06
…s, and delete pushed assets #35191

Add three new endpoints to MaintenanceResource:
- POST /_searchAndReplace: database-wide find/replace across content tables
- DELETE /_oldVersions: drop versionable objects older than a given date
- DELETE /_pushedAssets: clear all push publishing history

Includes Validated forms, @Value.Immutable response views, full Swagger
annotations, and SecurityLogger entries for all destructive operations.
…ions, and pushed assets #35191

Tests cover:
- _searchAndReplace: admin success, non-admin rejection, null/empty form validation,
  empty replaceString allowed
- _oldVersions: admin success, non-admin rejection, missing date, invalid date formats
- _pushedAssets: admin success, non-admin rejection
…tenance-search-replace-old-versions-pushed-assets
…sets' into 35200-maintenance-fix-assets-clean-assets
…assets endpoints #35200

Resolves PR #35232 review feedback:

- CleanAssetsThread: volatile fields for cross-thread visibility; synchronized
  getInstance(); new atomic startCleanProcess() that claims running state before
  super.start() so callers can't observe running=false immediately after POST.
- MaintenanceResource.startCleanAssets: uses startCleanProcess() to eliminate the
  TOCTOU race that could double-start the thread and throw IllegalThreadStateException.
- MaintenanceResource.startFixAssets: AtomicBoolean guard returns 409 if already
  in flight; defensive copy of FixTasksExecutor's singleton result list prevents
  ConcurrentModificationException when a GET overlaps the POST.
- MaintenanceResource: SecurityLogger audit entries on both POST endpoints, matching
  the existing pattern used by searchAndReplace/dropOldVersions/deletePushedAssets.
- getFixAssetsProgress: clarified javadoc/Swagger that POST is synchronous and GET
  returns last-run results (page-reload scenarios), not live progress.
- Integration tests: assertTrue(status.running()) on cleanAssets start;
  concurrent-start 409 tests for both cleanAssets and fixAssets.
description = "Conflict - fix assets process is already running",
content = @Content(mediaType = "application/json"))
})
@POST
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about naming them like :
maintenance/assets/fix
maintenance/assets/clean

or

maintenance/assets/_fix
maintenance/assets/_clean

To please some folks

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintenance/assets/_fix
maintenance/assets/_clean

seems better, consistent with other apis in this file.


final User user = assertBackendUser(request, response).getUser();

if (!FIX_ASSETS_RUNNING.compareAndSet(false, true)) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this does not guarantee that the process might have been started on a different node
Perhaps you should use our Redis cache to make a lock
We also have a LockManager, which implements an identifier lock based

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, valid point. fix in progress

* @return
*/
public static CleanAssetsThread getInstance ( Boolean restartIfDied, boolean processBinary ) {
public static synchronized CleanAssetsThread getInstance ( Boolean restartIfDied, boolean processBinary ) {
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS Apr 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an excellent opportunity to use our JobQueue rather than Threads that can be executed on different nodes simultaneously. Look at JobQueueManagerAPI
it works like a job queue managed by a single node. You can register a specialized JobProcessor to read from our job queue and handle an arbitrary task. Our Content Import API Works on top of that. It is intended to replace Quartz

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have a look at com/dotcms/rest/api/v1/content/dotimport/ContentImportResource.java

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorporating it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. using JobQueue now.

…erAPI #35200

Addresses review feedback from @fabrizzio-dotCMS on PR #35232:

- Endpoint paths renamed from _fixAssets/_cleanAssets to assets/_fix
  and assets/_clean (nested-resource convention).
- JVM-local AtomicBoolean replaced with cluster-aware ClusterLockManager
  (ShedLock/JDBC) around the check-and-create so two nodes cannot both
  pass the active-job guard in the same millisecond.
- Fix-assets and clean-assets REST execution migrated to
  JobQueueManagerAPI via new @Queue processors; legacy CleanAssetsThread
  and FixTasksExecutor classes are untouched, so DWR/Quartz callers
  continue to work unchanged.

New classes:
  MaintenanceJobHelper        — creates jobs, enforces 409 semantics
  FixAssetsJobProcessor       — runs FixTasksExecutor on the job worker
  CleanAssetsJobProcessor     — runs CleanAssetsThread.run() on the job
                                worker, bridges BasicProcessStatus to
                                the progress tracker

Removed (unused after the migration):
  AbstractCleanAssetsStatusView
  ResponseEntityCleanAssetsStatusView
  CleanAssetsThread.startCleanProcess() (dead helper)

getLatestJob() switched from getCompletedJobs (ordered completed_at ASC
— returned the oldest job) to getJobs (ordered created_at DESC — returns
the most recent regardless of state).

Integration tests reshaped to the job-queue shape: POST returns a jobId,
tests poll JobQueueManagerAPI until the job reaches a terminal state,
and assert the persisted result metadata. Added an end-to-end test that
plants a UUID-named orphan directory under the assets root and verifies
clean-assets deletes it. Test method docs follow the project's
"Given scenario / Expected result" convention.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[TASK] Implement clean assets endpoints [TASK] Implement fix assets endpoints

2 participants