
feat: NFT metadata state machine with retry logic#74

Merged
tac0turtle merged 5 commits into main from pthmas/fix-nft-image
Apr 24, 2026

Conversation


@pthmas pthmas commented Apr 23, 2026

Summary

Replace boolean metadata_fetched with a proper state machine and retry logic for NFT metadata fetching.

Changes

  • Migration: Add metadata_status (pending/fetched/retryable_error/permanent_error), metadata_retry_count, next_retry_at, last_metadata_error, last_metadata_attempted_at, metadata_updated_at to nft_tokens; drop metadata_fetched
  • New nft_metadata module: URI resolution (IPFS, Arweave, data: URIs), metadata extraction, HTTP fetch with error classification (retryable vs permanent), SSRF protection, direct-image detection
  • Retry queue: Metadata fetcher now picks up pending tokens and retryable_error tokens where next_retry_at <= NOW(), ordered by queue time then transfer block
  • Configurable: METADATA_FETCH_WORKERS and METADATA_RETRY_ATTEMPTS env vars added to docker-compose
  • Frontend: New nftMetadata.ts helpers (getNftImageUrl, isNftMetadataPending, isNftMetadataUnavailable, etc.); NFT token page polls every 10s while metadata pending; shows distinct "unavailable" state on permanent error; updated types to drop metadata_fetched
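To make the URI-resolution bullet concrete, here is a minimal std-only sketch of the kind of scheme rewriting described (the function name, gateway handling, and the hardcoded Arweave gateway are illustrative assumptions, not the PR's actual code):

```rust
/// Illustrative sketch: normalize ipfs:// and ar:// token URIs to HTTP
/// gateway URLs; data: and http(s) URIs pass through unchanged.
fn resolve_uri(uri: &str, ipfs_gateway: &str) -> String {
    if let Some(stripped) = uri.strip_prefix("ipfs://") {
        // Tolerate the redundant ipfs://ipfs/<cid> form seen in the wild
        let stripped = stripped.strip_prefix("ipfs/").unwrap_or(stripped);
        format!("{}/{}", ipfs_gateway.trim_end_matches('/'), stripped)
    } else if let Some(stripped) = uri.strip_prefix("ar://") {
        format!("https://arweave.net/{}", stripped)
    } else {
        uri.to_string()
    }
}

fn main() {
    assert_eq!(
        resolve_uri("ipfs://QmHash/1.json", "https://ipfs.io"),
        "https://ipfs.io/QmHash/1.json"
    );
    assert_eq!(
        resolve_uri("ar://abc123", "https://ipfs.io"),
        "https://arweave.net/abc123"
    );
    // data: URIs are handled separately downstream and pass through here
    assert_eq!(
        resolve_uri("data:application/json;base64,e30=", "https://ipfs.io"),
        "data:application/json;base64,e30="
    );
}
```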

Summary by CodeRabbit

  • New Features

    • Richer NFT metadata state tracking with scheduled retries, error/timestamp visibility
    • Real-time metadata refresh polling on token detail pages
    • Improved image resolution with multiple fallbacks and direct image/data URI support
  • Bug Fixes

    • More robust error classification and retry handling for metadata fetches
    • Hardened metadata fetching against unsafe URLs, redirects and oversized/invalid payloads
  • Refactor

    • Metadata fetching moved to a status-driven background pipeline (no on-demand fetch on read)
  • Tests

    • New unit and integration tests covering metadata extraction, retries and client behavior
  • Chores

    • Runtime config added for metadata workers and retry attempts
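The retryable-vs-permanent split mentioned under Bug Fixes can be sketched as a pure classifier over HTTP status codes (the enum name and exact buckets are assumptions for illustration, not the crate's real error types):

```rust
/// Illustrative sketch of fetch-error classification: server-side and
/// rate-limit failures are retryable; client errors are permanent.
#[derive(Debug, PartialEq)]
enum FetchErrorKind {
    Retryable,
    Permanent,
}

fn classify_status(status: u16) -> FetchErrorKind {
    match status {
        429 | 500..=599 => FetchErrorKind::Retryable, // rate limits, server faults
        400..=499 => FetchErrorKind::Permanent,       // bad URI, missing token, etc.
        _ => FetchErrorKind::Retryable,               // be lenient about oddities
    }
}

fn main() {
    assert_eq!(classify_status(503), FetchErrorKind::Retryable);
    assert_eq!(classify_status(429), FetchErrorKind::Retryable);
    assert_eq!(classify_status(404), FetchErrorKind::Permanent);
}
```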

Replace boolean metadata_fetched with a proper state machine
(pending/fetched/retryable_error/permanent_error) and exponential
backoff retry. Extract metadata parsing into nft_metadata module
with SSRF protection, data URI support, and direct-image detection.
Frontend polls for pending metadata and shows unavailable state.
@pthmas pthmas self-assigned this Apr 23, 2026

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Replaces boolean metadata tracking with a state machine (status, retry scheduling, errors, timestamps); introduces a new nft_metadata module for safe fetching/parsing (ipfs/ar/data:/HTTP, SSRF/DNS checks, size/type limits, retry/backoff); updates indexer, worker, DB migration, API projections, tests, frontend types, utilities, and runtime config to use the new model.
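The status transitions the walkthrough describes can be sketched as a small pure function (the enum and names are illustrative, not the crate's actual types):

```rust
/// Illustrative sketch of the metadata status state machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MetadataStatus {
    Pending,
    Fetched,
    RetryableError,
    PermanentError,
}

/// Simplified outcome of a single fetch attempt.
enum Attempt {
    Success,
    RetryableFailure,
    PermanentFailure,
}

/// Decide the next status; retryable failures exhaust into a permanent error.
fn next_status(attempt: Attempt, retry_count: u32, max_attempts: u32) -> MetadataStatus {
    match attempt {
        Attempt::Success => MetadataStatus::Fetched,
        Attempt::PermanentFailure => MetadataStatus::PermanentError,
        Attempt::RetryableFailure if retry_count + 1 >= max_attempts => {
            MetadataStatus::PermanentError
        }
        Attempt::RetryableFailure => MetadataStatus::RetryableError,
    }
}

fn main() {
    assert_eq!(next_status(Attempt::Success, 0, 5), MetadataStatus::Fetched);
    assert_eq!(next_status(Attempt::RetryableFailure, 0, 5), MetadataStatus::RetryableError);
    assert_eq!(next_status(Attempt::RetryableFailure, 4, 5), MetadataStatus::PermanentError);
    assert_eq!(next_status(Attempt::PermanentFailure, 0, 5), MetadataStatus::PermanentError);
}
```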

Changes

Cohort / File(s) Summary
DB Migration & Schema
backend/migrations/20260422000003_nft_metadata_state.sql
Adds status/retry/error/timestamp columns and CHECK constraint; initializes rows; creates filtered index for pending/retryable work; drops metadata_fetched.
Backend Types
backend/crates/atlas-common/src/types.rs
NftToken struct: removed metadata_fetched; added metadata_status, metadata_retry_count, next_retry_at, last_metadata_error, last_metadata_attempted_at, metadata_updated_at.
Indexer Upsert
backend/crates/atlas-server/src/indexer/indexer.rs
Batch upsert adjusted to insert/init metadata_status, metadata_retry_count, next_retry_at (replacing metadata_fetched) and preserve conflict update semantics.
Metadata Worker & Pipeline
backend/crates/atlas-server/src/indexer/metadata.rs
Worker selects by metadata_status and next_retry_at, records attempts/errors, schedules retries, delegates fetch/parsing to nft_metadata, and persists status/retry/error/timestamps instead of boolean flag.
New Metadata Module
backend/crates/atlas-server/src/nft_metadata.rs, backend/crates/atlas-server/src/lib.rs, backend/crates/atlas-server/src/main.rs
New public module for URI normalization (ipfs/ar), data: handling, JSON/image extraction, SSRF-safe DNS resolver, HTTP fetch with payload/type limits, retry decision logic, and related types/constants.
API Handlers
backend/crates/atlas-server/src/api/handlers/nfts.rs, backend/crates/atlas-server/src/api/handlers/addresses.rs
SQL projections updated to include new metadata state fields; removed on-demand metadata fetch/update from get_token (now returns DB state).
Server Dependencies & Config
backend/crates/atlas-server/Cargo.toml, docker-compose.yml
Adds base64 and percent-encoding crates; adds env vars METADATA_FETCH_WORKERS and METADATA_RETRY_ATTEMPTS with defaults.
Indexer HTTP client change
backend/crates/atlas-server/src/indexer/metadata.rs (client builder refactor)
Centralized build_metadata_client() with timeout, UA, redirects disabled, SSRF-safe resolver; worker uses that client.
Integration Tests
backend/crates/atlas-server/tests/integration/nfts.rs
Seed/test updates to set new metadata fields; adds test verifying API returns metadata_status, metadata_retry_count, image_url, and nested metadata.
Frontend Types & Utils
frontend/src/types/index.ts, frontend/src/utils/nftMetadata.ts, frontend/src/utils/nftMetadata.test.ts, frontend/src/utils/index.ts
Types updated to include metadata state fields and flexible metadata shapes; added helpers for image/description/attributes and predicates for pending/unavailable; tests added and re-exported.
Frontend Pages
frontend/src/pages/NFTTokenPage.tsx, frontend/src/pages/NFTContractPage.tsx, frontend/src/pages/AddressPage.tsx
Pages now use getNftImageUrl utility; NFT token page polls (refetch) while metadata is pending and shows explicit “Metadata unavailable” fallback; imports adjusted.

Sequence Diagram(s)

sequenceDiagram
    actor Scheduler
    participant Worker as MetadataWorker
    participant DB as Database
    participant Module as nft_metadata
    participant HTTP as HTTP_Client
    participant DNS as SsrfSafeResolver

    Scheduler->>Worker: request_batch()
    Worker->>DB: SELECT tokens WHERE metadata_status IN ('pending','retryable_error') AND next_retry_at <= NOW()
    DB-->>Worker: token_batch

    loop per token
        Worker->>Module: fetch_metadata(client, token_uri)
        Module->>Module: normalize_uri (ipfs/ar/data/http)
        alt direct image (data: or image content-type)
            Module-->>Worker: DirectImage(url_or_data)
            Worker->>DB: UPDATE metadata_status='fetched', metadata_updated_at=NOW(), metadata_retry_count=0
        else JSON metadata
            Module->>HTTP: GET (if HTTP)
            HTTP-->>Module: payload
            Module->>Module: parse & extract (name, image)
            Module-->>Worker: Json(metadata, image_url)
            Worker->>DB: UPDATE metadata, image_url, metadata_status='fetched', metadata_updated_at=NOW()
        else retryable error
            Module-->>Worker: FetchError(Retryable, code)
            Worker->>Module: schedule_retry(retry_count, max_attempts)
            Module-->>Worker: next_retry_at or PermanentDecision
            Worker->>DB: UPDATE metadata_status='retryable_error', next_retry_at=?, last_metadata_error=?, metadata_retry_count=metadata_retry_count+1
        else permanent error
            Module-->>Worker: FetchError(Permanent, code)
            Worker->>DB: UPDATE metadata_status='permanent_error', last_metadata_error=?, last_metadata_attempted_at=NOW()
        end
    end
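One common way to compute next_retry_at in the diagram's schedule_retry step is capped exponential backoff; a sketch follows (the base delay and cap are illustrative assumptions, not values taken from this PR):

```rust
/// Illustrative capped exponential backoff for next_retry_at scheduling:
/// delay = base * 2^retry_count, capped at one hour.
fn retry_delay_secs(retry_count: u32) -> u64 {
    let base: u64 = 30;  // first retry after 30s (assumed)
    let cap: u64 = 3600; // never wait longer than an hour (assumed)
    base.saturating_mul(1u64 << retry_count.min(20)).min(cap)
}

fn main() {
    assert_eq!(retry_delay_secs(0), 30);
    assert_eq!(retry_delay_secs(1), 60);
    assert_eq!(retry_delay_secs(2), 120);
    assert_eq!(retry_delay_secs(10), 3600); // capped
}
```

The worker would then persist `next_retry_at = NOW() + retry_delay_secs(n)` alongside the incremented retry count.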

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested reviewers

  • tac0turtle

Poem

🐰 I hopped through URIs and threads so spry,

ipfs and arweave, data tucked by—
Statuses, retries, timestamps in tow,
Safe DNS checks so fetchers can go.
Polls hum softly; image pixels show. 🎋

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 28.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (4 passed)
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title 'feat: NFT metadata state machine with retry logic' directly describes the main change: replacing a simple boolean flag with a state machine for NFT metadata processing and adding retry logic.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
backend/crates/atlas-server/Cargo.toml (1)

44-45: Consider hoisting to workspace deps.

Per the workspace learning ("Keep all Rust workspace dependency versions in the root Cargo.toml"), base64 and percent-encoding would ideally be declared in the root workspace Cargo.toml and referenced here via { workspace = true }, consistent with the other workspace-managed deps above. Note the existing indexer-specific deps (num-bigint, async-channel, etc.) also deviate from this pattern, so this is optional depending on your convention for crate-local deps.

Based on learnings: "Keep all Rust workspace dependency versions in the root Cargo.toml".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/Cargo.toml` around lines 44 - 45, The crate-level
Cargo.toml lists base64 and percent-encoding as direct versioned deps; to follow
the workspace convention, add those dependencies with their versions to the root
workspace Cargo.toml and replace the entries in this crate's Cargo.toml with
workspace pointers (e.g. base64 = { workspace = true } and percent-encoding = {
workspace = true }) or remove them here if already inherited; also review other
indexer-specific deps like num-bigint and async-channel for consistency and
hoist them if you want them managed at the workspace level.
frontend/src/types/index.ts (1)

72-72: Prefer a finite union for metadata_status.

The database constraint enforces exactly four status values ('pending', 'fetched', 'retryable_error', 'permanent_error'), but the frontend type is currently string. Tightening this to a union will catch typos in helpers and tests at compile time.

♻️ Proposed type tightening
+export type NftMetadataStatus =
+  | 'pending'
+  | 'fetched'
+  | 'retryable_error'
+  | 'permanent_error';
+
 export interface NftToken {
   contract_address: string;
   token_id: string;
   owner: string;
   token_uri: string | null;
-  metadata_status: string;
+  metadata_status: NftMetadataStatus;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/types/index.ts` at line 72, Replace the loose string type for
metadata_status with a finite literal union of the four allowed DB values so
TypeScript can catch typos; change the declaration (symbol: metadata_status in
frontend/src/types/index.ts) to use a union 'pending' | 'fetched' |
'retryable_error' | 'permanent_error' (or extract to a named type alias like
MetadataStatus and use that) and update any refs (helpers, tests, functions
expecting metadata_status) to use the new union type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/indexer/metadata.rs`:
- Around line 210-220: The SELECT in metadata.rs that builds the tokens vec (the
query bound to NFT_METADATA_PENDING and NFT_METADATA_RETRYABLE_ERROR in the
token fetch logic) currently orders "ORDER BY next_retry_at ASC NULLS FIRST"
which causes NULL (pending) rows to always sort before due retryable rows and
can starve retries; change the ORDER BY to place NULL next_retry_at values last
(e.g., "ORDER BY next_retry_at ASC NULLS LAST, last_transfer_block DESC") so
retryable_error rows with a non-null next_retry_at that is due are prioritized
ahead of pending rows, keeping the rest of the query (bindings to
NFT_METADATA_PENDING, NFT_METADATA_RETRYABLE_ERROR and the LIMIT using
self.config.metadata_fetch_workers) unchanged.

In `@backend/crates/atlas-server/src/nft_metadata.rs`:
- Around line 89-90: The metadata handling lacks size limits: add a strict
max-size guard when reading HTTP responses and before decoding data: URIs to
prevent OOM. In the HTTP path replace response.bytes() with a bounded reader
(e.g., reqwest::Body::bytes_stream with manual cumulative size checks or wrap
the response body with http_body_util::Limited) and return an error if the total
exceeds the configured MAX_METADATA_BYTES; likewise, inside
parse_data_json_uri() validate the payload length (after extracting the
base64/percent segment but before base64/percent decoding) against the same
MAX_METADATA_BYTES and reject oversized inputs to avoid unbounded allocations.
Ensure errors surface from the calling function (the code that currently calls
parse_data_json_uri and uses response) so oversized payloads are handled
uniformly.
- Around line 99-106: The SSRF gap exists because validate_resolved_host(&url)
and client.get(&url).send() use different DNS resolutions; implement a custom
DNS resolver that enforces your is_non_public_ip() checks and plug it into the
reqwest client via ClientBuilder::dns_resolver() so reqwest uses the same vetted
IP. Create a struct (e.g., VettedResolver) that implements reqwest::dns::Resolve
and, inside its resolve method, perform the lookup (use tokio::net::lookup_host
or equivalent), run each candidate through is_non_public_ip(), return only
vetted SocketAddr(s) or an error if none pass, then build the client with
ClientBuilder::new().dns_resolver(Arc::new(VettedResolver::new())). Finally,
replace the existing client usage where validate_resolved_host(&url) is called
followed by client.get(&url).send() so that both validation and the actual
request use the same resolver.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df7a21c2-2325-428a-8c8e-c5d6804bc1fe

📥 Commits

Reviewing files that changed from the base of the PR and between 1495e71 and fa79a95.

📒 Files selected for processing (19)
  • backend/crates/atlas-common/src/types.rs
  • backend/crates/atlas-server/Cargo.toml
  • backend/crates/atlas-server/src/api/handlers/addresses.rs
  • backend/crates/atlas-server/src/api/handlers/nfts.rs
  • backend/crates/atlas-server/src/indexer/indexer.rs
  • backend/crates/atlas-server/src/indexer/metadata.rs
  • backend/crates/atlas-server/src/lib.rs
  • backend/crates/atlas-server/src/main.rs
  • backend/crates/atlas-server/src/nft_metadata.rs
  • backend/crates/atlas-server/tests/integration/nfts.rs
  • backend/migrations/20260422000003_nft_metadata_state.sql
  • docker-compose.yml
  • frontend/src/pages/AddressPage.tsx
  • frontend/src/pages/NFTContractPage.tsx
  • frontend/src/pages/NFTTokenPage.tsx
  • frontend/src/types/index.ts
  • frontend/src/utils/index.ts
  • frontend/src/utils/nftMetadata.test.ts
  • frontend/src/utils/nftMetadata.ts

Comment thread backend/crates/atlas-server/src/nft_metadata.rs
Comment on lines +99 to +106
validate_metadata_url_scheme(&url)?;
validate_resolved_host(&url).await?;

let response = client
.get(&url)
.send()
.await
.map_err(classify_request_error)?;


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

For the reqwest version used by this repository, how can ClientBuilder enforce a custom DNS resolver or connector so SSRF validation and the actual connection use the same vetted IP address?

💡 Result:

To enforce a custom DNS resolver or connector in reqwest's ClientBuilder so that SSRF validation and the actual connection use the same vetted IP address, implement a custom DNS resolver via the dns_resolver method. DNS resolution then happens through your vetted logic, and the connector uses exactly those resolved addresses, preventing rebinding discrepancies that could enable SSRF. The latest reqwest version (as of 2026) is 0.13.2, which supports this via ClientBuilder::dns_resolver; the same API exists back to 0.11.x.

### Primary approach: custom DNS resolver (recommended for SSRF)

Implement the reqwest::dns::Resolve trait so your SSRF validation runs during resolution:

    use std::net::SocketAddr;
    use std::sync::Arc;
    use reqwest::dns::{Addrs, Name, Resolve, Resolving};
    use reqwest::ClientBuilder;
    use tokio::net::lookup_host;

    struct SsrfResolver;

    impl Resolve for SsrfResolver {
        fn resolve(&self, name: Name) -> Resolving {
            Box::pin(async move {
                // Perform SSRF validation here; use a bogon-checker crate
                // for a complete private/reserved-range test.
                let vetted: Vec<SocketAddr> = lookup_host((name.as_str(), 443))
                    .await?
                    .filter(|addr| !addr.ip().is_loopback())
                    .collect();
                if vetted.is_empty() {
                    return Err("no valid public IPs found".into());
                }
                let addrs: Addrs = Box::new(vetted.into_iter());
                Ok(addrs)
            })
        }
    }

    let client = ClientBuilder::new()
        .dns_resolver(Arc::new(SsrfResolver))
        .build()?;

This guarantees the connector connects only to the vetted IPs returned by your resolver. Note that reqwest skips resolution if the URL already contains an IP literal; for hostnames, your custom resolver controls it entirely.

### Alternative: custom connector via Tower layers (reqwest 0.12+)

Recent reqwest versions support ClientBuilder::connector_layer to add Tower layers to the connector stack, e.g. a layer that validates the destination before connecting. Combine it with dns_resolver for full control.

### Why this solves SSRF

  • The default system resolver might resolve to different IPs than the ones that were validated.
  • A custom resolver ensures the same vetted IPs are used for both validation and connection.
  • Private/reserved IPs (e.g., 127.0.0.1, 169.254.x.x metadata endpoints) are blocked during resolution.

No built-in block_private_ips option exists (community discussion #2567), so a custom implementation is required; crates like ipnet or bogon help with IP classification. For the repository's specific reqwest version, check Cargo.lock or Cargo.toml, but the API has been stable since 0.11.

🏁 Scripts executed (repository: evstack/atlas):

    # Find and read Cargo.toml/lock files for reqwest version
    find . -name "Cargo.lock" -o -name "Cargo.toml" | head -20

    # Locate the file mentioned in the review
    find . -path "*backend/crates/atlas-server/src/nft_metadata.rs" -type f

    # Read the full nft_metadata.rs file to understand the implementation
    wc -l backend/crates/atlas-server/src/nft_metadata.rs

    # Check reqwest version in atlas-server and workspace Cargo.toml
    cat backend/crates/atlas-server/Cargo.toml
    cat backend/Cargo.toml

    # Read the relevant sections of nft_metadata.rs
    sed -n '90,120p' backend/crates/atlas-server/src/nft_metadata.rs
    sed -n '230,280p' backend/crates/atlas-server/src/nft_metadata.rs
    sed -n '1,100p' backend/crates/atlas-server/src/nft_metadata.rs
    sed -n '235,258p' backend/crates/atlas-server/src/nft_metadata.rs
    sed -n '261,278p' backend/crates/atlas-server/src/nft_metadata.rs

    # Check if there are any custom DNS resolver implementations already in the codebase
    rg "dns_resolver|Resolve trait|custom.*resolver" backend/ --type rust

    # Search for how the reqwest Client is constructed
    rg "reqwest::Client|ClientBuilder" backend/ --type rust -A 3

    # Check if there are any tests related to SSRF validation or DNS resolution
    rg "validate_resolved_host|is_non_public_ip" backend/ --type rust -B 2 -A 5
    rg "test.*validate_resolved_host|test.*dns" backend/crates/atlas-server/src/nft_metadata.rs -A 10

    # Look at the full test module in nft_metadata.rs to understand what's tested
    sed -n '300,376p' backend/crates/atlas-server/src/nft_metadata.rs

    # Check if ClientBuilder is ever configured with dns_resolver anywhere
    rg "dns_resolver|Resolve|custom.*dns" backend/ --type rust


Close the DNS rebinding gap in SSRF protection.

The code validates resolved IPs via tokio::net::lookup_host() but then client.get(&url).send() performs its own separate DNS resolution. An attacker-controlled hostname can resolve to a public IP during validation and a private/link-local IP during the actual request, bypassing SSRF protection.

Implement a custom DNS resolver using ClientBuilder::dns_resolver() with reqwest 0.13 to ensure both validation and connection use the same vetted IP address. Wrap your existing is_non_public_ip() logic in a struct implementing the reqwest::dns::Resolve trait, then pass it to the client builder. This guarantees reqwest skips the default resolver and uses only your validated IPs.
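A std-only sketch of the non-public-IP predicate such a resolver would apply to every candidate address, including IPv4-mapped IPv6 (the helper name mirrors the one discussed above; the ranges shown are illustrative, not exhaustive):

```rust
use std::net::IpAddr;

/// Illustrative SSRF predicate: reject loopback, private, link-local,
/// unspecified, and unique-local addresses, normalizing ::ffff:a.b.c.d
/// to its embedded IPv4 first so mapped addresses cannot slip through.
fn is_non_public_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback()
                || v4.is_private()
                || v4.is_link_local()
                || v4.is_unspecified()
                || v4.is_broadcast()
        }
        IpAddr::V6(v6) => {
            // IPv4-mapped IPv6 (::ffff:x.x.x.x) is classified as its IPv4
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_non_public_ip(IpAddr::V4(v4));
            }
            // Loopback, unspecified, and unique-local fc00::/7
            v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xfe00) == 0xfc00
        }
    }
}

fn main() {
    assert!(is_non_public_ip("127.0.0.1".parse().unwrap()));
    assert!(is_non_public_ip("::ffff:10.0.0.1".parse().unwrap()));
    assert!(!is_non_public_ip("8.8.8.8".parse().unwrap()));
    assert!(!is_non_public_ip("2001:4860:4860::8888".parse().unwrap()));
}
```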



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
backend/crates/atlas-server/src/nft_metadata.rs (1)

124-125: ⚠️ Potential issue | 🟠 Major

Enforce limits before accepting untrusted payloads.

data:image/* URIs are returned directly and can later be persisted as image_url without any size cap. The HTTP path still calls response.bytes() before enforcing the post-read length check when Content-Length is absent or incorrect, so a malicious server can force a large allocation.

🛡️ Suggested direction
     let url = resolve_uri(uri, ipfs_gateway);
     if url.starts_with("data:image/") {
+        if url.len() > MAX_METADATA_BYTES {
+            return Err(permanent_error("data_uri_too_large"));
+        }
         return Ok(FetchedMetadata::DirectImage { image_url: url });
     }

For HTTP responses, replace response.bytes() with a bounded stream read and fail as soon as the cumulative size exceeds MAX_METADATA_BYTES.
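The bounded-read idea can be illustrated with std's Read::take, independent of the reqwest streaming API the comment asks about (the constant value and function name here are hypothetical):

```rust
use std::io::Read;

const MAX_METADATA_BYTES: u64 = 1024 * 1024; // 1 MiB cap (illustrative)

/// Illustrative bounded read: take one byte more than the cap so that
/// "exactly at the limit" and "over the limit" are distinguishable,
/// then reject anything that exceeded the cap.
fn read_bounded<R: Read>(reader: R) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    reader
        .take(MAX_METADATA_BYTES + 1)
        .read_to_end(&mut buf)
        .map_err(|e| e.to_string())?;
    if buf.len() as u64 > MAX_METADATA_BYTES {
        return Err("metadata payload exceeds size limit".into());
    }
    Ok(buf)
}

fn main() {
    assert!(read_bounded(&vec![0u8; 16][..]).is_ok());
    assert!(read_bounded(&vec![0u8; (MAX_METADATA_BYTES as usize) + 1][..]).is_err());
}
```

With reqwest the same cumulative check would run over the body's byte stream, aborting as soon as the running total passes the cap rather than buffering first.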

Verify the available bounded streaming API for the pinned reqwest version:

For reqwest 0.13, what is the recommended way to stream a Response body with a maximum byte limit instead of Response::bytes()?

Also applies to: 167-172

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/nft_metadata.rs` around lines 124 - 125, The
code currently accepts data:image/* URIs and also uses response.bytes() which
can allocate arbitrarily large payloads; update the logic in nft_metadata.rs
(the branch returning FetchedMetadata::DirectImage { image_url: url }) to
enforce MAX_METADATA_BYTES before accepting/storing any untrusted payload and do
not treat data: URIs as exempt from size checks, and replace all uses of
response.bytes() (also the HTTP path around the block referenced at lines
~167-172) with a bounded streaming read that cumulatively reads the response
body and aborts with an error as soon as the total bytes exceed
MAX_METADATA_BYTES, using the reqwest streaming API appropriate to the pinned
version instead of Response::bytes().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/nft_metadata.rs`:
- Around line 137-140: Before calling client.get() in the metadata fetch paths
(the block after validate_metadata_url_scheme(&url)? and the similar block at
lines ~276–304), parse the URL's host and explicitly reject IP literal hosts
that are non-public; call the existing is_non_public_ip(host) (after enhancing
it) and return an error if it flags the host. Modify is_non_public_ip to
recognize and classify IPv4-mapped IPv6 addresses (e.g., ::ffff:127.0.0.1) by
normalizing or extracting the embedded IPv4 and treating it accordingly so those
mapped addresses are considered non-public when appropriate. Ensure this host/IP
literal check runs before issuing client.get() so DNS-only protections
(SsrfSafeResolver) cannot be bypassed.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b68035b-5c39-4161-b80e-39a7a6dd7a55

📥 Commits

Reviewing files that changed from the base of the PR and between fa79a95 and 28db53e.

📒 Files selected for processing (2)
  • backend/crates/atlas-server/src/indexer/metadata.rs
  • backend/crates/atlas-server/src/nft_metadata.rs

Comment thread backend/crates/atlas-server/src/nft_metadata.rs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/crates/atlas-server/src/nft_metadata.rs (1)

80-89: Minor: hardcoded Arweave gateway and implicit trailing-slash contract.

Two small consistency nits in resolve_uri:

  1. ar:// is rewritten against a hardcoded https://arweave.net/ while ipfs:// honors the configurable ipfs_gateway. If operators ever want to pin Arweave to a specific gateway (for reliability, compliance, or rate-limit reasons), this isn't adjustable. Consider threading an arweave_gateway argument in parallel with ipfs_gateway.
  2. format!("{}{}", ipfs_gateway, stripped) silently produces a malformed URL if ipfs_gateway doesn't end in /. Either normalize by trimming/appending /, or document the contract in a doc-comment on resolve_uri and assert on it at startup.

Neither is blocking — happy to defer.
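The slash-normalization half of the nit can be handled by a tiny join helper (a sketch; the function name is illustrative):

```rust
/// Illustrative gateway/path join that tolerates a missing or extra
/// trailing slash on the configured gateway and a leading slash on
/// the path, always producing exactly one separator.
fn join_gateway(gateway: &str, path: &str) -> String {
    format!(
        "{}/{}",
        gateway.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}

fn main() {
    assert_eq!(join_gateway("https://ipfs.io", "Qm/1.json"), "https://ipfs.io/Qm/1.json");
    assert_eq!(join_gateway("https://ipfs.io/", "Qm/1.json"), "https://ipfs.io/Qm/1.json");
    assert_eq!(join_gateway("https://ipfs.io/", "/Qm/1.json"), "https://ipfs.io/Qm/1.json");
}
```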

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/crates/atlas-server/src/nft_metadata.rs` around lines 80 - 89, The
resolve_uri function currently hardcodes the Arweave gateway and assumes
ipfs_gateway has a trailing slash; update resolve_uri to accept a second gateway
parameter (e.g., arweave_gateway: &str) and use it for "ar://" rewrites instead
of "https://arweave.net/". Also normalize both gateway inputs inside resolve_uri
(or at call sites) to guarantee a single '/' separator when concatenating (trim
any trailing slashes from the gateway and prepend/apply a single '/' when
joining, or ensure the gateway ends with '/'), and update all callers of
resolve_uri accordingly (search for usages of resolve_uri to modify their
signatures or pass the new arweave_gateway).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/crates/atlas-server/src/nft_metadata.rs`:
- Around line 298-318: Update is_non_public_ip to reject additional IPv4 and
IPv6 reserved ranges: for IPv4 add checks for multicast (224.0.0.0/4 via
Ipv4Addr::is_multicast), reserved 240.0.0.0/4, benchmark 198.18.0.0/15,
documentation blocks (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24), and the
full 0.0.0.0/8 range; for IPv6 add checks for multicast
(Ipv6Addr::is_multicast), documentation 2001:db8::/32 and IPv4-translated
64:ff9b::/96 (in addition to existing IPv4-mapped handling). Update any bitmask
checks in is_non_public_ip accordingly. Add table-driven unit tests inside the
#[cfg(test)] module that exercise is_non_public_ip,
validate_metadata_url_scheme, and SsrfSafeResolver for both allowed public IPs
and rejected addresses including ::ffff:127.0.0.1 and ::ffff:8.8.8.8 to ensure
mapped IPv4 behavior is correct.
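The extra range checks the comment asks for could look roughly like the sketch below. This is an illustrative standalone version, not the PR's actual `is_non_public_ip`; the helper split and the exact prefix tests are assumptions.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Reject addresses an SSRF-safe resolver should never connect to:
/// private/loopback/link-local plus multicast (224.0.0.0/4), reserved
/// (240.0.0.0/4), benchmark (198.18.0.0/15), documentation blocks, and
/// 0.0.0.0/8 for IPv4; multicast, documentation (2001:db8::/32), and
/// IPv4-translated (64:ff9b::/96) for IPv6, with IPv4-mapped addresses
/// re-checked as IPv4.
fn is_non_public_ip(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_non_public_v4(v4),
        IpAddr::V6(v6) => {
            // ::ffff:a.b.c.d must follow the IPv4 rules for a.b.c.d.
            if let Some(mapped) = v6.to_ipv4_mapped() {
                return is_non_public_v4(&mapped);
            }
            is_non_public_v6(v6)
        }
    }
}

fn is_non_public_v4(v4: &Ipv4Addr) -> bool {
    let o = v4.octets();
    v4.is_private()
        || v4.is_loopback()
        || v4.is_link_local()
        || v4.is_multicast()                          // 224.0.0.0/4
        || v4.is_broadcast()
        || o[0] >= 240                                // reserved 240.0.0.0/4
        || o[0] == 0                                  // 0.0.0.0/8
        || (o[0] == 198 && (o[1] & 0xfe) == 18)       // benchmark 198.18.0.0/15
        || (o[0] == 192 && o[1] == 0 && o[2] == 2)    // documentation 192.0.2.0/24
        || (o[0] == 198 && o[1] == 51 && o[2] == 100) // documentation 198.51.100.0/24
        || (o[0] == 203 && o[1] == 0 && o[2] == 113)  // documentation 203.0.113.0/24
}

fn is_non_public_v6(v6: &Ipv6Addr) -> bool {
    let seg = v6.segments();
    v6.is_loopback()
        || v6.is_unspecified()
        || v6.is_multicast()
        || (seg[0] & 0xfe00) == 0xfc00            // unique-local fc00::/7
        || (seg[0] & 0xffc0) == 0xfe80            // link-local fe80::/10
        || (seg[0] == 0x2001 && seg[1] == 0x0db8) // documentation 2001:db8::/32
        || (seg[0] == 0x0064 && seg[1] == 0xff9b) // IPv4-translated 64:ff9b::/96 (prefix check)
}
```

The IPv4-mapped branch is what makes `::ffff:127.0.0.1` rejected while `::ffff:8.8.8.8` stays allowed, matching the test cases the comment calls for.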


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 227f6921-c603-4dde-bb02-3138e34c3b38

📥 Commits

Reviewing files that changed from the base of the PR and between 28db53e and 86f7d80.

📒 Files selected for processing (1)
  • backend/crates/atlas-server/src/nft_metadata.rs

Comment thread backend/crates/atlas-server/src/nft_metadata.rs

@coderabbitai (bot) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
frontend/src/pages/NFTTokenPage.tsx (1)

28-36: Polling looks correct; consider pausing when the tab is hidden.

The interval is properly cleared when metadataPending flips or on unmount, and refetch is stable (memoized in useNftToken). One optional refinement: skip polling when document.visibilityState !== 'visible' (or attach a visibilitychange listener) to avoid unnecessary background requests for a page that the user isn't looking at. Not a blocker — the backend eventually transitions tokens to permanent_error via METADATA_RETRY_ATTEMPTS, so the loop is bounded in practice.

♻️ Optional refinement
   useEffect(() => {
     if (!metadataPending) return undefined;

     const id = window.setInterval(() => {
+      if (document.visibilityState !== 'visible') return;
       void refetch();
     }, 10_000);

     return () => window.clearInterval(id);
   }, [metadataPending, refetch]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/pages/NFTTokenPage.tsx` around lines 28 - 36, The polling effect
in NFTTokenPage.tsx should pause when the tab is hidden: update the useEffect
that currently starts an interval when metadataPending is true to check
document.visibilityState === 'visible' before creating the interval (or add a
'visibilitychange' listener to start/stop the interval), and ensure the cleanup
still clears the interval id; reference the existing metadataPending/refetch
logic in the effect so polling only runs while metadataPending is true and the
document is visible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 601e6c02-5391-4b54-8868-2be63a6dd25d

📥 Commits

Reviewing files that changed from the base of the PR and between 86f7d80 and 0be27c5.

📒 Files selected for processing (2)
  • docker-compose.yml
  • frontend/src/pages/NFTTokenPage.tsx
✅ Files skipped from review due to trivial changes (1)
  • docker-compose.yml

Comment on lines +132 to +145
{metadataPending && !imageUrl ? (
  <div className="mt-2">
    <div className="inline-flex items-center px-2.5 py-1 rounded-md border border-dark-500 bg-dark-700 text-fg-muted text-xs">
      <svg className="w-4 h-4 mr-1 text-fg-subtle" viewBox="0 0 24 24" fill="none" stroke="currentColor">
        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 9v2m0 4h.01M12 5a7 7 0 100 14 7 7 0 000-14z" />
      </svg>
      Metadata is being fetched.
    </div>
  </div>
) : metadataUnavailable && !imageUrl ? (
  <p className="text-fg-subtle mt-4">Metadata unavailable.</p>
) : description ? (
  <p className="text-fg-subtle mt-4">{description}</p>
) : null}


⚠️ Potential issue | 🟡 Minor

Details fallback skips description while metadataPending with an image present.

When metadataPending is true but imageUrl resolves (e.g., image_url was populated from an earlier partial indexer pass while status is still retryable_error), the first branch metadataPending && !imageUrl is false, so the chain falls through to description. That's generally fine, but note that if description is also empty in that state the block renders nothing — no pending indicator at all in the Details column despite metadataPending being true. The image column still shows the image in that case, so this is acceptable; just flagging for awareness in case you want a subtle "metadata still updating" hint here too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/pages/NFTTokenPage.tsx` around lines 132 - 145, The Details
column render currently shows the "Metadata is being fetched" badge only when
metadataPending && !imageUrl, which leaves no pending indicator if
metadataPending is true but imageUrl exists and description is empty; update the
JSX conditional logic around metadataPending/imageUrl/description (the ternary
that checks metadataPending && !imageUrl, metadataUnavailable && !imageUrl,
description) so that the pending badge is also rendered when metadataPending is
true and description is empty (e.g., change the first branch to metadataPending
&& (!imageUrl || !description) or equivalent), keeping metadataUnavailable and
description branches intact.

@tac0turtle tac0turtle merged commit 1f53a6e into main Apr 24, 2026
10 checks passed
@tac0turtle tac0turtle deleted the pthmas/fix-nft-image branch April 24, 2026 09:49
