fix: handle credentials lifecycle in BigQuery analytics plugin#5467
fix: handle credentials lifecycle in BigQuery analytics plugin#5467caohy1988 wants to merge 4 commits intogoogle:mainfrom
Conversation
Fixes several issues introduced by the credentials parameter addition (34713fb): - Clear _credentials in __getstate__ so non-picklable credential types (e.g., compute_engine.Credentials with requests.Session) don't break Vertex AI Agent Engine deployment via cloudpickle. Credentials are re-resolved from the user-provided value or ADC on unpickle. - Clear _credentials in _reset_runtime_state so stale HTTP transport state in credential objects doesn't carry across fork boundaries. - Restore lazy storage.Client creation — only construct it when explicit credentials are provided; otherwise let GCSOffloader create a default client on first use. Avoids unnecessary GCP client initialization when GCS offloading is not triggered. - Document the benign race on _credentials resolution when multiple event loops call _create_loop_state concurrently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
Response from ADK Triaging Agent Hello @caohy1988, thank you for creating this PR! This PR is a bug fix, could you please associate the github issue with this PR? If there is no existing issue, could you please create one? In addition, it looks like you haven't signed the Contributor License Agreement (CLA) yet. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to see your current agreements or to sign a new one. This information will help reviewers to review your PR more efficiently. Thanks! |
Addresses review feedback: - Non-picklable user-provided credentials (e.g., compute_engine with requests.Session) broke pickle.dumps() because _user_credentials was preserved in __getstate__. Now both _credentials and _user_credentials are cleared; credentials re-resolve via ADC after unpickle. - Fork safety: _reset_runtime_state documents that user-provided credentials are kept as-is (we cannot re-create them), while ADC-resolved credentials are cleared for re-resolution. - GCS client: always pass credentials explicitly when available; GCSOffloader.__init__ always creates a client eagerly (the "lazy restoration" claim in the previous commit was incorrect — the pre-credentials code had the same eager behavior). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous commit cleared both _credentials and _user_credentials unconditionally in __getstate__, silently dropping picklable user credentials (service-account, AnonymousCredentials) after unpickle. This caused the plugin to fall back to ADC instead of the user's configured identity. Now __getstate__ tests whether _user_credentials is picklable: - Picklable: preserved — survives pickle and restored on unpickle - Non-picklable: dropped gracefully — falls back to ADC Adds two tests covering both paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A valid picklable credential subclass that defines __bool__ as False would not be restored into _credentials, causing silent ADC fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes lifecycle issues with the
credentialsparameter introduced in34713fb4.Changes
1. Pickle safety — try-preserve, fallback-drop
__getstate__tests whether_user_credentialsis picklable:AnonymousCredentials): preserved — survives pickle and restored via__setstate__so the plugin uses the user's identity after unpickle.compute_engine.Credentialswithrequests.Session): dropped gracefully — falls back to ADC after unpickle._credentials(the active/resolved credentials) is always cleared since it may hold resolved ADC state. On unpickle,__setstate__restores it from_user_credentialswhen available.2. Fork safety — documented user-provided credential limitation
_reset_runtime_state()sets_credentials = _user_credentials. For ADC-resolved credentials (_user_credentials is None), this clears stale credentials for re-resolution. For user-provided credentials, the original object is kept — we cannot re-create it. The comment documents this: the user is responsible for providing fork-safe credentials.3. GCS client — credentials passed correctly
GCSOffloader.__init__always creates astorage.Clienteagerly (storage_client or storage.Client(...)at line 1329). This was also true before the credentials commit. The fix passes explicit credentials when available and lets ADC resolve when not, matching thebigquery.ClientandBigQueryWriteAsyncClientpatterns.4. Benign race on
_credentialsresolution documentedWhen multiple event loops call
_create_loop_state()concurrently, both can resolve ADC redundantly. This is idempotent and benign, now documented.Test plan
test_pickle_preserves_picklable_credentials— picklable user credentials survive round-triptest_pickle_drops_non_picklable_credentials— non-picklable credentials dropped, plugin falls back to ADCtest_pickle_safetyandtest_custom_credentials_usedpass🤖 Generated with Claude Code