Strip azure_blob source + GCP impersonation from v0.51-exaforce #33

Closed
sundaresanr wants to merge 1 commit into v0.51-exaforce from v0.51-exaforce-parquet-only

@sundaresanr

Summary

Parquet-only v0.51-exaforce base. The azure_blob queue source + `src/azure/mod.rs` auth helpers pulled over from v0.45-exaforce used azure-core 0.21 APIs; upstream v0.51 is on 0.25, so those files can't compile without a proper SDK migration. Since neither feature is used in production Vector configs (confirmed: no `type: azure_blob` sources in any chart, no `impersonated_service_account` in any sink config), dropping them is the quickest path to a working v0.51-based `vector-base` image.

Deletions (6 files, ~1490 lines)

  • `src/azure/mod.rs`
  • `src/internal_events/azure_queue.rs`
  • `src/sources/azure_blob/{mod,queue,integration_tests,test}.rs`

Reverts to upstream v0.51 (1e5bc95)

  • `src/lib.rs`, `src/internal_events/mod.rs`, `src/sources/mod.rs`
  • `src/sinks/azure_blob/{config,integration_tests}.rs`, `src/sinks/azure_common/config.rs`
  • `src/gcp.rs`, `src/sinks/gcp/stackdriver/metrics/tests.rs` (drops impersonated_service_account)
  • `scripts/environment/{prepare.sh,bootstrap-ubuntu-24.04.sh}`, `scripts/integration/azure/compose.yaml` (cosmetic rebase leftovers)

Kept (parquet codec + fixups)

All the `lib/codecs` parquet pieces, `src/codecs/encoding/config.rs` wiring, `src/sinks/aws_s3/{config,sink}.rs`, `src/sinks/util/encoding.rs` batch encoder trait.

Had to fix issues left over in the merged v0.51-exaforce parquet patches:

  • `serializer.rs`: add `Parquet*` to `use super::format::`, fix `Gelf` tuple-vs-unit match, add cfg-guarded `Otlp` arm in `build_batched()`
  • `parquet.rs`: drop Rust-2024-illegal `ref mut` in BoolColumnWriter match
  • `encoding/mod.rs`: re-export `BatchSerializer`
  • `src/codecs/encoding/config.rs`: restore `vector_lib::configurable::configurable_component` (exaforce rebase accidentally dropped it)
  • `src/components/validation/resources/mod.rs`: add `Parquet` arm in serializer→deserializer match
  • `Cargo.lock`: pin pulsar 6.5.0 → 6.3.1 (exaforce rebase drift; 6.5.0 has new required `ProducerOptions` fields the sink doesn't populate)
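
To illustrate the `ref mut` fix listed above, here is a minimal sketch of the pattern. The `ColumnWriter` and `BoolWriter` types are hypothetical stand-ins, not the actual types in `parquet.rs`. It assumes the match scrutinee is a `&mut` reference, which is the case where Rust 2024 rejects an explicit `ref mut` because the default binding mode is already by-reference.

```rust
// Hypothetical stand-ins for the parquet column writer types; the real
// names and fields in lib/codecs may differ.
#[derive(Debug, Default)]
struct BoolWriter {
    values: Vec<bool>,
}

#[allow(dead_code)]
#[derive(Debug)]
enum ColumnWriter {
    Bool(BoolWriter),
    Int64(Vec<i64>),
}

// Matching on `&mut ColumnWriter` puts the pattern in a by-reference
// default binding mode, so `w` below is already `&mut BoolWriter`.
fn push_bool(writer: &mut ColumnWriter, value: bool) -> bool {
    match writer {
        // Older code often wrote `ColumnWriter::Bool(ref mut w)` here.
        // Edition 2024 rejects the explicit modifier in this position,
        // so the fix is simply to drop it.
        ColumnWriter::Bool(w) => {
            w.values.push(value);
            true
        }
        // Any other column type is left untouched.
        _ => false,
    }
}
```

Calling `push_bool(&mut writer, true)` on a `ColumnWriter::Bool` appends the value and returns `true`; on any other variant it returns `false` without modifying the writer.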

Local verification

  • `cargo check -p vector` (default features) — green (1m11s)
  • `cargo check --tests -p vector --no-default-features --features azure-integration-tests` — green (22.9s)

Both cover the commands the publish workflow runs (`make test-integration-azure` + `make package-aarch64-unknown-linux-gnu-all`).

Downstream

Once merged, the push to `v0.51-exaforce` will trigger `exaforce-publish.yaml` → `vector-base:v0.51-exaforce-` lands in ECR → `operations` repo `goservices/vector_exec_sources/Dockerfile` gets bumped in a separate follow-up PR.

The exaforce azure_blob queue source and src/azure/mod.rs auth helpers
were written against azure-core 0.21; upstream v0.51 bumped to 0.25, so
this code can't compile on v0.51 without a proper SDK migration. Neither
the azure_blob source nor impersonated_service_account is used in
production Vector configs, so dropping both is the path of least
resistance to getting a v0.51-based vector-base image.

Deletions:
  - src/azure/mod.rs (exaforce-added)
  - src/internal_events/azure_queue.rs (exaforce-added)
  - src/sources/azure_blob/{mod,queue,integration_tests,test}.rs (all
    exaforce-added)

Reverted to upstream v0.51 (1e5bc95):
  - src/lib.rs, src/internal_events/mod.rs, src/sources/mod.rs
  - src/sinks/azure_blob/{config,integration_tests}.rs
  - src/sinks/azure_common/config.rs
  - src/gcp.rs, src/sinks/gcp/stackdriver/metrics/tests.rs
  - scripts/environment/{prepare.sh,bootstrap-ubuntu-24.04.sh},
    scripts/integration/azure/compose.yaml (cosmetic rebase leftovers)

Kept (parquet codec):
  - lib/codecs/{Cargo.toml, src/encoding/...} — parquet codec + batch
    serializer enum
  - src/codecs/encoding/config.rs — wiring
  - src/sinks/aws_s3/{config,sink}.rs — parquet output
  - src/sinks/util/encoding.rs — Encoder<Vec<Event>> for batching
  - src/components/validation/resources/mod.rs — Parquet stub

Additional codecs fixups needed on top of exaforce's original parquet
patches (they were broken on v0.51):
  - lib/codecs/src/encoding/serializer.rs: add Parquet* to format::*
    imports; fix Gelf unit-vs-tuple match arm; add #[cfg(opentelemetry)]
    Otlp branch to build_batched().
  - lib/codecs/src/encoding/format/parquet.rs: drop `ref mut` (Rust
    2024 match ergonomics).
  - lib/codecs/src/encoding/mod.rs: re-export BatchSerializer.
  - src/codecs/encoding/config.rs: restore vector_lib::configurable::
    configurable_component import (exaforce accidentally dropped it).
  - src/components/validation/resources/mod.rs: add Parquet arm to the
    serializer-to-deserializer match.
  - Cargo.lock: pin pulsar back to 6.3.1 (merge had drifted to 6.5.0
    which needs ProducerOptions fields the sink doesn't set).
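
The serializer.rs fixups above can be sketched as follows. This is a simplified illustration, not the real code: `SerializerConfig`, `GelfOptions`, and `format_name` are hypothetical, the direction of the Gelf change (unit to tuple) is an assumption, and the `feature = "opentelemetry"` cfg key is assumed from the shorthand in the commit message.

```rust
// Hypothetical, simplified types; the real enums in lib/codecs carry
// more variants and fields.
#[allow(dead_code)]
#[derive(Debug, Default)]
struct GelfOptions {
    // Placeholder field, purely illustrative.
    pretty: bool,
}

#[allow(dead_code)]
#[derive(Debug)]
enum SerializerConfig {
    Json,
    // Assumed here to be a tuple variant that carries options.
    Gelf(GelfOptions),
    // Variant only exists when the (assumed) feature is enabled.
    #[cfg(feature = "opentelemetry")]
    Otlp,
}

fn format_name(config: &SerializerConfig) -> &'static str {
    match config {
        SerializerConfig::Json => "json",
        // The pattern must match the variant's shape: a bare
        // `SerializerConfig::Gelf` does not compile once the variant
        // carries data, so the arm uses a tuple pattern.
        SerializerConfig::Gelf(_) => "gelf",
        // The arm needs the same cfg guard as the variant; otherwise the
        // match is either non-exhaustive or names a missing variant,
        // depending on the feature set.
        #[cfg(feature = "opentelemetry")]
        SerializerConfig::Otlp => "otlp",
    }
}
```

Keeping the cfg attribute identical on the variant and on its match arm lets the match stay exhaustive whether or not the feature is compiled in.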

Verified: cargo check -p vector (default features) and cargo check
--tests -p vector --no-default-features --features
azure-integration-tests both green locally.
@sundaresanr

Closing in favor of a cleaner two-PR sequence:

  1. Revert PR #31 ("V0.51 exaforce rebase our changes") to restore v0.51-exaforce to pristine upstream-v0.51 state.
  2. New PR adding only the parquet codec on top of that — no dead azure_blob / GCP-impersonation code in history, no Cargo.lock drift workaround.

The strip-then-keep approach here worked, but it leaves ~1500 lines of azure code in v0.51-exaforce history as an "added then deleted" ghost that makes future upstream rebases harder.

@sundaresanr deleted the v0.51-exaforce-parquet-only branch on April 17, 2026 22:54