Support Data Center precompiled driver container for Arm (Ubuntu 24.04) #533

Open
shivakunv wants to merge 2 commits into main from precompiled-arm-support

@shivakunv shivakunv commented Jan 6, 2026

Code Changes Summary:

  • Platform Support
    Added support for the ARM64 platform.
    AMD64 remains the default architecture.

  • Artifacts Update
    ARM64 build artifacts are now uploaded with the -arm64 suffix.

  • Instance Type and Region Mapping
    g4dn.xlarge:
    Architecture: AMD64
    Supported Region: us-west-1
    Used for AMD64 builds.

    g5g.xlarge:
    Architecture: ARM64
    Supported Region: us-west-2
    Used for ARM64 builds.
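
The mapping above could be kept in one small helper so that the workflow looks up the suffix, instance type, and region in a single place. This is a minimal sketch, not the code in this PR: the function name `platform_settings` and the variable names it prints are hypothetical, and the empty AMD64 suffix is an assumption based on AMD64 being the default architecture.

```bash
#!/bin/sh
# Hypothetical helper: prints KEY=VALUE settings for a build platform.
# Instance types, regions, and the -arm64 suffix come from the PR
# description; the empty AMD64 suffix is an assumption.
platform_settings() {
  case "$1" in
    amd64)
      echo "ARTIFACT_SUFFIX="
      echo "INSTANCE_TYPE=g4dn.xlarge"
      echo "AWS_REGION=us-west-1"
      ;;
    arm64)
      echo "ARTIFACT_SUFFIX=-arm64"
      echo "INSTANCE_TYPE=g5g.xlarge"
      echo "AWS_REGION=us-west-2"
      ;;
    *)
      # Fail loudly rather than silently falling back to a default.
      echo "unsupported platform: $1" >&2
      return 1
      ;;
  esac
}

# Example: show the settings used for ARM64 builds.
platform_settings arm64
```

Rejecting unknown platforms instead of defaulting to AMD64 is a design choice of this sketch; the PR itself treats AMD64 as the default.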

Fixes https://github.com/NVIDIA/cloud-native-team/issues/276

passed pipeline: https://github.com/NVIDIA/gpu-driver-container/actions/runs/22180871853

passed pipeline: https://github.com/NVIDIA/gpu-driver-container/actions/runs/22337833186

@shivakunv shivakunv changed the title from "Precompiled arm support" to "Support Data Center precompiled driver container for Arm (Ubuntu 24.04)" Jan 6, 2026
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 2 times, most recently from 6405d48 to 574ce43 Compare January 14, 2026 17:22
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 4 times, most recently from 20726a8 to 46aa0d1 Compare February 12, 2026 12:07
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 3 times, most recently from c008150 to b684015 Compare February 19, 2026 13:11
@shivakunv shivakunv marked this pull request as ready for review February 19, 2026 13:12
Comment thread .github/workflows/precompiled.yaml Outdated
@shivakunv shivakunv self-assigned this Feb 19, 2026
Comment thread ubuntu24.04/precompiled/nvidia-driver Outdated

Copilot AI left a comment

Pull request overview

This pull request adds ARM64 (aarch64) platform support to the Ubuntu 24.04 precompiled driver container builds, while maintaining AMD64 as the default architecture. The changes enable multi-platform Docker builds and update the CI/CD pipeline to handle both architectures.

Changes:

  • Added ARM64 platform support for Ubuntu 24.04 precompiled driver containers with architecture-specific package handling
  • Updated CI workflow to build, test, and publish both AMD64 and ARM64 artifacts with platform-specific suffixes
  • Modified Holodeck test infrastructure to support ARM64 instances (g5g.xlarge in us-west-2) and Ubuntu 24.04 OS specification

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| ubuntu24.04/precompiled/nvidia-driver | Added conditional installation of libnvidia-fbc1 package (AMD64 only) |
| ubuntu24.04/precompiled/local-repo.sh | Added conditional downloads for ARM64-incompatible packages (linux-signatures-nvidia, libnvidia-fbc1) |
| ubuntu24.04/precompiled/Dockerfile | Made i386 architecture and CUDA repository URLs conditional based on target architecture |
| tests/scripts/findkernelversion.sh | Added optional PLATFORM_SUFFIX parameter for artifact matching and platform-specific manifest inspection |
| tests/scripts/ci-precompiled-helpers.sh | Added PLATFORM_SUFFIX parameter support for kernel version testing |
| tests/holodeck_ubuntu24.04.yaml | Removed file (merged into holodeck_ubuntu.yaml) |
| tests/holodeck_ubuntu.yaml | Removed hardcoded ingressIpRanges and AMI, added OS specification support |
| multi-arch.mk | Removed AMD64-only platform restriction for ubuntu24.04 builds |
| Makefile | Added DOCKER_BUILD_PLATFORM_OPTIONS to base image build targets |
| .github/workflows/precompiled.yaml | Added platform matrix dimension, platform-aware artifact naming, ARM64 e2e testing with appropriate instance types, and Holodeck version update |

Comment thread .github/workflows/precompiled.yaml Outdated
Comment thread .github/workflows/precompiled.yaml Outdated
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 3 times, most recently from ee1265d to 49429dd Compare February 21, 2026 08:03
@shivakunv shivakunv marked this pull request as draft February 23, 2026 15:33
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from 32e68a1 to cdbfe9a Compare February 24, 2026 05:21
@shivakunv shivakunv marked this pull request as ready for review February 24, 2026 06:48
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from cdbfe9a to e224399 Compare February 25, 2026 04:14
Comment thread .github/workflows/precompiled.yaml Outdated
Comment thread .github/workflows/precompiled.yaml Outdated
Comment thread .github/workflows/precompiled.yaml Outdated
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 5 times, most recently from 2f00f8b to 4a75c51 Compare March 11, 2026 03:51
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from 4a75c51 to c7ce51a Compare April 1, 2026 06:08
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 2 times, most recently from 8b1afd4 to dd078b8 Compare April 9, 2026 06:50
# Fetch GPG keys for CUDA repo
RUN apt-key del 3bf863cc && \
# Fetch GPG keys for CUDA repo (architecture-specific)
RUN CUDA_ARCH=$([ "$TARGETARCH" = "arm64" ] && echo "sbsa" || echo "x86_64") && \
Contributor

Why are we using sbsa? If I remember correctly, sbsa is specifically for Tegra-based arm64 machines

Contributor Author

Contributor

Please elaborate. "followed doc" is not a helpful response

Contributor Author

@shivakunv shivakunv Jun 1, 2026

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Attaching the supported distro tables and the relevant statements from the guide:

Cross development for arm64-sbsa is supported on Ubuntu 20.04, Ubuntu 22.04, Ubuntu 24.04, KylinOS 10, Red Hat Enterprise Linux 8, Red Hat Enterprise Linux 9, and SUSE Linux Enterprise Server 15.

Cross development for arm64-sbsa-jetson is only supported on Ubuntu 24.04.

Table 1 Supported Linux Distributions

image

Table 2 Native Linux Distribution Support and Validated OS Versions for CUDA 13.3
image

image
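
To make the architecture choice in the Dockerfile explicit, the selection could be written as a small helper. This is a minimal sketch under stated assumptions: the function names `cuda_repo_arch` and `cuda_repo_url` are hypothetical, and the URL layout (`.../cuda/repos/ubuntu2404/<arch>`) is an assumption about the CUDA apt repository for Ubuntu 24.04, not something confirmed in this thread. Unlike the one-liner under review, which maps every value other than `arm64` to `x86_64`, the sketch rejects unknown architectures.

```bash
#!/bin/sh
# Hypothetical helper: map Docker's TARGETARCH to the CUDA repo arch
# directory. arm64 server platforms use "sbsa", amd64 uses "x86_64".
cuda_repo_arch() {
  case "$1" in
    arm64) echo "sbsa" ;;
    amd64) echo "x86_64" ;;
    *)
      echo "unsupported TARGETARCH: $1" >&2
      return 1
      ;;
  esac
}

# Assumed URL layout of the CUDA apt repository for Ubuntu 24.04.
cuda_repo_url() {
  arch=$(cuda_repo_arch "$1") || return 1
  echo "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/${arch}"
}

# Example: print the repository URL for each supported architecture.
cuda_repo_url arm64
cuda_repo_url amd64
```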

Comment thread ubuntu24.04/precompiled/nvidia-driver Outdated
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from dd078b8 to 783e783 Compare April 17, 2026 13:04
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from 783e783 to d44324e Compare May 6, 2026 07:44
@shivakunv shivakunv requested review from rahulait and tariq1890 May 6, 2026 09:24
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from d44324e to 7d8aff1 Compare May 8, 2026 05:08
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from 7d8aff1 to c035d23 Compare May 20, 2026 03:47
@shivakunv shivakunv force-pushed the precompiled-arm-support branch 7 times, most recently from 34d0170 to 0627494 Compare June 1, 2026 13:25
Comment thread .github/workflows/precompiled.yaml Outdated
Contributor

Is matrix.kernel_version the right suffix here or should it be env.KERNEL_VERSION?

Contributor Author

It should be env.KERNEL_VERSION. Done.

Comment thread base/Dockerfile Outdated
Comment thread .github/workflows/precompiled.yaml Outdated
Comment on lines +126 to +130
if [[ "${{ matrix.dist }}" == "ubuntu24.04" ]] && [[ "${{ matrix.flavor }}" != "azure-fde" ]]; then
export DOCKER_BUILD_PLATFORM_OPTIONS="--platform=linux/amd64,linux/arm64"
else
export DOCKER_BUILD_PLATFORM_OPTIONS="--platform=linux/amd64"
fi
Contributor

This check is repeated in quite a few places. Can we move this to multi-arch.mk? There are already single arch overrides in that file.

# add after the existing single-arch overrides
ifeq ($(KERNEL_FLAVOR),azure-fde)
build-signed_ubuntu24.04%: DOCKER_BUILD_PLATFORM_OPTIONS = --platform=linux/amd64
endif

This can then become:

  run: |
    source kernel_version.txt
    export DOCKER_BUILD_OPTIONS="--output=type=oci,dest=./driver-images-...tar"
    make DRIVER_VERSIONS=${DRIVER_VERSIONS} DRIVER_BRANCH=${{ matrix.driver_branch }} \
         KERNEL_FLAVOR=${{ matrix.flavor }} \
         KERNEL_VERSION=${KERNEL_VERSION} build-${DIST}-${DRIVER_VERSION}

Contributor Author

Thanks. Done.

Comment thread .github/workflows/precompiled.yaml Outdated
# Convert array to JSON format and assign
echo "[]" > ./matrix_values_${{ matrix.dist }}_${{ matrix.lts_kernel }}.json
printf '%s\n' "${KERNEL_VERSIONS[@]}" | jq -R . | jq -s . > ./matrix_values_${{ matrix.dist }}_${{ matrix.lts_kernel }}.json
platforms_json='${{ needs.set-driver-version-matrix.outputs.platforms }}'
Contributor

Please extract this into a separate helper. Inline scripting is hard to read.

It can look something like this:

  - name: Set kernel version
    env:
      KERNEL_FLAVORS_JSON: ${{ needs.set-driver-version-matrix.outputs.kernel_flavors }}
      DRIVER_BRANCHES_JSON: ${{ needs.set-driver-version-matrix.outputs.driver_branch }}
      EXCLUDE_PAIRS_JSON: ${{ needs.set-driver-version-matrix.outputs.exclude_build_matrix_pairs }}
      PLATFORMS_JSON: ${{ needs.set-driver-version-matrix.outputs.platforms }}
    run: ./tests/scripts/build-kernel-matrix.sh "${{ matrix.dist }}" "${{ matrix.lts_kernel }}"

build-kernel-matrix.sh can have something like this (untested):

  #!/bin/bash
  # Args: DIST LTS_KERNEL  (reads KERNEL_FLAVORS_JSON, DRIVER_BRANCHES_JSON,
  #                         EXCLUDE_PAIRS_JSON, PLATFORMS_JSON from env)
  set -euo pipefail
  DIST="$1"; LTS_KERNEL="$2"

  mapfile -t KERNEL_FLAVORS < <(jq -r '.[]' <<<"$KERNEL_FLAVORS_JSON")
  mapfile -t PLATFORMS < <(jq -r '.[]' <<<"$PLATFORMS_JSON")

  DRIVER_BRANCHES=()
  for b in $(jq -r '.[]' <<<"$DRIVER_BRANCHES_JSON"); do
    jq -e --arg dist "$DIST" --arg b "$b" \
      'any(.[]; .dist==$dist and .driver_branch==$b)' <<<"$EXCLUDE_PAIRS_JSON" \
      >/dev/null || DRIVER_BRANCHES+=("$b")
  done

  source ./tests/scripts/ci-precompiled-helpers.sh
  for platform in "${PLATFORMS[@]}"; do
    [[ "$platform" == arm64 && "$DIST" == ubuntu22.04 ]] && continue
    suffix=""; flavors=("${KERNEL_FLAVORS[@]}")
    if [[ "$platform" == arm64 ]]; then
      suffix="-arm64"
      flavors=()
      for f in "${KERNEL_FLAVORS[@]}"; do   # drop azure-fde without leaving an empty entry
        [[ "$f" == azure-fde ]] || flavors+=("$f")
      done
    fi
    versions=( $(get_kernel_versions_to_test flavors[@] DRIVER_BRANCHES[@] \
                "$DIST" "$LTS_KERNEL" "$suffix") )
    [[ -n "${versions[*]:-}" ]] && \
      printf '%s\n' "${versions[@]}" | jq -R . | jq -s . \
      > "./matrix_values_${DIST}_${LTS_KERNEL}${suffix}.json"
  done

Contributor Author

done

libnvidia-encode-${DRIVER_BRANCH}-server \
libnvidia-fbc1-${DRIVER_BRANCH}-server \
libnvidia-gl-${DRIVER_BRANCH}-server
libnvidia-encode-${DRIVER_BRANCH}-server
Contributor

Consider splitting this into one userspace install and one kernel module install:

# Userspace packages
USERSPACE=(
  nvidia-utils-${DRIVER_BRANCH}-server
  nvidia-headless-no-dkms-${DRIVER_BRANCH}-server
  libnvidia-decode-${DRIVER_BRANCH}-server
  libnvidia-extra-${DRIVER_BRANCH}-server
  libnvidia-encode-${DRIVER_BRANCH}-server
  libnvidia-gl-${DRIVER_BRANCH}-server
)
if [ "$TARGETARCH" = "amd64" ]; then
  # libnvidia-fbc1 is not published for arm64
  USERSPACE+=( libnvidia-fbc1-${DRIVER_BRANCH}-server )
fi
# Install userspace packages
apt-get install -y --no-install-recommends "${USERSPACE[@]}"

# Kernel modules
if [ "$KERNEL_TYPE" = "kernel-open" ]; then
  KMOD=( linux-modules-nvidia-${DRIVER_BRANCH}-server-open-${KERNEL_VERSION} )
else
  KMOD=(
    linux-objects-nvidia-${DRIVER_BRANCH}-server-${KERNEL_VERSION}
    linux-modules-nvidia-${DRIVER_BRANCH}-server-${KERNEL_VERSION}
  )
fi
if [ "$TARGETARCH" = "amd64" ]; then
  # secure-boot signatures are not published for arm64
  KMOD+=( linux-signatures-nvidia-${KERNEL_VERSION} )
fi
# Install kernel modules
apt-get install -y --no-install-recommends "${KMOD[@]}"

Contributor Author

This is a cosmetic change, so I will handle it in a separate PR.
A similar update is needed for the other distros (ubuntu22.04, rhel) as well.

Signed-off-by: Shiva Kumar (SW-CLOUD) <shivaku@nvidia.com>
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from 2d7ac2e to 4ecbadb Compare June 1, 2026 16:09
Comment thread Makefile Outdated
# build-ubuntu22.04-$(DRIVER_VERSION) triggers a build for a specific $(DRIVER_VERSION)
$(DISTRIBUTIONS): %: build-%
$(BUILD_TARGETS): %: $(foreach driver_version, $(DRIVER_VERSIONS), $(addprefix %-, $(driver_version)))
DRIVER_BUILD_TAG = $(if $(findstring type=oci,$(DOCKER_BUILD_OPTIONS)),,--tag $(IMAGE))
Contributor

This is not the right variable name

Contributor Author

Done. Used DOCKER_BUILD_TAG_OPTION.

Comment thread ubuntu24.04/precompiled/nvidia-driver Outdated
linux-signatures-nvidia-${KERNEL_VERSION} \
linux-modules-nvidia-${DRIVER_BRANCH}-server-${KERNEL_VERSION}
if [ "$TARGETARCH" = "amd64" ]; then
apt-get install --no-install-recommends -y \
Contributor

You can reduce the duplication here by conditionally installing just linux-objects-nvidia-${DRIVER_BRANCH}-server-${KERNEL_VERSION}

Contributor Author

Thanks. Done.

pattern: driver-images-*-${{ env.KERNEL_VERSION }}-${{ env.DIST }}*
path: ./tests/
merge-multiple: true
- name: Install skopeo
Contributor

Why do we need skopeo?

Contributor Author

@shivakunv shivakunv Jun 2, 2026

Line 478 (Line: 356): This step pushes the multi-arch oci-archive to the registry as a manifest list.
On the GitHub Actions runner, docker load --platform exists in the CLI, but the daemon cannot hold multi-arch images.
Using docker load + docker push would mean looping per platform, pushing each image, and then running docker manifest create to stitch the manifest list together.
skopeo copy streams the oci-archive to the registry and preserves the multi-arch image in one step.

Line 379: The build outputs one multi-arch oci-archive (amd64 + arm64), but the e2e test needs a single platform as a docker-archive (for docker load).
docker save --platform does not help on the GitHub Actions runner either, so without skopeo we would have to build amd64 and arm64 as separate single-arch images and upload them to GitHub as separate artifacts.
skopeo copy --override-arch extracts the platform we need.

Consistency: the same tool is used in both places.

From what I recall, I tried regctl earlier and hit an oci-archive error. The GitHub pipeline logs have since been cleared because three months have passed. I can reinvestigate if preferred.
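
The two skopeo uses described above can be sketched as a dry run that only prints the commands. This is an illustration under stated assumptions, not the workflow's actual steps: the function names, archive paths, and image reference are hypothetical placeholders. The sketch uses `--all` so that every image in the archive's index is copied, and it places `--override-arch` before the `copy` subcommand, where skopeo accepts it as a global option.

```bash
#!/bin/sh
# Dry-run sketch: the functions print skopeo commands instead of
# running them. All paths and image references are placeholders.

# Push every platform in a multi-arch oci-archive as one manifest list.
# $1 = oci-archive path, $2 = destination image reference
push_multiarch_cmd() {
  echo "skopeo copy --all oci-archive:$1 docker://$2"
}

# Extract a single platform from the archive as a docker-archive that
# `docker load` can read.
# $1 = oci-archive path, $2 = output tar path, $3 = architecture
extract_platform_cmd() {
  echo "skopeo --override-arch $3 copy oci-archive:$1 docker-archive:$2"
}

# Example with placeholder names.
push_multiarch_cmd ./driver-images.tar registry.example.com/driver:tag
extract_platform_cmd ./driver-images.tar ./driver-arm64.tar arm64
```

Printing the commands first makes it easy to review them in the job log before switching the helpers to execute them.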

Signed-off-by: Shiva Kumar (SW-CLOUD) <shivaku@nvidia.com>
@shivakunv shivakunv force-pushed the precompiled-arm-support branch from 6d37464 to 6b58bd4 Compare June 2, 2026 08:30

6 participants