URL: http://github.com/apache/arrow/pull/49556

GH-49555: [Python][Packaging] Add riscv64 manylinux wheel builds to release pipeline #49556

Draft
gounthar wants to merge 2 commits into apache:main from gounthar:feat/riscv64-python-wheels
gounthar commented Mar 19, 2026

Describe the enhancement requested

Add riscv64 to the PyArrow wheel build pipeline using manylinux_2_39 and native RISE riscv64 runners.

Changes

  • dev/tasks/tasks.yml: Add ("manylinux", "riscv64", "2-39", "manylinux_2_39_riscv64") to the Linux wheel matrix
  • dev/tasks/python-wheels/github.linux.yml: Add ubuntu-24.04-riscv runner selection and riscv64 ARCH mapping

What's still needed

  • Docker image for riscv64 wheel builds (apache/arrow-dev:riscv64-python-*-wheel-manylinux-2_39-vcpkg-*)
  • vcpkg dependency verification for riscv64
  • Testing via Crossbow

This draft PR enables the CI pipeline. The Docker image creation is a separate effort — happy to work on that as well, or coordinate with the team.

Evidence

  • Arrow C++ native build on BananaPi F3 (SpacemiT K1, rv64gc, GCC 14.2.0): SUCCESS (1h13m)
  • PyArrow install from source (Parquet, CSV, JSON, Compute, Filesystem): SUCCESS
  • import pyarrow; print(pyarrow.__version__) → 24.0.0a1.dev1

CI Runners

Native riscv64 runners are available for free via RISE RISC-V runners. numpy, llama.cpp, and pytorch already use them.

Fixes #49555

Note: this work is part of the RISE Project effort to improve Python ecosystem support on riscv64 platforms.

Add riscv64 to the Linux wheel build pipeline using manylinux_2_39
(first manylinux with riscv64 support) and RISE native runners
(ubuntu-24.04-riscv).

Changes:
- dev/tasks/tasks.yml: add ("manylinux", "riscv64", "2-39",
  "manylinux_2_39_riscv64") entry to the wheel matrix
- dev/tasks/python-wheels/github.linux.yml: add riscv64 runner
  selection (ubuntu-24.04-riscv) and ARCH mapping

Arrow C++ and PyArrow both build successfully on native riscv64
hardware (BananaPi F3, SpacemiT K1, rv64gc).

Note: Docker image for riscv64 wheel builds still needs to be created
(following the aarch64 pattern). This PR enables the CI pipeline;
Docker image creation is tracked separately.

Signed-off-by: Bruno Verachten <gounthar@gmail.com>

github-actions (bot) commented

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

github-actions bot added the awaiting review label Mar 19, 2026
gounthar changed the title from "ci: add riscv64 manylinux wheel builds to release pipeline" to "GH-49555: [Python][Packaging] Add riscv64 manylinux wheel builds to release pipeline" Mar 19, 2026

github-actions (bot) commented

⚠️ GitHub issue #49555 has been automatically assigned in GitHub to PR creator.

github-actions (bot) commented

⚠️ GitHub issue #49555 has no components, please add labels for components.

- ci/vcpkg/riscv64-linux-static-{release,debug}.cmake: vcpkg triplets
  for riscv64 (following arm64 pattern)
- compose.yaml: add python-wheel-manylinux-2-39 service and ccache
  volume for riscv64 wheel builds (using quay.io/pypa/manylinux_2_39_riscv64)

Signed-off-by: Bruno Verachten <gounthar@gmail.com>

pitrou (Member) commented Mar 19, 2026

Hi @gounthar , the big question here is what happens for ongoing maintenance. I think none of the currently active Arrow maintainers has a RISC-V box at home, and debugging on CI can be painful. But I might be overstating the risks.

pitrou (Member) commented Mar 19, 2026

also cc @raulcd

raulcd (Member) left a comment

This would require the following app to be set up as part of the Apache organization: https://github.com/apps/rise-risc-v-runners

A couple of questions about that. Do we know what the required permissions are? Has this been set up on other Apache projects, or are we the first ones? Asking as this might require some ASF INFRA research.

gounthar (Author) commented Mar 19, 2026

@pitrou That's a fair concern. A few things that mitigate it:

  1. RISE runners are native hardware, not QEMU emulation. Debugging and iteration are much faster (we validated the full Arrow C++ + pyarrow build on native hardware in ~1.5h total).

  2. I maintain two BananaPi F3 boards running 24/7 as riscv64 build/test machines. Happy to help debug any riscv64-specific issues that come up.

  3. The riscv64 job would be additive: if it breaks, it doesn't affect existing x86_64/aarch64 builds. Same as how aarch64 was initially added.

  4. Arrow C++ is already riscv64-compatible (PR ARROW-17440: [C++] Support RISC-V architecture #13902, 2022). The risk of riscv64-specific breakage in the wheel pipeline is low since the code itself works; it's mainly CI plumbing.

  5. RISE is committed to maintaining riscv64 CI for key projects. Ludovic Henry (RISE TSC Co-Chair) is actively providing runners and support; this isn't a fire-and-forget contribution.

I'm also happy to be a point of contact for riscv64 issues in Arrow. I've been doing this across several Python projects this week and can respond quickly.

gounthar (Author) commented Mar 19, 2026

@raulcd Great question. The RISE runners app requires these permissions:

  • Actions: Read and write (to register/deregister runners)
  • Metadata: Read-only

It's a GitHub App that provides self-hosted runners, similar to how some projects use BuildJet or Actuated for ARM runners.

As for Apache org setup: I'm not aware of other Apache projects using RISE runners yet; you'd likely be the first. I can check with Ludovic Henry (@luhenry, RISE TSC Co-Chair) about the process for Apache org installation. He's been setting up runners for numpy, pytorch, llama.cpp, and pyca this week.

Alternatively, we could initially run the riscv64 job on a separate fork (like we did with riseproject-dev/numpy) and upstream once the runner setup is confirmed. That way the PR can be reviewed independently of the infra question.

I'll ping Ludovic about Apache org support and report back.

gounthar (Author) commented Mar 19, 2026

@raulcd @pitrou: Ludovic Henry (@luhenry, RISE TSC Co-Chair) should respond directly on this PR about the Apache org runner setup. He hasn't set up RISE runners on Apache projects before but is very interested in making it happen.

luhenry commented Mar 19, 2026

> To be set up as part of the Apache organization. A couple of questions about that. Do we know what the required permissions are? Has this been set up on other Apache projects, or are we the first ones? Asking as this might require some ASF INFRA research.

The requested permissions are:

  • At repository level:
  • At organization level:
    • Self-hosted runners: read and write; necessary to add the self-hosted runners and runner group

We are not going to require any more credentials as we only want to be able to dynamically register self-hosted runners, and that's it. You can find all the code of the app at https://github.com/riseproject-dev/riscv-runner-app, and a more descriptive website at https://riseproject-dev.github.io/riscv-runner/

> RISE is committed to maintaining riscv64 CI for key projects

RISE is part of Linux Foundation EU, and is committed to this service. We see it as a critical piece to enable RISC-V Software more broadly, which is our entire raison d'être. We are also working with PyTorch, Llama.cpp, and many other projects to enable CI on RISC-V.

Also happy to drastically increase the number of runners available for the Apache organization, given the overall importance of everything that you're doing!

For any direct board access, we are also working on a service of remote, on-demand machines accessible via SSH. It is meant for exactly this kind of case, where someone needs to debug a sticky issue and going through CI would be a great productivity loss.

Let me know if you have any other questions.

pitrou (Member) commented Mar 19, 2026

Thanks for the answers @gounthar . Can we perhaps start by having a regular C++ CI job on RISC-V? The Python wheel CI builds do not run the C++ test suite.

gounthar (Author) commented Mar 19, 2026

@pitrou That makes a lot of sense: start with the foundation. I'll rework this PR to add a C++ CI job on riscv64 instead of jumping straight to wheel builds.

I've already verified that Arrow C++ builds and that import pyarrow succeeds on native riscv64. I'll look at the existing C++ CI jobs (ci/docker/ubuntu-*-cpp.dockerfile and the corresponding workflows) and add a riscv64 variant.

Would you prefer:

  1. A Docker-based build (like the existing CI), or
  2. A native build on the RISE runner directly (simpler, but less isolated)?

pitrou (Member) commented Mar 19, 2026

Definitely a Docker-based job! You can take a look here for inspiration:

Note that this uses archery docker, which relies on compose.yaml. See https://arrow.apache.org/docs/developers/continuous_integration/docker.html, and feel free to ask any questions!

gounthar (Author) commented Mar 19, 2026

@pitrou Thanks for the pointer! I'll study the archery docker approach and the compose.yaml service definitions for the C++ CI job. 🙏

I'll rework this PR to add an ubuntu-cpp-riscv64 service in compose.yaml and a corresponding CI workflow, following the pattern from cpp.yml#L74. I will likely need to figure out the Docker image base (manylinux_2_39_riscv64 or an Ubuntu-based image for riscv64). 🤔

Will report back once I have a working prototype. 🤞

gounthar (Author) commented

Update on the C++ CI work: got the Docker-based build running on a RISE riscv64 runner (ubuntu-24.04-riscv). The build itself completes (2257 C++ files, ~4h), but the test suite crashes systematically - all 59 test binaries fail in ~2 seconds each, which looks like a SIGILL or similar architecture-level issue.
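
The SIGILL guess above could be checked from exit statuses alone. On Linux, a shell reports a child killed by signal N as status 128 + N, and SIGILL is signal 4, so a binary dying on an illegal instruction would show status 132. A minimal sketch, in which describe_status is a hypothetical helper and the self-signalling child shell only stands in for one of the failing test binaries:

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn an exit status into a short description.
# Statuses above 128 mean the process was terminated by signal (status - 128).
describe_status() {
  local status="$1"
  if [ "$status" -gt 128 ]; then
    # kill -l N prints the signal name without the SIG prefix (e.g. ILL).
    echo "killed by SIG$(kill -l "$((status - 128))")"
  else
    echo "exited with status ${status}"
  fi
}

# Stand-in for a crashing test binary: a child shell that sends itself SIGILL.
bash -c 'kill -ILL $$'
describe_status "$?"
```

The same helper could be applied to the status of each real test binary; a status other than 132 would point away from an illegal-instruction crash.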

The Docker image build had a couple of hurdles:

  • install_minio.sh crashes on unknown arches due to set -u + unbound array key (one-line fix: ${archs[$arch]:-})
  • Docker 26.1.5 on the runner has a BuildKit bug; works with DOCKER_BUILDKIT=0
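
The set -u failure from the first bullet can be reproduced in isolation. A minimal sketch, assuming a bash associative array named archs as in the one-line fix; the keys and values below are hypothetical, since the actual table in install_minio.sh is not shown in this thread:

```shell
#!/usr/bin/env bash
set -u

# Hypothetical mapping from machine names to download names;
# the real contents of install_minio.sh may differ.
declare -A archs=([x86_64]=amd64 [aarch64]=arm64)

lookup_arch() {
  # A plain ${archs[$1]} aborts with "unbound variable" under set -u when
  # the key is missing. The :- default expands to an empty string instead.
  printf '%s' "${archs[$1]:-}"
}

for arch in x86_64 riscv64; do
  mapped="$(lookup_arch "$arch")"
  if [ -z "$mapped" ]; then
    echo "${arch}: no mapping, skipping"
  else
    echo "${arch}: ${mapped}"
  fi
done
```

With the default in place, the unknown architecture falls through to the "no mapping" branch, so the script can decide to skip the download rather than crash.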

I'm going to run the tests directly on native hardware (BananaPi F3) to get the actual error output - the CI redirects test logs into files so the crash reason isn't visible. Will report back with the findings.

Fork CI run: https://github.com/gounthar/arrow/actions/runs/23411999463

gounthar (Author) commented

Correction on the above: the CI run did use the Docker-based approach (archery docker run ubuntu-cpp) as you suggested. The build and tests ran inside the Docker container on the RISE runner. The test crashes are from inside the container, not a bare-metal build.

I'm also going to reproduce the Docker-based build on my BananaPi F3 to get the test crash output (the CI redirects logs to files). Will use archery docker run ubuntu-cpp there as well to match the CI environment.

pitrou (Member) commented Mar 23, 2026

> Update on the C++ CI work: got the Docker-based build running on a RISE riscv64 runner (ubuntu-24.04-riscv). The build itself completes (2257 C++ files, ~4h)

4 hours to complete a native build sounds like... a lot.

> install_minio.sh crashes on unknown arches due to set -u + unbound array key (one-line fix: ${archs[$arch]:-})

Please note you can disable individual components using environment variables; you can get an idea from the cmake invocation here.

> I'm going to run the tests directly on native hardware (BananaPi F3) to get the actual error output - the CI redirects test logs into files so the crash reason isn't visible.

Ideally, our test harness is able to display crash tracebacks using gdb, though the logic behind that is fragile and doesn't always work reliably.
See these instructions in compose.yaml.

gounthar (Author) commented

@pitrou Good points.

The 4h includes the Docker image build (~20 min for apt packages) + C++ compilation. The RISE runner is a Scaleway EM-RV1 which isn't the fastest riscv64 hardware. On my BananaPi F3 (8 cores, SpacemiT K1), the C++ build alone takes about 1h13m. Disabling some components would help; I'll trim the feature set for the CI job.

Thanks for the pointer on disabling components via env vars and on the gdb traceback support in compose.yaml. I'm running the Docker-based build on the F3 now with archery; will check if gdb picks up the crash reason.

For the minio fix, I can submit a separate small PR for install_minio.sh if that's useful, or fold it into this one.

Labels

awaiting review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add riscv64 Python wheel builds to release pipeline

4 participants
