[PATCH v2] aarch64: Optimize memcmp for Kunpeng 950 with SVE
Project / Subsystem: glibc / aarch64
Date: 2026-04-17
Proposer: Weihong Ye <[email protected]>
Source type: public_inbox
Consensus: Proposed
Sentiment: 7/10
Technical tradeoffs
- Uses SVE predication for branch-free handling of short inputs and tails.
- Implements 4-way loop unrolling to maximize pipeline utilization.
- Optimizes mismatch detection with early exit logic.
- Potential regressions in edge cases where offsets are near 4K boundaries.
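The unrolling and early-exit items above can be illustrated with a small portable C sketch. This is not the patch's code: `BLOCK` is a hypothetical fixed block size standing in for one SVE vector (the real vector length is only known at run time), and `block_diff`, `first_diff`, and `memcmp_unrolled_sketch` are names invented for this illustration. The main loop checks four blocks per iteration with one combined test, leaves the loop as soon as any block differs, and then locates the differing byte with a scan bounded by at most four blocks.

```c
#include <stddef.h>

/* Hypothetical block size standing in for one vector register. */
enum { BLOCK = 16 };

/* Branch-free check of one block: nonzero if any byte differs. */
static unsigned block_diff(const unsigned char *a, const unsigned char *b)
{
    unsigned acc = 0;
    for (size_t j = 0; j < BLOCK; j++)
        acc |= (unsigned)(a[j] ^ b[j]);
    return acc;
}

/* Index of the first differing byte in [0, len), or len if none differ. */
static size_t first_diff(const unsigned char *a, const unsigned char *b,
                         size_t len)
{
    size_t i = 0;
    while (i < len && a[i] == b[i])
        i++;
    return i;
}

/* Illustrative 4-way unrolled compare with an early exit. */
int memcmp_unrolled_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;
    size_t i = 0;

    /* Main loop: four blocks per iteration, one combined mismatch test. */
    for (; n - i >= 4 * BLOCK; i += 4 * BLOCK) {
        unsigned d = block_diff(a + i, b + i)
                   | block_diff(a + i + BLOCK, b + i + BLOCK)
                   | block_diff(a + i + 2 * BLOCK, b + i + 2 * BLOCK)
                   | block_diff(a + i + 3 * BLOCK, b + i + 3 * BLOCK);
        if (d)
            break;  /* early exit: the mismatch is inside this window */
    }

    /* Either the tail (< 4 blocks) or the window holding the mismatch. */
    size_t k = i + first_diff(a + i, b + i, n - i);
    if (k == n)
        return 0;
    return (int)a[k] - (int)b[k];
}
```

In a vectorized version the tail scan would be replaced by predicated loads, so the scalar `first_diff` here only models where the result comes from, not how fast it is found.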
All attributes
- project: glibc
- subsystem: aarch64
- patch_id: (none)
- discussion_id: [email protected]
- source_type: public_inbox
- title: [PATCH v2] aarch64: Optimize memcmp for Kunpeng 950 with SVE
- headline: Optimize memcmp for Kunpeng 950 with SVE
- tldr: This patch optimizes memcmp for the Kunpeng 950 using SVE, resulting in 15% - 50% speedups for small to large inputs.
- stakes: This improves the performance of memcmp on AArch64 systems using the Kunpeng 950 processor, benefiting applications that rely heavily on memory comparison operations.
- proposer: Weihong Ye <[email protected]>
- consensus: Proposed
- outcome: proposed
- sentiment_score: 7
- sentiment_rationale: Positive benchmark results and addressed review comments indicate a good direction.
- technical_tradeoffs:
  - Uses SVE predication for branch-free handling of short inputs and tails.
  - Implements 4-way loop unrolling to maximize pipeline utilization.
  - Optimizes mismatch detection with early exit logic.
  - Potential regressions in edge cases where offsets are near 4K boundaries.
- series_id: (none)
- series_role: standalone
- series_parts: []
- tags:
  - aarch64
  - memcmp
  - SVE
  - optimization
  - kunpeng950
- bugzilla_url: (none)
- date: 2026-04-17T00:00:00.000Z
[PATCH v2] aarch64: Optimize memcmp for Kunpeng 950 with SVE
The patch optimizes memcmp for the Kunpeng 950 using SVE predication, 4-way loop unrolling, and early-exit mismatch detection. Following review feedback, v2 uses cntb instead of rdvl and mul vl addressing. Benchmarks show speedups of 15% to 50% across small to large inputs, but regressions may occur in edge cases where offsets are near 4K boundaries.
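To make the predication idea in the summary concrete, here is a minimal sketch using the Arm C Language Extensions (ACLE) SVE intrinsics. It is an illustration under stated assumptions, not the submitted implementation, and the function name `memcmp_predicated_sketch` is invented here. The SVE path is compiled only when the compiler defines `__ARM_FEATURE_SVE`; otherwise a plain byte loop with the same result is used so the example still builds elsewhere. `svcntb()` reads the vector length in bytes, the same quantity the `cntb` instruction returns, and `svwhilelt_b8_u64` builds a predicate that masks off lanes past the end of the buffer, so short inputs and tails need no separate code path.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* SVE path: one predicated loop covers short inputs, full vectors, and tails. */
int memcmp_predicated_sketch(const void *s1, const void *s2, size_t n)
{
    const uint8_t *a = s1, *b = s2;
    const uint64_t vl = svcntb();  /* vector length in bytes */

    for (uint64_t i = 0; i < n; i += vl) {
        /* Active lanes are those with i + lane < n. */
        svbool_t pg = svwhilelt_b8_u64(i, (uint64_t)n);
        svuint8_t va = svld1_u8(pg, a + i);
        svuint8_t vb = svld1_u8(pg, b + i);
        svbool_t ne = svcmpne_u8(pg, va, vb);

        if (svptest_any(pg, ne)) {
            /* Count the active lanes that precede the first mismatch. */
            uint64_t k = svcntp_b8(pg, svbrkb_b_z(pg, ne));
            return (int)a[i + k] - (int)b[i + k];
        }
    }
    return 0;
}

#else

/* Portable fallback with the same observable result (no SVE available). */
int memcmp_predicated_sketch(const void *s1, const void *s2, size_t n)
{
    const uint8_t *a = s1, *b = s2;

    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}

#endif
```

On an SVE-capable toolchain this would typically be built with an option such as `-march=armv8-a+sve`. The sketch processes one vector per iteration for clarity and does not attempt the 4-way unrolling or any handling specific to 4K boundaries.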