[PATCH v2] aarch64: Optimize memcmp for Kunpeng 950 with SVE

Project / Subsystem

glibc / aarch64

Date

2026-04-17

Proposer

Weihong Ye <[email protected]>

Source type

public_inbox

Consensus

Proposed

Sentiment

7/10

Technical tradeoffs

  • Uses SVE predication for branch-free handling of short inputs and tails.
  • Implements 4-way loop unrolling to maximize pipeline utilization.
  • Optimizes mismatch detection with early exit logic.
  • Potential regressions in edge cases where offsets are near 4K boundaries.

All attributes

project
glibc
subsystem
aarch64
patch_id
discussion_id
[email protected]
source_type
public_inbox
title
[PATCH v2] aarch64: Optimize memcmp for Kunpeng 950 with SVE
headline
Optimize memcmp for Kunpeng 950 with SVE
tldr
This patch optimizes memcmp for the Kunpeng 950 using SVE, with reported speedups of 15% to 50% across small to large inputs.
stakes
This improves the performance of memcmp on AArch64 systems using the Kunpeng 950 processor, benefiting applications that rely heavily on memory comparison operations.
proposer
Weihong Ye <[email protected]>
consensus
Proposed
outcome
proposed
sentiment_score
7
sentiment_rationale
Benchmark results are positive and v2 addresses earlier review comments, which suggests the patch is heading in a good direction.
technical_tradeoffs
  • Uses SVE predication for branch-free handling of short inputs and tails.
  • Implements 4-way loop unrolling to maximize pipeline utilization.
  • Optimizes mismatch detection with early exit logic.
  • Potential regressions in edge cases where offsets are near 4K boundaries.
series_id
series_role
standalone
series_parts
[]
tags
  • aarch64
  • memcmp
  • SVE
  • optimization
  • kunpeng950
bugzilla_url
date
2026-04-17T00:00:00.000Z

[PATCH v2] aarch64: Optimize memcmp for Kunpeng 950 with SVE

The patch optimizes memcmp for the Kunpeng 950 using SVE predication, 4-way loop unrolling, and mismatch detection with early exit. Based on review feedback, it uses cntb instead of rdvl and mul vl addressing. Benchmarks show speedups of 15% to 50% across small to large inputs, but regressions may occur in edge cases where offsets are near 4K boundaries.
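
The control flow described above can be pictured with a small portable C model. This is only a sketch and not the patch's code: the patch itself is AArch64 SVE assembly, and the names model_memcmp, first_mismatch, and MODEL_VL are hypothetical. It assumes a fixed 16-byte vector length in place of the value cntb returns at run time, and it models a predicated compare as a scalar loop over the active lanes. The main loop handles four vector-sized chunks per iteration and returns at the first mismatch; short inputs and tails reuse the same compare with fewer active lanes instead of a separate byte-by-byte path.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a fixed 16-byte "vector length" stands in for the hardware
   value that cntb would report at run time. */
#define MODEL_VL 16

/* Scalar model of one predicated compare: only the first `active` lanes
   take part. Returns the index of the first differing active lane, or
   MODEL_VL when all active lanes are equal. */
static size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                             size_t active)
{
    assert(active <= MODEL_VL);
    for (size_t i = 0; i < active; i++)
        if (a[i] != b[i])
            return i;
    return MODEL_VL;
}

/* Hypothetical name; same contract as memcmp. */
int model_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;
    size_t off = 0;

    /* Main loop: four full chunks per iteration, leaving as soon as any
       chunk reports a mismatch (early exit). */
    while (n - off >= 4 * MODEL_VL) {
        for (size_t k = 0; k < 4; k++) {
            size_t base = off + k * MODEL_VL;
            size_t i = first_mismatch(a + base, b + base, MODEL_VL);
            if (i != MODEL_VL)
                return (int)a[base + i] - (int)b[base + i];
        }
        off += 4 * MODEL_VL;
    }

    /* Short inputs and tails: the same compare with a partial lane mask. */
    while (off < n) {
        size_t active = n - off < MODEL_VL ? n - off : MODEL_VL;
        size_t i = first_mismatch(a + off, b + off, active);
        if (i != MODEL_VL)
            return (int)a[off + i] - (int)b[off + i];
        off += active;
    }
    return 0;
}
```

In this model the sign of the result comes from the first differing byte pair compared as unsigned char, which matches the standard memcmp contract. The model says nothing about performance or about the behavior near 4K boundaries mentioned above.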