PyTorch has integrated FlashAttention-4 as a new backend for its FlexAttention API, delivering 1.2× to 3.2× speedups for custom attention mechanisms on NVIDIA’s Hopper and Blackwell GPUs. The update, detailed in a technical report released today, lets developers write Python code that is automatically compiled into highly optimized GPU kernels, eliminating the traditional trade-off between flexibility and performance in transformer model development.
The breakthrough leverages just-in-time (JIT) compilation to convert user-defined Python functions directly into CuTeDSL kernels, according to the PyTorch Blog. This approach gives the system access to hardware features previously unavailable through standard frameworks, including programmer-managed Tensor Memory, asynchronous operations, and warp specialization on NVIDIA’s latest architectures.
The technology addresses a critical bottleneck in AI development, where researchers have historically faced a difficult choice between fast but rigid pre-built kernels and flexible but slow custom implementations. FlexAttention with the new backend supports complex attention patterns including ALiBi, sliding window attention, document masking, and soft-capping, all while maintaining near-optimal performance.
Performance and Validation
Benchmarks demonstrate the FA4 backend matches or exceeds NVIDIA’s cuDNN attention performance in backward passes, though some gap remains in forward passes for standard causal attention, the PyTorch team reported. The implementation has been validated through large-scale testing, with a Llama 3 70B model trained on 64 H100 GPUs achieving identical final loss values using either the Triton or FA4 backend.
The performance gains stem from FA4’s ability to utilize deeply pipelined kernels and hardware-specific optimizations that keep tensor cores on Hopper and Blackwell GPUs fully utilized. These architectural advantages prove particularly valuable in compute-bound scenarios involving long sequence lengths, a common challenge in modern language models.
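A back-of-envelope calculation (not from the report; the head count, head dimension, and sequence lengths below are illustrative assumptions) shows why long sequences end up compute-bound: attention FLOPs grow quadratically with sequence length while Q/K/V activation traffic grows only linearly, so arithmetic intensity climbs until the tensor cores, not memory bandwidth, are the limit.

```python
# Illustrative arithmetic-intensity estimate for standard attention.

def attention_flops(seq_len, num_heads, head_dim, batch=1):
    # Two matmuls per head (Q @ K^T and P @ V), each ~2*s*s*d FLOPs.
    return batch * num_heads * 2 * (2 * seq_len * seq_len * head_dim)

def qkv_bytes(seq_len, num_heads, head_dim, batch=1, bytes_per_el=2):
    # Three fp16/bf16 activation tensors of shape [batch, heads, seq, dim].
    return 3 * batch * num_heads * seq_len * head_dim * bytes_per_el

for s in (1024, 8192, 65536):  # assumed sequence lengths
    ratio = attention_flops(s, 64, 128) / qkv_bytes(s, 64, 128)
    print(f"seq={s:6d}  FLOPs per byte of Q/K/V ~ {ratio:,.0f}")
```

The ratio rises linearly with sequence length, which is why keeping the tensor-core pipeline saturated matters most at long context.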
Current Limitations
The technology comes with important constraints for developers to consider. The backend exclusively supports NVIDIA Hopper and Blackwell GPUs, automatically defaulting to the Triton backend on other hardware. Additionally, the backward pass currently lacks determinism when block-sparsity is enabled, though the PyTorch team indicated a fix is in progress.
Other limitations include the inability to compute gradients for captured tensors such as learnable biases, and potential recompilation overhead when scalar values change between function calls. The kernel is also optimized for specific block sizes: 128×128 on Hopper and 256×128 on Blackwell, which may not suit all use cases.
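The fixed tile sizes mean work is issued in whole blocks, so a sequence length that is not a multiple of the block leaves part of the last tile masked out. This hypothetical helper (not part of PyTorch) makes the cost visible for the 128-row tiles the report cites on Hopper:

```python
# Hypothetical illustration of block-size padding overhead.

def padded_len(seq_len, block):
    # Round seq_len up to the next multiple of block (ceiling division).
    return -(-seq_len // block) * block

def wasted_fraction(seq_len, block):
    # Share of the padded range that is mask rather than real tokens.
    return 1 - seq_len / padded_len(seq_len, block)

for s in (128, 200, 4096, 4100):  # assumed sequence lengths
    print(s, padded_len(s, 128), f"{wasted_fraction(s, 128):.1%}")
```

At short, misaligned lengths the overhead is substantial (a 200-token sequence wastes over a fifth of the padded range), while at long lengths it shrinks toward zero, consistent with the kernel being tuned for large workloads.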
Despite these constraints, the integration represents a significant advance for transformer model development, enabling researchers to experiment with novel attention mechanisms without sacrificing the performance needed for production deployment on modern data center GPUs.
Sources
- PyTorch Blog