PyTorch released version 2.11 today, delivering performance improvements of up to 620x for specific AI operations while adding support for next-generation NVIDIA and Intel GPUs. The update, built from 2,723 contributions by 432 developers, introduces differentiable collectives for distributed training, a FlashAttention-4 backend, and expanded Apple Silicon compatibility, marking a significant advance for machine learning researchers and developers worldwide.
The new capabilities position PyTorch at the forefront of AI framework competition as organizations race to optimize training and inference for increasingly complex models. The differentiable collectives feature fundamentally changes how researchers can approach distributed training algorithms by allowing gradients to be computed directly through collective communication operations, eliminating the need for custom implementations.
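The idea can be sketched with a custom autograd.Function whose backward pass is itself a collective: the gradient of a sum all-reduce is an all-reduce of the incoming gradients. This is a conceptual illustration only, not the new 2.11 API (the announcement does not name its entry points), and it runs as a single-process "cluster" on the gloo backend so the collectives reduce over one rank.

```python
import os
import torch
import torch.distributed as dist

# Single-process process group so the example is self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

class AllReduceSum(torch.autograd.Function):
    """Differentiable sum all-reduce: the backward of a sum all-reduce
    is an all-reduce of the incoming gradients."""
    @staticmethod
    def forward(ctx, t):
        out = t.clone()
        dist.all_reduce(out, op=dist.ReduceOp.SUM)
        return out

    @staticmethod
    def backward(ctx, grad):
        grad = grad.clone()
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        return grad

x = torch.ones(3, requires_grad=True)
loss = AllReduceSum.apply(x * 2).sum()
loss.backward()          # gradients flow through the collective
print(x.grad)            # with world_size=1: tensor([2., 2., 2.])
dist.destroy_process_group()
```

With more ranks, the same structure lets optimization-through-communication algorithms be written without hand-rolled gradient plumbing, which is what the built-in differentiable collectives automate.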
Performance gains in the release are particularly striking for linear algebra operations. The torch.linalg.lstsq function achieves speedups ranging from 1.7x to 620x, while torch.linalg.svd delivers 2x to 400x improvements. These enhancements stem from replacing the legacy MAGMA backend with optimized cuSOLVER and cuBLAS implementations.
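Both functions are drop-in: the same calls run on CPU or CUDA tensors, with the faster GPU paths selected automatically. A minimal least-squares and SVD round trip:

```python
import torch

# Overdetermined system: recover x_true from A @ x = b by least squares.
A = torch.randn(100, 5, dtype=torch.float64)
x_true = torch.randn(5, 1, dtype=torch.float64)
b = A @ x_true
x_hat = torch.linalg.lstsq(A, b).solution

# Thin SVD of the same matrix: A == U @ diag(S) @ Vh.
U, S, Vh = torch.linalg.svd(A, full_matrices=False)
```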
FlexAttention, now powered by the FlashAttention-4 backend, provides 1.2x to 3.2x speedups for compute-bound attention workloads on NVIDIA’s Hopper and Blackwell GPUs. This optimization uses just-in-time compilation to generate kernels specifically tailored for these next-generation architectures.
Hardware Compatibility Shifts

A significant change accompanies the performance improvements: PyTorch 2.11’s default installation now ships with CUDA 13.0, dropping support for older GPU architectures. Volta, Pascal, and Maxwell GPUs are no longer supported in the default build, though users can still access CUDA 12.6 builds for legacy hardware compatibility.
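For teams on those older cards, pinning the install to the CUDA 12.6 wheel index keeps working builds. The index URL below follows PyTorch's usual download.pytorch.org pattern; confirm the exact selector against the official install page:

```shell
# Default pip wheels now target CUDA 13.0 (Volta/Pascal/Maxwell dropped).
# Legacy-GPU installs can pull from the CUDA 12.6 index instead:
pip install torch --index-url https://download.pytorch.org/whl/cu126
```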
The update expands cross-platform support with enhanced Apple Silicon capabilities, adding new distribution functions and improved error reporting for MPS operations. Intel GPU users gain XPUGraph support, a feature similar to CUDA Graphs that reduces CPU overhead by capturing and replaying sequences of operations.
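Since the article describes XPUGraph as the Intel analog of CUDA Graphs, the CUDA pattern is the clearest way to illustrate the capture-and-replay idea (the XPU-side API names are not given in the release notes, so none are invented here). The sketch guards on device availability:

```python
import torch

if torch.cuda.is_available():
    x = torch.zeros(4, device="cuda")

    # Warm up on a side stream, as recommended before graph capture.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        _ = x * 2 + 1
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        y = x * 2 + 1        # kernels are recorded, not just executed

    x.copy_(torch.arange(4.0, device="cuda"))
    g.replay()               # re-runs the recorded kernels with minimal CPU overhead
    print(y)                 # y reflects the updated contents of x
else:
    print("CUDA not available; skipping graph capture demo")
```

Because replay skips per-op Python and dispatcher work, the win is largest for models that launch many small kernels, which is the overhead XPUGraph targets on Intel hardware.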
The release also marks progress in PyTorch’s production deployment capabilities. The torch.export API now supports exporting RNN modules including LSTM and GRU for GPU execution, broadening the range of models ready for production inference. This advancement aligns with PyTorch’s continued deprecation of TorchScript in favor of the export ecosystem.
Security and Migration Considerations

Security improvements include hardening of torch.hub.load, which now prompts users for confirmation before executing code from untrusted repositories. Organizations upgrading from PyTorch 2.10 will need to address several breaking changes, particularly around CUDA compatibility and API modifications in attention mechanisms.
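The explicit opt-in looks like passing `trust_repo` when loading. The wrapper name below is ours; `torch.hub.load`'s `trust_repo` parameter is the real knob, and the example call is left commented out because it fetches code over the network:

```python
import torch

def load_trusted(repo: str, entrypoint: str, **kwargs):
    # trust_repo=True explicitly trusts the repository and suppresses
    # the confirmation prompt; omit it to be prompted interactively.
    return torch.hub.load(repo, entrypoint, trust_repo=True, **kwargs)

# Example (performs a network fetch when actually called):
# model = load_trusted("pytorch/vision", "resnet18", weights=None)
```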
The scale of that contributor base underscores PyTorch’s position as a community-driven project competing with proprietary alternatives from major tech companies.
Sources
- pytorch.org/blog