vllm.attention.backends.registry ¶
Attention backend registry
MAMBA_TYPE_TO_BACKEND_MAP module-attribute ¶
MAMBA_TYPE_TO_BACKEND_MAP = {
"mamba1": name,
"mamba2": name,
"short_conv": name,
"linear_attention": name,
"gdn_attention": name,
"custom": name,
}
_MAMBA_ATTN_OVERRIDES module-attribute ¶
_MAMBA_ATTN_OVERRIDES: dict[
MambaAttentionBackendEnum, str
] = {}
AttentionBackendEnum ¶
Bases: Enum
Enumeration of all supported attention backends.
The enum value is the default class path, but this can be overridden at runtime using register_backend().
To get the actual backend class (respecting overrides), use: backend.get_class()
Source code in vllm/attention/backends/registry.py
CPU_ATTN class-attribute instance-attribute ¶
CUTLASS_MLA class-attribute instance-attribute ¶
FLASHINFER class-attribute instance-attribute ¶
FLASHINFER_MLA class-attribute instance-attribute ¶
FLASHMLA class-attribute instance-attribute ¶
FLASHMLA_SPARSE class-attribute instance-attribute ¶
FLASH_ATTN class-attribute instance-attribute ¶
FLASH_ATTN_MLA class-attribute instance-attribute ¶
FLEX_ATTENTION class-attribute instance-attribute ¶
IPEX class-attribute instance-attribute ¶
NO_ATTENTION class-attribute instance-attribute ¶
PALLAS class-attribute instance-attribute ¶
ROCM_AITER_FA class-attribute instance-attribute ¶
ROCM_AITER_MLA class-attribute instance-attribute ¶
ROCM_AITER_MLA_SPARSE class-attribute instance-attribute ¶
ROCM_AITER_MLA_SPARSE = "vllm.v1.attention.backends.mla.rocm_aiter_mla_sparse.ROCMAiterMLASparseBackend"
ROCM_AITER_TRITON_MLA class-attribute instance-attribute ¶
ROCM_AITER_UNIFIED_ATTN class-attribute instance-attribute ¶
ROCM_AITER_UNIFIED_ATTN = "vllm.v1.attention.backends.rocm_aiter_unified_attn.RocmAiterUnifiedAttentionBackend"
ROCM_ATTN class-attribute instance-attribute ¶
TREE_ATTN class-attribute instance-attribute ¶
TRITON_ATTN class-attribute instance-attribute ¶
TRITON_MLA class-attribute instance-attribute ¶
XFORMERS class-attribute instance-attribute ¶
clear_override ¶
Clear any runtime override for this backend, restoring the default class path.
get_class ¶
get_class() -> type[AttentionBackend]
Get the backend class (respects overrides).
Returns:
| Type | Description |
|---|---|
type[AttentionBackend] | The backend class |
Raises:
| Type | Description |
|---|---|
ImportError | If the backend class cannot be imported |
ValueError | If Backend.CUSTOM is used without being registered |
get_path ¶
get_path() -> str
Get the class path for this backend (respects overrides).
Returns:
| Type | Description |
|---|---|
str | The fully qualified class path string |
Raises:
| Type | Description |
|---|---|
ValueError | If Backend.CUSTOM is used without being registered |
MambaAttentionBackendEnum ¶
Bases: Enum
Enumeration of all supported mamba attention backends.
The enum value is the default class path, but this can be overridden at runtime using register_backend().
To get the actual backend class (respecting overrides), use: backend.get_class()
GDN_ATTN class-attribute instance-attribute ¶
LINEAR class-attribute instance-attribute ¶
MAMBA1 class-attribute instance-attribute ¶
MAMBA2 class-attribute instance-attribute ¶
SHORT_CONV class-attribute instance-attribute ¶
clear_override ¶
Clear any runtime override for this backend, restoring the default class path.
get_class ¶
get_class() -> type[AttentionBackend]
Get the backend class (respects overrides).
Returns:
| Type | Description |
|---|---|
type[AttentionBackend] | The backend class |
Raises:
| Type | Description |
|---|---|
ImportError | If the backend class cannot be imported |
ValueError | If Backend.CUSTOM is used without being registered |
get_path ¶
get_path() -> str
Get the class path for this backend (respects overrides).
Returns:
| Type | Description |
|---|---|
str | The fully qualified class path string |
Raises:
| Type | Description |
|---|---|
ValueError | If Backend.CUSTOM is used without being registered |
_AttentionBackendEnumMeta ¶
Bases: EnumMeta
Metaclass for AttentionBackendEnum to provide better error messages.
__getitem__ ¶
__getitem__(name: str)
Get backend by name with helpful error messages.
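The "helpful error messages" behavior can be illustrated with a minimal EnumMeta subclass. This is a hedged sketch of the general technique, not _AttentionBackendEnumMeta itself: a failed `Enum["NAME"]` lookup is rewrapped with a message listing the valid members. FriendlyEnumMeta and DemoEnum are made-up names.

```python
# Illustrative sketch: a metaclass that turns a bare KeyError from
# Enum["NAME"] into an error naming the valid members.
from enum import Enum, EnumMeta

class FriendlyEnumMeta(EnumMeta):
    def __getitem__(cls, name: str):
        try:
            return super().__getitem__(name)
        except KeyError:
            valid = ", ".join(member.name for member in cls)
            raise ValueError(
                f"Unknown backend: {name!r}. Valid backends are: {valid}"
            ) from None

class DemoEnum(Enum, metaclass=FriendlyEnumMeta):
    FLASH_ATTN = "flash"
    TRITON_ATTN = "triton"
```

A typo such as `DemoEnum["FLASH"]` then fails with a message enumerating FLASH_ATTN and TRITON_ATTN instead of an opaque KeyError.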
_Backend ¶
Deprecated: Use AttentionBackendEnum instead.
This class is provided for backwards compatibility with plugins and will be removed in a future release.
_BackendMeta ¶
Bases: type
Metaclass to provide deprecation warnings when accessing _Backend.
__getattribute__ ¶
__getattribute__(name: str)
register_backend ¶
register_backend(
backend: AttentionBackendEnum
| MambaAttentionBackendEnum,
is_mamba: bool = False,
class_path: str | None = None,
) -> Callable[[type], type]
Register or override a backend implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backend | AttentionBackendEnum \| MambaAttentionBackendEnum | The enum member to register or override | required |
| is_mamba | bool | Whether backend is a mamba attention backend (a MambaAttentionBackendEnum member) | False |
| class_path | str \| None | Optional class path. If not provided and used as a decorator, it is auto-generated from the decorated class. | None |
Returns:
| Type | Description |
|---|---|
Callable[[type], type] | Decorator function if class_path is None, otherwise a no-op |
Examples:

Override an existing attention backend¶

```python
@register_backend(AttentionBackendEnum.FLASH_ATTN)
class MyCustomFlashAttn: ...
```

Override an existing mamba attention backend¶

```python
@register_backend(MambaAttentionBackendEnum.LINEAR, is_mamba=True)
class MyCustomMambaAttn: ...
```

Register a custom third-party attention backend¶

```python
@register_backend(AttentionBackendEnum.CUSTOM)
class MyCustomBackend: ...
```

Direct registration¶

```python
register_backend(
    AttentionBackendEnum.CUSTOM,
    class_path="my.module.MyCustomBackend",
)
```