vllm.v1.attention.backends.mla.aiter_triton_mla ¶
AiterTritonMLABackend ¶
Bases: MLACommonBackend
get_builder_cls staticmethod ¶

```python
get_builder_cls() -> type[AiterMLAMetadataBuilder]
```

get_impl_cls staticmethod ¶

```python
get_impl_cls() -> type[AiterTritonMLAImpl]
```
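Together, these two static hooks are how the engine discovers the backend's moving parts: the metadata builder used at scheduling time and the attention implementation instantiated per layer. A minimal sketch of resolving them (the import path follows this page; actually importing requires a ROCm build of vLLM with AITER available, and everything beyond the two documented calls is illustrative):

```python
from vllm.v1.attention.backends.mla.aiter_triton_mla import AiterTritonMLABackend

# Resolve the backend's component classes via the documented static hooks.
builder_cls = AiterTritonMLABackend.get_builder_cls()  # -> AiterMLAMetadataBuilder
impl_cls = AiterTritonMLABackend.get_impl_cls()        # -> AiterTritonMLAImpl

print(builder_cls.__name__, impl_cls.__name__)
```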
AiterTritonMLAImpl ¶
Bases: AiterMLAImpl
__init__ ¶
```python
__init__(
    num_heads: int,
    head_size: int,
    scale: float,
    num_kv_heads: int,
    alibi_slopes: list[float] | None,
    sliding_window: int | None,
    kv_cache_dtype: str,
    logits_soft_cap: float | None,
    attn_type: str,
    kv_sharing_target_layer_name: str | None,
    **mla_args,
) -> None
```
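The constructor mirrors vLLM's common attention-impl interface; MLA-specific shapes and projections arrive through `**mla_args`. A hedged sketch of how DeepSeek-like values for these arguments are typically derived (all numbers below are illustrative examples, not values taken from this module; real construction happens inside vLLM's attention-layer wiring, which also supplies the backend-specific `**mla_args`):

```python
# Example MLA geometry, DeepSeek-style (illustrative values only).
kv_lora_rank = 512       # latent KV dimension
qk_rope_head_dim = 64    # rotary sub-head dim
qk_nope_head_dim = 128   # non-rotary sub-head dim

init_kwargs = dict(
    num_heads=16,                                         # query heads
    head_size=kv_lora_rank + qk_rope_head_dim,            # 576: latent + rope
    scale=(qk_nope_head_dim + qk_rope_head_dim) ** -0.5,  # softmax scale
    num_kv_heads=1,               # MLA keeps one shared latent KV head
    alibi_slopes=None,            # MLA does not use ALiBi
    sliding_window=None,
    kv_cache_dtype="auto",
    logits_soft_cap=None,
    attn_type="decoder",
    kv_sharing_target_layer_name=None,
)
```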
_flash_attn_varlen_diff_headdims ¶
```python
_flash_attn_varlen_diff_headdims(
    q,
    k,
    v,
    return_softmax_lse=False,
    softmax_scale=None,
    **kwargs,
)
```
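The `diff_headdims` suffix points at the MLA prefill quirk this wrapper exists for: q and k carry a larger head dim (nope plus rope parts) than v, while fused varlen attention kernels generally require all three to match. A common workaround is to zero-pad v up to the q/k head dim, run a standard kernel, then slice the output back; because the padded value columns are zero, the slice recovers the exact unpadded result. A self-contained sketch of that technique, using plain PyTorch in place of the fused kernel (illustrative, not this backend's exact code path; the real method can also return the softmax log-sum-exp, which is omitted here):

```python
import torch
import torch.nn.functional as F

def attn_diff_headdims(q, k, v, softmax_scale=None):
    """Toy reference for attention where q/k and v head dims differ.

    q, k: [num_q_tokens, num_heads, qk_head_dim]
    v:    [num_kv_tokens, num_heads, v_head_dim], v_head_dim <= qk_head_dim
    """
    qk_dim, v_dim = q.shape[-1], v.shape[-1]
    if softmax_scale is None:
        softmax_scale = qk_dim ** -0.5
    # Zero-pad v's head dim so a matching-headdim kernel could consume it.
    v_padded = F.pad(v, (0, qk_dim - v_dim))
    scores = torch.einsum("qhd,khd->hqk", q, k) * softmax_scale
    out = torch.einsum("hqk,khd->qhd", scores.softmax(dim=-1), v_padded)
    # Padded columns attend to zeros, so slicing restores the exact result.
    return out[..., :v_dim]

q = torch.randn(4, 8, 192)
k = torch.randn(6, 8, 192)
v = torch.randn(6, 8, 128)
print(attn_diff_headdims(q, k, v).shape)  # torch.Size([4, 8, 128])
```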