llm-random

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

We introduce MoE-Mamba, which reaches the same performance as Mamba in 2.2x fewer training steps while preserving Mamba's inference performance gains over the Transformer.

Jan 9, 2024

Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur

Mixture of Tokens

We introduce Mixture of Tokens, a new, fully differentiable Transformer architecture that builds on Mixture of Experts while avoiding its problems. It achieves the same performance as the vanilla Transformer with a \(3\times\) wall-clock speedup and a \(4\times\) reduction in FLOPs.

Oct 24, 2023

Szymon Antoniak *, Sebastian Jaszczur * †, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan ‡

Neuron Recycling

Sparse neural networks have garnered attention due to their theoretical promise of lowered computational demands and memory savings. However, to date, the theoretical…

Jul 11, 2023

Jakub Krajewski *, Maciej Pióro *, Sebastian Jaszczur †, Marek Cygan ‡
