llm-random
Stay up to date

Our group researches LLMs. Sign up to learn when we publish new research.



MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

We introduce MoE-Mamba, which reaches the same performance as Mamba in \(2.2\times\) fewer training steps while preserving Mamba's inference-time performance gains over the Transformer. A rough sketch of the interleaved layer pattern follows below.

Jan 9, 2024
Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur
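
The teaser names the architecture but not its structure; based on the paper, MoE-Mamba interleaves Mamba (selective state space) blocks with Switch-style mixture-of-experts feed-forward layers. The snippet below is a minimal sketch of that interleaving pattern only, assuming a stand-in sequence mixer (a GRU) in place of a real Mamba layer; the class names `SwitchMoE` and `MoEMambaBlock` and all dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class SwitchMoE(nn.Module):
    """Top-1 (Switch-style) mixture-of-experts feed-forward layer (illustrative)."""
    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (batch, seq, d_model)
        gates = self.router(x).softmax(dim=-1)   # routing probabilities per token
        top_gate, top_idx = gates.max(dim=-1)    # send each token to one expert
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out

class MoEMambaBlock(nn.Module):
    """One interleaved block: sequence mixing followed by an MoE feed-forward layer.
    The GRU is only a placeholder for a real Mamba/SSM layer."""
    def __init__(self, d_model, d_ff, num_experts):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.sequence_mixer = nn.GRU(d_model, d_model, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.moe = SwitchMoE(d_model, d_ff, num_experts)

    def forward(self, x):
        x = x + self.sequence_mixer(self.norm1(x))[0]  # mixing along the sequence
        x = x + self.moe(self.norm2(x))                # per-token expert computation
        return x
```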

Mixture of Tokens

We introduce Mixture of Tokens, a new, fully differentiable Transformer architecture that builds on Mixture of Experts while avoiding its problems. It achieves the same performance as the vanilla Transformer with a \(3\times\) wall-clock speedup and a \(4\times\) reduction in FLOPs. A toy sketch of the token-mixing idea follows below.

Oct 24, 2023
Szymon Antoniak *, Sebastian Jaszczur * †, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan ‡
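
As a rough illustration of the fully-differentiable mixing mentioned above: instead of routing each token to one expert, tokens in a group are combined into soft mixtures via learned softmax weights, each expert processes its mixture, and the outputs are redistributed to the group's tokens with the same weights. The sketch below is a toy version under those assumptions; `MixtureOfTokens`, `controller`, and `group_size` are illustrative names, not the released implementation.

```python
import torch
import torch.nn as nn

class MixtureOfTokens(nn.Module):
    """Soft token mixing within groups: fully differentiable, no discrete routing (illustrative)."""
    def __init__(self, d_model, d_ff, num_experts, group_size):
        super().__init__()
        self.group_size = group_size
        self.controller = nn.Linear(d_model, num_experts)   # per-token mixing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                    # x: (batch, seq, d_model)
        b, s, d = x.shape
        g = self.group_size
        assert s % g == 0, "toy sketch assumes seq length divisible by group_size"
        x = x.view(b, s // g, g, d)                          # split the sequence into groups
        weights = self.controller(x).softmax(dim=2)          # normalize over tokens in a group
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            w = weights[..., e].unsqueeze(-1)                # (batch, groups, group_size, 1)
            mixed = (w * x).sum(dim=2)                       # one soft mixture per group
            y = expert(mixed).unsqueeze(2)                   # expert output for the mixture
            out = out + w * y                                # redistribute to the group's tokens
        return out.view(b, s, d)
```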

Neuron Recycling

Sparse neural networks have garnered attention due to their theoretical promise of reduced computational demands and memory savings. However, to date, the theoretical…
Jul 11, 2023
Jakub Krajewski *, Maciej Pióro *, Sebastian Jaszczur †, Marek Cygan ‡