ZeCO: Zero-Communication Overhead Sequence Parallelism for Linear Attention

Publication
In The Thirty-ninth Annual Conference on Neural Information Processing Systems