Hierarchical Transformers are More Efficient Language Models
Virtual: https://youtu.be/soqWNyrdjkw
Speaker: Piotr Nawrot, University of Warsaw
Title: Hierarchical Transformers are More Efficient Language Models
Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences, which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language […]
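The efficiency claim in the title rests on the idea of processing part of the model at a shortened sequence length, since self-attention cost grows quadratically with the number of tokens. Below is a minimal, illustrative sketch of that general idea, not the talk's exact architecture: the helper names `shorten` and `upsample` and the pooling factor `k` are assumptions introduced here for illustration.

```python
import torch

def shorten(x: torch.Tensor, k: int) -> torch.Tensor:
    """Downsample a token sequence by average-pooling groups of k tokens.

    x: (batch, length, d_model); length is assumed divisible by k.
    Returns a tensor of shape (batch, length // k, d_model).
    """
    batch, length, d_model = x.shape
    return x.view(batch, length // k, k, d_model).mean(dim=2)

def upsample(x: torch.Tensor, k: int) -> torch.Tensor:
    """Restore the original length by repeating each pooled token k times."""
    return x.repeat_interleave(k, dim=1)

# Self-attention over L tokens costs O(L^2); over the shortened sequence
# it costs O((L / k)^2), so inner layers run much more cheaply.
x = torch.randn(2, 512, 64)        # (batch, tokens, hidden)
short = shorten(x, k=4)            # (2, 128, 64): ~16x cheaper attention here
restored = upsample(short, k=4)    # (2, 512, 64): back to full resolution
print(short.shape, restored.shape)
```

In a hierarchical model, the shortened representation would be processed by a stack of Transformer layers before being upsampled back to full resolution for token-level prediction.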