Hierarchical Transformers are More Efficient Language Models
https://youtu.be/soqWNyrdjkw
Speaker: Piotr Nawrot, University of Warsaw
Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can […]