Transformers have reshaped deep learning, delivering state-of-the-art results in tasks such as natural language processing and computer vision. However, their architectural complexity often translates into significant computational demands. Recent work, including Shaped Attention, the removal of certain parameters, and parallel block architectures, proposes ways to simplify transformers without sacrificing their effectiveness. …
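To make the parallel-block idea concrete, here is a minimal NumPy sketch that contrasts a standard serial Pre-LN block with a parallel block of the form y = x + Attn(LN(x)) + MLP(LN(x)), the formulation described in the PaLM paper. It assumes single-head attention without masking, a ReLU MLP, and LayerNorm without learnable parameters. The names `serial_block`, `parallel_block`, and `init_params` are hypothetical and exist only for illustration. This is not the exact design of any particular simplified-transformer paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance
    # (no learnable gain or bias in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(x, p):
    # Single-head, unmasked self-attention over x of shape (seq_len, d).
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v @ p["Wo"]

def mlp(x, p):
    # Two-layer feed-forward network with a ReLU nonlinearity.
    return np.maximum(x @ p["W1"], 0.0) @ p["W2"]

def serial_block(x, p):
    # Standard Pre-LN block: the MLP consumes the attention sub-block's output.
    h = x + attention(layer_norm(x), p)
    return h + mlp(layer_norm(h), p)

def parallel_block(x, p):
    # Parallel block: both sub-blocks read the same normalized input,
    # so neither depends on the other; their outputs are summed with the skip.
    n = layer_norm(x)
    return x + attention(n, p) + mlp(n, p)

def init_params(d, d_ff, rng):
    # Hypothetical random initialization, scaled by 1/sqrt(fan_in).
    s, s_ff = 1.0 / np.sqrt(d), 1.0 / np.sqrt(d_ff)
    return {
        "Wq": rng.normal(0.0, s, (d, d)),
        "Wk": rng.normal(0.0, s, (d, d)),
        "Wv": rng.normal(0.0, s, (d, d)),
        "Wo": rng.normal(0.0, s, (d, d)),
        "W1": rng.normal(0.0, s, (d, d_ff)),
        "W2": rng.normal(0.0, s_ff, (d_ff, d)),
    }

rng = np.random.default_rng(0)
params = init_params(d=8, d_ff=32, rng=rng)
x = rng.normal(size=(4, 8))  # 4 tokens, model width 8
print(serial_block(x, params).shape, parallel_block(x, params).shape)  # (4, 8) (4, 8)
```

In `parallel_block`, the attention and MLP branches depend only on `layer_norm(x)` and not on each other. They can therefore be computed concurrently, which is the usual motivation for parallel blocks. The trade-off is that the MLP no longer sees the attention output from the same layer.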