The paper presents Hiformer, an innovative Transformer-based model tailored for recommender systems, emphasizing efficient heterogeneous feature interaction learning. Traditional Transformer architectures face significant hurdles in recommender systems, notably in capturing the complex interplay of diverse features and achieving acceptable serving latency for web-scale applications. …
BERT, known for its masked language modeling (MLM) approach, has been a cornerstone in pre-training models for NLP. XLNet built on this by introducing permuted language modeling (PLM) to capture the dependency among predicted tokens. However, XLNet fell short in utilizing the full position information within a sentence, leading to discrepancies between pre-training and fine-tuning phases. MPNet emerges as a novel solution that amalgamates the strengths of BERT and XLNet while overcoming their limitations. By leveraging permuted language modeling and incorporating auxiliary position information, MPNet provides a comprehensive view of the sentence structure. This method not only enhances the model's understanding of language but also aligns more closely with downstream tasks. Pre-trained on an extensive dataset exceeding 160GB and fine-tuned across various benchmarks like GLUE and SQuAD, MPNet demonstrates superior performance over existing models, including BERT, XLNet, and RoBERTa. For further details and access to the pre-trained models, visit Microsoft's MPNet repository. …