Neural networks often exhibit a puzzling phenomenon called "polysemanticity", in which many unrelated concepts are packed into a single neuron, making interpretability challenging. This paper builds toy models to explain polysemanticity as a consequence of "superposition": models storing more sparse features than they have dimensions. Key findings include: …
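To make the setup concrete, here is a minimal sketch (assuming PyTorch) of the kind of ReLU-output toy model the paper studies: sparse synthetic features are projected into a smaller hidden space and reconstructed with the transposed weights. The layer sizes, sparsity level, and importance schedule below are illustrative choices, not the paper's exact configuration.

```python
# Minimal toy-model-of-superposition sketch (illustrative hyperparameters).
import torch

n_features, n_hidden = 20, 5
sparsity = 0.95                                    # probability a feature is zero
importance = 0.9 ** torch.arange(n_features).float()  # geometrically decaying importance

W = torch.randn(n_hidden, n_features, requires_grad=True)
b = torch.zeros(n_features, requires_grad=True)
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(5_000):
    # Sparse features: each is 0 with probability `sparsity`, else Uniform[0, 1].
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)

    # Down-project to n_hidden dims, then reconstruct with the transposed weights.
    h = x @ W.T                          # (batch, n_hidden)
    x_hat = torch.relu(h @ W + b)        # (batch, n_features)

    # Importance-weighted reconstruction error.
    loss = (importance * (x - x_hat) ** 2).sum(dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Count features that received a non-trivial representation direction; with
# high sparsity this typically exceeds n_hidden, i.e. features are stored in
# superposition as interfering, nearly orthogonal directions.
print((W.norm(dim=0) > 0.5).sum().item())
```

Under high sparsity, the learned weights typically represent more than `n_hidden` features at once, which is exactly the superposition regime the paper analyzes.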
The paper proposes a technique called "Constitutional AI" (CAI) for training AI systems such as chatbots to be helpful, honest, and harmless without human feedback labels identifying harmful behaviors. Instead, harmlessness training relies on AI-generated feedback guided by a short list of simple principles (a "constitution"). This makes it possible to steer AI behavior more precisely with far less human input. …
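As a rough illustration of the supervised critique-and-revision loop the paper describes, here is a hedged sketch. `generate` is a hypothetical stand-in for sampling from a helpful-only language model, and the prompt templates and principles are illustrative, not the paper's exact constitution.

```python
# Hedged sketch of the CAI supervised phase: critique a draft response against
# each principle, then revise it; the revised responses become fine-tuning targets.
from typing import Callable

def constitutional_revision(
    prompt: str,
    generate: Callable[[str], str],   # hypothetical sampler for a helpful-only model
    principles: list[str],
) -> str:
    """Return a response revised against each constitutional principle."""
    response = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response
```

In the paper's second stage, similar AI feedback is used to compare pairs of responses against the principles, producing preference labels for reinforcement learning from AI feedback (RLAIF) in place of human harmlessness labels.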
Machine learning has seen rapid progress driven largely by scaling up models. "Scaling Laws for Autoregressive Generative Modeling" is a key paper in this context: it shows that the smooth power-law trends seen in language modeling also hold for images, video, multimodal data, and math as model size and compute grow. This blog post distills the paper's main results. …