BayJarvis: Blogs on anthropic

paper Toy Models of Superposition - 2024-04-03

Neural networks often exhibit a puzzling phenomenon called "polysemanticity" where many unrelated concepts are packed into a single neuron, making interpretability challenging. This paper provides toy models to understand polysemanticity as a result of models storing additional sparse features in "superposition". Key findings include: …

paper Constitutional AI - Training AI Systems to Be Helpful and Harmless Using AI Feedback - 2023-11-04

The paper proposes a new technique called "Constitutional AI" (CAI) to train AI systems like chatbots to be helpful, honest, and harmless without needing human feedback labels identifying harmful behaviors. Instead, the training relies entirely on AI-generated feedback guided by simple principles. This makes it possible to control AI behavior more precisely with far less human input. …

paper Scaling Laws for Autoregressive Generative Modeling: A Review - 2023-10-11

The world of machine learning has been witnessing monumental growth, powered by the scaling of models. "Scaling Laws for Autoregressive Generative Modeling" is a pivotal paper in this context, offering profound insights into the mechanics of this scaling. This blog post distills the paper's essence for a clearer understanding. …