How do you all add and subtract concepts in the rabbit poem?





Features correspond to vectors in activation space. So you can just do vector arithmetic!
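
To make "vector arithmetic" concrete, here is a minimal toy sketch, not the actual method or code used for the poem experiment. It assumes you already have a direction for the concept you want to remove and one for the concept you want to add; `rabbit_dir`, `other_dir`, and `swap_feature` are hypothetical names, and the activation is random data rather than anything taken from a real model.

```python
import numpy as np

def swap_feature(activation, remove_dir, add_dir):
    """Subtract one feature direction from an activation and add another.

    Toy illustration only: both directions are normalized, the component
    along remove_dir is projected out, and add_dir is added at the same
    strength.
    """
    r = remove_dir / np.linalg.norm(remove_dir)
    a = add_dir / np.linalg.norm(add_dir)
    strength = activation @ r  # how strongly the removed feature is active
    return activation - strength * r + strength * a

# Hypothetical setup: two orthogonal feature directions in an 8-dim space.
d = 8
rabbit_dir = np.eye(d)[0]  # assumed direction for the concept to remove
other_dir = np.eye(d)[1]   # assumed direction for the concept to add

rng = np.random.default_rng(0)
activation = rng.normal(size=d) + 3.0 * rabbit_dir  # stand-in activation

steered = swap_feature(activation, rabbit_dir, other_dir)
```

In this sketch the component along `rabbit_dir` ends up at zero and that strength is moved onto `other_dir`, while every other coordinate is left alone. With real features the directions are generally not orthogonal, so the result would not be this clean.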

If you aren't familiar with thinking about features, you might find it helpful to look at our previous work on features in superposition:

- https://transformer-circuits.pub/2022/toy_model/index.html

- https://transformer-circuits.pub/2023/monosemantic-features/...

- https://transformer-circuits.pub/2024/scaling-monosemanticit...



