Progressing on Open-Ended Interpretability through Programming Language and User Interface Design
Part of the reason that open-endedness research is so difficult is that many of the things AI agents create are weird. They aren't constrained to our ways of thinking, so they find solutions that are genuinely hard for us to understand. This makes it very difficult to diagnose where things have gone wrong, because you can hardly even tell what is going on. Maybe a system developed an entire culture and simulated an entire universe inside that uninterpretable mess of symbols. Maybe it's just exploiting noise and random combinations because that's easier than designing small structures, and the simulation happens to favor it. When we can't distinguish between these two cases, we have a bad problem and our research cannot go to space today.

So how do we fix this? One answer is to just try and figure out what's going on anyway. I think that's a valid approach, and some of that will be needed. See the fantastic Circuits/Reverse Engineering Neural Networks work for some e...