Progressing on Open-Ended Interpretability through Programming Language and User Interface Design

Part of the reason that open-endedness research is so difficult is that many of the things AI agents create are weird. They aren't constrained to our ways of thinking, and so they'll find solutions that are really hard for us to understand. This makes it very difficult to diagnose where things have gone wrong, because you can hardly even tell what is going on. Maybe a system developed an entire culture and simulated an entire universe inside that uninterpretable mess of symbols. Maybe it's just exploiting noise and random combinations, because that's easier than carefully designing small building blocks and the simulation happens to reward it. When we can't distinguish between these two cases, we have a bad problem and our research cannot go to space today.

So how do we fix this?

One answer is to just try and figure out what's going on anyway. I think that's a valid approach, and some of that will be needed. See the fantastic Circuits/Reverse Engineering Neural Networks work for some examples of this.

Another answer is to stick with situations where we can understand what's going on. Agents that build physical creatures to solve physical tasks are a particularly natural choice, because you can see what's going on. Karl Sims was one of the first to do major work in this direction, and Ken Stanley and his team seem to be continuing it (see Enhanced POET). I've spent some time on this kind of research as well, and it's really fun. I think that "physically embodied" agents are a great direction, and I suspect we'll continue to get lots of generally applicable insight from this line of work.

However, so far AI research has taught us that human-level robotics and locomotion seem to be much more difficult to achieve than human-level language (perhaps because nature has had much more time to spend on locomotion than on language, so language is further from diminishing returns). So it would be nice if we could have some kind of language-y environment where we could do open-endedness research.

One natural choice is programming languages. They provide a set of composable primitives that agents can use to solve problems. But again, you run into interpretability issues pretty quickly here.
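To make "composable primitives" a bit more concrete, here's a minimal sketch in Python of the kind of thing I have in mind. The primitives and names here are invented for illustration; they don't come from any particular system:

    # Toy illustration: a tiny set of composable primitives an agent could
    # assemble into programs. All names here are made up for this example.
    from typing import Callable, List

    Primitive = Callable[[List[float]], List[float]]

    def double(xs: List[float]) -> List[float]:
        return [x * 2 for x in xs]

    def shift(xs: List[float]) -> List[float]:
        return [x + 1 for x in xs]

    def reverse(xs: List[float]) -> List[float]:
        return list(reversed(xs))

    def compose(*steps: Primitive) -> Primitive:
        # Chain primitives left-to-right into a single program.
        def program(xs: List[float]) -> List[float]:
            for step in steps:
                xs = step(xs)
            return xs
        return program

    # An agent's candidate "program" is just some sequence of primitives:
    candidate = compose(double, reverse, shift)
    print(candidate([1.0, 2.0, 3.0]))  # [7.0, 5.0, 3.0]

Even with primitives this simple, a long enough composition stops telling a human-readable story, which is exactly the interpretability issue above.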

I was stuck on this issue for quite a while, in the weird place where you know it's a problem but you don't know where to look for sagely advice or how hard the problem actually is.

I was recently reading Bret Victor's thoughts on Learnable Programming and I realized that the uninterpretability of the things neural nets create isn't a fundamental problem with symbols: it's a damning critique of our programming languages and programming environments. If we can't look at a program composed by a neural network and fairly quickly learn what's going on (or have a clear path to doing so), that's a fault of our programming language and programming environment, not necessarily a limitation of our abilities.

One of my past jobs was reading old C++ military code, writing unit tests for it, finding bugs in it, and fixing those bugs. Reading code is a skill one can learn, but it's greatly helped by the fact that the code we read is constrained to what the human who wrote it could understand. You still get horrible messes occasionally that take some time to untangle, but most of the time things are laid out fairly nicely. Yet this isn't because the programming language encourages that design. If you sampled randomly from the space of all valid (or even useful for real-world problems!) programs, most of them would be horrible gibberish that takes a significant amount of time to decipher. And still, even with the help of reading code written by other humans with shared cognitive reference frames, reading code requires understanding the control flow, call trees, object dependencies, etc. - holding lots of things in your head at a time or writing them out on paper for reference. It's no wonder that neural networks come up with uninterpretable programs: that's just the easiest way for them to code in the languages we gave them. An optimization process that isn't constrained to think like humans (and doesn't have human background understanding) will mostly write messy, uninterpretable code when given languages like C++, assembly, or Python. And then, it doesn't help that our tools for interpreting what code is doing are lacking.
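As a toy illustration of that point (an invented Python example, not taken from any actual learned program): the two functions below compute exactly the same thing, but nothing about the language favors the version a human would write over the one an unconstrained search might stumble into:

    # Toy example: two behaviorally identical programs.
    # A human would probably write something like this:
    def average(values):
        """Return the arithmetic mean of a list of numbers."""
        return sum(values) / len(values)

    # An unconstrained optimization process is just as happy with this:
    def f(a):
        b = 0.0
        c = 0.0
        for d in a + a:            # walks the list twice for no reason
            b = b + d * 0.5        # accumulates half of each value, twice over
            c = c + 0.5 + 0.5      # c ends up at 2 * len(a)
        return (b + b) / (c + c) * 2.0

    print(average([2, 4, 6]))  # 4.0
    print(f([2, 4, 6]))        # 4.0

Both pass the same tests; only one of them is something you'd want to read.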

This is analogous to ML developing language: the space of potential forms of communication is very vast, and it's unlikely that ML will converge on the particular form of language developed by humans, because human language was shaped by very particular constraints that may not apply to ML systems.

What's the solution here? Unfortunately, there isn't a silver bullet answer. Programming language design for education is a really difficult problem. Designing these things requires an iterative design process (see chapter 2) where you try some ideas out, see what works, and repeat. Eventually you start to abstract out general design principles that you can sometimes reuse, but more often you just develop a sort of "sixth sense" for what works and what doesn't; ultimately you still need to explore and tinker.

But still, now there are directions to explore and sagely wisdom to borrow. It's no longer some mysterious problem: it's a well-studied problem with lots of good advice from people who have already trudged the difficult path. Most likely, only some of the design principles will apply (we are no longer designing programming languages to be used and read by humans; we are designing them to be used by ML and read by humans), so again, iterative design is really important here. But it gives us a way forward, which is progress :)
