Progressing on Open-Ended Interpretability through Programming Language and User Interface Design

Part of the reason that open-endedness research is so difficult is that many of the things AI agents create are weird. They aren't constrained to our ways of thinking, so they'll find solutions that are really hard for us to understand. This makes it very difficult to diagnose where things have gone wrong, because you can hardly even tell what is going on. Maybe a system developed an entire culture and simulated an entire universe inside that uninterpretable mess of symbols. Maybe it's just exploiting noise and random combinations because that's easier than designing small things and the simulation favors that. When we can't distinguish between these two cases, we have a bad problem and our research cannot go to space today. So how do we fix this? One answer is to just try and figure out what's going on anyway. I think that's a valid approach, and some of that will be needed. See the fantastic Circuits/Reverse Engineering Neural Networks work for some examples…

Current projects

I'm generally working in the area of trying to merge open-endedness, artificial creativity, and complex systems. I may update this post occasionally as I learn things, and will probably shift projects occasionally if things don't pan out. I'll try to have a policy of openly sharing my ideas because I don't care if I'm scooped. I just want cool stuff to be built; I don't care who does it (though don't worry, when I collaborate with others I don't use this policy, because I understand others feel differently). My goal is to figure out if there's a way to make systems generate open-ended content that seems "human culture-like" in its form. While I like Stanley's approach and will use many of the insights of POET and its follow-ups, body locomotion alone seems like it'll cap out in a way that's distinct from culture. Maybe there's a way to transfer via some reduction, but I'd like to see if it's possible to push…

Open-ended concept creation through scaffolding

(The scholars program is over, but I'm going to continue doing independent AI research for at least a year as I continue to build my research chops. My posts might be less frequent, as I also spend some of my time now doing game design; see my other blog.) Most of my research direction is trying to determine if it's possible to create a language model with very little (or no) training data. Ideally, it should still obey scaling laws. If we could do this, we could arbitrarily scale our models to continually increase capabilities, without being limited by the availability of data. Yet, in some ways this is nonsensical. The set of all distributions is very large, so a single model cannot "learn" them all. Thus, it must focus on learning a subset of all distributions. Somehow we'd need a way of specifying a distribution that isn't so large as to be intractable, while still containing human language and most of the things we care about. And even if we did that…

OpenAI Scholars Project

My blog post is split up into three parts:
- Literature Review (incomplete; this is a very large field of study): Lenses of Analysis on Opinion Spread. This post is an older version that contains more detailed discussion and explains why I didn't include some things (like advertising), for those who are interested.
- Experiments and Analysis Section: Training models on their own outputs.
- Speculative Section: What we'll (probably) see before we build artificial cultures. This is just some research ideas and thoughts I had after doing lots of reading; take them with a grain of salt.

Training models on their own outputs

Let's start by considering the question: what is the relationship between model inputs and model outputs? We can imagine three categories settings might fit into:
- General setting: Model outputs are used as later model inputs. More generally, there's some map from model outputs to model inputs. For example, model actions -> next world state (in RL, the next Go board position, etc.), model output distribution -> generated text, etc.
- Fixed input set: While the set of inputs is fixed ahead of time, the model can influence which inputs are chosen. Call the thing that decides inputs based on model outputs the environment. Active learning and auto-induced distributional shift seem to fall into this category.
- Input order unaffected by model outputs: (input, model output) pairs are used as targets for training, but model outputs don't lead to different inputs, and the input distribution is fixed and unaffected by the model. This is a standard learning setting…
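To make the "general setting" concrete, here's a minimal sketch of a feedback loop where each model output becomes the next model input. The `toy_model` and `environment` functions are hypothetical stand-ins of my own invention, not anything from the post; the point is only the shape of the loop.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def toy_model(x):
    # Hypothetical stand-in for a model: shrinks its input
    # toward zero and adds a little noise.
    return 0.9 * x + random.uniform(-0.1, 0.1)

def environment(output):
    # In the general setting, some map turns model outputs into
    # the next input (e.g. generated text fed back in, or an
    # action producing the next world state). Here it's identity.
    return output

x = 1.0
trajectory = [x]
for _ in range(10):
    y = toy_model(x)    # model output
    x = environment(y)  # output becomes the next input
    trajectory.append(x)

# Because the map is a contraction, the loop drifts toward the
# noise floor instead of exploring new inputs.
print(trajectory[-1])
```

Even this toy version shows why the general setting is tricky to analyze: the input distribution the model sees at step t depends on the model itself at all earlier steps, so it is not fixed ahead of time the way it is in the standard learning setting.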

What we'll (probably) see before we build artificial cultures

Introduction

A common research goal in AI is to build a virtual society that is as complex, rich, and interesting as human society. Human culture seems to have this really cool open-ended nature that we haven't been able to replicate in silicon yet, and we'd like to do that. Right now, the research has mostly been around four things:
- Grounded language
- Variants of cooperative and competitive games
- Making agents learn in more complex environments
- Teaching agents to learn learning algorithms (meta-learning)
However, I wanted to get a better picture of what it'll look like when we start getting closer to human culture. Of course, we can't know the answer, and AI tends to break our expectations. Still, I'd like to form a better hypothesis* to try and guide my future research in this direction. The most natural thing to do is to look at what happened for humans. This is where theories of the evolutionary history of culture come in. There's a pretty big debate…

Continuing building the research mountain

(Not my biweekly post, just some thoughts I've had about research.) Previous posts: Beware the important problem: aka Scope Creep in Research, Letting the problem shape the direction you go, and the first part of Research Processes and Inductive Linguistic Biases. My outside view of research was that it was like "slowly chiseling away at problems, breaking off pieces of them until eventually we break down the whole thing". But these days, I feel like a much better analogy is "building mountains". We have some really high-up cloud we are trying to reach, in a pretend world where clouds are fixed and don't move. We can't jump up there immediately (there's no viable approach), so we need to start with some simple subproblem. It makes sense to pick the smallest subproblem you can find, a minimum viable product, some tiny speck of cloud slightly off the ground. You build a small mound around that research direction, and now you can reach a little higher…