OpenAI Scholars Project

My blog post is split into three parts:

- Literature Review (incomplete; this is a very large field of study): Lenses of Analysis on Opinion Spread. This post is an older version that contains more detailed discussion, including why I didn't include some things (like advertising), for those who are interested.
- Experiments and Analysis Section: Training models on their own outputs.
- Speculative Section: What we'll (probably) see before we build artificial cultures. These are just some research ideas and thoughts I had after doing lots of reading; take them with a grain of salt.

Training models on their own outputs

Let's start by considering the question: what is the relationship between model inputs and model outputs? We can imagine three categories that settings might fit into:

- General setting: model outputs are used as later model inputs. More generally, there's some map from model outputs to model inputs. For example, model actions -> next world state (in RL, the next Go board position, etc.), or model output distribution -> generated text.
- Fixed input set: while the set of inputs is fixed ahead of time, the model can influence which inputs are chosen. Call the thing that decides inputs based on model outputs the environment. Active learning and auto-induced distributional shift seem to fall into this category.
- Input order unaffected by model outputs: (input, model output) pairs are used as targets for training, but model outputs don't lead to different inputs, and the input distribution is fixed and unaffected by the model. This is a standard learning setting. B...
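To make the general setting concrete, here's a minimal toy sketch (my own illustration, not an experiment from this post) of a model whose outputs become its later training inputs: a one-parameter "model" that estimates a mean, samples noisy outputs around its current estimate, and then retrains on those samples. All names here (`generate`, `estimate`, `history`) are hypothetical.

```python
import random

random.seed(0)

def generate(estimate, n=100, noise=1.0):
    # Model "outputs": noisy samples drawn around the current estimate.
    return [estimate + random.gauss(0, noise) for _ in range(n)]

estimate = 10.0   # initial learned parameter
history = [estimate]
for step in range(50):
    samples = generate(estimate)             # outputs are produced...
    estimate = sum(samples) / len(samples)   # ...then used as training inputs
    history.append(estimate)

print(history[0], history[-1])
```

Because each round of "training data" is generated from the model's own previous estimate, the parameter performs a random walk driven entirely by its own outputs: a trivial instance of the feedback loop the general setting describes.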