Feedback Loops in Opinion Formation

The particular problem I want to try and approach is "how do we talk about systems changing human values and opinions, through the feedback loops that exist in those systems?"

(note: since writing this post, I realized how much Scope Creep matters in research, and my project has been pruned down to "Training Machine Learning Models on data produced by those models")

This post is a summary of my initial thoughts on the problem, and some of the directions I plan on taking my research during the next two months. Nothing here should be read as conclusive or final; this is mostly documenting my thoughts in progress.

Research Directions


Here are a few ways I can think of to approach this problem, alongside their pros and cons. I know pro-con analysis isn't great, but in this case it seemed like the most sensible way to present things.

Making a formal model of human cultural evolution

Pros:
- Allows running intervention experiments to verify hypotheses
- Can help pave the pathway towards deeper understanding of culture
- May be useful for data augmentation in reinforcement learning, in a similar way to curiosity-driven learning
- Really cool

Cons:
- Very hard problem: it's an obvious direction that lots of people are already working on, and it probably requires lots of compute to get working
- Might be missing out on some details that exist in human culture, and so intervention experiments may only be somewhat informative
- Arguably, any model that is sufficient will have emergent behavior: "complex behavior that is not obvious by just looking at the individual parts". This means that having fully specified system dynamics would not be sufficient to give us a detailed understanding, and we'd still have to do the "slow walk upwards" of positing theories, finding where they work and don't work, and further refining the theories.


Making a data-driven simulation of human internet interactions


Pros:
- Modeling something helps give a more detailed understanding of the underlying dynamics

Cons:
- Agent-based models seem to require lots of parameter tuning
- Requires lots of scraping data and data cleaning
- Modern approaches use neural nets, which require better interpretability techniques before they can give much insight
- May not be robust to distributional shifts, and distributional shifts of opinions are exactly the thing I'm trying to model
- Lots of existing research already


Studying simplified formal systems using modeling and game theory

Pros:
- Can do lots of formal analysis and gain deep insight that might be applicable to the real world
- There's some relevant prior work already, but still plenty of places to explore
- Once you formalize a setup, you can analyze places where it's missing important details and refine it
- May help inform approaches to "Making a formal model of human cultural evolution"

Cons:
- By definition, will be missing out on pieces that might be relevant for understanding the real world
- Might end up giving bad recommendations if the model is too abstract
- Agent-based models exist and require lots of parameter tuning to get the dynamics right. Formal analysis is very dependent on the choice of agent-based model, and insights should be verified against human data. This implies that a large amount of the analysis might end up being useless because it studies a section of the parameter space that isn't relevant in practice.

The particular things I want to model are:
- Opinion spread
- Interaction of recommendation systems with opinion spread


Looking for underlying "Laws" in opinion formation data that can be used to validate models, see how well existing models capture known laws, or measure the variables of laws in different social contexts


Pros:
- There are a lot of models, and "laws" not captured by existing models provide new opportunities to improve modeling
- Models that preserve known laws can stay simple enough to be analyzable, as opposed to "fitting to real world data"
- Insight into how well various laws apply to new datasets like "The Pile" could be useful to inform future modeling efforts (though I'd have to tweak the extraction scripts to also record time data)

Cons:
- Finding new laws is unlikely (someone would probably have found them already), but the other ideas here are still valid to pursue
- Data extraction is a pain, so it's probably good to use existing datasets


Studying what happens when machine learning models are trained on their own outputs

Pros:
- Pre-trained nets seem to be "mode covering", so in a way they model the distribution of "opinions" on the segment of the internet they see
- While it is experimental, the results here are likely to be relevant for other models
- Allows for easy testing of "what happens when you have x% of the data generated by models" and "what happens when you have the temperature at p", both of which are very relevant for real world situations
- Tweaks to this framework can help model disinformation risks

Cons:
- May not capture the "viral spread" of ideas in the way that happens in reality. In principle this could be addressed, but isn't captured in the naïve model

Generally I'm thinking that "Studying simplified formal systems using game theory" and "Fine-tuning neural networks on their own outputs" seem like the two directions where I have the most potential to contribute in the limited time remaining in the Scholars program, especially since they are closest to the expertise I already have.

Studying simplified formal systems using game theory

Models of shaping public opinion

There seems to be some literature from Chinese government labs, and admittedly some of it makes me nervous. For example, Evolutionary Game Model of Public Opinion Information Propagation in Online Social Networks has some unsettling remarks about how it's the government's role to "guide public opinion and purify cyberspace (reducing the number of negative public opinion in OSN)". It's clear that the Chinese government is hoping to use research about opinion formation to improve its ability to control free speech.

However, this kind of research is also really important to help us understand the formation of consensus and the dynamics of opinion formation in settings where we have free speech. For example, see Algorithmic bias amplifies opinion polarization: A bounded confidence model. So I think unfortunately this is just one of those "dual use technologies", which seems to be pretty common in AI research.

I'm also less concerned about the government's ability to manipulate public opinion after reading advertising literature. In theory, advertising agencies' entire purpose is to manipulate public opinion in favor of their product, to get consumers to buy it. Managing Brands in the Age of DIYBranding: The COBRA approach gives a fairly good summary of the history of advertising:
- Before we had decent communication technology, advertisers had a lot of say in the public exchange of information, and consumers were mostly passive. This led to a model of the "dumb consumer", where advertisers were aiming for stimulus-response-type conditioning.
- Once we started to get decent communication technology, consumers took a more active role in the production of brand image. Not only were consumers more informed, they also began to play an active role in shaping brand image, and the stimulus-response approach became outdated and ineffective.
- Because advertisers now have very little control over what consumers do, the new approach, especially on social media, is to maximize brand engagement. 

It seems sensible that as consumers become more informed, they'd flock to the highest-quality products, judging by some weighted sum of the attributes they care about. Quality is certainly a component of how advertising works, but it's not a very big one. My impression is that low quality can disqualify a product, but past a "minimum bar" other aspects of advertising matter more.

Because advertisers naturally want to have control over consumer behavior and brand perception, their view of consumers is essentially a "social immune system": they use some method of advertising, it works for a while, but eventually people develop a resistance to that method and they have to shift to another one. I do worry that machine learning might be able to "continually create new methods before we have a chance to develop immunities"; it's unclear to me how quickly the "social immune system" can adapt. For example, it seems plausible that the competitive, adversarial nature of marketing may lead to AI developing social immune responses as well as methods of advertising, so perhaps our cultural development would just speed up (note this argument doesn't apply to governments).

Anyway, a more realistic approach is just giving up control of brand image and letting the consumers shape the story. This is why the focus has shifted to maximizing consumer engagement with the brand. This can happen through setting up things that encourage users to produce content themselves, but the nuances of doing that properly are really tricky, because ultimately companies want to retain control and bad PR is easy to stumble into. Still, I feel like this is something that Blaseball did really well.

Anyway, the takeaway is that shaping public opinion is very difficult, at least with our current technology. I think that with most formal models, these effects can mostly be abstracted away, so advertising literature ended up being less helpful than I expected.

Cultural Evolution and Models of Opinion Dynamics

Currently, I'm in the stage of "okay, I understand the tools and methods alright, but still haven't quite formulated a concrete problem", so formulating one is the next step. Here's the relevant reading I've been working through:

Opinion dynamics: models, extensions and external effects gives a great summary of many models of opinion dynamics. I've also been working through A Tutorial on Modeling and Analysis of Dynamic Social Networks. Part I (see also part II).

Aligning Popularity and Quality in Online Cultural Markets, Disentangling the Effects of Social Signals, and Popularity Signals in Trial-Offer Markets have some techniques of feedback analysis that seem useful, though opinions are more "fuzzy" and harder to measure than markets.

Degenerate Feedback Loops in Recommender Systems and Feedback Loop and Bias Amplification in Recommender Systems are decent starts in discussing feedback loops in recommendation systems.

Cultural Evolutionary theory has been really insightful. A systems approach to cultural evolution is getting at this point of "maybe a single model can capture a wide range of dynamics, and some dynamics only emerge when you study how simple systems interact".

Evolution Without Variation and Selection and Modeling Discontinuities in Cultural Evolution study Reflexively Autocatalytic and Food set-generated (RAF) networks. Chasing the tail: The emergence of autocatalytic networks has a good summary of the theory behind these models. The high-level picture is that cultural development has these "spikes" where some new thing is developed and causes a trickle-down effect, changing a lot of other things. I always explained these spikes as "escaping a local optimum", and while that is sorta true, there's a bit more nuance here that "escaping a local optimum" doesn't cover.

The insight of RAF networks is that the changes that "made a big difference" are new technologies that caused other changes that "catalyzed" more use of the new technology. For example, consider the invention of the car. As the technology started being used more, infrastructure started being built, which further incentivized more people to use cars. The "initial bump" of making a car and building the initial technology was hard, but once you had cars there were eventually feedback loops that led to more cars (I get this is oversimplified, and car companies actively sabotaged other industries in some cases, but as a high-level picture it's fairly accurate).

RAF networks arose in origin-of-life research, where researchers were trying to understand how a chemical soup could result in complex molecules. It turns out there are "self-catalyzing networks", where the initial reactions to make some complex molecule A are slow and happen rarely, but once you have A it catalyzes making some molecule B, which then catalyzes production of A, which catalyzes more B, and so on (for those unfamiliar, "catalyze" here just means "speed up production of"). This neatly captures the idea of an "initial difficult bump" that is then self-perpetuating. These self-catalyzing loops are what RAF networks study, and there are a lot of cool complexity-theory results covered in the Chasing the Tail paper linked above.

I think RAF networks are relevant because they seem promising as a way of modeling the cultural drift of opinions over time, but it's possible there are simpler models. For example, Taylor's law in innovation processes studies Zipf's law, Heaps' law, and Taylor's law, and shows how they can all be modeled in a fairly simple urn setting. I really appreciated this paper because it helped make the point that you can cut through a lot of the modeling noise by making sure that models meet the "minimum bar" of matching these fairly simple empirical laws we see in real-world systems.

Finally, Cumulative culture and complex cultural traditions made this important distinction that when we talk about "cultural evolution" we are actually mixing up 4 separate concepts: adaptiveness, complexity, efficiency, and disparity.

I'm learning that when doing research, one can end up doing tons of reading and very little actual research (I still have lots of reading I want to do in psychology and neuroscience on this topic!). I fall into this trap because I've had plenty of times where I spent a few weeks on some problem, then did a little more thorough searching and found that someone had already done the thing I was working on in much more detail. Still, it's a tradeoff, and reading takes time too, so I'm learning that eventually you just gotta start tinkering with things. The simpler the setup, the more okay it is to tinker, I think. Also, if you're short on research time, sometimes you just have to accept that you might miss stuff and that's okay.

In the spirit of trying things, before I did a lot of reading I tried out a simple opinion model where we have K users, each with a 2D opinion vector. During each step, each user a is randomly paired with another user b. Initially I tried just using the dot product a · b: if their dot product is negative they get pushed apart, and if it is positive they get pulled together. This led to all of them diverging to infinity, so I clamped their magnitudes at 1.0. That led to "polarization", where users would group at two separate points, but which points those were depended on initial conditions.

There's a generalization of the dot product called a "bilinear form", where you compute a M b^T for some matrix M. Depending on M, I found three different kinds of behavior (sketched in code below):

- Polarization
- Centrism
- "Chaos", where they chase each other around in circles



Anyway, that's just a fun toy model; after doing more reading, I'm not sure it's actually useful for opinion modeling. In general, I'm still very much in the brainstorming/learning stage here, and will be posting updates as I make this part of my direction more concrete.

Training Machine Learning Models on data produced by those models

I haven't really found any literature yet on this topic, so instead I've just started doing some analysis and experiments.

There are three categories of things to look at here:

- Classifiers
- Continuous generative models
- Discrete generative models

Classifiers

Starting with a linear binary classifier, I can do some detailed experiments and analysis. Here's how the setup works:

1. The classifier will be "dot product with a vector, the label is the sign" (linear classifier with no offset)
2. Generate data points sampled from Normal(0, 1) in D dimensions.
3. Generate a random vector as our "ground truth". Make N labeled points, and use this to train a classifier.
4. Use that classifier to produce more data.
5. Train a new classifier on that data
6. Go to 4 and repeat

This captures the notion of a model being trained on the data it outputs. The high level picture of the behaviour is fairly simple: each retraining incurs some error, dependent on N and D, and that error is how far off the new model is from the one that labeled its data. So each step is one step of a random walk, with step size sampled from that error distribution.
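Here's a minimal sketch of that loop. The post doesn't pin down the training procedure, so the least-squares fit below is an assumed stand-in for "train a classifier":

```python
import numpy as np

def angle(u, v):
    """Angle between two vectors, in radians."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def self_train(D=2, N=1000, steps=10_000, seed=0):
    """Repeatedly retrain a no-offset linear classifier on its own labels."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=D)   # ground-truth classifier
    w = w_true.copy()
    drift = []
    for _ in range(steps):
        X = rng.normal(size=(N, D))                # fresh Normal(0, 1) points
        y = np.sign(X @ w)                          # labeled by the current model
        w, *_ = np.linalg.lstsq(X, y, rcond=None)   # retrain on those labels
        drift.append(angle(w, w_true))              # angle from the true classifier
    return drift
```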

(Figure: x axis is steps, y axis is the angle between the true classifier and the current classifier, in radians; 0 is the same vector, π is completely flipped.)

That picture was with N=1000 points at each step. If you only have N=10 points at each step, your error is larger so your step size is much higher.


Instead of throwing out the data after each step, if you keep adding it to a dataset that grows larger at each step, you get a random walk with decaying step size (this used N=10, because with N=1000 the dataset gets too large to run 10000 steps).

Those results were in 2D. As dimension D gets larger, your error is larger and you'll need a larger N.

(Figure: x axis is N, y axis is the difference in angle after 1 step, plotting the mean over 1000 different random initializations.)

I'm currently working on analytic results to formalize the behaviour seen in the above experiments.

Two more things to note:

- If you generate a portion of the data at each step from the very first model, the random walk will have a "bias" towards reality. For very small amounts of data (for the linear classifier, just 1/100 is enough) this is sufficient to "pull" the random walk towards the correct place, so it rarely diverges very far before coming back. Of course, the larger the step size, the more chance it has of going far away before coming back, so the total amount of data (which determines step size) and the proportion of correct data (which determines the "pull towards correct") both matter. A sketch of this variant follows after these notes.

- A more general form of the above setting is multiple models producing data, each of which has a different step size. One way you can do this is by having each model i sample some proportion p_(i,j) of its data from model j. Fixing the original model is just a step size of 0. You can analyze this by looking at the center of mass of the models (weighted by how much data they each contribute). If all step sizes are > 0, it'll still be doing a random walk, and the same analysis applies as above.
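As a sketch of the anchoring idea from the first note (reusing angle() from the sketch above), the only change is that a fraction of each step's labels come from the original model; frac_true is a placeholder name:

```python
import numpy as np

def self_train_anchored(D=2, N=1000, frac_true=0.01, steps=10_000, seed=0):
    """Same loop as above, but a fraction frac_true of each step's labels
    come from the ground-truth vector instead of the current model."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=D)
    w = w_true.copy()
    drift = []
    n_true = int(N * frac_true)
    for _ in range(steps):
        X = rng.normal(size=(N, D))
        y = np.sign(X @ w)
        y[:n_true] = np.sign(X[:n_true] @ w_true)   # the "pull towards correct" portion
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        drift.append(angle(w, w_true))
    return drift
```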

Continuous Generative Models

I'm going to use a simple "fitting a Gaussian" model. I expect very similar dynamics to the linear classifier (a random walk with step size sampled from the distribution of estimation errors), but I'll be trying this out in the next few days. Because this is a random walk, we would expect to be roughly sqrt(n) step-sizes away after n steps.
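Here's a minimal sketch of what I have in mind, fitting a 1D Gaussian by maximum likelihood and resampling from the fit at each step (the dimensionality and fitting choices are placeholders, not results):

```python
import numpy as np

def gaussian_self_fit(N=100, steps=10_000, seed=0):
    """Repeatedly refit a 1D Gaussian to samples drawn from the previous fit.

    Each refit incurs estimation error of order 1/sqrt(N), so the mean should
    do a random walk with roughly that step size.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0            # the "true" starting distribution
    drift = []
    for _ in range(steps):
        x = rng.normal(mu, sigma, size=N)   # sample from the current model
        mu, sigma = x.mean(), x.std()       # refit by maximum likelihood
        drift.append(abs(mu))               # how far the mean has drifted
    return drift
```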

Discrete Generative Models

The simplest setting is probably Pólya urn models, which are heavily studied.
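For reference, the classic two-color Pólya urn is only a few lines: draw a ball uniformly at random, then return it along with one more ball of the same color. The color fractions converge, but the limit depends heavily on the early draws. This is just the textbook model, not something specific to my setup:

```python
import numpy as np

def polya_urn(steps=10_000, start=(1, 1), seed=0):
    """Classic two-color Pólya urn: draw a ball, put it back with another of the same color."""
    rng = np.random.default_rng(seed)
    counts = np.array(start, dtype=float)
    for _ in range(steps):
        i = 0 if rng.random() < counts[0] / counts.sum() else 1
        counts[i] += 1
    return counts / counts.sum()   # fraction of each color after all draws
```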

However I think n-gram models are more relevant to the idea of "language models training on data produced by other language models", so I'm going to be studying those.

There are a few non-trivial decisions to make here that can matter for the outcome:

How do you generate the data?

I'm assuming that the model is originally given data that looks like this

<START> a a b a <END>

<START> a b a c <END>

<START> a b c d <END>

etc.

For bigrams I do exactly this. In general, for n-grams, I prepend (n-1) <START> tokens. I think it's more useful to think in terms of "context length" rather than n-grams, where a bigram model has context length 1; then I can just say that there are contextLength <START> tokens.

I have the <START> and <END> tokens because this is realistic to how language models are fed data in the real world, and also because it lets me avoid questions about where to start generation.

However, when generating data from the model, there's the issue of not being able to control how long the outputs are. There are a couple of different options:

- Truncate the generated data at some fixed length.

There are two decisions to make here:

- What happens if you get an <END> token before you reach that length?

APPEND: You could start generating a new sequence (starting with <START> tokens) and append it to your sequence.

END_EARLY: You could just allow the generated sequence to be shorter. To keep roughly the same amount of data at each step you can just generate sequences until the number of total characters you have generated is roughly what you wanted.

If you think about it, these two solutions are basically the same. With END_EARLY, we are generating stuff until we have about as many characters as we want. With APPEND, we are generating stuff until we have as many samples as we want, each of some length (so, as many characters as we want). From the model's perspective, both settings look the same, because the <START> tokens prevent it from seeing content from the previous sample. However, APPEND ends up needing to truncate more (see the bullet point below), so I prefer END_EARLY.

- What happens if you never get an <END> token but you have reached your target length?

Appending the <END> token after truncation is not a good idea, because it adds edges to <END> that didn't exist in the original graph, so the analysis gets much messier and I think it's less reflective of reality: bot comments might be cut off at the end of a sentence, which is a plausible <END> boundary, but it seems unlikely they would often be cut off in the middle of a word, for example.

If you don't append the <END> token after truncation, your learned model at the next step can end up with "sinks" that don't end in the <END> token. This happens if the only time it generated a string like "a b b" in a dataset is at the end of a truncated sequence. This isn't technically a problem, but it makes the analysis more messy when all these sinks start popping up, so it's better to avoid them.

You could just reject the sequence and generate a new one. This seems like the best solution to me. Just make sure that the "rejection size" is large enough that you don't significantly bias the sampling. This is another good reason for using END_EARLY.

So in summary:

- Start with a contextSize <START> tokens, and generate text until we reach an <END> token

- If the generated sequence is longer than some maxSize (maxSize is roughly the size of the largest input in originalDataset times 10, so the vast majority of sampled sequences aren't rejected), throw it out.

- Keep generating sequences until number of tokens generated is >= dataSize

This will have some variance in length per run, but for large enough dataSize this should be negligible.
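Here's a rough sketch of that sampling strategy for a bigram (contextSize 1) count model. The counts structure (a dict mapping each token to a dict of next-token counts) and the names max_size/data_size are my own placeholders:

```python
import random

def sample_sequence(counts, max_size):
    """Generate one sequence from <START> until <END>; return None (reject)
    if it exceeds max_size tokens, so the caller can resample."""
    seq, token = [], "<START>"
    while True:
        options = counts[token]
        token = random.choices(list(options), weights=list(options.values()))[0]
        if token == "<END>":
            return seq
        seq.append(token)
        if len(seq) > max_size:
            return None

def generate_dataset(counts, data_size, max_size):
    """Keep sampling sequences until at least data_size tokens have been generated."""
    sequences, n_tokens = [], 0
    while n_tokens < data_size:
        seq = sample_sequence(counts, max_size)
        if seq is not None:
            sequences.append(seq)
            n_tokens += len(seq)
    return sequences
```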

Using this strategy, we can say a few things:

- For any contextSize, after training our first model we can consider the graph where each node is a context (x_1, x_2, ..., x_contextSize) and each edge (x_1, x_2, ..., x_contextSize) -> (x_2, x_3, ..., x_contextSize, x_next) carries the probability of generating x_next given that context. Thus, the behavior doesn't actually change for larger n-grams; the size of the graph just grows. Looking just at the graph structure (which is all that matters), WLOG we can consider bigram models. Higher-order n-grams only restrict the graph so that edges can't exist unless the contexts overlap, and that constraint doesn't apply to bigrams. Of course, when considering how far we have diverged from the "original data distribution" we need to consider what our contextSize is.

- The probability of every edge in our n-gram graph goes on a random walk, up or down. The step size is smaller the more that edge is represented in the data. For every node that has multiple edges going out of it, eventually one edge will walk to probability 1.0 and the others will walk to zero. 

- We can say a little more than that. Every path from <START> to <END> that doesn't visit any nodes more than once has a non-zero probability of becoming the "converged path" that our random walk eventually converges to. Every other edge will be zero, and that converged path will have probability one. Thus, we will converge to determinism at the limit, and prune away everything except that converged path.

- "Not visiting nodes more than once" is important because any "loopbacks" will eventually be pruned away, because there is some point in that loop where a branch happened, and all branches eventually get pruned to a single output edge path.

- The character following any node is only dependent on that node, so at the limit of data the behavior of the random walk is only dependent on that node, nothing upstream or downstream. However, simple paths have deterministic length, while cycles will have some variability in how frequently they show up. This means that the step size can vary (for example, cycles might be overrepresented in the dataset, and thus their step size will be smaller), but I'm still figuring out the precise details of this.

I ran some experiments and verified that this is actually what happens for simple n-gram models. This isn't surprising and wasn't technically necessary, but I think it's good to check. I started with a word-level bigram model trained on Shakespeare, and after 10000 steps it converged to "Please you see then?" and "By being miss'd, I will not wish thee apart Cousin of duty,".
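For reference, the experiment is essentially the following loop (word-level bigrams; the file name and sizes are placeholders, and generate_dataset is the sampler sketched earlier):

```python
from collections import defaultdict

def train_bigram(sequences):
    """Count bigram transitions, wrapping each sequence in <START> ... <END>."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = ["<START>"] + seq + ["<END>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return counts

# hypothetical setup: one word-level sequence per non-empty line of some corpus file
lines = [line.split() for line in open("shakespeare.txt") if line.split()]
data_size = sum(len(line) for line in lines)       # keep roughly this many tokens per step
max_size = 10 * max(len(line) for line in lines)   # rejection threshold

data = lines
for step in range(10_000):      # each step: retrain on the data, then regenerate it
    counts = train_bigram(data)
    data = generate_dataset(counts, data_size, max_size)
```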

The next things to do are:

- This setup assumes that bigram models don't start with any prior over the sample distribution, and only consider the data they see. In other words, it assumes that the probability of any unobserved transition is 0, which is what makes a random walk to 0.0 or 1.0 so final. In reality, models usually start with a random prior on the distribution, so we could model this by giving every edge a small random amount of observations.

This would make "converging to determinism" no longer an issue, and we'd just see a random walk around the state space. For every output edge, it'll just randomly walk around. The distribution could get arbitrarily far from the original distribution, but it won't ever get stuck and (I think?) there will be a non-zero probability of getting back to the right place eventually.

If you give every edge a "fixed probability" then this is equivalent to having some data generated by a uniform prior, and it acts as a "pull towards uniform".

- Alternatively, we could use a prior based on the previous model

This is sort of like the "append" model, but you can do a more general thing. The n-gram model is represented by counts for each edge. We could do some momentum thing, where newCount = p*observedCount + (1-p)*previousCount.

This is probably a more reasonable setting, because we are trying to model how the real distribution will be shifted. Assuming we aren't rounding the counts, this would also prevent things from ever converging to zero. 

Fine-tuning a GPT-2 model on its own outputs is another way of doing this with a more realistic prior.
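Here's a hedged sketch of both prior variants in terms of edge counts (the pseudo-count value and the momentum p are placeholders, and the count structures are the defaultdicts from the earlier sketch):

```python
from collections import defaultdict

def add_uniform_prior(counts, vocab, pseudo=0.1):
    """Give every possible edge a small fixed pseudo-count: equivalent to mixing in
    a little data from a uniform prior, which acts as a pull towards uniform."""
    for a in vocab:
        for b in vocab:
            counts[a][b] += pseudo
    return counts

def momentum_counts(observed, previous, p=0.9):
    """newCount = p * observedCount + (1 - p) * previousCount for every edge."""
    merged = defaultdict(lambda: defaultdict(float))
    for a in set(observed) | set(previous):
        for b in set(observed[a]) | set(previous[a]):
            merged[a][b] = p * observed[a][b] + (1 - p) * previous[a][b]
    return merged
```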

- Consider what happens when a small amount of the data is pulled from the first model at each step (it has some data from the "true distribution")

Does this successfully "anchor" it? It seems like this would prevent converging to determinism (because for any edge in the original model there is a non-zero probability that has some representatives at any time step), but it's unclear how much of the data you need in order to prevent it from becoming too different. The "different step sizes for cycles vs paths" effects mentioned above might also be relevant here, where a small proportion might be enough to anchor the cycles but not enough to anchor the paths.

- Examine how the difference between the model and the actual distribution affects the above behavior. 

If we think of the "true distribution" as some n-gram model (for some n), we can consider smaller or larger n and see what kinds of distortions that gives us. This is one way of talking about what kind of problems we have due to imperfect modeling.

- Try these experiments out with GPT-2

This seems helpful mostly to validate the above assumptions. The nuances of the distribution it has been trained on, how reliable my assumed priors are, and so on all matter.

Technically it would be better to train GPT-2 from scratch at each stage. However, that would take a long time and I'm not sure it's particularly valuable. I think it makes more sense to start from the current distribution and adjust from there.
