In a previous post (https://leakyintegral.blogspot.com/2021/03/decision-making-evidence-everybody.html), we discussed the drift-diffusion model and the experimental evidence found for validating it. Here we can discuss some of the insights I got from the paper "The time course of perceptual choice: The leaky, competing accumulator model" by Usher et al. (https://doi.org/10.1037/0033-295x.108.3.550). This paper and the results presented in it push the discussion on decision making further. But what about time? Whenever an agent has to make a decision, there is a time constraint. Time is limited, and the agent expects a reward. It is the optimization of this reward that makes a decision better. The model we discussed previously was completely silent about these aspects.
First, let's look at what we had discussed in the case of decision making. In that model, we essentially integrate the pieces of evidence up to a constant threshold. Let's ask why that model in the first place. Any reader with even a little knowledge of statistics would agree that the model makes good mathematical sense: the conditional probabilities can be explained quite well, and such a build-up of neural activity is actually observed, as explained in the previous post. It gives an accurate account of the process, and hence it is a widely accepted model.
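To make the picture concrete, here is a minimal sketch (not from the post or the paper; the drift, noise, and threshold values are arbitrary) of accumulating noisy evidence samples until the running total hits a fixed bound:

```python
# A minimal sketch (not from the post or the paper) of accumulating noisy
# evidence to a fixed threshold; drift, noise and threshold are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

def accumulate_to_bound(drift=0.1, noise=1.0, threshold=10.0, dt=1.0, max_steps=10_000):
    """Sum evidence samples until the total crosses +threshold or -threshold."""
    total = 0.0
    for step in range(1, max_steps + 1):
        total += drift * dt + noise * np.sqrt(dt) * rng.normal()
        if abs(total) >= threshold:
            return ("A" if total > 0 else "B"), step   # choice and decision time
    return None, max_steps                             # no decision within the limit

print(accumulate_to_bound())
```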
But, since it is an integral, we can ask how the response changes when the evidence itself changes. Let me make this clearer. Suppose you are looking at a pattern containing N blue and N+1 green dots. While observing it, you come to understand the underlying pattern: if you can recognize that there are more green dots, that is because of the evidence you have accumulated. Now, say the colors are suddenly reversed. Would a model that integrates all those conditional probabilities be fast enough to respond to that change and make a quick, correct prediction? Of course not. When the perceptual evidence changes (in simpler terms, when the system you are looking at changes), a purely integrating model is slow to arrive at the right prediction.
If you consider the equation we posed in the previous post, you can see that we are assuming the sequential samples of evidence are independent; only then can we use such an expression. But for realistic problems, it suffices to say that this isn't completely true: there could be some redundancy among the elements of the evidence set e. To take this into account, we need to extend Bayes' rule so that for the nth piece of evidence we calculate the likelihood of en given A and e1, e2, …, en-1.
log((1-a)/a) < log(P(A)/P(B)) + log(P(e1|A)/P(e1|B)) + … + log(P(en|A, e1, e2, …, en-1)/P(en|B, e1, e2, …, en-1))
But if en is independent of e1,e2,…, en-1, then:
P(en|A, e1,e2,…, en-1)= P(en|A)
That is, if the samples are independent, we can sum the log-likelihoods. This is very intuitive from statistics: when two events are independent, their probabilities can be multiplied to get the probability of both occurring together.
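As a toy illustration (the probabilities and the sample sequence below are assumed, not from the post), summing log-likelihood ratios over independent samples looks like this:

```python
# Toy example (assumed probabilities): with independent samples we can add
# log-likelihood ratios instead of multiplying likelihoods.
import math

p_green_given_A, p_green_given_B = 0.6, 0.4    # assumed values for illustration
samples = ["green", "green", "blue", "green"]  # an arbitrary observed sequence

def log_lr(sample):
    # Log-likelihood ratio contributed by a single sample.
    pA = p_green_given_A if sample == "green" else 1 - p_green_given_A
    pB = p_green_given_B if sample == "green" else 1 - p_green_given_B
    return math.log(pA / pB)

total_evidence = sum(log_lr(s) for s in samples)  # accumulated log evidence for A over B
print(total_evidence)                             # positive values favor A
```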
Consider the case when en is completely predicted by the previous samples. Then we get log(1) = 0, and we are adding zero to the accumulated evidence, as it is redundant:
P(en|A, e1,e2,…, en-1)=1
So, each of the probability terms inside the log should be multiplied by
Mn|A = P(e1, e2, …, en-1, en|A) / (P(e1, e2, …, en-1|A) * P(en|A))
and similarly in the case of B, where Mn|A is a measure of the mutual information between the nth sample and the previous ones.
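Here is a small numerical example (with an assumed joint distribution over two binary samples) of what this correction factor looks like; log(M) is the pointwise mutual information the second sample shares with the first:

```python
# Toy example (assumed joint distribution) of the correction factor
# M = P(e1, e2 | A) / (P(e1 | A) * P(e2 | A)) for a second sample e2.
# M = 1 when the two samples are independent under A; M > 1 means part of
# e2 was already predictable from e1, i.e. the samples share information.
import math

# Assumed joint probabilities of (e1, e2) under hypothesis A, each e in {0, 1}.
joint_A = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}

def marginal(joint, index, value):
    # Sum the joint distribution over the other variable.
    return sum(p for e, p in joint.items() if e[index] == value)

e1, e2 = 1, 1
M = joint_A[(e1, e2)] / (marginal(joint_A, 0, e1) * marginal(joint_A, 1, e2))
print(M)            # 0.40 / (0.5 * 0.5) = 1.6
print(math.log(M))  # the pointwise mutual information e2 shares with e1
```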
What does this suggest? It suggests that we should accumulate evidence only to the extent that it is novel: we should discard the mutual information a sample shares with the previous ones, so that the contributions we add are effectively independent. To implement this, we must account for the dependence between the terms.
Ideally, one should calculate all of this mutual information to get the final result. But as the set of evidence becomes larger and larger, the computational cost grows, and one wouldn't have enough data or experience to estimate all those joint probabilities. So can we really expect the brain to do it in this ideal way? Instead, we can predict the next sample, and if there is a difference between the next sample and the prediction from all the previous samples, it has some degree of novelty. We accumulate only that difference, that is, only what we failed to predict. So we are interested in the change relative to our prediction. But there is also noise in the system, which contributes to this change; that can be filtered out using a low-pass filter. The strength of the novel evidence then decreases as the number of samples increases, and the integration gives a curve that looks much like a saturation curve, with little change after some point. Wait, what were we doing? We were doing a leaky integration!
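A minimal numerical sketch of that idea (the leak rate and input values are arbitrary choices, not from the paper): a leaky integrator forgets a fixed fraction of what it holds at every step, so its output saturates instead of growing without bound.

```python
# Minimal leaky-integration sketch (assumed leak rate and input), showing
# that the accumulated activity saturates instead of growing without bound.
import numpy as np

def leaky_integrate(inputs, leak=0.05, dt=1.0):
    # x[t+1] = x[t] + dt * (input[t] - leak * x[t])
    x, trace = 0.0, []
    for i in inputs:
        x += dt * (i - leak * x)
        trace.append(x)
    return np.array(trace)

constant_evidence = np.full(200, 0.5)   # a steady stream of identical samples
trace = leaky_integrate(constant_evidence)
print(trace[10], trace[-1])             # approaches the fixed point 0.5 / 0.05 = 10
```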
Now, let's ask: why a leaky integral? We have seen that the model has to respond to changes in the system under observation. Compared to a perfect integral, a leaky integral responds more quickly to such changes, and it can still reach the threshold with the help of an urgency signal (discussed below). So we have made the model quick enough to respond to external changes.
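To see the difference concretely, here is a sketch (again with arbitrary numbers) in which the evidence flips sign halfway through the trial; the perfect integrator has to undo everything it has accumulated, while the leaky integrator changes sign within a few time constants.

```python
# Perfect vs. leaky integration when the evidence reverses halfway through
# (arbitrary illustrative numbers): the leaky trace changes sign much sooner.
import numpy as np

evidence = np.concatenate([np.full(100, +0.5), np.full(120, -0.5)])  # reversal at step 100

perfect = np.cumsum(evidence)          # perfect (non-leaky) integration

leaky, x, leak = [], 0.0, 0.05
for e in evidence:
    x += e - leak * x                  # leaky integration: forget 5% per step
    leaky.append(x)
leaky = np.array(leaky)

# First index at which each trace goes negative after the reversal.
print(int(np.argmax(perfect < 0)))     # ~200: needs as many steps as it spent accumulating
print(int(np.argmax(leaky < 0)))       # ~114: adapts within a few time constants
```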
But what about accuracy? The perfect integrating model minimizes the time required to reach the threshold for a given accuracy. But what level of accuracy are we looking for? Is it worth holding out for higher accuracy? Suppose some strange animal is coming to attack you. You need to assess the threat based on perceptual information about an animal you have never seen before. You are 90% confident that it is harmful. Will you wait until you are 95% or 99% confident? Of course not. What is actually happening is a speed-accuracy trade-off that maximizes the reward rate.

Here the reward rate can be written as RR(t) = (p(t)*U - C)/(t + m + d), where p(t) is the probability of the favorable outcome, U is the utility of that outcome, C is the cost, t is the time spent on making the decision, m is the time spent moving, and d is the delay before one can try again.
We have seen what the graph of the probability function may look like. For this expression, we can expect a curve with a peak and a fat right-hand tail, somewhat like a Poisson distribution. So we are interested in the maximum, where the slope of this function is zero.
RR’=0
This gives us:
p'(t)*(t + m + d) = p(t) - C/U
This gives the best time t at which to act. Depending on how hard the task is, we land at different points on the probability curve, which traces out a curve that tells us how long to wait, assuming the agent acts as soon as the activity crosses that line.
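As a purely illustrative sketch, assume the reward-rate form RR(t) = (p(t)*U - C)/(t + m + d) from above, together with an assumed saturating accuracy curve p(t) and made-up parameter values; a brute-force scan over t then finds the waiting time at which the reward rate peaks, which is where the condition above is satisfied.

```python
# Numerical sketch of the speed-reward trade-off. The reward-rate form,
# the accuracy curve p(t) and every parameter value below are assumptions
# made for illustration, not values from the original post.
import numpy as np

U, C, m, d = 1.0, 0.2, 0.3, 2.0            # assumed utility, cost, movement time, retry delay

def p(t):
    # Assumed saturating accuracy curve: rises quickly, then levels off.
    return 1.0 - np.exp(-2.0 * t)

t = np.linspace(0.01, 5.0, 5000)
rr = (p(t) * U - C) / (t + m + d)          # assumed reward rate RR(t)

t_best = t[np.argmax(rr)]
print(round(float(t_best), 2), round(float(rr.max()), 3))  # waiting longer than t_best only lowers RR
```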
But the neural data suggest that the threshold is almost constant, and that there is another signal, which we can call an urgency signal, pushing the neural activity towards the threshold even when the required evidence has not been accumulated. So we actually multiply the result of the leaky integration by the urgency signal.
The model we have discussed is called the urgency-gating model. It uses a leaky integral and is better than the model we discussed previously. Response time really does matter in real-world situations, where we need to consider the reward the agent is getting.
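A highly simplified sketch of this scheme (assumed parameters; the published urgency-gating model has more details than this): low-pass filter the evidence, multiply by a linearly growing urgency signal, and commit when the product crosses a constant threshold.

```python
# Highly simplified urgency-gating sketch (assumed parameters, not the
# published model): a low-pass-filtered evidence trace is multiplied by a
# linearly growing urgency signal and compared to a constant threshold.
import numpy as np

rng = np.random.default_rng(1)

def urgency_gating(evidence, leak=0.2, urgency_slope=0.05, threshold=3.0, dt=1.0):
    x = 0.0
    for t, e in enumerate(evidence, start=1):
        x += dt * (e - leak * x)           # leaky (low-pass) filtering of the evidence
        gated = x * urgency_slope * t      # multiply by an urgency signal that grows with time
        if abs(gated) >= threshold:
            return ("A" if gated > 0 else "B"), t
    return None, len(evidence)             # no commitment within the trial

weak_evidence = 0.2 + 0.5 * rng.normal(size=500)   # noisy samples with a small positive mean
print(urgency_gating(weak_evidence))
```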
References:
[1] Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108(3), 550-592. https://doi.org/10.1037/0033-295x.108.3.550