AI and Decision Making
One common thread in Data Science: The Hard Parts is that data science creates value by improving our decision-making capabilities. While I now take this premise for granted, it took me some time to fully understand its power and generality. In this post I will argue why this is indeed the case, and reevaluate its relevance in the age of Generative AI.1
LLMs, prediction and decisions
To motivate the discussion, let’s see how large language models (LLMs) work. To be clear, I will be talking exclusively about autoregressive language models like GPT-4, Llama-2 or PaLM-2.
With these models you try to predict the next word in a sentence given the context provided by previous words:2

p(w_t | w_1, w_2, …, w_{t-1})
It’s quite astonishing that from this relatively simple prediction problem we get impressive performance in tasks ranging from text understanding and generation to reasoning, problem solving and so much more.
But what matters for this discussion is that underlying GPT-4 there is a prediction model. Moreover, to move from GPT-4 to ChatGPT we transform this prediction into a decision by choosing the word with the highest probability.
For instance, in the sentence “To become a data scientist I need to learn _____”, you may end up with the following candidate words and probabilities:
programming (35%)
statistics (22%)
ML (17%)
analytics (7%)
visualization (3%)
(other words) (16%)
These probabilities are obtained from the model, and ChatGPT then replaces the blank with the most likely word (“programming”). It then repeats the process many times, each time using the updated context to come up with a new word, thereby creating full sentences, paragraphs and even books.
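To make this concrete, here is a minimal sketch of this greedy decoding loop; the probability table is made up to mirror the example above, whereas a real LLM recomputes a fresh distribution over its whole vocabulary after every new word:

import numpy as np

# Made-up next-word distribution for the example prompt above
candidates = ["programming", "statistics", "ML", "analytics", "visualization"]
probs = np.array([0.35, 0.22, 0.17, 0.07, 0.03])  # remaining 16% covers all other words

# Greedy decoding: turn the prediction into a decision by taking the argmax
next_word = candidates[int(np.argmax(probs))]
print(next_word)  # programming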
This trick of transforming a prediction from a classification model into a decision arises very naturally in many other settings. In Chapter 14 of Data Science: The Hard Parts I use it to show how smart thresholding (of the predicted score) and confusion matrix optimization can quickly lead to data-driven decisions.
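Here is a minimal sketch of that idea, not the book’s exact code: assuming made-up rewards for each cell of the confusion matrix, you can scan candidate thresholds and keep the one with the highest total reward:

import numpy as np

# Hypothetical rewards for each cell of the confusion matrix
REW_TP, REW_FP, REW_FN, REW_TN = 10.0, -2.0, -5.0, 0.0

def best_threshold(scores, labels):
    """Pick the score threshold that maximizes the total realized reward."""
    def total_reward(thr):
        pred = scores >= thr
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        return tp*REW_TP + fp*REW_FP + fn*REW_FN + tn*REW_TN
    return max(np.linspace(0, 1, 101), key=total_reward)

# Toy data: scores from some classifier, correlated with the true labels
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = np.clip(labels*0.3 + rng.random(1000)*0.7, 0, 1)
print(best_threshold(scores, labels))

Note that once the rewards are asymmetric, the optimal threshold is rarely the default 0.5, which is the whole point of optimizing over the confusion matrix.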
What’s in a decision
All decisions have a common structure, which includes the set of alternatives to choose from (known as the choice or action space), and a method to rank these alternatives (known as a reward, utility or loss function).
A third ingredient is the state of the world in which the decision is made. A textbook example is whether or not to take an umbrella before leaving home for the office. The value of this choice depends on whether it rains or not (the states): your reward depends on both the state of the world and the action taken, since if it rains you’re better off having an umbrella, but if it doesn’t, you’ll have to carry it around all day. Unfortunately, the weather is unknown to you, so a fourth ingredient is a probability distribution over all relevant states of the world.
To recapitulate, most decisions can be tackled if we have a set of alternatives, the states of the world, the rewards derived from each alternative in each state of the world, and a probability distribution over states.
How should we decide?
This question has been studied throughout the ages, and naturally, each person may come up with a different answer. However, one commonly used method is to choose the alternative that maximizes the expected reward.
For the case of the umbrella, and assuming you have all of the necessary inputs, you can compute the expected reward of each action a (take or leave the umbrella), and choose the action with the highest expected reward:

a* = argmax_a E[R(a, s)] = argmax_a Σ_s p(s) R(a, s)    (1)

where s denotes the state of the world (rain or no rain) and p(s) its probability.
The beauty of this approach is that, in one single equation, it considers both the underlying uncertainty and the rewards. You can then sort all of the alternatives and choose the top one.3 In practice, you have transformed predictions into decisions.
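Here is Equation (1) at work on the umbrella example; the rain probability and the reward numbers are, of course, made up for illustration:

p_rain = 0.3  # assumed probability of rain
# rewards[action][state]: your payoff depends on both the action and the state
rewards = {
    "take umbrella":  {"rain": 5,   "no rain": -1},  # dry, but you carry it all day
    "leave umbrella": {"rain": -10, "no rain": 2},   # soaked vs. unencumbered
}

# Expected reward of each action, then pick the argmax as in Equation (1)
expected = {a: p_rain*r["rain"] + (1 - p_rain)*r["no rain"]
            for a, r in rewards.items()}
print(expected, "->", max(expected, key=expected.get))
# {'take umbrella': 0.8, 'leave umbrella': -1.6} -> take umbrella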
What is machine learning (ML)?
Equation (1) can be operationalized by making a prediction of the expected rewards, and prediction is exactly where ML shines. This is quite handy in the age of big data and vast compute resources, since we can now easily transform better predictions into better decisions.
For instance, just this week, researchers from DeepMind published a paper showing that Graph Neural Networks can be used to make accurate 10-day-ahead weather predictions that are also very fast to compute. You can now quite literally tackle hard problems like the weather-dependent umbrella decision above: you just need probability estimates that can be plugged into Equation (1). Alternatively, you can skip this intermediate step and model expected rewards directly, possibly moving from classification to regression.
Once you start viewing prediction as a natural input for improved decision-making, you will also start finding ML use cases everywhere.
Deep reinforcement learning and sequential decision problems
The link between decision making and machine learning is even more direct with reinforcement learning, where we move from static to sequential decisions. As the name suggests, here you make choices in different time periods. This is a considerably richer setting, where some complex phenomena can arise. For instance, making a decision today can alter the action space you face tomorrow, your beliefs or the expected rewards themselves.
In chess, for example, each player takes turns and makes a decision that changes the future state (current board configuration) and the set of possible actions. Many other games of strategy and video games have a similar property, and thus can all be cast as sequential decision problems. But this also applies to decisions outside of the realm of games, for instance when designing smart robots or autonomous vehicles, when deciding to invest your time and money in learning deep neural networks, or even if you’re considering getting married or having children.
In sequential decisions you use the same approach of maximizing the expected returns, but the possible time-dependence makes the problem substantially more complex.4 The fields of dynamic programming and reinforcement learning were developed to provide solutions to these and other problems.
Just to give you a flavor of the type of complexities that arise in dynamic settings, consider the multi-armed bandit problem. Here you have k slot machines, and in each period you must choose one and only one to play; to make it sequential, you repeat this many times. If you knew the probabilities of winning and the corresponding payoffs, you could apply the same expected-reward maximization procedure as in a one-shot, or static, problem.
However, these are unknown to you, so you must choose between exploration and exploitation. Say you chose the first machine on the first round, and won. Should you keep playing this slot machine for the remaining periods, or should you keep exploring other machines? If you decide to exploit, you may be missing out on superior strategies. But exploring is also costly.
You can use the following code snippet to simulate a k-armed bandit and solve it with a greedy or epsilon-greedy algorithm.5
import numpy as np
import pandas as pd

def run_greedy(probs, seed, length, payoffs, epsilon, print_results=False):
    """
    Simulate and solve multi-armed bandits using a greedy algorithm.

    probs: array with winning probabilities for each bandit
    seed: random seed for replication
    length: number of periods you want to play
    payoffs: array with the payoff for each bandit if you win
    epsilon: if 0, greedy bandits; if >0, epsilon-greedy
    """
    np.random.seed(seed)
    k = len(probs)
    rewards = np.zeros((length, k))
    exp_rew_mat = np.zeros((length, k))
    bandits = np.arange(k)
    won, rews, draws = [], [], []
    exp_rew = np.array([0.5]*k)  # current estimate of each bandit's expected reward
    cnt_rew = np.array([1]*k)    # number of times each estimate has been updated
    for t in range(length):
        exp_rew_mat[t, :] = exp_rew
        if np.random.rand() > epsilon:
            # exploit: play the bandit with the highest estimated reward (ties broken at random)
            ind_max = np.random.choice(np.flatnonzero(exp_rew == np.max(exp_rew)))
        else:
            # explore: play a bandit chosen uniformly at random
            ind_max = np.random.randint(k)
        curr_band = [ind_max]
        oth_band = list(set(bandits) - set(curr_band))
        rnd_pay = np.random.rand()
        draw_t = rnd_pay < probs[ind_max]  # did the chosen bandit pay out?
        pay = payoffs[curr_band[0]] if draw_t else 0
        rewards[t, curr_band] = pay
        rewards[t, oth_band] = 0
        won.append(curr_band[0])
        rews.append(pay)
        draws.append(rnd_pay)
        # update the running average of expected rewards incrementally
        cnt_rew[ind_max] += 1
        exp_rew[ind_max] = exp_rew[ind_max] + (1/cnt_rew[ind_max])*(pay - exp_rew[ind_max])
    rewards_df = pd.DataFrame(rewards, index=range(length),
                              columns=[f'b{i}' for i in bandits])
    rewards_df['won'] = won
    rewards_df['reward'] = rews
    rewards_df['draws'] = draws
    rewards_df[[f'exp_rew_b{i}' for i in bandits]] = exp_rew_mat
    total_rews = dict(rewards_df[[f'b{i}' for i in bandits]].sum())
    if print_results:
        print(f'Rewards = {total_rews}')
    return rewards_df
# Run a simulation with two bandits. It's optimal to always play the
# first bandit, but with bad luck the algorithm may not discover this!
payoffs = np.array([2, 1])
probs = [0.5]*len(payoffs)
seed = 12
length = 10000
greedy0 = run_greedy(probs, seed, length, payoffs, epsilon=0)
greedy05 = run_greedy(probs, seed, length, payoffs, epsilon=0.05)
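You can then compare how much each policy earned over the whole run (the exact numbers depend on the seed you pick):

print(f"greedy: {greedy0['reward'].sum():.0f}, "
      f"epsilon-greedy: {greedy05['reward'].sum():.0f}")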
LLMs and agents
One wonders if the same principle applies to LLMs. Is it true that these models create value by enhancing our decision-making capabilities? Unfortunately, “creating value” has become more of a sales pitch used by many in the data space, so let’s simplify the question to make it more practical.
Would you pay to gain access to ChatGPT?
If your answer is positive, the business case must also be positive.6
Let’s go through some simple examples:
Text auto-completion: arguably, the most direct use case of LLMs is text auto-completion. I wouldn’t pay for it, but writers may become many times more productive, and would thus be willing to pay for it. In this simple form it creates value without improving your decision-making capabilities: it just makes you more productive.
Writing complete texts: but LLMs are so powerful that you can actually create complete texts with just a simple prompt. Since customers are willing to pay for such texts, some value must have been created. Moreover, I also argue that the creative process itself involves some sort of non-trivial decision making.
Software development: one of the most common use cases for LLMs is writing code. There’s some evidence that it makes developers more productive, so again, I argue that there is positive value created, and at least some improved decision-making capabilities.7
Customer support and back-office automation: automating customer support and back-office flows are also common use cases. This saves costs by substituting human labor with more affordable AI; it is a form of value creation, but one without improved decision-making. Interestingly, one paper finds that LLMs can actually augment customer support agents, so there might be some improved decision-making (and without substitution).
Personal assistants and knowledge bases: AIs can also become personal assistants, even more so if they’re connected to a knowledge base that makes sense for a specific context. One such example is Shopify Magic, but there are plenty of similar solutions in the market. Personal assistants can make you more productive without improving your decision making (just by delegation of costly, but simple, tasks), but knowledge-based personal assistants can indeed improve the quality of our decisions.
Agents: LLMs have memorized quite a bit of information, allowing them to excel at many tasks and tests while also exhibiting zero- and few-shot learning capabilities. Wouldn’t it be great if we actually let them perform some actions for us? One straightforward way to proceed is to allow the AI to make API calls that themselves perform some type of action.8 One can even envision the LLM acting as the brain of the agent, or as the OS, as Andrej Karpathy has recently proposed (figure below).
[Figure: Andrej Karpathy’s sketch of an LLM acting as the kernel of a new operating system.]
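To make the idea concrete, here is a toy version of the pattern; everything in it is hypothetical (the “model” is a stub), but real agent frameworks follow the same decide-then-execute loop:

# Hypothetical tool registry: in a real agent these would be actual API calls
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
    "send_email": lambda to, body: f"Sent '{body}' to {to}",
}

def fake_llm(prompt):
    """Stand-in for a real LLM that maps a request to a tool name and arguments."""
    return {"tool": "get_weather", "args": {"city": "Mexico City"}}

def agent_step(user_request):
    decision = fake_llm(user_request)   # the LLM decides which action to take
    tool = TOOLS[decision["tool"]]      # look up the corresponding call
    return tool(**decision["args"])     # execute it and return the result

print(agent_step("Should I take an umbrella today?"))  # Sunny in Mexico City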
Final comments
ML and more data are the engines of the current AI revolution, as they allow us to make better predictions. My aim in this post was to highlight that, at its core, prediction is the fuel for better decision-making. So how can you help your organization create value through AI? The “low-hanging fruit” is to look at the strategic decisions underlying your own business and turbocharge them with ML. To me, this is a “first principles” approach towards AI. The more challenging path is to explore the new set of possibilities that these advanced techniques enable.
Analytical Skills for AI and Data Science goes through this argument in detail, but most importantly, it also shows you how to do it in practice.
2. Instead of words, most LLMs use smaller units called tokens, but the idea is the same.
3. You may encounter some difficulties, however. See Chapter 7 in Analytical Skills for AI and Data Science.
4. Sutton and Barto’s book Reinforcement Learning: An Introduction provides a thorough introduction to the subject. On deep reinforcement learning, I found Plaat’s book Deep Reinforcement Learning accessible (and he also provides accompanying Python code).
5. See Sutton and Barto’s book for details.
6. Chapter 5 in Data Science: The Hard Parts uses the concept of “incrementality” to help you build business cases like this.
7. See here and here, for example. Even if you distrust the authors (from Microsoft and GitHub, so there might be a conflict of interest), if you’ve done some coding yourself you’ll share the opinion that ChatGPT has made you more productive. My personal Aha! moment with LLMs came while coding, so you can count me in the group that’s willing to pay for this.
8. One way to proceed is to use OpenAI function calling. Alternatively, you can fine-tune an LLM like Llama-2 on a corpus of API calls (see, for instance, the Gorilla, Toolformer and ToolLLM papers).