The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies

By: Stephan Zheng, Alex Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David Parkes, Richard Socher
The AI Economist learns dynamic tax policies that optimize equality along with productivity in simulated economies, outperforming alternative tax systems.
  • Economic inequality is accelerating globally and is a growing concern due to its negative impact on economic opportunity, health and social welfare. Taxes are important tools for governments to reduce inequality. However, finding a tax policy that optimizes equality along with productivity is an unsolved problem. The AI Economist brings reinforcement learning (RL) to tax policy design for the first time to provide a purely simulation and data-driven solution.
  • The AI Economist uses a two-level RL framework (agents and tax policy) to learn dynamic tax policies in principled economic simulations. This framework does not use prior world knowledge or modeling assumptions, can directly optimize for any socioeconomic objective, and learns from observable data alone.
  • Our experiments show the AI Economist can improve the trade-off between equality and productivity by 16%, compared to a prominent tax framework proposed by Emmanuel Saez, with even larger gains over an adaptation of the US Federal income tax and the free market.
  • While we do not endorse this particular schedule, it is interesting that the AI Economist implements tax schedules that differ qualitatively from the baselines, with higher top tax rates and lower rates for middle incomes. Furthermore, it is robust in the face of emergent tax gaming strategies.
  • Moreover, the AI Economist is effective in simulations with human participants, achieving competitive equality-productivity trade-offs with the baselines and significantly higher income-weighted average social welfare. This suggests promise in using this approach to improve social outcomes in real economies.
  • Our vision for the AI Economist is to enable an objective study of policy impact on real-world economies, at a level of complexity that traditional economic research cannot easily address. We believe the intersection of machine learning and economics presents a wide range of exciting research directions, and gives ample opportunity for machine learning to have positive social impact.

Reinforcement Learning for Social Good

Our work fits within a larger context of recent advances in RL. RL has been used to train AIs to win competitive games, such as Go, Dota, and StarCraft. In those settings, the RL objective is inherently adversarial (“beat-the-other-team”). Machine learning has also been used for the design of auction rules. In this work, we instead focus on the opportunity to use AI to promote social welfare through the design of optimal tax policies in dynamic economies.

The Challenge of Optimal Tax Design

Many studies have shown that high income inequality can negatively impact economic growth and economic opportunity. Taxes can help reduce inequality, but it is hard to find the optimal tax policy. Economic theory cannot fully model the complexities of the real world. Instead, tax theory relies on simplifying assumptions that are hard to validate, for example, about the effect of taxes on how much people work. Moreover, real-world experimentation with taxes is almost impossible.

The Economic Model of Income, Labor, Skill, and Utility

Classic tax theory focuses on people who earn income by performing labor. A worker gains utility from income but incurs the cost of labor effort. At some point, the extra utility from additional income does not outweigh the cost of additional effort.

For instance, working on weekends might earn you more money, but the effort might not be worth it to you.

A key assumption is that people differ in their skill level. Low-skilled workers receive a lower hourly wage, and so earn less money than high-skilled workers for the same amount of labor. This leads to inequality.
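The trade-off between income and effort can be made concrete with a small numerical sketch. The snippet below assumes a utility that is linear in income with a quadratic effort cost. These functional forms, the wage values, and the names `utility` and `best_hours` are illustrative assumptions and are not taken from the simulation.

```python
def utility(hours, wage, effort_cost=0.5):
    """Income minus a convex effort cost (assumed functional form)."""
    income = wage * hours
    return income - effort_cost * hours ** 2


def best_hours(wage, max_hours=80):
    """Return the whole number of hours that maximizes utility."""
    return max(range(max_hours + 1), key=lambda h: utility(h, wage))


# Hypothetical hourly wages for a low-skilled and a high-skilled worker.
low_skill_hours = best_hours(wage=10)
high_skill_hours = best_hours(wage=30)
```

Under these assumptions the higher-skilled worker chooses to work more hours and earns a much larger income, which is the source of inequality described above.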

The Central Tax Dilemma

As a policy goal, a government may prefer to tax and redistribute income in order to improve equality. However, higher taxation can discourage work and may particularly affect high-skilled workers. An optimal tax policy optimizes this balance between equality and productivity.

An illustrative view of how taxes can impact social outcomes. Left: Taxes can improve equality by transferring income. However, taxes can also decrease productivity, because they can discourage work. The AI Economist seeks a tax policy that optimizes this trade-off. The Pareto boundary is the set of maximal trade-offs. Right: Taxes impact productivity (total income, represented by the area of the big squares), and equality (the relative difference in sizes of the smaller squares). In this illustration, the AI Economist achieves the best trade-off.

A prominent tax framework, proposed by Emmanuel Saez, derives a simple optimal tax formula. However, this formula requires knowing how labor responds to changes in tax rates (“elasticity”) and makes strong assumptions, for example, that the economy is static and workers do not gain new skills. Other work has studied dynamic economic systems, but needs simplifying assumptions in order to attain analytical solutions. For an extended overview of related work, see our technical paper.

Economic Simulation as a Learning Environment

The AI Economist is a purely simulation and data-driven approach to the design of optimal tax policies. It uses a principled economic simulation with both workers and a policy maker (a "planner" in the economics literature), all of whom are collectively learning using reinforcement learning.

Left: Economic simulation in which four agents collect and trade resources, build houses and earn income. Right: wealth is not distributed evenly. The size of the disk shows total productivity, while colors show the percentage each agent owns. The trade-off between equality and productivity is measured by multiplying equality and productivity. 

Gathering, Trading and Building

The simulation uses a two-dimensional world. There are two types of resources: wood and stone. Resources are scarce: they appear in the world at a limited rate. Workers move around, gather and trade resources, and earn income by building houses (this costs stone and wood). Houses block access: workers cannot move through the houses built by others. The simulation runs this economy over the course of an episode, which is analogous to a “working career.”

Skill and Utility

A key feature is that workers have different skills. Higher-skilled workers earn more income for building houses, and thus more utility. Building houses also takes effort, which lowers utility. Workers also pay income taxes and the collected tax is redistributed evenly among the workers. Together, these economic factors and the various competitive drivers mean that workers need to be strategic in order to maximize their utility.

Emergent Specialization

Our economic simulation produces rich behavior when AI agents (the agents are the workers in the economy) learn to maximize their utility. A salient feature is specialization: AI agents with lower skill become gatherer-and-sellers and earn income by collecting and selling stone and wood. Agents with higher skill specialize as buyer-and-builders and purchase stone and wood in order to more quickly build houses.

Top: The state of the simulation over the course of a single play-through. Middle: Highly skilled agents choose to do more work, and earn larger incomes and experience more utility. Bottom: Agents trade with each other throughout. Upward bars indicate the agent mostly sold resources (earning coin) and downward bars indicate that it mostly bought resources (spending coin).

We do not impose such roles or behaviors directly. Rather, specialization emerges because differently skilled workers learn to balance their income and effort. This demonstrates the richness of the economic simulation and builds trust that agents respond to economic drivers. Complex emergent economic behavior has been previously studied in economics through agent-based modeling, but this has largely proceeded without the benefit of recent advances in AI.

Optimal Tax Design as Learned Reward Design Using Reinforcement Learning

Reinforcement learning is a powerful framework in which agents learn from experience collected through trial-and-error. We use model-free RL, in which agents do not use any prior world knowledge or modeling assumptions. Another benefit of RL is that agents can optimize for any objective.

In our setting, this means that a tax policy can be learned that optimizes any social objective, and without knowledge of workers’ utility functions or skills.

Two-level Reinforcement Learning: a learning framework in which agents and a policy maker both learn. Agents face a non-stationary learning problem. The policy maker changes taxes, which affects the effective post-tax income agents earn. Hence, for the same amount of work, agents experience different utility over time and need to constantly readjust what they believe is optimal behavior that maximizes utility.

Finding optimal taxes when both workers and the policy maker are learning poses a challenging two-level RL problem:

  • In the inner loop, self-interested workers perform labor, receive income, and pay taxes. They learn through trial-and-error how to adapt their behavior to maximize utility. Given a fixed tax policy, this is a standard multi-agent reinforcement learning problem with a fixed reward function.
  • In the outer loop, tax policies are adapted in order to optimize the social objective. This creates a non-stationary learning environment in which the RL agents need to constantly adapt to a changing utility landscape.
  • In other words: because the post-tax income for the same type and amount of labor can change over time, agent decisions that were optimal in the past might not be optimal in the present.
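The structure of the two loops can be sketched in a few lines. This is a toy illustration only: it replaces reinforcement learning with a simple best response for workers and a grid search for the planner, uses a single flat tax rate, and assumes a quadratic effort cost. The wages, candidate rates, and all function names are hypothetical.

```python
def gini(xs):
    """Gini index via mean absolute difference."""
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * total)


def inner_loop(tax_rate, wages, max_hours=20):
    """Workers adapt to a fixed tax rate (best response stands in for RL)."""
    hours = []
    for w in wages:
        # Each worker maximizes post-tax income minus a quadratic effort
        # cost, treating the redistributed share as given.
        best = max(range(max_hours + 1),
                   key=lambda h: (1 - tax_rate) * w * h - 0.5 * h ** 2)
        hours.append(best)
    return hours


def social_objective(tax_rate, wages):
    """Equality times productivity after even redistribution."""
    hours = inner_loop(tax_rate, wages)
    incomes = [w * h for w, h in zip(wages, hours)]
    share = tax_rate * sum(incomes) / len(incomes)
    post_tax = [(1 - tax_rate) * x + share for x in incomes]
    return (1 - gini(post_tax)) * sum(incomes)


def outer_loop(wages, candidate_rates):
    """Planner picks the rate with the best social objective (grid search)."""
    return max(candidate_rates, key=lambda r: social_objective(r, wages))


WAGES = [1, 2, 4, 8]                 # hypothetical skill levels
RATES = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9
best_rate = outer_loop(WAGES, RATES)
```

Even in this toy version, every change to the tax rate in the outer loop changes what the workers' best behavior is in the inner loop, which is the source of the non-stationarity described above.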

This two-level learning problem poses a technical challenge, as the simultaneous behavioral changes of agents and changes to the tax policy can lead to unstable learning behavior. We have found that a combination of techniques, including the use of learning curricula and entropy regularization, enables stable convergence. These are described in our technical paper.

AI-Driven Tax Policies Improve the Trade-off between Equality and Productivity

Our reinforcement learning approach produces dynamic tax policies that yield a substantially better trade-off between equality and productivity than baseline methods.

Periodic Income Taxes

We compared the AI Economist with:

  • The free market with no taxation or redistribution.
  • A scaled 2018 US Federal tax schedule. This is a progressive tax: marginal tax rates increase with income.
  • An analytical tax formula proposed by Emmanuel Saez. In our setting, this yields a regressive tax schedule: marginal tax rates decrease with income.

All tax policies make use of seven income brackets, following the framework of the US Federal Income Tax schedule, but varying in their tax rates. The total tax is calculated by summing up the tax for each bracket in which there is income.
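The bracket-based calculation can be sketched as follows. The function name `bracket_tax`, the bracket cutoffs, and the rates are illustrative assumptions and are not the schedules used in the experiments.

```python
def bracket_tax(income, cutoffs, rates):
    """Sum the tax owed in each bracket that contains some income.

    cutoffs[i] is the lower edge of bracket i; the last bracket is unbounded.
    """
    total = 0.0
    for i, rate in enumerate(rates):
        lower = cutoffs[i]
        upper = cutoffs[i + 1] if i + 1 < len(cutoffs) else float("inf")
        if income > lower:
            # Only the slice of income inside this bracket is taxed at `rate`.
            total += rate * (min(income, upper) - lower)
    return total


# Hypothetical seven-bracket progressive schedule (illustration only).
CUTOFFS = [0, 10, 40, 85, 160, 205, 510]
RATES = [0.10, 0.12, 0.22, 0.24, 0.32, 0.35, 0.37]

tax_on_100 = bracket_tax(100, CUTOFFS, RATES)
```

With this example schedule, an income of 100 touches the first four brackets, and only the portion of income inside each bracket is taxed at that bracket's rate.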

Episodes are divided into ten tax periods of equal length. Throughout each tax period agents interact with the environment to earn income and, at the end of the period, incomes are taxed according to the period's tax schedule and redistributed evenly to workers. The tax policy of the AI Economist allows the tax schedule to vary across periods.
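A single tax period with even redistribution can be sketched as below. The function name `settle_period`, the flat 20% tax, and the income values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def settle_period(incomes, tax_fn):
    """Tax each worker's period income and split the revenue evenly.

    Returns post-tax incomes and each worker's net transfer
    (positive means the worker received a net subsidy).
    """
    taxes = [tax_fn(x) for x in incomes]
    share = sum(taxes) / len(incomes)
    post_tax = [x - t + share for x, t in zip(incomes, taxes)]
    net_transfer = [share - t for t in taxes]
    return post_tax, net_transfer


# Hypothetical flat 20% tax and four workers' incomes for one period.
post_tax, net = settle_period([10, 20, 30, 140], lambda x: 0.2 * x)
```

In this example the three lower earners end the period with a net subsidy, while the highest earner pays more in tax than they receive back. Total income is unchanged by redistribution.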

We set up the economic environment such that the fraction of worker incomes per income bracket is in rough alignment with those in the US economy.

Improved Economic Outcomes with the AI Economist


Our experiments show that the AI Economist achieves at least a 16% gain in the trade-off between equality and productivity compared to the next-best baseline, the Saez framework. The AI Economist improves equality by 47% compared to the free market, at the cost of only an 11% decrease in productivity.

Social outcomes with AI agents. Productivity (left panel, higher is better) measures the total amount of income generated within an episode (analogous to GDP). Taxation results in a decrease of productivity versus the free market, but the loss in productivity is the smallest under the AI Economist. Income equality (middle panel, higher is better) is highest under the AI Economist. Here, equality is defined as 1 - Gini index and computed on incomes (after taxes and subsidies) at the end of an episode (higher Gini index means incomes are less equal). The product of equality and productivity (right panel, higher is better) measures the balance between equality and productivity. Black lines indicate standard error.
Comparison of sample episodes under different tax policies: Saez (left), AI Economist (middle) and free market (right). The trade-off between equality and productivity is optimized under the AI Economist.
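The metrics in the caption above can be computed as in the following sketch. It follows the definitions as stated here (equality as 1 - Gini index, and the trade-off as equality times productivity). The function names are hypothetical, and any additional normalization of the Gini index for small populations is not included.

```python
def gini(incomes):
    """Gini index via mean absolute difference (0 means perfectly equal)."""
    n = len(incomes)
    total = sum(incomes)
    if total == 0:
        return 0.0
    abs_diffs = sum(abs(a - b) for a in incomes for b in incomes)
    return abs_diffs / (2 * n * total)


def equality(incomes):
    """Equality as 1 - Gini index."""
    return 1.0 - gini(incomes)


def tradeoff(incomes):
    """Equality times productivity, with productivity as total income."""
    return equality(incomes) * sum(incomes)
```

For example, four agents who each earn 1 coin have a Gini index of 0 and a trade-off score of 4, while one agent earning all 4 coins gives a Gini index of 0.75 and a trade-off score of 1.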

Wealth Transfer and Tax Impact

Compared with the baselines, the AI Economist features a more idiosyncratic structure: a blend of progressive and regressive schedules. In particular, it sets a higher top tax rate (income above 510), a lower tax rate for incomes between 160 and 510, and both higher and lower tax rates on incomes below 160.

Left: The total tax payment for each income level. The total tax initially grows faster under AI and Saez taxes, but more slowly towards higher incomes. Right: Tax rate schedules for the AI Economist and baselines. The marginal tax rate is the tax (in %) paid on each extra coin of income.

The collected taxes are redistributed evenly among the agents. In effect, the lower-income agents receive a net subsidy, even though their tax rates are higher (before subsidies). In other words, under the AI Economist, the lowest incomes have a lower tax burden compared to baselines.

Wealth transfer in Coins after taxes and subsidies. Lower-skilled agents receive a net subsidy.

We observed that under the Saez framework, the gatherer-and-sellers collect fewer resources than with the AI Economist. This forces the buyer-and-builders to spend more time collecting resources themselves, which lowers their productivity. At the same time, the Saez framework yields less equality through redistribution because its tax schedule is more regressive. In sum, this yields a worse balance between productivity and equality.

Robustness Against Tax Gaming Strategies

Finding optimal taxes can be challenging because AI agents can learn to "game" tax schemes. In our simulation, agents learn that they can lower their average effective tax by alternating between earning high and low incomes, rather than earning a smooth income across periods. This tax gaming occurs for the Saez tax and AI Economist due to their regressive tax rates (higher income brackets have lower tax rates). The performance of the AI Economist demonstrates that it is effective even in the face of strategic agent behavior, and the emergence of this behavior underscores the richness of the simulation-based framework.

Incomes and Taxes during an episode. We see considerable variability from period to period. The left panel shows per-period income for a high skill agent versus a hypothetical agent that earns the same income each period. The right panel shows the total tax paid. AI agents have learned that their effective tax is lower if they alternate between earning high and low incomes.
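The arithmetic behind this gaming strategy can be shown with a two-bracket example. The schedule below is hypothetical and deliberately extreme: the marginal rate drops from 40% to 10% above an income of 100. The function name `bracket_tax` and all numbers are assumptions for illustration.

```python
def bracket_tax(income, cutoffs, rates):
    """Sum the tax owed in each bracket that contains some income."""
    total = 0.0
    for i, rate in enumerate(rates):
        lower = cutoffs[i]
        upper = cutoffs[i + 1] if i + 1 < len(cutoffs) else float("inf")
        if income > lower:
            total += rate * (min(income, upper) - lower)
    return total


# Hypothetical regressive schedule: the marginal rate falls above 100.
CUTOFFS = [0, 100]
RATES = [0.40, 0.10]

smooth = [100, 100]      # same income in both periods
alternating = [200, 0]   # same total income, concentrated in one period

smooth_tax = sum(bracket_tax(x, CUTOFFS, RATES) for x in smooth)
alternating_tax = sum(bracket_tax(x, CUTOFFS, RATES) for x in alternating)
```

Both agents earn 200 in total, but the smooth earner pays 80 in tax while the alternating earner pays 50, because more of the alternating earner's income falls in the low-rate top bracket.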

AI-Driven Tax Policy and Real People

We also explored whether the AI Economist is effective in experiments with human participants. These experiments used a simpler ruleset to provide better usability, for instance, removing the ability to trade. However, the same economic drivers and trade-offs applied. Participants were paid real money for the utility they gained from building houses. Hence, participants were incentivized to build the number of houses that would maximize their utility. The stakes were sufficiently high: participants were paid at an average rate of at least twice the US minimum wage.

We tested all methods in a zero-shot transfer learning setting, by using the tax rates from the AI-only world in the human setting without retraining. This is an interesting evaluation, because retraining a tax policy might require a large amount of human data. The only modification was to scale down all income brackets by a factor of three to account for lower human productivity compared to AI agents. The full details are in our technical paper.

Comparison of the behavior of AI agents (left) and human participants (right). Humans seem more prone to adversarial blocking behavior, for instance, the dark blue worker builds houses to block the entrance to the bottom-right quadrant. This prevents others from reaching more resources.

Camelback Tax Schedule Yields Strong Equality-Productivity Trade-offs

For the experiments with human participants, we selected from the set of trained AI-Driven policies a tax schedule shaped like a camelback. We compared this schedule with baselines in experiments with participants recruited on Amazon Mechanical Turk.

Comparison of the camelback tax schedule that performs well in experiments with both AI and real people with US Federal and Saez. The camelback has a distinct rise and fall in its schedule. 

In 125 games with 100+ US-based participants, the camelback schedule achieved an equality-vs-productivity trade-off that is significantly better than the free market and competitive with other baselines. Participants were paid more than $20/hour on average. Compared with AI agents, people were more prone to suboptimal adversarial behaviors, such as blocking other workers. This significantly increased the variance in productivity.

Comparison of experiments under the Saez tax (left), AI Economist (middle) and free market (right) together with human participants. Social outcomes are best under the AI Economist.

Interestingly, the camelback schedule is qualitatively different from the baselines. However, the relative performance of the camelback versus the baselines is consistent across the experiments with only AI and with only people.

Social outcomes in experiments with human participants. The AI Economist achieves equality-productivity trade-offs that are competitive with Saez and US Federal and outperforms the free market (with statistical significance). Black lines indicate standard error. Humans are more prone to blocking behavior, which results in higher variance in productivity.

Significantly Higher Weighted Social Welfare

The camelback tax also outperformed all baselines, with statistical significance, on an alternative, established social welfare metric that weights the utility of lower-income workers more than that of higher-income ones.

The AI Economist achieves statistically significantly higher inverse-income weighted social welfare than baselines in experiments with human participants. Black lines indicate standard error.
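One way to compute an inverse-income weighted welfare score is sketched below. The exact weighting and normalization used in the experiments are described in the technical paper. Here we assume weights proportional to 1/income that sum to one, and strictly positive incomes. The function name is hypothetical.

```python
def inverse_income_weighted_welfare(utilities, incomes):
    """Weight each worker's utility by 1/income, with weights summing to 1.

    Assumes every income is strictly positive.
    """
    inverse = [1.0 / x for x in incomes]
    norm = sum(inverse)
    return sum(u * w / norm for u, w in zip(utilities, inverse))


# Hypothetical utilities and incomes for two workers.
welfare = inverse_income_weighted_welfare([10, 30], [1, 3])
```

With equal incomes this reduces to the plain average of utilities. With unequal incomes, the lower-income worker's utility counts for more, so policies that raise utility at the bottom of the distribution score higher.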

Future Directions

The strong zero-shot transfer performance on human play is surprising and encouraging. The camelback tax was competitive with, or outperformed, baselines, without recalibration and while being applied with different rulesets and worker behaviors. As such, these results suggest promise in the use of the AI Economist as a tool for finding good tax policies for real economies.

AI-based economic simulations still have limitations. They do not yet model human-behavioral factors and interactions between people, including social considerations, and they consider a relatively small economy. However, these kinds of simulations provide a transparent and objective view on the economic consequences of different tax policies. Moreover, this simulation and data-driven approach can be used together with any social objective in order to automatically find a tax policy with strong performance. Future simulations could improve the fidelity of economic agents using real-world data, while advances in large-scale RL and engineering could increase the scope of economic simulations.

We believe that this kind of research has great potential for increasing equality and productivity in real economies, helping to promote more just and healthy societies. We also hope that the AI Economist can foster transparency, reproducibility, and an open and facts-based discussion about applying machine learning to economic decision-making through our public research publications and open-source code. As such, our hope is that future economic AI models can robustly and transparently augment real-world economic policy-making and, in doing so, improve social welfare.

Ethical Considerations

Ethics, trust, and transparency are an integral part of Salesforce’s approach to AI research. While the current version of the AI Economist is a limited representation of the real world, and is not a tool that could currently be used with malicious intent to reconfigure tax policy, we recognize that it could be possible to manipulate future, large-scale iterations of the AI Economist to increase inequality and hide this action behind the results of an AI system.

Furthermore, either out of ignorance or malice, bad training data may result in biased recommendations, particularly in cases where users train the tool using their own data. For instance, the under-representation of communities and segments of the work-force in the training data could lead to bias in AI-Driven tax policies. This work also opens up the possibility of using richer, observational data to set individual taxation, an area where we anticipate a strong need for robust debate.

We encourage anyone utilizing the AI Economist to publish a model card or data sheet that describes the ethical considerations of AI-Driven tax schedules in order to increase transparency, and by extension, trust, in the system.

In order to responsibly publish this research, we have taken the following measures:

  • To ensure accountability on our part, we shared the paper and an assessment of the ethical risks, mitigation strategies, and assessment of safety to publish with the following external reviewers: Dr. Simon Chesterman, Provost’s Chair and Dean of the National University of Singapore Faculty of Law, and Lofred Madzou, AI Project Lead at the World Economic Forum's Center for the Fourth Industrial Revolution. None of the reviewers identified additional ethical concerns or mitigation strategies that should be employed. All affirmed that the research is safe to publish.
  • To increase transparency, we are publishing this blog post, as well as a comprehensive technical paper, thereby allowing robust debate and broad multi-disciplinary discussion of our work.
  • To further promote transparency, we will have a timed open-source release of our code for the simulation and training. This does not prevent future misuse, but we believe that, at the current level of fidelity, transparency is key to promoting grounded discussion and future research.

With these mitigation strategies and other considerations in place, we believe this research is safe to publish.

Additional Information

Paper: https://arxiv.org/abs/2004.13332

  • The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies, Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher.

Acknowledgements

This work was a joint effort with contributions from:

Stephan Zheng, Alex Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, Kathy Baxter, David Parkes, and Richard Socher.

We thank Lofred Madzou, Simon Chesterman, Rob Reich, Mia de Kuijper, Scott Kominers, Gabriel Kriendler, Stefanie Stantcheva, and Thomas Piketty for valuable discussions.

We also thank the following people for their valuable support:

  • Video animation: Leo Ogawa Lillrank, Ivy Lillrank.
  • Marketing: Katherine Siu.
  • Public Relations: Kate Wesson, Steve Mnich.
