Replacing Bureaucrats with Automated Sorcerers?

Abstract: Increasingly, federal agencies employ artificial intelligence to help direct their enforcement efforts, adjudicate claims and other matters, and craft regulations or regulatory approaches. Theoretically, artificial intelligence could enable agencies to address endemic problems, most notably 1) the inconsistent decision-making and departure from policy attributable to low-level officials' exercise of discretion; and 2) the imprecise nature of agency rules. But two characteristics of artificial intelligence, its opaqueness and the nonintuitive nature of its correlations, threaten core values of administrative law. Administrative law reflects the principles that 1) persons be judged individually according to announced criteria; 2) administrative regulations reflect some means-end rationality; and 3) administrative decisions be subject to review by external actors and transparent to the public. Artificial intelligence has adverse implications for all three of those critical norms. The resultant tension, at least for now, will constrain administrative agencies' most ambitious potential uses of artificial intelligence.

The government has long used computers to store and process vast quantities of information. 4 But human beings fully controlled the computers and wrote their algorithms. Programmers had to do all the work of modeling reality: that is, attempting to ensure that their algorithm reflected the actual world, as well as incorporating the agencies' objectives. 5 AI/ML is much less dependent on the programmer. 6 It finds associations and relationships in data, correlations that are both unseen by its programmers and nonintuitive. As to the latter, for example, an AI/ML algorithm might predict a person's preferred style of shoe based upon the type of fruit the person typically purchases for breakfast. 7 Thus, AI/ML results do not represent cause and effect; correlation does not equal causation. Indeed, as in the example above, AI/ML algorithms may rely upon correlations that defy intuitive expectations about relevance; no one would posit that shoppers consider their breakfast choices when making shoe selections.
The opaque and nonintuitive associations on which AI/ML relies, that is, AI/ML's "black box" quality, have consequences for administrative law. 8 Even knowing the inputs and the algorithm's results, the algorithm's human creator cannot necessarily fully explain, especially in terms of cause and effect, how the algorithm reached those results. The programmer may also be unable to provide an intuitive rationale for the algorithm's results. While computer experts can describe the algorithm's conclusion that people with a particular combination of attributes generally warrant a particular type of treatment, they cannot claim that the algorithm has established that any particular individual with that combination of attributes deserves such treatment. 9

AI/ML can be used in either a supervised or unsupervised manner. In supervised learning, training data are used to develop a model with features to predict known labels or outcomes. In unsupervised learning, a model is trained to identify patterns without such labels. 10 AI/ML is particularly useful in performing four functions: identifying clusters or associations within a population; identifying outliers within a population; developing associational rules; and solving prediction problems of classification and regression. 11 AI/ML is currently less useful when a problem requires "estimating the causal effect of an intervention." 12 Nor can such algorithms resolve nonempirical questions, such as normatively inflected ones, like ethical decisions. 13 Presumably, AI/ML is ill-suited for resolving some empirical questions that frequently arise in administrative and judicial contexts, such as resolving witnesses' differing accounts of past events. In those situations, the data inputs are unclear.
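The distinction between supervised and unsupervised learning drawn above can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the labels "low" and "high," and the two deliberately simple techniques (a nearest-centroid classifier and a two-group clustering routine). It does not describe any agency's actual system.

```python
from statistics import mean

# Hypothetical toy data, invented for illustration only.
labeled_examples = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"),
    ((3.0, 3.1), "high"), ((3.2, 2.9), "high"),
]
unlabeled_values = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9]


def fit_supervised(examples):
    """Supervised learning: known labels guide the model (one centroid per label)."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(axis) for axis in zip(*points))
            for label, points in by_label.items()}


def predict(model, features):
    """Assign the label whose centroid lies closest to the new case."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], features))


def cluster_two_groups(values, iterations=10):
    """Unsupervised learning: no labels; the routine only finds two groupings."""
    low, high = min(values), max(values)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for value in values:
            nearer_low = abs(value - low) <= abs(value - high)
            groups[0 if nearer_low else 1].append(value)
        if not groups[1]:  # all values identical: nothing to separate
            break
        low, high = mean(groups[0]), mean(groups[1])
    return sorted(groups[0]), sorted(groups[1])


model = fit_supervised(labeled_examples)
prediction = predict(model, (2.9, 3.0))
clusters = cluster_two_groups(unlabeled_values)
```

Note what the sketch does not provide: the supervised model reports which labeled group a new case resembles, and the clustering routine reports which cases sit together, but neither says why, which is the explanatory gap the text describes.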
A recent Administrative Conference of the United States (ACUS) study uncovered considerable agency experimentation with or use of AI/ML. 14 Agencies largely employed human-supervised AI/ML algorithms, and their results were generally used to assist agency decision-makers and agency management in making their own decisions. A few examples follow.
The Securities and Exchange Commission (SEC) uses AI/ML to monitor the securities markets for potential insider trading. The SEC's ARTEMIS system focuses on detecting serial inside traders. SEC staff use a natural language processing algorithm to sift through 8-K forms, which companies submit to announce important events that occur between their regular securities filings. Then, a machine learning algorithm identifies trigger events or market changes that warrant investigation. An official reviews the output and decides whether further investigation is justified. If so, SEC staff send a blue sheet request to broker/dealers for relevant trading records. An unsupervised learning model then analyzes the new blue sheet data together with previously requested blue sheet data to detect anomalies indicating the presence of insider trading.
The Social Security Administration (SSA) uses several methods to increase the efficiency of its disability benefits claim adjudication process. It has attempted to apply algorithms to claim metadata to create clusters of similar cases it can assign to the same administrative law judge (ALJ). It has also developed an AI/ML analysis of claims to determine the probability of an award of benefits based solely on certain attributes of the claims. Officials use the results in establishing the order in which claims are assigned, moving ones likely to be granted to the head of the line. However, the actual determination of the claim is made by the ALJ.
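The queue-ordering use described above can be sketched in a few lines. The claim identifiers, the probabilities, and the names `Claim` and `assignment_order` are hypothetical; the sketch assumes only that some model has already attached a predicted probability of award to each claim.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    predicted_award_probability: float  # hypothetical model output in [0, 1]


def assignment_order(claims):
    """Order the queue so claims most likely to be granted are assigned first.

    The prediction changes only the order of assignment; it does not decide
    any claim, which remains the adjudicator's task.
    """
    return sorted(claims,
                  key=lambda claim: claim.predicted_award_probability,
                  reverse=True)


# Invented example claims, for illustration only.
pending = [Claim("C-1", 0.35), Claim("C-2", 0.92), Claim("C-3", 0.61)]
queue = [claim.claim_id for claim in assignment_order(pending)]
```

The design point is the separation of roles: the algorithm's output is an input to case management, not a determination on the merits.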
AI/ML assists adjudicators in preparing disability decisions. The SSA's Insight program allows adjudicators to identify errors in their draft decisions, such as erroneous citations (that is, nonexistent regulation numbers) and misapplication of the vocational grid (the metric used to determine whether sufficient work exists in the national economy for those of a claimant's level of exertional ability, age, and education). Insight also assists the SSA in identifying common errors made by ALJs, outlier ALJs, and areas in which SSA policies need clarification. 15

The ACUS report discusses the use of AI/ML to sift through the massive number of comments made in response to the Federal Communications Commission's proposed rollback of its net neutrality rules and the Consumer Financial Protection Bureau's use of AI/ML to classify the complaints it receives. 16 Algorithms have been deployed to assist agencies in predicting an industry's potential response to various alternative formulations of a contemplated regulation.

Administrative agencies perform a wide array of functions. Administrative law scholars tend to focus on three broad categories of agency action that lie at the heart of the government's coercive powers: enforcement, adjudication, and rulemaking. These categories derive from the distinction between legislating, enforcing the law, and adjudicating legal disputes.
Enforcement. Enforcement involves monitoring regulated entities, identifying statutory or regulatory violations, and pursuing sanctions for such violations. Enforcement is largely an executive function.
Moreover, enforcement has heretofore been considered inherently discretionary: agencies' limited resources simply do not allow them to be present everywhere at all times, much less pursue every potential regulatory violation. 20 Choosing which regulated entities or activities to investigate can be excluded from the realm of consequential decisions. If the entity or person under investigation has been complying with the law (or if the government cannot amass sufficient evidence to prove otherwise), no adverse consequence will ensue. Generally, the cost of undergoing investigation and defending oneself in an unsuccessful government enforcement action is not considered a harm. 21

Adjudication. Adjudication involves resolving individuals' rights against, claims of entitlements from, or obligations to the government. Thus, decisions regarding Social Security disability benefits, veterans' benefits, entitlement to a particular immigration status, and the grant or revocation of government licenses or permits, as well as liability for civil fines or injunctive-type relief, are all adjudications. In mass justice agencies, these adjudications differ substantially from traditional judicial determinations. Traditional judicial decisions often involve competing claims of right and frequently require making moral judgments in the course of resolving cases. The specification of rights and obligations is often intertwined with a determination of the applicable facts. 22 AI/ML algorithms might make quite good predictions regarding the results in such cases, but we are chary about leaving the actual decision to an AI/ML algorithm.
Mass adjudication by administrative agencies can often be much more routinized. Consider insurance companies' resolution of automobile accident claims. The judicially crafted law is complex. Liability turns on each actor's "reasonableness," a judgment based on a mixture of law and fact. The complexity represents an effort to decide whether the injured plaintiff is morally deserving of recovery from the defendant driver. Fully litigating such cases requires questioning all witnesses to the accident closely. But insurance companies seeking to resolve mass claims without litigation use traffic laws to resolve liability issues, as an imperfect but efficient metric. 23

Similarly, the SSA disability determinations could be considered expressions of a societal value judgment regarding which members of society qualify as the deserving poor. 24 Such a determination could be unstructured and allow significant room for adjudicators' application of moral judgments and intuition. But the SSA has, of necessity, established a rigid, routinized, five-step process for evaluating disability claims. 25 And the final step involves assessing whether sufficient jobs the claimant can perform exist in the national economy. That too was routinized by use of a grid, which provided a yes/no answer for each combination of applicants' age, education, and exertional capacity. 26

Another aspect of agency adjudication warrants attention. Much of traditional litigation, particularly suits for damages, involves assessing historical facts, the who, what, when, where, and why of past events. But agency adjudications can involve predictions as well as historical facts. Thus, licensing decisions are grounded on predictions regarding the likelihood that the applicant will comport with professional standards.
Likewise, the last step of the SSA disability determination, whether a person with certain age, educational, and exertional limitations could find a sufficient number of jobs available in the national economy, is a prediction. On the other hand, whether an employer committed an unfair labor practice in treating an employee adversely for union activity is a question of historical fact.
AI/ML excels at making predictions, which is its sine qua non, and predictions are all we have with regard to future events (or present events we may want to address without taking a wait-and-see approach). 27 But for an issue such as whether a particular entity engaged in a specific unfair labor practice, we might want to focus on the witness accounts and documentary evidence relevant to that situation, rather than AI/ML-generated correlations. 28 Or to use an example from toxic torts, epidemiological and toxicological studies establishing general causation between a toxin and a toxic harm may be fine for estimating risks to a population exposed to a toxin, but do not prove what courts in toxic torts must determine: namely, whether the harm the plaintiff suffered was caused by the plaintiff's exposure to a toxin. 29

Rulemaking. Rulemaking involves promulgation of imperatives of general applicability akin to statutes. As administrative law scholars Cary Coglianese and David Lehr suggest, AI/ML's use in rulemaking is limited because that process involves normative judgments and requires "overlay[ing] causal interpretations on the relationship between possible regulations and estimated effect." 30 The product of agency rulemaking (regulations) may resemble formal legislation, but the rulemaking process is designed to be far less onerous. Agencies often promulgate such regulations by "notice-and-comment" procedures. 31 Those procedures seem deceptively simple, but in practice require the agency to identify and categorize assertions made in thousands of comments regarding the rule's propriety. And with the emphasis on the Office of Management and Budget's (OMB) regulatory review of proposed regulatory actions, a significant part of the rulemaking process consists of assessing the overall costs and benefits attendant to the rule.
32 These legislative rules differ from the guidance rules used to constrain lower-level officials' discretion, direct their decision-making, or advise the public. Legislative rules that are the product of notice-and-comment rulemaking have the "force of law": violation of the rule itself is unlawful, even if the action does not violate the statutory standard implemented by the rule. Rules in the second sense, guidance rules that merely constrain lower-level officials' discretion or provide guidance to the public, do not replace the legal standard enunciated in the statute upon which they elaborate. They lack the force of law; an agency's sanction against violators of such guidance rules can be upheld only if the agency can show that the rule-violator's conduct has transgressed the underlying statute.
For example, a federal statute grants the Federal Trade Commission (FTC) the power to enjoin unfair and deceptive trade practices. The FTC could issue a guidance rule specifying that gas station operators' failure to post octane ratings on gas pumps is inherently deceptive. The guidance might well be based on extensive consumer research the FTC has conducted. If the FTC promulgates a guidance rule, each time it goes to court to enforce an order it enters against a rule-violator, it will have to prove that the gas station's failure to post octane ratings was deceptive. If, however, the FTC promulgates a force of law rule, that is, a "legislative rule," when it goes to court to enforce an order against a violator, it need merely show that the octane ratings were not posted. The gas station operator can no longer mount a defense asserting that its customers were not confused or deceived by the lack of posted octane ratings.
Legislative rules can be analogized to algorithms. The human lawgiver correlates a trait with a particular mischief the legislative rule is designed to address. The correlation may often be imperfect; but rules are inherently imperfect. However, we would probably not accept laws based on a nonintuitive correlation of traits to the mischief to be prevented, even if the correlation turns out to be a pretty good predictor. Even with respect to legislatures, whose legislative judgments reflected in economic and social legislation are given a particularly wide berth, courts purport to require some "rational basis" for associating the trait that is targeted with the mischief to be prevented. 33 The demand for some intuitive connection, some cause-and-effect relationship between a trait targeted and a harm to be prevented, is even greater when agencies promulgate regulations. 34 And to carry the analogy further to guidance rules, it is not clear at all that a nonintuitive connection would be allowed as a guidance rule used to direct the resolution of agency adjudications.
on training, and retraining, its lower-level employees. And sometimes agency leadership may encounter bureaucratic resistance, yet another reason some line-level employees' determinations might not comport with the leadership's policy. 37

Agencies' internal structures reflect the fundamental tension between rule-like and standard-like decision metrics. Rules are decision metrics that do not vary significantly depending on the circumstances. 38 Rules facilitate decisional consistency, assist line officials' efforts to follow agency policy, and allow superiors to more easily detect departures. But rules are invariably over-inclusive or under-inclusive: they sweep within them nonproblematic cases or fail to capture problematic cases, or both. 39 And the simpler the rule, the larger the subset of undesirable results.
For instance, due to the increasing heart attack risks as people age, in 1960, the Federal Aviation Administration promulgated the following rule: "No individual who has reached his 60th birthday shall be utilized or serve as a pilot on any aircraft while engaged in air carrier operations." 40 The rule is over-inclusive: many pilots over sixty have a very low heart attack risk, far lower than that of many pilots under sixty. A case-by-case determination based on medical records would surely have led to a more calibrated response. Even a rule that took into account not only age, but multiple health factors would produce a smaller number of decisions in which relatively risk-free pilots would be grounded.
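The over- and under-inclusiveness of a bright-line rule can be illustrated with a small sketch. The pilots, their risk estimates, and the risk cutoff below are all invented for illustration and carry no empirical weight; the only element taken from the text is the rule itself (grounding at the sixtieth birthday).

```python
# Hypothetical pilots as (age, estimated annual risk). All figures are invented.
pilots = [(45, 0.002), (52, 0.015), (58, 0.004),
          (61, 0.003), (63, 0.020), (66, 0.005)]

RISK_CUTOFF = 0.010  # assumed threshold for "too risky"; purely illustrative


def grounded_by_age_rule(age):
    """Bright-line rule: grounded upon reaching the 60th birthday."""
    return age >= 60


def grounded_by_individual_risk(risk):
    """Case-by-case standard: grounded only if estimated risk meets the cutoff."""
    return risk >= RISK_CUTOFF


# Over-inclusive results: low-risk pilots the age rule grounds anyway.
over_inclusive = [p for p in pilots
                  if grounded_by_age_rule(p[0]) and not grounded_by_individual_risk(p[1])]

# Under-inclusive results: high-risk pilots the age rule leaves flying.
under_inclusive = [p for p in pilots
                   if not grounded_by_age_rule(p[0]) and grounded_by_individual_risk(p[1])]
```

On these made-up figures the age rule grounds two low-risk pilots and misses one high-risk pilot. A rule keyed to more factors would shrink both lists, at the price of a more complex decision metric.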
Some of a rule's inherent limitations can be counteracted by according discretion to line employees. Reintroducing, or retaining, elements of discretion can be particularly important when decisions must be based on circumstances or factors that either: 1) were not envisioned by rule-drafters (rules can quickly be undermined by new scientific, economic, social, or other developments); or 2) cannot be quantified. 41 So agency leadership must accord low-level decision-makers some discretion. 42 But what if rules could be fine-grained, to take virtually innumerable factors into account? The subset of wrong decisions would become narrower. 43

Agencies must also contend with various external forces. Agencies' legitimacy rests upon their responsiveness to the elected officials of the executive and legislative branches, namely the president and Congress. The president and Congress must retain the capacity to assert control over agencies, through the exercise of the executive authority and congressional oversight, inter alia, and change agency behavior by enactment of statutes modifying the law. But even such legislative and executive oversight is insufficient to ensure agency fidelity to law. 44 Thus, agency decisions are generally subject to judicial review as well, to ensure that agencies remain faithful to their statutory mandates. Nevertheless, judicial review is generally deferential. On-the-record adjudications need only be based upon "substantial evidence." Less formal adjudications and regulations need only satisfy the "arbitrary and capricious" standard of review. 45

The public, and both the relevant regulated entities and the beneficiaries, must have notice of their obligations. Regulated entities must be able to predict how agencies will decide cases, and beneficiaries must also be able to determine when a challenge to a regulated entity's actions is warranted. Moreover, no agency can long prosper without the general support of the public, or at least key constituencies. 46

What are the implications of governments' use of AI/ML? Use of AI/ML algorithms will increase uniformity of adjudicatory and enforcement decisions, and their more fine-grained metrics should minimize the subset of incorrect decisions. 47 But agencies will face a basic decision: should the algorithms' decisions be binding or nonbinding?
If binding, many fewer line officials, that is, bureaucrats, will be needed to implement the program on the ground, and those that remain may well experience a decline in status within the agency. But in embracing AI/ML algorithms, agency leadership may merely have traded one management problem for another: managing the data specialists assuming a more central role in the agency's implementation of its programs. They will make decisions about the algorithm, the data used to train it, and the tweaks necessary to keep it current. Nonexpert leadership may feel even less capable of managing data scientists than the line officials they replaced.
If the algorithm is nonbinding, the key question will be when to permit human intervention. There is reason to believe that permitting overrides will produce no better results than relying on the algorithm itself. 48 Of course, agency leadership may disagree. In that case, the challenge will be to structure human intervention so as to avoid reintroducing the very problem the AI/ML algorithm was created to solve: unstructured, intuitive discretion leading to discrepant treatment of regulated entities and beneficiaries.
The uniformity wrought by AI/ML algorithms will come at the cost of increasing the opacity of the decision-making criteria and, potentially, reducing the intuitiveness of the decision metric.

Explainability is critical within the agency. It is critical to any attempt to have a line-level, or upper-level, override system. If one does not know the weight the AI/ML algorithm accorded a given criterion, how is one supposed to know whether it gave that criterion appropriate weight? At the same time, the algorithms' opacity might lead to staff resistance to such AI/ML decisions. 49 Lack of explainability poses challenges to agency managers seeking to retain control over policy, because not even the agency head can reliably discern with precision the policy the AI/ML algorithm applies in producing its decisions. At best, agency leadership will be dependent on computer and data processing specialists as critical intermediaries in attempting to manage the algorithm.

AI/ML algorithms' lack of explainability impedes the agency's navigation of its external environment as well. It complicates relationships with Congress and the components of the Executive Office of the President (EOP), like the OMB, with which the agency interacts. The more opaque and less intuitive the explanation of the AI/ML's metrics and decision-making process, the harder it will be to convince members of Congress and the relevant EOP components of the soundness of the agency's decisions. And the more fine-grained the nonintuitive distinctions between applicants for assistance or regulated entities, the more those distinctions will be viewed as literally arbitrary (that is, turning on inexplicable distinctions) and, well, bureaucratic. The reaction of the general public will presumably be even more extreme than that of elected leaders and their staff.
But let us turn to the implications of AI/ML's lack of explainability for judicial review. While judicial review of agency decision-making is deferential, it is hardly perfunctory. 50 In many circumstances, agency decisions are a type of prediction, even though they may not be framed in that way. Does licensing this pilot pose a risk to public safety? Is this applicant for benefits unable to obtain a job? These are questions at which AI/ML algorithms excel. But, as noted earlier, some agency decisions require a determination regarding past events. Sometimes the facts, one might say "the data," are in dispute. Two people might have different accounts of a key conversation between a management official and an employee central to determining whether an unfair labor practice occurred. Current AI/ML algorithms are unlikely to provide much assistance in resolving such a contest.
In addition, if a statute is applicable, an AI/ML algorithm might be incapable of producing a decision explaining the result to the satisfaction of a court. The Supreme Court's decision in Allentown Mack Sales & Service v. NLRB provides a cautionary tale. There, a company refused to bargain with a union, asserting a "reasonable doubt" that a majority of its workforce continued to support the union. In practice, the National Labor Relations Board (NLRB) required employers making such an assertion to prove the union's loss of majority support. The Court held that an agency's application of a rule of conduct or a standard of proof that diverged from the formally announced rule or standard violated basic principles of adjudication. 51 But that is what an AI/ML algorithm does: it creates a standard different from that announced, which may well be nonintuitive, and then consistently applies it sub rosa. AI/ML algorithms reveal that certain data inputs are commonly associated with particular outcomes to which we accord legal significance, but fail to show the basis for believing that the correlation held in a particular circumstance that occurred in the past. In other words, AI/ML can make predictions about the future, but offers little insight into how the record in the particular case leads to particular conclusions with respect to legally significant historical facts.
And often, in close cases, an agency can support either decision open to it. Is the reviewing court to be satisfied with reversing only "clearly erroneous" AI/ML-produced decisions? Some have suggested that courts review the process for decision-making rather than the outcomes produced by AI/ML algorithms. 52 That approach certainly has appeal, but how is the nonexpert (perhaps mostly technophobic) judiciary supposed to review the AI/ML algorithms? Courts faced a similar dilemma when Congress created new regulatory agencies for complex scientific and technological subjects, accorded other agencies more rulemaking power, and permitted more pre-enforcement challenges to regulations. The courts' response was a "hard-look" approach, ensuring that the relevant factors were considered, irrelevant factors were not, and that public participation was guaranteed. 53

Explainability, in another sense, is also important with respect to legislative rules. Let us say that an agency seeks to make explicit what is implicit in an AI/ML algorithm. Assume an AI/ML algorithm finds a correlation between long-haul truck drivers involved in accidents and 1) drivers' credit scores; 2) certain genomic markers; and 3) a family history of alcohol abuse. The agency could license or de-license based on a grid capturing the correlation. How would such a rule fare?
First, correlation does not equal causation. Some additional factor(s) more intuitively relevant to a driver's dangerousness might be propelling the relationship between it and the three variables. Given the basic requirement for some logical relationship between a regulation and its purposes, courts will surely demand either some intuitive relationship or nonintuitive causal relationship between the variables and truck driver dangerousness. After all, even if there is a fairly high correlation between the variables and truck driver dangerousness, many individuals will be excluded from truck driving due to apparently irrelevant factors. The agency will presumably have to provide the intuitive or causal relationship for the regulation to avoid its invalidation as "arbitrary and capricious." 54

Second, the example points to another problem. We want to base regulatory limitations (or provision of benefits) on people's conduct, not their traits, either immutable, like genomic markers or family history, or mutable but irrelevant, like credit score. One's reward or punishment by the government should turn on conduct to be encouraged or deterred, not accidents of birth. And to the extent the correlation involves a mutable marker, potential truck drivers will focus on improving their performance on a characteristic that does not improve their driving, like raising their credit score, rather than improving their capabilities as drivers. And, of course, some characteristics, like race and gender, cannot be used, unless the agency can proffer a strong justification that is not based on treating an individual as sharing the characteristic of his or her group. 55

Third, the function of notice-and-comment requirements would be undermined if the agency can conclude that its process for developing the algorithm is sound, and thus that the correlation is valid, even though the reason the correlation makes sense remains a mystery. Commenters themselves would have to investigate the correlation to either prove it is coincidental (essentially disproving all possible reasons for the existence of the correlation) or identify the underlying causes driving the correlation.
In short, even if an agency reveals its AI/ML algorithms' magic, by attempting to capture an AI/ML-discovered correlation in a legislative rule, the agency's attempt to promulgate a counterintuitive rule will likely fail.
Briefly turning to agency enforcement efforts, courts have recognized, particularly in the Freedom of Information Act (FOIA) context, the inherent tension between making sure there is no "secret law" and preventing circumvention of the law. 56 Transparency may mean that the enforcement criteria will become the effective rule, replacing the law being enforced. And given the complexity of AI/ML algorithms, transparency could have a disparate effect depending on the wealth and sophistication of the regulated entity. 57 Nevertheless, to the extent transparency is desirable, it will be more difficult to achieve when the AI/ML algorithm is proprietary, as the FOIA probably allows the agency to withhold such information and the government may feel compelled to do so. 58

Technology tends to make fools of those who venture predictions. Nevertheless, the potential that AI/ML will reduce the number and status of line-level employees is present. But before AI/ML makes significant inroads, agencies will have to grapple with making AI/ML algorithms' "black box" magic more transparent and intuitive.