Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis

For many government departments, uncertainty aversion is a source of barriers in the advancement of data openness. A more active response to potential risks is needed and necessitates an in-depth examination of risks related to open government data (OGD). With a cross-case study in which three cases from the United Kingdom, the United States and China are examined, this study identifies potential risks that might emerge at different stages of the life cycle of OGD programs and constructs a taxonomy model for them. The taxonomy model distinguishes the “risks from OGD” from the “risks to OGD”, which can help government departments make better responses. Finally, risk response strategies are suggested based on the research results.


INTRODUCTION
Government information disclosure has played an important role in the democratic development of human society since Sweden passed the Freedom of the Press Act in 1766. In particular, since the United States issued the Freedom of Information Act (FOIA) in 1966, many countries have passed laws or regulations to protect the "right to know" of the public. With the advent of the era of big data, significant attention has been focused on the value of open government data (OGD) in promoting government transparency and accountability, public participation and social innovation [1,2,3,4]. More than 70 countries have joined Some of these barriers exist in reality, while others simply derive from an insufficient understanding of the possible risks. It was reported that a database of registered usernames and email addresses of the data.gov.uk had been leaked [26]. In 2018 BBC reported that sensitive information of the military has been disclosed in a data visualization map of a fitness tracking company [27]. Besides, some people were also concerned that the geospatial data published on data.gov by the Department of Agriculture of the USA might be used to locate crops targeted for eradication via infestation, or even to commit acts of biological warfare [28]. As shown in above cases, the fear of the potential risks may hinder the advance of OGD. The culture of risk aversion is one source of existing institutional barriers to OGD [13]. Uncertainty avoidance plays a negative moderating role in the relationship between other organization resources and the OGD capacity of government departments [14].
The best strategy for addressing risks is understanding and managing them more effectively, rather than ignoring or avoiding them. Therefore, we aim to address the following three questions: "What kinds of risks are involved when government departments implement OGD initiatives?", "How are these risks distributed over the lifecycle of OGD?" and "What strategies could be adopted?" To answer these questions, we conducted a cross-case study on three OGD-related cases from three countries, the cease of the care.data program in the UK, the IRS data breach in the USA and the tardy progress of the OGD program in China. We then identified 14 risks and categorized them into a taxonomy model which distinguishes "risks to OGD" from "risks from OGD". This study deepens the current understanding of OGD-related risks and brings new insights into the mechanism design for advancing OGD.

Risks Associated with OGD
Risk refers to the uncertainty of future results in a given condition and a particular period [29]. The risks related to OGD that are frequently discussed are those pertaining to data leakage [30,31,32], invasion of privacy [33,34,35] and other information security issues. [36] summarized 11 government risks in data release: copyright, trade secret protection, privacy, the security of the infrastructure, publication of improper data or information that might lead to negative attitude towards public institutions, inaccurate data, misinterpretation of the data, absence of data consumers, less willing to cooperate, overlapping of data and increased number of requests for data. The Office for National Statistics of UK also proposed to better balance openness with privacy protection [37]. [38] believes that the risks posed to the official statistics department by big data are also related to mission drift, damage to reputation and the loss of public trust, inconsistent access and continuity, the fragmentation of approaches across jurisdictions, resource constraints and cut-backs, privatization and competition, etc. Besides, OGD are also vulnerable to risks in terms of effectiveness, relevance and trust [39]. [15] categorized the risks that may hinder OGD as those related to governance, economic issues, licenses and legal frameworks, data characteristics, metadata, access and skills. Based on a single case study of Shanghai, [40] summarized the potential risks of OGD from the levels of legislation, management and data.

Risk Management Related to OGD
In response to risks faced by government departments, the National Audit Office (NAO) of the UK proposed embedding risk management into their core decision-making and planning management processes. The NAO put forth the notion that risk management, including identifying, evaluating, processing and reporting, can help governments make credible decisions and support innovation [41]. In 2013, a subordinate of the US Treadway Commission released the Internal Control-Integrated Framework to address enterprise risk management in five aspects: controlled environment, risk assessment, control activities, information and communication, and regulatory activities [42]. [36] proposed mitigation strategies for the identified risks of OGD, including monitoring and assessment of the demand for data, proper specification of data sets to avoid duplication, compliance assessment, data anonymization and data aggregation, quality control of data publication, establishment of internal and external data catalogue, linking to data sets already published, properly formulated terms and prompts for data originating from third parties, clearly explained duties, and continuous monitoring of the impacts of OGD initiatives. [43] holds an opinion that the application of tax-related data quality control and data technology is a key element in control of tax risks. In 2017, the British government updated the Data Protection Act promulgated in 1998 to reinforce the rights of citizens, such as the right to be forgotten, personal data, privacy information and data migration stipulating that identifying personal information from anonymous data and tampering with them will result in criminal charges [44]. In 2018, the General Data Protection Regulation of the EU came into force [45], focusing on protecting and empowering all EU citizens regarding data privacy and reshaping the way all organizations including governments approach data privacy.

Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
The afore-mentioned regulations and studies have fully discussed risk management strategies associated with OGD, but they did not address the context in which a specific risk occurs, for example, the stage in the data lifecycle, and its source and consequence. Besides, a single country context may not be sufficient to reveal the variety of risks in more comprehensive institutional backgrounds. These defects may weaken the effectiveness of suggested strategies. In view of this, we have conducted a cross-case study in international context from the perspective of the lifecycle of OGD.

Lifecycle Model of OGD
Lifecycle model can guide the process of opening up data [46]. Lifecycle analysis draws on the biological analysis method, and divides the development process of objects (records, data, products, projects, organizations, etc.), from the stage of generation to that of extinction into several stages. Among others, OGD, is also subject to changes in its lifecycle. An analysis based on the lifecycle of OGD can help identify the risks that exist at different stages. Meanwhile, risk management itself can also be divided into three phases of latency, occurrence and crisis response [47], which necessitates different measures.
According to [2], the OGD lifecycle comprises three sections, a pre-processing section (data creation, selection, harmonization and publishing), an exploitation section (data interlinking, discovery, exploration, and exploitation), and a maintenance section (data curation). Based on an investigation conducted in The Netherlands, [46] developed a community-driven open data lifecycle model, which comprises identification of data, data preparation, data issue, and data reuse and data evaluation. [48] discussed the barriers of OGD in China based on a data-centered lifecycle model, including data organization and processing, storage and distribution, discovery and acquisition, and appreciation and evaluation.
With different research purposes, the above-mentioned OGD lifecycle models are either data management centered or value realization centered. As the risks of OGD are mainly taken by government departments, including both the data providers and users, a government centered lifecycle model is needed. A lifecycle model of OGD should consist of at least five stages: data creation and collection, data organization, data release, data utilization and data maintenance.

Analytical Framework
Based on the literature review above and the lifecycle of OGD, we develop an analytical framework for the subsequent case study, as shown in Figure 1.

Research Procedure
The present study adopts a cross-case study method. Although previous studies have discussed some OGD-related risks, they are neither integrated into a whole nor detailed from the perspective of the OGD lifecycle. To bridge these theoretical gaps, this study adopts the case study method. Compared with a single case study, a cross-case study is used to make generalization, and examine themes, similarities and differences across cases in quantitative or qualitative analysis. The research design involving multiple cases is generally regarded as more robust than that of a single case study, as it provides the observation and analysis of a phenomenon in several settings [49].
In this study, an analysis across three OGD cases from different countries is conducted. A content analysis method is also used to examine the lifecycle distributions of OGD-related risks. The research steps are as follows: 1). Select three cases of OGD in the USA, the UK and China based on their theoretical potentials and representativeness in different stages of the OGD lifecycle. 2). Collect case data through the Internet search and semi-structured interviews; 3). Identify the risks associated with OGD using within-case and cross-case analysis; 4). Categorize all identified risks into a taxonomy model; 5). Analyze the distributions of all risks over five stages of the OGD lifecycle; 6). Suggest countermeasures for OGD risk management.

Case Selection
Three cases are selected for their theoretical richness and representative ability to answer the research questions and address the different stages of OGD programs. They are the UK healthcare data program "care.data", the American Internal Revenue Service (IRS) data breach event and the Open Data initiative of Shanghai, China. The "care.data" case is typical, in that it reveals the risks in data collection and sharing. The IRS data breach is not a typical OGD case, but it reveals the mismanagement and malicious use of government data; it is therefore representative in revealing the risks that are characteristic of the OGD maintenance phase. The case of the Shanghai OGD is of typical significance, as it reveals the risk concerns of government departments at the early stage of an OGD initiative. The three cases, which are both countryand industry-specific, focusing on different phases of the OGD lifecycle, can jointly support the identification of risks in the entire lifecycle.

Data Collection
Case data are collected via the Internet search and semi-structured interviews. The former applies to cases from the UK and the USA, and covers news reports and online commentaries; the latter apply to the case of China and cover three one-to-one in-depth interviews and one focuses on group interview with the heads of seven government departments. All the interviews lasted for nine hours in total, and 55,600 Chinese words were transcribed within one week of the interviews.

Data Analysis
A bottom-up coding approach combined with a cross-case comparison is adopted to analyze the data. The data analysis is a continuous process, starting with data collection. First, all concepts or entities related to risks or real harms are identified; they are then compared and categorized within the case. Second, the results of three cases are listed, analyzed, compared and categorized. Third, the sources and consequences of all risks are analyzed. Fourth, a taxonomy model of OGD-related risks is constructed. Finally, the distribution of the risks over the five stages of the OGD lifecycle is analyzed and strategies to address them are suggested.

Case Introduction
In 2013, the long-established National Health Service (NHS) of the UK and the Health and Social Care Information Centre (HSCIC) initiated a program called care.data, aiming to improve the safety and care of patients by using information; the program also helped create an extensive health records database, whose target users include pharmacies, mental health services, opticians, dentists, and education and training institutions, and that will eventually support all healthcare facilities. The data have been anonymized and only cover the patient's age range, gender and area of residence, but in exceptional circumstances, such chinaXiv:202211.00467v1 as during a pandemic, a researcher can apply to the Minister of Health for the removal of these privacy protections. Researchers believe the data will help them develop new treatments and evaluate NHS services. However, the "care.data" project, surprisingly, does not run smoothly. In February 2014, the NHS acknowledged a serious crisis of confidence regarding the "care.data" project, and informed family doctors to postpone the uploading of patient data for up to six months. In the fall of 2014, the NHS decided to endeavor four new pilots to collect medical health data for two million patients, but the first pilot was not officially launched until June 2015. Due to poor communication with the public, such as the absence of press conferences, and the failed delivery of brochures to families, the NHS's data collection was collectively denounced by patients, doctors, the British Medical Association, the privacy campaign group Big Brother Watch and the Association of Medical Research Charities; consequently, one million people withdrew from the program. Under enormous pressure, the care.data was stopped by the NHS on July 6, 2016.

Risk Identification of Case 1
The NHS is an important source of population data. The care.data is a typical example of government data collection and utilization. Through the analysis of data collected from online news reports, blogs, comments, etc, the following risks are identified: 1). Privacy leakage risk. Health data are of a personal and sensitive nature. Mudie, an opponent of care. data, said: "The human cost to the patient whose identity and medical history are made public is potentially disastrous. Careers could be ended, jobs lost, insurance refused and relationships destroyed if sensitive medical facts are made public or used by private firms, other people or, indeed, the media" [50]. Although the patient data collected by the care.data had been anonymized, data users typically used open data in conjunction with the holder's closed data and accessible data, and the public were worried that malicious analyzers could use specific techniques to identify the patient's private information [51]. 2). Risks from implicit operational specifications. A lack of explicit operational specifications has led to various difficulties in the opening and collection of data. The accusations from the public are: "the ambiguous standards for obtaining health data posing a risk of trust between doctors and patients [52]", "the regulations over data access, data inspection and balance not yet established or implemented [53]", etc. The NHS did not systematically organize and process the collected data sets, leading to perplexed usage and potential safety hazards when cooperating with other agencies. 3). Risks of improper data use, especially in cooperation with commercial organizations. The focus of the public concern is that sharing sensitive medical information with commercial companies can be risky without the explicit consent of patients. The care.data hands over coded patient data to the insurance industry to help actuaries calculate the average premiums, but the public believe that it may identify an individual's tendency to become ill, causing the insured to be biased against when attempting to buy insurance. In the cooperation between NHS and Google's DeepMind, the public are also worried that commercial organizations will use patients' private data for profit. "Any effort to ignore the use of data and only discuss open data policy will fail as they are faced with the disorder in reality and compromise in practice [53]." 4). Data quality risks. Data quality risks induced by data collection methods, such as incompleteness, distortion, and inaccuracies. For example, care.data requires general practitioners to collect patient data; however, as a result, data pertaining to young and healthy males may be missing. Some people obtain the prescription drugs but flush them down the toilet when unmonitored, causing distortions in the drug's performance data, etc. 5). Risks from immature techniques. One of the reasons for the public's opposition to care.data is the lack of technical sophistication. Sheila Bird, a professor of statistics of Strathclyde University, stated: "Data-sharing as proposed by care.data was disastrously incompetent -both ethically and technically. Professionals rebelled and prevailed in out-casting care.data, thereby ensuring that future proposals will not succeed unless both technically proficient and in the public interest" [54]. 6). The risk of public trust crisis prompted by the immaturity of government regulation. Of all the objections against care.data, many could be regarded as a function of the public trust crisis caused by a flawed government supervision system. For example, "when you propose to share our most confidential medical records, ambiguous promises and fictitious regulatory frameworks are disturbing to the public" [50]. 7). The risk of poor communication. Huge external pressures on care.data stemmed from lack of publicity and ineffective communication. While the project is valuable, it requires the understanding, support and cooperation from citizens and other organizations. In November 2014, the All Party Parliamentary Group in the British Parliament investigated the care.data project, accusing it of lacking transparency and poor publicity, which eventually led to its failure. 8). The risk of unsustainable funding for public communication. Between October 2013 and 2014, the NHS announced £ 2 million to publicize the care.data project to the general public, but the actual cost of advocacy was merely £ 1 million. The survey found that less than one-third of the public received publicity brochures because of insufficient investment in public communication. This eventually led to the project being forced to cease due to tremendous external pressure. 9). The risk of oversized external pressures. After the care.data program was announced in early 2013, some privacy organizations launched the Medical Data Confidentiality Initiative, calling attention to security risks in the use of medical data. Since then, these organizations have applied significant pressure to initiatives in the collection of medical data. 10). The risk of unprofessional information governance. In January 2015, the NHS's Independent Information Governance Oversight Panel released a report indicating that the initial commitment to the care.data project was not completed, and partly because of the lack of experts in information governance.

Case Introduction
In 2015, a data breach took place in the Internal Revenue Service (IRS) of the United States [55]. With the help of companies such as IBM, the IRS set up a complex data platform networked with other government agencies. The site has an application called "Get Transcript" that allows citizens to easily access to previous chinaXiv:202211.00467v1 Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis tax records. In 2015, hackers illegally accessed about 724,000 taxpayers' tax returns via the Get Transcript app. This malicious behavior was not detected until three months later. After the investigation, J. Russell George, the Treasury Inspector General for the tax administration, accused the IRS of not deploying the Web systems according to the requirements and recommendations, which caused serious data breaches in the event of reduced staffing and increased capital input [56]. 2   1). The risk of hacking. Hacking is a constant threat to government applications, and its immediate consequences are leakages of privacy, trade secrets or even the security information of national infrastructure. 2). The risk of poorly implemented government regulations. After the implementation of the My Data project, the US Open Data Action Plan established clear requirements regarding network and data security systems for government departments, but the IRS did not follow these requirements seriously, resulting in a colossal security breach. 3). The risk of poor communication and cooperation between departments. After the data breach event, the IRS, the Treasury Department and other departments were at odds with each other. The poor communication and cooperation between them resulted in an inadequate response to the risk. 4). The risk of delayed system updates and maintenance. The maintenance and update of an open system platform must be carried out periodically; otherwise, security risks are likely to occur. In this case, some IRS applications were outdated and had many security vulnerabilities; this was coupled with poor maintenance of the systems and platforms. Together, these factors were vulnerable to hacking, which remained undetected for three months. 5). The risk of unsustainability in capital investment. According to the IRS, an increasing number of government information systems security services are being outsourced; with the delay of at least $ 400 million in investments due to IT budget cuts, many of the operations, including the maintenance and replacement of old IT systems, are affected, thereby resulting in system failure and security loopholes.

Case 3: The Tardy Progress of the OGD Program in China
During the short history of OGD in China, no major crisis has taken place yet. Therefore it is impossible to have an event as a case study that exhibits the lifecycle. Instead, we only investigate the relevant risk factors by examining a representative city government. Between December 2016 and May 2017, we interviewed seven government departments in Shanghai, including the Human Resources and Social Security Bureau, the Agriculture Commission, the Food and Drug Administration, the Audit Bureau, the Health and Family Planning Commission, the Trade and Industry Bureau, and the Planning and Land Resources Administration. According to the analysis of interview data, the risk concerns of the departments related to OGD can be summarized into the following six aspects: 1). Risk of data distortion. Government statistical data, though eye-catching and frequently used by the public, are prone to distortion in the layer-by-layer hierarchical reporting system; this has become a systemic problem. One interviewee said: "It is not that our attitude to OGD is inactive, but that the authenticity of data reported by the subordinate departments cannot be guaranteed". 2). Risk of low data quality. Given that some government departments' data are collected from enterprises or reported by subordinate departments and other diversified channels, some data are thus flawed with inconsistent format, unstandardized metadata, incompleteness and other quality problems, which forestalls some departments from engaging in OGD pilots. One respondent noted: "The data collected directly from medical institutions and health administration bureaus at county or district level were found countless quality problems. We've been doing data cleaning for two-three consecutive years, so a lot of data have not yet been released." 3). Risk of data aggregation. Although open data have been anonymized and desensitized, some departments are still worried about the existence of security risks for collective data release. One interviewee said: "The bulk of our data are disease information or disease prevention information, if released, it implicates personal privacy." 4). Risk of undefined operational norms. Some departments expressed that due to the lack of technical standards and codes of practice, they are quite at sea for some problems in open data. One interviewee suggested that "the scope of disclosure also needs to be stipulated. We are willing to participate in the construction and also want to provide useful data, but the specification of the application needs to be well defined." 5). Risk of imperfect mechanisms. The lack of supervision, feedback and incentive mechanism led to the slow progress of the OGD project. On account of the scanty feedback on the use of open data, the enthusiasm of some departments for OGD has been dampened. One interviewee said: "We are in want of a mechanism for feedback and supervision, we would like to know where and to what extent the data we provide have realized their value. All these require feedback, which is also a positive incentive for us." 6). Risk of data value-sparseness. Some departments want to know if their data are really useful to the public. "Every year we provide a lot of open data to the community (for social services), and a great deal of work has been done, but I am still a stranger to the data they (neighborhoods) are using and how they use them." "We'd love to know who is using the data."

RISK INTEGRATION AND CLASSIFICATION
Through iterative comparison and merging, the risks identified from the three cases are integrated into 14 types of risks: content legitimacy, data quality, data value, data management, platform support, information security, organizational support, resource input, institutional support, business process, use, imperfect regulations and standards, scarce external resources and external pressures. The 14 types of risks are then further categorized based on their sources and consequences, respectively.

Classification Based on the Risk Source
According to the source, the 14 types of risks are categorized into five classes: data (related) risks, technology (related) risks, management (related) risks, utilization (related) risks and external environment (related) risks.

Data Risks
Most of the data published on OGD websites are submitted by various government departments, while a small part of them are created by the OGD department itself. These data vary in their attributes accordingly with the clear-cut department functions. Some attributes may lead to uncertain consequences if the data are open. For instance, some data concern citizens' privacy, and their inappropriate disclosure may bring harm to the individuals involved. These risks arising from data attributes are categorized as data risk, which is mainly related to the legitimacy of the open content, the data quality, and the data value.
The risk in the legitimacy of open content refers to the possibility that the open content does not comply with laws, regulations, or other social norms. The risk of low data quality includes the possible distortion, error, obsolescence, or incompleteness of data content, format chaos, absent links or bad metadata. This kind of risks may bring inestimable losses to data users or damage the government's credibility. The main problems lie in the phase of data collection and pre-processing. Some desensitized data also suffer from quality degradation after de-identification. The risk of low data value refers to the possible investment loss of resources due to opening data that have trivial value, have no clear users, or are rarely used.

Technology Risks
Technology risks result from inadequate technical capacity and tardy understanding of new technologies, such as data management, platform support and information security. Technology problems in data management may lead to unsatisfactory effects of OGD, such as the use of outdated techniques for data collection, improper methods for data cataloguing, unstandardized metadata and improper data formats that encumber analysis and utilization. The low capacity and delayed updates of the OGD platform bring disastrous consequences, as evinced in case 2. Because of the loopholes in the information security technologies of OGD platforms, worrisome consequences, such as privacy leakage, data tampering and corruption, platform damage or falsely authorized certification, may occur.

Management Risks
Management risks are caused by problems that exist in OGD-related organizational structures, business processes, management styles, or mechanism designs; they comprise the following four types: organizational risks, resource input risks, institutional risks and business process risks. The organizational problems may exist in the lack of special position setting, the unbalanced allocation of power and responsibility between the data provider and receiver, poor inter-departmental communication and cooperation, weak OGD promotion strategies, poor management of business outsourcing, etc. Insufficient investment in talent or

Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
funds in OGD-related businesses may directly hinder the promotion of OGD. Incomplete management rules, business guidelines, incentive mechanisms or ill-conceived business processes upon OGD may also bring unwanted consequences.

Utilization Risks
Utilization risks arise from misuse, malicious or improper use, or insufficient use of the open data, which may result from insufficient literacy or capacity of OGD users. This may result in erroneous decision making, invasion of privacy, or insufficient exploitation of data value.

Environment Risks
Environment risk refers to a harmful impact upon the process, mode, or outcome of an OGD program due to the limitations of current institutions, resources, or public expectations, including: (1) Risk of imperfect regulations and standards. Some obstacles to OGD may be attributable to imperfect laws or regulations, inoperable technical standards, or rigid government administration systems. (2) Risk of scarce external resources, e.g., the lack of data governance experts, insufficient talent supply, or immature knowledge of OGD. (3) Risk of excessive external pressure. As shown in case 1, online negative opinions of pressure groups exerted significant pressure on the initiative of OGD.

Classification Based on the Risk Consequence
Judging from the consequences, the afore-mentioned 14 subtypes of risks can be divided into two classes: risks to OGD and risks from OGD. The risks to OGD may hinder the smooth running of an OGD program but will not undermine the legitimacy of OGD itself. These risks comprise10 sub-categories: risks in data management, information security, platform support, organizational adjustment, resource investment, institutional provision, business process, imperfect regulations and standards, scare external resources, and excessive external pressures. The risks from OGD refers to the possible negative effects caused by OGD, which may lead to challenges to the legitimacy of the OGD, including illegal contents, low data value, poor data quality and improper data utilization. The risks from OGD may lead to a sceptical attitude to OGD initiative, while the risks to OGD may influence the success of OGD program. The fear of both may become the actual barriers to OGD.

A Holistic Taxonomy Model of Risks Associated with OGD
Based on the analysis above, a holistic taxonomy model of OGD-related risks is constructed, as shown in Figure 2. First, all risks identified from the three cases are listed together and compared. Those that are similar are clustered and categorized into an upper class according to their sources and consequences, respectively. Next, the correspondence between the source-based and consequence-based classifications is analyzed. Finally, a holistic taxonomy model of OGD-related risks is constructed. This model can help government departments and the public better understand OGD-related risks and thus devise more efficient response strategies. chinaXiv:202211.00467v1

Distribution of Risks from Five Sources
Using a content analysis for the qualitative data of the three cases, the lifecycle distributions of risks from five sources are revealed, as shown in Table 1. The frequency of each type of risk occurring at every stage of the OGD lifecycle is calculated. At different stages, the distributions of the 14 types of risks are shown in Figure 3. Among the five stages, the data collection stage is risk intensive. This implies that an appropriate risk management plan should be made before the OGD program begins.

The Occurrence Stages of 14 Types of Risks
A radar chart is made to show the number of the occurrence stages of 14 risks. As shown in Figure 4, the risks that may occur at all five stages include those related to data value, data management, platform support, information security, resource input, institutional risk, and flawed regulations and standards. This indicates the importance of a strong technical platform and a sound management system for the success of the OGD program. The risks that may occur at four stages include organizational and business process risk, which still imply the importance of OGD project management. The risks that may occur at three stages are those related to content legitimacy, data quality, scarce external sources and external pressures.
Among the risks that occur at all five stages, only the risk related to data value is a risk from OGD. This means that, except their natural characteristics, the value of the open data should be improved at every stage of the OGD lifecycle. The other three types of risks from OGD emerge at three or fewer stages. The risk of data use only occurs at the utilization stage. This implies that a specific type of risk that may be brought by OGD should be controlled or avoided at a few specific stages. Most of the risks to OGD could be mitigated by improving project management and a few of them could be avoided by strengthening regulations and technical standards, e.g., the enforcement of EU GDPR.

RISK MANAGEMENT STRATEGIES FOR OGD PROJECTS
Risks from OGD and risks to OGD have different sources and consequences and are distributed at different stages in the OGD lifecycle. Strategies in response to them are suggested as follows:

Data Governance Strategy for the Risk in Content Legitimacy
The legitimacy of the open content is the first concern of many government departments, and has even become the default explanation for any deficiency in data openness. In response to this kind of risk, the government must accelerate the enactment of related laws and regulations (e.g., privacy protection act, and data security act), establish examination criteria for privacy-involved data and build an effective data governance system to ensure the legitimacy of open contents. In May 2018, the GDPR was enforced by the EU, empower people with more control over their personal data collected by all companies operating in the EU [45]. In the Nanhai district of Foshan city in Guangdong province, China, a data governance committee was established to determine the legitimacy of open data [30].

Value Appraisal and Feedback Mechanism for the Risk in Data Value
Among the departments that are inactive to data openness, some are unconvinced of the value of the data to be released. To ensure the value of data that have yet to be disclosed, it is necessary to establish a mechanism for data value appraisal mechanism. Further, a feedback mechanism should be set up to inform the data providing departments about the results of data utilization, thus giving them sufficient information to determine the priorities for future releases.

Quality Management and Data Provenance Strategy for the Risk in Data Quality
In response to the risk in data quality, guidelines and evaluation standards for stable data formats and metadata, secure platform and data provenance are needed to ensure the authenticity, integrity and usability of data at each stage of the lifecycle of an OGD program. For instance, the OPEN Government Data Act of the USA requires federal agencies to publish their information online based on an underlying open standard that is maintained by a standard organization [57].

Legislation Strategy for the Risk in Data Utilization
In response to data utilization risks, it is necessary to clearly stipulate the purpose, means, scope, and results of data utilization by speeding up the legislation, and mete out legal penalties for the malicious use of data.

Through the Lifecycle of the Risk Itself
During the different periods of the risk lifecycle, different strategies are needed to warn, evaluate and respond appropriately. [58] proposed the establishment of a data risk warning mechanism, an internal control mechanism and the fostering of risk coping ability. (1) During the period of latent risks, government departments must actively analyze and identify potential risks, make efforts to avoid and transfer them, and establish reserved countermeasures for use in times of crisis. In this phase, although the government of Shanghai had foreseen possible adverse consequences at the early stages of their OGD program, they were not able to formulate a complete risk response plan because they lacked an understanding of OGD-related risks. (2) In the risk occurrence phase, it is necessary to initiate the emergency plan immediately and take effective measures to minimize losses as much as possible, including timely cut-loss, communication enhancement and winning public understanding. Due to the failure of taking measures in time to pledge public understanding since 2 million publicity funds were not invested as originally planned, the care.data initiative of UK had to be stopped under public pressures [50]. (3) After the risk occurs, an emergency plan should be initiated immediately to minimize any adverse effects. Due to its limited capacity for crisis management and poor inter-departmental communication, the IRS failed to spot and solve the problem in time after the hacker invasion, and allowed the data-theft to last undetected for three months [59].

Through the Lifecycle of an OGD Project
At all stages of the lifecycle of an OGD project, it is necessary to implement corresponding risk control strategies. 1). At the stage of data collection, a complete OGD plan is needed. A detailed handbook should be developed to set the data open scope, conditions and technical means, especially the security level of data involving secrets. 2). At the data organization stage, a set of operable technique standards is needed so as to ensure trustworthy data quality, such as metadata, data preprocess, data catalogue, and data documentation. 3). At the stage of data release and utilization, detailed regulations should be made to ensure proper data use. Besides, emergency plans should be formulated in response to possible privacy leakage, malicious use, hacker and virus and weak publicity. 4). At the maintenance stage, plans should be made to prevent unsustainable resource investment to ensure the smooth progress of the project.

CONCLUSION AND DISCUSSION
Risk aversion is a potential factor of resistance to OGD, which stems largely from the lack of understanding rather than their uncontrollability. The construction of a taxonomy model of risks can help government departments understand them better, and thus avoid conservative inaction. By conducting a cross-case analysis on three cases of OGD in the UK, the USA, and China, this study identified 14 types of risks associated with OGD. According to the risk source, the 14 types of risks are classified into five classes:

Building a Holistic Taxonomy Model for OGD-Related Risks: Based on a Lifecycle Analysis
data, technology, management, utilization, and environmental risks. According to the risk consequence, the 14 types of risks are classified into two groups: risks to OGD and risks from OGD. A holistic taxonomy model of OGD-related risks is then constructed. Based on the model, the distribution of each type of risk across five stages of the OGD lifecycle is analyzed. It is found that the stage of data collection is riskintensive and seven types of risks may emerge at all five stages. It is also found that most of the risks to OGD could be avoided or mitigated by improving risk management throughout the lifecycle of an OGD project. In response to the risks from OGD, it is necessary to improve data governance, data value appraisal and feedback, quality management, and data provenance and speed up legislation.
Among the 14 types of identified risks, some have been discussed in previous studies, such as privacy leakage [60,61], low data quality [17,62], and improper use [63], but beyond these, this study also identifies several new risks associated with OGD, including those concerning content legitimacy, data value, weak data management, excessive external pressures, institutional flaws, and external resource scarcity. Furthermore, this study also makes the following contributions to this field: firstly, it distinguishes the risks to OGD from the risks from OGD, and thus differentiates the legitimacy of OGD from its smooth realization. This has a theoretical implications on deepening the understanding of OGD-related risks and supporting the rationality of an OGD initiative. Secondly, this study identifies OGD-related risks at different stages of the lifecycle of an OGD program from three cases in different countries and builds a holistic taxonomy model after clustering them. This fills the gap left by the deficiencies of previous studies that focused on a single country and presented the risks fragmentarily. Third, this study analyzes the distribution of each type of risk over five stages of the lifecycle of OGD and suggests specific strategies in response to them. This has practical implications for government departments that are promoting OGD.
The limitations of this study include: (1) Due to the authors' geographical limitations, the data for the cases of the UK and the USA are mainly collected from online sources. Although Google renders a comprehensive search, the absence of interviews with related government officials has prevented the researchers from directly assessing the attitudes of government staff on this point. To make up for this limitation, several rounds of in-depth interviews were conducted in the case of Shanghai. (2) The taxonomy model of OGD-related risks is built with a bottom-up induction approach from three cases. Although it can deepen the understanding of the OGD-related risks, it might have not involved all the potential risks that face other government departments. In the future, we will extend our approach to improve the taxonomy model of OGD-related risks with more cases and to empirically test the correlation between risk management and OGD performance.

AUTHOR CONTRIBUTIONS
F. Wang (wangfangnk@nankai.edu.cn) designed the whole framework, examined the results of data analysis and wrote the final paper. A. Zhao (zhaoanstudy@163.com) collected most of the data of the first two cases and made preliminary data analysis. H. Zhao (zhaohong@mail.nankai.edu.cn) analyzed part of the data and revised the draft. J. Chu (chujun620@163.com) took part in all the interviews and transcribed the records. chinaXiv:202211.00467v1