Optimizing ASReview Simulations: A Generic Multiprocessing Solution for 'Light-data' and 'Heavy-data' Users

ABSTRACT Active learning can be used for optimizing and speeding up the screening phase of systematic reviews. Running simulation studies mimicking the screening process can be used to test the performance of different machine-learning models or to study the impact of different training data. This paper presents an architecture design with a multiprocessing computational strategy for running many such simulation studies in parallel, using the ASReview Makita workflow generator and Kubernetes software for deployment with cloud technologies. We provide a technical explanation of the proposed cloud architecture and its usage. In addition, we conducted 1140 simulations investigating the computational time under various numbers of CPUs and RAM settings. Our analysis demonstrates the degree to which simulations can be accelerated with multiprocessing. The parallel computation strategy and architecture design developed in this paper can help future research achieve shorter simulation times while ensuring the safe completion of the required processes.


INTRODUCTION
In today's academic world, the number of scientific papers on any topic is growing. While this wealth of textual data is a treasure trove for data scientists, it simultaneously presents significant challenges for anyone who wants to screen the literature systematically. While conducting a traditional systematic review implies marking scientific papers as relevant or irrelevant "by hand", the labeling process can be optimized with the use of machine learning (ML), especially pipelines based on active learning [1][2][3]: a constant interaction between a human and a machine learning model. Many simulation studies have been published investigating the performance of different machine learning models mimicking the interactive screening process; see Teijema, Seuren et al. [4] for a systematic overview. They identified 48 simulation studies testing the performance of active learning, encompassing 208 labeled datasets and leveraging 15 distinct machine learning models. A resounding conclusion was drawn: active learning surpasses random reading, potentially saving up to 95% of the work.
To simulate the selection process of a systematic review, one needs the meta-data of the records identified with a search query and all labeling decisions on each record. Using this information, an algorithm can automatically reenact the screening process as if a researcher were doing the labeling. The algorithm processes the papers, using the human labels, in the order predicted by the machine learning model; with active learning, this is the order from most to least likely relevant. Each time a relevant paper is identified, the recall goes up by one unit. Typically, the performance of a model is evaluated using metrics like the percentage of relevant articles that have been found at a certain point during the screening phase [5], the work saved compared to manual screening to arrive at a given level of recall (WSS) [6], or the Average Time to Discovery (ATD), a measure of how long it takes on average (expressed as a percentage of screened articles) to find a relevant record [7]. These metrics are pivotal as they provide insights into the efficiency and accuracy of the models, thereby guiding the optimization of screening prioritization.
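To make these metrics concrete, the following minimal sketch (a simplified illustration, not ASReview's implementation; the function names are ours) computes the recall curve and WSS from a list of 0/1 labels given in the order in which the records were screened.

```python
# Minimal sketch of the recall and WSS metrics; not ASReview's implementation.
def recall_curve(labels):
    """Cumulative number of relevant records found after each screened record.

    `labels` holds 1 (relevant) or 0 (irrelevant) in screening order.
    """
    found, curve = 0, []
    for label in labels:
        found += label
        curve.append(found)
    return curve

def wss(labels, recall_level=0.95):
    """Work Saved over Sampling at a given recall level (e.g., WSS@95)."""
    n_total, n_relevant = len(labels), sum(labels)
    target = recall_level * n_relevant
    curve = recall_curve(labels)
    # Number of records screened when the target recall is first reached.
    n_screened = next(i + 1 for i, found in enumerate(curve) if found >= target)
    # Fraction of screening avoided, minus the recall that was given up.
    return (1 - n_screened / n_total) - (1 - recall_level)
```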
Since the landscape of machine learning and data processing is dynamic, with new models and techniques emerging regularly, it is imperative to continuously evaluate which model performs best for which type of data. While some software, like DistillerSR [8], supports running simulation studies, the capability to conduct large-scale simulation studies comparing many different machine learning models on a wide range of datasets is not integrated. Teijema, Seuren et al. [4] found that nearly all studies running simulations on active learning for systematic reviews use custom, single-use code. The open-source research infrastructure ASReview [9] is specialized in running transparent and fully reproducible simulation studies. Its prowess lies in its simulation capability, mimicking user interactions with pre-labeled data and thus replicating the exact dynamics of a systematic review, but under controlled, lab-like conditions. The software is designed to allow parameter variations such as classifiers, prior knowledge, and feature-extractor selection. Users can use this modular framework to verify built-in models or craft custom models. Running a simulation study in ASReview is straightforward via the WebApp or command-line interface. However, setting up a large-scale simulation study with many different machine learning models applied to many different datasets may easily result in hundreds of lines of code. The workflow generator Makita (Make-It-Automatic package) [10] is designed to ease such cases, providing a template generator that can compose all the code necessary to execute hundreds or thousands of simulations. Makita offers three templates for multiple types of research questions: the basic template randomly varies prior relevant knowledge records and prepares just two simulations; the ARFI (All Relevant, Fixed Irrelevant) template varies prior relevant knowledge records per simulation; and the Multiple Models template varies ML models per simulation. Additionally, Makita's infrastructure allows the creation of custom templates tailored to the researcher's needs.
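For illustration, generating such a workflow takes a single Makita call; the sketch below (assuming the labeled datasets have already been placed in a local `data` folder, as described in the Makita documentation) invokes the template generator from Python.

```python
# Illustrative only: generate an ARFI simulation workflow with Makita.
# Assumes labeled datasets are already in a local "data" folder.
import subprocess

subprocess.run(
    ["asreview", "makita", "template", "arfi"],  # the basic and multiple-models templates can be generated the same way
    check=True,  # raise if Makita reports an error
)
```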
The issue, however, is that running the resulting code might involve many hours of computational time. For example, some ML algorithms, such as neural networks (NN), can take 100-fold more time than simpler models [11 p.8]. Other ML models, for instance SVMs, can have training time that grows quadratically with the number of input records [12 p.12]. What is more, running simulations multiple times to vary and test every component of the pipeline (e.g., feature extractor, sampling technique, training data) becomes crucial to answer research questions. For instance, it could imply an examination of the impact of different training datasets [13][14][15], tests of ML models [16,17,11], or evaluation of multiple datasets [18]. Hence, the required number of computations increases exponentially as the components are varied, and it might reach more than 20 thousand simulations, as, for example, in the research of Teijema et al. [19].
Nonetheless, the current workflows generated by Makita pursue a sequential computation strategy, so the simulation commands are executed one after another. If many simulations are needed, this approach can result in unnecessarily long timeframes due to queuing, which burdens the research workflow. That is, such tasks require a large number of computations not because of intensive but because of extensive computational complexity: their 'heaviness' is caused not by algorithmically demanding tasks (e.g., NN models) but rather by the high number of simpler ones. In other research fields where a significant number of simulations is also employed for analysis, scholars adopt a parallel computing approach when the simulations are independent of one another, for example in physics [20] or statistics [21]. So, while the active learning model itself cannot be parallelized, running many simulations in parallel decreases the overall computational time of the entire study; the total time then depends on the slowest machine learning model. Therefore, in this study, we propose a similar multiprocessing solution for AL-aided systematic review simulations, which divides the large and complex set of simulations into several independent parts and optimizes the computation time. The main goal of this study is to contribute to the field of simulation studies investigating the performance of active learning (AL) aided systematic reviewing by introducing parallel and distributed computing techniques using cloud environments and resources. Implementing such techniques makes it possible to considerably optimize computational time, making the research more efficient and less burdensome. The current paper starts with a description of parallel computing, containerization, and orchestration technologies, followed by a technical explanation of the proposed cloud architecture and a guideline for data scientists regarding the multiprocessing of Makita templates. Then, we present the results of a simulation study investigating the computational time required for the ARFI template using various numbers of CPUs and memory settings. The study ends with a discussion describing the limitations of the presented study and its potential development in future research.

Parallel and Distributed Computing
If some parts of a computational task (e.g., a calculation, an algorithm, or a combination of such) are performed simultaneously, such a solution is called parallel computing (or distributed computing, or multiprocessing) [22]. Parallel computing is often conflated with concurrent computing, but they are distinct, although not mutually exclusive, concepts. The former advances task performance by breaking the task down into multiple similar and independent subtasks. The latter also implements the "divide and conquer" principle; however, the tasks in concurrent computing are co-dependent and address different issues [23].
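As a minimal sketch of parallelism in this sense, the following Python snippet runs independent tasks on a pool of processes; the `simulate` function is a placeholder for one independent unit of work.

```python
# Minimal sketch: independent tasks executed in parallel on a pool of processes.
from multiprocessing import Pool

def simulate(task_id):
    # Placeholder for an independent unit of work (e.g., one simulation run).
    return task_id ** 2

if __name__ == "__main__":
    with Pool(processes=4) as pool:             # four parallel worker processes
        results = pool.map(simulate, range(8))  # eight independent tasks
    print(results)
```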

Containerization technology
Using a powerful local computer can be less flexible and economical than running simulations on cloud services, because the latter can provide as much memory, processing, and storage as a user needs. Therefore, executing large processes on the cloud might be a better option. To make the application scalable to the optimal number of instances, an approach called container-based virtualization (containerization) can be utilized. This technique involves packaging the application together with all other technologies required for its functionality, making it compatible with any hardware environment [24].
The core principle of container technologies lies in enabling processes and their resources to be isolated without any hardware emulation or specific hardware requirements. Another advantage of containerization is that it makes software lightweight and portable through its abstraction, the container image, which contains the necessary settings and the code to install package dependencies [25]. Today, there are many open-source tools developed for containerization, like Docker [26].
Containerized applications allow the parallelization of the simulations following the principle depicted in Figure 1, which was inspired by the approach of Tesliuk et al. [27]. Figure 1 shows an atomic example with two containerized instances; the parallelization can be adjusted to the number of available CPU cores. Assigning one container with an instance of AL-aided software to one CPU unit isolates the inner processes from other CPUs, so that, if there are any other processes, conflicts over the CPU's resources are prevented. Further, we refer to a CPU, or a CPU assigned to a container, as CPU/Container.
In contrast to the sequential strategy of a Makita-generated workflow, the presented approach takes the queue of the necessary commands and creates a pool out of them. From that pool of "input commands", we send commands to the various CPUs/Containers and execute them in parallel, thereby saving time. Commands are sent only to CPUs/Containers that are not currently processing any simulation and have already written their result to the allocated output file. Therefore, there are as many queues as there are CPUs/Containers.
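The sketch below illustrates this pool-of-commands principle in plain Python, without containers; the command strings are hypothetical Makita-style simulation calls (paths and flags are illustrative only), and each writes to its own output file. As soon as a worker is free, it takes the next command from the pool.

```python
# Sketch of the pool-of-commands principle from Figure 1 (without containers).
import shlex
import subprocess
from concurrent.futures import ProcessPoolExecutor

# Hypothetical Makita-style commands; paths and flags are illustrative only.
COMMANDS = [
    "asreview simulate data/dataset.csv -o output/sim_1.asreview",
    "asreview simulate data/dataset.csv -o output/sim_2.asreview",
    # ... one command per simulation in the study
]

def run(command):
    subprocess.run(shlex.split(command), check=True)
    return command

if __name__ == "__main__":
    # Two workers here, mirroring the two CPUs/Containers in Figure 1; an idle
    # worker immediately takes the next command from the pool.
    with ProcessPoolExecutor(max_workers=2) as pool:
        for finished in pool.map(run, COMMANDS):
            print("finished:", finished)
```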

Orchestration System
While traditional multiprocessing relies on the operating-system kernel for allocating and scheduling tasks, performing the same functions over containers requires special orchestration technologies. These were initially developed in large IT companies implementing scalable high-load services, such as Amazon, IBM, or Google. When the demand for an orchestrated application service increases or decreases, such tools help to upscale or downscale the number of application instances, so the demand can be matched in an optimal manner. Multiple orchestration instruments are available. Orchestration systems such as Kubernetes are a subset of system administration tools distinguished by the possibility of automated configuring, coordinating, and managing of computer systems and software [28].
The orchestrating approach enables easy integration of diverse software into a unified computing platform and makes it possible to manage and scale containerized software in large computing infrastructures. In combination with container technologies, such management systems can ensure the security of the parallel processes, their isolation from each other, and secure and simplified networking [29]. This makes parallelization on the level of containers safer than parallelization on the level of CPUs themselves, because traditional kernel management can be burdened by the paging out of processes and by denial-of-service cases, in which a system cannot fulfil users' requests [30]. In overload scenarios, for example when the number of parallel commands exceeds the available CPUs, containers can guarantee the isolation of processes and provide troubleshooting functionality.

Figure 1 The architecture of parallelizing a simulation study, with input commands distributed between two instances of CPU (or CPU attached to a Docker container) and allocated to the files of the commands' output. After a Container/CPU finishes a simulation, it takes a new command from the pool of input commands. Both the left and right Containers/CPUs have their own queue, denoted by the numbers (e.g., 1 or 2). While Files 1 and 2 are the output files of two already completed simulations, Files 3 and 4 correspond to the output of two simulations in progress. File N stands for the files that will be created by upcoming commands.

ORCHESTRATION SYSTEM OF ASREVIEW AND TYPES OF KUBERNETES PODS
The current study presents a cloud architecture for ASReview LAB v1.x [31], designed for applying a parallel computation strategy to ASReview Makita templates [32]. This section explains and describes each of the components separately.

Overall workflow
Parallel computations can be implemented on various levels: bit level, instruction level, data level, and task level. Our multiprocessing strategy focuses on parallelizing ASReview simulations on the task level by using many container instances of the ASReview application and running separate tasks on them. There are four types of components: a Tasker, a Message Broker, a Worker, and a Volume. There can be only one Tasker, one Volume, and one Message Broker, whereas the Worker components can be multiplied as long as there are enough CPUs and memory.
The Tasker initiates the process using the given data and the tasker.sh script. The tasker.sh script splits the Makita-generated commands into three blocks of commands, which are not parallelizable between themselves. The Tasker executes the first block to prepare the Volume.
The second and third blocks are sent to the Message Broker one at a time. Each command block becomes a message queue in which each command is a separate message. The messages are sent to the Workers. After completing a simulation, a Worker stores the result in the Volume component, sends a message (via the Message Broker) to the Tasker component that it has finished the task, and waits for another message from the Message Broker.

Message broker
RabbitMQ software is used as the message broker. RabbitMQ implements the Advanced Message Queuing Protocol (AMQP) and supports various messaging patterns, such as request/reply and, most importantly for parallel processing, work queues. The Message Broker runs as a Kubernetes Service abstraction, meaning it keeps running until the user manually deletes it or it meets a deletion condition. RabbitMQ handles the distribution of jobs (simulation commands in our case) among the Worker pods and, through completion messages, prevents Workers from receiving a new job before the previous one is finished. The Tasker job sends all commands to the RabbitMQ queue and waits until it receives an equal number of confirmations before moving to the next group of tasks.
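A hedged sketch of this tasker-side pattern using the pika client (the queue names, host name, and message format are illustrative assumptions for this sketch, not the actual tasker.sh implementation):

```python
# Illustrative tasker-side pattern: publish one message per command, then wait
# for an equal number of completion messages before the next block. Queue and
# host names are assumptions for this sketch.
import pika

commands = ["asreview simulate ...", "asreview simulate ..."]  # one block

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)
channel.queue_declare(queue="done", durable=True)

for command in commands:  # each command becomes one message in the work queue
    channel.basic_publish(exchange="", routing_key="tasks", body=command)

confirmations = 0
for method, properties, body in channel.consume(queue="done"):
    channel.basic_ack(method.delivery_tag)
    confirmations += 1
    if confirmations == len(commands):  # all Workers reported completion
        break

channel.cancel()
connection.close()
```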

Volume Component
The Volume is the component that stores the data and results.It works as a shared disk space between all Workers and the Tasker.

Tasker job
The Tasker component is a Kubernetes Job abstraction, meaning it runs only once and finishes when its Docker container terminates. First, it copies the provided datasets and runs the specified Makita template, which is split into three parts: the directory-definition part, the simulation-commands part, and the metrics-computation part. While the latter two are parallelized one after another, to ensure that metrics are computed based on completed simulations, the first part cannot be parallelized, as it creates the directory structure, which must be set up sequentially. After the directories are created, the Tasker distributes the commands among the Workers via RabbitMQ: first from the simulations part and second from the metrics part.
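As a rough illustration of this splitting step (the real work is done by tasker.sh; the `##` section markers below are a hypothetical convention, not necessarily the one Makita emits):

```python
# Rough sketch: split a Makita-generated jobs script into sequential blocks.
# The "##" section markers are a hypothetical convention for this illustration.
def split_blocks(jobs_script: str):
    blocks, current = [], []
    for line in jobs_script.splitlines():
        if line.startswith("##"):            # a new section begins
            if current:
                blocks.append(current)
            current = []
        elif line.strip() and not line.startswith("#"):
            current.append(line)             # an executable command
    if current:
        blocks.append(current)
    # Expected order: directory setup, simulation commands, metric commands.
    return blocks
```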

Worker deployment
Compared to the previous component, the Worker is implemented as a Kubernetes Deployment, meaning Workers run continuously and restart if they terminate (successfully or not). They are responsible for running the Makita template commands sent from the Tasker and for storing the results in the Volume. To make this possible, each Worker pod, and thus the container inside it, has the full list of packages and dependencies that the ASReview application needs. Moreover, the Worker is responsible for sending messages back to the Tasker so that it knows the command completed successfully.
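The corresponding Worker-side loop, again as an illustrative pika sketch rather than the actual worker.sh implementation (queue and host names match the tasker sketch above):

```python
# Illustrative Worker-side loop: take one command at a time, run it, report
# completion on the "done" queue, and only then accept the next message.
import shlex
import subprocess

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)
channel.queue_declare(queue="done", durable=True)
channel.basic_qos(prefetch_count=1)  # at most one unacknowledged job per Worker

def on_message(ch, method, properties, body):
    subprocess.run(shlex.split(body.decode()), check=True)       # run the simulation
    ch.basic_publish(exchange="", routing_key="done", body=body)  # report back
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="tasks", on_message_callback=on_message)
channel.start_consuming()
```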

Where to start
The presented architecture comes with explicit documentation describing the steps necessary to run a large number of simulations in cloud and local environments. The implementation has been made publicly available in a GitHub repository, where all scripts and documentation are covered by the Apache License, Version 2.0 (https://github.com/asreview/cloud-usage).

SIMULATION STUDY METHODOLOGY
To test the infrastructure and to provide a proof-of-principle, we ran a simulation study investigating the computational time required for an ARFI template using various numbers of CPUs and memory settings.

Setup
Initially, the cloud infrastructure was developed and tested locally on several machines (MacBook M1 chip and Dell XPS 2018) using both Windows and MacOS operating systems for proof runs.
Then, we implemented the infrastructure in a cloud environment running Linux Ubuntu. As the cloud infrastructure provider, we used SURF Cloud, a non-commercial cloud infrastructure serving Dutch academic researchers and stakeholders [33].
In Table 1, the specific details of the software used on the cloud can be found.
Regarding the AL tool used for the simulation study, the default ASReview simulation settings were utilized: Naïve Bayes as the classifier model, TF-IDF as the feature extractor, 'MaxQuery' as the query strategy, and dynamic resampling as the balancing strategy. ASReview v1.13, Docker (version 24.0.0), Minikube Kubernetes (version 1.30.1), and RabbitMQ (version 3.12.0) were selected for this study.

Analytical strategy
We conducted multiple time measurements of ARFI Makita template simulations using different limits for the total memory allocated to the Minikube Kubernetes implementation, different numbers of central processing unit (CPU) cores, and different volumes of RAM allocated per CPU.
First, the default Kubernetes RAM limit of 2 GB for the whole implementation was used, and then it was increased to 60 GB. Regarding the CPUs, the number of cores was varied between 1 and 14 in increments of 2 cores per test. Furthermore, we varied the memory allocated to each Worker container: one set of runs with 1024 megabytes of RAM per CPU and one with 2048 megabytes of RAM per CPU.
In addition, we used the GNU parallel [33] package to implement parallelized simulations on the same CPUs of the SURF cloud machine but without Docker and Kubernetes. In the current study, we refer to this implementation as 'bare metal'.
It is important to note that, by default, Kubernetes utilized one additional CPU each for the Tasker and for RabbitMQ; thus, the Kubernetes runs cannot be compared to 'bare metal' at the level of 16 CPUs. Nonetheless, we included the 'bare metal' measurements in order to show the maximum speedup capacity available with the given setting.
Thereby, time measurements with CPU numbers from 1 to 16 were implemented in four rounds and compared to a benchmark sequential ARFI run time of 387 seconds:
• Kubernetes with the default RAM limit;
• Kubernetes with an increased RAM limit and 1024 megabytes of RAM per Worker;
• Kubernetes with an increased RAM limit and 2048 megabytes of RAM per Worker;
• 'bare metal' parallelization with GNU parallel.
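The measurement procedure can be summarized by a loop of the following shape (a sketch; in the actual study the runs were driven by the repository's scripts, and `run_study` is a placeholder for launching one full set of 38 simulations with p parallel workers):

```python
# Sketch of the timing procedure: run the full set of simulations for each
# CPU count and record the wall-clock time T (in seconds).
import time

def measure(run_study, cpu_counts=(1, 2, 4, 6, 8, 10, 12, 14)):
    timings = {}
    for p in cpu_counts:
        start = time.perf_counter()
        run_study(workers=p)          # one full set of 38 simulations
        timings[p] = time.perf_counter() - start
    return timings
```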

Metrics
To evaluate the performance of the multiprocessing strategies, the time for computation, the speedup ratio, and the parallel efficiency were calculated for each run. Time for computation: T denotes the time (in seconds) taken by a given simulation run from start to finish.
Speedup ratio: S_p = T_1 / T_p, where T_1 denotes the time for sequential computation of the whole set of simulations on one CPU (the default run), and T_p is the time for computation of the whole set of simulations parallelized over p CPU cores [34]. The speedup ratio indicates how many times faster a given multiprocessor run is than a sequential run. For example, if the serial process runs in T_1 = 10 seconds and the same process parallelized runs in 5 seconds, the speedup ratio is 2.
Parallel efficiency: E = S_p / p, where S_p denotes the speedup for a certain number of cores, and p is the number of cores [35, pp. 81-82]. This metric indicates what share of the speedup each CPU core contributes. If E < 1, the speedup is called sublinear; if E ≈ 1, linear; and if E > 1, superlinear.
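For example, with the 387-second sequential benchmark reported below and a two-CPU run of 203 seconds, these definitions give a speedup of about 1.9 and an efficiency of about 95%, consistent with the 'bare metal' figures reported in the results:

```python
# Speedup ratio and parallel efficiency for an example two-CPU run,
# using the timings reported in this study (387 s sequential, 203 s parallel).
T_1, T_p, p = 387.0, 203.0, 2

S_p = T_1 / T_p   # speedup ratio, ~1.91
E = S_p / p       # parallel efficiency, ~0.95 (sublinear, since E < 1)

print(f"S_p = {S_p:.2f}, E = {E:.1%}")
```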

Dataset
To conduct the simulation study, one dataset was taken from the Synergy collection, a free and open-source collection of datasets on study selection in systematic reviews, comprising 169,288 academic works from 26 systematic reviews [36]. The collection was published under an open CC0 1.0 Universal License. In our work, we used the PTSD dataset [37], a collection of 4,544 abstracts of studies related to post-traumatic stress disorder (PTSD) trajectories. It contains the title, authors, abstract, keywords, and each record's relevance status, which indicates whether the study met the criteria for inclusion in the systematic review. The dataset of van de Schoot is used as a benchmark dataset in the ASReview software, meaning it is familiar to current users, which makes it a feasible demonstration of the possibilities of the cloud environment and architecture. This makes it a good demonstration for the purpose of the current study within the available time frame.
The overall inclusion rate (38 records, 0.8% of the whole dataset) resulted in 38 simulation runs (one for each relevant record). Per combination of CPU numbers with GNU, this gives 304 simulations (8 × 38), and for Kubernetes with ARFI, run three times per combination of CPU numbers, 798 simulations (7 × 38 × 3); in total, 38 + 304 + 798 = 1140 simulations.
All scripts to reproduce the simulation study are available under the MIT license at https://github.com/zoneout215/asreview-makita-multiprocessing.

RESULTS

During the initial time measurements of 266 simulations with Kubernetes, it was found that after using more than 4 CPUs and reaching a performance of 273 seconds, the computational time (T) ceased to decrease. As can be seen in Figure 3, T stays between 275 and 298 seconds. This was caused by the default setting of Minikube Kubernetes, which limits the RAM for all Worker containers to 2 GB; using more than 4 Workers with 1024 megabytes of RAM each conflicted with this limitation. Although the default settings imposed these limits, this amount of memory already gave an almost two-fold speedup, from 483 seconds (the non-parallelized default ARFI, depicted by the red bar) to a minimum of 273 seconds in the first round of tests. Because those Minikube limits do not provide information about the accelerating capacity of the architecture, the speedup ratio and parallel efficiency were not computed for this round.

Comparison of Parallel Computing Strategies
After adjusting the limits of the Minikube Kubernetes default settings (2 CPUs) and allocating more RAM to each CPU, the results indicated a decrease in overall computational time, as well as per CPU number; see Figure 4, and see Appendix B for speedup ratio measurements.
Whereas with only one CPU the default ARFI run (387 seconds) was faster than 'bare metal' (411 seconds) and both the second (506 seconds) and third (500 seconds) Kubernetes runs, with the step to 2 CPUs the latter three took 203, 265, and 262 seconds, respectively, outperforming all first-round Kubernetes runs from Figure 3 (273-483 seconds) as well as the default ARFI run.
Comparing Kubernetes runs with different RAM allocations per Worker, the third round of simulations, with 2048 megabytes of allocated RAM per Worker, had a different outcome than the round with 1024 megabytes: it was faster (a mean difference of approximately 3.11 seconds) but still slower than the 'bare metal' simulations.
What is more, all implementations apart from the first Kubernetes round show a more drastic speedup in the steps between smaller numbers of CPUs than with 12, 14, or 16 CPUs, thereby forming a plateau in terms of time optimization at the higher CPU numbers; see Figure 5. For example, the GNU implementation has similar timings of 46 seconds when 14 and 16 CPUs are used, resulting in the same speedup. At the same time, the speedups between 12 and 14 CPUs for the two Kubernetes rounds were 5.16 and 5.22 times, and 6.14 and 6.14 times, respectively, relative to the benchmark ARFI run.

Figure 5 Speedup ratios for 38 parallel simulation studies distributed over various CPU numbers (from 1 to 16 CPUs) for three rounds of runs. On the x-axis are the numbers of CPU cores, and on the y-axis are the ratios of the benchmark run to the three rounds of runs, respectively. Note that the simulations with 1 CPU were not included because they have no parallelizing aspect. Each bar is the time of running a set of 38 simulations. There are 570 simulations in this plot in total.
In contrast, the initial step from 2 to 4 CPUs produced a larger speedup: for GNU -from 1.9 to 3.36 times, for Kubernetes with 1024 megabytes -from 1.46 to 3.45 times, for Kubernetes with 2048 megabytes -from 1.47 to 2.58 times; see Figure 5.

Speedup Patterns and Efficiency
Notably, in the 2-CPU run, the GNU parallelization solution achieved 95.3% parallel efficiency, using each core with the maximum observed efficiency; see Figure 6, and see Appendix C for parallel efficiency measurements.
In comparison, Kubernetes showed only 73.8% efficiency in the same CPU configuration. Throughout the study, Kubernetes indicated a lower level of parallel optimization than GNU, except for one run with 4 CPUs and 1024 megabytes of RAM, in which it was faster and more efficient by 2%.
With more than 12 cores, Kubernetes exhibited less than 50% efficiency in both cases (see Figure 6 and Appendix C), indicating diminishing returns in performance improvement, whereas GNU dropped to 52% parallel efficiency per core only in the final 16-CPU run.
The overall efficiency and speed of all rounds of simulations showed a notable decline, accelerating fewer times than the number of CPUs, which constitutes a sublinear pattern of speedup. For instance, the most optimal and efficient simulation run improved by 1.903 times with 2 CPUs compared to the serial ARFI run, corresponding to the 95.3% parallel efficiency reported above. Nevertheless, all parallel computing strategies implemented in this study performed faster than the default sequential ARFI run (387 seconds) (see Figures 3 & 4).

DISCUSSION
In this paper, we demonstrated the degree to which the Makita ARFI template can be accelerated with multiprocessing. Compared to the serial processing benchmark, all four rounds of parallelized ARFI runs executed faster when using more than one CPU. With the suggested parallel computation technique, developed with a Kubernetes architecture design, future research can benefit from more optimal simulation times while ensuring the safe completion of the needed processes.
In previous studies that implemented orchestration solutions for computationally intensive tasks [27], Kubernetes was compared to parallelization without containers and orchestration ('bare metal').
However, initial expectations based on Tesliuk et al. [27], where Kubernetes demonstrated faster and more efficient performance, were not met in this study. Kubernetes outperformed GNU only with four cores and 1024 megabytes of RAM per Worker, suggesting that this specific configuration may be optimal for Kubernetes with the given settings and dataset. Nevertheless, this is the opposite outcome to the results of Tesliuk et al. (p. 70), and it is noteworthy that this optimal configuration may not generalize to other datasets or scenarios. Several differences between our setup and theirs might have produced this outcome: their study used GPUs, whereas ours used only CPUs; their data were not textual but particle images; and their setup included no message-broker software (pp. 67-68). We suppose that the presence of RabbitMQ within the ASReview Kubernetes setup affected its overall performance, preventing it from outperforming the 'bare metal' solution, as the differences in GPU usage and data types would have affected processing time regardless of containerization or 'bare' CPU exploitation.
Although parallelization with the GNU package outperformed Minikube Kubernetes in all but one specific test, both approaches make the total computation time of large AL simulation studies faster and more optimal. Despite the 'bare metal' solution providing higher speedup and parallel efficiency, it lacks the degree of security and process isolation possessed by containers [29, p. 44]. In more data-intensive cases, for which the cloud architecture was designed, parallelizing on bare CPUs may compromise process security and introduce troubleshooting challenges, such as process page-out [30]. Thus, both the Kubernetes architecture and the GNU parallel package may be used to optimize ASReview Makita, constituting a trade-off between the degree of optimization and data-processing security.
The dataset we analyzed possesses its own specific and idiosyncratic textual features, which influence the computational time and the number of cores with the best efficiency. In other words, the provided results may lack generalizability; however, the time measurements and experimental runs demonstrate the potential of parallelizing simulation runs in the Makita template. To facilitate future research in this area, we would like to highlight that any researcher interested in conducting similar simulation studies can access the Synergy datasets collection. This collection offers a variety of datasets, including those on different topics and with different characteristics. By utilizing the Makita templates, researchers can easily adapt our methodology to their chosen dataset, thereby exploring the scalability and efficiency of parallel computing strategies in various contexts. The high variability of AL performance depending on the data brings new challenges and questions about parallel simulations of systematic reviews. Datasets differ not only in size and proportion of relevant records but also in the semantics of the covered topics and the syntax, vocabulary, and morphology of the datasets' language [38]. Continuing the current research, it is worthwhile to focus on extensions in terms of data, in similar and other simulation settings, of the options within ASReview and its Makita templates.
The sublinear speedup we observed is often caused by data shared among threads, leading to increased data movement and latency in large networks [39]. Our architecture, however, avoids these issues by assigning separate data to each CPU. A key factor we discovered is the uneven distribution of simulation execution times, particularly under the ARFI methodology, where different starting points lead to variable computational demands. For instance, in a setup running 38 simulations on 38 CPUs, idle time occurs when some simulations conclude faster than others due to the earlier identification of all relevant records. To mitigate this and move towards linear speedup, we propose enhancing parallelization at the simulation level, allowing CPUs to assist each other after completing individual tasks. This approach could significantly improve resource utilization, although achieving superlinear speedup would require deeper architectural changes beyond the scope of the current study. Moreover, many of the feature extractors and classifiers utilized in AL pipelines support parallelization by design. Whereas random forest can be parallelized [40] but does not take a lot of time, most neural network (NN) architectures tend to have the longest learning timeframes [11]. Expanding on the parallelization aspect, it should be noted that NNs are specifically suitable for parallelization on graphics processing units (GPUs) [41]. Parallelization on GPUs can bring significant benefits, particularly in feature-extraction or classification timeframes with NN architectures. Lastly, the conducted experiments are methodologically not suitable for drawing causal conclusions about the efficiency of the presented strategy; rather, they present observational and descriptive results. Cloud and multiprocessing technologies should not be perceived as a 'silver bullet' for accelerating simulations investigating the performance of active learning models, because the performance of pipeline simulations varies dramatically depending on the choice of dataset.

CONCLUSION
We presented an architecture design with a multiprocessing computational strategy for ASReview Makita (Make It Automatic) templates, which helps run simulation studies mimicking the screening process of active learning (AL) aided systematic reviews. The provided solution can be run on both local and virtual machines. The number of Kubernetes Workers is limited only by the availability of CPUs and memory, and the strategy has fewer queuing limits than classical sequential strategies, which suits the necessities of processing numerous simulations.

Figure 2
Figure 2 Diagram of the ASReview parallelization design for a cloud Kubernetes cluster implementation, describing the manual setup steps and the way the two Docker components (Worker and Tasker) communicate using the 'tasker.sh' and 'worker.sh' bash scripts, with the addition of the RabbitMQ Message Broker.

Figure 3
Figure 3 Multiprocessing timings (in seconds) for 38 parallel simulations distributed over various CPU numbers (from 1 to 14 CPUs) for one round of runs and a benchmark timing; see Appendix A for the computation time measurements. Each bar is the time of running a set of 38 simulations. There are 304 simulations in total in this plot.

Figure 4
Figure 4 Multiprocessing timings (in seconds) for 38 parallel simulation studies distributed over various CPU numbers (from 1 to 16 CPUs) for three rounds of runs and a benchmark timing.Each bar is the time of running a set of 38 simulations.There are 874 simulations in total in this plot.

Figure 6
Figure 6 Parallel efficiency for 38 parallel simulation studies distributed over various CPU numbers (from 1 to 16 CPUs) for three rounds of runs. On the x-axis is the number of CPU cores, and on the y-axis is the parallel efficiency (share of speedup) each core contributes in the three rounds of runs. The simulations with 1 CPU were not included because they have no parallelizing aspect. Each bar is the time of running a set of 38 simulations. There are 570 simulations in this plot in total.

Table 1 presents a summary of the Kubernetes cluster and the configuration of the Kubernetes Pods, a native abstraction in the Kubernetes architecture that can consist of one or more application containers.

Table 1
Table 1 Software utilized during the study. Note that Kubernetes needs one CPU each for the Tasker and RabbitMQ components; hence, there are no Kubernetes runs with 16 CPUs.