Revealing the Trends in the Academic Landscape of the Health Care System Using Contextual Topic Modeling

ABSTRACT The health care system encompasses the individuals, groups, agencies, and resources that provide services to meet the health needs of persons, communities, and populations. Alongside the growing debates on healthcare systems in relation to diseases, treatments, interventions, medication, and clinical practice guidelines, the world is currently discussing the healthcare industry, technology perspectives, and healthcare costs. To gain a comprehensive understanding of the healthcare systems research paradigm, we propose a novel contextual topic modeling approach that links the CombinedTM model with our Healthcare BERT to discover contextual topics in the healthcare domain. This work discovered 60 contextual topics. Fifteen are hot topics, including smart medical monitoring systems, the causes and effects of stress and anxiety, and healthcare cost estimation. Twelve are cold topics, and the remaining thirty-three show insignificant trends. We further investigated topic clusters and correlations through inter-topic distance maps, which deepen the understanding of the research structure of this scientific domain. The current study improves on prior topic modeling methodologies that examine the healthcare literature from a particular disciplinary perspective. It also extends existing topic modeling approaches, which do not incorporate contextual information in the topic discovery process, by creating sentence embedding vectors with transformer-based models. We also used corpus tuning, the mean pooling technique, and the Hugging Face tool. Our method achieves a higher coherence score than the state-of-the-art models (LSA, LDA, and BERTopic).


INTRODUCTION
Healthcare systems are complex and composed of many interrelated parts. The World Health Organization (WHO) defines a health system as "all institutions, people, and actions whose primary aim is to promote, restore, or maintain health." Health systems usually span rural and urban areas, public and private sectors, and formal/allopathic as well as informal/traditional methods of providing healthcare, and they operate at the national level, which gives them tremendous scope [1].
Beyond providing healthcare and other treatments to maintain or improve health, health systems perform many other functions in society. They help protect households from the financial consequences of illness and medical costs. It is also important to keep in mind that health systems contribute to society's economy [2]. The health system is a sector of the economy that provides employment, revenue, and business opportunities for many health workers and enterprises. For instance, some research suggests that a population's health may affect economic productivity. Health systems are also social and cultural institutions through which a wider range of societal norms and values are established [3]. Health systems and the broader environment interact dynamically. Because health systems are diffuse and their borders are usually porous, any assessment of their structure and efficacy must take the social, political, and economic environment into account.
Moreover, health systems are arenas in which actors with different interests and goals compete and argue. Setting health priorities, funding health systems, and allocating resources within them are all contentious issues. The roles of the state and the market within health systems, as well as the function a health system should serve in society, are frequently the subject of ideological and political disagreement. These facets of health system complexity are transdisciplinary and are rarely addressed simultaneously. Indeed, many disciplinary perspectives, including history, economics, medicine, epidemiology, politics, law, ethics, anthropology, and sociology, are needed to fully study and comprehend health systems [1].
Previous studies applied bibliometric methods to analyze the healthcare literature [4,5] or extracted latent patterns from the scientific literature on various aspects of the healthcare system [6,7,8]. For example, the authors of [6] map the research on healthcare operations and supply chain management, and the scholars of [8] present an exploratory analysis of research on IoT in healthcare. These studies are limited to a specific disciplinary viewpoint and do not show the holistic picture of healthcare research. Moreover, these approaches do not capture the context of the discussion. In the current study, we apply computational methods and advanced topic modeling tools to capture the context of the research so that the topics are more semantically understandable. We achieve this goal with a contextual word embedding-based topic modeling method that uses sentences as the elementary unit of analysis for creating embeddings. By combining computational methods with qualitative data analysis, we provide an objective, coherent, meta-analytical view of current research on healthcare systems. The study's main technical and theoretical contributions are as follows: (i) We developed a novel contextual topic modeling approach that uses corpus tuning and mean pooling to build Healthcare BERT, which we link with CombinedTM (CTM) to perform contextual topic modeling in the healthcare domain. (ii) We compared our model with LSA, LDA, and BERTopic. Our model achieves the highest coherence score among these state-of-the-art models. (iii) We quantitatively assessed the scientific literature in the healthcare domain in terms of contextual topics and classified the topics into hot and cold categories on the basis of p-values. We also investigated topic clusters and correlations by exploring inter-topic distance maps. (iv) We qualitatively examined the top-cited studies to cross-validate the topic themes, which strengthens our findings.
The article is organized into five sections. The first section is the introduction. The second section covers related work, and the third section describes the materials and methods in depth. The fourth section presents the results and discussion, and the fifth section concludes the article.

LITERATURE REVIEW
Here, we discuss (i) the top-cited works on the healthcare system as well as (ii) related work on topic modeling approaches.

Healthcare Systems
This section gives an overview of highly cited research on healthcare systems. Such research generally deals with diseases, diagnoses, interventions, treatments, and related subjects. In the context of diseases, scholars focus on heart disease, stroke, congenital heart disease, and other vascular diseases [9,10,11,12]. Regarding healthcare interventions, scholars emphasize subjects such as the acceptability of interventions, the evaluation of interventions, behavioral interventions, and care transition interventions [13,14]. The most cited research also analyzes medication effectiveness [15], as well as stigma as a cause of health inequality, service utilization for lifetime mental disorders, cultural competence in the delivery of healthcare services, and patients' perceptions of hospital care [16].
The healthcare industry is undoubtedly among the most significant of all the sectors that have profited from technology adoption, which has raised the standard of living and helped save many lives. Researchers have developed various tools and techniques to automate operations and tasks in healthcare. For example, 3D Slicer is a clinical research tool similar to a radiology workstation [17]. Wireless sensor network (WSN) technologies have various applications in the health sector, such as sensor-integrated devices that can monitor human activity through pressure, temperature, and strain. WSNs also provide monitoring based on contextual information, which reduces the need for caregivers [18,19]. Additionally, IoT-enabled solutions based on WSN, RFID, and mobile technologies, which can monitor patients, personnel, and healthcare devices, are another application of technology to healthcare services [20].
Materials science has also contributed significantly to the healthcare industry. For example, electrospun nanofibers can be used for membrane preparation [21], and silk-molded electronic skin can monitor human physiological signals [22]. Healthcare also generates large amounts of data that can be processed with machine learning and deep learning techniques, which advances healthcare research and improves human health [23,24]. Beyond these areas, physician acceptance of telemedicine technology and cell phone-based interventions (voice calls and text messages) are being evaluated as alternatives to conventional care settings [25,26]. Additionally, the use of technology in the healthcare system positively affects hospital revenue and the quality of services delivered to patients [27]. Video conferencing technology has also been used to train primary care providers to treat complex diseases such as HCV infection, which increases the number of patients who receive treatment [28].
Regarding healthcare costs and related subjects, researchers have identified various factors that increase or decrease these costs. For example, higher medication adherence decreases patient hospitalization [29,30]. Patient follow-up interventions [31] and the prevention of fall-related injuries can reduce rehospitalization risk and expenditure [32,33,34]. Surgical site infections (SSIs) are one of the major types of healthcare-associated infections and significantly increase the burden on medical care by prolonging hospital stays [35,36]. Infections such as Clostridium difficile infection and antibiotic-resistant bacteria threaten the healthcare system because they cause many deaths [37,38,39,40]. Deploying a surveillance system can therefore provide an estimate of the burden of infection, and prevention activities combined with surveillance can avoid infections and the extra costs they impose on the healthcare system [41]. Cancer imposes substantial and increasing health and economic burdens through its treatment, which highlights the importance of cancer prevention efforts that may yield future savings for the healthcare system. Research therefore recommends early cancer detection and treatment for effective cancer control [42,43].

Topic Modeling
Probabilistic topic models are a group of machine learning algorithms used in text mining. These models look for structural patterns within a corpus to extract semantic information. Topic models create word clusters that represent the major subjects in a given corpus, thereby offering an automatic way of identifying the common topics in a collection of documents. Topic modeling can be performed using approaches such as NMF, LSA, and LDA, or by clustering with K-means or with Ward's method for hierarchical clustering [55,56].
A variety of statistical and probabilistic approaches are used in language modeling (LM) to estimate the likelihood that a given string of words will appear in a sentence. Language models examine corpora of text data to provide a foundation for their word predictions. These models can be more effective than other approaches because they take into account the meaning and semantics of words and sentences as well as the relationships between words. Additionally, this strategy can achieve a high level of semantic integrity within each topic, enhancing the topics' relevance and differentiating them from one another [57]. Moreover, Google created a transformer-based machine learning method for pre-training natural language processing models, called Bidirectional Encoder Representations from Transformers (BERT). It was developed and released in 2018 by Google employees Jacob Devlin and his team [58].
The literature contains various approaches for analyzing scientific discourse. Existing studies mostly use bibliometric techniques [4,5], Latent Dirichlet Allocation (LDA)-based topic modeling [56,59], or qualitative content analysis [60]. Other qualitative methods applied to scientific trajectory analysis include systematic mapping reviews, critical reviews, and narrative reviews [61]. Researchers have primarily developed keyword-based analysis techniques that usually do not capture the context of the discussion. Traditional methods such as LDA and Probabilistic Latent Semantic Analysis are difficult to apply given the high dimensionality of massive data, and the sparse distribution of topics can produce unclear topics [62]. In the approach of [63], the generalized BERT model is less effective at producing high-quality clusters, and K-means performs poorly on data containing outliers. To address these challenges, we applied contextual topic modeling [64] to the healthcare domain. We tuned a specialized BERT model (Medical-Bio-Bert2) on a corpus of healthcare research articles and added a mean pooling layer on top of it, yielding a novel Healthcare BERT, which we link with the CombinedTM (CTM) model to develop a novel contextual topic model. This strategy improves the coherence score by producing more accurate embeddings and the resulting contextualized topics. We also compared our model with the state-of-the-art LSA, LDA, and BERTopic models, and our model outperforms them. We further investigated topic trends and correlations among the contextualized topics to add another dimension to the understanding of the healthcare landscape.

MATERIALS & METHODS
In this section, we discuss the data collection, its pre-processing, and the methodology for the contextual topic modeling procedure in detail.

Data Collection
We selected Scopus as the data source and executed the following query on 30 November 2022: TITLE-ABS-KEY ((healthcare system* OR health care system* OR Health-care system* OR healthcare* OR health care* OR Health-care*)) AND (LIMIT-TO (SRCTYPE, "j")) AND (LIMIT-TO (DOCTYPE, "ar")) AND (LIMIT-TO (LANGUAGE, "English")). The query returned 29,600 records published between 2000 and 2022. After removing duplicates and articles with empty abstracts, 28,036 records remained.
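The cleaning step above (dropping duplicates and records with empty abstracts) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (dicts with "title" and "abstract" keys), the use of a normalized title as the duplicate key, and the "[No abstract available]" placeholder string are all assumptions, since the study does not state which fields were used to detect duplicates.

```python
# Minimal sketch (not the authors' code) of removing duplicate records and
# records without abstracts. Assumed record layout: dicts with "title" and
# "abstract" keys; assumed duplicate key: the case/whitespace-normalized title.
PLACEHOLDER = "[No abstract available]"  # assumed placeholder for missing abstracts


def normalize_title(title):
    """Lowercase and collapse whitespace so near-identical titles match."""
    return " ".join((title or "").lower().split())


def clean_records(records):
    """Keep the first occurrence of each title and drop empty abstracts."""
    seen = set()
    cleaned = []
    for rec in records:
        abstract = (rec.get("abstract") or "").strip()
        if not abstract or abstract == PLACEHOLDER:
            continue  # nothing to model for this record
        key = normalize_title(rec.get("title"))
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Made-up example records (not from the study's data set).
records = [
    {"title": "Telemedicine adoption", "abstract": "Physician acceptance."},
    {"title": "Telemedicine  Adoption", "abstract": "Physician acceptance."},
    {"title": "Hand hygiene compliance", "abstract": PLACEHOLDER},
    {"title": "Cost of surgical site infections", "abstract": "SSIs prolong stays."},
]
cleaned = clean_records(records)
print([r["title"] for r in cleaned])
```

With a pandas DataFrame, as the study used for processing, the same idea maps onto `drop_duplicates(subset=...)` and `dropna(subset=...)`; the plain-Python version just makes each filtering rule explicit.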

Pre-processing
After data collection, the data were pre-processed before modeling to improve data quality. The text was first split into tokens using word tokenization, and the tokens were then lowercased. Numerals, punctuation, and stop words were removed. Stop words were identified with a standard English stop word list (n=153). Finally, stemming and lemmatization were applied.
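The pre-processing pipeline described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions. The regular-expression tokenizer and the ten-word STOP_WORDS set stand in for the word tokenizer and the 153-word list used in the study. Stemming and lemmatization are left out; a library such as NLTK, for example its PorterStemmer or WordNetLemmatizer, would typically supply them. The hypothetical bag_of_words helper shows how the cleaned tokens become the document-term counts that a bag-of-words topic model consumes.

```python
import re
from collections import Counter

# Illustrative subset only; the study used a 153-word English stop word list.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "on", "with"}


def preprocess(text):
    """Tokenize, lowercase, and drop numerals, punctuation, and stop words."""
    # Keeping only alphabetic runs removes numerals and punctuation in one pass.
    tokens = re.findall(r"[A-Za-z]+", text)
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs_tokens):
    """Build a sorted vocabulary and a document-term count matrix."""
    vocab = sorted({t for toks in docs_tokens for t in toks})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for toks in docs_tokens:
        row = [0] * len(vocab)
        for word, count in Counter(toks).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


tokens = preprocess("The cost of 3 surgical-site infections in 2020, and the cost of care.")
print(tokens)  # ['cost', 'surgical', 'site', 'infections', 'cost', 'care']
vocab, dtm = bag_of_words([["cost", "care", "cost"], ["care", "risk"]])
print(vocab, dtm)  # ['care', 'cost', 'risk'] [[1, 2, 0], [1, 0, 1]]
```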

Healthcare Bert
We developed a transformer-based deep learning model, Healthcare BERT, to enhance the semantic understanding of the topics. We tuned fspanda-Medical-Bio-Bert2, available through the Hugging Face tool, on the healthcare corpus. This produced an improved transformer-based model that provides more accurate context vectors for contextual topic modeling. Transformer models support various pooling methods (e.g., CLS pooling, mean pooling, and max pooling) [65]. We added each of these three pooling layers on top of Healthcare BERT in turn and computed the coherence score with the CTM model. Because the mean pooling layer gave the highest coherence score, we kept it on top of Healthcare BERT (see Algorithm 1).
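The three pooling strategies compared above can be illustrated on plain Python lists. This sketch is a simplification, not the study's code. Real token embeddings are tensors taken from the transformer's last hidden state, and common PyTorch implementations of mean pooling multiply the embeddings by the expanded attention mask and divide by the clamped mask sum. The logic is the same: padding positions (mask 0) are excluded.

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # Guard against an all-padding input, as clamping does in tensor code.
    return [t / max(count, 1) for t in total]


def max_pooling(token_embeddings, attention_mask):
    """Element-wise maximum over the non-padding token vectors."""
    kept = [vec for vec, mask in zip(token_embeddings, attention_mask) if mask]
    return [max(column) for column in zip(*kept)]


def cls_pooling(token_embeddings, attention_mask):
    """Use the first token ([CLS]) vector; the mask is unused but keeps one signature."""
    return list(token_embeddings[0])


# Toy 2-dimensional embeddings: two real tokens followed by one padding token.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pooling(tokens, mask))  # [2.0, 3.0]
print(max_pooling(tokens, mask))   # [3.0, 4.0]
print(cls_pooling(tokens, mask))   # [1.0, 2.0]
```

Because the padding vector [100.0, 100.0] is masked out, it does not distort the mean. Without the mask, the average would be pulled far away from the real tokens.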

Topic Models and CombinedTM
Latent Dirichlet allocation (LDA) is an important and widely used probabilistic topic model. It is based on the following generative process. Each document draws topic proportions θ, and each word w_n is drawn from the topic z_n assigned to it:
$$\theta \sim \mathrm{Dir}(\alpha), \qquad z_n \mid \theta \sim \mathrm{Mult}(1, \theta), \qquad w_n \mid z_n, \beta \sim \mathrm{Mult}(1, \beta_{z_n})$$
Because the Dirichlet prior is not a location-scale family, ProdLDA approximates the Dirichlet prior p(θ|α) with a logistic normal distribution. This is a Laplace approximation in the softmax basis, where K is the number of topics:
$$p(\theta \mid \alpha) \approx \mathcal{LN}(\theta \mid \mu_1, \Sigma_1), \qquad \mu_{1k} = \log \alpha_k - \frac{1}{K}\sum_{i=1}^{K} \log \alpha_i, \qquad \Sigma_{1kk} = \frac{1}{\alpha_k}\left(1 - \frac{2}{K}\right) + \frac{1}{K^2}\sum_{i=1}^{K} \frac{1}{\alpha_i}$$
The variational posterior is also logistic normal. Its mean µ_0 and covariance Σ_0 are the outputs of the encoder network for a document's words w:
$$q(\theta) = \mathcal{LN}(\theta \mid \mu_0, \Sigma_0), \qquad \mu_0 = f_{\mu}(\mathbf{w}), \qquad \Sigma_0 = \mathrm{diag}\big(f_{\Sigma}(\mathbf{w})\big)$$
The encoder network tends to get stuck in a bad local optimum. This problem is addressed with the Adam optimizer, batch normalization, and dropout units in the encoder network. Another difference between LDA and ProdLDA is that β is unnormalized and w_n is defined as:
$$w_n \mid \beta, \theta \sim \mathrm{Mult}\big(1, \sigma(\beta\theta)\big)$$
where σ denotes the softmax function. CombinedTM (CTM) is a contextualized topic model that builds on ProdLDA and uses the same hyperparameters. The original CTM combines SBERT features with a bag of words (BoW). In the current study, we replaced SBERT with our novel Healthcare BERT. We loaded and processed the data set using a pandas DataFrame. The CombinedTM model integrates contextualized embeddings with a bag-of-words representation [66]. Finally, contextual topics were generated with CombinedTM. The sentence embedding vectors were constructed by the Healthcare BERT developed in the current study, as shown in Fig. 1.
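To make the ProdLDA/CTM description above concrete, the following sketch walks through four pieces. First, it computes the logistic-normal prior parameters that the standard ProdLDA Laplace approximation derives from α. Second, it draws topic proportions with the reparameterization θ = softmax(µ + σε). Third, it forms the product-of-experts word distribution softmax(βθ). Fourth, it shows the CTM encoder input as the bag-of-words vector concatenated with the contextual embedding. This is a simplified, framework-free illustration, not the study's implementation. To our understanding, the reference CombinedTM code first passes the embedding through a learned linear layer before concatenation; that step is omitted here, as is the encoder network that would produce µ and Σ. All function names are hypothetical.

```python
import math
import random


def laplace_dirichlet_prior(alpha):
    """Logistic-normal mean and diagonal variance approximating Dir(alpha)."""
    k = len(alpha)
    mean_log = sum(math.log(a) for a in alpha) / k
    inv_sum = sum(1.0 / a for a in alpha)
    mu = [math.log(a) - mean_log for a in alpha]
    var = [(1.0 / a) * (1.0 - 2.0 / k) + inv_sum / k ** 2 for a in alpha]
    return mu, var


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_theta(mu, var, rng):
    """Reparameterized draw: theta = softmax(mu + sqrt(var) * eps), eps ~ N(0, 1)."""
    return softmax([m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)])


def prodlda_word_distribution(beta, theta):
    """Softmax over words of theta^T beta, with beta stored as K x V (unnormalized)."""
    vocab_size = len(beta[0])
    logits = [sum(theta[k] * beta[k][v] for k in range(len(theta)))
              for v in range(vocab_size)]
    return softmax(logits)


def ctm_encoder_input(bow, embedding):
    """CombinedTM-style encoder input: BoW counts followed by the embedding."""
    return list(bow) + list(embedding)


mu, var = laplace_dirichlet_prior([1.0, 1.0, 1.0])
theta = sample_theta(mu, var, random.Random(0))
beta = [[2.0, 0.0, 0.0, 1.0], [0.0, 2.0, 0.0, 1.0], [0.0, 0.0, 2.0, 1.0]]  # toy K=3, V=4
word_probs = prodlda_word_distribution(beta, theta)
print(mu, var)          # mu is all zeros; each variance is about 2/3
print(sum(word_probs))  # 1.0 up to floating-point rounding
print(ctm_encoder_input([1, 0, 2], [0.5, -0.5]))  # [1, 0, 2, 0.5, -0.5]
```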

Model Evaluation
To assess the validity of our approach, we used two criteria: (i) a quantitative assessment (coherence score) and (ii) a qualitative assessment (highly cited literature).

(i) Quantitative assessment (Coherence Score)
Selecting an optimal model from the generated models is a critical task. Human comprehension depends on semantic context, and the coherence measure attempts to quantify the context shared by the words in a topic. Maximizing the coherence score is crucial because it yields topics that are easier for humans to understand. Other metrics, such as perplexity, cannot capture this context. We therefore assess model performance using the coherence score [67]. Using Healthcare BERT with CombinedTM, we generated models with the number of topics k ranging from 10 to 100 in steps of 5 and plotted the coherence score (C_v) of each model. We chose the CombinedTM model with the maximum C_v among the generated models.
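The coherence-based model selection above can be sketched as follows. As a labeled simplification, npmi_coherence averages normalized pointwise mutual information (NPMI) over pairs of a topic's top words using document-level co-occurrence. The actual C_v measure additionally uses a sliding window and an indirect cosine confirmation measure, and in practice it is usually computed with a library, for example gensim's CoherenceModel with coherence="c_v". The candidate range mirrors the text (k from 10 to 100 in steps of 5). The coherence values passed to select_k in the example are made up for illustration and are not results from this study.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Mean NPMI over all pairs of top words (document-level co-occurrence).

    Simplified stand-in for C_v: no sliding window, no indirect cosine measure.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    docs = [set(d) for d in documents]
    n = len(docs)

    def prob(*words):
        return sum(1 for d in docs if all(w in d for w in words)) / n

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij, p_i, p_j = prob(wi, wj), prob(wi), prob(wj)
        if p_ij == 0:
            scores.append(-1.0)  # the words never co-occur
            continue
        pmi = math.log(p_ij / (p_i * p_j))
        denom = -math.log(p_ij)
        scores.append(1.0 if denom == 0 else pmi / denom)
    return sum(scores) / len(scores)


def select_k(coherence_by_k):
    """Return the number of topics with the highest coherence score."""
    return max(coherence_by_k, key=coherence_by_k.get)


candidate_ks = list(range(10, 101, 5))  # 10, 15, ..., 100 as in the text

docs = [["cost", "hospital"], ["cost", "hospital"], ["vaccine", "covid"], ["vaccine", "covid"]]
print(npmi_coherence(["cost", "hospital"], docs))  # about 1.0 (always together)
print(npmi_coherence(["cost", "vaccine"], docs))   # -1.0 (never together)
print(select_k({20: 0.41, 40: 0.47, 80: 0.44}))    # 40 (made-up scores)
```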
(ii) Qualitative assessment (Highly Cited Literature)
For a thematic understanding of the discovered topics, we also adopted a qualitative approach to cross-validate the topics' themes. In this method, we summarize the highly cited literature in Section 2.1, which facilitates a general assessment of the topic themes.

Topic Trends and Popularity Measurement
In this study, we examine each topic's trend by linking the posterior topic distribution to the year in which each document was published. Each document is assigned the topic that best represents it, that is, the topic with the highest posterior probability. We normalized each topic's count to obtain its proportion for each year. For each year, the number of papers assigned to the topic was divided by the total number of papers published that year. The cumulative proportion was then used to calculate each topic's overall popularity. Additionally, the Mann-Kendall trend test (M-K test) was used to detect monotonic upward or downward trends in the data over time. It is a non-parametric test that compares each data point with all later data points and is applicable to any distribution. When a trend is present, the signs of these differences consistently tend to be positive or negative [68]. We recorded each topic's rising and declining trends using the Mann-Kendall test.
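The trend analysis described above can be sketched as follows. The function topic_proportions divides each year's count of papers assigned to a topic by that year's total. The function mann_kendall computes the Mann-Kendall S statistic, its variance with the standard tie correction, the continuity-corrected Z score, and a two-sided p-value from the normal approximation. Labeling a topic hot or cold when p < alpha follows the text's classification by p-values. The 0.05 threshold, the function names, and the toy inputs are assumptions for illustration, not values reported in the study.

```python
import math
from collections import Counter


def topic_proportions(doc_years, doc_topics, topic):
    """Per-year share of papers whose assigned (dominant) topic is `topic`."""
    totals = Counter(doc_years)
    hits = Counter(y for y, t in zip(doc_years, doc_topics) if t == topic)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def mann_kendall(series, alpha=0.05):
    """Return (S, Z, two-sided p-value, label) for a yearly series."""
    n = len(series)
    # S sums the signs of all later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis, corrected for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0 and var_s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0 and var_s > 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    if p < alpha:
        label = "hot" if s > 0 else "cold"
    else:
        label = "insignificant"
    return s, z, p, label


shares = topic_proportions([2020, 2020, 2021, 2021], [3, 5, 3, 3], topic=3)
print(shares)                                    # {2020: 0.5, 2021: 1.0}
print(mann_kendall(list(range(10)))[3])          # hot
print(mann_kendall(list(range(10, 0, -1)))[3])   # cold
print(mann_kendall([0.2] * 6)[3])                # insignificant
```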

Inter-Topic Distances and Topic Correlations
We used LDAvis, a visualization tool for topic models, to aid the interpretation of the contextualized topics. This package plots the topics on a two-dimensional plane, revealing inter-topic distances, topic clusters, and the resulting correlations.
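Inter-topic distance maps of this kind are built from pairwise distances between topic-word distributions, which are then projected onto two dimensions. The sketch below computes a Jensen-Shannon divergence matrix. To our understanding, pyLDAvis uses this divergence by default before applying principal coordinate analysis. The projection step is omitted, and the toy topic-word distributions are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats, skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric JS divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # Wherever pi > 0, mi > 0 too, so the KL terms never divide by zero.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def distance_matrix(topic_word):
    """Pairwise JS divergences between K topic-word distributions."""
    k = len(topic_word)
    return [[jensen_shannon(topic_word[i], topic_word[j]) for j in range(k)]
            for i in range(k)]


# Toy topic-word distributions over a 3-word vocabulary.
topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
d = distance_matrix(topics)
print(round(d[0][1], 4))  # 0.6931, i.e. ln 2 for topics with disjoint words
```

Topics that share high-probability words get small distances and land close together on the map, which is what makes clusters such as those in Figure 2 visible.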

RESULTS AND DISCUSSIONS
In this study, we first developed a transformer-based deep learning model to enhance the semantic understanding of the topics. Google created various transformer-based machine learning methods for pre-training natural language processing models, and such pre-training requires large computational resources. We therefore first searched Hugging Face for the most suitable existing transformer-based model (fspanda-Medical-Bio-BERT2) to provide embeddings for our data set. To improve these embeddings, we further tuned this model on the local corpus using a Google Colab notebook with a GPU, and we then added a mean pooling layer. In this way, we generated a novel model, Healthcare BERT. Its embeddings are provided as context vectors to the CTM model, which combines a bag of words with these embeddings to generate the contextual topics of the corpus. The newly generated embeddings also improve the topic coherence score of the CTM models. CTM generates a document term matrix (DTM), which we further investigated with statistical techniques to classify the topics into hot and cold categories. The study's findings are organized under four subheadings: (i) models' evaluation/performance analysis, (ii) contextual topics, (iii) classification of topic trends, and (iv) inter-topic distances, topic clusters, and correlations.

Models' Evaluation/Performance Analysis
We computed the coherence metric (C_v) to compare the proposed model (CTM) with LSA, LDA, and BERTopic. Table 1 shows the coherence scores (C_v) and numbers of topics (K) obtained with our Healthcare BERT with CTM and with BERTopic. We also computed the coherence scores of the other state-of-the-art methods (LSA and LDA) on our data set without Healthcare BERT. A comparison of all the coherence scores shows that the CTM model built with our novel Healthcare BERT at K=60 gives the clear maximum. It was therefore chosen as the optimal model for reporting the contextualized topics (see Table 1).

Context-Based Topics
This section lists the contextualized topics uncovered by our Healthcare BERT and the CTM model. The terms of each topic are supplied in Appendix A and explained in Table 2. Our contextual topic modeling approach generated 60 topics, of which 15 showed significant increasing trends. These topics mainly deal with smart medical monitoring systems (Topic-42), health pandemics and the importance of vaccination in controlling them (Topic-22), the causes and effects of stress and anxiety (Topic-30), and healthcare services in general (Topic-08). Hot topics also include healthcare cost estimation (Topic-09), patient treatment risks (Topic-13), research on designing healthcare policy (Topic-47), 95% CI hospital risks (Topic-12), patient care quality (Topic-44), global financing for public health (Topic-34), 95% CI testing (Topic-02), surgery complications (Topic-38), healthcare research reviews (Topic-23), and 95% CI for patient disparities and transplantation (Topic-10 and Topic-01). Twelve topics showed significant decreasing trends. These topics cover care quality and management (Topic-45), asthma and costs (Topic-28 and Topic-31), hygiene compliance (Topic-16), telemedicine technology (Topic-54), glucose control intervention (Topic-17), antimicrobial compliance (Topic-11), home interventions for patient care (Topic-14), cost-gained strategy (Topic-04), physical and emotional effects on quality of life (QOL) (Topic-55), the role of nursing as a healthcare profession (Topic-39), and resistance in MRSA isolation and transmission (Topic-07).
For a deeper understanding of the discovered topics, we also examined the top-cited articles of the corpus in Section 2.1, which can cross-validate the themes of most of the contextual topics. For instance, Topic-42, Topic-51, Topic-37, Topic-54, and Topic-35 are supported by the second paragraph of Section 2.1, because both discuss applications of technology in healthcare. Similarly, Topic-24, Topic-04, Topic-38, Topic-29, Topic-27, Topic-31, Topic-03, and Topic-09 align with the third paragraph. These topics mainly focus on cost and related subjects such as surgery and infections. The remaining topics can be cross-validated in the same way. Moreover, because the context of the topics is encoded in the modeling, the discovered topics are more coherent and self-explanatory (see Table 2).

as shown in Table 1. The annual distributions of these topics are depicted in Figures 3 and 4. Fifteen topics showed significantly increasing trends, twelve showed significantly decreasing trends, and thirty-three showed no significant increasing or decreasing trend.

Inter-Topic Distances Topic-Clusters and Correlations
We analyzed the inter-topic distance map of the best-fitting contextualized topic model (K=60). Figure 2 reveals three important clusters of topics. Cluster-1 contains thirteen topics, Cluster-2 is dense and consists of twenty-three topics, and Cluster-3 consists of ten topics. An analysis of the topics within these clusters reveals important patterns, correlations among the topics, and various inter-topic research directions, as follows. The most significant topics in Cluster-1 mainly focus on statistical analysis (95% CI) of different aspects of medical care, risk factors, associated costs, and rehospitalization in severe medical conditions. Cluster-2 is dense and is further composed of several subgroups. The first group consists of generalized topics (23, 32, 36, 46, 50) that cover reviews of different areas of healthcare, such as job satisfaction, drug management, the needs of cancer patients, and healthcare quality indicators.

The topics in the second group (48, 49, 53) are correlated through discrimination factors in healthcare generally and sexual and substance-related discrimination in particular. The third group (topics 4, 11, 16, 21, 22, 55) covers the COVID-19 pandemic, its effects and symptoms, antimicrobial compliance, the importance of hand hygiene, estimation of vaccination costs, and population immunity against various infections. The fourth group (topics 6, 14, 15, 18, 41) covers health expenditure and home interventions for conditions such as breast cancer, migraine, and maternity in different economic situations. It also clarifies the side effects of overwork, especially among women. The fifth group (topics 3, 9, 17, 30) focuses on cost estimation and patient treatment in different diseases, especially glucose control, drugs, and insulin. Additionally, it highlights the effects of inappropriate audit, attendance, and cost estimation.
Cluster-3 (topics 20, 26, 34, 35, 39, 40, 42, 45, 47, 52) elaborates on the significance of healthcare education, nursing, the geographic and organizational management of finance in the healthcare sector, and the uses and applications of smart medical appliances in healthcare units. It also addresses the ethical aspects of the stigma experienced by women. This cluster further covers research on healthcare policies, care quality and clinical management, the provision of primary healthcare services, and technology adoption in the treatment process.

CONCLUSION
In this study, we developed a context-based topic modeling approach that uses a transformer-based deep learning model to enhance the semantic understanding of the topics. We developed a novel Healthcare BERT to provide embeddings as context vectors. These embeddings are combined with a bag of words to generate the contextual topics of the corpus, and they also improve the topics' coherence scores. To categorize the themes into hot and cold categories, we further explored the document term matrix (DTM) generated by CTM with statistical analysis techniques. This study also sheds light on the correlations between topics by plotting them on a two-dimensional plane with a visualization tool, which reveals various interesting topic patterns and inter-topic research directions. The study generates rich sentence embedding vectors of the corpus using transformer-based models, corpus tuning, mean pooling, and the Hugging Face tool. In doing so, it extends existing topic modeling approaches that do not include contextual information in the topic extraction process. Moreover, it improves on previous topic modeling methodologies that analyze the healthcare literature from a specific disciplinary viewpoint. The study has several limitations. It considers only papers indexed in Scopus, and future work could also draw on databases such as Web of Science and PubMed. In addition, the current study adds context information to the topics and provides clustering and topic correlation analyses. Future studies may extend this work with hierarchical semantic modeling and a temporal perspective.

Saheb et al. (2022) propose a context-based topic modeling approach that integrates a general BERT model, LDA, and K-means clustering to contextually analyze research articles.

Figure 2. Inter-topic distance and topic clusters.

Table 1. Performance Analysis of LSA, LDA, BERTopic, and Our Method.