Knowledge Representation and Reasoning for Complex Time Expression in Clinical Text

Abstract Temporal information is pervasive and crucial in medical records and other clinical text, as it describes how medical conditions develop and is vital for clinical decision making. However, providing a holistic knowledge representation and reasoning framework for the various time expressions in clinical text is challenging. To capture complex temporal semantics in clinical text, we propose a novel Clinical Time Ontology (CTO) as an extension of the OWL framework. More specifically, we identified eight time-related problems in clinical text and created 11 core temporal classes to conceptualize fuzzy time, cyclic time, irregular time, negation and other complex aspects of clinical time. We then extended Allen's and TEO's temporal relations and defined the relation concept descriptions between complex and simple time. We also provide a formulaic and graphical presentation of complex time and complex time relationships. We carried out an empirical study on the expressiveness and usability of CTO using real-world healthcare datasets. The experimental results demonstrate that CTO can faithfully represent and reason over 93% of the temporal expressions, and that it covers a wider range of time-related classes in the clinical domain.


INTRODUCTION
Time-related information is an essential part of in-patient medical records, doctor prescriptions and other clinical text. It describes the sequence of symptoms and treatments received by patients and is crucial in clinical decision-making and medical research [1,2]. However, natural language expressions about time and temporal relation concepts are diverse and vary with context and writer. Current research on temporal knowledge representation in text has mainly focused on the news domain or the open domain [3,4,5]. Because of the distinct features of the medical domain, results from those domains cannot be applied to it directly [6].
Temporal knowledge representation is challenging and has been discussed extensively. Generally, OWL-Time [7] can be used to represent instants and intervals, and possibly Allen's 13 temporal relations [8]. The reasoning engine CHRONOS [9] can be used for relation inference, inconsistency detection and path consistency checking. However, these ontologies do not cover some important features, such as irregular time, granular time, and modality. Ontologies designed for general domains are not adequate for specific domains such as healthcare [10]. To fill the gap between general tools and the clinical domain, Tao et al. [11] proposed the temporal ontology CNTRO. Li et al. [12] later extended it and proposed the TEO ontology, leveraging a Java-based TEO reasoner to realize complex timeline reasoning. TEO can represent granular time, instants, intervals and incomplete time. However, the clinical domain also contains uncertain and irregular time, such as "take pills around 6 a.m." and "The patient received rehabilitation training twice a day last year, and once every 2 days this year." The former describes an uncertain time: there is no precise time point for the occurrence of the event, but there is still a rough range. For example, "4 p.m." is not in the range. The latter describes an irregular time, which represents a phased change of the event.
Another challenge is the representation of inference queries with complex time. For example, suppose we know that "Patient A had surgery yesterday and was advised by his doctor to do 20 minutes of rehab around 9:00 am every morning." Humans can quickly answer the question "What should Patient A do at 9:05 am this morning?" Nevertheless, none of the ontologies mentioned above can derive the answer. Based on this, we proposed the Clinical Time Ontology (CTO), which aims to represent the complex temporal information in clinical texts.

Representation of complex time (e.g., fuzzy time, irregular collection time, etc.), subjectivity, negation, and complex temporal relationships is a significant challenge for temporal ontologies. Based on this, we propose the Clinical Time Ontology (CTO), which aims to support inference and querying over complex temporal information and to mine more relationships within it.

TEMPORAL EXPRESSIONS IN CLINICAL TEXT
Temporal information is pervasive in clinical data. However, it is nontrivial to capture temporal information from unstructured clinical text, even though this information is pivotal to healthcare decision making. For example, a patient's self-description and previous medical history can be obtained from the medical record data. They may take the form of a paragraph in natural language, and healthcare providers need to digest this paragraph to understand the chronological events it implies, so as to make a diagnosis or treatment plan. Formulating a semantic framework for time-related expressions in clinical text is a first step towards automated reasoning and decision support. In this section, we first categorize the time-related expressions in typical clinical text, and then we provide a set of key time concepts.

Complicated Expressions of Time in Clinical Text
We refer to TIDES [16] for a basic classification of temporal expressions, and then further refine the categorization of temporal expressions in the clinical domain based on our observation of 3.5K anonymized real-world clinical records. We identified different types of time descriptions in the clinical text according to different temporal aspects, including completeness, accuracy, repetition, subjectivity, etc. With regard to completeness, time can be complete or incomplete; with regard to accuracy, it can be accurate or fuzzy; with regard to repetition, it can be cyclic or irregular. Based on this, we divided time into five types: absolute time, lack information time, fuzzy time, cyclic time, and irregular time. We also obtained three qualitative questions: questions of subjectivity (e.g., whether the time is worth believing), questions of temporal relationships (e.g., before, after, etc.), and questions of negated temporal relationships (e.g., not before, not after, etc.). In total, we summarized eight problems, which are shown in Table 1.
"Absolute time" is a point or duration that can be represented separately on the time axis. "Lack information time" means that the sentence describing the time of the event is incomplete; it includes two subcategories, "relative time" and "incomplete time." The former means that we need to find the reference time and obtain the actual time by calculation. The latter is commonly used to describe a time interval without complete information (e.g., lacking a start time or end time). "Fuzzy time" indicates that the sentence contains approximate or vague modifiers; it may be an instant or an interval. "Cyclic time" means that the event is repeated and its occurrence is regular; it may be a collection of multiple instants or intervals. "Irregular time" means that the event is repeated but not regular; its items may be instants, intervals or collections.

Time Types
The temporal expressions listed in Table 1 can be addressed using a combination of syntactic forms and/or semantic concepts. Syntactically, every temporal expression can use one, two or more timestamps to denote the relevant instant, interval or repetition; in addition, temporal concepts and relations can have modifiers to express beliefs or polarity. Semantically, a temporal concept can be absolute, fuzzy, relative, periodic, etc. Based on the first five problems in Table 1, we categorized the time. Problems 6 and 8 motivate the subjective and negation modifiers. We analyze the combinations of the above syntactic and semantic axes of temporal concepts and relations and organize them in Table 2, with correlations to the temporal expression types listed in Table 1.
"Absolute Instant" is a point in time with no fuzzy modifiers (e.g., about, several) and no relative modifiers (e.g., ago, after). "Relative Instant" is a point in time with relative modifiers. "Fuzzy Instant" is a point in time with fuzzy modifiers. "Absolute Interval" is a duration in time whose start and end times are "Absolute Instant" instances. "Relative Interval" is a duration in time in which at least one of the start and end times is a "Relative Instant" instance. "Fuzzy Interval" is a duration in time for which at least one of the three components (duration quantity, start time and end time) carries a fuzzy modifier, even if its start time would otherwise make it look like a "Relative Interval" instance. "Incomplete Interval" is a duration in time with only one of the three components, whether or not that component carries a modifier.
"Periodic Instant Collection" is a collection of points in time, commonly used to describe the time of events that occur in cycles, where each occurrence is a point in time. "Irregular Instant Collection" is a collection of points in time, commonly used to describe the time of events that occur irregularly, where each occurrence is a point in time; its items can also be "Periodic Instant Collection" instances. "Periodic Interval Collection" is similar to "Periodic Instant Collection," except that each occurrence is a duration. "Irregular Interval Collection" is similar to "Irregular Instant Collection," except that each occurrence is a duration; its items can also be "Periodic Interval Collection" instances.
We divided the time appearing in the clinical field into the 11 main types mentioned above, and these types do not intersect. We treat negation and subjective modifiers as properties of temporal instances.
chinaXiv:202211.00426v1

CTO ONTOLOGY
We developed the Clinical Time Ontology (CTO) for semantic representation and reasoning over the temporal aspects discussed in Section 3. In this section, we elaborate on its module design.

Overview
Temporal concepts cannot be used alone, so we need to bind them to the concept of events. Therefore, we defined the class Event to represent clinical events. It has subclasses such as HospitalizationEvent, DischargeEvent, etc. The list of subclasses can be extended as required.
According to Table 2, we obtain 11 classes, such as AbsoluteInstant, RelativeInstant, FuzzyInstant, AbsoluteInterval, etc. They are all subclasses of Time. Class Event and class Time are linked by the object property hasTime. We referred to the OWL-Time and TEO ontologies when designing our ontology, so CTO shares some concepts with them and differs in others. For example, class AbsoluteInstant is the same as Instant in OWL-Time, and class TimeQuantity is the same as Duration in TEO. However, class IrregularInstantCollection does not exist in either ontology. Figure 1 shows the taxonomy of CTO and its relationship with OWL-Time and TEO (using sameAs).

Module Design
We divided time into four modules: Instant, Interval, InstantCollection, and IntervalCollection. Figure 1 illustrates this, and the details are as follows.

Instant
Class Instant has three subclasses: AbsoluteInstant, RelativeInstant, and FuzzyInstant. They are designed for absolute instant time, relative instant time, and fuzzy instant time, respectively. A number of properties describe time at a granular level, such as second, minute, and hour. Class TimeUnit is designed to represent the precision of a time, that is, its smallest unit of granularity.
Object property prefixAnchor is designed to give a qualitative representation of the relationship between a moment and its reference moment. Object property relativeQuantity quantitatively represents the length of the difference. For example, consider "Hospitalized on July 1, 2021, and started showing significant symptoms three days ago." "Three days ago" is a relative time description, which we can set to <relativeInstant1>. "July 1, 2021" is an AbsoluteInstant instance, and it is the prefixAnchor of <relativeInstant1>. The relativeQuantity of <relativeInstant1> is "3 days." Figure 2 illustrates this case. For fuzzy time, object property leftAnchor indicates the earliest possible occurrence time, and rightAnchor indicates the latest one. For example, "Early January" is a fuzzy description; we can set its leftAnchor to <absoluteInstant1>, which stands for "1, January," and its rightAnchor to <absoluteInstant2>, which stands for "10, January." Figure 3 illustrates this case.
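The two examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CTO implementation: the class and field names mirror the CTO property names, while the direction field (-1 for "ago", +1 for "after"), the method names, and the year 2021 chosen for "Early January" are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AbsoluteInstant:
    value: date


@dataclass(frozen=True)
class RelativeInstant:
    prefix_anchor: AbsoluteInstant  # prefixAnchor: the reference moment
    relative_quantity: timedelta    # relativeQuantity: length of the offset
    direction: int                  # hypothetical encoding: -1 = "ago", +1 = "after"

    def resolve(self) -> AbsoluteInstant:
        """Compute the actual time from the anchor and the offset."""
        offset = self.direction * self.relative_quantity
        return AbsoluteInstant(self.prefix_anchor.value + offset)


@dataclass(frozen=True)
class FuzzyInstant:
    left_anchor: AbsoluteInstant   # leftAnchor: earliest possible occurrence
    right_anchor: AbsoluteInstant  # rightAnchor: latest possible occurrence

    def may_contain(self, t: date) -> bool:
        """True if t lies inside the rough range of the fuzzy instant."""
        return self.left_anchor.value <= t <= self.right_anchor.value


# "Hospitalized on July 1, 2021, ... symptoms three days ago."
absolute_instant1 = AbsoluteInstant(date(2021, 7, 1))
relative_instant1 = RelativeInstant(absolute_instant1, timedelta(days=3), -1)
print(relative_instant1.resolve().value)  # 2021-06-28

# "Early January" (the year 2021 is assumed for illustration only).
early_january = FuzzyInstant(AbsoluteInstant(date(2021, 1, 1)),
                             AbsoluteInstant(date(2021, 1, 10)))
print(early_january.may_contain(date(2021, 1, 5)))   # True
print(early_january.may_contain(date(2021, 1, 15)))  # False
```

Keeping the anchors as explicit instants, rather than collapsing a fuzzy time to a single value, is what lets a query such as "could this event have happened on January 5?" be answered with a simple range check.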

Interval
Class Interval has four subclasses: AbsoluteInterval, RelativeInterval, FuzzyInterval, and IncompleteInterval. The first two both have the object properties startTime, endTime, and durationQuantity. They differ in that the ranges of their startTime and endTime properties are not the same class.
Object properties minDurationQuantity and maxDurationQuantity are designed for fuzzy intervals; they indicate the minimum and maximum possible duration, respectively. For example, "it started on 1, January, and lasted for about half a month." Figure 4 illustrates this case. Class IncompleteInterval is a special case of Interval that has only a startTime or an endTime and does not have a durationQuantity.
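A fuzzy interval of this kind can be sketched as follows. The sketch is illustrative only: the source does not state which bounds Figure 4 assigns to "about half a month," so the 13 and 17 day limits, the year 2021, and the method names are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Tuple


@dataclass(frozen=True)
class FuzzyInterval:
    start_time: date                  # startTime
    min_duration_quantity: timedelta  # minDurationQuantity
    max_duration_quantity: timedelta  # maxDurationQuantity

    def end_bounds(self) -> Tuple[date, date]:
        """Earliest and latest possible end time of the interval."""
        return (self.start_time + self.min_duration_quantity,
                self.start_time + self.max_duration_quantity)

    def certainly_covers(self, t: date) -> bool:
        """t is inside the interval even for the shortest duration."""
        earliest_end, _ = self.end_bounds()
        return self.start_time <= t <= earliest_end

    def possibly_covers(self, t: date) -> bool:
        """t is inside the interval for at least the longest duration."""
        _, latest_end = self.end_bounds()
        return self.start_time <= t <= latest_end


# "It started on 1, January, and lasted for about half a month."
# Assumed bounds: between 13 and 17 days; the year is assumed as well.
fuzzy_interval1 = FuzzyInterval(date(2021, 1, 1),
                                timedelta(days=13), timedelta(days=17))
print(fuzzy_interval1.end_bounds())
print(fuzzy_interval1.certainly_covers(date(2021, 1, 10)))  # True
print(fuzzy_interval1.possibly_covers(date(2021, 1, 16)))   # True
print(fuzzy_interval1.possibly_covers(date(2021, 1, 20)))   # False
```

Separating "certainly" from "possibly" is one way to keep the uncertainty visible in query answers instead of forcing a single end date.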

InstantCollection and IntervalCollection
The temporal description of medication and check-up events in clinical medical texts is often cyclical. For example, "From today, review daily at 2:00 pm for five days." In addition, some irregular time collections exist in medical texts, such as "The patient received rehabilitation training twice a day last year, and once every 2 days this year." Each occurrence may be an instant, an interval or a collection. Therefore, we can represent irregular time in two steps: 1) create some Time instances; 2) connect them using the object property belongTo or hasSubset and the data property itemIndex. A Time instance can be not only an Instant or Interval, but also an InstantCollection or IntervalCollection. We can set the second sentence mentioned above to <irregularInstantCollection1>. It has two stages, and both of them are cyclic time. They can be represented by two instances of PeriodicInstantCollection, <periodicInstantCollection1> and <periodicInstantCollection2>, which are then connected to the collection using the object property hasSubset. Figure 6 illustrates this case.
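The two-step construction can be sketched as a small data structure. This is a hedged illustration rather than the ontology itself: hasSubset and itemIndex come from the text, whereas the fields times_per_period, period_days and scope, and the helper methods, are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class PeriodicInstantCollection:
    name: str
    times_per_period: int  # hypothetical: occurrences in one period
    period_days: int       # hypothetical: length of one period in days
    scope: str             # hypothetical: the stage the cycle applies to

    def occurrences_per_day(self) -> float:
        return self.times_per_period / self.period_days


@dataclass
class IrregularInstantCollection:
    name: str
    # Each entry pairs an itemIndex with the connected subset.
    subsets: List[Tuple[int, PeriodicInstantCollection]] = field(default_factory=list)

    def has_subset(self, item: PeriodicInstantCollection, item_index: int) -> None:
        """Step 2: connect an existing Time instance (hasSubset + itemIndex)."""
        self.subsets.append((item_index, item))

    def ordered_items(self) -> List[PeriodicInstantCollection]:
        """Return the stages in itemIndex order."""
        return [item for _, item in sorted(self.subsets, key=lambda p: p[0])]


# Step 1: create the Time instances for the two stages.
periodic1 = PeriodicInstantCollection("periodicInstantCollection1", 2, 1, "last year")
periodic2 = PeriodicInstantCollection("periodicInstantCollection2", 1, 2, "this year")

# Step 2: connect them to the irregular collection.
irregular1 = IrregularInstantCollection("irregularInstantCollection1")
irregular1.has_subset(periodic2, item_index=2)
irregular1.has_subset(periodic1, item_index=1)

for stage in irregular1.ordered_items():
    print(stage.scope, stage.occurrences_per_day())
```

Because the items of the collection are themselves Time instances, the same pattern extends to nested collections without any new machinery.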

Subjective Time
In clinical texts, patients very often describe the occurrence time of events themselves. In this case, the time in the text is subjective and may not be the actual time of the event. To represent subjectivity, each Time instance can use the data property subjective, with a True or False value, to indicate whether it is a subjective description by the patient or an objective record, so that the physician can make some judgments about that time information. Figure 7 illustrates this.

Temporal Relationship
Besides the aforementioned cases, some cases indicate the temporal relationship between events, e.g., "receiving treatment B is earlier than taking medicine A." Allen's 13 temporal relationships can represent some of these situations, but they cannot cover all cases, such as point-to-collection, duration-to-collection, and collection-to-collection situations. To fill this gap, we extended them with additional relations, such as containAll, containNull, and their inverse properties. Figure 8 illustrates a sample.

Negation
A chronological relation sometimes appears in its negated form. For example, "taking medicine A is not earlier than the end of treatment B." In the medical record data, as long as we cannot know or infer that "taking medicine A" occurs before "treatment B," we can assume that A is not before B. Based on the closed world assumption (CWA), we designed the algorithm not, which can be used to represent the negation of a temporal relationship, such as not before, not after, and not overlap, etc. In Table 3, we used 3 auxiliary functions: AllItems(), CalTime() and StartTime(). The semantics of these functions are provided in Appendix A.

Negation Temporal Relation Definition
If the reasoner can infer that a temporal relation r1 holds between time A and time B, but cannot conclude that temporal relation r2 holds, then according to the CWA, the negated relation not r2 holds between the two times. For example, if the reasoner cannot derive <time1> before <time2>, then <time1> not before <time2> can be concluded. Every temporal relationship has a negation. Table 4
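The closed-world reading of negation can be sketched as follows. This is a minimal illustration, not the paper's not algorithm or its reasoner: the toy inference step only derives before from known dates, and the patients' operation dates (including the missing date for Patient-C) are hypothetical values chosen to mirror the negation query in Table 1.

```python
from datetime import date
from typing import Dict, Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def infer_before(times: Dict[str, Optional[date]]) -> Set[Fact]:
    """Toy inference: derive 'before' only when both dates are known."""
    facts: Set[Fact] = set()
    for a, ta in times.items():
        for b, tb in times.items():
            if a != b and ta is not None and tb is not None and ta < tb:
                facts.add((a, "before", b))
    return facts


def holds_not(relation: str, a: str, b: str, inferred: Set[Fact]) -> bool:
    """CWA: 'a not relation b' holds when 'a relation b' cannot be inferred."""
    return (a, relation, b) not in inferred


# Hypothetical operation dates; Patient-C's date is not recorded.
operations: Dict[str, Optional[date]] = {
    "Patient-A": date(2021, 7, 5),
    "Patient-B": date(2021, 7, 8),
    "Patient-C": None,
}
inferred = infer_before(operations)

# "Whose operation event is not before Patient-A's?"
not_before_a = [p for p in operations
                if p != "Patient-A" and holds_not("before", p, "Patient-A", inferred)]
print(not_before_a)  # ['Patient-B', 'Patient-C']
```

Note that Patient-C is returned only because nothing can be inferred about the missing date; under an open-world reading the same query would leave Patient-C undecided.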

EMPIRICAL STUDY
In order to verify the ability of the CTO ontology to express time information in the clinical field, we selected a set of case texts of people with a mental health condition. We used an anonymized realistic dataset from a hospital in China. We randomly selected 300 patients, whose records contained approximately 3000 Chinese statements. We then arranged for annotators to annotate the time information and conducted a quantitative analysis using the Inter-Annotator Agreement (IAA) [18] metric. IAA is commonly used to evaluate annotation consistency and annotation difficulty, usually in terms of Precision, Recall, and F1. We also analyzed the characteristics of other ontologies and compared the expressiveness of CTO with theirs in terms of complex time and temporal relations.

Evaluation Results
For IAA, three annotators were asked to jointly annotate time information in 300 patient cases. The evaluation consists of two parts: (1) evaluation of classes and object properties (without temporal relations), and (2) evaluation of temporal relationships. For the evaluation of classes and object properties, the annotators first annotated classes, and the annotation methods were discussed together. After that, they discussed and agreed on the Gold Standard. Object properties were annotated in the same way, with the Gold Standard as the base data.
The mean F1 scores for the annotation of time-related classes were 77.18% (exact mapping) and 83.06% (partial mapping). The average F1 score for object properties (excluding temporal relationship object properties) was 87.34%. Complex time information can be represented in various ways; for example, a time interval can be represented using startTime/endTime + durationQuantity or startTime + endTime. If such differences are not counted as errors, the average score is 93.61%.
For the evaluation of temporal relations, we asked three annotators to annotate the temporal relations between time entities, using the Gold Standard obtained in the previous sub-process. We developed a reasoner for this experiment and regarded its results as the Gold Standard for the second evaluation task. Note that we also treat pairs of tokens related by an inverse property as true positives, such as <time1> before <time2> and <time2> after <time1>. The mean values of Precision, Recall and F1 are 96.98%, 91.97% and 94.39%, respectively. We also calculated the three metrics for each temporal relationship to analyze which temporal relations are more difficult to express. The results are shown in Figure 10. Figure 10 shows that it is easy to reach a consensus on the representation of simple relations, such as after, overlap, and endBeforeEnd. However, more complex temporal relations, such as containAll, finishPoint, etc., are harder to represent. Our analysis found that manual annotation shows omissions for this type of data but still maintains a high F1 score, which indicates that the CTO ontology can also achieve good results in the representation of temporal relations.
Concerning the ability of ontologies to cover temporal instances, we found that the CTO ontology is able to cover almost all types of temporal descriptions. As can be seen from Figure 11 and Figure 12, instances of complex time types (PeriodicInstantCollection, PeriodicIntervalCollection, IrregularInstantCollection, IrregularIntervalCollection) account for approximately 16.77% (594 of 3542) of all time instances, and the extended temporal relationships (containAll, containSome, containNull, startPoint, finishPoint) account for 21.19% of all temporal relationships (629 of 2969). We provide 2 sample texts from clinical records, together with their knowledge graphs annotated with CTO, in Appendix B.
To represent subjectivity, each Time instance can use the data property subjective, whose value is True or False.
ChinaXiv cooperating journal: Data Intelligence

Table 3. Description of temporal relations of CTO using ALC(D).

Figure B.16. Representation for the text shown in Figure B.14 using CTO.

Figure B.13 shows example case data, which contains several events (e.g., hospitalization, discharge, medication, and paroxysm events) and several types of time (e.g., absolute instant time, absolute interval time, relative time, fuzzy time, and incomplete time). Figure B.15 illustrates the representation.

Figure B.13 and Figure B.15 present the time representation for problems 1-3. For problems 4-7, we selected some short sentences from the dataset. Figure B.14 shows the corpus, and Figure B.16 illustrates the representation.

Table 1. Classification and examples of time-related problems in clinical texts.
(Patient-A is known to be hospitalized on July 1, 2021, and Patient-B is known to be hospitalized on July 12, 2021) Who was hospitalized first, Patient A or B? Patient-A
8 Negation Problems Whose operation event is not before Patient-A's? Patient-B, Patient-C

Table 2. Time types obtained by combinations of the syntactical and semantic axis of temporal concepts and relations.