The Artificial Immune Systems Domain: Identifying Progress and Main Contributors Using Publication and Co-Authorship Analyses

Much can be learned about the progress, founders and future of a scientific domain from the analysis of a collection of relevant articles and their authors. Here, we study the highly interdisciplinary domain of Artificial Immune Systems (AIS) since its birth a couple of decades ago. We apply Social Network Analysis to the co-authorship network of the most comprehensive publicly accessible AIS bibliography. We automatically extract publication dates and author names from the bibliography and evaluate the authors with the highest degree (unique collaborations) and centrality (influence). Our results highlight the relative growth of publication volume and identify significant contributors to the AIS field. Our findings are encouraging for the AIS community, and the approach may be useful for analyzing other scientific communities and their leading contributors.


Introduction
Artificial Immune Systems (AIS) are adaptive systems, inspired by theories and observed principles of the immune system, and applied to solving computational problems [1]. Common AIS techniques are based on specific theoretical models explaining the behavior of the vertebrate adaptive immune system, such as negative selection, clonal selection, immune networks and dendritic cells [2]. AIS research falls mainly into two categories. The first aims at mathematically modeling the immune system to better understand its behavior. The second uses the immune system as a metaphorical inspiration to engineer algorithms capable of learning and of solving a wide variety of machine learning problems such as classification, clustering and regression analysis.

ECAL 2013 1206
DOI: http://dx.doi.org/10.7551/978-0-262-31709-2-ch185
AIS is a relatively new field that began in the mid-1980s with the modeling and refinement of Jerne's immune network theory [3] by Farmer et al. [4], and later by Bersini and Varela [5,6]. However, it was not until Forrest [7] used negative selection to protect computer networks from viruses that the AIS domain was established. Cooke and Hunt [8,9] adapted immune networks for classification and Timmis [10] further improved the approach, while De Castro et al. [11,12] worked on aiNet for multimodal function optimization and data analysis. The first book on AIS was edited by Dasgupta in 1998 [13].
In the past few years, several review papers have discussed the slow advances in AIS and proposed improvement strategies through novel and simpler AIS models (inspired by the innate immune systems of vertebrates and the immune systems of plants) as well as the development of a unified architecture for integrating existing models [14,2,15,16]. Timmis argues that AIS has reached an impasse [2], and Timmis et al. point out a dearth of theory to justify the use and continuity of AIS [17].
The domain of AIS has existed for a couple of decades but has never been analyzed quantitatively. Furthermore, the fear that the domain is stagnating has been based only on qualitative studies targeting specific AIS models and frameworks, such as negative selection, clonal selection and, more recently, the danger theory.
In this paper, we use co-authorship network analysis and statistical methods to investigate the current state and future of the AIS domain, and to identify leading contributors to the field. In the following section, we describe the methods used for information extraction and statistical analysis, as well as the social network analysis of the co-authorship network. In section 3, we illustrate and discuss the growth of the AIS domain and compare it to the growth of Medline-indexed articles in general. Finally, in section 3.3, we discuss our results and the future directions of the field.

Methods
We adopt data mining techniques to extract author names and publication dates from the most comprehensive AIS bibliography [18], comprising 1044 articles and 994 authors. We only had access to the bibliography as a binary PDF file, which we had to convert into ASCII text. The converted text was far from clean and structured, so a significant amount of manual curation was needed to systematically mark the end of each author list before automating the extraction of author names.
Publication dates are limited to the year of publication, a 4-digit number YYYY where 1980 < YYYY < 2010, in order to avoid confusion with other numbers such as digital object identifiers (DOI), volume numbers and serial numbers. Author name extraction was more challenging: some authors have abbreviated first names, others have full first names, and others do not respect the first-name-last-name order. We manually restructured the bibliography, ensuring that last names were preceded by abbreviated or full first names. We prefixed shared last names with the abbreviated first names to avoid merging authors with a common last name. For example, "X Lee" and "Y Lee" are considered two distinct authors, represented as X-Lee and Y-Lee respectively. We still face the possibility that two different authors are counted as one if they share both their last name and their first name (or at least its initial). Resolving such cases is, however, outside the scope of our analysis.
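The two extraction rules described above can be illustrated with a minimal Python sketch. It is not the code used for the study: the function names, the regular expression and the sample entry are assumptions made for illustration, and the sketch assumes that names have already been put into first-name-last-name order by the manual curation step. For simplicity it prefixes every last name with an initial, not only the shared ones.

```python
import re

# Candidate four-digit years; the numeric range check is applied below.
YEAR_RE = re.compile(r"\b(19[89]\d|200\d)\b")

def extract_year(entry):
    """Return the first year YYYY with 1980 < YYYY < 2010 in an entry, or None."""
    for match in YEAR_RE.finditer(entry):
        year = int(match.group(1))
        if 1980 < year < 2010:
            return year
    return None

def author_key(name):
    """Map 'X. Lee' or 'Xavier Lee' to 'X-Lee' (first initial plus last name)."""
    parts = name.replace(".", " ").split()
    if len(parts) < 2:
        return parts[0] if parts else ""
    return f"{parts[0][0].upper()}-{parts[-1]}"

# Hypothetical bibliography entry, not taken from the AIS bibliography.
entry = "X. Lee and Yan Lee. A made-up title. Vol. 3627, pp. 202-212, 2005."
year = extract_year(entry)                       # volume and page numbers are skipped
keys = [author_key(n) for n in ("X. Lee", "Yan Lee")]
```

The word-boundary anchors keep the pattern from matching inside longer digit runs, which is one simple way to avoid picking up fragments of identifiers or serial numbers.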

Social Network Analysis
Social Network Analysis (SNA) has played a major role in many disciplines in the past few years [19,20]. Co-authorship Networks (CN) are social networks consisting of scientific collaborators and their collaborations [21]. In a CN, authors are represented as nodes (or vertices) and collaborations as undirected edges. CN are similar to citation networks in the scientific literature [22], but carry stronger social and collaborative implications [23]. CN analysis has already been applied to several fields, but never to AIS [23][24][25]. Studies on co-authorship analysis have shown how visible and influential an article can be [26]. Other studies have examined how co-authorship network centrality and gender relate to academic research performance [27].
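The network representation described in this section (authors as nodes, collaborations as undirected edges) can be sketched in a few lines of Python. The function name and the adjacency-dictionary representation are assumptions made for this illustration, and the author keys are hypothetical.

```python
from itertools import combinations

def build_coauthorship_network(publications):
    """Return {author: set of co-authors} from a list of author lists.

    The network is binary: an edge exists if two authors share at least
    one publication, however many publications they share.
    """
    graph = {}
    for authors in publications:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, set())      # single-author papers still add a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)                 # undirected edge: store both directions
            graph[b].add(a)
    return graph

# Hypothetical author keys in the X-Lee style.
network = build_coauthorship_network([
    ["X-Lee", "Y-Lee", "A-Kim"],
    ["X-Lee", "Y-Lee"],
    ["B-Roy"],
])
```

In this toy example, X-Lee and Y-Lee co-author two papers but are joined by a single edge, and B-Roy appears as an isolated node.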
We analyze the AIS CN with several existing methods. We used the R package [28] to calculate the degree, closeness and betweenness centrality of the binary undirected co-authorship network. In the following sections, we report and discuss the 20 highest-ranking authors for each of these metrics. Degree is the number of unique collaborators an author has. Closeness, which is only computed on the largest connected component, measures how near an author is, directly or indirectly, to all other authors in that component.
Betweenness measures a node's influence on information flow in the network: it counts how often a node lies on a shortest path between two other nodes.
For more information about centrality measures and SNA, please refer to [20].
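The three centrality measures can be sketched in plain Python for a small graph. This is an illustration, not the R code used for the analysis; the adjacency-dictionary representation, the function names and the toy path graph are assumptions made for the example. Closeness is normalised here as (n - 1) divided by the sum of shortest-path distances, and betweenness is computed with Brandes' algorithm and left unnormalised.

```python
from collections import deque

def degree(graph):
    """Number of unique neighbours (collaborators) of every node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def bfs_distances(graph, source):
    """Shortest-path distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(graph):
    """(n - 1) / sum of distances; assumes the graph is one connected component."""
    n = len(graph)
    result = {}
    for v in graph:
        total = sum(bfs_distances(graph, v).values())
        result[v] = (n - 1) / total if total else 0.0
    return result

def betweenness(graph):
    """Unnormalised shortest-path betweenness (Brandes) for an undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                        # accumulate from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice

# Toy path graph A - B - C - D with hypothetical authors.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy_degree, toy_closeness, toy_betweenness = degree(toy), closeness(toy), betweenness(toy)
```

On the toy path graph, the two inner nodes have the highest value on all three measures, because every shortest path between the two ends passes through them.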

PageRank
PageRank [29], a variant of eigenvector centrality, is used by the Google search engine to determine a page's relevance or importance. Important pages receive a higher PageRank and are more likely to appear at the top of the search results. PageRank is based on backlinks: the more high-quality backlinks a page has, the higher its PageRank.
Liu et al. [23] applied PageRank to a co-authorship network of the Digital Library domain in order to identify prestigious authors. They transformed each undirected edge into a pair of symmetric directed edges.
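As a minimal sketch of this idea, PageRank can be computed on an undirected co-authorship graph by power iteration, treating every undirected edge as two directed edges. The function name, the damping factor of 0.85, the fixed number of iterations and the toy graph are assumptions made for this illustration; they are not taken from [23] or [29].

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph {node: set of neighbours}.

    Each undirected edge acts as two directed edges, so a node passes an
    equal share of its rank to each of its neighbours.
    """
    n = len(graph)
    rank = dict.fromkeys(graph, 1.0 / n)
    for _ in range(iterations):
        # Isolated authors have no outgoing edges; spread their rank uniformly.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {}
        for v in graph:
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

# Toy star graph: hypothetical author C collaborates with A, B and D.
star = {"C": {"A", "B", "D"}, "A": {"C"}, "B": {"C"}, "D": {"C"}}
ranks = pagerank(star)
```

In the star graph the central author receives the highest rank, and the three peripheral authors receive equal ranks by symmetry.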

AuthorRank
Liu et al. [23] also define a modification of PageRank that they call AuthorRank. PageRank assumes that when a node A connects to n other nodes, each of them receives an equal fraction 1/n of A's weight, whereas AuthorRank assigns a different weight to each co-author based on the number of publications they have in common [23].
Both PageRank and AuthorRank are measures of prestige that we use to identify leading authors in the domain of AIS as discussed in the following section.
We used the R package [28] to implement PageRank and AuthorRank, to rank the top authors, and to visualize the co-authorship network using a Fruchterman-Reingold layout.
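The difference between the two rankings can be made explicit in formulas. The notation below is a simplified sketch and not the exact definition given in [23]: d is a damping factor, N the number of authors, \mathcal{N}(i) the set of co-authors of author i, k_j the degree of author j, and c_{ij} the number of publications that authors i and j have in common. All of these symbols are assumptions introduced for this illustration.

```latex
\begin{align}
PR(i) &= \frac{1-d}{N} + d \sum_{j \in \mathcal{N}(i)} \frac{PR(j)}{k_j}, \\
AR(i) &= \frac{1-d}{N} + d \sum_{j \in \mathcal{N}(i)} w_{j,i}\, AR(j),
\qquad
w_{j,i} = \frac{c_{ij}}{\sum_{l \in \mathcal{N}(j)} c_{jl}} .
\end{align}
```

Setting c_{ij} = 1 for every edge gives w_{j,i} = 1/k_j, so the weighted form reduces to PageRank. The weighting actually used in [23] may include further terms, for example a discount for publications with many authors.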

Publication Distribution Analysis
In order to address the doubts about a decline in the AIS field, we measured the number of publications relevant to the field over time. We fitted exponential functions to these observations to study the growth of the publication volume over time and to predict it for the years to come, and we used the coefficient of determination $R^2$ to validate each fit and its prediction, as shown in figure 1. Contrary to previous fears of a stagnating AIS field, the number of publications per year shows that the field keeps growing. The fitted functions have the form $f(x) = c\,e^{bx}$, where $c$ is a constant, $x$ is the year index and $b$ is the growth factor. We compared the growth in the volume of AIS articles according to our AIS bibliography (b = 0.2) to that of PubMed-indexed articles (b = 0.03) between 1984 and 2008, using Corlan's Medline trend (http://dan.corlan.net/medline-trend.html), to support our conclusion regarding the relative expansion of the AIS domain. We also add the number of articles returned by Google Scholar for the query "Artificial Immune Systems", which grows even faster (b = 0.4), as shown in green in figure 1.
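One simple way to obtain such a fit is ordinary least squares on the logarithm of the yearly counts, sketched below in Python. This is an assumption about the procedure, not a description of the fitting method used for figure 1: the function name is hypothetical, the coefficient of determination is computed on the log scale, and all counts must be strictly positive. The data in the example are synthetic, not the AIS publication counts.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = c * exp(b * x) by least squares on log(y).

    Returns (c, b, r_squared), where r_squared is computed on the log scale.
    Requires every y to be strictly positive.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxy / sxx                           # slope of the log-linear fit
    a = mean_l - b * mean_x                 # intercept, i.e. log(c)
    ss_res = sum((l - (a + b * x)) ** 2 for x, l in zip(xs, logs))
    ss_tot = sum((l - mean_l) ** 2 for l in logs)
    r_squared = 1 - ss_res / ss_tot
    return math.exp(a), b, r_squared

# Synthetic, noise-free counts generated with c = 2 and b = 0.2 (not the AIS data).
years = list(range(10))
counts = [2.0 * math.exp(0.2 * x) for x in years]
c, b, r2 = fit_exponential(years, counts)
```

If x is measured in years, the growth factor translates into a doubling time of ln(2)/b: roughly 3.5 years for b = 0.2, about 23 years for b = 0.03 and under 2 years for b = 0.4.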
Dasgupta's bibliography includes conference proceedings, as does Google Scholar, whereas PubMed indexes journal articles only. However, we argue that in the fields of informatics and engineering, conference and workshop publications typically have higher impact.
In addition, we analyzed the number of authors per publication, which is an indication of collaboration strength. As shown in figure 2, publications with two authors are the most frequent. Moreover, there are more three-author publications than single-author ones, and almost as many four-author publications as single-author ones. We presume that this reflects the highly collaborative nature of the AIS field. Indeed, a highly interdisciplinary field such as AIS invites collaborations from immunology, systems biology, artificial intelligence, machine learning, data mining, etc. A similar study of the Digital Library research community [23] yielded similar results, with 28.5% of papers written by 2 authors, 23.6% by 3 authors, 19.6% by a single author and 9.4% by 4 authors.
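The distribution of authors per publication is straightforward to compute once each publication is available as a list of author keys. The sketch below is illustrative only; the function name and the toy publication list are assumptions, not the study's code or data.

```python
from collections import Counter

def authors_per_publication(publications):
    """Return {number of authors: share of publications} from a list of author lists."""
    counts = Counter(len(set(authors)) for authors in publications)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

# Hypothetical publications, each given as a list of author keys.
shares = authors_per_publication([
    ["X-Lee"],
    ["X-Lee", "Y-Lee"],
    ["Y-Lee", "A-Kim"],
    ["X-Lee", "Y-Lee", "A-Kim"],
])
```

In this toy example half of the publications have two authors, and single-author and three-author publications account for a quarter each.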

Co-Authorship Network Analysis
The co-authorship network is summarized in table 1 and visualized in figure 3. The number of publications per author may be a good indicator of an author's contribution to the field; however, it can be biased in favor of non-collaborative authors with many published articles. An author's degree, or number of unique collaborators, can be a better measure of collaborative effort in a field. Degree centrality measures an author's connectivity with immediate neighbors, i.e. collaborators. Some authors may, however, be well connected locally but not globally within the entire network. Closeness centrality expands on degree centrality and favors authors that are connected (directly or indirectly) to as many authors in the network as possible. Betweenness is another centrality measure that counts how often a node lies on a shortest path between any two other nodes in the network. It conveys the role of an author as an information spreader or hub. The authors with the highest number of publications, degree, betweenness and closeness are listed in table 2. Authors with the highest degree, betweenness and closeness are illustrated in figure 4.

Network
Several studies have analysed Co-authorship Network (CN) components for various scientific domains. Nascimento [24] reports that the largest component of the SIGMOD CN (SIGMOD is the Special Interest Group on Management of Data of the Association for Computing Machinery, ACM) contains about 60% of all authors. Newman [25] studied several CN, the smallest "largest component" of which contains 57.2% of all authors. Liu et al. [23] report that the largest component of the JCDL (Joint Conference on Digital Libraries) CN contains only 38% (599 authors) of all authors for the years 1994 to 2003. This low percentage may be due to the relative immaturity of the Digital Library (DL) field. The largest component of the AIS CN contains 55% (550 authors) of all authors, showing the relative maturity of the field. We suspect that the maturity of AIS is related to the 10 years of its dedicated conference, ICARIS, held since 2002, and to the 20 years since the beginning of AIS, i.e. twice as long as the DL field. The largest component of 550 authors is visualized in black at the center of figure 3, whereas the remaining components alternate in various colors around it. Moreover, table 3 lists the component sizes and frequencies of the AIS CN.
Table 2 distinguishes Jon Timmis as a top contributor and collaborator in this field. However, we are also interested in identifying other contributors in the AIS domain, such as the founding fathers Forrest and De Castro. In table 2, we identify authors with significantly high centrality in the AIS CN (top 20 in degree and in either betweenness or closeness), such as Nicosia and A. Freitas, regardless of their relatively lower number of publications (not in the top 20). Conversely, authors such as Neal and Stibor rank amongst the top 20 for their number of publications but do not rank as highly on the centrality measures.
Finally, we are interested in understanding the large network components that are disconnected from the largest component, such as those led by Fukuda, Kara, Watanabe and Coello. We presume that this separation is due to language and geographic barriers, but we hope to see a more integrated network, or a more methodological explanation of this segregation, in the near future. We are also interested in understanding the cluster properties of the largest component, which is mainly led by team leaders, namely Forrest, Timmis, Hart, Dasgupta and Tarakanov.
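The component statistics discussed in this section (the share of authors in the largest component, and the frequency of each component size) can be computed with a breadth-first search over the co-authorship graph. The sketch below is illustrative; the function names, the adjacency-dictionary representation and the toy graph are assumptions and do not reproduce the AIS network.

```python
from collections import Counter, deque

def component_sizes(graph):
    """Return the sizes of all connected components, largest first."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                        # breadth-first search from start
            v = queue.popleft()
            size += 1
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def largest_component_share(graph):
    """Fraction of all authors that belong to the largest component."""
    sizes = component_sizes(graph)
    return sizes[0] / sum(sizes)

# Toy graph with three components: {A, B, C}, {D, E} and {F}.
toy = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
    "F": set(),
}
sizes = component_sizes(toy)
size_frequencies = Counter(sizes)           # component size -> number of components
share = largest_component_share(toy)
```

Applied to a full co-authorship network, the share corresponds to the kind of figure quoted above for the largest component, and the size frequencies correspond to a table of component sizes.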

Conclusion and Future Directions
Several reviews have discussed advances in the field of Artificial Immune Systems, but all from a qualitative perspective. Recent reviews have alluded to a stagnation of the field. In this work, we investigated these questions from a quantitative perspective. Our results show that the field has been growing ever since it was established a couple of decades ago. In addition, we have identified leading contributors through co-authorship network analysis of AIS-relevant publications.
We acknowledge that the bibliography may be biased towards an engineering perspective, as it is maintained by a computer scientist, and may thus overlook fundamental contributions from theoretical immunology. However, our method can be applied to any bibliography (structured or unstructured) in any scientific domain. Hence, we expect our analytical method not only to motivate the AIS community and encourage external scientists to take on the challenges presented by AIS, but also to serve as a benchmark for scientific domain analyses.

Fig. 1. Number of publications per year (1) according to the AIS bibliography [blue], (2) according to Google Scholar results for "Artificial Immune Systems" [green] and (3) for all PubMed-indexed articles [red]. The results show a relative growth of the domain based on both the AIS bibliography and the Google Scholar results. The exponential fits are validated using the $R^2$ coefficient.

Fig. 2. The distribution of the number of authors per publication is roughly Gaussian, centered on two authors.

Fig. 3. A visualization of the AIS co-authorship network, with nodes representing authors and edges representing collaborations (at least one co-authored article). Each component is shown in a different color. In particular, the largest component (in black) contains 55% of all authors.

Fig. 4. Alphabetically sorted list of authors with the highest degree, betweenness and closeness.

Table 1. Co-authorship network summary.

3 JCDL is the Joint Conference on Digital Libraries.

Table 2. AIS authors ranked according to their number of publications, degree (or number of unique collaborators), betweenness and closeness.

Table 3. AIS co-authorship network component sizes and frequencies, with the largest component in bold.
The component sizes and frequencies of the AIS CN show a significant number of smaller components. This suggests many existing AIS sub-communities that collaborate internally but may eventually collaborate with other sub-communities.