Preface
Exciting times are ahead of us. The instantaneous communication made possible by contemporary technology is fostering more cross-disciplinary interaction than ever, while also accelerating interaction within each field. And not a moment too soon, for it is becoming increasingly clear that some disciplines simply need to join forces with others.
Biology, one of the disciplines in focus here, is a case in point. The unprecedented growth in the volume of biological data over the past few years has created formidable challenges for processing that data in a timely and integrated way. This data most notably includes human-language text produced in the form of articles, books, web sites, etc., and genetic-code text written in the language of nucleic acids, such as DNA sequences. Biologists' traditional methods for processing and making sense of such information can no longer keep up with this exponentially growing tsunami. Artificial Intelligence methods are coming to the rescue, in particular those for knowledge retrieval and analysis, computational linguistics and natural language processing. In the process, AI methods are being specialized into biologically oriented ones, which in turn influence the methods of computing science, and so the dance comes full circle.
Historically, cross-disciplinary interaction among our three vertices (Biology, Computing and Language Processing) has in many cases occurred naturally. The Prolog language evolved from a computational linguistics formalism (Alain Colmerauer's Q-systems). The memoing technique underlying tabling-based computing platforms is intimately related to chart parsing in language processing. Logic grammars were used extensively around the world in the effort to decipher the human genome. Statistical computing is permeating computational linguistics, and so on. However, these interactions are mostly two-way, partly because combining two disciplines is challenging enough already. One must acquire at least some familiarity with the other discipline's jargon in order to communicate unambiguously and effectively with collaborators in that field. The usual methods of each discipline and the ways results are evaluated may differ. Even the conventions by which publications are viewed can differ greatly: in some disciplines, for example, a publication's main author is typically listed first, in others last.
The fact remains that for some disciplines, reaching out to others is no longer a luxury but a necessity. We believe that once two-way interactions have developed enough to form a solid basis for further cross-fertilization, some currently incipient multidisciplinary endeavors will become more prevalent. Knowledge extraction from text is one example. It is now largely dependent on keywords and on corpus- or domain-oriented restrictions. As soon as it matures enough to reliably transform arbitrary text into usable, encoded knowledge that can later be consulted, the field of web text mining will be able to interact with the many fields that urgently need such abilities. These will include fields that already draw on two different disciplines, such as computational life sciences.
Just as our root discipline, Philosophy, had to separate slowly into a myriad of disciplines and sub-disciplines in order to develop methods specific to each, achieve depth, and so on, we believe the time has now come for an inverse process of integration. In specializing, some disciplines have become unnaturally disconnected from others or from the whole. As a result, the broad view of the forest is sometimes lost, and parallels that could be exploited cannot even be seen. A reconnection from the more mature present standpoints of these different branches seems in order and is, in any case, already happening.
It is with this integration in mind that we present in this volume a series of essays on biology, computation and linguistics. They are a first effort to work at a level of granularity at which the connections among the three become more apparent. From that level, these naturally related but now far-apart disciplines can fruitfully reconnect from their present degrees of specialization.
Thus, the influence of biology on computing science is examined from two points of view: the use of the structure of DNA molecules as a computational tool, and biologically inspired computational models. Computational tools such as tailored parsing methodologies and web querying, probabilistic extended regular expressions, and matrix insertion-deletion systems are applied to health-science and biological tasks. These tasks include de-identifying medical records, modeling repeats in DNA, modeling intermolecular structures, and defining ambiguity in gene sequences.
Biology and linguistics are brought together by finding isomorphisms between the genetic code and verbal language, and by showing how childhood dialects can be seen as modeling computers. Three-way interdisciplinary work includes the use of a computational formalism, concept formation, for mining both linguistic and biological texts. It also includes an analysis of dependency crossing from three points of view: language, biology and satisfiability.
The influence of computing science on language is represented by work on lower bounds for asymmetrical insertion-deletion languages, a computational model of linguistic complexity, an enumerative speculation on possible languages, and computational realizations of vowel-consonant speech segmentation by neuromorphic units. Finally, the language of acyclic recursion is applied to linguistic descriptions, and cognitive architectures for multi-agent systems are explored from a computational point of view.
With this work we hope to stimulate further research on these emerging themes and their three-way relationships. More generally, we hope to foster greater interconnectedness between the humanistic and the formal sciences, for a less dichotomized, more integrated world.
Tarragona, March 2011
Gemma Bel-Enguix, Veronica Dahl, M. Dolores Jiménez-López