Network biology approaches have over the last decade proven to be very useful for the integration and generation of functional hypotheses by providing a context for specific molecular components and processes. Recent experimental and computational techniques yield networks of increased size and sophistication. The study of these complex cellular networks is emerging as a new challenge in biology. A number of dimensionality reduction techniques for graphs have been developed to cope with the complexity of networks. However, it is not yet clear to what extent information is lost or preserved when these techniques are applied to reduce the complexity of large networks. Here we therefore develop a rigorous framework, based on algorithmic information theory, to quantify the capability to preserve information when network motif analysis, graph spectra, and sparsification methods are applied to over twenty different well-established networks. We find that the sparsification method is highly sensitive to edge deletion, leading to significant inconsistencies with respect to information loss, and that graph spectral methods were the most irregular measure, capturing only algebraic information in a condensed fashion and in the process largely losing the information content of the original networks. Importantly, our approach demonstrated that motif analysis approximated the algorithmic information content of a network very well, thus validating and generalizing the remarkable fact that local regularities (subgraphs) preserve information, despite their inherent combinatorial possibilities, to such an extent that information in the algorithmic sense is fully recoverable from the networks across different network superfamilies. Our algorithmic information methodology therefore provides a rigorous framework enabling fundamental assessment of, and comparison between, different methods for reducing the complexity of networks while preserving key structures in the networks, thereby facilitating the identification of such core processes.
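The paper's actual framework is not reproduced here, but the core idea of comparing a network's information content before and after reduction can be sketched with a compression-based proxy. The following Python sketch is illustrative only: the toy network, the random "sparsification", and the use of zlib as a stand-in for algorithmic complexity are all assumptions of this note, not the paper's method.

```python
import zlib
import random

def adjacency_string(edges, n):
    """Flatten an n-node undirected adjacency matrix into a binary string."""
    matrix = [["0"] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = matrix[v][u] = "1"
    return "".join("".join(row) for row in matrix)

def compressed_size(s):
    """Lossless-compression proxy for algorithmic information content."""
    return len(zlib.compress(s.encode()))

# Toy network: a ring of 50 nodes plus random chords.
n = 50
edges = [(i, (i + 1) % n) for i in range(n)]
edges += [(random.randrange(n), random.randrange(n)) for _ in range(40)]

# Naive "sparsification": drop 20% of the edges at random.
sparse = random.sample(edges, int(0.8 * len(edges)))

k_full = compressed_size(adjacency_string(edges, n))
k_sparse = compressed_size(adjacency_string(sparse, n))
print(f"proxy complexity, full: {k_full}, sparsified: {k_sparse}")
print(f"information change: {k_sparse - k_full} bytes")
```

A motif-based or spectral reduction would be compared the same way: encode the reduced representation and measure how much of the description length of the original network survives.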
The aim of this paper is to undertake an experimental investigation of the trade-offs between program-size and time computational complexity. The investigation includes an exhaustive exploration and systematic study of the functions computed by the set of all 2-color Turing machines with 2 states (we write (2,2)) and 3 states (we write (3,2)), with particular attention to the runtimes, space usage, and patterns corresponding to the computed functions when the machines have access to larger resources (more states).
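As a rough illustration of what such an exhaustive exploration involves, the Python sketch below enumerates all (2,2) machines under a simple encoding, runs each on a blank tape with a step budget, and tallies halting machines by runtime. The encoding is an assumption of this note; in particular, its halting-transition convention differs slightly from the formalism counted in this literature, so the totals will not match the paper's exactly.

```python
import itertools

# One-tape, 2-symbol machines: each (state, read_symbol) pair maps to
# (write, move, next_state), where next_state == -1 denotes halting.

def run(machine, max_steps=100):
    """Run a machine on a blank tape; return (halted, steps, output)."""
    tape, pos, state = {}, 0, 0
    for step in range(1, max_steps + 1):
        write, move, nxt = machine[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        state = nxt
        if state == -1:  # halted: read off the visited portion of the tape
            return True, step, "".join(str(tape[c]) for c in sorted(tape))
    return False, max_steps, None

# Each of the 4 transition slots chooses: write in {0,1},
# move in {-1,+1}, next state in {0, 1, halt(-1)}.
choices = list(itertools.product((0, 1), (-1, 1), (0, 1, -1)))
slots = [(s, r) for s in range(2) for r in range(2)]

runtimes = {}
for combo in itertools.product(choices, repeat=4):
    machine = dict(zip(slots, combo))
    halted, steps, _ = run(machine)
    if halted:
        runtimes[steps] = runtimes.get(steps, 0) + 1

print("halting machines by runtime:", sorted(runtimes.items()))
```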
We study formal properties of a Levin-inspired measure $m$ calculated from the output distribution of small Turing machines. We introduce and justify finite approximations $m_k$ that have already been used in applications as an alternative to lossless compression algorithms for approximating algorithmic (Kolmogorov-Chaitin) complexity. We provide proofs of the relevant properties of both $m$ and $m_k$ and compare them to Levin's Universal Distribution. Finally, we provide error estimations of $m_k$ with respect to $m$.
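The abstract does not reproduce the definitions, but a measure of this kind is typically built as follows; the notation and exact normalization below are assumptions of this note, not quoted from the paper.

```latex
% One plausible formalization (notation mine): m is built from the
% output frequencies of halting small Turing machines T, and m_k is a
% finite approximation from machines run for at most k steps.
\[
  m(s) \;=\; \frac{|\{\, T : T \text{ halts with output } s \,\}|}
                  {|\{\, T : T \text{ halts} \,\}|},
  \qquad
  m_k(s) \;=\; \frac{|\{\, T : T \text{ halts in} \le k \text{ steps with output } s \,\}|}
                    {|\{\, T : T \text{ halts in} \le k \text{ steps} \,\}|}.
\]
% By the algorithmic coding theorem, -log2 m(s) then estimates the
% Kolmogorov-Chaitin complexity K(s) up to an additive constant.
```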
We show that strategies implemented in automatic theorem proving involve an interesting tradeoff between execution speed, proving speedup/computational time, and usefulness of information. We advance formal definitions for these concepts by way of a notion of normality related to an expected (optimal) theoretical speedup when adding useful information (other theorems as axioms), as compared with actual strategies that can be effectively and efficiently implemented. We propose the existence of an ineluctable tradeoff between this normality and computational time complexity. The argument quantifies the usefulness of information in terms of (positive) speed-up. The results disclose a kind of no-free-lunch scenario and a tradeoff of a fundamental nature. The main theorem in this paper, together with the numerical experiment (undertaken using two different automatic theorem provers, AProS and Prover9, on random theorems of propositional logic), provides strong theoretical and empirical arguments…
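The quantification of usefulness via speed-up can be made concrete with a minimal definition; the notation is mine, not quoted from the paper.

```latex
% Usefulness of an auxiliary theorem t for a target theorem phi,
% measured as positive speed-up (notation mine):
\[
  S(t, \varphi) \;=\; \frac{T(\varphi)}{T(\varphi \mid t)},
\]
% where T(phi) is the proving time from the base axioms alone and
% T(phi | t) the proving time with t added as an axiom; t carries
% useful information for phi exactly when S(t, phi) > 1.
```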
We show that real-value approximations of Kolmogorov-Chaitin complexity ($K_m$) using the algorithmic Coding theorem, as calculated from the output frequency of a large set of small deterministic Turing machines with up to 5 states (and 2 symbols), are in agreement with the number of instructions used by the Turing machines producing $s$, which is consistent with strict integer-value program-size complexity. Nevertheless, $K_m$ proves to be a finer-grained measure and a potential alternative approach to lossless compression algorithms for small entities, where compression fails. We also show that neither $K_m$ nor the number of instructions used shows any correlation with Bennett's Logical Depth $LD(s)$ other than what is predicted by the theory. The agreement between theory and numerical calculations shows that despite the undecidability of these theoretical measures, approximations are stable and meaningful, even for small programs and for short strings. We also announce a first beta version of an Online Algorithmic Complexity Calculator (OACC), based on a combination of theoretical concepts, as a numerical implementation of the Coding Theorem Method.
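For intuition, the coding-theorem step itself is a one-liner: given an output-frequency distribution $D$, the estimate is $K_m(s) = -\log_2 D(s)$. A minimal Python sketch follows; the counts are made-up placeholders, not the paper's data.

```python
import math

# Hypothetical output-frequency counts from an exhaustive run of small
# Turing machines (illustrative numbers only).
output_counts = {"0": 40000, "1": 40000, "01": 5000, "10": 5000,
                 "010": 120, "0101": 7}

total = sum(output_counts.values())

def K_m(s):
    """Coding-theorem estimate: K_m(s) = -log2 D(s), where D(s) is
    the empirical algorithmic probability of string s."""
    return -math.log2(output_counts[s] / total)

for s in output_counts:
    print(f"K_m({s!r}) = {K_m(s):.2f} bits")
```

Note how rarer outputs receive higher complexity values, mirroring the finer granularity the abstract claims over integer-valued program-size counts.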
We describe a method that combines several theoretical and experimental results to numerically approximate the algorithmic (Kolmogorov-Chaitin) complexity of all $\sum_{n=1}^{8} 2^n$ bit strings up to 8 bits long, and for some between 9 and 16 bits long. This is done by an exhaustive execution of all deterministic 2-symbol Turing machines with up to 4 states for which the halting times are known thanks to the Busy Beaver problem, that is, 11,019,960,576 machines. An output frequency distribution is then computed, from which the algorithmic probability is calculated and the algorithmic complexity evaluated by way of the (Levin-Chaitin) coding theorem.
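The two counts quoted above can be checked directly, assuming the $(4n+2)^{2n}$ enumeration of $n$-state, 2-symbol Turing machines standard in this line of work:

```latex
% Worked counts behind the abstract's figures.
\[
  \sum_{n=1}^{8} 2^{n} \;=\; 2^{9} - 2 \;=\; 510 \text{ strings},
  \qquad
  (4n+2)^{2n}\Big|_{n=4} \;=\; 18^{8} \;=\; 11\,019\,960\,576 \text{ machines}.
\]
```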
Complex Systems as Computing Models (Sistemas Complejos como Modelos de Computación)
The authors cover a variety of topics related to complex systems, evaluation of system complexity, data encryption, quantum computing, models of computation inspired by biological and mathematical systems, among others.
…across the output of several systems, including abstract devices such as cellular automata and Turing machines, as well as real-world data sources such as images and human DNA fragments. This could suggest that they all follow a single distribution in accordance with algorithmic probability.
In the past decades many definitions of complexity have been proposed. Most of these definitions are based either on Shannon's information theory or on Kolmogorov complexity; these two are often compared, but very few studies integrate the two ideas. In this article we introduce a new measure of complexity that builds on both of these theories. As a demonstration of the concept, the technique is applied to elementary cellular automata and simulations of the self-organization of porphyrin molecules.
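The hybrid measure itself is not specified in this abstract, but its two ingredients are easy to compute on elementary cellular automata. A minimal Python sketch follows; the rule choices and block size are arbitrary, and zlib compression merely stands in for a Kolmogorov-complexity estimate.

```python
import zlib
from collections import Counter
from math import log2

def eca_run(rule, width=101, steps=100):
    """Evolve an elementary cellular automaton (Wolfram numbering)
    from a single 1 cell, with periodic boundaries."""
    table = {tuple(int(b) for b in f"{i:03b}"): (rule >> i) & 1
             for i in range(8)}
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = [table[(row[(i - 1) % width], row[i], row[(i + 1) % width])]
               for i in range(width)]
        rows.append(row)
    return "".join(str(c) for r in rows for c in r)

def shannon_entropy(s, block=3):
    """Block Shannon entropy in bits per block (overlapping windows)."""
    blocks = Counter(s[i:i + block] for i in range(len(s) - block + 1))
    n = sum(blocks.values())
    return -sum(c / n * log2(c / n) for c in blocks.values())

for rule in (30, 90, 110, 250):
    s = eca_run(rule)
    print(f"rule {rule}: entropy={shannon_entropy(s):.3f} bits/block, "
          f"compressed={len(zlib.compress(s.encode()))} bytes")
```

A measure integrating the two theories would combine quantities like these, rather than relying on either alone.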
Kolmogorov-Chaitin complexity has long been believed to be impossible to approximate when it comes to short sequences (e.g. of length 5-50). However, with the newly developed coding theorem method the complexity of strings of length 2-11 can now be numerically estimated. We present the theoretical basis of algorithmic complexity for short strings (ACSS) and describe an R-package providing functions based on ACSS that will cover psychologists' needs and improve upon previous methods in three ways: (1) ACSS is now available not only for binary strings, but for strings based on up to 9 different symbols, (2) ACSS no longer requires time-consuming computing, and (3) a new approach based on ACSS gives access to an estimation of the complexity of strings of any length. Finally, three illustrative examples show how these tools can be applied to psychology.
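Point (3) is not spelled out in the abstract; one plausible reading, in line with later block-decomposition methods from this literature, is to split a long string into short blocks with known complexity values and aggregate. A hedged Python sketch (the `ctm` lookup values are invented placeholders, not ACSS data):

```python
import math
from collections import Counter

# Hypothetical table of coding-theorem complexity values (in bits)
# for short blocks, standing in for precomputed ACSS data.
ctm = {"000": 6.2, "001": 8.1, "010": 7.9, "011": 8.1,
       "100": 8.1, "101": 7.9, "110": 8.1, "111": 6.2}

def block_complexity(s, block=3):
    """Aggregate short-block values into an estimate for a long string:
    each distinct block contributes its table value once, plus log2 of
    its multiplicity (a block-decomposition-style rule)."""
    blocks = Counter(s[i:i + block]
                     for i in range(0, len(s) - block + 1, block))
    return sum(ctm[b] + math.log2(n) for b, n in blocks.items())

print(block_complexity("010101010101"))  # regular: few distinct blocks
print(block_complexity("011010011100"))  # irregular: more distinct blocks
```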