# The 14th Israeli Bioinformatics Symposium IBS 2012, Jerusalem June 7

## List of abstracts

1. Dissecting inner structures in disease regulatory networks: the DICER algorithm
David Amar, Ron Shamir

2. Indel Reliability in Phylogenetic Inference
Haim Ashkenazy, Ofir Cohen, Dorothee Huchon, and Tal Pupko

3. What distinguishes GroEL substrates from other Escherichia coli proteins?
Ariel Azia, Ron Unger, Amnon Horovitz

4. Combinatoric inference of miRNA regulation in human cells
Ohad Balaga, Yitzhak Friedman, and Michal Linial

5. Tissue interactomes illuminate the tissue-selectivity of genetic diseases
Ruth Barshir, Omer Shwartz, Ilan Smoly and Esti Yeger-Lotem

6. Unraveling the mystery of temperature compensation of the Drosophila circadian clock
Osnat Bartok, Manuel Garber, Sebastian Kadener

7. Non-redundant compendium of human ncRNA genes in GeneCards
Frida Belinky, Iris Bahir, Gil Stelzer, Naomi Rosen, Noam Nativ, Irina Dalah, Tsippi Iny Stein, Toutai Mituyama, Marilyn Safran, and Doron Lancet

8. Widespread structural proximity of co-regulated yeast genes
Shay Ben-Elazar, Zohar Yakhini, Itai Yanai

9. Utilizing the Effects of SNPs on Distant Genes Reveals Molecular Links between Diseases
Aharon Brodie, Dr. Yanay Ofran

10. PatchBag: A novel approach for efficient detection of protein structural similarity
Inbal Budowski-Tal, Yael Mandel-Gutfreund, Rachel Kolodny

11. Systematic Identification of Small Antigen Binding Units in Antibodies
Anat Burkovitz, Olga Leiderman, Inbal Sela, and Yanay Ofran

12. Fishing for Virulent Factors: Machine Learning Prediction and Experimental Validation of Bacterial Effectors
David Burstein, Michael Peeri, Tal Zusman, Ziv Lifshitz, Gil Segal, Tal Pupko

13. Promoter sequence determines the relationship between expression level and noise.
Lucas Carey, David van Dijk and Eran Segal

14. Genome scale systematic analysis of metabolic homeostasis in disease and treatment
Noa Cohen, Allon Wagner, Eytan Ruppin

15. The role of reverse-transcriptase in intron gain and loss mechanisms
Noa E. Cohen, Roy Shen and Liran Carmel

16. Uncovering the co-evolutionary network among microbial gene families
Ofir Cohen, Haim Ashkenazy, David Burstein, Gil Segal, and Tal Pupko

17. Retracted

18. Towards understanding of free text medical records in Hebrew

19. Determinants of translation elongation speed and ribosomal profiling biases in mouse embryonic stem cells
Alexandra Dana and Tamir Tuller

20. In search of the fifth column: Splice variants that undermine the main activity of their gene
Miri Danan and Erez Y. Levanon

21. Infection Dynamics and Transcriptional Programs of a Single Phage During Infection of 3 Marine Synechococcus Hosts
Shany Doron, Ayalla Fedida, Iris Karunker, Debbie Lindell and Rotem Sorek

22. NetCmpt: A network-based tool for calculating the metabolic competition between bacterial species
Anat Kreimer, Adi Doron-Faigenboim, Elhanan Borenstein, Shiri Freilich

23. PloiDB: A Community Resource for Polyploidy Research
Moshe Einhorn, Shing H. Zhan, Itay Mayrose

24. The role of cyanophage tRNAs in the cross infectivity of marine cyanobacteria.
Hagay Enav, Oded Beja and Yael Mandel-Gutfreund

25. Retracted

26. Protein Sequence Reveals the Co-Expression and Co-Localization of Network Hubs and their Interacting Partners
Ariel Feiglin, Shaul Ashkennazi, Avner Schlessinger and Yanay Ofran

27. Age-related patterns in literature-derived knowledge and clinical data
Nophar Geifman & Eitan Rubin

28. DNA-methylation patterns on exon-intron structure and their effect on co-transcriptional splicing
Sahar Gelfman, Ahuvi Yearim and Gil Ast

29. Multi-layered chromatin analysis reveals E2F, SMAD and ZFX as transcriptional regulators of the Histone gene family
David Gokhman, Ilana Livyatan, Shai Melcer and Eran Meshorer

30. Protein characteristics and tissue specificity
Sivan Goren and Yanay Ofran

31. INDI: A computational framework for inferring drug interactions and their associated recommendations
Assaf Gottlieb, Gideon Y. Stein, Yoram Oron, Eytan Ruppin & Roded Sharan

32. Differential GC Content between Exons and Introns Establishes Distinct Strategies of Splice-Site Recognition
Maayan Amit, Maya Donyo, Dror Hollander, Amir Goren, Eddo Kim, Sahar Gelfman, Galit Lev-Maor, David Burstein, Schraga Schwartz, Benny Postolsky, Tal Pupko, Gil Ast

33. Mutational analysis of deafness-related proteins
Adva Yeheskel, Daphne Karfunkel, Zippora Brownstein, and Karen Avraham

34. Yeast promoters maintain their relative activity levels under different growth conditions
Leeat Keren, Ora Zackay, Maya Lotan-Pompan, Danny Zeevi, Adina Weinberger, Ron Milo, Eran Segal

35. Conformational readout of ligand binding sites on RNA
Efrat Kligun, Yael Mandel-Gutfreund

36. Identifying protein recognition elements on RNA by combining sequence and structural information
Refael Kohen, Iddo Z. Ben-Dov, Thomas Tuschl and Yael Mandel-Gutfreund

37. A transcription-splicing integrated network reveals pervasive cross-regulation among regulatory proteins
Idit Kosti and Yael Mandel-Gutfreund

38. GPCR & company: databases and servers for GPCRs and interacting partners
Noga Kowalsman and Masha Y. Niv

39. Why CDRs are not what you think they are or How to identify the real antigen binding sites
Vered Kunik, Bjoern Peters and Yanay Ofran

40. A context aware perspective of protein protein interaction networks
Alexander Lan, Michal Ziv-Ukelson and Esti Yeger-Lotem

41. Novel families of toxin/immunity modules confer phage resistance in bacteria
Azita Leavitt, Hila Sberro, Ruthie Kiro, Udi Qimron, Rotem Sorek

42. Efficient motif search in ranked lists and applications to variable gap motifs
Limor Leibovich and Zohar Yakhini

43. Clk mRNA turnover de-noises circadian transcription and behavior
Immanuel Lerner, Osnat Bartok, Uri Weisbein, Shaked Afik, Chen Gafni, Nir Friedman, and Sebastian Kadener (Silberman Institute of Life Sciences, The Hebrew University, Israel; Department of Computer Sciences, The Hebrew University, Israel)

44. Exploring the accuracy of small RNA quantification using microarrays and deep sequencing technologies
Dena Leshkowitz, Shirley Horn-Saban, Yisrael Parmet, Ester Feldmesser

45. Flavors of discovery: computational predictions of new agonists of the bitter taste receptor hTAS2R14
Anat Levit, Ayana Wiener, Stefanie Nowak, Rafik Karaman, Maik Behrens, Wolfgang Meyerhof and Masha Y. Niv

46. A vast collection of microbial genes that are toxic to bacteria
Aya Kimelman, Asaf Levy, Hila Sberro, Shahar Kidron, Azita Leavitt, and Rotem Sorek

47. A Phylogenetic Approach to Music Performance Analysis
Elad Liebman, Eitan Ornoy and Benny Chor

48. Speed Controls in Protein Translation: The Secretory Proteome
Shelly Mahlab, Michal Linial

49. Using Contiguous Bi-Clustering for data driven temporal analysis of fMRI-based functional connectivity
Adi Maron-Katz, Didi Amar, Eti Ben-Simon, Yael Jacob, Keren Rosenberg, Richard M. Karp, Talma Hendler, Ron Shamir

50. Recently-formed polyploid plants diversify at lower rates
Itay Mayrose, Shing H. Zhan, Carl J. Rothfels, Karen Magnuson-Ford, Michael S. Barker, Loren H. Rieseberg, Sarah P. Otto

51. RNA Tree Comparisons Via Unrooted Unordered Alignments
Nimrod Milo, Shay Zakov, Erez Katzenelson, Eitan Bachmat, Yefim Dinitz and Michal Ziv-Ukelson

52. mtDNA heteroplasmy can be detected in Next-Generation Sequencing data without pre-amplification
Tal Nagar, Eitan Rubin

53. A Mathematical Model of 6S RNA Regulation of Gene Expression
Mor Nitzan, Karen Wassarman, Ofer Biham, and Hanah Margalit

54. The H3K27 demethylase Utx facilitates epigenetic reprogramming to pluripotency
Ohad Gafni, Abed AlFatah Mansour, Leehee Weinberger, Muneef Ayyash, Asaf Zviran, Yoach Rais, Vladislav Krupalnik, Mirie Zerbib, Daniela Amann-Zalcenstein, Itay Maza, Shay Geula, Sergey Viukov, Liad Holtzman, Eli Canaani, Shirley Horn-Saban, Ido Amit, Noa Novershtern and Jacob H. Hanna

55. Bacterial gene expression in response to potato tuber soft rot
Shany Ofaim, Elazar Fallik and Shlomo Sela

56. Personalized olfactory receptor repertoire
Tsviya Olender, Sebastian Waszak, Edna Ben-Asher, Miriam Khen and Doron Lancet

57. RAP: accurate prediction of cis-regulatory motifs from protein binding microarrays
Yaron Orenstein, Eran Mick, Ron Shamir

58. Discovering molecular signals using hidden semi-Markov models
Michael Peeri, David Burstein, Gil Segal, Tal Pupko

59. FastML: a web server for probabilistic reconstruction of ancestral sequences
Osnat Penn, Haim Ashkenazy, Adi Doron-Faigenboim, Ofir Cohen, Gina Cannarozzi, Oren Zomer, and Tal Pupko

60. Malacards: the integrated Human Malady Compendium
Noa Rappaport, Noam Nativ, Michal Twik, Gil Stelzer, Frida Belinky, Tsippi Iny Stein, Iris Bahir, Marilyn Safran and Doron Lancet

61. Functional Inference by ProtoNet Family Tree: The Uncharacterized proteome of Daphnia pulex

62. Context-dependent evolution of protein domains
Dan Reshef, Liran Carmel and Ora Schueler-Furman

63. Inferring the Efficiency of tRNA-Codon interaction based on Codon Usage Bias
Renana Sabi and Tamir Tuller

64. Membrane Protein (Pseudo-) Energetics
Chaim A. Schramm, Jason E. Donald, Brett T. Hannigan, Jeffery G. Saven, William F. DeGrado, Ilan Samish

65. Retracted

66. De-novo assembly and Characterization of the Transcriptome of Metschnikowia fructicola reveals differences in gene expression following interaction with Penicillium digitatum and grapefruit peel
Noa Sela, Vera Hershkovitz, Ginat Rafael, Clarita BenDayan, Leena Taha, Michael Wisniewski, Samir Droby

67. Evidence for Allosteric Effects in Antibodies
Inbal Sela-Culang, Shahar Alon and Yanay Ofran

68. Metabolic profiling of fasting and feeding reveals human variability in energy metabolism
Oded Shaham

69. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters
Eilon Sharon, Yael Kalma, Ayala Sharp, Tali Raveh-Sadka, Michal Levo, Danny Zeevi, Leeat Keren, Zohar Yakhini, Adina Weinberger & Eran Segal

70. Mapping the structure of the T cell receptor repertoire using quantitative high throughput sequencing: repertoire biases and public clones
Hilah Gal, Wilfred Ndifon, Eric Shifrut, Nir Friedman

71. Integrative analysis of the yeast phosphorylation network reveals a hierarchy of kinases dominated by a thin layer of phosphatases
Ilan Smoly, Esti Yeger-Lotem

72. MotifNet. A web-interface for network motif analysis
Ilan Smoly, Guy Wald, Esti Yeger-Lotem

73. Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR)
Oz Solomon, Michal Safran, Naamit Deshet Unger, Pinchas Akiva, Jasmine Jacob-Hirsch, Karen Cesarkas, Reut Kabesa, Ninette Amariglio, Ron Unger, Gideon Rechavi, Eran Eyal

74. OPM-based Model Verification Framework with Application to Molecular Biology
Judith Somekh, Valerya Perelman, Chhaya Dhingra, Gal Haimovich, Mordechai Choder, Dov Dori

75. Coordinated Disease-Induced Changes in miRNA-Alternative Splicing networks Identified by Deep transcriptome Sequencing of Parkinson's Leukocytes
Lilach Soreq, Hagai Bergman, Zvi Israel and Hermona Soreq

76. Online statistical enrichment tools for ranked lists of genes
Roy Navon, Israel Steinfeld, Zohar Yakhini

77. The GeneCards human proteome: quantitative tissue expression patterns and expanded sequence similarity space
Gil Stelzer, Frida Belinky, Irina Dalah, Tsippi Iny Stein, Noam Nativ, Naomi Rosen, Noa Rappaport, Eugene Kolker, Marilyn Safran, Doron Lancet

78. Modeling single cell stochastic gene expression
Marek Strajbl, Tal El-Hay and Nir Friedman

79. Intracellular metabolite concentrations are explained by an interplay between minimization of total concentration of metabolites and their corresponding enzymes
Naama Tepper, Elad Noor, Daniel Amador-Noguez, Josh Rabinowitz, Wolfram Liebermeister & Tomer Shlomi

80. MORPH: MOdule guided Ranking of candidate PatHway genes in Arabidopsis thaliana and Lycopersicum solanum

81. Compensating mode of regulation by human and viral miRNAs
Isana Veksler-Lublinsky, Yonat Shemer-Avni, Eti Meiri, Zvi Bentwich, Klara Kedem, Michal Ziv-Ukelson

82. Codon bias in pyrimidine-ending codons
Naama Wald, Maya Alroy, Maya Botzman, Hanah Margalit

83. Systematic dissection of roles for chromatin regulators in a yeast stress response
Assaf Weiner, Hsiuyi Chen, Chih Long Liu, Ayelet Rahat, Avital Klien, Luis Soares, Mohanram Gudipati, Jenna Pfeffner, Aviv Regev, Steven Buratowski, Jeffrey A. Pleiss, Nir Friedman, and Oliver J. Rando

84. BitterDB - bitter compounds database and analysis of chemical features associated with bitterness
Ayana Wiener-Dagan, Masha Niv

85. Ontologies from Existence of Homologues: Human vs. Fly Viewpoints
Jonathan Witztum, Erez Persi, David Horn, Metsada Pasmanik-Chor and Benny Chor

86. Three-Dimensional Folding and Functional Organization Principles of the Drosophila Genome
Tom Sexton, Eitan Yaffe, Ephraim Kenigsberg, Frédéric Bantignies, Benjamin Leblanc, Michael Hoichman, Hugues Parrinello, Amos Tanay, Giacomo Cavalli

87. The Evolution of antisense transcripts in 12 yeast species, using de-novo transcriptome assembly
Moran Yassour, Brian J. Haas, Manfred G. Grabherr, Jenna Pfiffner, Joshua Z. Levine, Dawn-Anne Thompson, Aviv Regev & Nir Friedman

88. How the time of replication shapes the genomic structure
Yishai Yehuda, Ephraim Kenigsberg, Shlomit Farkash-Amar, Yaara David, Zohar Yakhini, Andrei Chabes, Amos Tanay, Itamar Simon

89. A novel Metabolic Transformation Algorithm predicts perturbations counteracting aging in yeast and mammalian muscle tissue
Keren Yizhak, Orshay Gabay, Haim Cohen, Eytan Ruppin

90. Uncovering pre-mRNA Splicing Regulation Code in S. Cerevisiae Using A Synthetic Intron Library
Ido Yofe, Tuval Ben Yehezkel, Zohar Zafrir, Tamir Tuller, Ehud Shapiro, Maya Schuldiner

91. A micro-well array reveals contact independent suppression of effector CD4 T-cells by regulatory T-cells at short intercellular distances
Irina Zaretsky, Eric Shifrut, Michal Polonsky, Nir Friedman

92. Using Computational Biology Methods to Improve Post-silicon Microprocessor Testing
Ron Zeira, Dmitry Korchemny and Ron Shamir

93. De-Novo prediction of Histone deacetylase 8 non histone substrates based on computational structural modeling
Lior Zimmerman, Ora Furman

94. RFMapp: Ribosome Flow Model Application

## 1. Dissecting inner structures in disease regulatory networks: the DICER algorithm

### David Amar, Ron Shamir

Abstract

Novel approaches to gene expression analysis seek differential co-expression patterns, wherein the level of co-expression of a particular set of genes differs markedly between disease and control samples. Such patterns can arise from a disease-related change in the regulatory mechanism governing that set of genes. Here we present DICER, a new method for detecting differentially co-expressed gene sets. We introduce a novel probabilistic score for differential correlation, and use it to detect pairs of modules whose intra-module correlation is consistently high but whose inter-module correlation differs markedly between disease and normal samples. DICER outperforms state-of-the-art methods in terms of significance and interpretability of the detected gene sets. Moreover, the discovered gene sets are enriched with disease-specific microRNA families. In a case study on Alzheimer's disease, DICER dissected biological processes into functional sub-units that are differentially co-expressed, thereby revealing inner structures in disease regulatory networks.
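
To make the idea of inter-module differential correlation concrete, the following is a minimal sketch and not the DICER score itself: the abstract does not specify its probabilistic score, so this example swaps in a plain difference of mean Pearson correlations between two modules. The function names, gene identifiers and toy expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def mean_inter_module_correlation(expr, module_a, module_b):
    """Average correlation over all gene pairs (one gene from each module)."""
    values = [pearson(expr[g], expr[h]) for g in module_a for h in module_b]
    return sum(values) / len(values)


def differential_correlation(expr_control, expr_disease, module_a, module_b):
    """Control minus disease mean inter-module correlation (illustrative only)."""
    return (mean_inter_module_correlation(expr_control, module_a, module_b)
            - mean_inter_module_correlation(expr_disease, module_a, module_b))


# Toy data (assumed): gene -> expression values across four samples.
control = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7]}
disease = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [7, 5, 3, 1]}

delta = differential_correlation(control, disease, ["g1", "g2"], ["g3"])
```

In this toy case the two modules are perfectly correlated in control samples and perfectly anti-correlated in disease samples, so `delta` is close to 2, the largest possible value. A real analysis would also need a significance estimate, which is what the probabilistic score in the abstract is meant to provide.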

## 2. Indel Reliability in Phylogenetic Inference

### Haim Ashkenazy, Ofir Cohen, Dorothee Huchon, and Tal Pupko

Abstract

Insertions and deletions (indels) in protein sequences are considered to be rare genomic events and thus it is often assumed that finding the same indel independently in two evolutionary lineages is unlikely. This suggests that indel-based inference of phylogeny should be less subject to homoplasy compared to standard inference based on substitution events. Indeed, indels were recently successfully used to solve debated evolutionary relationships among Metazoa. However, indel-based phylogeny may suffer from biases and artifacts that can impede accurate phylogenetic inference. For example, we hypothesized that since indels are never directly observed but rather inferred from the alignment, indel-based inference may be sensitive to the alignment algorithm used. To test this hypothesis, we first determined the level of agreement among different alignment methods (MAFFT, PRANK, and ClustalW). We show that differences among alignments obtained from various methods suffice to generate different indel-based trees. We next developed a method to quantify the reliability of indel characters by measuring how often they appear in a set of alternative multiple sequence alignments. Our approach is based on the assumption that indels that are consistently present in most alternative alignments are more reliable than indels that appear only in a small subset of these alignments. Specifically, we assign each indel character a reliability score, which is the fraction of 100 sub- and co-optimal alignments in which it is present. We further show that filtering unreliable indels increases the accuracy of indel-based phylogenetic reconstruction. Finally, we conducted a genome-scale analysis, in which we characterize unreliable indels (e.g., in terms of the distribution of their length and their position in the protein coding sequence).
Taken together, our results show that indel-based inference is sensitive to biases stemming from uncertainty in alignment and that filtering unreliable indels is critical for accurate indel-based phylogeny reconstruction. Our indel reliability program is freely available.
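
The reliability score described above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes each indel has already been reduced to a hashable key (here a hypothetical `(start_column, length)` pair in shared coordinates), and the function names and the 0.9 threshold are invented for the example.

```python
from collections import Counter


def indel_reliability(base_indels, alternative_alignments):
    """Map each base indel to the fraction of alternative alignments containing it."""
    n = len(alternative_alignments)
    if n == 0:
        raise ValueError("need at least one alternative alignment")
    base = set(base_indels)
    counts = Counter()
    for indels in alternative_alignments:
        # Count each base indel at most once per alternative alignment.
        counts.update(base & set(indels))
    return {indel: counts[indel] / n for indel in base_indels}


def filter_reliable(scores, threshold=0.9):
    """Keep indels whose reliability score reaches the (assumed) threshold."""
    return [indel for indel, score in scores.items() if score >= threshold]


# Toy data (assumed): indels found in the base alignment and in four
# alternative alignments; the abstract uses 100 alternative alignments.
base = [(10, 3), (42, 1)]
alternatives = [
    {(10, 3), (42, 1)},
    {(10, 3)},
    {(10, 3), (43, 1)},
    {(10, 3), (42, 1)},
]

scores = indel_reliability(base, alternatives)
reliable = filter_reliable(scores, threshold=0.9)
```

Here the indel `(10, 3)` appears in all four alternative alignments and scores 1.0, while `(42, 1)` appears in two and scores 0.5, so only the first survives filtering. Matching the same indel across different alignments is the hard part in practice and is assumed to be solved before this step.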

## 3. What distinguishes GroEL substrates from other Escherichia coli proteins?

### Ariel Azia, Ron Unger, Amnon Horovitz

Abstract

Experimental studies and theoretical considerations have shown that only a small subset of Escherichia coli proteins fold in vivo with the help of the GroE chaperone system. These proteins, termed GroE substrates, have been divided into three classes: (a) proteins that can fold independently, but are found to associate with GroEL; (b) proteins that require GroE when the cell is under stress; and (c) 'obligatory' proteins that require GroE assistance even under normal conditions. It remains unclear, however, why some proteins need GroE and others do not. Here, we review experimental and computational studies that addressed this question by comparing the sequences and structural, biophysical and evolutionary properties of GroE substrates with those of nonsubstrates. In general, obligatory substrates are found to have lower folding propensities and be more aggregation prone. GroE substrates are also more conserved than other proteins and tend to utilize more optimal codons, but this latter feature is less apparent for obligatory substrates. There is no evidence, however, for any specific sequence signatures, although there is a tendency for sequence periodicity. Our review shows that reliable sequence- or structure-based predictions of GroE dependency remain a challenge. We suggest that the different classes of GroE substrates be studied separately and that proper control test sets (e.g. TIM barrel proteins that need GroE for folding versus TIM barrels that fold independently) be used more extensively in such studies.

## 4. Combinatoric inference of miRNA regulation in human cells

### Ohad Balaga, Yitzhak Friedman, and Michal Linial

Abstract

MicroRNAs (miRNAs) negatively regulate the levels of mRNA post-transcriptionally. Overexpressing miRNAs in cells has revealed hundreds of suppressed genes. Additionally, capturing miRNAs at the RISC complex provides a map of miRNAs and their targets. Using these data, we implemented combinatorial and statistical constraints in the miRror2.0 algorithm. miRror estimates the likelihood that a combinatorial action of miRNAs explains the observed data. A systematic assessment of 30 transcriptomic datasets and hundreds of miRNA sets shows that miRror is a robust protocol that outperforms a dozen miRNA-target prediction databases. We then examined the additive contribution of miRNA pairs. We found that miRNAs belonging to a family adopt an overlapping, backup mode of regulation. However, experimental data support an expanding, complementation mode of regulation by most miRNA pairs. Finally, we applied the miRror protocol to transcriptomic data from manipulated cells, and identified instances in which small miRNA sets govern the observed gene expression. We propose that miRNA combinatorial regulation is the chosen strategy for governing cellular homeostasis while overcoming the low specificity of any single miRNA.

## 5. Tissue interactomes illuminate the tissue-selectivity of genetic diseases

### Ruth Barshir, Omer Shwartz, Ilan Smoly and Esti Yeger-Lotem

Abstract

A major factor in the functional uniqueness of human tissues is the repertoire of genes and proteins they express. We assembled these repertoires from recent extensive analyses of gene and protein expression. Integrating these repertoires with the set of known protein-protein interactions enabled us to construct up-to-date interactomes for 16 main human tissues. We found that all interactomes were composed of a large common backbone that contained many of the hub proteins. Hub proteins were highly enriched for primary regulatory roles, suggesting that tissues share a core regulatory system. We also identified a significant tissue-wide correlation between gene expression level and number of interacting partners. We then harnessed these interactomes to illuminate the tissue-selectivity of genetic diseases that are caused by germline mutations. While the underlying disease genes were typically expressed in many tissues, they showed a significantly higher tendency for tissue-specific interactions. We suggest that the tissue-selectivity of certain genetic diseases might be mediated by protein-protein interactions specific to the disease tissues. Furthermore, these interactions could illuminate the etiology of certain tissue-selective diseases.

## 6. Unraveling the mystery of temperature compensation of the Drosophila circadian clock

### Osnat Bartok1, Manuel Garber2, Sebastian Kadener1

Abstract

1- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Israel.
2- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA.

Most organisms have circadian rhythms of gene expression and behavior which are maintained by an internal clock. These rhythms are based on single-cell, self-sustained oscillators that keep time by a complex transcriptional-translational feedback loop.
Circadian clocks are extraordinarily robust systems. They are buffered such that their output (the circadian behavior) stays consistent under different environmental perturbations. A notable example of this robustness is the remarkable property of temperature compensation. Briefly, the free-running period of the biochemical and behavioral oscillations in the circadian clock remains invariant even with large changes in temperature (ranging from 15°C to 29°C). This is intriguing, as the rates of enzymatic reactions (which clearly determine the circadian period) are strongly dependent on temperature. Nonetheless, circadian clocks keep the 24-hour period of daily cycles. Temperature compensation holds not only for homeothermic but also for poikilothermic organisms such as Drosophila. This property is likely the result of having multiple layers of regulation, which ensure accurate timekeeping and buffering of stochastic changes in the molecular clockwork.
Our research focuses on exploring the molecular and biochemical mechanisms underlying temperature compensation. Accordingly, we are currently profiling the kinetics of transcription, RNA and protein abundance, as well as the cellular localization of key regulators of the circadian clock system during the circadian cycle, at different temperatures ranging from 18°C to 29°C.
Moreover, to further characterize additional modulators involved in temperature compensation, we have recently developed a novel digital, sequencing-based, genome-wide transcriptome analysis. This high-throughput technique is inexpensive, circumvents many limitations of hybridization-based gene expression profiling methods, and enables the identification of different (including low-abundance) transcript variants expressed under different temperatures. Additionally, for genome-wide assessments of transcription, we are currently using this sequencing methodology on DNA-associated nascent RNAs.

## 7. Non-redundant compendium of human ncRNA genes in GeneCards

### Frida Belinky, Iris Bahir, Gil Stelzer, Naomi Rosen, Noam Nativ, Irina Dalah, Tsippi Iny Stein, Toutai Mituyama, Marilyn Safran, and Doron Lancet

Abstract

Non-coding RNA genes are increasingly acknowledged for their importance in the human genome. However, there is no comprehensive non-redundant database for all such human genes.
We leveraged the effective platform of GeneCards, the human gene compendium, together with the power of fRNAdb, to judiciously unify all non-coding RNA gene entries obtainable from 15 different primary sources. Overlapping entries were clustered to unified locations based on an algorithm employing genomic coordinates. This allowed GeneCards' gamut of relevant entries to rise ~3 fold, reaching more than 50,000 human non-redundant ncRNAs. Such "grand unification" within a regularly updated data structure will assist future non-coding RNA research. All these non-coding RNAs are included among the ~95,000 entries in GeneCards V3.08, along with pertinent annotation, automatically mined by its built-in pipeline from 100 data sources. This information is available at www.genecards.org.

## 8. Widespread structural proximity of co-regulated yeast genes

### Shay Ben-Elazar, Zohar Yakhini, Itai Yanai

Abstract

While it has long been known that genes are not randomly ordered within the genome, the degree to which the three-dimensional (3D) structure of the genome influences the spatial arrangement of genes has remained elusive. In particular, 'transcriptional factories' may arise by positioning co-regulated genes in physical proximity. These questions can now be partially addressed through the analysis of data produced by recently developed genomic technologies and their application to measuring the 3D structure of the E. coli, yeast and human genomes. Here we present evidence that the widespread co-localization of yeast genes regulated by the same factor is far greater when observed at the 3D scale than when simply accounting for the linear gene order. We first provide a minimum-assumption method for the interpolation and embedding of published raw measurement data, producing an unconstrained 3D model of the yeast genome by extending classical embedding approaches from computer science. Next we introduce a statistical framework that enables the comparison of the spatial and genomic densities of any given set of genes. For 107 out of 140 examined transcription factors we observe a 3D co-localization which is more significant than the genomic linear-order co-localization. Furthermore, we found that transcription factors with highly spatially co-localized targets are significantly highly expressed in the structure measurement conditions, validating activity related to the observed target density. A structural analysis of these factories did not reveal recurring 3D structures and therefore gave no insight about dominant conformations. Overall, our results provide robust evidence for a general organization of the genome into co-regulated spatially-restricted units.

## 9. Utilizing the Effects of SNPs on Distant Genes Reveals Molecular Links between Diseases

### Aharon Brodie, Dr. Yanay Ofran

Abstract

Mapping the relationships between genomic variations and phenotypes is a critical step in biomedical research. Genome-wide association studies (GWAS) are a prevalent approach to finding such associations. While GWAS typically link a single nucleotide polymorphism (SNP) to a phenotype with a certain probability, often they cannot explain the molecular-mechanistic relationship to the phenotype. Such a mechanistic explanation is essential for understanding and treating diseases.
We propose a computational framework to model the relationship between multiple SNPs and a phenotype. This is done by associating disease-related-SNPs to genes, and genes to pathways. Results show that for a given disease, associated genes tend to cluster within a few pathways. Finding associations between diseases and pathways enhances our understanding of the molecular basis of diseases.
Using our framework, we demonstrate associations of metabolic pathways to diseases, and use this approach to learn of novel disease-disease relationships.
Finally, observing disease-SNP distributions throughout the genome, we present the prospect that a SNP may be associated with multiple genes. We support this argument by showing improved results when compared to single-gene associations.

## 10. PatchBag: A novel approach for efficient detection of protein structural similarity

### Inbal Budowski-Tal, Yael Mandel-Gutfreund, Rachel Kolodny

Abstract

With the growing number of protein structures, new methods for identifying structural and functional similarities are critically needed. We present PatchBag, a vector representation of the protein surface that enables accurate identification of structural similarity, based on a 'bag-of-words' approach.
First, the protein surface is represented by a bag of all its overlapping equal-size surface patches. Surface patches are defined by a central surface residue and its nearest surface residues. Then, the k-means++ algorithm is used to cluster a training set of 5,056 patches into 100 clusters, each represented by its medoid. A protein is then characterized by a 'bag of surface patches': a vector representing the number of times each library patch best approximates a surface patch of the given protein. The similarity between two structures is measured by the cosine distance of their corresponding vectors.
Our results clearly demonstrate that the PatchBag approach, which is based solely on the protein surface, can accurately identify known structural similarities. We use a test set of ~2900 sequence non-redundant proteins, and calculate the PatchBag distances between structure pairs at different CATH levels. Our analysis shows that the average PatchBag distance grows as the sets become more structurally diverse.
We propose that the method, which is based on surface rather than fold similarity, will be of great advantage for function prediction from distantly related proteins as well as proteins which have evolved by convergent evolution.
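
The bag-of-patches representation and cosine distance described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PatchBag implementation: patches are assumed to be already encoded as fixed-length numeric vectors, nearest-library assignment uses squared Euclidean distance as a stand-in for whatever structural approximation measure the method uses, and the three-entry library stands in for the 100 cluster medoids. All names and values are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two patch feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def patch_bag(patches, library):
    """Count how many surface patches each library patch best approximates."""
    bag = [0] * len(library)
    for patch in patches:
        nearest = min(range(len(library)),
                      key=lambda i: squared_distance(patch, library[i]))
        bag[nearest] += 1
    return bag


def cosine_distance(u, v):
    """One minus the cosine similarity of two count vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Toy library of three "medoid" patches and three toy proteins (all assumed).
library = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
protein_a = [(0.1, 0.0), (0.9, 1.1), (1.2, 0.8)]
protein_b = [(4.8, 5.1), (5.2, 4.9)]
protein_c = [(0.0, 0.1), (1.0, 0.9), (1.1, 1.0)]

bag_a = patch_bag(protein_a, library)
bag_b = patch_bag(protein_b, library)
bag_c = patch_bag(protein_c, library)
```

With these toy values, proteins A and C share the same bag and have a cosine distance near 0, while A and B use disjoint library patches and have a distance of 1. Because each protein is reduced to a fixed-length vector, comparing two structures costs time proportional to the library size rather than requiring a structural alignment.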

## 11. Systematic Identification of Small Antigen Binding Units in Antibodies

### Anat Burkovitz, Olga Leiderman, Inbal Sela, and Yanay Ofran

Abstract

Anecdotal studies found that some peptides derived from Complementarity Determining Regions (CDRs) can bind antigen (Ag), but no a priori principles for identifying such peptides from antibodies (Abs) were suggested. Here, we show that through computational analysis it is possible to predict which CDRs may putatively bind the Ag as peptides. We systematically scanned peptides of the anti-Hen Egg White Lysozyme Ab HyHEL-10 and determined experimentally which of them maintain specificity to the Ag on their own, and we show that the experimental and computational analyses are in agreement with each other. These results may help the design of active CDR-derived peptides.

## 12. Fishing for Virulent Factors: Machine Learning Prediction and Experimental Validation of Bacterial Effectors

### David Burstein, Michael Peeri, Tal Zusman, Ziv Lifshitz, Gil Segal, Tal Pupko

Abstract

Numerous pathogenic bacteria exert their function by translocating a set of proteins, termed effectors, into the cytoplasm of their host cell. The primary goal of this study was to identify novel effectors on a genomic scale, towards a better understanding of the molecular mechanisms of bacterial pathogenesis. We applied a machine learning approach for the detection of effectors in the intracellular pathogen Legionella pneumophila, the causative agent of Legionnaires' disease, a severe pneumonia-like disease. Our approach is based on the combination of several classification algorithms trained on a variety of features collected on a genomic scale. We applied this methodology to predict and experimentally validate dozens of new effectors. Notably, our computational predictions had a high accuracy rate of over 90%. Having a large pool of identified effectors, we studied the signals that enable the secretion of effectors. We have implemented a hidden semi-Markov model (HSMM) to characterize regions that are recognized by the bacterial secretion machinery. Using the HSMM we were able to detect novel effectors in different species of Legionella, as well as in Coxiella burnetii, an extremely infectious pathogen and a potential bio-terrorism agent. Based on the HSMM we were able to synthesize, for the first time, an artificial secretion signal, and experimentally prove its translocation. Furthermore, we are using similar machine learning approaches to identify pathogenic determinants in several other pathogens, including the food-borne Salmonella enterica, the plant pathogen Xanthomonas campestris, and Pseudomonas aeruginosa - the predominant respiratory pathogen in cystic fibrosis (CF) patients.

## 13. Promoter sequence determines the relationship between expression level and noise.

### Lucas Carey, David van Dijk and Eran Segal

Abstract

The ability of cells to accurately control gene expression levels in response to extracellular cues is limited by the inherently stochastic nature of transcriptional regulation. A change in TF activity results in changes in the expression of its targets, but the way in which cell-to-cell variability in expression (noise) changes as a function of TF activity, and whether targets of the same TF behave similarly, is not known. Here, we measure expression and noise as a function of TF activity for twenty native targets of the transcription factor Zap1 that are regulated by it through diverse mechanisms. For most activated and repressed Zap1 targets, noise decreases as expression increases. Kinetic modeling suggests that this is due to two distinct Zap1-mediated mechanisms that both change the frequency of transcriptional bursts. Notably, we found that another mechanism of repression by Zap1, which is encoded in the promoter DNA, likely decreases the size of transcriptional bursts, producing a unique transcriptional state characterized by low expression and low noise. Our results suggest a global principle whereby at low TF concentrations, the dominant source of differences in expression between promoters stems from differences in burst frequency, whereas at high TF concentrations differences in burst size dominate. Taken together, we show that the precise amount by which noise changes with expression is specific to the regulatory mechanism of transcription and translation that acts at each gene.

## 14. Genome scale systematic analysis of metabolic homeostasis in disease and treatment

### Noa Cohen, Allon Wagner, Eytan Ruppin

Abstract

The notion of homeostasis and its disruption is central to our perception of health and disease. We conduct the first systematic study of this concept by quantifying disruptions to healthy homeostasis in the human metabolic model. We find that, as expected, the loss of disease genes causes a larger deviation from the normal physiological state than the loss of non-disease genes. We then simulate metabolic drugs' treatment by projecting their gene expression signatures onto the model and surprisingly find that they fail to reinstate the healthy state. The drugs correct disease-induced disruptions, yet simultaneously cause undesirable network-wide alterations that may account for their side-effects. In contrast, positive lifestyle changes (e.g., regular exercise) do produce a global favorable effect, to some extent. Overall, our findings highlight the potential utility of a network-based paradigm for drug target identification that will be aimed at reinstating healthy homeostasis.

## 15. The role of reverse-transcriptase in intron gain and loss mechanisms

### Noa E. Cohen, Roy Shen and Liran Carmel

Abstract

The mechanisms that underlie the processes of intron gain and loss in eukaryotes are notoriously elusive. It is likely that intron gain and loss are driven by multiple mechanisms, which may be specific to certain evolutionary times and certain taxonomic groups. Here, we shed light on this question by looking at the relationship between intron gain and loss rates on an evolutionary scale.

To this end, we use a previously developed expectation-maximization algorithm to reconstruct the evolutionary history of the intron-exon architecture of 391 genes that have orthologs in 19 eukaryotic species. The algorithm estimates intron gain and loss rates in each lineage and each gene.
Looking at the gene-specific rates, we find that (1) intron gain and loss rates are positively correlated in intron-rich species and negatively correlated in intron-poor species, and that (2) the well-known tendency of introns in intron-poor species to reside towards the 5'-end (positional bias) is due to an increased intron loss rate towards the 3'-end. Such positional bias is usually recognized as the fingerprint of a reverse-transcriptase (RT)-related process. Interpreting a positive correlation between gain and loss as a sign of a common mechanistic component in both processes, these findings suggest that an RT-based mechanism is the dominant factor leading to intron loss in intron-poor species, and that intron gain in these lineages does not involve RT. Moreover, intron-rich lineages show no evidence for RT-related intron loss or gain.

In addition, we show that different taxonomic groups show unique characteristics of positional bias and of gain-loss correlation. This suggests that different mechanisms operate in different parts of the phylogenetic tree.

## 16. Uncovering the co-evolutionary network among microbial gene families

### Ofir Cohen, Haim Ashkenazy, David Burstein, Gil Segal, and Tal Pupko

Abstract

Correlated events of gains and losses of gene families enable inference of co-evolution relations and functional association. We present a novel probabilistic methodology for the detection of co-evolutionary relations between gene families and apply it to large-scale data of prokaryote genomes. We inferred the co-evolutionary network among 4,593 gene families. The number of co-evolutionary interactions substantially differed among gene families. Approximately 40% were found to co-evolve with at least one partner. We partitioned the network of co-evolutionary relations into components and uncovered multiple modular assemblies of gene families with clearly defined functions. Finally, we measured the extent to which co-evolutionary relations coincide with other cellular relations such as genomic proximity, gene fusion propensity, co-expression, protein-protein interactions, and metabolic connections. Our results show that co-evolutionary relations only partially overlap with these other types of networks. Our results suggest that the inferred co-evolutionary network in prokaryotes is highly informative towards revealing functional relations among gene families, often showing signal that cannot be extracted from other network types.

## 18. Towards understanding of free text medical records in Hebrew

Abstract

The increasing availability of Electronic Health Record (EHR) data and specifically free-text patient notes presents opportunities for extraction of phenotypes, treatment and treatment outcome on a large scale. These data can make a significant contribution to basic science in many fields that require detailed phenotypic information such as linking phenotypes to genetic variance. Various Natural Language Processing (NLP) methods were adapted to the medical domain. Most of these rely on the UMLS, a medical vocabulary with over 300K unique terms and more than 1M synonyms. Here, we present a method for automatically creating a Hebrew-UMLS lexicon. We make public this resource mapping 8K medical Hebrew terms to the UMLS. We show that creating this resource reduces the error for the NLP tasks of segmentation and Part of Speech (POS) tagging. We examine the impact of this improvement on a classification task: identifying patients with Epilepsy from the notes of the Children Neurology Unit in Soroka, resulting in F1 improvement from 92% to 96%.

## 19. Determinants of translation elongation speed and ribosomal profiling biases in mouse embryonic stem cells

### Alexandra Dana and Tamir Tuller

Abstract

Ribosomal profiling is a promising approach with increasing popularity for studying translation. This approach enables monitoring the ribosomal density along genes at a resolution of single nucleotides.

In this study, we focused on profiles of ribosomal density generated in mouse embryonic stem cells. Our analysis suggests, for the first time, that even in mammals such as M. musculus the elongation speed is significantly and directly affected by determinants of the coding sequence such as: 1) adaptation of codons to the tRNA pool; 2) the local mRNA folding of the coding sequence; 3) the local charge of codons. In addition, our analyses suggest that in general, the translation velocity of ribosomes is slower at the beginning of the coding sequence and tends to increase downstream.

Finally, comparison of these data to biophysical models of translation suggests that they suffer from some unknown biases which cause ribosomal flux to increase along the coding sequence, whereas according to the biophysical models it is expected to remain constant or decrease. Thus, developing experimental and/or statistical methods for detecting and dealing with such biases is of high importance.

## 20. In search of the fifth column: Splice variants that undermine the main activity of their gene

### Miri Danan and Erez Y. Levanon

Abstract

Alternative splicing is considered a main source of protein diversity. However, the extent of physiologically meaningful protein isoforms that result from splicing is still unknown. One of the known functional consequences of alternative splicing, which was shown to drive determinative physiological change, is the creation of a dominant negative inhibitor that acts antagonistically to the main functional variant. Elimination of a functional part of a protein, by either exon exclusion or creation of a truncated protein, can produce a dominant negative variant. In this study, we aim to systematically study exon-exclusion transcripts predicted to function as dominant negative inhibitors. Using RNA sequencing (RNA-seq) data obtained from 16 human tissues and 7 human cell lines, we detected known as well as many novel dominant negative variants. Our results indicate that dominant negative variants derived from exon exclusion are more prevalent than currently thought, and suggest another important regulation layer which should be further explored.

## 21. Infection Dynamics and Transcriptional Programs of a Single Phage During Infection of 3 Marine Synechococcus Hosts

### Shany Doron, Ayalla Fedida, Iris Karunker, Debbie Lindell and Rotem Sorek

Abstract

Cyanobacteria from the genus Synechococcus are highly abundant in the oceans where they contribute significantly to primary production. Lytic viruses that infect them are also abundant in the oceans and cause the death of their cyanobacterial hosts at the end of the infection cycle. Here we investigated the interaction of three Synechococcus strains, WH8102, WH7803 and WH8109, with the broad host range T4-like myovirus, Syn9. Analysis of Syn9 infection dynamics on each of the hosts revealed that the length of the lytic cycle is 6-8h, with a latent period of 4h. Using massively parallel cDNA sequencing, we investigated the transcriptional programs of both phage and host during the latent period of infection. By 30 minutes after infection the majority of mRNA in the cell was already phage-derived. The vast majority of phage genes followed a strict expression program regardless of the infected host. However, a small set of phage genes showed host-specific differential expression, possibly implying adaptation to different host cellular backgrounds. We also identified a set of host genes whose expression becomes relatively more abundant following phage infection, suggesting either an attempt at defence against the phage or phage-induced production of host proteins necessary for the progression of phage infection. Furthermore, we constructed a genome-wide promoter map for phage and host genes. These are the first steps towards deciphering the genome-wide transcription program of a T4-like cyanophage and the response to infection by different Synechococcus hosts.

## 22. NetCmpt: A network-based tool for calculating the metabolic competition between bacterial species

### Anat Kreimer, Adi Doron-Faigenboim, Elhanan Borenstein, Shiri Freilich

Abstract

NetCmpt is a tool for calculating the competitive potential between pairs of bacterial species. The score describes the effective metabolic overlap (EMO) between two species, derived from analyzing the topology of the corresponding metabolic models. NetCmpt is based on the EMO algorithm, developed and validated in previous studies. It takes as input lists of species-specific enzymatic reactions (EC numbers) and generates a matrix of the potential competition scores between all pairwise combinations.
NetCmpt is provided as both a web tool and a software package, designed for the use of non-computational biologists.

## 23. PloiDB: A Community Resource for Polyploidy Research

### Moshe Einhorn, Shing H. Zhan, Itay Mayrose

Abstract

Polyploidy has long fascinated biologists as a potentially significant source of plant diversification, yet our understanding of the evolutionary and ecological consequences of polyploidy is far from complete and relies heavily on theoretical studies. Owing, in part, to the absence of large comparative data, most empirical studies are confined to particular geographic regions and/or narrow taxonomic space. As such, treatment of central theoretical hypotheses regarding various aspects of polyploidy evolution within a statistically robust phylogenetic framework is generally absent.
Here we lay the foundations for such future investigations by constructing PloiDB, the plant ploidy-level database. PloiDB aims to provide the dated history of ploidy transitions for each plant species having cytological and/or sequence data. The inference scheme is divided into two time scales, corresponding to the computational tools used: ploidy transitions occurring relatively recently (i.e. those occurring since divergence from the common ancestor of each genus examined) and those that occurred at deeper time scales. Recent ploidy transitions are inferred based on variations in chromosome number using chromEvol, a recently developed probabilistic model of chromosome number evolution. This likelihood-based method assesses the fit of several models of chromosome number change along a phylogeny, infers the expected number of ploidy transitions along each branch, and estimates the ancestral chromosome number at the root of the tree. In its current implementation, the database covers nearly 400 plant genera, encompassing more than 25,000 named species. In the next release, we aim to uncover deeper polyploid events using a combination of cytological and genomics analyses.
PloiDB is made publicly available through a user-friendly web interface, providing researchers with easy access to the data and retrieval capabilities. By uniting polyploidy inference schemes into a common framework, we hope the database will serve as a valuable community resource, enabling a large array of studies - from the analysis of detailed taxonomic groups to meta-analyses concerning general hypotheses regarding polyploidy and plant evolution.

## 24. The role of cyanophage tRNAs in the cross infectivity of marine cyanobacteria.

### Hagay Enav, Oded Beja and Yael Mandel-Gutfreund

Abstract

Marine cyanobacteria of the genera Prochlorococcus and Synechococcus are the most abundant photosynthetic prokaryotes in oceanic environments, and are key contributors to global CO2 fixation, chlorophyll biomass and primary production. Cyanophages, viruses infecting cyanobacteria, are a major force in the ecology of their hosts. These phages contribute greatly to cyanobacterial mortality, therefore acting as a powerful selective force upon their hosts. Phage reproduction is based on utilization of the host transcription and translation mechanisms; therefore, differences in the G+C genomic content between cyanophages and their hosts could be a limiting factor for the translation of cyanophage genes. On the basis of comprehensive genomic analyses conducted in this study, we suggest that cyanophages of the Myoviridae family, which can infect both Prochlorococcus and Synechococcus, overcome this limitation by carrying additional sets of tRNAs in their genomes accommodating AU rich codons. Whereas the tRNA genes are less needed when infecting their Prochlorococcus hosts, which possess a similar G+C content to the cyanophage, the additional tRNAs may increase the overall translational efficiency of their genes when infecting a Synechococcus host (with high G+C content), therefore potentially enabling the infection of multiple hosts.

## 26. Protein Sequence Reveals the Co-Expression and Co-Localization of Network Hubs and their Interacting Partners

### Ariel Feiglin, Shaul Ashkennazi, Avner Schlessinger and Yanay Ofran

Abstract

Proteins that are highly connected in protein-protein interaction networks, known as hubs, have been shown to have an important role in controlling and organizing the network. Hubs can be divided according to two modes of interaction. Intra-modular hubs interact with their partners at the same time and in the same sub-cellular compartment, while inter-modular hubs are not distinctly correlated with their partners in space or time. Structural analyses suggest that intra-modular hubs have multiple separate binding sites that may enable simultaneous interactions, while inter-modular hubs display high levels of structural disorder, conjecturally to promote promiscuous binding. Based on these features, we analyse a set of inter- and intra-modular hubs from yeast and show that it is possible to distinguish between them based on their amino-acid sequence. We then use these sequence-extractable features to predict the co-expression and co-localization of hundreds of human hub proteins with their interacting partners. Our results suggest that the spatiotemporal coordination of biological processes by network hubs is encoded, at least in part, in their amino acid sequence.

## 27. Age-related patterns in literature-derived knowledge and clinical data

### Nophar Geifman & Eitan Rubin

Abstract

Background: Age plays an important role in medicine and medical research; it is an important factor when considering phenotypic changes in health and disease. We recently developed the Age-Phenome Knowledgebase (APK), which formally represents knowledge about clinically relevant traits, such as disease, that occur at different ages. The APK holds over 35,000 entries which describe the relationships between age and phenotypes, mined from over 1.5 million PubMed abstracts. In this work we demonstrate the integrative analysis of raw clinical measurements with knowledge mined from the literature.

Methods and Results: Data from the NHANES III survey were used to calculate the fraction of abnormal blood test results at each age. Furthermore, the ages of diagnosis of diseases in NHANES were also captured. The pattern of change in abnormal blood values and disease diagnosis was compared to a normalised measure of age-to-disease reports from APK. Correlation analysis of abnormal blood results with APK-derived disease patterns revealed several interesting positive and negative correlations, especially when allowing a 1-3 year shift between the patterns. Furthermore, comparison of the pattern of age of diagnosis of disease from NHANES subjects to that described in the medical literature and captured in APK reveals that these patterns are in good agreement for the majority of diseases tested.

Conclusions: In this work we demonstrate the usefulness of analysis of data obtained from different types of bio-medical resources. Furthermore, we show that knowledge stored in the APK captures current medical knowledge and is comparable to that observed in clinical data.
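
The correlation between two age-indexed patterns with a shift of a few years, as mentioned in the Methods and Results, can be illustrated with a short sketch. It assumes Pearson correlation and two equal-length series sampled at consecutive ages; the abstract does not state which correlation measure was used, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def shifted_correlation(first, second, shift):
    """Correlate first[age] with second[age + shift], for shift >= 0 years."""
    if len(first) != len(second):
        raise ValueError("series must cover the same ages")
    if shift < 0 or shift > len(first) - 2:
        raise ValueError("shift leaves fewer than two overlapping ages")
    return pearson(first[:len(first) - shift], second[shift:])


# Toy example: the second pattern repeats the first one two years later.
abnormal_fraction = [1, 3, 2, 5, 4, 6, 2]
disease_reports = [9, 9, 1, 3, 2, 5, 4]
by_shift = {s: shifted_correlation(abnormal_fraction, disease_reports, s)
            for s in range(0, 4)}
```

In this toy example the correlation peaks at a shift of two years, which mirrors the kind of lagged agreement the abstract describes.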

## 28. DNA-methylation patterns on exon-intron structure and their effect on co-transcriptional splicing

### Sahar Gelfman, Ahuvi Yearim and Gil Ast

Abstract

DNA-methylation is higher in exons than in the flanking intronic sequences, and was recently found to be involved in exon recognition via co-transcriptional splicing. However, exons also have a higher GC content than their flanking sequences, so the biological implications of higher CpG methylation in exons are not clear. In this work, we use genome-wide single-base resolution data of DNA-methylation and nucleosome occupancy to determine their pattern along the exon-intron structure. For the first time, we show a pattern that is not biased by GC content, through the use of a large control group of exons with no GC content differential between the exon and its flanking introns. We find that DNA-methylation marks the exons regardless of GC content, while nucleosome occupancy is strongly affected by GC content, and find this pattern to encompass the whole human exome. Moreover, we reveal that the differential in methylated CpGs between exon and introns, and not the absolute methylation level, distinguishes alternative exons from constitutive ones, and only when there is no GC differential between exon and introns. We find that position -2 of the 5' splice site is highly methylated, and a CG dinucleotide in that position is also correlated with higher inclusion of alternative exons, although it destabilizes base-pairing with U1 snRNA. Furthermore, a CG dinucleotide in that position and in others along the exon-intron junctions is accompanied by higher nucleosome occupancy levels. Overall, these results indicate that DNA-methylation is a major factor in exon recognition and is influenced by the GC differential between exon and introns.

## 29. Multi-layered chromatin analysis reveals E2F, SMAD and ZFX as transcriptional regulators of the Histone gene family

### David Gokhman, Ilana Livyatan, Shai Melcer and Eran Meshorer

Abstract

Histone proteins are the building blocks of eukaryotic chromatin, and are essential for the packaging, function and regulation of the genome. The canonical replication-dependent histones must be tightly regulated with the cell cycle as they are primarily required during S-phase. Surprisingly, little is known about the transcriptional regulation of histone gene expression. Here, we conducted a comprehensive computational analysis, based on genome-wide ChIP-seq/ChIP-chip data of more than 50 transcription factors and histone modifications in embryonic stem cells (ESCs). We found significant enrichment of nine transcriptional regulators on histone genes, including E2F1, E2F4, SMAD1, SMAD2, ZFX, Ep300, YY1, TET1 and CTCF. Some of these factors, e.g. E2F1, E2F4 and TET1, act as activators for all histone genes, while others, e.g. SMAD1 and p300, are more restricted. We also identify CTCF and ZFX as repressors of core and linker histones, respectively. Finally, we find that YY1 binding to histone gene promoters is restricted to differentiated cells, and completely absent in ESCs. We propose that the regulation of histone gene transcription is significantly more complex than previously perceived, and that a combination of factors orchestrates histone gene regulation, from strict synchronization with S-phase of all histones to targeted regulation of specific histone subtypes.

## 30. Protein characteristics and tissue specificity

### Sivan Goren and Yanay Ofran

Abstract

An important aspect of protein function in vertebrates is tissue specificity.
Genes specific to a given tissue are regulated by common mechanisms and hence prediction of tissue specificity is based mostly on identifying DNA sequence motifs upstream to the gene. We hypothesize that tissue specificity may be reflected not only in regulatory DNA elements but also in the biophysical characteristics of the proteins.
The success of most subcellular localization (SCL) prediction methods is based, in part, on the notion that each subcellular compartment may constitute a slightly different microenvironment. This is the rationale behind the fact that SCL prediction methods incorporate features such as amino acid composition and other sequence-derived features. Similarly, we hypothesize that, at least in some cases, tissue-specific proteins may possess some common features that are identifiable from sequence and make them stable in a specific tissue.
We used expression data to identify tissue specific proteins. Using these data we analyzed sequence derived features for tissue specific proteins and for ubiquitously expressed ones. We also analyzed and compared proteins specific to different tissues.
We demonstrate that such protein-derived features distinguish between proteins that are tissue specific and proteins that are expressed ubiquitously. Furthermore, we show that sequence-derived features can help determine whether a protein is specific to a given tissue. Interestingly, the predictions we make based on protein-derived features are at least as good as predictions that are based on DNA regulatory motifs. Thus, features extractable from the protein sequence may be used to improve the prediction of tissue specificity and promote our understanding of the mechanism of tissue specificity and its influence on the protein's function.

## 31. INDI: A computational framework for inferring drug interactions and their associated recommendations

### Assaf Gottlieb, Gideon Y. Stein, Yoram Oron, Eytan Ruppin & Roded Sharan

Abstract

Inferring drug-drug interactions (DDIs) is an essential step in drug development and drug administration. Most computational inference methods focus on modeling drug pharmacokinetics, aiming at interactions that result from a common metabolizing enzyme (CYP). Here we introduce a novel prediction method, INDI, allowing the inference of both pharmacokinetic, CYP-related DDIs (along with their associated CYPs) and pharmacodynamic, non-CYP associated ones. On cross validation, it obtains high specificity and sensitivity levels (AUC ≥ 0.93). In application to the FDA adverse event reporting system, 53% of the drug events could potentially be connected to known (41%) or predicted (12%) DDIs. Additionally, INDI predicts the severity level of each DDI upon co-administration of the involved drugs, suggesting that severe interactions are abundant in clinical practice. Examining medications taken regularly by hospitalized patients, we find that 18% of the patients receive known or predicted severely interacting drugs and are hospitalized more frequently. Access to INDI and its predictions is provided via a web tool, facilitating the inference and exploration of drug interactions and providing important leads for physicians and pharmaceutical companies alike.

## 32. Differential GC Content between Exons and Introns Establishes Distinct Strategies of Splice-Site Recognition

### Maayan Amit, Maya Donyo, Dror Hollander, Amir Goren, Eddo Kim, Sahar Gelfman, Galit Lev-Maor, David Burstein, Schraga Schwartz, Benny Postolsky, Tal Pupko, Gil Ast

Abstract

During evolution segments of homeothermic genomes underwent a GC content increase. Our analyses reveal that two exon-intron architectures have evolved from an ancestral state of low GC content exons flanked by short introns with a lower GC content. One group underwent a GC content elevation that abolished the differential exon-intron GC content, with introns remaining short. The other group retained the overall low GC content as well as the differential exon-intron GC content, and is associated with longer introns. We show that differential exon-intron GC content regulates exon inclusion level in this group, in which disease-associated mutations often lead to exon skipping. This group's exons also display higher nucleosome occupancy compared to flanking introns and exons of the other group, thus "marking" them for spliceosomal recognition. Collectively, our results reveal that differential exon-intron GC content is a previously unidentified determinant of exon selection and argue that the two GC content architectures reflect the two mechanisms by which splicing signals are recognized: exon definition and intron definition.

## 33. Mutational analysis of deafness-related proteins

### Adva Yeheskel, Daphne Karfunkel, Zippora Brownstein, and Karen Avraham

Abstract

Hereditary hearing loss is genetically highly heterogeneous, complicating the discovery of pathogenic mutations. There are >100 unsolved loci in humans that contain genes predicted to carry mutations involved in hearing loss. Deep, or massively parallel, sequencing (MPS) technology enables the rapid identification of genetic mutations on a large scale. Multiple potential causal mutations were identified from a recent targeted capture and MPS experiment of 284 genes associated with hearing loss in 96 deaf probands from Israeli Jewish and Palestinian Arab populations. In order to determine which variants are possible pathogenic mutations, we used protein structure predictions.
Here we focused on four proteins: MYO7A (2 variants), TMC1 (4 variants), COCH and USH2A. Three-dimensional (3D) models of the MYO7A head domain, COCH VWFA domain, and USH2A fibronectin type 3 domain were constructed based on homology to known structures. For TMC1, a transmembrane protein, secondary structure predictions together with multiple sequence alignment (MSA) were used to analyze the mutations. For the 2 variants identified in MYO7A, we predict that one leads to a hydrogen bond disruption and the other lies in the ATP-binding site. The USH2A mutated amino acid is one of 4 residues in the folding nucleus of this domain. The COCH mutation may cause a pi-pi interaction near the conserved Mg2+ binding site. Protein structure predictions may help provide insight into the molecular basis of protein function and into the mechanistic involvement of these proteins in the pathophysiology of hearing loss. A comprehensive understanding of functional activity may promote new viewpoints for treatments, including drug design.

## 34. Yeast promoters maintain their relative activity levels under different growth conditions

### Leeat Keren, Ora Zackay, Maya Lotan-Pompan, Danny Zeevi, Adina Weinberger, Ron Milo, Eran Segal

Abstract

The expression of thousands of yeast genes changes in response to different environments. Many of these changes are correlated with the cells' growth rate, but a quantitative understanding of the relationships between expression changes across conditions is still missing. Here, we used fluorescent reporters to obtain accurate activity measurements of 859 native yeast promoters in 10 environmental conditions. Strikingly, we found that although nearly all promoters change their absolute activity levels across conditions, 60-90% preserve their relative activity levels between each pair of conditions. Moreover, the remaining condition-specific promoters can be organized into a handful of functionally related groups, such that within each group, promoters also preserve their relative activity levels across conditions in which they are activated. Thus, seemingly complex genome-wide activity profiles can be accurately described by only a few scaling factors, suggesting a common underlying mechanism. We present a simple resource allocation model that accounts for growth rate and differential assignment of cellular resources to condition-specific promoters that can explain a major fraction of these scaling factors. Our results suggest that many changes in promoter activities across conditions may result from changes in global cellular constraints and not from active regulation, thus providing a new interpretation of genome-wide expression profiles.
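
One way to make the idea of promoters "preserving their relative activity levels" between two conditions concrete is sketched below. This is an illustrative assumption rather than the authors' analysis: it fits a single multiplicative scaling factor by least squares through the origin and counts promoters whose activity in the second condition falls within a relative tolerance of the scaled value. The function names and the 20% tolerance are hypothetical.

```python
def scaling_factor(activity_a, activity_b):
    """Least-squares slope s through the origin, so that b is close to s * a."""
    denominator = sum(a * a for a in activity_a)
    if denominator == 0:
        raise ValueError("all activities in the first condition are zero")
    return sum(a * b for a, b in zip(activity_a, activity_b)) / denominator


def fraction_preserved(activity_a, activity_b, tolerance=0.2):
    """Fraction of promoters within `tolerance` (relative) of the scaled value."""
    s = scaling_factor(activity_a, activity_b)
    within = sum(
        1
        for a, b in zip(activity_a, activity_b)
        if abs(b - s * a) <= tolerance * abs(s * a)
    )
    return within / len(activity_a)


# Toy data: three promoters scale together, the fourth is condition-specific.
condition_1 = [1, 2, 4, 1]
condition_2 = [2, 4, 8, 10]
preserved = fraction_preserved(condition_1, condition_2)
```

A fit through the origin is chosen here because a pure scaling relationship has no offset; a fit in log space would be an equally reasonable alternative.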

## 35. Conformational readout of ligand binding sites on RNA

### Efrat Kligun, Yael Mandel-Gutfreund

Abstract

RNA molecules have highly versatile structures that can fold into a multitude of conformations, providing potential binding pockets for specific drug recognition. The increasing number of available RNA structures, in complex with proteins or small ligands and in free form, enables the design of new therapeutically useful RNA-binding ligands. Here we analyzed the conformational properties of 79 RNA-ligand complexes from 11 RNA groups, extracted from the Protein Data Bank (PDB). We analyzed the chemical, physical and structural properties of the binding pockets around the ligand. Comparing the properties of the ligand binding pockets to the properties of computed pockets extracted from all protein-RNA interfaces, as well as from all available RNA structures, revealed that ligand binding pockets are characterized by unique properties. We show that the nucleotides with the rare structural properties are indeed involved in direct interactions with the ligand, specifically via hydrogen bonds. We propose that the unique structural properties of nucleotides in the ligand binding pockets on RNA contribute to the specific recognition of the binding pocket via a "conformational readout".

## 36. Identifying protein recognition elements on RNA by combining sequence and structural information

### Refael Kohen, Iddo Z. Ben-Dov, Thomas Tuschl and Yael Mandel-Gutfreund

Abstract

RNA binding proteins (RBPs) bind RNA and control many processes such as splicing, translation, RNA transfer and localization of RNA in the cell. As opposed to DNA binding proteins, which usually have a unique binding motif, most recognition elements on RNA are short and degenerate. Based on the relatively low information content of RNA binding motifs, it has been proposed that the identification of binding sites on RNA is not based solely on the nucleotide sequence. Previous studies have shown that the secondary structure of RNA plays an important role in recognition of the binding site by the protein. Here we developed a method to search for protein binding motifs on RNA by combining sequence and secondary structure information, employing a new approach that represents the sequence and secondary structure information in a mutual alphabet.
We employed the method to search for motifs in high-throughput binding data (e.g. RIP-Chip, CLIP-seq and PAR-CLIP). In several cases, such as for the Pumilio family RBPs and different splicing factors, we were able to detect the experimentally verified motifs and to add information regarding their secondary structure preference. Further, we applied our approach to PAR-CLIP data for the BICC1 (Bicaudal-C homolog 1) RBP and identified a novel sequence and structural motif. Evolutionary conservation analyses confirmed that the detected motifs are significantly more conserved than the rest of the sequence, as well as more conserved than the PAR-CLIP anchor sites that do not contain the motif, reinforcing that the detected motifs are functional. Taken together, our results strongly support an important role for secondary structure information in RNA-protein recognition.

## 37. A transcription-splicing integrated network reveals pervasive cross-regulation among regulatory proteins

### Idit Kosti and Yael Mandel-Gutfreund

Abstract

Traditionally the gene expression pathway was regarded as being composed of independent steps, from RNA transcription to protein translation. To date there is increasing evidence for coupling between the different processes of the pathway, specifically between transcription and splicing. Given the extensive cross-talk between these processes, we derived a transcription-splicing integrated network. The nodes of the network included experimentally verified human proteins belonging to three groups of regulators: transcription factors (TFs), splicing factors (SFs) and kinases. The nodes were wired by instances of predicted transcriptional and alternative splicing regulation. Analysis of the network indicated pervasive cross-regulation among the nodes. Specifically, SFs were significantly more often regulated by alternative splicing relative to the two other subgroups, while TFs were more extensively controlled by transcriptional regulation. In particular, we found a significant preference of specific pairs of TF-TF and SF-SF to regulate their target genes, SFs being the most regulated group via independent and combinatorial binding of SFs. Consistent with the extensive cross-regulation among the splicing and transcription factors, the subgroup of kinases within the network had the highest density of predicted phosphorylation sites. The prevalent regulation of the regulatory proteins was further supported by computational analysis of the protein sequences, demonstrating the propensity of these proteins to be highly disordered relative to other proteins in the human proteome. Overall, our systematic study reveals that an organizing principle in the logic of integrated networks favors the regulation of regulatory proteins by the specific regulation they conduct. Based on these results we propose a new regulatory paradigm, postulating that fine-tuned gene expression regulation of the master regulators in the cell is commonly achieved by cross-regulation.

## 38. GPCR & company: databases and servers for GPCRs and interacting partners

### Noga Kowalsman and Masha Y. Niv

Abstract

G-protein coupled receptors (GPCRs) constitute a large superfamily of membrane receptors that are involved in a wide range of signaling pathways. This makes them one of the most important classes of pharmacological targets.
In recent years the amount of data on GPCRs has increased dramatically, along with the number of solved GPCR structures. A vast repertoire of web-based databases and servers is being developed to organize these data and to make them easily accessible to researchers from various fields.
One of the main aspects of GPCR-targeted drug development is the interactions between the GPCR and other molecules, such as natural ligands, synthetic agonists and antagonists, other GPCRs and G-proteins.
We review freely available databases and servers that supply information about GPCR interactions. These are organized into the following topics: i) databases dealing with general GPCR-ligand interactions; ii) specialized GPCR-ligand interaction databases, such as the database for bitter compounds (BitterDB) that was developed in our group; iii) databases focused on GPCR oligomerization; iv) databases dedicated to GPCR-G protein interactions and signaling; and v) databases and servers pertaining to structural information on GPCRs.
We conclude by outlining challenges and opportunities in establishing, maintaining, and using GPCR-partner databases and servers, and highlight topics that, in our opinion, can benefit from online tools development.

## 39. Why CDRs are not what you think they are or How to identify the real antigen binding sites

### Vered Kunik, Bjoern Peters and Yanay Ofran

Abstract

Identification of the residues within an antibody (Ab) that recognize and bind the antigen (Ag) is at the heart of immunological research. Complementarity Determining Regions (CDRs) are considered a proxy for the sites of Ag recognition and binding and are typically discerned by searching for the regions that are most different, in sequence or in structure, between Abs. Therefore, the most widely used bioinformatic immunological tools are those aimed at identifying CDRs. In this study we compare the residues identified by CDR identification tools to residues that are found experimentally to bind the Ag. We found that >20% of Ag binding residues fall outside the CDRs these methods identify. Thus, we conclude, these widely used methods may not constitute a comprehensive strategy for Ag binding site identification. By analyzing all Ab-Ag complexes, we found that virtually all Ag binding residues fall within regions of structural consensus between Abs, and that these regions are organized along the sequence of the Ab chains. Moreover, we show that residues that fall outside CDRs are at least as important to Ag binding as residues within CDRs. On the other hand, Ag binding residues that fall outside the structural consensus regions but within CDRs show a marginal energetic contribution to Ag binding. The high affinity and specificity of Ab-Ag interactions are fundamental for understanding the biological activity of these molecules. Correct identification of the residues that mediate these interactions is crucial for numerous molecular applications in immunological research as well as in diagnostics and therapy.

## 40. A context aware perspective of protein protein interaction networks

### Alexander Lan, Michal Ziv-Ukelson and Esti Yeger-Lotem

Abstract

A major challenge of network biology is to understand the phenotype-related sub-networks of interacting genes, proteins, and small molecules that give rise to biological form and function. Current techniques focus mainly on assigning static scores to each edge and then performing computations in a one-size-fits-all fashion. We offer a novel perspective in which edges are context sensitive and their scores may change dynamically. We implemented a search scheme based on this idea, using a Markov model for edge weights with Gene Ontology annotations as the context. Validation is performed using the known pathways derived from the SPIKE database. Finally, our method is applied to uncover novel functional relations within the signaling cascades underlying influenza infection of human primary lung cells.

## 41. Novel families of toxin/immunity modules confer phage resistance in bacteria

### Azita Leavitt, Hila Sberro, Ruthie Kiro, Udi Qimron, Rotem Sorek

Abstract

To survive in the face of constant phage attacks, bacteria have developed a variety of anti-phage defense mechanisms. One of the documented defense strategies is abortive infection (Abi), where the infected bacterial cell commits "suicide", thus preventing phage spread and enabling colony survival. Several reports show that Abi can work via a pair consisting of a stable "toxin" and an unstable "antitoxin"; the antitoxin prevents the toxic effect of the toxin but is degraded upon phage infection. To systematically discover new types of defense toxin-antitoxin (TA) modules in an unbiased manner, we analyzed the experimental insertion of 1.5 million genes from 388 microbial genomes into an E. coli host using over 8.5 million random clones. This analysis detected genes (toxins) that could only be cloned when the neighboring gene (antitoxin) was also present on the same clone. Clustering of these genes revealed 9 novel families of TA modules widespread in defense islands in bacterial genomes, most of which do not follow the classical characteristics of TA modules described to date. Introduction of the novel genes into E. coli validated that the toxic effect of the toxin is mitigated by the antitoxin. Infection experiments with an array of phage strains verified that two of the new modules can provide resistance against phage. Moreover, our experiments exposed an 'anti-Abi' protein in T7 that neutralizes bacterial suicide by inhibiting a protease essential for antitoxin degradation.

## 42. Efficient motif search in ranked lists and applications to variable gap motifs

### Limor Leibovich and Zohar Yakhini

Abstract

Sequence elements, at all levels (DNA, RNA and protein), play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
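To make the notion of a variable gap motif concrete, here is a small sketch that scans a single sequence for two half sites separated by a gap of flexible length. It is a naive regular-expression scan, not the suffix-tree search over ranked lists described in the abstract. The function name is hypothetical, and the half sites `AGGTCA` and `TGACCT` (the commonly cited estrogen response element halves) and the test sequence are used purely as an illustration.

```python
import re

def find_gapped_motif(sequence, left_half, right_half, min_gap, max_gap):
    """Return (start, gap_length) for each non-overlapping occurrence of
    left_half + gap + right_half, where the gap has min_gap..max_gap bases.

    The gap is matched lazily, so the shortest admissible gap is reported.
    """
    pattern = re.compile(
        re.escape(left_half)
        + "([ACGT]{%d,%d}?)" % (min_gap, max_gap)
        + re.escape(right_half)
    )
    return [(m.start(), len(m.group(1))) for m in pattern.finditer(sequence)]

# Hypothetical sequence with one 3-nucleotide gap and one 1-nucleotide gap.
seq = "TT" + "AGGTCA" + "CAG" + "TGACCT" + "AA" + "GG" + "AGGTCA" + "T" + "TGACCT" + "C"
hits = find_gapped_motif(seq, "AGGTCA", "TGACCT", min_gap=0, max_gap=4)
```

Recording the gap length of every hit is what would let one ask, as the abstract does, whether gaps other than three nucleotides also occur at significant recognition sites; the significance test itself is outside this sketch.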

## 43. Clk mRNA turnover de-noises circadian transcription and behavior

### Immanuel Lerner¹, Osnat Bartok¹, Uri Weisbein¹, Shaked Afik¹, Chen Gafni¹, Nir Friedman¹,² and Sebastian Kadener¹

¹Silberman Institute of Life Sciences, The Hebrew University, Israel; ²Department of Computer Sciences, The Hebrew University, Israel.

Abstract

Most organisms use circadian clocks to keep temporal order and anticipate daily environmental changes; these timing devices are extraordinarily robust. In Drosophila, the master genes clock (CLK) and cycle (CYC) activate the circadian system by promoting rhythmic transcription of several key clock genes. We have previously shown the importance of the miRNA Bantam for Clk post-transcriptional control and subsequent circadian robustness; however, the mechanistic role and significance of Clk post-transcriptional regulation are still poorly understood.
Here we demonstrate that Clk mRNA turnover is exceptionally high and that this is a key mechanism behind the robustness of the circadian system. We show that while Clk is transcribed at high levels, mature Clk RNA molecules are quickly degraded in a 3' UTR-dependent manner. This regulation is key to buffering stochastic changes in transcription that could result in ectopic expression of clk in time and space.
In order to test the importance of this mechanism for normal timekeeping, we generated flies carrying a clk genomic construct in which the clk 3' UTR has been replaced with a control (SV40) 3' UTR. Indeed, introduction of an additional clk genomic rescue that has lost the post-transcriptional control (because of the replacement of the 3' UTR) results in abnormal circadian behavior. This temporally ectopic CLK-CYC transcription is also accompanied by ectopic expression of CLK and CLK-CYC targets in non-circadian cells in the fly brain. Interestingly, overall CLK levels are slightly increased in these transgenic flies, demonstrating (contrary to previous reports) that Clk mRNA levels are key for circadian timekeeping. In addition, our experiments demonstrate a role of this regulation in preventing widespread expression of clock-gene products in the fly brain. In sum, our work adds an unexpected new layer of regulation that ensures CLK activity is restricted temporally and spatially in the fly brain.

## 44. Exploring the accuracy of small RNA quantification using microarrays and deep sequencing technologies

### Dena Leshkowitz*, Shirley Horn-Saban*, Yisrael Parmet, Ester Feldmesser*

Abstract

Small RNAs (sRNAs) are known to play an important regulatory role in a vast range of organisms and biological processes. Several classes of sRNA have been identified, including microRNAs (miRNAs), small interfering RNAs (siRNAs) and Piwi-interacting RNAs (piRNAs). All sRNAs are characterized by a 5′ phosphate and a 3′ hydroxyl group. A previously characterized modification, 2′-O-methyl at the 3′ terminus, can be found in miRNAs, piRNAs and siRNAs of plants and additional organisms.

Methods used to identify and quantify sRNAs face unique challenges, due to the short length of sRNAs as well as the high sequence similarity between them. In this work our aim is to compare the currently available platforms for sRNA quantification: Agilent and Affymetrix microarrays and Illumina NGS. We were interested in determining the ability of these technologies to quantify three categories of spiked sRNA: mature miRNAs, modified miRNAs (2′-O-methyl at the 3′ terminus) and precursor miRNAs. They were introduced at three different concentrations into a placenta total RNA sample.

Our results reveal that in all tested platforms the power of detection depends on the spiked miRNA. The difference in detection between different spiked miRNAs introduced at the same concentration is up to 5-fold in the Agilent platform, 10-fold in NGS and 500-fold in Affymetrix. NGS does not capture the absolute miRNA quantity and therefore, like the microarray platforms, can be used only for relative abundance studies. All platforms have a reduced ability to detect the 2′-O-methyl modified miRNAs. In all platforms the precursor miRNAs can be wrongly measured as mature miRNAs.

Analysis of the miRNAs detected on the three platforms reveals a stronger agreement in detection of mature miRNA intensities between the Agilent and NGS platforms, although all miRNAs are detected differently in at least two out of the three platforms. The Affymetrix platform seems to suffer from an overestimation of miRNAs rich in guanines and an underestimation of miRNAs rich in uracils.

## 45. Flavors of discovery: computational predictions of new agonists of the bitter taste receptor hTAS2R14

### Anat Levit, Ayana Wiener, Stefanie Nowak, Rafik Karaman, Maik Behrens, Wolfgang Meyerhof and Masha Y. Niv

Abstract

Bitter taste is a basic taste modality, required to guard animals against consuming toxic substances. Bitter compounds are recognized by bitter taste receptors (TAS2Rs), a family of G-protein coupled receptors (GPCRs). The human bitter taste receptor hTAS2R14 is a particularly broadly tuned receptor, with over 50 agonists known to date. Analysis of the physicochemical properties of these molecules in comparison with true negatives (i.e., molecules known not to activate hTAS2R14) provided hTAS2R14-characteristic ranges of chemical properties.
To identify additional potential agonists of this receptor, we compiled a pool of candidate molecules, consisting of the established bitter-tasting compounds from the BitterDB database and other potentially bitter molecules, such as datasets of approved drugs, traditional Chinese medicines and natural compounds. This dataset of candidate molecules was filtered using the hTAS2R14-like property ranges, resulting in a subspace of candidate molecules that may potentially activate hTAS2R14.
Next, ligand-based and structure-based pharmacophore models of hTAS2R14 activators were constructed and used to prioritize the candidate subset. Preliminary results using functional assays of hTAS2R14-transfected HEK-293 cells confirm that most of the predicted substances are indeed novel hTAS2R14 agonists.
This approach provides new directions in the identification and design of agonists and antagonists for bitter taste receptors, as biochemical tools for studying these receptors, and for improvement of food taste. Importantly, the recently discovered roles of bitter taste receptors in extraoral locations, such as the respiratory and gastrointestinal systems, provide novel paths of drug design for treatment of metabolic disorders and other indications.

## 46. A vast collection of microbial genes that are toxic to bacteria

### Aya Kimelman, Asaf Levy, Hila Sberro, Shahar Kidron, Azita Leavitt, and Rotem Sorek

Abstract

In the process of clone-based genome sequencing, initial assemblies frequently contain cloning gaps that can be resolved using cloning-independent methods, but the reason for their occurrence is largely unknown. By analyzing 9,328,693 sequencing clones from 393 microbial genomes we systematically mapped more than 15,000 genes residing in cloning gaps and experimentally showed that their expression products are toxic to the Escherichia coli host. A subset of these toxic sequences was further evaluated through a series of functional assays exploring the mechanisms of their toxicity. Among these genes our assays revealed novel toxins and restriction enzymes, and new classes of small non-coding toxic RNAs that reproducibly inhibit E. coli growth. Further analyses also revealed abundant, short toxic DNA fragments that were predicted to suppress E. coli growth by interacting with the replication initiator dnaA. Our results show that cloning gaps, once considered the result of technical problems, actually serve as a rich source for the discovery of biotechnologically valuable functions, and suggest new modes of antimicrobial interventions.

## 47. A Phylogenetic Approach to Music Performance Analysis

### Elad Liebman, Eitan Ornoy and Benny Chor

Abstract

This paper presents a novel algorithmic approach to music performance analysis. Previous attempts to use algorithmic tools in this field focused typically on tempo and dynamics alone. We base our analysis on ten different performance categories (such as bowing, vibrato and durations). We adapt phylogenetic analysis tools to resolve the inherent inconsistencies between these categories, and describe the relationships between performances. Taking samples from 29 different performances of two pieces from Bach's sonatas for solo violin, we construct a "phylogenetic" tree, representing the relationship between those performances. The tree supports several interesting relations previously conjectured by the musicology community, such as the importance of date of birth and recording period in determining interpretative style. Our work also highlights some unexpected inter-connections between performers, and challenges previous assumptions regarding the significance of educational background and affiliation to the historically informed performance (HIP) style.

## 48. Speed Controls in Protein Translation: The Secretory Proteome

### Shelly Mahlab, Michal Linial

Abstract

Ribosomes execute protein translation in all domains of life. Translation is an energetically expensive operation in dividing cells. About a third of the proteins are translated by ribosomes that are docked at the ER membranes. These proteins include the membranous and secretory proteins. A common feature of all secretory proteins is a signal peptide (SP) at the N-terminal region, or a transmembrane domain (TMD). The tRNA Adaptation Index (tAI) is a measure that allows the evaluation of the translation elongation efficiency. This measure considers the abundance of the relevant tRNA and the codon-anticodon wobble interaction rules. Specifically, low and high tAI values indicate lower and higher translation rates, respectively. Nonuniform tAI values along the transcript imply pausing of the ribosome on certain codons, which affects the overall speed of translation. Previous studies showed that lower tAI values at the beginning of the coding sequence are common to many organisms. Such a tAI profile was the basis for a "ramp" model. Accordingly, the ramp contributes to the translation efficiency by preventing ribosomal drop-off and collisions.
In this research, we clustered protein coding sequences into distinct groups of secretory, membranous and cytosolic proteins and analyzed the tAI profile for each group. We found that a ramp at the N-terminus is characteristic of proteins having SPs. Furthermore, on average these proteins have a higher global tAI and are shorter in length. In contrast, membranous and cytosolic proteins (lacking SPs) show no evidence of a "ramp". We conclude that the tAI profile is a reflection of an evolutionary refinement of the secreted proteins, whose translation must be tightly controlled. The secreted proteins are tuned for maximal translation efficiency. This is achieved by a high global tAI and the presence of an initial ramp. Accordingly, the translation rate of the initial segment is attenuated, allowing ribosome spacing that minimizes translation drop-off. The reported trends for the secretory proteomes apply to a large number of eukaryotes.
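The tAI of a coding sequence is conventionally defined as the geometric mean of per-codon weights, and a "ramp" shows up as low values in a sliding-window profile near the start. The sketch below computes both. The weights and the sequence are invented for illustration (real weights are derived from tRNA abundance and wobble rules, as the abstract notes), and the function names and window size are assumptions.

```python
import math

def split_codons(cds):
    """Split a coding sequence into complete codons, dropping any remainder."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def tai(codons, weights):
    """Geometric mean of the codon weights (computed in log space)."""
    log_sum = sum(math.log(weights[c]) for c in codons)
    return math.exp(log_sum / len(codons))

def tai_profile(cds, weights, window=3):
    """tAI of each sliding window of `window` consecutive codons."""
    codons = split_codons(cds)
    return [tai(codons[i:i + window], weights)
            for i in range(len(codons) - window + 1)]

# Hypothetical weights: "AAA" is a slow codon, "GGG" a fast one.
weights = {"AAA": 0.25, "GGG": 1.0}
cds = "AAA" * 3 + "GGG" * 3          # slow start followed by a fast body

global_tai = tai(split_codons(cds), weights)
profile = tai_profile(cds, weights)  # rises from the 5' end: a toy "ramp"
```

A low profile at the first windows combined with a high value over the rest of the sequence is the pattern the abstract associates with SP-containing proteins.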

## 49. Using Contiguous Bi-Clustering for data driven temporal analysis of fMRI-based functional connectivity

### Adi Maron-Katz, Didi Amar, Eti Ben-Simon, Yael Jacob, Keren Rosenberg, Richard M. Karp, Talma Hendler, Ron Shamir

Abstract

Background: Functional connectivity is a commonly used approach in human brain imaging for revealing inter- and intra-regional relationships under various conditions and tasks as well as at rest. Since 1990 several approaches to functional connectivity analysis have been proposed, some hypothesis-driven and others data-driven. Data-driven approaches have included factor analysis methods such as ICA and PCA as well as clustering methods such as hierarchical clustering, K-means and FCA (Fuzzy clustering analysis). All these methods share a common basic assumption that the connectivity maps/networks in the brain are stationary and thus seek functional connectivity over all time points. Based on recent findings, we find this assumption too restrictive, and postulate that network dynamics can be revealed by considering subsets of the measured time points. In addition, most of the data-driven methods partition the segregated voxel populations (i.e. regions) into disjoint sets, while in fact one region may play a role in more than one network.
Method: Here we present ConBic (Contiguous Bi-Clustering), a novel data-driven computational method for detecting dynamism in functional brain networks using fMRI data. Unlike previous data-driven approaches, this method identifies changes over time in inter- and intra-regional functional connectivity. Moreover, ConBic does not impose a disjoint partition on the voxels/regions, but rather allows networks to overlap, so that one region may be involved in more than one functional network at different time points or under different conditions. The method models voxels by an undirected graph whose edges represent spatial proximity. First, on each time window, seed nodes are selected and a search procedure from each seed identifies homogeneous regions. Various region evaluation measurements are used, including clustering coefficient, region homogeneity and anatomic atlas annotation. In the next stage, bi-clustering approaches are used in order to reconcile and group time periods in which spatially similar regions were detected, as well as regions that share a similar signal in a single time period.
Results: Preliminary tests were performed on four motor fMRI data sets obtained from healthy young adults while they alternately moved the right and left foot for 3 minutes. These tests have shown the ability of the method to identify de novo the corresponding primary motor regions as well as correlated secondary regions involved in motor cognition. Intra-regional connectivity (i.e. homogeneity) in locations that are known to be related to leg movements was higher when the corresponding tasks were performed than during rest periods. In addition, though right and left hemispheres were analyzed separately, there was bilateral symmetry in many of the identified regions in both cortical and sub-cortical locations. Further tests and validations are currently being performed on motor datasets of left and right hand movement as well as on resting state data.
Conclusion: ConBic is a promising new data-driven computational method for detecting highly connected, context-dependent or spontaneous functional networks in the brain. Importantly, it can effectively reveal intra- and inter-regional functional connections as well as temporal changes in these connections.

## 50. Recently-formed polyploid plants diversify at lower rates

### Itay Mayrose, Shing H. Zhan, Carl J. Rothfels, Karen Magnuson-Ford, Michael S. Barker, Loren H. Rieseberg, Sarah P. Otto

Abstract

Polyploidy is widely recognized as a key feature of extant organismal diversity, especially among plants, yet its macro-evolutionary impacts are deeply contentious. Traditionally polyploidy has been considered "evolutionary noise", adding little to evolutionary novelty, whereas a surge of interest over the past decade, fueled largely by the unprecedented availability of genomic data, paints the opposite picture, that of polyploidy as a primary driver of macro-evolutionary success. Here, we examined the relatively short-term consequences of polyploidy across a diverse set of vascular plant phylogenies, encompassing hundreds of inferred neopolyploidization events. Using likelihood-based methodologies we found that neopolyploids generally exhibit lower diversification rates than diploids, due to both lower speciation rates and higher extinction rates. Our results suggest that the increased speciation rates of diploids may be ascribed to their greater capability to further speciate via polyploidy compared to neopolyploids. The small subset of polyploids that do persist perhaps represent particularly "fit" lineages that may enjoy longer term evolutionary success.

## 51. RNA Tree Comparisons Via Unrooted Unordered Alignments

### Nimrod Milo, Shay Zakov, Erez Katzenelson, Eitan Bachmat, Yefim Dinitz and Michal Ziv-Ukelson

Abstract

RNA secondary structures are often represented as trees, motivating the application of tree comparison tools to the detection of structural similarities and common motifs. We generalize some current approaches for RNA tree alignment, which are traditionally confined to *ordered rooted* mappings, to also consider *unordered unrooted* mappings. This is motivated by the drive to extend RNA structural comparisons to accommodate additional possible evolutionary phenomena (e.g. segment insertions, translocations and reversals) as well as other new variations in structural modifications which were not considered before.
The problem of unrooted unordered tree edit distance is known to be MAX-SNP hard. Hence, we focus here on a constrained tree comparison variant, named the *Homeomorphic Subtree Alignment* problem. We present a new algorithm, which applies to several modes, including global or local alignment of trees and supporting both ordered and unordered, rooted and unrooted mappings. It generalizes previous algorithms that either solved the problem in an asymmetric manner, or were restricted to the rooted and/or ordered cases. Focusing here on the most general unrooted unordered case, we show that our algorithm for this case maintains the cubic time complexity of previous, less general algorithms for the problem. Specifically, its time bound is $O(n_T n_S + L_T L_S \min(d_T, d_S))$, where $T$ and $S$ are the input trees, $n_T$, $L_T$ and $d_T$ are the number of nodes, the number of leaves and the maximal degree of a node in $T$, respectively, and $n_S$, $L_S$, and $d_S$ are defined similarly with respect to $S$.
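A brief check of why the stated bound is at most cubic, writing $n = \max(n_T, n_S)$ (a symbol introduced here for convenience) and using only the facts that a tree has no more leaves than nodes and that the degree of a node is smaller than the number of nodes:

$$
L_T \le n_T \le n, \qquad L_S \le n_S \le n, \qquad \min(d_T, d_S) < n
$$

$$
n_T n_S + L_T L_S \min(d_T, d_S) \;\le\; n^2 + n^2 \cdot n \;=\; O(n^3).
$$

The same expression shows when the bound is better than cubic: if either tree has bounded maximal degree, the second term is $O(L_T L_S)$ and the whole bound becomes $O(n_T n_S)$.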

We implemented the algorithm (in all modes mentioned above) as a graphical software tool which computes and displays similarities between secondary structures of RNA given as input, and applied it in a preliminary experiment in which we ran all-against-all pairwise alignments of RNase P RNA family members, exposing new similarities which could not be detected by the traditional rooted ordered alignment approaches.

The web interface for our tool can be found at http://www.cs.bgu.ac.il/~negevcb/FRUUT.
Source code and data sets are available through our website.

## 52. mtDNA heteroplasmy can be detected in Next-Generation Sequencing data without pre-amplification

### Tal Nagar, Eitan Rubin

Abstract

Heteroplasmy of mitochondrial DNA (mtDNA) is a phenomenon in which sequences vary within and between cells of the same individual. Recent technological advances in sequencing, collectively known as "next-generation sequencing" (NGS), have opened new avenues for the characterization of mtDNA heteroplasmy. A possible barrier to using NGS with total DNA for heteroplasmy detection is the presence of nuclear insertions of mitochondrial fragments (NUMTs), which might be confused with mtDNA sequences. To avoid NUMTs, mtDNA sequencing traditionally begins with some method of obtaining pure organellar DNA (e.g. PCR amplification). Here we show, using simulations and data from the 1000 Genomes Project, that NUMTs have little effect on heteroplasmy detection in mtDNA without specific purification. Using simulated reads of the Illumina technology, we examine the mapping of reads originating from NUMTs and their association with false heteroplasmy. We show that if sufficient care is used in mapping and filtering the resulting reads, no false heteroplasmy is detected (i.e. no false positives). Only very few nuclear reads mapped incorrectly to the mtDNA (0.04%), and all but one of these resulted from sequencing errors that made the NUMT-originating read more similar to the mtDNA. We also show that heteroplasmy detected in data taken from the 1000 Genomes Project does occur preferentially in mtDNA positions that align to NUMTs. We conclude that NUMT contamination in NGS experiments does not contribute to false heteroplasmy detection when proper filtering is applied, even in the absence of specific pre-amplification of the template.

## 53. A Mathematical Model of 6S RNA Regulation of Gene Expression

### Mor Nitzan, Karen Wassarman, Ofer Biham, and Hanah Margalit

Abstract

E. coli's 6S RNA is a type of noncoding, small RNA that is ubiquitously expressed in the cell and is the key component in a unique global RNA polymerase (RNAP)-mediated regulation mechanism. 6S RNA was shown to differentially inhibit sigma-70 dependent promoters during stationary phase by binding and forming a stable complex with the housekeeping form of RNAP, blocking the ability of RNAP to bind to promoter DNA. Surprisingly, when stationary phase cells are exposed to sufficiently high levels of nucleotide-triphosphate (NTP), they enter outgrowth phase, at which time 6S RNA is used as a template for product RNA (pRNA) synthesis. 6S RNA interactions with RNAP are destabilized during the pRNA synthesis reaction, leading to the dissociation of the 6S RNA-RNAP complexes. The released 6S RNA becomes highly unstable and the released RNAP enables increased transcription of genes [1-3]. Many of the dynamic properties and the unexpected promoter specificity which characterize this regulation mechanism are still unclear.
Using a mathematical model of this biological system we study the dynamics of the system components, and specifically of mRNAs transcribed from sigma-70 dependent promoters (during exponential phase, stationary phase and outgrowth) [4]. We find that this global regulation mechanism exhibits unique properties: RNAP level returns to steady state subsequent to its inhibition, and stored inactive RNAPs bound by 6S RNA accumulate over late stationary phase and can return to their active form rapidly upon the introduction of newly available nutrients. Interestingly, although 6S RNA inhibits the general transcription machinery, genes with sigma-70 dependent promoters exhibit variable sensitivities to this regulation. We demonstrate that the specific regulation of genes by 6S RNA depends on their inherent effective promoter parameters: affinity to RNAP and clearance rate. We also compare 6S RNA regulation to other global RNAP-mediated regulation mechanisms and deduce several of its key properties, including its energetic efficiency, its robustness to noise, and the competitive edge of cells carrying it at the transition to a new environment.

## 54. The H3K27 demethylase Utx facilitates epigenetic reprogramming to pluripotency

### Ohad Gafni*, Abed AlFatah Mansour*, Leehee Weinberger, Muneef Ayyash, Asaf Zviran, Yoach Rais, Vladislav Krupalnik, Mirie Zerbib, Daniela Amann-Zalcenstein, Itay Maza, Shay Geula, Sergey Viukov, Liad Holtzman, Eli Canaani, Shirley Horn-Saban, Ido Amit, Noa Novershtern*# and Jacob H. Hanna#

Abstract

Pluripotent stem cells can be generated from somatic cells by the induction of four factors: Oct4, Sox2, Klf4 and c-Myc. This reprogramming process is accompanied by genome-wide epigenetic changes; however, their regulation is largely unknown. Here we show that an H3K27me3 demethylase named Utx is a critical regulator of reprogramming in mouse and human. Although embryonic stem cells lacking Utx retain their pluripotency, Utx-KO somatic cells fail to reprogram. Using gene expression analysis we show that these cells stay in their somatic state and do not initiate the transcriptional program required for pluripotency. Using ChIP-Seq data we demonstrate that Utx depletion results in aberrant H3K27me3 repressive chromatin demethylation dynamics in somatic cells undergoing reprogramming, which subsequently disrupts the reactivation of key pluripotency-promoting genes. Remarkably, Utx-KO also leads to aberrant epigenetic "in-vivo reprogramming" during germ cell maturation in mouse chimeras.

## 55. Bacterial gene expression in response to potato tuber soft rot

### Shany Ofaim, Elazar Fallik and Shlomo Sela

Abstract

S. Ofaim(1), E. Fallik(2) and S. Sela(1)

(1) Department of Food Quality and Safety, Institute for Postharvest and Food Sciences, Agricultural Research Organization (ARO), The Volcani Center, Beth-Dagan, Israel
(2) Department of Postharvest Science of Fresh Produce, Institute for Postharvest and Food Sciences, Agricultural Research Organization (ARO), The Volcani Center, Beth-Dagan, Israel

The potato (Solanum tuberosum) is a staple food and one of the most important agricultural crops worldwide. Potato soft rot, caused by Pectobacterium carotovorum, is one of the gravest diseases progressing during storage. Symptoms of soft rot include liquefaction of tuber tissue due to pectolytic enzymes released by the bacteria, which causes rapid dispersion of bacteria to surrounding tubers and leads to fast destruction of whole batches. Soft-rotten tubers release volatile organic compounds (VOCs) into the storage atmosphere. Since bacteria react to chemical changes in their environment by differential gene expression, and since Salmonella enterica is likely to be present in the vicinity of potato tubers, we hypothesized that bacteria exposed to soft rot VOCs will express a distinct set of genes and may be used as a future biosensor for early detection of potato soft rot during storage.
Exposure of Salmonella to soft-rotten potato VOCs resulted in the activation of the rpoS regulon, from which the expression of three genes was confirmed by qPCR. The rpoS gene encodes a sigma factor that controls the expression of numerous genes involved in stationary phase and in response to different stress conditions. Of the two other genes, one encodes a ligand-binding component of an ABC transporter and the other has unknown activity. Both are affiliated with the rpoS regulon. The latter showed significant overexpression, and its promoter is therefore a good candidate for serving as a biosensor for detecting potato soft rot. Bioinformatic analysis of the transcription profile in response to potato VOCs indicates the presence of a cellular stress response. Further analysis suggests the presence of osmotic stress, apparently as a result of a significant rise in solute content in the bacterial culture due to soluble VOCs originating from the soft-rotten tuber.

## 56. Personalized olfactory receptor repertoire

### Tsviya Olender, Sebastian Waszak, Edna Ben-Asher, Miriam Khen and Doron Lancet

Abstract

The rapidly growing information on DNA diversity along completely sequenced human genomes makes it possible to reassess the diversity status of distinct olfactory receptor (OR) proteins in different human individuals. Our analysis includes 413 genetic loci at which one or more sequenced individuals had an intact open reading frame. Based on phased genotype SNP calls obtained from the 1000 Genomes Project, we identified 4069 different full-length polypeptide variants, providing a lower limit for the effective OR protein species repertoire of Homo sapiens. Each individual harbors 528±23 allelic variants, up to 50% higher than the number of intact loci, with pronounced ethnic differences. Importantly, olfactory sensory neurons show allele-specific expression, hence the brain receives readout of all such allele types, with implications for inter-individual diversity in smell perception. Using 7 public and private variation sources, we identified 239 segregating pseudogenes (SPGs), OR loci that show both an intact and a frame-disrupted pseudogene allele in the population. Thus, ~60% of all intact loci are now shown to be subject to segregating deleterious mutations, a 9-fold enhancement compared to past estimates (Menashe, Nat Genet 2003). Twenty-six of the SPGs have so far been annotated as reference genome pseudogenes, hence may be considered "resurrected". Using a custom SNP microarray we validated 184 out of 268 of the frame-disrupting mutations in a cohort of 458 individuals. Finally, we also generated a multi-source compendium of nearly 63 OR loci harboring deletion CNVs, suggesting that 258 OR loci (30.3%) are affected by deleterious SNPs/small indels and/or large deletion CNVs. We estimate that every individual genome contains 35 disrupted variations, 11 in homozygote form.
Our results portray a case of unusually high genetic diversity, and suggest that individual humans have a highly personalized olfactory receptor repertoire, a conclusion that likely applies to other multigene receptor families.

## 57. RAP: accurate prediction of cis-regulatory motifs from protein binding microarrays

### Yaron Orenstein, Eran Mick, Ron Shamir

Abstract

The new technology of protein binding microarrays (PBMs) (Berger et al. Nat. Biotech. 06) allows simultaneous measurement of the binding of a specific transcription factor to tens of thousands of synthetic double-stranded DNA probes. Established compact probe designs cover all possible DNA 10-mers. The computational challenge is to infer the binding motif, usually presented as a positional weight matrix (PWM), from PBM data.
We have developed a simple algorithm to discover a transcription factor binding site from PBM data. Our algorithm first ranks all 8-mers according to the average binding intensities of the probes they appear in. Then, it aligns the 500 top 8-mers using star alignment to the top-ranking 8-mer. The alignment is used to build a weighted PWM according to 8-mer scores. Lastly, it extends the PWM by looking back at the original probes the 8-mers appeared in. We call the resulting algorithm RAP: Rank, Align, PWM. We tested our algorithm intensively on hundreds of PBM datasets and compared it to four extant algorithms. We compared the motifs produced by each method to a gold standard of experimentally validated PWMs taken from three databases. RAP was one of two top performers on all databases. In addition, we tested how well the ranking of top-binding probes on one array is predicted using the binding site motif learned from data of another array for the same TF. In this test, RAP was again one of two algorithms that performed best. Notably, RAP is faster than the four other tools by up to three orders of magnitude.
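The first step described above (ranking k-mers by the average intensity of the probes that contain them) can be sketched in a few lines of Python. This is only a minimal illustration, not the RAP implementation: the input format, the function name `rank_kmers`, the choice to count a k-mer once per probe, and the omission of reverse complements are all assumptions made for the example.

```python
from collections import defaultdict


def rank_kmers(probes, k=8):
    """Rank k-mers by the mean intensity of the probes that contain them.

    probes: iterable of (sequence, intensity) pairs (hypothetical format).
    Returns a list of (kmer, mean_intensity), highest mean first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, intensity in probes:
        # Assumption: a k-mer is counted once per probe, even if repeated.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            totals[kmer] += intensity
            counts[kmer] += 1
    means = {kmer: totals[kmer] / counts[kmer] for kmer in totals}
    # Sort by descending mean; ties are broken alphabetically.
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: three short "probes" with made-up intensities, k=3 for brevity.
toy = [("ACGTA", 10.0), ("CGTAC", 8.0), ("TTTTT", 1.0)]
ranking = rank_kmers(toy, k=3)
```

The top of such a ranking would then feed the alignment and PWM-building steps, which are not shown here.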
To conclude, RAP produces motifs that are close to experimentally validated ones, and accurate in predicting probe ranking, achieving top performance in both measures. Our study provides insights on the strengths and weaknesses of each method, which can lead to even better methods in the future.
This study was funded by the European Community's Seventh Framework Programme under grant agreement no. HEALTH-F4-2009-223575 for the TRIREME project and by the Israel Science Foundation (grant 802/08). YO was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University.

## 58. Discovering molecular signals using hidden semi-Markov models

### Michael Peeri, David Burstein, Gil Segal, Tal Pupko

Abstract

Hidden Semi-Markov Models (HSMMs) are a generalization of Hidden Markov Models (HMMs), in which the time spent in each state is not geometrically distributed. Like standard HMMs, HSMMs can be used to extract a motif from a set of sequences (e.g., a set of proteins known to be translocated to an organelle, a set of promoter regions or a set of proteins secreted by a bacterium). Unlike the well-known profile HMMs that are optimized to model perfectly aligned sequences, HSMMs can better capture diffuse features hidden in sequences that lack clear sequence similarity and therefore may not be alignable. In this work, we implemented an efficient HSMM model to extract motifs from a set of proteins. In many cases, the motif is believed to reside in either the N-terminal or the C-terminal region of the protein, and our method allows us to detect the length of the region in which the motif exists. We also detect the number of states required to model the motif, based on cross-validation techniques.
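The difference in state durations can be made concrete with a small Python sketch. In a standard HMM with self-transition probability a, a state lasts exactly d steps with probability a^(d-1)(1-a), which is geometric; an explicit-duration HSMM instead draws the duration from an arbitrary per-state distribution. The toy sampler below, with its state names, dictionaries and function names, is a hypothetical illustration and not the model implemented in this work.

```python
import random


def hmm_duration_pmf(stay_prob, d):
    """P(a state lasts exactly d steps) in a standard HMM: geometric in d."""
    return stay_prob ** (d - 1) * (1.0 - stay_prob)


def sample_hsmm_path(start, durations, next_state, length, rng):
    """Sample a state path from a toy explicit-duration HSMM.

    durations:  state -> {duration: probability}, any distribution allowed
    next_state: state -> {state: probability}, without self-transitions
    """
    path, state = [], start
    while len(path) < length:
        ds, ps = zip(*durations[state].items())
        d = rng.choices(ds, weights=ps)[0]   # draw how long to stay
        path.extend([state] * d)
        ns, qs = zip(*next_state[state].items())
        state = rng.choices(ns, weights=qs)[0]  # then move to a new state
    return path[:length]


# Toy model: a background state of length 4 followed by a motif of length 3.
durations = {"bg": {4: 1.0}, "motif": {3: 1.0}}
next_state = {"bg": {"motif": 1.0}, "motif": {"bg": 1.0}}
path = sample_hsmm_path("bg", durations, next_state, 10, random.Random(0))
```

A fixed-length motif like the one in the toy model cannot be expressed by a single HMM state, whose duration is always geometrically distributed.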

We applied our method to study effector proteins in Legionella pneumophila. This bacterium is an intra-cellular parasite that relies on secretion of effector proteins to survive in the cellular environment. Effector proteins are secreted using the Icm/Dot type-IV secretion system, and are crucial for virulence. In a previous work we identified several novel effectors and there are currently over 100 known effectors, but the nature of the secretion signal is not well characterized. Using the HSMM approach we created a model that describes the salient features of all known effectors. We show that HSMMs outperform signal characterization by HMMs. We also show that the identification of an accurate signal can be used to search for new effectors. Bioinformatic analysis of the signal can also provide insight into the secretion mechanism.

## 59. FastML: a web server for probabilistic reconstruction of ancestral sequences

### Osnat Penn, Haim Ashkenazy, Adi Doron-Faigenboim, Ofir Cohen, Gina Cannarozzi, Oren Zomer, and Tal Pupko

Abstract

Ancestral sequence reconstruction is essential in a variety of studies such as the experimental resurrection of ancient proteins, the mapping of evolutionary events onto a phylogenetic tree, and protein engineering. Here, we present the FastML web-server, a user-friendly tool for the reconstruction of ancestral sequences with emphasis on an accurate reconstruction of both indels and characters. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states using a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein, and codon sequences; (iv) A graphical representation of the results is provided, including the projection of the ancestral sequences onto the phylogeny, and a graphical logo of the inferred ancestral sequences. FastML is available at http://fastml.tau.ac.il.
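To illustrate the idea of coding gaps as binary characters, the Python sketch below turns each distinct gap span in an alignment into one presence/absence column. It is a simplified scheme written for this example (for instance, a gap nested inside a longer gap is simply scored as absent); the function names are hypothetical and the coding actually used by FastML may differ in its details.

```python
def find_gaps(seq, gap="-"):
    """Return (start, end) spans of maximal gap runs, with end exclusive."""
    spans, start = [], None
    for i, ch in enumerate(seq):
        if ch == gap and start is None:
            start = i
        elif ch != gap and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(seq)))
    return spans


def code_indels(alignment):
    """Code every distinct gap span as one binary character.

    alignment: dict name -> aligned sequence (all of equal length).
    Returns (columns, matrix); matrix[name][j] is 1 if the sequence has
    exactly the gap columns[j], and 0 otherwise.
    """
    gaps = {name: set(find_gaps(seq)) for name, seq in alignment.items()}
    columns = sorted(set().union(*gaps.values()))
    matrix = {name: [1 if span in own else 0 for span in columns]
              for name, own in gaps.items()}
    return columns, matrix


aln = {"s1": "AC--GT", "s2": "ACTTGT", "s3": "AC--G-"}
columns, matrix = code_indels(aln)
```

The resulting 0/1 matrix is the kind of binary data on which ancestral indel states can then be reconstructed along the tree.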

## 60. Malacards: the integrated Human Malady Compendium

### Noa Rappaport, Noam Nativ, Michal Twik, Gil Stelzer, Frida Belinky, Tsippi Iny Stein, Iris Bahir, Marilyn Safran and Doron Lancet

Abstract

Comprehensive disease classification, integration and annotation are sorely needed for biomedical discovery, but at present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms.
We introduce MalaCards, an integrated database of human maladies and their annotations (malacards.weizmann.ac.il), modeled on the architecture and richness of the popular GeneCards database of human genes (www.genecards.org). MalaCards mines and merges varied web data sources to generate a computerized web card for each human disease. Each MalaCard contains disease-specific prioritized annotative information, as well as links between associated diseases, leveraging the GeneCards relational database, search engine, and GeneDecks set-distillation tool. As proofs of concept of the search/distill/infer pipeline we find expected elucidations, as well as potentially novel ones. As our R&D continues, we plan to expand the list of annotations, sources and sections, and include genetic variation details. This will be enhanced by collaborations with researchers outside of our group, and expanded by the initiation of systems biology tools for batch queries and smart clustering, towards the goal of enabling biomedical discoveries.

## 61. Functional Inference by ProtoNet Family Tree: The Uncharacterized proteome of Daphnia pulex

Abstract

Daphnia pulex (water flea) is the first crustacean with a fully sequenced genome. Crustaceans and insects diverged from a common ancestor. Daphnia is a model organism for studying the molecular makeup that enables coping with environmental challenges. The complete proteome contains 30,550 putative proteins; however, about 10,000 of them have no known homologues. Currently, UniProtKB reports 95% of the Daphnia proteins as putative and uncharacterized.
We have applied ProtoNet, an unsupervised hierarchical protein clustering method that covers about 10 million sequences, for automatic annotation of the Daphnia proteome. 98.7% (26,625) of the Daphnia full-length proteins were successfully mapped to 13,880 ProtoNet stable clusters, and only 1.3% remained unmapped. We compared the properties of the Daphnia protein families with those of the mouse and the fruit fly proteomes. Functional annotations were successfully assigned for 86% of the proteins. Most proteins (61%) were mapped to only 2953 clusters that contain duplicated Daphnia genes. We focused on the functionality of maximally amplified paralogs. Cuticle structure components and a variety of ion channel protein families were associated with a maximal level of gene amplification. We focused on gene amplification as a leading strategy of Daphnia in coping with environmental toxicity.
Automatic inference is achieved through mapping of sequences to the protein family tree of ProtoNet 6.0. Applying a careful inference protocol resulted in functional assignments for over 86% of the complete proteome. We conclude that the scaffold of ProtoNet can be used as an alignment-free protocol for large-scale annotation of uncharacterized proteomes.

## 62. Context-dependent evolution of protein domains

### Dan Reshef, Liran Carmel and Ora Schueler-Furman

Abstract

Domains are the functional building blocks of proteins. Here, we study how domain sequence conservation is affected by neighboring domains in a protein. Analyses of multiple sequence alignments as well as phylogenetic trees reveal a higher conservation of a domain when it is found with other domains in the same protein. Structural analysis suggests constrained evolution of a domain at the interface with other subunits, as well as in the protein core.
Therefore, protein domains, when found together, act as a larger functional unit which imposes stronger evolutionary pressure on each domain than is found in singleton-domain proteins.

## 63. Inferring the Efficiency of tRNA-Codon interaction based on Codon Usage Bias

### Renana Sabi and Tamir Tuller

Abstract

The tRNA adaptation index (tAI) is a widely used measure of the efficiency with which a coding sequence is recognized by the intra-cellular tRNA pool. Among other components, this index includes different weights for different wobble interactions. Currently these weights are based on gene expression in S. cerevisiae.
In this study we suggest a new approach for adjusting these weights to a target model organism without the need for any information about its gene expression levels. Our method is based on optimizing the correlation between the tAI and a measure of codon bias. We demonstrate that the new weights tend to predict protein abundance better than the current tAI weights.
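For readers unfamiliar with the index, the following Python sketch shows the standard form of the tAI computation: each codon gets an absolute adaptiveness that sums, over the tRNAs recognizing it, the tRNA gene copy number scaled by one minus a wobble penalty s; the weights are normalized by their maximum, and the tAI of a gene is the geometric mean of the weights of its codons. The copy numbers and penalties below are made up for illustration, and the optimization of the penalties proposed in this abstract is not shown.

```python
import math


def absolute_adaptiveness(pairings):
    """W for one codon: sum of (1 - s) * copy number over recognizing tRNAs.

    pairings: list of (tRNA_gene_copy_number, s), where s is the wobble
    penalty (0 for perfect Watson-Crick pairing). Values are hypothetical.
    """
    return sum((1.0 - s) * copies for copies, s in pairings)


def tai(codons, w):
    """tAI of a coding sequence: geometric mean of its codons' weights.

    Assumes every weight in w is strictly positive.
    """
    logs = [math.log(w[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))


# Toy example with made-up copy numbers and wobble penalties.
W = {
    "GCU": absolute_adaptiveness([(8, 0.0)]),  # perfect pairing
    "GCC": absolute_adaptiveness([(8, 0.5)]),  # wobble pairing, penalized
}
w_max = max(W.values())
w = {codon: value / w_max for codon, value in W.items()}
score = tai(["GCU", "GCC"], w)
```

Changing the penalties s changes every gene's tAI, which is what makes it possible to tune them against an expression-independent quantity such as a codon bias measure.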

## 64. Membrane Protein (Pseudo-) Energetics

### Chaim A. Schramm, Jason E. Donald, Brett T. Hannigan, Jeffery G. Saven, William F. DeGrado, Ilan Samish

Abstract

Structure. 2012 May 9;20(5):924-35.
The complex hydrophobic and hydrophilic milieus of membrane-associated proteins pose experimental and theoretical challenges to their understanding. Here, we produce a nonredundant database to compute knowledge-based asymmetric cross-membrane potentials from the per-residue distributions of C(β), C(γ) and functional group atoms. We predict transmembrane and peripherally associated regions from genomic sequence and position peptides and protein structures relative to the bilayer (available at http://www.degradolab.org/ez). The pseudo-energy topological landscapes underscore positional stability and functional mechanisms demonstrated here for antimicrobial peptides, transmembrane proteins, and viral fusion proteins. Moreover, experimental effects of point mutations on the relative ratio changes of dual-topology proteins are quantitatively reproduced. The functional group potential and the membrane-exposed residues display the largest energetic changes, enabling discrimination of native-like structures from decoys. Hence, focusing on the uniqueness of membrane-associated proteins and peptides, we quantitatively parameterize their cross-membrane propensity, thus facilitating structural refinement, characterization, prediction, and design.

## 66. De-novo assembly and Characterization of the Transcriptome of Metschnikowia fructicola reveals differences in gene expression following interaction with Penicillium digitatum and grapefruit peel

### Noa Sela, Vera Hershkovitz, Ginat Rafael, Clarita BenDayan, Leena Taha, Michael Wisniewski, Samir Droby

Abstract

The yeast biocontrol agent Metschnikowia fructicola was reported to stimulate an oxidative burst in citrus and apple wounds, characterized by increased levels of reactive oxygen species (ROS) [1]. The increase in ROS levels following yeast application is believed to be involved in the development of resistance in fruit tissue against infection and development of pathogens [2]. The metabolism of ROS is controlled by an array of enzymes, among which superoxide dismutase (SOD), catalase (CAT) and peroxidase (POD) are the most studied during plant-pathogen interactions. The present study was aimed at investigating gene expression in the antagonistic yeast M. fructicola during its interaction with grapefruit peel and Penicillium digitatum, using transcriptomic RNA-seq data. Here we report a de-novo transcriptome assembly and characterization of the yeast Metschnikowia fructicola, as well as a digital gene expression analysis exploring the interaction of the yeast in a varied multi-species environment.

## 67. Evidence for Allosteric Effects in Antibodies

### Inbal Sela-Culang, Shahar Alon and Yanay Ofran

Abstract

To study structural changes that occur in antibodies upon antigen binding, we systematically compared free and bound structures of all antibodies that were solved in these two forms. Antigen binding is associated with changes in the relative orientation of the heavy and light chains in both the variable and constant domains. However, we found a significantly higher change in the elbow angle (i.e. between the variable and the constant domains). Moreover, this change is significantly larger for binding of large antigens than it is for small antigens. These changes may mediate the allosteric effects between the Fab variable and constant domains proposed recently. Consistent with previous reports, Complementarity Determining Region H3 (CDR-H3) is the only element in the binding site that shows significant conformational changes. However, such changes occur in only one-third of the antibodies. The most consistent and substantial conformational changes occur in a loop in the heavy chain constant domain, distant from the antigen binding site. This loop is implicated in the interaction between the heavy and light chains, is enriched in somatic hypermutations, and is often intrinsically disordered. These characteristics may imply a potential role for this loop in antibody function. Our findings shed light on the structural mechanisms of antigen recognition and binding, and may improve antibody engineering and modelling.

## 68. Metabolic profiling of fasting and feeding reveals human variability in energy metabolism

### Oded Shaham

Abstract

Fasting and feeding challenge homeostatic mechanisms of the human body. During fasting, protein and fat are degraded to provide fuel that sustains the body's energetic needs. Upon feeding, insulin suppresses these catabolic processes. Recent advances in metabolomic technologies enable simultaneous measurement of a diverse metabolite collection in a biological sample, providing rich metabolic profiles. We have previously shown that metabolic profiling of oral glucose challenge captures insulin's suppression of multiple catabolic processes, and that pre-diabetics differ in their insulin resistance across these processes. Here we demonstrate that prolonged fasting in a homogeneous group of healthy individuals uncovers large variability in markers of catabolic processes. This 'catabolic variability' among individuals could be linked with susceptibility to metabolic disease, and could help explain the physiologic significance of genetic factors associated with diabetes.

## 69. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters

### Eilon Sharon, Yael Kalma, Ayala Sharp, Tali Raveh-Sadka, Michal Levo, Danny Zeevi, Leeat Keren, Zohar Yakhini, Adina Weinberger & Eran Segal

Abstract

Despite extensive research, our understanding of the rules according to which cis-regulatory sequences are converted into gene expression is limited. We devised a method for obtaining parallel, highly accurate gene expression measurements from thousands of designed promoters and applied it to measure the effect of systematic changes in the location, number, orientation, affinity and organization of transcription-factor binding sites and nucleosome-disfavoring sequences. Our analyses reveal a clear relationship between expression and binding-site multiplicity, as well as dependencies of expression on the distance between transcription-factor binding sites and gene starts which are transcription-factor specific, including a striking ~10-bp periodic relationship between gene expression and binding-site location. We show how this approach can measure transcription-factor sequence specificities and the sensitivity of transcription-factor sites to the surrounding sequence context, and compare the activity of 75 yeast transcription factors. Our method can be used to study both cis and trans effects of genotype on transcriptional, post-transcriptional and translational control.

## 70. Mapping the structure of the T cell receptor repertoire using quantitative high throughput sequencing: repertoire biases and public clones

### Hilah Gal, Wilfred Ndifon, Eric Shifrut, Nir Friedman

Abstract

T cells play a fundamental role in cell-mediated adaptive immunity. A highly diverse repertoire of T cell receptors (TCRs) is generated through a random process of V-D-J gene rearrangement, and this diversity is required for recognition of a vast array of potential antigens. T-cell repertoire diversity is not yet fully understood, and its characterization poses an enormous challenge.
We developed a PCR-based quantitative method for TCR sequencing, using the Illumina sequencing technology, and dedicated bioinformatics analysis tools for characterization of repertoire signatures. We sequenced the TCRβ region of CD4+ splenic T cells, derived from seven individual C57BL/6 mice, obtaining ~10^7 sequence reads of which ~10^6 are unique. Moreover, we obtained information about CDR3 length distributions, statistics of nucleotide insertions/deletions and V/J gene usage. Interestingly, all these characteristics are highly similar among individual mice, reflecting a biased repertoire with a well-defined structure. We find that the TCR repertoire, despite its random and highly diverse nature, contains a surprisingly large number of public sequences that are shared among individuals. To examine potential effects of biases in the repertoire on TCR sequence sharing among individuals, we apply a probabilistic model that links repertoire bias to patterns of TCR sequence sharing, and show that there is a threshold probability above which a sequence is more likely to be shared among all individuals in a group than by any particular subgroup.
TCR characterization can shed new light on the molecular processes responsible for creating variability and provide a new view on T cell immunity under certain pathological conditions such as autoimmunity, cancer or infections.
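A toy calculation, under an assumption that is ours rather than the abstract's, illustrates how a sharing threshold of this kind can arise. Suppose a given sequence occurs independently in each of $n$ individuals with the same probability $p$. The probability that it is found in all $n$ individuals, and the probability that it is found in exactly one particular subgroup of $k < n$ individuals, are then

$$
P_{\text{all}} = p^{n}, \qquad
P_{\text{subgroup}} = p^{k}(1-p)^{n-k}, \qquad
\frac{P_{\text{all}}}{P_{\text{subgroup}}} = \left(\frac{p}{1-p}\right)^{n-k}.
$$

The ratio exceeds 1 exactly when $p > 1/2$, so in this simplified setting a sequence whose per-individual occurrence probability is above one half is more likely to be shared by everyone than by any specific subgroup. The probabilistic model used in the study may be more detailed and need not give the same threshold value.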

## 71. Integrative analysis of the yeast phosphorylation network reveals a hierarchy of kinases dominated by a thin layer of phosphatases

### Ilan Smoly, Esti Yeger-Lotem

Abstract

The relationships between kinases, phosphatases and their substrates underlie cellular response to various signals. Due to their prime regulatory roles, several efforts were made in recent years to map these relationships and illuminate the organization of signaling networks. Here we focus on large-scale efforts to map the phosphorylation network of budding yeast. We find that different experimental techniques detected complementary relationships with low yet statistically significant overlap between them. Integrating these relationships we identify simple and composite regulatory network motifs stressing the prevalence of co-regulation, feed-forward and feedback regulation in signaling pathways. Exploiting recent phospho-proteomic data we show that yeast kinases are organized in a hierarchical manner into three layers. These layers differ from each other in the severity of the mutant phenotypes, their buffering capacities and frequency of phosphorylation sites. Intriguingly, we find that a relatively small group of phosphatases governs many of the interactions between these kinases. This hierarchical organization should be considered upon searching for candidate modulators of cellular response to signals.

## 72. MotifNet: a web interface for network motif analysis

### Ilan Smoly, Guy Wald, Esti Yeger-Lotem

Abstract

Network motifs are small topological patterns that recur in a network significantly more often than in randomized networks. Network motif analysis proved to be a powerful technique for uncovering network building blocks and for illuminating design principles underlying complex biological networks. The widely-used FANMOD tool detects network motifs in integrated colored networks where both nodes and edges may be colored. Here, we present MotifNet, a user-friendly web interface that presents the output of FANMOD in a graphical and searchable manner. In particular, MotifNet can be used to identify network motifs involving a specific set of proteins, and to distinguish between local motifs, where motif occurrences tend to involve a specific protein or protein pair, and disperse motifs, which occur independently in various parts of the network. MotifNet does not require download or logging-in.
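The definition of a network motif given above (a pattern that recurs more often than in randomized networks) can be illustrated with a small Python sketch that counts feed-forward loops in a directed network and compares the count with degree-preserving randomizations. This is a brute-force illustration on an uncolored toy network with hypothetical function names; it is not how FANMOD or MotifNet work internally.

```python
import random
from itertools import permutations


def count_ffl(edges):
    """Count feed-forward loops: triples (a, b, c) with a->b, b->c, a->c."""
    edge_set = set(edges)
    nodes = {n for e in edge_set for n in e}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


def rewire(edges, swaps, rng):
    """Degree-preserving randomization: swap targets of random edge pairs."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


toy = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("A", "E")]
observed = count_ffl(toy)
rng = random.Random(1)
random_counts = [count_ffl(rewire(toy, 100, rng)) for _ in range(200)]
# Empirical p-value: fraction of randomized networks with at least as many.
p_value = sum(c >= observed for c in random_counts) / len(random_counts)
```

A pattern would be called a motif when its observed count is rarely matched in the randomized ensemble; on a network this small the comparison is only meant to show the mechanics.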

## 73. Global regulation of alternative splicing by adenosine deaminase acting on RNA (ADAR)

### Oz Solomon, Michal Safran, Naamit Deshet Unger, Pinchas Akiva, Jasmine Jacob-Hirsch, Karen Cesarkas, Reut Kabesa, Ninette Amariglio, Ron Unger, Gideon Rechavi, Eran Eyal

Abstract

Background
Alternative mRNA splicing is a major mechanism for gene regulation and transcriptome
diversity. Despite the extent of the phenomenon, the regulation and specificity of the splicing
machinery are only partially understood. Adenosine to inosine (A-to-I) RNA editing of pre-
regulation in several cases. In the current study, we used bioinformatics approaches and RNA-seq
and exon-specific microarray of ADAR knockdown cells, to globally examine how ADAR and its
A-to-I RNA editing activity influence alternative mRNA splicing.
Results
While A-to-I RNA editing only rarely targets canonical splicing acceptor, donor and branch sites,
it was found to affect splicing regulatory elements (SREs) within exons. Cassette exons were
found to be significantly enriched with A-to-I RNA editing sites compared to constitutive exons.
RNA-seq and exon-specific microarray revealed that ADAR knockdown in hepatocarcinoma and
myelogenous leukemia cell lines leads to global changes in gene expression, with hundreds of
genes changing their splicing patterns in both cell lines. This global change in splicing pattern
cannot be explained by putative editing sites alone. Genes that showed significant changes in
their splicing pattern are frequently involved in RNA processing and splicing activity. Analysis of
recently published RNA-seq data from glioblastoma cell lines showed similar results.
Conclusions
Our global analysis reveals that ADAR plays a major role in splicing regulation. Although direct
editing of the splicing motifs does occur, we suggest this is not likely to be the primary
mechanism for ADAR-mediated regulation of alternative splicing. Rather, alternative splicing
regulation is achieved by modulating trans-acting factors involved in the splicing machinery.

## 74. OPM-based Model Verification Framework with Application to Molecular Biology

### Judith Somekh, Valerya Perelman, Chhaya Dhingra, Gal Haimovich, Mordechai Choder, Dov Dori

Abstract

A myriad of detailed pieces of knowledge regarding the structure and function of the living cell have been accumulating at a rapidly increasing rate. Emphasis is shifting from the study of a single molecular process to cellular pathways, cycles, and the entire cell as a system. We propose a framework that supports biological researchers in verifying hypotheses. The framework includes modeling of molecular biological systems and verification of the models against pertinent literature.
Object-Process Methodology (OPM) is a holistic graphical modeling methodology that combines the behavioral and structural aspects of a system in a single model. OPM comprises a modeling language, an OPM-based development process, and the OPM CASE tool (OPCAT).
In this work, we propose an OPM-based verification framework for molecular biology systems. The framework includes a set of translation rules from OPM to a finite-state transition system and classification of mechanistic requirements derived from biological experimental findings. The framework is exemplified on the gene expression system.

## 75. Coordinated Disease-Induced Changes in miRNA-Alternative Splicing networks Identified by Deep transcriptome Sequencing of Parkinson's Leukocytes

### Lilach Soreq*, Hagai Bergman, Zvi Israel and Hermona Soreq

Abstract

Background
MicroRNAs (miRs) are short, non-coding RNAs that act rapidly and effectively to regulate various molecular networks by post-transcriptional silencing of their multiple target transcripts. In about 95% of human genes, alternative splicing (AS) gives rise to a multitude of different transcripts (isoforms) by using varying splice sites. We therefore surmised that coordinated miR/AS changes may be involved in regulating the multiple functions that are perturbed in Parkinson's disease (PD).
To pursue the inter-relationships between miRs and AS events in patient leukocytes, we performed next-generation RNA sequencing of short RNAs and used junction-sensitive splicing arrays on PD leukocytes and matched controls, followed by network analyses of putative inter-relationships between the observed changes in miR levels and AS events. This study addresses an unmet need for novel blood tests for early PD diagnosis, which may enable earlier and more effective long-term treatment with current medications, while identifying new targets for future therapeutic intervention and providing future means for delaying disease progression.

Results
Blood leukocyte mRNA samples of PD patients and matched healthy volunteers were examined by SOLiD NGS RNA deep sequencing and splice junction prototype microarrays. A total of 166,746,207 sequence reads were subjected to terminal and adapter trimming, leaving 71,939,833 reads for analysis. These were aligned to the miRBase human database. A total of 482 mature and 79 mature 3'/5' forms of miR molecules were detected as expressed; of these, 336 that were expressed at more than 1 per million base pairs were analyzed for differential expression. Common dispersion estimation followed by a negative binomial test detected 17 differentially expressed miRs in PD leukocytes, 11 of which showed a disease-associated increase. Reciprocal linear regression analysis of the transcripts represented on the splice junction microarrays revealed 478 disease-associated AS events (in 332 distinct genes) that separated patients from controls.
The thousands of predicted miR targets of each differentially expressed miR were then found by crossing multiple miR-target databases and predicting binding sites on the complete sequence (5' UTR, CDS and 3' UTR) of all known genes. The predicted targets were further filtered for those that exhibited disease-associated AS events. A complex network with 560 connections between 13 PD-modified miRs and 217 target genes exhibiting AS events was revealed. The predicted regulation appeared in the 3' UTR, coding sequence and 5' UTR. Supporting the notion of disease relevance, this network included the PD-related transcriptional regulator Foxp1 and Pitx3 which promotes midbrain identity in dopaminergic neurons.
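The filtering step described above (keeping only predicted miR targets that also show an AS event) can be pictured as a set intersection that yields the edges of a bipartite network. The following is only a minimal sketch: the function name, miR identifiers and gene names are hypothetical placeholders, not data or code from the study.

```python
# Hypothetical toy data: miR identifiers and gene names are placeholders.
def build_mir_as_network(predicted_targets, as_genes):
    """Keep only miR -> gene edges whose target gene shows an AS event."""
    edges = []
    for mir, targets in predicted_targets.items():
        # Intersect the predicted targets with the genes showing AS events.
        for gene in sorted(targets & as_genes):
            edges.append((mir, gene))
    return edges


predicted_targets = {
    "miR-A": {"GENE1", "GENE2", "GENE3"},
    "miR-B": {"GENE2", "GENE4"},
    "miR-C": {"GENE5"},
}
as_genes = {"GENE2", "GENE3", "GENE4"}

edges = build_mir_as_network(predicted_targets, as_genes)
mirs_in_network = {mir for mir, _ in edges}
genes_in_network = {gene for _, gene in edges}
print(len(edges), len(mirs_in_network), len(genes_in_network))
```

In this toy input, miR-C drops out of the network because none of its predicted targets shows an AS event.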

Conclusions
Our findings demonstrate multiple and apparently inter-related changes in miRs and AS events in PD leukocytes, suggesting disease-associated dysregulation in the combinatorial miR-mediated checks and balances which, under healthy state functioning, maintain balanced AS events controlling leukocyte transcript profiles.

## 76. Online statistical enrichment tools for ranked lists of genes

### Roy Navon, Israel Steinfeld, Zohar Yakhini

Abstract

We present a set of online statistical enrichment tools under a single web interface umbrella. Currently the tools available online include GOrilla (GO enrichment analysis) and miTEA (miRNA target enrichment analysis). Our online tools are based on enrichment statistics that take a ranked list of genes as input [Eden & Navon et al. 2009, Steinfeld et al. submitted] and find statistically significant enrichment of functional elements at the top of the ranked list. The enrichment results are provided in a user-friendly and interactive graphical representation.
The main advantages of our statistical enrichment tools include:
a) Online access and straightforward application that allows smooth and fast submission.
b) Efficient computations that produce results in a matter of seconds.
c) Robust statistical methods that support biologically meaningful results.
d) Graphical representation of results that enables easy interpretation and sharing of significant findings.
These merits contribute to the continuous high popularity and exposure of the tools supported under this umbrella, with more than 1,500 monthly unique users from over 45 countries.
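To illustrate what enrichment "in the top of a ranked list" can mean, the sketch below scans every top-n prefix of a ranked list and records the smallest hypergeometric tail probability. This minimum-hypergeometric-style score is an assumption made for illustration; the abstract does not spell out the statistic, and the function names are hypothetical. The raw minimum is not itself a p-value, since it still has to be corrected for the many cutoffs tried.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n genes are drawn from N, of which B are annotated."""
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / comb(N, n)


def min_hypergeom_score(labels):
    """labels[i] is 1 if the i-th ranked gene carries the annotation, else 0.

    Returns (best score, cutoff) over all top-n prefixes of the ranked list.
    """
    N, B = len(labels), sum(labels)
    best_score, best_cutoff, hits = 1.0, 0, 0
    for n, label in enumerate(labels, start=1):
        hits += label
        score = hypergeom_tail(N, B, n, hits)
        if score < best_score:
            best_score, best_cutoff = score, n
    return best_score, best_cutoff


# Toy ranked list: the three annotated genes all sit at the top.
score, cutoff = min_hypergeom_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(score, cutoff)
```

Scanning all cutoffs removes the need to choose a fixed "top k" threshold in advance, which is the practical appeal of ranked-list enrichment.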

## 77. The GeneCards human proteome: quantitative tissue expression patterns and expanded sequence similarity space

### Gil Stelzer, Frida Belinky, Irina Dalah, Tsippi Iny Stein, Noam Nativ, Naomi Rosen, Noa Rappaport, Eugene Kolker, Marilyn Safran, Doron Lancet

Abstract

Proteomics is the hottest innovation arena, following the genomics wave. It affords high-throughput scrutiny of the multitude of protein isoforms and post-translational modification products for each and every gene. GeneCards allows a comprehensive gene-centric view of the human proteome universe, automatically mined from a diversity of web sources. This includes protein sequence and variant data from UniProt, neXtProt, TrEMBL and Ensembl, protein-protein interactions from MINT and STRING, post-translational modifications from PhosphoSite, specific peptides from DME, and 3D information and rendering from PDB, OCA and Proteopedia. In parallel, GeneCards has become a central comprehensive resource for identifying and ordering protein-related research products such as recombinant proteins and antibodies from over a dozen different companies.
While GeneCards and other databases profusely display RNA tissue expression patterns, the parallel protein expression patterns are only sparsely reported. We report here the establishment within GeneCards of a feature that displays such patterns in a systematic fashion. The data are obtained from SPIRE/MOPED (https://www.proteinspire.org/SPIRE/startviews/welcome.jsf), a resource for proteomic data whose meta-analysis consolidates information from various sources and allows their comparison. Based on MOPED data, we generate and display, within the GeneCards protein section, the protein expression values for various tissues, body fluids and cell lines. These graphed patterns allow faithful inter-tissue comparison because we apply a correction that equalizes the medians of the expression distributions in the different tissues.
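A median-equalizing correction of this kind can be sketched as follows. The sketch assumes a multiplicative rescaling of each tissue to a common target median and assumes non-zero tissue medians; the abstract does not say whether the actual correction is multiplicative or additive, and the function, tissue and gene names are hypothetical.

```python
from statistics import median


def equalize_medians(expression_by_tissue):
    """Rescale each tissue so that all tissues share the same median expression.

    Assumption: multiplicative scaling towards the median of the tissue medians.
    """
    tissue_medians = {
        tissue: median(values.values())
        for tissue, values in expression_by_tissue.items()
    }
    target = median(tissue_medians.values())
    return {
        tissue: {
            gene: value * target / tissue_medians[tissue]
            for gene, value in values.items()
        }
        for tissue, values in expression_by_tissue.items()
    }


# Toy input: tissue_2 was measured on a ten-fold larger scale than tissue_1.
raw = {
    "tissue_1": {"G1": 2.0, "G2": 4.0, "G3": 6.0},
    "tissue_2": {"G1": 20.0, "G2": 40.0, "G3": 60.0},
}
normalized = equalize_medians(raw)
print(normalized)
```

After rescaling, the two toy tissues have identical profiles, so the ten-fold difference in measurement scale no longer masquerades as a biological difference.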
In parallel, we report the inclusion of a new tool for displaying human protein similarity. Previously, we reported nominal paralogs mined from Ensembl. In addition, GeneDecks, a GeneCards suite member (http://www.genecards.org/index.php?path=/GeneDecks), affords the identification of partner proteins that are similar to a probe entry through shared cellular pathways, domains, phenotypes, diseases and GO terms. The new feature reported here mines sequence similarity data from SIMAP, the Similarity Matrix of Proteins (http://boincsimap.org/boincsimap/). This resource has determined sequence similarity by all-against-all BLAST searches for all protein entries from numerous species. Using the human aspect of this SIMAP matrix, we obtain a considerably broader list of sequence partners for each gene product. This results from a less stringent (but still statistically significant) paralogy threshold (~67%), lower than usually applied, so as to display thousands of sequence partners not identified by other data sources. These new approaches enrich GeneCards and help it support expanding research in the realm of biomedical sciences.

## 78. Modeling single cell stochastic gene expression

### Marek Strajbl, Tal El-Hay and Nir Friedman

Abstract

Stochasticity of gene expression has been recognized as an important factor in
various cellular processes, such as gene regulation or phenotypic variation of
clonal cell populations. Because many processes that mediate gene expression,
such as chromatin remodeling or activation of transcription factors, are
difficult to observe experimentally, it is the noise in
expressed protein levels that can serve as a rich source of information about
the dynamics of the hidden machinery.

In order to analyze noisy protein levels, we have been working on a model of
stochastic gene expression in which the gene activation, transcription and
translation are formally described as a continuous time Bayesian network
(CTBN). Time dependent observations of protein expression levels are used to
determine the values of hidden variables and parameters of the model. We will
present preliminary results of exact inference on single gene synthetic data.
Such a toy model will serve as a benchmark for the future development of
computationally more efficient approximate inference methods that will be
applicable to the analysis of real experimental data.
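Synthetic single-gene data of the kind mentioned above can be generated with a standard Gillespie simulation of gene activation, transcription and translation. The sketch below is not the CTBN inference described in the abstract; it is only one common way to produce toy trajectories, and the reaction scheme, rate names and rate values are assumptions chosen for illustration.

```python
import random


def simulate_gene_expression(t_end, rates, seed=0):
    """Gillespie simulation of a gene that switches on and off, is transcribed
    while on, and whose mRNA is translated; mRNA and protein both decay."""
    rng = random.Random(seed)
    t, gene_on, mrna, protein = 0.0, 0, 0, 0
    trajectory = [(t, gene_on, mrna, protein)]
    while True:
        propensities = [
            rates["activate"] * (1 - gene_on),   # gene off -> on
            rates["deactivate"] * gene_on,       # gene on -> off
            rates["transcribe"] * gene_on,       # produce one mRNA
            rates["mrna_decay"] * mrna,          # degrade one mRNA
            rates["translate"] * mrna,           # produce one protein
            rates["protein_decay"] * protein,    # degrade one protein
        ]
        total = sum(propensities)
        if total <= 0:
            break
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        reaction = rng.choices(range(6), weights=propensities)[0]
        if reaction == 0:
            gene_on = 1
        elif reaction == 1:
            gene_on = 0
        elif reaction == 2:
            mrna += 1
        elif reaction == 3:
            mrna -= 1
        elif reaction == 4:
            protein += 1
        else:
            protein -= 1
        trajectory.append((t, gene_on, mrna, protein))
    return trajectory


# Hypothetical rate constants (arbitrary time units).
rates = {
    "activate": 0.5, "deactivate": 0.5, "transcribe": 2.0,
    "mrna_decay": 0.2, "translate": 5.0, "protein_decay": 0.05,
}
trajectory = simulate_gene_expression(t_end=20.0, rates=rates, seed=1)
print(len(trajectory), trajectory[-1])
```

In an inference benchmark, only the protein column (possibly sampled at discrete time points) would be treated as observed, while the gene state and mRNA count play the role of hidden variables.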

## 79. Intracellular metabolite concentrations are explained by an interplay between minimization of total concentration of metabolites and their corresponding enzymes

### Naama Tepper, Elad Noor, Daniel Amador-Noguez, Josh Rabinowitz, Wolfram Liebermeister & Tomer Shlomi

Abstract

Intracellular metabolite levels are determined by kinetic considerations of thousands of enzymes that are regulated to meet metabolic demands. In this work, we hypothesize that the steady-state metabolite concentrations in a microorganism can be intuitively explained in terms of a compromise between two factors that "pull" metabolite concentrations in different directions. The first is cellular adaptation towards minimizing the concentration of intermediate metabolites, and the second is the adaptation to minimize the cellular cost associated with the production of metabolic enzymes. To test this hypothesis, we developed a method, metabolic tug-of-war (mTOW), which computes steady-state metabolite concentrations in a microorganism under a given growth condition by considering these two factors. The method relies strictly on available data regarding the stoichiometry and thermodynamics of microbial metabolic networks, without requiring enzyme kinetic data (which are mostly unknown). mTOW is shown to successfully explain up to 50% of the observed variation in measured metabolite concentrations in E. coli under aerobic glucose, acetate and glycerol media using existing measurements for these media. As a further validation, we applied LC/MS approaches to measure absolute metabolite concentrations in E. coli grown under anaerobic glucose media, showing that mTOW correctly identifies the major changes in metabolite concentrations between aerobic and anaerobic conditions. To demonstrate mTOW's ability to explain metabolite concentrations in another species, we describe its application to predicting metabolite concentrations in the industrially relevant microorganism C. acetobutylicum under acidogenesis and solventogenesis growth phases.

## 80. MORPH: MOdule guided Ranking of candidate PatHway genes in Arabidopsis thaliana and Solanum lycopersicum

### Oren Tzfadia, David Amar, Louis Bradbury, Ron Shamir and Eleanore Wurtzel

Abstract

Closing gaps in our current knowledge about biological pathways is a fundamental challenge. The development of novel computational methods, along with high-throughput experimental data, promises to help meet this challenge. We present a novel algorithm called MORPH (MOdule guided Ranking of candidate PatHway genes) for revealing unknown genes in biological pathways. The method receives as input a set of known genes from the target pathway, a collection of expression profiles, and interaction and metabolic networks. Using machine learning techniques, MORPH selects the best combination of data and analysis method and outputs a ranking of candidate genes predicted to belong to the target pathway. We tested MORPH on 230 known pathways in Arabidopsis (Arabidopsis thaliana) and 93 known pathways in tomato (Solanum lycopersicum), and obtained high-quality cross-validation results. In the photosynthesis light reactions, homogalacturonan biosynthesis, and chlorophyll biosynthetic pathways, genes ranked highly by MORPH were recently verified to be associated with these pathways. Candidates generated for the carotenoid pathway included genes associated with the biosynthesis of carotenoid regulators and metabolic pathways that intersect at a metabolic hub upstream of the carotenoid pathway.

## 81. Compensating mode of regulation by human and viral miRNAs

### Isana Veksler-Lublinsky, Yonat Shemer-Avni, Eti Meiri, Zvi Bentwich, Klara Kedem, Michal Ziv-Ukelson

Abstract

MicroRNAs (miRNAs) are important regulators of gene expression encoded by a variety of organisms, including viruses. Although the functions of most viral miRNAs are currently unknown, there is evidence that both viral and host miRNAs contribute to the complex interactions between viruses and their hosts. In addition to the expression of viral miRNAs, infection with some viruses can change the expression of host miRNAs. In particular, some host miRNAs may be up- or down-regulated by the host machinery as a defense mechanism. Based on published evidence linking viral and host miRNAs, three modes of action in target regulation are proposed: competing, cooperating and compensating modes.
In this work we explore the compensating mode of target regulation upon Human Cytomegalovirus (HCMV) infection. To achieve this, we develop a new algorithm that finds groups, called quasi-modules, of viral and human miRNAs that target the same human genes, and we use our new miRNA expression data. We provide supporting evidence from the biological and medical literature for two of our modules. These modules may help in understanding the role of miRNAs in host-viral interactions, and the targeted genes may serve as candidates for experimental target validation.

## 82. Codon bias in pyrimidine-ending codons

### Naama Wald, Maya Alroy, Maya Botzman, Hanah Margalit

Abstract

Synonymous codons are unevenly distributed among genes, a phenomenon termed codon usage bias. Understanding the forces shaping codon bias is a major step towards elucidating the adaptive advantage codon choice can confer at the level of individual genes and organisms. We performed a large-scale analysis to assess codon usage bias of pyrimidine-ending codons in highly expressed genes in prokaryotes. We found that codon-pairs encoding two- and threefold degenerate amino acids tend to be biased towards the C-ending codon while codons encoding fourfold degenerate amino acids tend to be biased towards the U-ending codon. This codon usage pattern is widespread in prokaryotes, and its strength is correlated with translational selection both within and between organisms. We show that this bias is associated with an improved correspondence with the tRNA pool, avoidance of misincorporation errors during translation, and moderate stability of codon-anticodon interaction, all consistent with more efficient translation.
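The basic quantity behind such an analysis, the share of C-ending codons among the two pyrimidine-ending codons of an amino acid, can be computed as in the minimal sketch below. The function name and the toy sequence are hypothetical, and the sketch covers only the counting step, not the large-scale statistical analysis described in the abstract.

```python
from collections import Counter


def c_ending_fraction(coding_sequence, codon_prefix):
    """Fraction of C-ending codons among the two pyrimidine-ending codons
    that share a two-base prefix (e.g. 'UU' compares UUC with UUU)."""
    seq = coding_sequence.upper().replace("T", "U")
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = Counter(seq[i:i + 3] for i in range(0, usable_length, 3))
    c_count = codons[codon_prefix + "C"]
    u_count = codons[codon_prefix + "U"]
    total = c_count + u_count
    return c_count / total if total else None


# Toy in-frame sequence: UUC UUC UUU GCU GCU GCC
toy_cds = "UUCUUCUUUGCUGCUGCC"
print(c_ending_fraction(toy_cds, "UU"))  # Phe codons (UUY), twofold degenerate
print(c_ending_fraction(toy_cds, "GC"))  # Ala codons (GCY), fourfold degenerate
```

In this toy sequence the Phe codons lean towards C and the Ala codons towards U, which mirrors the direction of the bias reported in the abstract but is of course constructed by hand.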

## 83. Systematic dissection of roles for chromatin regulators in a yeast stress response

### Assaf Weiner, Hsiuyi Chen, Chih Long Liu, Ayelet Rahat, Avital Klien, Luis Soares, Mohanram Gudipati, Jenna Pfeffner, Aviv Regev, Steven Buratowski, Jeffrey A. Pleiss, Nir Friedman, and Oliver J. Rando

Abstract

Packaging of eukaryotic genomes into chromatin has wide-ranging effects on gene transcription. Curiously, it is commonly observed that deletion of a global chromatin regulator affects expression of only a limited subset of genes bound to or modified by the regulator in question. However, in many single-gene studies it has become clear that chromatin regulators often do not affect steady-state transcription, but instead are required for normal transcriptional reprogramming by environmental cues. We therefore have systematically investigated the effects of 83 histone mutants, and 119 deletion mutants, on induction/repression dynamics of 170 transcripts in response to diamide stress in yeast. Importantly, we find that chromatin regulators play far more pronounced roles during gene induction/repression than they do in steady-state expression. Furthermore, by jointly analyzing the substrates (histone mutants) and enzymes (chromatin modifier deletions) we identify specific interactions between histone modifications and their regulators. Combining these functional results with genome-wide mapping of several histone marks in the same time course, we systematically investigated the correspondence between histone modification occurrence and function. We follow up on one pathway, finding that Set1-dependent H3K4 methylation primarily acts as a gene repressor during multiple stresses, specifically at genes involved in ribosome biogenesis. Set1-dependent repression of ribosomal genes occurs via distinct pathways for ribosomal protein genes and ribosomal biogenesis genes, which can be separated based on genetic requirements for repression and based on chromatin changes during gene repression. Together, our dynamic studies provide a rich resource for investigating chromatin regulation, and identify a significant role for the "activating" mark H3K4me3 in gene repression.

## 84. BitterDB - bitter compounds database and analysis of chemical features associated with bitterness

### Ayana Wiener-Dagan, Masha Niv

Abstract

Ayana Wiener and Masha Y. Niv
Institute for Biochemistry, Food Science and Nutrition,
The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel.
The sense of taste is a significant factor which guides animals in choosing their foods.
Animals avoid eating bitter food components, and indeed many of these bitter compounds are found to be toxic. Nevertheless, it is known today that bitterness is not always noxious, and that some of the bitter compounds hold beneficial effects on health.
It is estimated that there are thousands or even tens of thousands of bitter molecules, and that they are very diverse in their chemical structure and physicochemical properties.
Bitter taste perception is mediated by G-protein coupled receptors (GPCRs) of the taste receptors type 2 (T2R) gene family. In humans this gene family includes 25 members.
An intriguing question in bitter-taste research is how hundreds or more of structurally diverse compounds can be detected by a limited number of receptors[2].
In this study we aimed to explore the molecular basis of the human bitter taste detection mechanism by characterizing the chemical features associated with bitter compounds in general, and with hTAS2R14 ligands in particular. In order to facilitate this study we established the BitterDB database, available at
http://bitterdb.agri.huji.ac.il/bitterdb/ [1].
The database includes over 550 compounds that were reported to be bitter to humans, their association with a particular human bitter taste receptor when available and detailed information about the human bitter taste receptors in general. The BitterDB offers its users a rich and friendly interface for querying and browsing the bitter compounds and bitter receptors datasets.
1. Wiener A, Shudler M, Levit A, Niv MY: BitterDB: a database of bitter compounds. Nucleic Acids Res 2011.
2. Meyerhof W, Batram C, Kuhn C, Brockhoff A, Chudoba E, Bufe B, Appendino G, Behrens M: The molecular receptive ranges of human TAS2R bitter taste receptors. Chem Senses 2010, 35(2):157-170.

## 85. Ontologies from Existence of Homologues: Human vs. Fly Viewpoints

### Jonathan Witztum, Erez Persi, David Horn, Metsada Pasmanik-Chor and Benny Chor

Abstract

The availability of a large number of annotated proteomes, resulting from recent high-throughput technologies, enables the systematic study of the relationships between protein conservation and functionality. In this work, we ask if this question can be addressed based solely on the existence (or non-existence) of protein homologues. We study the proteomes of 17 metazoans, and examine them from both the human and the fly viewpoints.
We downloaded the complete proteomes of 17 metazoan species from the NCBI RefSeq protein database (release 47). These species comprise 11 vertebrates (7 mammals, including 2 primates, and 4 non-mammalian vertebrates) and 6 invertebrates. Human and fly were used as "anchor species": for each protein in each of the two species, we used BLAST (with default parameters) to determine which of the other 16 species possess or lack a homologous protein.
Two relevant protein groups in this context are the "universal proteins", those having homologues in all 16 other species, and the "orphan proteins", those with no homologues. More complex patterns of conservation (e.g. proteins having homologues in all vertebrates, but no invertebrate homologue) are also of interest. In order to characterize the relations between such patterns and protein functionality, and to compare the two viewpoints, we employ quantum clustering and the GOrilla gene ontology tool.
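The classification into universal and orphan proteins follows directly from the binary presence/absence data. A minimal sketch is shown below, with hypothetical protein and species identifiers and only three "other" species instead of 16; it is not the authors' code.

```python
def classify_by_homologue_pattern(presence, other_species):
    """presence maps a protein id to the set of other species with a homologue.

    Returns a dict mapping each protein to 'universal', 'orphan' or 'partial'.
    """
    all_species = set(other_species)
    classes = {}
    for protein, species_with_homologue in presence.items():
        found = set(species_with_homologue) & all_species
        if found == all_species:
            classes[protein] = "universal"
        elif not found:
            classes[protein] = "orphan"
        else:
            classes[protein] = "partial"
    return classes


# Hypothetical toy data with three other species.
other_species = ["species_1", "species_2", "species_3"]
presence = {
    "protein_A": {"species_1", "species_2", "species_3"},
    "protein_B": set(),
    "protein_C": {"species_1"},
}
classes = classify_by_homologue_pattern(presence, other_species)
print(classes)
```

The "partial" class is where the more complex conservation patterns live; in practice each distinct presence/absence vector would be kept as its own pattern rather than lumped together.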
Interestingly, the great majority of proteins in both human and fly have homologues in all species studied (approx. 66%). Moreover, there are more than 3 times as many fly orphan proteins as human orphans. The enriched functions of universal proteins in human and fly are quite similar. These include transcription regulation, biological adhesion, and development, which are crucial functions of life. In contrast, the function enrichment of the non-universal proteins in human and fly is very different. Keratinization, heme metabolism, immune response (cytokine and interferon signaling), calcium-independent cell-cell adhesion, translation and ncRNA metabolism are among the enriched GO terms in human non-universal proteins. Lipid metabolism, sensory perception of smell and taste, and chitin metabolism are among the enriched GO terms in fly non-universal proteins.
The standard view of protein conservation is taken with respect to the human proteome. However, this study shows the importance of different viewpoints by considering a different "anchor species", the fly. Interestingly, we can show many novel features solely on the basis of binary data: the existence or non-existence of homologues.

## 86. Three-Dimensional Folding and Functional Organization Principles of the Drosophila Genome

### Tom Sexton, Eitan Yaffe, Ephraim Kenigsberg, Frédéric Bantignies, Benjamin Leblanc, Michael Hoichman, Hugues Parrinello, Amos Tanay, Giacomo Cavalli

Abstract

Chromosomes are the physical realization of genetic information and thus form the basis for its readout and propagation. Here we present a high-resolution chromosomal contact map derived from a modified genome-wide chromosome conformation capture approach applied to Drosophila embryonic nuclei. The data show that the entire genome is linearly partitioned into well-demarcated physical domains that overlap extensively with active and repressive epigenetic marks. Chromosomal contacts are hierarchically organized between domains. Global modeling of contact density and clustering of domains show that inactive domains are condensed and confined to their chromosomal territories, whereas active domains reach out of the territory to form remote intra- and inter-chromosomal contacts. Moreover, we systematically identify specific long-range intra-chromosomal contacts between Polycomb repressed domains. Together, these observations allow for quantitative prediction of the Drosophila chromosomal contact map, laying the foundation for detailed studies of chromosome structure and function in a genetically tractable system.

## 88. How the time of replication shapes the genomic structure

### Yishai Yehuda, Ephraim Kenigsberg, Shlomit Farkash-Amar, Yaara David, Zohar Yakhini, Andrei Chabes, Amos Tanay, Itamar Simon

Abstract

Each genomic region is replicated at a distinct time during S phase through the activation of an origin of replication. The time at which each region is replicated is a function of its distance from an active origin and the time at which that origin was activated. Adjacent origins are usually activated at the same time, giving rise to replication time zones. The time of replication (ToR) of a region seems to reflect higher-order genomic organization, since it correlates with basic chromosomal features such as transcription, regional GC content, Giemsa banding and gene density. Systematic mapping of genome-wide replication timing in mammals revealed that the genome can be divided into large replication zones with uniform ToR, and temporal transition regions (TTRs) between them. Most TTRs are originless and thus they are ideal for studying the causality between ToR and other genomic features, since in those regions the ToR is determined solely by the distance from an origin. We have found strong correlations between ToR and GC content, transcription and chromatin structure in TTRs as well, suggesting that the ToR plays a causal role in all those genomic features. One mechanism that can explain how the ToR affects the genomic GC content is a bias in the type of mutations along S phase. Indeed, we found that along the primate phylogeny the rate of GC to AT substitutions is faster in late replicating regions, while the rate of AT to GC substitutions is the same along S phase. Bias in the type of mutations can be explained by a difference in the pool of dNTPs along S phase. Indeed, measuring dNTP concentrations at multiple time points along the cell cycle in two different cell lines confirms our hypothesis. We found that dATP and dTTP amounts climb with the progress of S phase, while the amount of dGTP does not change and the amount of dCTP even declines along S phase.
Taken together our results suggest that the ToR plays a key role in shaping the genome and provides an appealing mechanism for isochore formation.

## 89. A novel Metabolic Transformation Algorithm predicts perturbations counteracting aging in yeast and mammalian muscle tissue

### Keren Yizhak, Orshay Gabay, Haim Cohen, Eytan Ruppin

Abstract

Disease is classically viewed as a disruption of healthy homeostasis. Here we introduce a novel generic Metabolic Transformation Algorithm (MTA), which identifies perturbations that can transform cellular metabolism from a given disease state back to a desired healthy one. MTA works by integrating gene expression data from the disease and healthy states within a genome-scale metabolic model. The prediction accuracy of MTA is extensively validated using data from known perturbations in Escherichia coli, Saccharomyces cerevisiae and mammalian cell lines. Analyzing gene expression data in aging Saccharomyces cerevisiae, seven novel lifespan-extending metabolic targets predicted by MTA were further tested experimentally. Two of those (a 10-fold increase over their expected frequency), GRE3 and ADH2, were successfully experimentally validated. Analyzing mammalian aging muscle expression data, MTA identifies novel drug targets transforming the metabolic state to that of the young. Its predictions are enriched with human orthologs of known lifespan-extending genes in Saccharomyces cerevisiae and Caenorhabditis elegans and highlight a key inflammatory pathway of eicosanoid metabolism. MTA offers a fundamentally new approach for identifying metabolic drug targets with minimal side effects in a broad span of major metabolically related human disorders, including obesity, neurodegeneration and cancer.

## 90. Uncovering pre-mRNA Splicing Regulation Code in S. cerevisiae Using A Synthetic Intron Library

### Ido Yofe, Tuval Ben Yehezkel, Zohar Zafrir, Tamir Tuller, Ehud Shapiro, Maya Schuldiner

Abstract

RNA splicing is a process in which a pre-mRNA transcript matures and in which introns, intervening non-coding fragments within transcripts, are recognized and removed. It is therefore a crucial regulatory step in determining the expression profile of the cell. As such, splicing efficiency and specificity are used to regulate growth, development and response to external signals. The basic functions and sequences that enable splicing have been previously revealed. However, very little is known about the sequence determinants and the protein components that guide splicing efficiency and specificity. In yeast, only 5% of the genes contain an intron (283 of ~6000), and merely a few contain two or more. In addition, several intron sequences are found to be duplicated.
In order to discover the intron sequence determinants that adjust the splicing process, a synthetic yeast intron library was created. Each of the 245 strains in the library carries a single full-length natural intron (together covering ~87% of introns) inserted into a Yellow Fluorescent Protein (YFP) gene. Since the YFP insertion location, promoter and terminator are identical, differences in YFP expression between strains can be attributed causally to the intron sequence alone and can therefore be used to measure splicing efficiency.
Machine learning and statistical approaches have been employed to understand how the sequence features determine the measured YFP expression levels. Specifically, our analysis demonstrates that in yeast the splicing efficiency is encoded in the GC content and the folding energy of the introns (r = 0.9531; p < 9.15e-09). Thus, we show for the first time that the intron sequences themselves contain information related to their splicing efficiency.
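One of the two sequence features mentioned above, GC content, is easy to compute and to correlate with an expression readout. The sketch below shows only that step, with hypothetical intron sequences and YFP values invented for illustration; folding energy would require an RNA secondary-structure program and is omitted, and the sketch does not reproduce the study's model or its reported correlation.

```python
from math import sqrt


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical introns and made-up YFP readouts (arbitrary units).
introns = ["GUAUGUAAUUUACUAACAUAG", "GUACGUGCGCUACUAACGCAG", "GUAUGUUAAUUACUAACAAAG"]
yfp_levels = [0.8, 0.3, 0.9]

gc_values = [gc_content(intron) for intron in introns]
print(gc_values, pearson_r(gc_values, yfp_levels))
```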

## 91. A micro-well array reveals contact independent suppression of effector CD4 T-cells by regulatory T-cells at short intercellular distances

### Irina Zaretsky, Eric Shifrut, Michal Polonsky, Nir Friedman

Abstract

Regulatory T cells (Tregs) play a significant role in peripheral immunological tolerance by mediating T-cell suppression. Suggested suppression mechanisms range from granzyme-mediated cytolysis, which requires contact between a Treg and a target effector T-cell, to inhibition through the secretion of inhibitory cytokines, or deprivation of IL-2 through its rapid consumption, resulting in reduced cell survival. Previous studies showed that Treg-induced suppression of effector T-cells does not occur in trans-well assays, while it does occur in co-culture of the two cell populations, suggesting an apparent requirement for direct cell contact for effective suppression. However, the trans-well apparatus separates responder T-cells and Tregs by a distance that is about 100 times larger than the diameter of a lymphocyte. This probably exceeds the distance required for effective IL-2 deprivation by Tregs.
To overcome this limitation, we study the necessity for cell contact between regulatory and effector T-cells using a controlled micro-culture system developed in our lab. This system allows for precise and real-time monitoring of cellular responses at the single-cell level at a physiologically realistic separation of a few tens of microns between the cells.
Effector and regulatory T-cells are seeded randomly into an array of micro-wells, each of which holds up to two cells. This architecture creates a wide distribution of intercellular distances in the same culture, ranging from contacting cells in the same micro-well up to a distance of a few hundred microns between the two cell types. We observe that the presence of Tregs in the micro-culture induces effector T-cell death even without contact. Moreover, the probability of effector cell death decreases with distance from the nearest Treg. These results support the existence of a contact-independent suppression mechanism that is limited to short distances due to diffusion. We can further investigate the underlying mechanisms by adding or blocking candidate cytokines.

## 92. Using Computational Biology Methods to Improve Post-silicon Microprocessor Testing

### Ron Zeira, Dmitry Korchemny and Ron Shamir

Abstract

Hardware testing is an expensive process that spans different stages of hardware design and manufacturing, including pre-silicon, post-silicon and production testing. Testing is expensive both in terms of manpower and in computing resources, and it directly affects hardware profitability and time to market. This problem is especially acute for Systems on Chip (SoC), where both manpower and timing constraints are very tight. Therefore, it is important to reduce the total number of tests without sacrificing testing quality.
Smart algorithms are needed to learn the behavior of a large test set. In addition, visualization techniques can provide a bird's-eye view of the total test coverage data.
Our goal is to optimize post-silicon hardware test suites based on coverage metrics and to provide test coverage visualization. We utilize ideas and methods developed in machine learning and bioinformatics, and develop new biology-inspired methods to analyze and visualize post-silicon data. In a different effort, we are exploring combinatorial methods of covering, domination and partition for the same problem.
Mathematically, the results of post-silicon tests can be presented as a matrix whose rows correspond to the tests performed on the chip and whose columns correspond to certain events of interest occurring during the test runs. Each matrix entry is the number of times the event occurred in the test. Such a matrix can then be used to define a similarity measure between tests and to analyze their relations. Graph-based models can also be used to represent and analyze the data.
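
The test-by-event matrix and a similarity measure over its rows can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the test names and counts are hypothetical toy values.

```python
import math

# Hypothetical toy coverage matrix: one row per test, one column per event;
# each entry is the number of times the event occurred during that test.
coverage = {
    "test_a": [3, 0, 1, 0],
    "test_b": [6, 0, 2, 0],
    "test_c": [0, 4, 0, 1],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_table(matrix):
    """Pairwise similarity between all rows (tests) of the coverage matrix."""
    names = sorted(matrix)
    return {(p, q): cosine_similarity(matrix[p], matrix[q])
            for p in names for q in names}

sim = similarity_table(coverage)
for (p, q), value in sorted(sim.items()):
    if p < q:
        print(f"{p} vs {q}: {value:.3f}")
```

In this toy example, `test_a` and `test_b` hit the same events in the same proportions and are maximally similar, while `test_c` shares no events with either of them.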
A rich spectrum of methods has been developed for the analysis of gene expression microarray data, and we adapt them for post-silicon analysis. For example, clustering techniques divide the tests into similarity groups, identifying subsets of tests that cover similar events. The identified groups can then be analyzed by hardware validation engineers in order to identify coverage holes and to improve test suite quality. In addition, groups of similar tests can be investigated for enrichment of certain chip properties, as is done for gene groups and biological properties. Gene expression software tools that combine advanced analysis and visualization can assist in visual comprehension of the post-silicon validation process.
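
The idea of dividing tests into similarity groups can be illustrated with a small sketch. The abstract does not specify a clustering algorithm, so the version below is an assumption chosen for brevity: tests whose event sets have a Jaccard similarity at or above a threshold are linked, and connected groups of linked tests form clusters (a simple single-linkage scheme). All names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of events."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_tests(events_by_test, threshold=0.5):
    """Group tests into clusters by linking pairs whose similarity >= threshold."""
    names = sorted(events_by_test)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if jaccard(events_by_test[p], events_by_test[q]) >= threshold:
                parent[find(p)] = find(q)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())

# Hypothetical toy data: which events each test hit at least once.
events_by_test = {
    "t1": {"e1", "e2", "e3"},
    "t2": {"e1", "e2"},
    "t3": {"e7", "e8"},
    "t4": {"e7", "e8", "e9"},
}
clusters = cluster_tests(events_by_test)
print(clusters)
```

Each resulting group is a candidate set of redundant tests that a validation engineer could inspect. The threshold controls how aggressively tests are merged.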
Test results can be represented as a bipartite graph with edges between tests and the events they hit. Dense subgraphs of such a graph correspond to subsets of tests that hit common events. Decomposing the graph into its densest subgraphs gives a bi-clustering solution. We investigate polynomial algorithms for finding the densest subgraph and tweak the density definition to account for the subgraph separation.
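
A minimal sketch of densest-subgraph search on a test-event bipartite graph is given below. It uses the standard greedy peeling heuristic, which is known to give a 2-approximation for the density |E|/|V|. This is a stand-in chosen for brevity, not necessarily the polynomial algorithm the authors investigate, and it does not include their modified density definition. Node names and edges are hypothetical.

```python
def densest_subgraph_greedy(edges):
    """Greedy peeling: repeatedly remove a minimum-degree vertex and return
    the intermediate vertex set with the highest density |E| / |V|."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(adj)
    best_density = num_edges / len(adj) if adj else 0.0
    while len(adj) > 1:
        # Pick a vertex of minimum degree (ties broken by name for determinism).
        victim = min(adj, key=lambda node: (len(adj[node]), str(node)))
        for nbr in adj.pop(victim):
            adj[nbr].discard(victim)
            num_edges -= 1
        density = num_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
    return best_nodes, best_density

# Hypothetical toy graph: tests t1..t3 all hit events e1..e3 (a dense block),
# while t4 and t5 each hit a single event.
edges = [(t, e) for t in ("t1", "t2", "t3") for e in ("e1", "e2", "e3")]
edges += [("t4", "e4"), ("t5", "e1")]

nodes, density = densest_subgraph_greedy(edges)
print(sorted(nodes), density)
```

Removing the returned block and repeating the search on the remaining graph is one simple way to obtain a sequence of bi-clusters.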
We describe initial results obtained by applying computational biology methods to post-Si test suite optimization and visualization. Though we experimented only with post-silicon test data, most of the developed methods should be applicable with appropriate modifications also to pre-silicon, production, and even to software testing.

## 93. De-Novo prediction of Histone deacetylase 8 non histone substrates based on computational structural modeling

### Lior Zimmerman, Ora Furman

Abstract

Histone deacetylases (HDACs) are a class of enzymes that catalyze the deacetylation of histones in the cell nucleus, subsequently altering the chromatin state. One member of this class, HDAC8, is among the most intriguing: it is involved in skull morphogenesis and in metabolic control of the ERR-alpha/PGC1-alpha transcriptional complex, and it is the only member of the HDAC superfamily that functions as a monomer. Recent and exciting evidence shows that this enzyme can also catalyze the deacetylation of non-histone substrates, for example p53, and, more intriguingly, that its specificity changes as a function of its co-factor (Fe or Zn).

All these interesting properties have led us on a quest to decipher the mechanism of this enzyme's substrate specificity under different conditions. By applying different machine learning techniques, together with the FlexPepBind protocol that was previously developed in our lab to evaluate the binding ability of linear segments based on structural models of the interaction, we hope to be able to predict de novo the binding partners of this unique protein.

## 94. RFMapp: Ribosome Flow Model Application

### Hadas Zur and Tamir Tuller

Abstract

RFMapp is a GUI application based on the RFM (Ribosome Flow Model) that enables the first computationally efficient, large-scale estimation of genes' translation elongation rates and ribosomal density profiles, taking into account the biophysical nature of the translation process. RFMapp is based on the approach previously described by Reuveni et al. Unlike traditional approaches in the field, which mainly rely on the genes' mean codon translation efficiency, the RFM additionally considers the codon order and composition, the cellular tRNA pool, and the ribosomes' size and interactions. Accordingly, it has been shown that the RFM outperforms traditional predictors when analyzing both heterologous and endogenous genes.
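
To give a feel for the kind of model involved, here is a minimal sketch of a ribosome flow simulation. The equations are not given in this abstract. The form used below (an initiation rate, one transition rate per site, and flow between neighbouring sites proportional to the occupancy of the source site and the free fraction of the target site) is the form commonly written in the RFM literature and should be read as an assumption. The rate values are hypothetical, and this is not the RFMapp code: it does not show how rates are derived from codon composition or the tRNA pool.

```python
def rfm_steady_state(init_rate, rates, dt=0.01, steps=20000):
    """Integrate the site occupancies with forward Euler until (near) steady state.

    init_rate: initiation rate into the first site (assumed parameter).
    rates: transition rate out of each site, in order along the mRNA.
    Returns (occupancies, translation_rate), where the translation rate is
    the flow out of the last site.
    """
    n = len(rates)
    x = [0.0] * n  # ribosome occupancy of each site, between 0 and 1
    for _ in range(steps):
        # flows[0] is the flow into site 0; flows[i + 1] is the flow out of site i.
        flows = [init_rate * (1.0 - x[0])]
        for i in range(n - 1):
            flows.append(rates[i] * x[i] * (1.0 - x[i + 1]))
        flows.append(rates[-1] * x[-1])
        x = [x[i] + dt * (flows[i] - flows[i + 1]) for i in range(n)]
    return x, rates[-1] * x[-1]

# Hypothetical example: three sites with a slow middle site.
densities, translation_rate = rfm_steady_state(0.8, [1.0, 0.5, 1.2])
print("occupancy profile:", [round(d, 3) for d in densities])
print("translation rate:", round(translation_rate, 3))
```

In this sketch, the slow middle site acts as a bottleneck, so occupancy builds up at and before it. This shows how codon order, and not only mean codon efficiency, can shape the density profile.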