Abstract

My thesis is focused on the study of Cayley graphs. A Cayley graph is a graph whose vertex set is a group *G* and two vertices *u* and *v* are adjacent if and only if *uv ^{-1}* is an element of a given subset

The first one is a special class of these graphs: complete rotational Cayley graphs; which are Cayley graphs admitting a particular automorphism

The second topic deals with the problem of finding a maximal clique in a given graph; where a clique is a subset of the vertex set such that the induced subgraph is complete. In my case the graph is a Paley graph of square order and I have analyzed some results which construct a maximal clique in the graph. Moreover, I have studied some bounds for the clique number (the size of the maximum clique in the graph), the independent number (the cardinality of the largest coclique in the graph) and the chromatic number (the minimum number of colors required to obtain a proper colouring of the graph) obtained with the eigenvalues of the graph (i.e. the eigenvalues of the adjacency matrix of the graph).

This is the reason why I have studied, as the last main topic, the spectrum of Cayley graphs. In order to obtain information about the spectrum of a Cayley graph, we have to investigate the structure of the group

Formal grammar modeling three-stranded DNA:RNA braids

*Joint work with N. Jonoska, S. Poznanović, M. Riehl and M. Vazquez.*

A formal grammar is a system to generate words; it consists of a set of symbols, partitioned into terminals and non-terminals, and a set of production rules. The production rules specify how to rewrite non-terminal symbols, so that successive applications of those rules yield words formed by only terminals. Adding probabilities to the production rules defines stochastic grammars, which can be used for biological sequence analysis. In this talk, we focus on a “braid grammar” to model R-loops, that are three-stranded structures formed by a DNA:RNA hybrid plus a single strand of DNA, often appearing during transcription. R-loops are described as strings of terminal symbols representing the braiding of the strands in the structure, where each symbol corresponds to a different state of the braided structure. We discuss approaches to develop a stochastic grammar and a probabilistic model for R-loop prediction, as well as refinements of the model by incorporating the effect of DNA topology on R-loop formation.

Graph clustering for RNA secondary structure analysis

*Joint work with H. Du, C. Heitsch, F. Hurley, C.V. Mennicke, B.D. Sullivan and B. Xu*

Unlike DNA, RNA is single stranded and folds via bonds between pairs of complementary nucleotides while it is still being synthesized from DNA. One central problem in molecular biology is understanding the specific shape into which an RNA molecule folds, as its shape encodes functional information. We will focus on RNA secondary structure in this talk, that is the 2D arrangement of the final RNA configuration. For an RNA sequence, we will consider a representative set of secondary structures and discuss combinatorial methods to mine the structural information from the given ensemble. In particular, we will present a graph algorithms approach based on dissimilarity scores and community detection.

Combinatorial insights into biomolecular interactions

In this talk, we look at combinatorial methods to describe biomolecular processes, namely DNA self-assembly and DNA:RNA interactions. In the first part, we focus on self-assembling graph-like structures using branched junction DNA molecules which are star-shaped molecules that join together through adhesion sites at the end of their arms. We describe a combinatorial representation of these molecules as vertices with labeled half-edges, and consider the problem of optimally building a target graph under different laboratory settings. We show how this question give rises to new graph invariants and how these invariants can be studied using edge-colorings and graph decompositions. In the remainder of the talk, we use formal grammars to describe R-loops, three-stranded structures formed by a DNA:RNA hybrid and a single strand of DNA. A formal grammar is a system to generate words; it consists of a set of symbols, terminals and non-terminals, and a set of production rules. The production rules specify how to rewrite non-terminal symbols to obtain words formed by only terminals. We introduce a “braid grammar” to model R-loops as strings of terminal symbols representing different states of the braided structure and discuss approaches to add probabilities to the production rules, thereby yielding a stochastic grammar and a probabilistic model for R-loop prediction.

Combinatorial questions arising from biomolecular processes

We will look at combinatorial problems that describe biomolecular processes, namely DNA self-assembly and DNA:RNA interactions. In the first part, we show how DNA self-assembly processes give rise to new graph invariants and how these invariants are related to known chromatic parameters. We also discuss how these invariants help in building molecules that can optimally assemble into a target structure. In the second part of the talk, we use formal grammars to model DNA:RNA interactions and the formation of R-loops. By identifying patterns specific to R-loops using experimental data, we can expand the current grammar into a probabilistic model to predict locations favorable for R-loops in a given DNA sequence.

Insertions on double occurrence words motivated by DNA rearrangement

*Joint work with D. A. Cruz, N. Jonoska, L. Nabergall and M. Saito*

A double occurrence word (DOW) is a word in which every symbol appears exactly twice; two DOWs are equivalent if one is a symbol-to-symbol image of the other. We consider the so-called repeat pattern (αα) and the return pattern (αα^{R}), with gaps allowed between the α’s. DOWs and repeat/return words have been used in studies of DNA rearrangement in certain species of ciliates. Motivated by the genome architecture of the ciliate *Oxytricha trifallax*, we introduce a notion of inserting repeat/return words into DOWs and study how two distinct insertions into the same word can produce equivalent DOWs. Given a DOW *w*, we characterize the structure of *w* which allows two distinct insertions to yield equivalent DOWs. This characterization depends on the locations of the insertions and on the length of the inserted repeat/return words and implies that when one inserted word is a repeat word and the other is a return word, then both words must be trivial (i.e., have only one symbol). The characterization also introduces a method to generate families of words recursively.

Graph-theoretical questions arising from biomolecular processes

We will look at graph-theoretical problems that describe biomolecular self-assembly processes and DNA:RNA interactions. In the first part, we show how DNA self-assembly processes give rise to new graph invariants and how these invariants are related to known chromatic parameters. We also discuss how these invariants help in building molecules that can assemble into a target structure. In the second part of the talk, we use directed graphs to analyze sequence variants that appear during RNA-guided DNA repair processes. Understanding the structure of these graphs requires the development of new graph-theoretical approaches, including topological methods.

Homology of directed graphs with application to DNA recombination

*Joint work with L. Fajardo Gómez, N. Jonoska and M. Saito*

A double-occurence word (DOW) is a word in which every symbol appears exactly twice. We consider the so called repeat patterns (αα) and return patterns (αα^{R}), with gaps allowed between the α’s; these patterns generalize square and palindromic factors of DOWs, respectively. In the context of genomics, pattern deletions on DOWs have been used to study DNA recombination in certain species of ciliates. We model these reduction processes with a directed graph where vertices are DOWs, and an edge from *w* to *w*‘ exists if *w*‘ is obtained from *w* through a pattern deletion. On this graph, we consider the cell complex consisting of products of directed simplices and define a new boundary operator. This allows the computation of homology groups, which will help in identifying rearrangement pathways that may be of interest.

Mathematical models for describing molecular self-assembly

*Joint work with N. Jonoska*

We present several mathematical models for describing molecular building blocks, called rigid tiles, that assemble in larger nanostructures. Rigid tiles can be seen as *k*-arm branch junction structures that join together by annealing to each other through the affinity of their arm-ends. Such a *k*-arm rigid tile is described with a set of *k* vectors joined at the origin that can be translated or rotated during the assembly. Besides the geometric shape of the building blocks, the models can take into account the geometry of the arm-ends joining together. We show distinctions between four models by characterizing types of structures that can be assembled from rigid tiles.

Formalism and design strategies for DNA tile assembly

*Joint work with A. Cook, A. Houlihan, R. Rouleau, N. C. Seeman, G. Pangborn and J. Ellis-Monaghan*

New laboratory techniques have been developed using the Watson- Crick complementarity properties of DNA strands to achieve the self-assembly of nanostructures, particularly polyhedra. For all of these methods, an essential step is designing the component molecular building blocks. Combinatorics is a natural tool to address these new design strategy problems.

We present a mathematical model that captures the geometric constraints of rigid tiles, which are the branched junction molecule building blocks for tile-based DNA self-assembly. We also provide DNA self-assembly strategies to efficiently construct Platonic and Archimedean 3-regular polyhedral skeletons using this model.

Abstract

It is known that some sequences of four elements, that are the nitrogenous bases A, C, G, T (namely adenine, cytosine, guanine and thymine) from which DNA is assembled, are repeated in the human genome many times. Because of the structure of DNA, different sequences that are equivalent up to cyclic rotation and reverse complementation, determine the same repeated pattern. Bell et al. [1] were able to derive a formula for the number of equivalence classes of primitive patterns of DNA sequences of length *n* using the bijective correspondence between words on two letters of length *2n* and words of length *n* on four letters. A further derivation of the formula for the classification of DNA sequences of length *n* is provided in [2] by using properties of cycle structures related to fixed points of the hypercube *Q _{n}*.

In this thesis a cyclic binary string of length

The second part of this work deals with techniques that use the Watson-Crick complementarity properties of DNA strands to self-assembly nanostructures, particularly polyhedra. For all of these methods, an essential step is designing the component molecular building blocks; thus there is an increased necessity for a mathematical approach to these design strategy problems.

We provide a mathematical model that captures the geometric constraints of rigid tiles, which are the branched junction molecule building blocks for tile-based DNA self-assembly. We also describe graph theoretical formalism for the DNA origami method. Using the aforementioned techniques we provide DNA self-assembly strategies to efficiently construct Platonic and Archimedean 3-regular polyhedral skeletons.

[1] G. I. Bell, R. L. Bivins, J. D. Louck , N. Metropolis, M. L. Stein, Properties of words on four letters from those on two letters with an application to DNA sequences. Advances in Applied Mathematics **14**(3), 348–367 (1993)

[2] W. Y. C. Chen, J. D. Louck, Necklaces, MSS sequences, and DNA sequences. Advances in Applied Mathematics **18**(1), 18–32 (1997)

Abstract

Tra gli argomenti considerati ormai classici nell’ambito della teoria dei grafi rientrano senz’altro i problemi di colorazione e i problemi di decomposizione.

Per quanto riguarda i primi si parla addirittura di teoria cromatica dei grafi, distinguendo in genere se si considerano colorazioni sui vertici o sugli spigoli. Nei problemi di decomposizione si studia la possibilità di spezzare un grafo assegnato in “pezzi” dotati di particolari proprietà. Di fatto si può pensare che le due tipologie di problemi siano facce diverse della stessa medaglia. In altre parole per una assegnata colorazione di un grafo si può andare ad esaminare la struttura costituita dai sottografi determinati dagli oggetti (vertici o spigoli) che hanno ricevuto lo stesso colore. Viceversa, data una decomposizione di un grafo, si può pensare che i sottografi della decomposizione in esame siano oggetti di colore assegnato in una colorazione di qualche tipo (sui vertici o sugli spigoli).

Per quanto riguarda la colorazione sugli spigoli, un risultato fondamentale si deve a Vizing [4] il quale dimostrò una relazione che lega il numero di colori necessari per determinare una colorazione con la valenza massima dei vertici del grafo in esame. Questo teorema consente di suddividere i grafi in due categorie: i grafi di Classe 1 e di Classe 2. I primi sono tali per cui il numero di colori necessari è pari alla valenza massima dei vertici, i secondi invece necessitano di un colore ulteriore rispetto al valore del grado massimo dei vertici del grafo.

Di notevole importanza è il concetto di 1-fattorizzazione, con il quale si intende la possibilità di decomporre il grafo dato in sottografi ricoprenti regolari di grado 1 tali che, a due a due, siano privi di archi comuni e la loro unione sia il grafo in esame. Il legame tra 1-fattorizzazione e colorazione sugli spigoli emerge in particolare quando si analizzano i grafi regolari; infatti per questi ultimi essere di Classe 1 equivale ad essere 1-fattorizzabili. Affinchè un grafo sia 1-fattorizzabile è necessario, come verrà meglio spiegato nel Cap. 3, che sia regolare e abbia un numero pari di vertici.

Non tutti i grafi regolari però risultano 1-fattorizzabili, come si vede analizzando, ad esempio, il grafo di Petersen. Nasce quindi il problema di determinare una condizione sufficiente che indichi se il grafo è 1-fattorizzabile (e quindi di Classe 1). In particolare è stata sviluppata l’idea che se un grafo regolare ha valenza “abbastanza” alta, allora è 1-fattorizzabile.

Qual’è un valore adeguato per la soglia minima della valenza? È in questo contesto che viene formulata la cosiddetta “congettura della 1-fattorizzazione”, che a tutt’oggi non ha ancora un controesempio:

se *G* è un grafo regolare di grado *d* su *2n* vertici tale che

• *d ≥ n – 1*, se *n* è pari

• *d ≥ n,*, se *n* è dispari

allora *G* è 1-fattorizzabile. Nel tempo si sono dimostrati vari risultati che tendono ad abbassare la soglia, avvicinandosi sempre più a quella della congettura. In questa tesi verranno riportati i risultati relativi ai grafi regolari di grado *d = 2n – 1, d = 2n – 2, d = 2n – 3, d = 2n – 4* e *d = 2n – 5* presenti in [3] e in [1].

Verranno presentati anche teoremi dovuti a Chetwynd ed Hilton nei quali la congettura in esame è stata dimostrata per *d* ≥ 6/7 |V (G)|∼0.857 |V (G)| in [1] e d ≥ 1/2(√7 – 1)|V (G)|∼ 0.823 |V (G)| in [2]. Infine verranno elencati alcuni risultati recenti che migliorano la soglia minima che deve soddisfare la valenza del grafo affinchè quest’ultimo sia 1-fattorizzabile.

[1] A. G. Chetwynd, A. J. W. Hilton, Regular graphs of high degree are 1-fatorizable. Proc. London Math. Soc. **3**(2), 193-206 (1985)

[2] A. G. Chetwynd, A. J. W. Hilton, 1-factorizing regular graphs of high degree-an improved buond. Discrete Mathematics **75**, 103-112 (1989)

[3] W. D. Wallis, One-Factorizations (Kluwer Academic Publishers, Dordrecht, 1997)

[4] V. G. Vizing, On an estimate of the chromatic class of *p*-graph [Russian]. Discretyni Analiz **3**, 25-30 (1964)

Abstract

This work deals with the concept of Cayley graphs and related properties. A Cayley graph is a graph whose vertex set is a group *G* and two vertices *u* and *v* are adjacent if and only if *uv ^{−}*

The first one is a special class of these graphs: complete rotational Cayley graphs; which are Cayley graphs admitting a particular automorphism ω such that, for some ordering of

The second topic deals with what is generally known as the maximum clique problem, which asks for a clique of maximum cardinality; where a clique is a subset of the vertex set such that the induced subgraph is complete. We have distinguished between maximal clique (i.e. a clique that is not a proper subset of any other clique) and maximum clique (i.e. a maximal clique that has the maximum cardinality) and we have analyzed some results that construct a maximal clique in the Cayley graph of square order. Moreover, we have studied some bounds for the clique number (the size of the maximum clique in the graph), the independent number (the cardinality of the largest coclique in the graph) and the chromatic number (the minimum number of colors required to obtain a proper coloring of the graph) obtained with the eigenvalues of the graph (i.e. the eigenvalues of the adjacency matrix of the graph).

This is the reason why we have introduced, as the last main topic, the spectrum of Cayley graphs. In order to obtain information about the spectrum of a Cayley graph, we have to investigate the structure of the group

In the appendices there are definitions and main properties about presentation of finite groups and representation theory, two subjects that are foundamental for studying the concepts presented in this work.

Request thesis

Richiedi tesi