Compute Ontario Research Day 2015

May 21
Conestoga College Institute of Technology and Advanced Learning

Poster Presenters

Ahmed Elbassiouny

Research Assistant, University of Toronto

Genomic Approaches for Understanding Electric Signals in Fishes

Electric fishes use electric organs to produce weak electric signals for navigation and communication. Different species of electric fishes produce different types of electric signals. However, the molecular mechanisms underlying the production and variation of electric signals remain poorly understood. Here, we describe the use of high performance computing resources to analyze high-throughput data from Next Generation RNA-sequencing experiments of electric organs.

Marcial Garbanzo-Salas

doctorate degree student, Western University

Atmospheric Studies with Radar Data and Simulations; HPC Applications in Radar Processing

HPC is widely used in physics. Radar applications provide a wide field of study where HPC techniques can be applied. Interferometry is used in radar processing to gather information about scatterers in the sky, and wide-beam radars provide a considerable amount of information. HPC techniques greatly improve the analysis of interferometric data and general circulation wind approximation. Finding large numbers of probable scatterers, discriminating between them, and solving equations for wind is no small task and is better approached with HPC techniques. Another area of radar where HPC is heavily used is atmospheric simulation. Large Eddy Simulations (LES) are used to better comprehend atmospheric motions, generation of turbulence and dissipation scales. In this presentation a simulation within a simulation is used to obtain radar back-scattering information from a virtual atmosphere. The results of the interferometric processing using HPC are also discussed.

Ioannis Haranas

Adjunct Professor, Wilfrid Laurier University

Perturbations Due to Dust in Mars Orbiting Satellites

In this paper we calculate the effect of atmospheric dust on the orbital elements of a satellite. Dust storms that originate on the Martian surface may evolve into global storms in the atmosphere that can last for months and can affect low orbiter and lander missions. We model the dust as a drag force proportional to the square of the velocity acting on a satellite, and we derive an appropriate disturbing function that accounts for the effect of dust on the orbit, using a Lagrangian formulation. A first-order perturbation solution of Lagrange’s planetary equations of motion indicates that a local dust storm cloud with a possible density of 8.323 × 10^(-10) kg/m^3 at an altitude of 100 km changes the orbital semimajor axis of a 1000 kg satellite by up to -0.142 m/day. Regional dust storms of the same density may change the semimajor axis by up to -0.418 m/day. Other orbital elements are also affected, but to a lesser extent. When dust is taken into account in more detailed efforts to model the Martian gravity field, high-performance supercomputing becomes important.
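As a rough check on the sign and scaling of the drift, the textbook result for a circular orbit under quadratic drag can be written down directly. This is only a sketch: it uses Gauss's form of the perturbation equations rather than the disturbing-function approach of the abstract, and the drag coefficient C_D and cross-sectional area A are assumed symbols that the abstract does not specify.

```latex
% Quadratic drag acting along the velocity direction (tangential acceleration T)
T = -\tfrac{1}{2}\, C_D \frac{A}{m}\, \rho\, v^{2}, \qquad v^{2} = \frac{\mu}{a}

% Gauss's equation for the semimajor axis at zero eccentricity, with n = \sqrt{\mu / a^{3}}
\frac{da}{dt} = \frac{2}{n}\, T = -\, C_D \frac{A}{m}\, \rho\, \sqrt{\mu a}
```

The rate is negative and linear in the dust density, so the semimajor axis shrinks only while the satellite is inside the dust cloud. Daily figures also depend on how much of each orbit the storm covers and on the satellite's area, neither of which is stated here.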

Ricardo Harripaul

doctorate degree student, University of Toronto

Mapping Loci and Genes using a Hidden Markov Model for Bipolar Affective Disorder in Consanguineous Families

Bipolar Disorder (BD) is a psychiatric disorder characterized by transitions between depression and mania, a high rate of suicide (6% over age 20) and self-harm (30-40%). This debilitating condition has no known cause, and both genetic and environmental factors contribute to its complex phenotype. We hypothesize that in rare cases, autosomal recessive mutations contribute to BD. To identify these genetic loci, 34 consanguineous Iranian families were genotyped with Affymetrix 5.0 Single Nucleotide Polymorphism microarray chips. This genotype information was analysed with the FSuite analysis pipeline and dCHIP to identify homozygosity-by-descent (HBD) regions and to perform an HBD genome-wide association study to identify novel recessive risk variants. In addition, we looked for Copy Number Variations (CNVs). Through these approaches 43 large HBD regions were identified, including a 56 Mb region on chromosome 8 (harboring candidate genes such as IMPA1, IMPAD1), a 10 Mb region on chromosome 17 (including SLC6A4) and a 7 Mb region on 5q35-2-qter including genes DRD1 and GRM6. Whole Exome and Sanger sequencing were used to search for homozygous coding mutations within these regions. Large runs of homozygosity have been identified in BD probands, including a 400 kb locus that traverses the GRIK6 glutamate receptor gene. We have also identified 48 CNVs of interest that may disrupt candidate genes such as SYN3, SLC39A11 and S100A10. Whole exome sequencing has been applied to all families. Rare variants have been identified in more than one family for a number of genes. For instance, for ABCA13, in separate families, one homozygous nonsense and one homozygous non-synonymous variant were identified. Potential implications of these findings for the genetics of bipolar disorder will be discussed.

Thomas Hemmy

master's degree student, Wilfrid Laurier University

Investigating the Impact of Horizontal Gene Transfer on Metabolic Conservation within Bacteria

Through the process of horizontal gene transfer, bacteria are capable of acquiring new traits and capabilities never possessed by their parents, deviating from the widely accepted view that evolution occurs in a strictly vertical fashion. Despite evidence to the contrary, the idea that bacteria inherit their metabolic capabilities directly from their parents remains quite popular and problematic. Current studies assume that bacteria possess the same metabolisms as other members of their species, leading to the popular practice of using marker genes to predict the organisms present in environmental samples and the metabolic capabilities that these bacteria possess. In this work, completely sequenced bacterial genomes have been functionally annotated in an effort to quantify horizontal gene transfer amongst distantly related bacteria. Using several functional annotation databases, we have searched for metabolic functions shared between distant species and for the development of divergent functions within members of the same species. Preliminary results suggest that little metabolic difference can be found between members of even distantly related species; however, this seems to be due to a lack of functional annotation for novel metabolic functions. It appears that functional annotations exist primarily for metabolisms that are common across bacterial phyla.

Ilias Kotsireas

Professor, Wilfrid Laurier University

D-optimal Matrices

D-optimal matrices are square matrices of even order with elements from {-1,+1} that have maximal determinant. Finding D-optimal matrices of various orders is a very challenging combinatorial problem. One of the most fertile and successful methods to find D-optimal matrices has been to look for a special kind called circulant D-optimal matrices. We will present circulant D-optimal matrices and their properties in detail. We will also present some classical algorithms used to find circulant D-optimal matrices.
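To make the circulant idea concrete, here is a minimal sketch of one well-known construction: two circulant {-1,+1} matrices A and B of order v with AA^T + BB^T = (2v-2)I + 2J are assembled into a matrix of order 2v. The first rows used below (v = 3) and all function names are illustrative choices, not taken from the poster; the comparison value is Ehlich's determinant bound for orders n ≡ 2 (mod 4).

```python
def circulant(first_row):
    """Circulant matrix whose i-th row is first_row shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def two_circulant(a_row, b_row):
    """Assemble [[A, B], [-B^T, A^T]] from two circulant blocks."""
    a, b = circulant(a_row), circulant(b_row)
    at, bt = transpose(a), transpose(b)
    top = [ra + rb for ra, rb in zip(a, b)]
    bottom = [[-x for x in rb] + ra for rb, ra in zip(bt, at)]
    return top + bottom

def det_int(matrix):
    """Exact integer determinant via fraction-free (Bareiss) elimination."""
    a = [row[:] for row in matrix]
    n, sign, prev = len(a), 1, 1
    for k in range(n - 1):
        if a[k][k] == 0:
            swap = next((i for i in range(k + 1, n) if a[i][k] != 0), None)
            if swap is None:
                return 0
            a[k], a[swap] = a[swap], a[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]

def ehlich_bound(n):
    """Upper bound on |det| for {-1,+1} matrices of order n = 2 (mod 4)."""
    return (2 * n - 2) * (n - 2) ** ((n - 2) // 2)

# Illustrative first rows for v = 3; their periodic autocorrelations sum to 2.
D = two_circulant([1, 1, 1], [-1, 1, 1])
```

For this small case the absolute determinant of `D` equals `ehlich_bound(6)`, so the matrix is D-optimal. For larger orders the difficulty lies in finding suitable first rows, which is where the search algorithms come in.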

Efficient Algorithms for Matching Problems

We will describe a class of matching problems that arise in combinatorial design theory. We will outline several different approaches to solve these problems. These approaches give rise to algorithms that are for the most part amenable to parallelization, even though it is not always easy to identify an optimal parallelization strategy.

Yuanhao Lai

master's degree student, Western University

Computing Numerical Distribution Functions for Periodicity Tests in Microarray Time Series

Identifying periodicity in microarray time series data has become increasingly important in microarray technology. One available method is Fisher's exact g-test, which can be used to test for periodicity in time series at Fourier frequencies. However, this method might fail because the time series are not guaranteed to have Fourier frequencies. As an alternative, we propose fitting a four-parameter harmonic regression to each gene and using the log-likelihood ratio (LLR) as the test statistic of the frequency. A response surface regression approach is implemented with the open source software R to compute the inverse cumulative distribution functions of the statistic and hence obtain the p-value of the test. In addition, the computation is dramatically sped up by utilizing the R parallel computing package Rmpi and SHARCNET.
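For context, the baseline that the abstract contrasts against can be sketched in a few lines. The following is a minimal, standard-library illustration of Fisher's g-test (the largest periodogram ordinate at the Fourier frequencies divided by their sum, with the classical exact p-value formula). It is not the proposed LLR method, and the function names are hypothetical.

```python
import cmath
import math

def periodogram(xs):
    """Periodogram ordinates at the Fourier frequencies k/n, k = 1..(n-1)//2."""
    n = len(xs)
    mean = sum(xs) / n
    ordinates = []
    for k in range(1, (n - 1) // 2 + 1):
        s = sum((x - mean) * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(xs))
        ordinates.append(abs(s) ** 2 / n)
    return ordinates

def fisher_g_test(xs):
    """Return (g, p_value) for Fisher's exact g-test.

    Assumes the series is not constant, so the periodogram sum is non-zero.
    """
    spec = periodogram(xs)
    m = len(spec)
    g = max(spec) / sum(spec)
    # P(G > g) = sum_{j=1}^{floor(1/g)} (-1)^(j-1) C(m, j) (1 - j g)^(m-1)
    p = 0.0
    for j in range(1, min(m, int(1 / g)) + 1):
        p += (-1) ** (j - 1) * math.comb(m, j) * (1 - j * g) ** (m - 1)
    return g, min(max(p, 0.0), 1.0)

# A sinusoid that completes exactly 3 cycles in 24 samples (a Fourier frequency).
series = [math.cos(2 * math.pi * 3 * t / 24) for t in range(24)]
g_stat, p_value = fisher_g_test(series)
```

When the true period does not divide the series length, the signal power leaks across several ordinates and g drops, which is the failure mode the abstract refers to.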

Ruipeng Lu

doctorate degree student, Western University

Discovery of Primary, Cofactor, and Novel Transcription Factor Binding Site Motifs by Recursive, Thresholded Entropy Minimization

Transcription factors regulate gene expression by binding to related DNA sequences of target genes. Cooperative interactions between multiple, bound factors can repress or activate expression of these genes. We apply Shannon information theory to discover conserved motifs recognized by these factors in ChIP-Seq data from the Encyclopedia of DNA Elements. The data consist of thousands of sequenced genomic fragments that have been co-immunoprecipitated with a particular transcription factor. Motifs are built with Bipad, a C++ program that applies Monte Carlo-based entropy minimization to search multiple alignment space for homogeneous or bipartite models. These models can be used to determine the information content (Ri) or binding affinity of functional binding sites and to identify mutated sites. We built accurate information models for 168 transcription factors from unaligned sequences of ChIP-Seq fragments, biological and technical replicates, and from different cell lines. Resulting models were compared between replicates and immunoprecipitated sequences from other cell lines, and with previously determined motifs. This process was then iterated to discover additional conserved sequence patterns in the same data. The original motif was masked prior to derivation of a second model by entropy minimization. Those models consisting of low-complexity noise patterns were also thresholded to eliminate low read abundance ChIP-Seq peaks, and then reanalyzed with Bipad. Three quality control measures were used to evaluate the accuracy of these models: 1) determining the Euclidean distance between the current information weight matrix and previously published motifs, 2) evaluating the linearity of Ri vs binding energy to distinguish between correct and noisy, low complexity motifs, and 3) validation of predicted binding sites with experimentally proven sites in known target genes.
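The notion of individual information (Ri) can be illustrated with a small sketch. It assumes an already aligned set of sites, equiprobable background bases (2 bits per position), and no small-sample correction; it is not Bipad, which searches for the alignment itself, and the function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def information_weight_matrix(sites):
    """Per-position weights 2 + log2(frequency) for each base, in bits."""
    n = len(sites)
    matrix = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        column = {}
        for base in BASES:
            freq = counts[base] / n
            # An unobserved base gets -inf here; real tools use a correction.
            column[base] = 2.0 + math.log2(freq) if freq > 0 else float("-inf")
        matrix.append(column)
    return matrix

def individual_information(matrix, site):
    """Ri of one site: the sum of the weights of its bases."""
    return sum(column[base] for column, base in zip(matrix, site))

def average_information(sites):
    """Average information of the model: sum over positions of 2 - entropy."""
    n = len(sites)
    total = 0.0
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total

# Toy aligned sites, invented for illustration only.
sites = ["ACGT", "ACGT", "ACGA", "ACGT"]
matrix = information_weight_matrix(sites)
ri_values = [individual_information(matrix, s) for s in sites]
```

The mean of `ri_values` equals `average_information(sites)`, which is the identity that ties the per-site scores to the information content of the whole model.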

Anindya Mitra

master's degree student, University of Waterloo

Electro-optical Properties of In-plane Switching Mode Liquid Crystal Light Shutters

Liquid crystal displays (LCDs) have been ubiquitous in electronic devices over the past several decades. The first widely commercialized type of LCD technology, the twisted nematic (TN) mode, suffers from major performance drawbacks, including poor viewing angle characteristics and the requirement for transparent electrodes on both sides of the display, leading to increased cost and reduced transparency. Modern LCD modes such as the In-Plane Switching (IPS) and the Fringe Field Switching (FFS) modes have been developed to address these issues and are used in the majority of modern LCDs. Unlike the TN mode LCD, the IPS mode LCD requires transparent electrodes on only one side of the display. Interdigitated electrodes are used to induce an electric field which results in an electro-optical response of the LC layer within the IPS cell. One of the drawbacks is that the electric field in the IPS mode varies spatially in both strength and orientation, unlike in the TN mode. The present work involves the study of the electro-optical properties of an IPS LC cell using a simulation-based approach. The Landau-de Gennes model for the nematic LC phase is used to study the response of a nematic LC domain under IPS conditions. Full three-dimensional transient simulations are performed which yield the response of the nematic domain in the form of an alignment tensor field. These fields are then used to determine the resulting dielectric tensor field of the nematic domain, which is post-processed to yield the optical response of the IPS cell in the presence of a backlight. Design parameters such as electrode spacing, LC domain thickness, electric field strength, and temperature of operation are varied in order to determine qualitative trends in the electro-optical performance of the IPS configuration. Preliminary results are presented which show good agreement between simulation predictions and experimental data.

Brandon Seo

master's degree student, University of Waterloo

The Effect of Geometry and Grain Size on the Deformation of Nanocrystalline Ni Nanopillars

Metal nanopillars and nanowires have unique mechanical properties that have drawn significant attention in recent years. Plasticity in nanopillars and nanowires becomes more complicated than in bulk nanocrystalline metals due to the possible contribution from the external free surfaces or grain boundaries. In order to study the deformation mechanisms of nanocrystalline metals, specifically nanocrystalline nickel (Ni), a model system of Ni nanocrystalline nanopillars with different grain sizes and geometries was constructed and tested under tensile deformation by molecular dynamics simulations using LAMMPS.

Tanyakarn Treeratanaphitak

master's degree student, University of Waterloo

Adaptive Mesh Refinement and Coarsening in CFD Simulations of Multiphase Flow

Multiphase flow systems are commonly found in industrial chemical processes. Multiphase flow systems can range from a simple phase separator to a fluidized bed reactor. These systems are highly complicated and one of the possible ways to study multiphase flow behaviour is through the use of computational fluid dynamics (CFD) simulations. CFD simulations have been extensively used to study multiphase flow systems and are a cost-effective way to isolate parameters of interest to further explore through experiments. However, in order to obtain usable information from the CFD simulations, the simulations must be numerically and physically accurate. The former depends on factors such as discretization scheme, mesh-independence, numerical stability, et cetera, while the latter depends on the formulation of the governing equations and closure laws. The focus of this study is on the numerical aspect, specifically, mesh-independence. Mesh-independence is extremely important in any kind of numerical simulation. If the mesh is too coarse, the physical details of the system are not fully captured. Any conclusions derived from a mesh-dependent solution will bear no physical meaning. To determine whether the solution is indeed mesh-independent, one can refine the mesh until the final solution remains unchanged. However, this approach is considered impractical in transient simulations. The alternative approach is to locally refine and/or coarsen the mesh based on a user-specified tolerance within each time step. This approach is also known as adaptive mesh refinement (AMR). With AMR, the mesh is continually changing at each time step to adapt to the tolerance. In areas of lower error (with respect to the tolerance), the mesh can coarsen to reduce the computational load. At the same time, areas of high error are refined to achieve the desired error tolerance.
In the presented study, two-phase (gas-liquid) flow through a vertical cylinder is simulated using the multiphase, inter-dispersed Euler-Euler method with AMR in OpenFOAM. Simulation results indicate that the flux error is higher in areas with phase segregation and that the mesh was adapted accordingly.
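The refine/coarsen logic described above can be illustrated with a one-dimensional toy. This is only a sketch under stated assumptions: the error indicator (the jump of a field across a cell), the coarsening factor, and all names are hypothetical, and the code is unrelated to OpenFOAM's own AMR implementation.

```python
import math

def indicator(field, left, right):
    """Hypothetical error indicator: variation of the field across one cell."""
    return abs(field(right) - field(left))

def adapt(edges, field, tol, coarsen_factor=0.25):
    """One adaptation pass over a 1D mesh given as a sorted list of cell edges."""
    # Coarsen: drop an interior edge if the merged cell stays well below tol.
    kept = [edges[0]]
    for i in range(1, len(edges) - 1):
        if indicator(field, kept[-1], edges[i + 1]) >= coarsen_factor * tol:
            kept.append(edges[i])
    kept.append(edges[-1])
    # Refine: split every cell whose indicator exceeds the tolerance.
    refined = [kept[0]]
    for left, right in zip(kept, kept[1:]):
        if indicator(field, left, right) > tol:
            refined.append(0.5 * (left + right))
        refined.append(right)
    return refined

def front(x):
    """A sharp front at x = 0.5, standing in for a phase interface."""
    return math.tanh(20.0 * (x - 0.5))

mesh = [i / 10 for i in range(11)]  # uniform starting mesh on [0, 1]
tolerance = 0.2
for _ in range(10):                 # in a solver this would run every time step
    mesh = adapt(mesh, front, tolerance)
widths = [right - left for left, right in zip(mesh, mesh[1:])]
```

After a few passes the cells cluster around the front and stay coarse elsewhere. Keeping the coarsening threshold well below the refinement tolerance prevents a cell from being merged and split again on alternate passes.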

Xingyu Wang

undergraduate degree student, University of Waterloo

On m-out-of-n Bootstrap for Testing Symmetry About an Unknown Median – Simple Data-driven Approach

Testing whether a distribution of a particular variable is symmetric can be a crucial problem in many applications ranging from ecology studies and environmental monitoring to implementation of economic policies and analysis of business development. While numerous authors have devised symmetry tests, the problem of deriving robust and computationally efficient methods for assessing symmetry about an unknown median still remains an active area of research and attracts substantial attention. We propose to employ an m-out-of-n bootstrap and develop a new data-driven and distribution-free test for symmetry about an unknown median. We assess the finite-sample performance of the test via a Monte Carlo study, and illustrate its application using water quality data. This work was made possible by the facilities of SHARCNET.
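A generic m-out-of-n bootstrap test can be sketched as follows. The abstract does not specify the test statistic or how m is chosen, so this sketch assumes a standardized mean-minus-median statistic, resamples m points from a copy of the data symmetrized about its sample median, and takes m as a plain parameter. All names are hypothetical.

```python
import math
import random
import statistics

def asymmetry_stat(xs):
    """Assumed statistic: sqrt(n) * (mean - median) / standard deviation."""
    sd = statistics.stdev(xs)
    if sd == 0:
        return 0.0
    return math.sqrt(len(xs)) * (statistics.fmean(xs) - statistics.median(xs)) / sd

def symmetry_pvalue(xs, m, n_boot=500, seed=0):
    """Bootstrap p-value for symmetry about an unknown median.

    Each resample draws m < n points, with replacement, from a pool that is
    symmetric by construction, which imitates the null hypothesis.
    """
    rng = random.Random(seed)
    med = statistics.median(xs)
    pool = [x - med for x in xs] + [med - x for x in xs]
    t_obs = abs(asymmetry_stat(xs))
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(pool, k=m)
        if abs(asymmetry_stat(resample)) >= t_obs:
            hits += 1
    return (hits + 1) / (n_boot + 1)

n = 200
# Deterministic toy samples: evenly spaced points, and exponential quantiles.
symmetric_sample = [i - (n - 1) / 2 for i in range(n)]
skewed_sample = [-math.log(1 - (i + 0.5) / n) for i in range(n)]
p_symmetric = symmetry_pvalue(symmetric_sample, m=50)
p_skewed = symmetry_pvalue(skewed_sample, m=50)
```

Each bootstrap replicate is independent of the others, so the loop parallelizes trivially, which is what makes a Monte Carlo study of such a test a natural fit for a cluster.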

Shaobo Wei

master's degree student, Laurentian University

Somewhat Homomorphic Encryption Scheme for Secure Range Query Process in a Cloud Environment

Recently, with the development of cloud computing, many service models based on it have appeared, such as "Infrastructure as a Service" (IaaS), "Platform as a Service" (PaaS), and "Software as a Service" (SaaS). There is also one called "Database as a Service". This service model lets users store, manage, and access their data in a cloud database. However, because of security concerns, the cloud database must be fully secured. The corresponding research area of cloud computing is called "Cloud Security". One of the problems is that it is difficult to execute queries on encrypted data in a cloud database without any information leakage. This research proposes a secure range query process based on a somewhat homomorphic encryption scheme that leaks no sensitive information. The data stored in the cloud database are integers, which are encrypted in their binary forms bit by bit. A homomorphic "greater-than" algorithm is used in the process to compare two integers. The efficiency and security of the process, and the maximum noise it can tolerate, are analyzed. Parameter settings of the process are also analyzed. Some experiments were performed to test the practicability of the secure range query process with some realistic parameter settings. Since there are many very large integers involved in the process, normal personal computers cannot compute them efficiently. The computing capabilities of SHARCNET were used in the experiments for computations involving large integers, and, at the same time, SHARCNET was also regarded as the cloud service provider of the cloud environment in the experiments.
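The bitwise comparison can be sketched as a Boolean circuit that uses only addition and multiplication modulo 2, the two operations a bit-level homomorphic scheme typically provides. In this sketch plaintext bits stand in for ciphertexts, so it shows only the circuit logic, not encryption or noise handling. The abstract does not give its actual algorithm, and all names are hypothetical.

```python
def to_bits(value, width):
    """Little-endian bit list, mirroring a bit-by-bit encoding of an integer."""
    return [(value >> i) & 1 for i in range(width)]

def bit_xor(a, b):
    return (a + b) % 2   # homomorphic addition would take this role

def bit_and(a, b):
    return (a * b) % 2   # homomorphic multiplication would take this role

def bit_not(a):
    return bit_xor(a, 1)

def greater_than(x_bits, y_bits):
    """Return 1 if x > y, else 0, scanning from least to most significant bit."""
    gt = 0
    for x, y in zip(x_bits, y_bits):
        x_wins_here = bit_and(x, bit_not(y))   # x has 1 where y has 0
        equal_here = bit_not(bit_xor(x, y))    # bits agree, keep lower result
        # The two cases are mutually exclusive, so XOR acts as OR.
        gt = bit_xor(x_wins_here, bit_and(equal_here, gt))
    return gt

def in_range(v_bits, lo_bits, hi_bits):
    """Return 1 if lo <= v <= hi, using only the greater-than circuit."""
    not_below = bit_not(greater_than(lo_bits, v_bits))
    not_above = bit_not(greater_than(v_bits, hi_bits))
    return bit_and(not_below, not_above)

WIDTH = 4
answer = in_range(to_bits(9, WIDTH), to_bits(5, WIDTH), to_bits(12, WIDTH))
```

In this sequential form each bit position multiplies by the result of the previous one, so the multiplicative depth grows with the bit width. Since noise in somewhat homomorphic schemes grows with depth, that is one reason the tolerable noise has to be analyzed alongside the parameter settings.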