The iReceptor Data Integration Platform takes this distributed approach and applies it to the domain of next generation sequencing (NGS) of antibody/B-cell and T-cell receptor repertoires, to facilitate sharing and comparing AIRR-seq data. The iReceptor Scientific Gateway links distributed (federated) AIRR-seq repositories, allowing sequence searches or metadata queries across multiple studies at multiple institutions, returning sets of sequences fulfilling specific criteria. We present a review of the development of iReceptor, and how it fits in with the general trend toward sharing genomic and health data, and the development of standards for describing and reporting AIRR-seq data. Researchers interested in integrating their repositories of AIRR-seq data into the iReceptor Platform are invited to contact ireceptor-help@sfu.ca.

Keywords: immune repertoires, vaccines, therapeutic antibodies, cancer immunotherapy, distributed data federation, data sharing

== 1. INTRODUCTION ==

The integration of large-scale genomic data with extensive health data is revolutionizing biomedical research and holds great potential for improving patient care. However, our ability to share these large-scale data across studies and institutions is limited. Facilitating the sharing of these data across studies will greatly increase sample sizes, strengthening our statistical inferences, and will be vitally important to searching for the patterns that underlie personalized medicine approaches, as we try to develop specific therapies based on an individual's genotype, personal exposure history, and clinical response. Goodhand (1) has argued that one efficient way to facilitate sharing data across studies and institutions is by establishing federated systems of data repositories.
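The idea of a federated system of repositories can be pictured with a minimal sketch: the same query is sent to every repository, and the per-repository answers are gathered into one result. This is an illustration only and not the iReceptor implementation; the `Repository` class, the `federated_search` function, the record fields, and the toy records are all assumptions made for this example, and the repositories are simulated in memory rather than reached over a network.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Repository:
    """Hypothetical stand-in for one institution's AIRR-seq repository."""
    name: str
    records: List[Record]

    def search(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Each repository evaluates the query against its own data only.
        return [r for r in self.records if predicate(r)]


def federated_search(
    repos: List[Repository], predicate: Callable[[Record], bool]
) -> Dict[str, List[Record]]:
    """Send the same query to every repository and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(repos))) as pool:
        pairs = pool.map(lambda repo: (repo.name, repo.search(predicate)), repos)
        return dict(pairs)


# Toy data: field names and sequences are invented for illustration.
repos = [
    Repository("repo_a", [
        {"study": "S1", "locus": "TRB", "junction_aa": "CASSLGTDTQYF"},
        {"study": "S1", "locus": "IGH", "junction_aa": "CARDYW"},
    ]),
    Repository("repo_b", [
        {"study": "S2", "locus": "TRB", "junction_aa": "CASSPGQGYEQYF"},
    ]),
]

hits = federated_search(repos, lambda r: r["locus"] == "TRB")
total_hits = sum(len(v) for v in hits.values())
```

The data stay where they are in this arrangement; only the query and the matching records move, which is what distinguishes a federation from a single central database.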
The iReceptor Data Integration Platform takes this distributed approach and applies it to the domain of next generation sequencing (NGS) of antibody/B-cell and T-cell receptor repertoires. This review covers the development of the iReceptor Data Integration Platform, an implementation of a data commons for Adaptive Immune Receptor Repertoire (AIRR)-seq data, guided by the principles set out by the AIRR Community (airr-community.org; (2)). In this first paper, we discuss the history and philosophy of iReceptor, the present status and future goals of the iReceptor Platform, and some of the challenges to attaining these goals through a federated system of repositories. We then present the results of two use cases to show the power of data integration across studies and repositories. Finally, we invite researchers who are producing AIRR-seq data to join the iReceptor network to facilitate sharing of their data.

== 2. AIRR-SEQ DATA: CHALLENGES AND COMMUNITY RESPONSE ==

The adaptive immune system has evolved a unique molecular diversification mechanism designed to produce a highly diverse set of antigen receptors. This diverse set of antibody/B-cell and T-cell receptors is necessary to recognize and remove the vast and ever-changing array of pathogens that an individual will encounter over a lifetime, while differentiating these pathogens from self. This unique genetic mechanism, and the sheer immensity of the antibody/B-cell and T-cell response, present challenges for producing, storing, sharing and analyzing these data. The mechanism involves recombining sets of V-, D-, and J-genes that encode these receptors, along with the introduction of variability at the joints between these recombined gene segments (3).
As a result of this recombination process, the random pairing of Ig heavy and light B-cell receptor (BCR) chains (or paired T-cell receptor (TCR) chains), and somatic hypermutation (which is unique to B-cell receptors (4)), the diversity of the adaptive immune receptor repertoire greatly exceeds the coding capacity of the genome. For example, it is estimated that humans express 100 million or more unique B-cell and T-cell receptors (5)(6)(7). It was in 2009 that NGS techniques were first used to characterize this Adaptive Immune Receptor Repertoire in exquisite detail, producing 10^6 or 10^7 sequences, for multiple time points, per sample (AIRR-seq data). These data sets have grown quickly in size and number, and are found in multiple repositories across labs, institutions and studies.

Not only do these AIRR-seq data sets often comprise many millions of sequences per sample, they also require extensive processing or analysis after sequencing and prior to being interpreted. Such analyses are performed in a sequential set of steps, or data analysis pipelines, that vary between investigators. A typical data analysis pipeline begins with raw reads (often in the form of FASTQ sequences) produced by NGS sequencing technology. Low-quality sequences are removed from these base-call data, which is often accomplished with arbitrary cut-offs. Paired-end reads are merged into a single sequence to obtain full reads, often with seemingly arbitrary rules for excluding short sequences and imprecise merges. Different algorithms are then used for assigning Variable (V-), Diversity (D-) and Joining (J-) gene segment usage and for assigning somatic mutations in the case of antibody/B-cell sequences (reviewed in (8)(9)).
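The quality-filtering step of such a pipeline can be sketched in a few lines, which also shows how much the outcome depends on the chosen cut-offs. This is a minimal illustration and not the method of any particular pipeline: the function names, the mean-quality threshold of 20, the minimum length of 10, the Phred+33 encoding, and the three toy reads are all assumptions made for this example.

```python
from statistics import mean
from typing import Iterator, List, Tuple

# (read identifier, nucleotide sequence, quality string)
FastqRecord = Tuple[str, str, str]


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text laid out as four lines per read."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return mean(ord(c) - offset for c in qual)


def quality_filter(
    records: Iterator[FastqRecord],
    min_mean_q: float = 20.0,  # assumed cut-off; pipelines differ here
    min_length: int = 10,      # assumed cut-off; pipelines differ here
) -> List[FastqRecord]:
    """Keep reads that are long enough and have a high enough mean quality."""
    return [
        r for r in records
        if len(r[1]) >= min_length and mean_phred(r[2]) >= min_mean_q
    ]


# Toy reads: one good read, one low-quality read, one short read.
example = """@read1
ACGTACGTACGT
+
IIIIIIIIIIII
@read2
ACGTACGTACGT
+
############
@read3
ACGT
+
IIII
"""

kept = quality_filter(parse_fastq(example))
kept_ids = [r[0] for r in kept]
```

Changing `min_mean_q` or `min_length` changes which reads survive, so two investigators starting from the same raw reads can end up with different sequence sets before gene assignment even begins.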
Furthermore, many very different approaches can be used to identify and characterize clonal lineages (each clonal lineage being the set of descendants of a given ancestral B- or T-cell produced during the development of an individual). For example, clones can be identified on the basis.