Introduction

Varda is an application for storing genomic variation data obtained from next-generation sequencing experiments, such as full-genome or exome sequencing of individuals or populations. Variants can be imported from standard formats such as VCF files and annotated with their frequencies in previously imported datasets.

Varda is implemented as a service exposing a RESTful HTTP interface. Two clients for this interface are under development:

  • Manwë - Python client library and command line interface to Varda.
  • Aulë - Web interface to Varda.

Use cases

The following are some example use cases which Varda is designed to support.

  • Private exome variant database for a sequencing lab

    Installed on the local network, Varda can be used to import and annotate variants from all exome sequencing experiments at a sequencing lab. Additionally, the database could contain public datasets from population studies (e.g., 1000 Genomes, Genome of the Netherlands), such that all exome experiments are also annotated with frequencies in those studies.

  • Shared database between several groups

    Several sequencing centers can import their variants in a central Varda installation which can subsequently be used by the same centers for frequency annotation. The system can be setup such that annotation is only possible on previously imported data (to encourage sharing).

    Data from one center can only be accessed anonymized by other groups, since only the frequencies over the entire databased are available. To accomodate even stricter anonymity, samples can be imported after pooling.

  • Publicly sharing variant frequencies from a population study

    Variation data from a population study can be imported in a Varda installation accessible over the internet such that others can annotate their data with frequencies in the study.

For contrast, consider the following examples of what Varda is not designed to do.

  • Sharing and browsing genomic variants

    Varda is focussed on sharing variant frequencies only, and as such is not designed for direct browsing. Other systems, such as LOVD, are much more suitable for sharing and browsing genomic variants and additionally store phenotypes and other metadata.

  • Ad-hoc exploration of genomic variation

    Again, Varda is focussed on sharing variant frequencies only, and does not store additional metadata nor does it allow for effective exploration of variants. If you have variation data from a disease or population study which you want to analyse in a flexible way, have a look at gemini.

Implementation

The server is implemented in Python using the Flask framework and directly interfaces the PostgreSQL (or MySQL) database backend using SQLAlchemy. It exposes a RESTful API over HTTP where response payloads are JSON-encoded.

Long-running actions are executed asynchonously through the Celery distributed task queue.