Application design¶
Note
Work in progress, some of this should probably be refactored into user documentation.
Implementation¶
Varda is implemented on top of the Flask web microframework, the Celery distributed task queue, and the SQLAlchemy object relational mapper.
A typical deployment looks like this:
________
/ \
|\________/|
___________ | | ___________
/ | Database | \
| | | |
| \________/ |
| |
|
__________ ________ +------------+
/ \ ____ / | Worker 1 |
| Varda | ________ / \ ________ +------------+
|____________| / \ | Worker 2 |
| | | Broker | _______ +------------+
| REST API | \ / | Worker 3 |
\__________/ \____/ +------------+
...
|
|
__________
/ \
| Client |
\__________/
Sample types¶
Three types of samples:
- Simple sample (one individual)
- Pooled sample (multiple individuals)
- Population study
The following table gives a summary of what is stored for each sample type:
Stored data | Simple sample | Pooled sample | Population study |
---|---|---|---|
Pool size | 1 | N | N |
Variant observations | O with support 1 | N x O with support 1 | O with support N |
Covered regions | R | N x R | None |
- N = number of individuals
- O = number of observations per individual
- R = number of covered regions per individual
For all sample types, data can be imported from an arbitrary number of data sources. This means you could for example import variant observations per individual, per chromosome, or per variant type.
Pooled samples can have their individuals effectively anonymized by importing
variant observations from one big data source in which the order is not
related to the individuals. For example, cat
the VCF files for all
individuals and sort the result by genome position before sending it to the
server.
Frequency calculation¶
Only take samples into account with covered regions (to rule out population studies).
Binning of regions and observations¶
Todo. UCSC binning scheme.
Security¶
Todo: Add a page on security to the Managing Varda section.
Authentication via HTTP Basic Authentication, or API tokens, so only use with SSL.
Sample state¶
A sample can be either active or inactive (default). An inactive sample is ignored in any frequency calculations.
Importing data sources is only possible for inactive samples. A sample cannot be made active while any data source is being imported for that sample. Users can only make a sample active, not inactive.