Research within the VIROGENESIS project is divided into work packages, each directed towards a specific bottleneck in the implementation of NGS and metagenomics technologies for virus discovery, detection and tracing in a clinical and epidemiological context.

A first hurdle is the poor resolution of current metagenomic classifiers and the high proportion of unassigned sequence reads. The tools to be developed will ensure dedicated assembly of viral populations from metagenome data at reduced computational cost, in the presence or absence of a reference alignment. Novel virus detection models will make it possible to identify unmapped reads as potential viral genes and to detect new viruses, in addition to providing fast annotation for known virus families.

A second hurdle is the inadequate performance of comparative, phylogenetic and phylodynamic methods in the analysis of high-throughput NGS data. During outbreak investigations, the characterization of virome data can provide important insights into microbial shifts, the source of transmission, the geographical origin, and the modes and extent of pathogen spread. However, this requires phylogenetic and phylodynamic inference models that can handle large NGS datasets lacking fully resolved assemblies of genomes or genes. Furthermore, fast and sensitive algorithms for assessing and comparing microbial and viral diversity should be developed.

A third hurdle is the lack of appropriate visualization software to present the wealth of information resulting from large-scale phylogenetic and phylodynamic analyses to clinical, research and public health users. The development of novel dynamic visualization software should focus on the challenges related to scalability, uncertainty, associated metadata and interactivity.