Arthropod Bioinformatics Workshop // Eck Institute for Global Health // University of Notre Dame

Arthropod Bioinformatics Workshop

Instructors (alphabetical)

Anthony Bretaudeau (AphidBase)
Chris Childers, PhD (i5k Workspace@NAL)
Chris Elsik, PhD (Hymenoptera Genome Database)
Gloria Giraldo-Calderon, PhD (VectorBase)
Sujai Kumar, PhD (Lepbase)
Alistair Miles (Panoptes; Anopheles gambiae 1000 genomes project)
Monica Munoz-Torres, PhD (Apollo, GO, i5k)
Monica Poelchau, PhD (i5k Workspace@NAL)
Robert Waterhouse, PhD (OrthoDB, BUSCO, i5k, assisted by Mosè Manni and Felipe Simão)

VectorBase PIs Frank H. Collins, PhD, and Scott J. Emrich, PhD, will also participate on site.

About the hands-on workshops

This year the VectorBase team at Notre Dame has partnered with a group of researchers from diverse backgrounds to provide a more inclusive, joint workshop led by scientists from seven of the most widely used arthropod informatics resources. We plan to offer a half day of joint presentations on available tools, followed by a variety of concurrent workshops. Most workshops will be repeated to maximize opportunities for all attendees, so that attendees can receive hands-on training in, for example, both manual annotation (Apollo) and vector genomics (VectorBase). Brief summaries of the individual workshops follow.


See AGENDA for complete details

* Apollo
Apollo is a web-based genome annotation editor designed to manually revise and edit the structure of genomic elements (e.g., protein coding genes and non-coding RNAs). Users can also add references to external databases, functional assignments for genes and gene products with specific lookup support for Gene Ontology (GO) terms, as well as references to published literature in support of these annotations. Apollo enables collaborative, real-time curation (akin to Google Docs) and is currently used in over one hundred genome projects around the world, ranging from the annotation of a single species to lineage-specific efforts supporting annotation for dozens of organisms at a time. Learn more at

During the introduction we will review the biological principles required to improve on automated gene predictions and represent the underlying biology more accurately. During the hands-on sessions, we will put these principles to use by manually curating genes with the support of available experimental data. Exercises will be conducted using the honey bee (Apis mellifera) genome.

Apollo Computer Requirements: Attendees must bring their own computer. The computer must be able to connect wirelessly to the internet, must have an up-to-date version of a web browser (please use Chrome, Firefox, or Safari), and must be able to open PDF files. No additional software installation is necessary. Some people find using Apollo much easier with a mouse than with a trackpad, so please bring a mouse if you think it would suit you better.

* VectorBase / Ensembl
VectorBase is a Bioinformatics Resource Center focused on invertebrate vectors of human pathogens. We currently host sequence data from 40 genomes. VectorBase also provides transcriptomic, proteomic and population biology data from colony and field specimens for an even broader selection of species. Follow these links for a quick overview of the website. The goal of our workshop is to offer hands-on practical sessions with VectorBase data, tools and resources for scientists working in vector biology and medical entomology. Please follow this link for details about the VectorBase workshop agenda.

* HymenopteraMine
HymenopteraMine is a data mining warehouse for the Hymenoptera Genome Database, which hosts genomes of bee, wasp and ant species. The objective of HymenopteraMine is to enable users to create annotation data sets that can be exported for use in downstream analyses. HymenopteraMine leverages the InterMine platform to combine genome assemblies and official gene sets with data from RefSeq, Gene Ontology, UniProt, InterPro, KEGG, OrthoDB and EnsemblMetazoa, as well as pre-computed gene expression information based on publicly available RNA-seq data. Built-in query templates provide starting points for data exploration, while the QueryBuilder tool supports construction of complex queries. The List Analysis and Genomic Regions search tools execute queries based on uploaded lists of identifiers and genome coordinates, respectively. HymenopteraMine facilitates cross-species data mining based on orthology and supports meta-analyses by tracking identifiers across gene sets and genome assemblies.
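
To make the idea of a coordinate-based search concrete, the short Python sketch below shows the kind of overlap test that a genomic regions query boils down to. It is only an illustration under stated assumptions: it is not how HymenopteraMine or InterMine implements the search, and the feature names, the `chrom:start..end` region format, and the 1-based inclusive coordinates are all hypothetical choices made for this example.

```python
from collections import namedtuple

# Hypothetical minimal feature record; real gene sets carry far more detail.
Feature = namedtuple("Feature", "name chrom start end")


def parse_region(text):
    """Parse 'chrom:start..end' or 'chrom:start-end' into (chrom, start, end)."""
    chrom, _, span = text.strip().rpartition(":")
    sep = ".." if ".." in span else "-"
    start, end = (int(x.replace(",", "")) for x in span.split(sep))
    if start > end:
        start, end = end, start
    return chrom, start, end


def features_in_regions(features, regions):
    """Return {region: [feature names]} for features overlapping each region.

    Coordinates are treated as 1-based and inclusive (an assumption).
    """
    hits = {}
    for region in regions:
        chrom, start, end = parse_region(region)
        hits[region] = [
            f.name
            for f in features
            if f.chrom == chrom and f.start <= end and f.end >= start
        ]
    return hits


# Toy data, invented for illustration only.
features = [
    Feature("geneA", "chr1", 100, 500),
    Feature("geneB", "chr1", 900, 1500),
    Feature("geneC", "chr2", 100, 500),
]
result = features_in_regions(features, ["chr1:400..1000", "chr2:600-700"])
print(result)
```

In this toy run the first region overlaps geneA and geneB, while the second region overlaps nothing on chr2.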

The goal of the HymenopteraMine workshop is to show researchers how to use an InterMine-based resource to create custom annotation datasets associated with their own data (e.g. their lists of identifiers or genome coordinates). Although our focus is hymenopteran insects, the hands-on experience will be relevant to other InterMine resources, such as FlyMine. No previous experience with genome sequencing, the command line or scripting is needed. Example datasets will be provided, but participants may contact us ahead of time if they would like to work with their own data.

HymenopteraMine Computer Requirements: Attendees should bring a laptop computer that has an up-to-date web browser and can connect wirelessly to the internet.

* i5k Workspace@NAL
The National Agricultural Library (NAL) has implemented the i5k Workspace@NAL as a centralized platform to help meet the i5k initiative's goal of sequencing and analyzing 5,000 arthropod species. The i5k Workspace@NAL includes organism pages and tools for creating a project, downloading data, sequence search using BLAST, Clustal and HMMER, genome visualization using JBrowse, and manual annotation using Apollo. All of our resources are open source, and include the use of Tripal for content management and OGS generation services. As of November 2016, the i5k Workspace@NAL incorporates 54 arthropod species, including species relevant to agriculture, invasion biology, systematics, ecology, evolution, and developmental research. The goals of this session are to 1) introduce the content and the tools of the i5k Workspace, and 2) demonstrate how to submit data to the i5k Workspace.

Audience: Any research group with an arthropod genome in need of a home is encouraged to participate.

i5k Computer Requirements: A laptop computer that has an up-to-date web browser and can connect wirelessly to the internet is recommended but not required.

* OrthoDB and BUSCO
Orthology is a cornerstone of comparative genomics, offering evolutionarily-qualified hypotheses on gene function by identifying “equivalent” genes in different species. The OrthoDB hierarchical catalogue of orthologues represents a comprehensive resource that delineates the evolutionary histories of millions of genes from thousands of species, especially for arthropods. OrthoDB resources and tools enable extensive orthology-based genome annotation and interpretation in a comparative genomics framework that incorporates the growing numbers of sequenced genomes. The goal of the workshop will be to introduce researchers to OrthoDB resources and tools, including the BUSCO genomics data quality assessment tool, with the opportunity to work through participants’ questions and example queries and/or tasks.

Audience: To cater for as wide a range of participants as possible, the workshop will first cover broad introductory topics and then delve into specifics. The comparative genomics analyses facilitated by orthology delineation are likely to interest researchers who already have a sequenced and annotated genome, while BUSCO assessments are likely to interest those still in the process of assembling a genome. The workshop will therefore be designed for a varied audience, and the focal topics will be motivated by requests/questions from attendees BEFORE the workshop.

OrthoDB and BUSCO Computer Requirements: Attendees should bring a laptop computer, which must be WiFi-enabled and have an up-to-date version of a web browser. No other software is necessary for the main part of the workshop; however, if you have hands-on questions relating to running BUSCO or connecting to OrthoDB using the API, your laptop should be fully equipped to do so. If you want to participate in the BUSCO hands-on part, make sure you have installed BUSCO (Windows users should make sure the BUSCO Virtual Machine is set up correctly). Participants are strongly encouraged to get in touch BEFORE the workshop with such questions/examples so that we can prepare to work through them together.

* Set up your own genome hub
Lepbase is the Lepidopteran genome database. It offers a core set of tools to make genomic data widely accessible, including an Ensembl genome browser, text and sequence homology searches, and bulk downloads of consistently presented and formatted datasets. The Lepbase team will conduct a hands-on workshop to teach attendees how to set up their own genome hub in hours rather than months!

* AskOmics
Research programs involving genetics, genomics and epigenetics are growing quickly. The computational challenges of analyzing datasets can be dealt with separately; integrating and interpreting large and complex biological data, however, still largely remains in the hands of biologists.

Here we will present AskOmics, a tool that supports both intuitive data integration and querying while shielding non-expert users from most of the technical difficulties underlying Semantic Web technologies (RDF and SPARQL). Because large and heterogeneous biological datasets are often difficult to integrate, AskOmics users can provide the platform with simple tab-separated files. This structure allows AskOmics to automatically transform the data into RDF triples for storage. Finally, for data querying, AskOmics provides a visually intuitive interface to obtain a comprehensive view of the function of a gene of interest.
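
As a rough illustration of what turning a tab-separated file into RDF triples can look like, here is a minimal Python sketch. It is not the AskOmics conversion scheme: the table layout (first column holds the entity identifiers, its header names the entity type, and every other column is an attribute), the `http://example.org/` namespace, and the gene names are all assumptions made for this example.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not an AskOmics URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def tsv_to_triples(tsv_text, base=BASE):
    """Turn a tab-separated table into N-Triples-style lines.

    Assumed layout: the first column header names the entity type and its
    values are entity identifiers; every other column is an attribute.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    entity_type, attributes = header[0], header[1:]
    triples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        subject = f"<{base}{row[0]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{base}{entity_type}> .")
        for name, value in zip(attributes, row[1:]):
            if value == "":
                continue  # no triple for an empty cell
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{base}{name}> "{escaped}" .')
    return triples


# Toy table, invented for illustration only.
example = "Gene\tchromosome\tstrand\ngene1\tLG1\t+\ngene2\tLG2\t-\n"
for line in tsv_to_triples(example):
    print(line)
```

Each row yields one type triple plus one triple per non-empty attribute, which is the subject-predicate-object shape that a SPARQL query would later match against.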

AskOmics is accessible online and has been applied successfully to the analysis of large-scale datasets, including lncRNA, miRNA, piRNA and transcriptomic profiles, and is actively used for integrating data at the BioInformatics Platform for Agroecosystems Arthropods.

AskOmics Computer Requirements: Attendees must bring their own computer. The computer must be able to connect wirelessly to the internet, must have an up-to-date version of a web browser (please use Chrome, Firefox, or Safari), and must be able to open PDF files. No additional software installation is necessary. Some people find using AskOmics much easier with a mouse than with a trackpad, so please bring a mouse if you think it would suit you better.


Registration Fee:
    Workshop ONLY - $50 

Registration for the database workshop includes meeting materials, daily breaks and a bag lunch on Wednesday. Dinner is on your own after the joint presentations. Lodging must be booked separately.

Computer and Internet Access

Most sessions will be held in rooms without computers. Please bring your own laptop and ensure your web browser meets the requirements of the workshop you would like to attend.

Other information

For registration, costs and fees, travel and hotel accommodations, and the travel grant, please visit the home page of the database workshop and 10th Arthropod Genomics Symposium.



Please direct inquiries or comments related specifically to the VectorBase Workshop to: