Web-based phylogenetics of whole genomes and complex sequences in just three clicks

Phylogenetic relationships are usually inferred from a few specific, highly conserved genes termed phylogenetic markers, such as the 16S ribosomal RNA gene. However, the evolutionary history of any single gene may differ from that of other genes or of the whole organism, especially in species with ample horizontal gene transfer. To detect and quantify traces of common ancestry, it is often useful to examine the relationships between longer and apparently unrelated stretches of the genome. CrocoPhylogeny is a web server for phylogenetic analyses of whole genomes and complex nucleotide sequences, such as clusters of non-orthologous genes with severe rearrangement (full or partial inversions) and variable length. CrocoPhylogeny breaks each nucleotide sequence into pieces of a certain length, conducts BLASTN-like local alignment between all pairs of sequences, and obtains an average sequence similarity score for each pair. CrocoPhylogeny runs as a CPU-supervised queue and file management system, while the BLASTN-like alignment core is implemented in CUDA C and runs on a high-performance GPU. Its primary result is a distance matrix quantifying the similarity between all pairs of sequences. The results page also provides pre-built phylogenetic trees and links to immediately visualize them on iTOL, a dedicated phylogenetic tree visualization server.
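To give a rough idea of the overall approach, here is a minimal Python sketch of the "break into pieces, score all pairs, average, build a distance matrix" idea. It is not the CrocoPhylogeny implementation: the ungapped sliding-window identity below is a simple stand-in for the BLASTN-like local alignment, and the function names, the averaging of both directions, and the conversion `distance = 1 - similarity` are all assumptions made for illustration.

```python
from itertools import combinations


def split_into_pieces(seq, piece_len):
    """Cut seq into consecutive, non-overlapping pieces of piece_len.

    A trailing piece shorter than piece_len is dropped (an assumption).
    """
    return [seq[i:i + piece_len]
            for i in range(0, len(seq) - piece_len + 1, piece_len)]


def best_identity(piece, target):
    """Stand-in for local alignment: best ungapped identity of piece
    against any window of target with the same length."""
    n = len(piece)
    if n == 0 or len(target) < n:
        return 0.0
    best = 0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(a == b for a, b in zip(piece, window))
        best = max(best, matches)
    return best / n


def average_similarity(query, target, piece_len):
    """Average the best identity over all pieces of query."""
    pieces = split_into_pieces(query, piece_len)
    if not pieces:
        return 0.0
    return sum(best_identity(p, target) for p in pieces) / len(pieces)


def distance_matrix(seqs, piece_len=4):
    """seqs maps a leaf name to its sequence; returns (names, nested dict)."""
    names = sorted(seqs)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Score both directions and average them so the matrix is symmetric.
        sim = (average_similarity(seqs[a], seqs[b], piece_len)
               + average_similarity(seqs[b], seqs[a], piece_len)) / 2
        dist[a][b] = dist[b][a] = 1.0 - sim
    return names, dist
```

A matrix of this kind can then be passed to any distance-based tree-building method. The piece length of 4 is only suitable for toy inputs; the server offers much larger values.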

Full documentation about CrocoPhylogeny is available at our Wiki. Don't forget to check out the case studies.
If you found CrocoPhylogeny helpful, please cite this website.
A manuscript is currently in preparation; please check again in a few weeks.
If you have any questions or comments, do not hesitate to contact us at


Licence conditions in accordance with § 11 of Act No. 130/2002 Coll. The owner of the software is Masaryk University, a public university, ID: 00216224. Masaryk University allows other companies and individuals to use this software free of charge and without territorial restrictions in usual way, that does not depreciate its value. This permission is granted for the duration of property rights. This software is not subject to special information treatment according to Act No. 412/2005 Coll., as amended. In case that a person who will use the software under this licence offer violates the licence terms, the permission to use the software terminates.

Here we provide a few sample input files to give users an idea of what the input files should look like. The input files must be in FASTA format and contain only non-ambiguous nucleotides (A, G, T, and C). Each run must include at least 3 input files.
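If you would like to check your files before uploading, the requirements above can be sketched as a small Python check. This is only an illustration under our own assumptions: the function names are hypothetical, lowercase letters are treated as equivalent to uppercase, and the server's own validation may differ.

```python
ALLOWED = set("ACGT")


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_inputs(files):
    """files maps a file name to its FASTA text.

    Returns a list of human-readable problems; an empty list means
    the files meet the requirements described above.
    """
    problems = []
    if len(files) < 3:
        problems.append("at least 3 input files are required, got %d" % len(files))
    for name, text in files.items():
        try:
            records = read_fasta(text)
        except ValueError as exc:
            problems.append("%s: %s" % (name, exc))
            continue
        if not records:
            problems.append("%s: no FASTA records found" % name)
        for header, seq in records:
            # Case-insensitive check (an assumption, see the lead-in).
            bad = "".join(sorted(set(seq.upper()) - ALLOWED))
            if bad:
                problems.append("%s: record '%s' contains disallowed characters: %s"
                                % (name, header, bad))
    return problems
```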


If you want more samples, don't forget to check out our case studies. They contain more sample files together with their CrocoPhylogeny results.


Bacteria: whole genomes

To help prioritize the research and development of new and effective antibiotic treatments, the World Health Organization recently issued a global priority list of antibiotic-resistant bacteria considered to have the most substantial impact on global health at the current time. Whereas the phylogeny of prokaryotes is often judged based only on the 16S ribosomal RNA gene...


Plants: plastid genomes

Plastids, commonly referred to as chloroplasts, are organelles that perform photosynthesis in algal and plant cells. Plastids retain their own DNA, distinct from that of the cell itself, because they are the descendants of cyanobacteria that evolved within the cytoplasm of heterotrophic single-celled eukaryotes. Plastid genomes vary strongly in length and composition across different species of plants and algae...

Mammals: Y chromosome

The Y chromosome is primarily responsible for sex determination but also carries other genes. However, the Y chromosome is an enigmatic part of the genome. For example, in humans, the Y chromosome is approximately 60 million bp in length but carries fewer than 100 protein-coding genes. For comparison, the X chromosome is only 2.5 times bigger (approximately 150 million bp in length) but...

Please select multiple files containing nucleotide sequences in fasta format (max 100 MB or 4000 files in total; got a bigger dataset? let us know at, and we will make sure it runs for you).
Note that CrocoPhylogeny does not currently handle ambiguous nucleotides (N/n, R/r, etc.), so you will need to remove or replace these before uploading the fasta files. For tips on obtaining optimal results with your input files, check section II.1 of the CrocoPhylogeny user manual.
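As an illustration of the clean-up step mentioned above, the following Python sketch removes (or replaces) every character other than A, C, G, and T in the sequence lines of a FASTA file while leaving header lines untouched. The function name and the `replacement` parameter are hypothetical. Whether removing or replacing is more appropriate depends on your data, and replacing with a fixed base is an arbitrary choice shown only for completeness.

```python
import re

# Anything that is not an unambiguous nucleotide (either case).
# Note that this also matches gap characters and stray whitespace.
AMBIGUOUS = re.compile(r"[^ACGTacgt]")


def clean_fasta(text, replacement=""):
    """Return FASTA text with ambiguous characters removed or replaced.

    Header lines (starting with '>') are kept as they are. Sequence
    lines that become empty after cleaning are dropped.
    """
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)
        else:
            cleaned.append(AMBIGUOUS.sub(replacement, line.strip()))
    return "\n".join(line for line in cleaned if line) + "\n"
```

For example, `clean_fasta(">s1\nACGNNT\n")` keeps the header and turns the sequence line into `ACGT`.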

Each fasta file will correspond to a single leaf in the phylogenetic tree. Tree leaves will be named according to the names of the corresponding uploaded files. Fasta headers will not be used for naming tree leaves.
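Because leaves are named after the uploaded files, it can help to preview the resulting names and catch clashes before uploading. The Python sketch below is a local convenience only. We assume here that the file extension is dropped from the leaf name, which may not match what the server does, and the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def leaf_names(paths):
    """Map each file path to the leaf name derived from its file name.

    Assumption: the leaf name is the file name without directory and
    without its last extension. Raises ValueError if two files would
    end up with the same leaf name.
    """
    names = [Path(p).stem for p in paths]
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    if duplicates:
        raise ValueError("duplicate leaf names: " + ", ".join(duplicates))
    return dict(zip(paths, names))
```

Renaming files to short, descriptive names (for example, species names) before uploading makes the resulting trees easier to read.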

Upload nucleotide sequences:  

Direction of alignment:
 Only forward -- use if your dataset contains only files with corresponding sequences in the same order, such as ordered sets of genes, genomes of relatively close species, etc.
 Forward and reverse -- use if your dataset may contain files with corresponding sequences in reverse order, such as whole genomes of more distant species or complex gene clusters
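The difference between the two options can be illustrated with a short Python sketch. We assume here that "reverse" means comparing the reverse complement of a sequence, as is usual for inverted regions; the server's exact behaviour is not described on this page. The position-by-position identity is a toy score rather than the BLASTN-like alignment, and all names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def identity(a, b):
    """Toy score: fraction of matching positions over the shorter length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def best_direction(query, target, both_directions=True):
    """Score query against target in forward direction only, or in
    both directions, keeping the better of the two scores."""
    forward = identity(query, target)
    if not both_directions:
        return forward, "forward"
    reverse = identity(reverse_complement(query), target)
    if forward >= reverse:
        return forward, "forward"
    return reverse, "reverse"
```

With `query = "AAACCC"` and `target = "GGGTTT"`, the forward-only comparison finds no similarity, whereas the two-direction comparison recognizes the target as the inverted copy of the query.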

 Fastest    Normal    Sensitive  

Length of subsequences:
Larger values are useful for investigating similarity over longer stretches; retain default setting if unsure about how it works
 320    240    160    80  
Service version (change log).