NCBIminer is freely available, cross-platform and user-friendly software for
mining nucleotide sequence data from GenBank. It has several features that
enable users to accurately and efficiently download sequences with specific
attributes from the GenBank database:
1) it uses a novel search strategy and can download sequences for distantly
related taxonomic groups with high accuracy;
2) it deals with genes, CDS, rRNA, and other GenBank-defined feature types;
3) it can filter sequences by length and by similarity to the reference
sequence using user-defined parameters;
4) it can download information on DNA sample collections, e.g. voucher
specimen, country, latitude and longitude, and collector;
5) it takes advantage of parallelization for a high-efficiency workflow.
We demonstrate the use and performance of NCBIminer by downloading
sequences for the plant family Campanulaceae. Compared to other methods,
NCBIminer harvests more and longer sequences, and is less sensitive to the
choice of query sequences.
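
The length and similarity filter described in feature 3 can be illustrated with
a minimal sketch. This is not NCBIminer's implementation: the function name
`filter_sequences`, its parameters, and the use of Python's
`difflib.SequenceMatcher` as a similarity measure are all assumptions made for
illustration. A real pipeline would more likely score similarity with a
sequence aligner.

```python
from difflib import SequenceMatcher


def filter_sequences(records, reference, min_len, max_len, min_similarity):
    """Return the names of records that pass the length and similarity filters.

    records: dict mapping a sequence name to its nucleotide string.
    reference: reference sequence to compare against.
    min_len, max_len: inclusive length bounds (hypothetical user parameters).
    min_similarity: threshold in [0, 1] (hypothetical user parameter).
    """
    kept = []
    for name, seq in records.items():
        # Length filter: drop sequences outside the user-defined bounds.
        if not (min_len <= len(seq) <= max_len):
            continue
        # Similarity filter: SequenceMatcher.ratio() is a stand-in measure,
        # not an alignment score; autojunk is disabled so that repeated
        # bases in long sequences are not discarded by the heuristic.
        matcher = SequenceMatcher(None, reference, seq, autojunk=False)
        if matcher.ratio() >= min_similarity:
            kept.append(name)
    return kept


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    records = {
        "identical": "ACGTACGTAC",
        "too_short": "ACG",
        "dissimilar": "TTTTTTTTTT",
    }
    print(filter_sequences(records, reference, 5, 20, 0.8))
```

In this toy input only the sequence identical to the reference passes both
filters; the thresholds stand in for the user-defined parameters mentioned
above.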
Citation:
Xu, X., Dimitrov, D., Rahbek, C. and Wang, Z. (2015), NCBIminer: sequences harvest from
Genbank. Ecography, 38: 426–430. doi:10.1111/ecog.01055