Abstract
The GoSh database (http://www.itb.cnr.it/gosh/) is an online resource containing expressed sequence tags (ESTs) from Ovis aries and Capra hircus. A total of 58,990 sheep and goat sequences were downloaded from GenBank and processed by a semi-automated pipeline integrating public programs and Perl scripts. Data were collected in a MySQL database, which can be queried via a PHP-based web interface. Sequences were assembled and a unigene dataset was defined. Three annotation procedures were carried out on all the EST sequences and all the contig consensus sequences. A procedure was also implemented to infer a statistical classification among Gene Ontology (GO) categories from the ontology occurrences related to the sequences included in the database. A number of programs were used to extract features from the raw sequences and give them biological significance. Among these, AutoSNP was used to perform putative SNP detection. Further analyses were performed on the GoSh database dataset, including tandem repeat searches and protein pattern identification. The web interface allows users to retrieve significant data and the corresponding external links, and to download selected sequences and accessory information in different formats. The resulting web site is a resource of data and links related to goat and sheep expressed genes.