Record 1915906 (2023-02-27)
VIEIRA, F. D.
Databases & data integration text mining & information extraction. [electronic resource]
In: INTERNATIONAL CONFERENCE OF THE BRAZILIAN ASSOCIATION FOR BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 7.; INTERNATIONAL CONFERENCE OF THE IBEROAMERICAN SOCIETY FOR BIOINFORMATICS, 3., 2011, Florianópolis. Proceedings... Florianópolis: Associação Brasileira de Bioinformática e Biologia Computacional, 2011.
Not paginated.
X-MEETING, 2011.
In recent years, with the joint development of biotechnology and bioinformatics, applications that use DNA analysis in animal breeding have become widespread. Applications that make use of SNP molecular markers are among the most important. With recent advances in DNA sequencing technology, it is possible to identify hundreds of thousands of SNP variants per animal. In this work, we propose a methodology to identify, within a dataset, a set of SNPs whose allele frequencies certify each of the sheep breeds (Creole, Santa Ines and Morada Nova). We used a dataset of SNP markers provided by the Sheep Genome Consortium International, obtained through the Network of Animal Genomics Embrapa. We applied data mining techniques, especially attribute selection methods and algorithms for generating association rules. The first step was attribute selection, needed because of the high number of SNP markers, almost 60,000 for each animal. Subsequently, we applied the Apriori algorithm to generate association rules, with the purpose of obtaining SNPs whose allelic values could determine the breed to which each animal belongs.
The results showed that, with a minimum support of 15% and a minimum confidence of 90%, some SNPs have values that appear only in a particular breed. The best rules appear at the top of the list because of their high support and confidence values. The top 35 association rules are related to Creole, with support of 30% and confidence of 100%. The first rule indicates that one of the selected markers, with a certain homozygous allele, characterizes the Creole breed: it appears in 22 records (30% of the total, which has 72 records) with 100% confidence, i.e., in all 22 records in which the marker took this value, the breed was Creole. The results need to be better evaluated and validated by experts, but they seem to be of great importance for future work. The methodology proposed in this work could be used to support sheep breeding programs in progeny tests.
Databases; Information retrieval; Base de Dados; Marcadores moleculares SNP; Mineração de texto; Recuperação da informação; Text mining
PAIVA, S. R.; YAMAGISHI, M. E. B.; OLIVEIRA, S. R. de M.; HIGA, R.
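The support and confidence figures quoted in the abstract can be reproduced with a few lines of Python. The sketch below is not the authors' pipeline: the marker name `snp_1`, the genotype codes, and the 25/25 split of the remaining records between the other two breeds are hypothetical placeholders. Only the counts taken from the abstract are real (72 records, 22 of them Creole with the marker value) along with the 15% and 90% thresholds.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_total = len(records)
    n_ante = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / n_total                      # share of all records matching the whole rule
    confidence = n_both / n_ante if n_ante else 0.0  # share of antecedent matches that also match the consequent
    return support, confidence


# Hypothetical toy data shaped to match the counts in the abstract:
# 72 records in total, 22 of which are Creole and carry the marker value.
# The marker name, genotype codes and the 25/25 split are assumptions.
records = (
    [{"snp_1": "AA", "breed": "Creole"}] * 22
    + [{"snp_1": "AB", "breed": "Santa Ines"}] * 25
    + [{"snp_1": "BB", "breed": "Morada Nova"}] * 25
)

support, confidence = rule_metrics(
    records,
    antecedent=lambda r: r["snp_1"] == "AA",
    consequent=lambda r: r["breed"] == "Creole",
)

# Thresholds reported in the abstract.
MIN_SUPPORT = 0.15
MIN_CONFIDENCE = 0.90
passes = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE

print(f"support={support:.3f} confidence={confidence:.3f} passes={passes}")
```

With these counts the support is 22/72, which the abstract rounds to 30%, and the confidence is 22/22. A full Apriori run would enumerate many such candidate rules and keep those above both thresholds; this sketch only evaluates a single rule.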