About two weeks ago I learned about OpenSNP, a website where people can share their genetic information and more. It is similar to 1000 Genomes, but I think it is much more interesting to work with because, aside from genetic information (SNP genotyping, exome sequencing, etc.), most users also share phenotype data, and the data is not anonymized. This is what sparked my interest.
With phenotype data and users' genetic mutations (SNPs, or other relevant genetic information), I could run analyses and look for possible correlations. This is applied big data.
In this post, I'll explain how I conducted my first analysis. I want to provide an outline with enough relevant detail that I have a reference point to make future analyses easier. Of course, I could simply do this in private, but I'd rather post it on the blog so that others who are interested in running similar analyses have a starting point.
This involves knowledge of genomics, genomics-related software and raw data formats, programming, and a lot of patience.