A while ago I ran a few experiments with SPatial Ancestry Analysis (SPA) for selected Eurogenes members (see here). The results were very impressive, but only thanks to my large collection of samples, many of which are private. A few additions have now been made to the SPA package to make it more accessible to the average 23andMe user, including "model" files. What this means is that it's no longer necessary to have a reference dataset to analyze single samples.
SPA can predict ancestral origins for 23andMe users from their genotype files. Assuming you have a 23andMe account, follow the steps below to get your prediction.
Download the SPA software.
Download the models for world and europe and decompress them.
Put your 23andMe genotype file into the same directory as the above two.
Open a terminal if you are using Mac OS X, or a command window if you are using Windows.
Go to the directory where you put the SPA software, the models and your 23andMe genotype file.
Run the following command in your terminal or command window. Make sure to replace 23andMe.txt with your 23andMe genotype file name.
spa --mfile 23andMe.txt --model-input europe.model --location-output europe.loc
Double-click the resulting europe.loc.html (or world.loc.html, if you ran the command with the world model instead) to see where your predicted ancestral origin is. If you are of European ancestry, you only need to check europe.loc.html, which gives better resolution; if you are of non-European ancestry, you only need to check world.loc.html.
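Most failed runs come down to a file not being in the directory you're working from, so a quick check before running the command can save some head scratching. The following is only a rough sketch for a POSIX shell (such as the Mac OS X terminal; it won't work in a Windows command window). The helper name check_spa_inputs is made up for illustration and is not part of the SPA package; the file names are the ones from the example above.

```shell
# Hypothetical helper (not part of SPA): report which of the listed files
# are missing from the current directory.
# Prints one line per missing file and returns 1 if any file is missing.
check_spa_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}

# File names follow the example above; replace 23andMe.txt with your own file name.
check_spa_inputs 23andMe.txt europe.model || echo "Fix the above before running spa."
```

If nothing is printed, both files are where they should be.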
SPA can also predict two origins in case your mother and father are from different locations. To do that, add -n 2 to the command, as shown below.
spa --mfile 23andMe.txt --model-input europe.model --location-output europe.loc -n 2
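Since the single-origin and two-origin commands differ only in the trailing -n 2, it can be handy to build the command line from its parts. This is a minimal sketch for a POSIX shell, using only the flags that appear in the commands above. The helper name spa_command is hypothetical, and it assumes the model file is named after the model (europe.model), as in the example. It only prints the command, so you can look it over before running it.

```shell
# Hypothetical helper (not part of SPA): print the spa command line for a
# genotype file, a model name, and an optional number of origins (for -n).
spa_command() {
    genotype="$1"      # e.g. 23andMe.txt
    model="$2"         # e.g. europe (assumes europe.model is in this directory)
    origins="${3:-}"   # optional, e.g. 2 for the mother/father case
    cmd="spa --mfile $genotype --model-input $model.model --location-output $model.loc"
    if [ -n "$origins" ]; then
        cmd="$cmd -n $origins"
    fi
    echo "$cmd"
}

spa_command 23andMe.txt europe      # single-origin command
spa_command 23andMe.txt europe 2    # two-origin command
```

Copy the printed line into your terminal to run it. File names containing spaces would need quoting, which this sketch doesn't handle.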
My dual result can be seen below. I'm actually Polish rather than Finnish/French, but I guess it's kind of the same thing...or not. Anyway, for all the instructions and updates see the SPA website.
Also worth mentioning is that the SPA team will be making a presentation at the upcoming ASHG 2012 Annual Meeting. It looks like we can expect more interesting updates to the program any day now.
A model-based approach for analysis of spatial structure in genetic data.
W. Yang1,4, J. Novembre3,4, E. Eskin1,2,4, E. Halperin5,6,7 1) Department of Computer Science, UCLA, Los Angeles, CA; 2) Department of Human Genetics, UCLA, Los Angeles, CA; 3) Department of Ecology and Evolutionary Biology, UCLA, Los Angeles, CA; 4) Bioinformatics IDP, UCLA, Los Angeles, CA; 5) International Computer Science Institute, Berkeley, California, USA; 6) Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv, Israel; 7) School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
Characterizing genetic diversity within and between populations has broad applications in studies of human disease and evolution. Two key steps towards this objective are spatially global ancestry inference, which aims at predicting geographical locations for the ancestries of an individual, and spatially local ancestry inference, which aims at predicting the geographical locations for chromosome segments, or ancestry blocks. We propose a new approach, SPALL (SPatial Ancestry analysis LocaL), for solving the two inference problems in a unified probabilistic model. This model takes linkage disequilibrium into account and can be solved efficiently by the Expectation Maximization (EM) algorithm in conjunction with the forward-backward algorithm. This new method allows us to assign geographical locations for parents, grandparents, and ancestries from more generations ago of a given individual. It also allows us to assign geographical locations for each locus-specific variant. We analyzed a European and a worldwide dataset, and showed that SPALL can actually predict locations with a high accuracy. The proposed model is built as a generalization of our recently published work called Spatial Ancestry Analysis (SPA), which explicitly models the spatial distribution of each SNP by assigning an allele frequency as a continuous function in geographic space. The method allows us to assign an individual, or an admixed individual, to geographical locations instead of predefined categories of population.
SPatial Ancestry analysis (SPA) "model" files from Eurogenes