AutoCluster

AutoCluster groups your DNA matches into clusters of matches that most likely descend from common ancestors. This analysis is available for profiles from 23andMe and FamilyTreeDNA, and the results are presented in an interactive visualization.

AutoCluster concepts

AutoCluster organizes your DNA matches into shared match clusters that likely represent branches of your family. In the visualization of this analysis, each colored cell represents an intersection between two of your matches, meaning that they both match you and each other. If we clustered only first cousins, they would sort into two groups: maternal and paternal. If we instead cluster based on second cousins, the matches can form four groups. Dana Leeds, creator of the Leeds method, suggests starting the first analysis with matches between 400 and 90 cM. In this example, we performed the analysis using a range between 400 and 70 cM. The resulting clusters represent the four grandparental lines.
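
The idea of filtering matches by a cM range and grouping those that also match each other can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual AutoCluster algorithm: the function name `cluster_matches`, the input shapes and the use of connected components as the grouping rule are all hypothetical choices made for the example.

```python
from collections import defaultdict


def cluster_matches(total_cm, shared_pairs, min_cm=70, max_cm=400):
    """Group matches that are linked through shared matches.

    total_cm: dict mapping match name -> total cM shared with the tester.
    shared_pairs: iterable of (match_a, match_b) pairs that match each other.
    Hypothetical helper; connected components stand in for real clustering.
    """
    # Keep only matches inside the requested cM window.
    kept = {m for m, cm in total_cm.items() if min_cm <= cm <= max_cm}

    # Build an undirected shared-match graph restricted to the kept matches.
    neighbours = defaultdict(set)
    for a, b in shared_pairs:
        if a in kept and b in kept:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Treat every connected component of the graph as one cluster.
    clusters, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Made-up example data: Eve (60 cM) and Fay (900 cM) fall outside the window.
total_cm = {"Ann": 350, "Bob": 210, "Cat": 120, "Dan": 95, "Eve": 60, "Fay": 900}
shared_pairs = [("Ann", "Bob"), ("Cat", "Dan"), ("Dan", "Eve"), ("Ann", "Fay")]
clusters = cluster_matches(total_cm, shared_pairs)
print(clusters)
```

Lowering `min_cm` in this sketch admits more distant matches, which is why lower thresholds tend to produce more (and larger) clusters.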

Annotated AutoCluster clusters

Applying a lower minimum cM value allows more clusters to be identified. In this example, the different lines are annotated in the chart.

Results in Excel file

In some cases, especially when using low minimum cM values, the charts become quite large and difficult to interpret on screen. An Excel file containing the clusters is therefore provided as well.

Super-clusters

Sometimes clusters group together to form larger clusters, so-called super-clusters. These super-clusters might represent ancestral lines that are related to each other.

Table with DNA match information

Underneath the AutoCluster report, a table lists information about the matches in each cluster.

FTDNA AutoCluster analysis

Some of the AutoCluster options are specific to a particular DNA testing company. Let's start with FamilyTreeDNA. In addition to the maximum and minimum shared cM, it is possible to specify the minimum size of the largest DNA segment shared with the match. The screenshot of the interface shows some additional options, which will be discussed in an upcoming newsletter.

23andMe AutoCluster analysis

For 23andMe analyses, two specific features stand out. First, it is possible to specify the minimum shared cM between shared matches. Raising this threshold might help users who originate from endogamous groups. Second, 23andMe provides information about overlapping DNA segments between shared matches.

23andMe triangulated segments

These overlapping segments have a high probability of forming triangulating groups. It is also possible to cluster matches using these overlapping segments. To visualize the overlapping segments, a DNA helix is added to the visualization.

FTDNA/23andMe group-like analysis

The group-like AutoCluster analyses create clusters from a selected group of DNA matches. This reduces the computational load on the servers of FTDNA or 23andMe and provides a more targeted cluster analysis. Even more powerful is the "extend cluster" feature. Enabling this option gathers the shared matches of the DNA matches in the group you have selected and retrieves the shared matches for these as well.
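
One possible reading of the "extend cluster" step, sketched in Python. The function name `extend_group` and the dictionary-of-sets input are hypothetical; the real feature retrieves this information from the testing company, and its exact behavior may differ from this two-step interpretation.

```python
def extend_group(group, shared_matches):
    """Extend a selected group with the shared matches of its members.

    group: set of match names selected by the user.
    shared_matches: dict mapping a match -> set of matches shared with it.
    Hypothetical helper illustrating one interpretation of "extend cluster".
    """
    # Step 1: add the shared matches of every member of the selected group.
    extended = set(group)
    for member in group:
        extended |= shared_matches.get(member, set())

    # Step 2: look up shared matches for everyone in the extended set,
    # keeping only links that stay inside the extended set.
    links = {m: shared_matches.get(m, set()) & extended for m in extended}
    return extended, links


# Made-up example data.
shared_matches = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cat"},
    "Cat": {"Bob", "Dan"},
    "Dan": {"Cat"},
}
extended, links = extend_group({"Ann"}, shared_matches)
print(sorted(extended), links)
```

In this sketch the group grows by one level only: Bob is pulled in through Ann, but Cat is not, because Cat is not a shared match of any originally selected member.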

MyHeritage & GEDmatch

In 2019 MyHeritage licensed our AutoCluster tool. The AutoCluster approach was also integrated within the GEDmatch website and is available for Tier 1 users.

Rule-based AutoCluster

The rule-based AutoCluster allows users to filter and/or merge their matches using matches from other profiles (of the same DNA testing company). Three different rules allow for the exclusion (NOT rule), inclusion (AND rule) or combination (OR rule) of matches, after which the resulting matches are used for an AutoCluster analysis. These rules make it possible to focus on matches from a particular branch of the family, for instance paternal or maternal matches. This feature might be even more relevant for people with unknown parentage who are searching for their birth families (for instance adoptees or donor-conceived persons who have tested one biological parent). By applying the NOT rule to the matches of the known biological parent, the AutoCluster analysis is performed without that parent's matches. This strategy allows people to focus solely on the clusters of the unknown parent.
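
The three rules map naturally onto set operations. The sketch below assumes each profile's matches are available as a set of match identifiers; `apply_rule` and the example names are hypothetical and only illustrate the logic, not how the tool is implemented.

```python
def apply_rule(rule, primary, other):
    """Combine the matches of a primary profile with those of another profile.

    rule: "NOT" (exclusion), "AND" (inclusion) or "OR" (combination).
    primary, other: sets of match identifiers from the same testing company.
    Hypothetical helper for illustration only.
    """
    if rule == "NOT":
        return primary - other   # matches of primary that other does not have
    if rule == "AND":
        return primary & other   # matches that both profiles have
    if rule == "OR":
        return primary | other   # matches of either profile
    raise ValueError(f"unknown rule: {rule!r}")


# Made-up example: a child and the one biological parent who has tested.
child = {"Ann", "Bob", "Cat", "Dan"}
known_parent = {"Ann", "Bob", "Eve"}

unknown_side = apply_rule("NOT", child, known_parent)
known_side = apply_rule("AND", child, known_parent)
combined = apply_rule("OR", child, known_parent)
print(sorted(unknown_side), sorted(known_side), sorted(combined))
```

Here the NOT rule leaves only Cat and Dan, the matches that presumably come through the unknown parent, and these would then be passed on to the clustering step.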

Rule-based AutoCluster example

To illustrate the rule-based AutoCluster, we combined the matches (and shared matches) of five siblings. The first sibling contributed 146 matches, whereas the remaining four siblings added 81, 41, 20 and 16 matches, respectively. Shared matches that are missing in the original dataset are also added. An anonymized version of this output can be found here. The plus symbols in the visualization represent either new shared matches or completely new matches (a plus symbol on the diagonal). For instance, clusters 1, 9 and 10 were already almost completely present in the primary profile. Clusters 2, 5 and 6, however, are completely new clusters, whereas cluster 7 represents a mixture of matches that were already present and new (shared) matches.

Chromosome segments from DNA matches in clusters

A chromosome browser allows users to graphically compare one or more matches to see how much DNA they share with them. Before we visualize the shared DNA segments, we cluster them to group segments that overlap (min 5 cM). Next, these segment clusters are color-coded in the visualization. In addition to the graphical representation, a table is available that contains detailed information for the segment clusters. Segments for the DNA matches in each AutoCluster cluster can be accessed using the table underneath the chromosome browser. This table contains a link to the detailed chromosome browser, the number of multiple-segment clusters, the number of single-segment clusters and the number of clusters on the X chromosome.
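
Grouping overlapping segments can be sketched as a single pass over segments sorted by position. The example assumes that segment coordinates are already expressed in cM and that "min 5 cM" refers to the minimum overlap required to join a cluster; both are assumptions, as are the `Segment` type and the `cluster_segments` function.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    match: str         # name of the DNA match
    chromosome: str    # e.g. "1", "2", "X"
    start_cm: float    # assumed: positions already converted to cM
    end_cm: float


def cluster_segments(segments, min_overlap_cm=5.0):
    """Group segments on the same chromosome that overlap by at least min_overlap_cm."""
    clusters = []
    for seg in sorted(segments, key=lambda s: (s.chromosome, s.start_cm)):
        last = clusters[-1] if clusters else None
        if last is not None and last[0].chromosome == seg.chromosome:
            # Segments are sorted by start, so the overlap with the current
            # cluster runs from this segment's start to the earlier of the two ends.
            cluster_end = max(s.end_cm for s in last)
            overlap = min(cluster_end, seg.end_cm) - seg.start_cm
            if overlap >= min_overlap_cm:
                last.append(seg)
                continue
        clusters.append([seg])
    return clusters


# Made-up example data.
segments = [
    Segment("Ann", "1", 10, 40),
    Segment("Bob", "1", 30, 55),
    Segment("Cat", "1", 52, 70),
    Segment("Dan", "2", 12, 30),
]
segment_clusters = cluster_segments(segments)
multi = sum(1 for c in segment_clusters if len(c) > 1)
single = sum(1 for c in segment_clusters if len(c) == 1)
print([[s.match for s in c] for c in segment_clusters], multi, single)
```

Ann and Bob overlap by 10 cM and form one multiple-segment cluster, while Cat overlaps that cluster by only 3 cM and therefore starts a single-segment cluster of its own.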

A chromosome browser

In addition, it is now possible to generate a chromosome map from your clusters of 23andMe or FTDNA DNA matches and import it into DNA Painter using the Cluster Auto Painter tool. Importing the chromosome map into DNA Painter allows you to make notes and identify clusters as maternal or paternal, and to look at the segments behind the clusters and identify potential pile-up regions.

More YouTube videos can be found in our FAQ