In the 6,000-person family tree above, cleaned and organized using graph theory, individuals spanning seven generations are shown in green, with their marital links in red. Photo: Columbia University

A crowdsourced set of genealogical data has produced what could be the best example of an interconnected, whole human family: a family tree extending to 13 million people.

The massive web of lineage is presented in the latest issue of Science and yields some surprising findings. For instance, spouses today are more distantly related than they used to be: while people before 1850 married, on average, their fourth cousins, today we typically get hitched to our seventh cousins.

“We leveraged genealogy-driven media to build a dataset of human pedigrees of massive scale that covers nearly every country in the Western world,” the study authors write. “Multiple validation procedures indicated that it is possible to obtain a dataset that has similar quality to traditionally collected studies, but at much greater scale and lower cost.”

The genealogical information was collected from 86 million profiles that are public on a genealogy website. Using graph theory, the researchers organized the data and reconstructed 5.6 million separate family trees that vary in size and in the number of generations they span. The largest of these was the 13-million-person chain of lineage, the authors report.
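The article does not spell out the graph method. One standard way to split a large pool of linked profiles into separate family trees is to treat each profile as a node and each parent-child or marriage link as an edge, then find the connected components of that graph. The sketch below assumes that representation and uses a union-find structure; the profile IDs and links are hypothetical, and this is an illustration of the general technique, not the study's actual pipeline.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x's group, compressing the path along the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    # Point every node on the path directly at the root.
    while parent[x] != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def family_trees(profiles, links):
    """Group profiles into separate trees (connected components).

    profiles: iterable of profile IDs (hypothetical identifiers)
    links: (a, b) pairs such as parent-child or spouse links; every
           endpoint is assumed to appear in profiles
    """
    parent = {p: p for p in profiles}
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two trees into one
    groups = defaultdict(set)
    for p in parent:
        groups[find(parent, p)].add(p)
    return list(groups.values())


# Hypothetical example: two parents and a child, a separate couple,
# and one profile with no links at all.
profiles = ["ann", "bob", "cara", "dan", "eve", "fay"]
links = [("ann", "cara"), ("bob", "cara"), ("ann", "bob"), ("dan", "eve")]
trees = family_trees(profiles, links)
largest = max(trees, key=len)
print(len(trees), len(largest))  # 3 trees; the largest has 3 people
```

Union-find keeps each merge nearly constant-time, which is why this approach is commonly used when the number of linked records runs into the millions.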

Eighty-five percent of the profiles originate from Europe and North America, and the profiles record birth and death locations.

The processed data reveal changing patterns in how people chose partners. For instance, the mobility data show that before 1750, people in America generally found their spouses within 6 miles of where they were born. By 1950, that distance had increased to 60 miles, they report.

Spouses also became more distantly related because of changing social norms, which is why today we typically marry only as close as our seventh cousins, they add.

Another goal of the study was to estimate how much genes contribute to longevity. By assessing some 3 million people born between 1600 and 1910 who lived past the age of 30, the researchers concluded that carrying good "longevity" genes can add approximately five years to a person's life. That is a relatively modest effect, considering that a factor such as smoking can subtract 10 years.

The lead author of the study is Yaniv Erlich, a Columbia University computer scientist who is also the chief scientific officer at MyHeritage, the DNA and genealogy company which owns …

Erlich said in a Columbia statement that many hands made the enormous job of collecting all the data manageable.

“Through the hard work of many genealogists curious about their family history, we crowdsourced an enormous family tree and boom, came up with something unique,” said Erlich.