Buchans as a case study in Clustering

I have used clustering extensively in my Buchan study, and it is a good illustration of the principles of using clusters to solve mysteries (or to break down brick walls). Diahan Southard paid me to write about it on her YourDNAGuide blog; last year she used it at a conference and gave me permission to use her slides.

My ‘nearest’ family history mystery was the parentage of my x2 great-grandfather Robert Buchan. My x2 great-grandparents, Robert Buchan and Margaret Bain, emigrated independently from Scotland to Australia in the 1850s, married in 1860 and had ten children, including a son Robert in 1863. Margaret was a lass from the Highlands, and I could trace her family easily. From their marriage certificate I knew Robert’s parents to be Robert Buchan and Janet McCray, but not where they came from. The emigrant Robert was said to have been born about 1834 in Edinburghshire, now Midlothian, Scotland. But I could not find his baptism or his parents’ marriage anywhere in Scotland. Nor did I find any siblings.

Although I knew the names of my x3 great-grandparents, in practice I was looking for my x4 great-grandparents Mr Buchan and his wife (I might presume) and Mr McCray and his wife. McCray is a very rare name in Scotland, and so I expected she might have been a McRae.







I solved the mystery within three weeks while doing Diahan Southard’s DNA Skills Workshop in 2022. The key was using the right clusters. What were they? The family structure, in a DNA sense, is shown here:


Step 1 – isolate the DNA of the emigrant Robert Buchan

I needed to use the DNA of Robert and Margaret’s descendants (THEIRS) to isolate a cluster of matches who descended from HIS side (the Buchan family). It should be clear that the Buchan family was really the Buchan and McCray families. In forming this cluster, I would be gathering matches who might descend from either of Robert’s parental sides.

The first clustering was therefore based on Robert and Margaret’s descendants. From extensive existing research on this family, I knew of three people on Ancestry who were a generation closer to Robert and Margaret than I was. One of these was my father’s first cousin (1C) Pat, my 1C1R, whose test I managed. I could use her test because, as an older-generation 1C1R, she shares only my great-grandparents (her grandparents) with me. Robert and Margaret had named their second son Robert, the third in the line (to date at least). In doing so they broke with Scottish naming traditions, and later knowledge might explain why.

Prior to the availability of auto-clusters at Ancestry, I used the three older matches to build a cluster of about 60 people. The three matched each other as expected, since they were full second cousins (2C) to each other. In practice this cluster was given a Dark Blue dot and called Buchan+Bain. Some of these matches were descendants of Margaret Bain’s known sisters (i.e. HERS matches), and these needed to be eliminated. So the second cluster I needed was one of Bain DNA.

Once again, this family was already well researched, so I could identify quite a few Bain-only descendants in the original cluster. But I started a Bain-only cluster from scratch, because the purpose of this new cluster was to eliminate every Bain-only descendant from a Buchan-only cluster, not just the several I already recognised. Otherwise, matches under 75 cM whom I did not recognise might have been left in as Buchans when, really, they were not! In fact, there were Bain-only descendants in Scotland of whom I was completely unaware until I fully tackled this family in 2025, and some of them turned up as matches. This second cluster was given a Yellow dot and called Bain+Matheson.

When I did this work in 2022 there were no ThruLines for Robert Buchan senior or for Janet McCray. Since I built up the family tree, Ancestry now provides these, and new matches are occasionally added.

Step 2 – create a Buchan-only cluster

It is necessary to mark the Buchan-only cluster with a new Light Blue dot, allocated to every match that had a Dark Blue dot but NOT a Yellow dot. This is because Ancestry (and MyHeritage) only let you filter matches by having a dot, not by ‘not having a dot’. It may seem tiresome, but it makes things much clearer. Buchan-only matches may have both the Dark Blue and the Light Blue dot, but smaller matches added when extending the cluster might have only the Light Blue dot. The label given was 'Buchan+McCray'.
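The Dark Blue / Yellow / Light Blue bookkeeping is simply a set difference, which can be double-checked outside Ancestry if the dot groups are kept in a list. This is a minimal sketch, not an Ancestry feature: it assumes each dot group has been copied out by hand as a set of match names, and the function name `leftovers` and all match names are hypothetical placeholders.

```python
def leftovers(combined: set[str], eliminate: set[str]) -> set[str]:
    """Return the matches in the combined cluster that are NOT in the elimination cluster.

    combined:  e.g. the Dark Blue (Buchan+Bain) group
    eliminate: e.g. the Yellow (Bain+Matheson) group
    """
    return combined - eliminate


# Hypothetical match names standing in for dot groups copied out by hand.
dark_blue = {"match_A", "match_B", "match_C", "match_D"}  # Buchan+Bain
yellow = {"match_B", "match_E"}                           # Bain+Matheson
theirs = {"match_A"}                                      # known descendants of Robert and Margaret

light_blue = leftovers(dark_blue, yellow)   # Buchan+McCray
his_candidates = light_blue - theirs        # Light Blue matches that are not THEIRS

print(sorted(light_blue))       # ['match_A', 'match_C', 'match_D']
print(sorted(his_candidates))   # ['match_C', 'match_D']
```

Note that a Yellow-only match (match_E here) simply drops out: it was never in the Dark Blue group, so it cannot end up Light Blue.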

Some of the matches with a Light Blue dot were THEIRS, like me, my siblings, Pat and all the known descendants of Robert and Margaret. But the ones who were not THEIRS must be HIS: descendants of Robert’s siblings if he had any (and I only ever found half-siblings), or descendants of Robert’s 1Cs, who therefore share Robert’s grandparents. In the latter case these matches would be 4C to Pat, a relationship with a 50% chance of a DNA match if they had tested. This was much better than relying on my own and my siblings’ tests, since any such matches would be our 5C, with only a 10% chance of matching if they had tested. It was critical that I had Pat’s test, and when I later investigated the McCray family, I was advised to gather more match lists from her generation to widen the pool of McCray DNA. KJL shared her match list, while KP did not return my messages and is now well into his 90s, if he is still alive. As a cautionary tale: he once messaged me with a ‘be in touch’ reply, but because I was not actively researching the family, I failed to do so. I now think his test is in limbo.
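The 4C-to-Pat versus 5C-to-me reasoning comes from counting generations down from the shared ancestors. As a hedged sketch, the helper below (its name and the generation numbering are my own illustrative choices) turns two generation counts into the usual nC / nCmR shorthand; it handles cousins only, not siblings or aunt/uncle relationships, and says nothing about the match probabilities quoted above.

```python
def cousin_relationship(gen_a: int, gen_b: int, half: bool = False) -> str:
    """Name a cousin relationship in nC / nCmR shorthand.

    gen_a, gen_b: generations each person sits below the shared ancestor(s)
                  (1 = child, 2 = grandchild, 3 = great-grandchild, ...)
    half:         True if the two lines share only one ancestor, not a couple
    """
    if min(gen_a, gen_b) < 2:
        raise ValueError("both people must be at least grandchildren to be cousins")
    degree = min(gen_a, gen_b) - 1      # 1 = first cousins, 2 = second cousins, ...
    removed = abs(gen_a - gen_b)        # how many generations apart they are
    name = f"{degree}C" + (f"{removed}R" if removed else "")
    return ("half " if half else "") + name


# Counting down from George Buchan and Jean Johnson:
# Robert senior (1) -> emigrant Robert (2) -> his child (3) -> Pat's parent (4) -> Pat (5)
print(cousin_relationship(5, 5))             # Pat and a great-grandchild of Robert's 1C: 4C
print(cousin_relationship(6, 6))             # one generation further down on both sides: 5C
print(cousin_relationship(5, 6))             # Pat and that 4C's child: 4C1R
# Counting down from Robert senior alone, via a half-sibling of the emigrant Robert:
print(cousin_relationship(4, 4, half=True))  # half 3C
```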

Diahan also calls the Light Blue group ‘Leftovers’, as shown in her seminal graphic of her Plan:


For all of Diahan's great educational resources, see yourdnaguide.com

Step 3 – examine the left-over Light Blue matches for common surnames and locations

At this stage I had 11 matches, six with trees. I was heartened that one male match had the surname Buchan, and that his tree extended to his x2 great-grandfather George Buchan, who was from Midlothian in Scotland. I was even more heartened when four matches’ trees included the name Inglis or McKenzie, though all seemed unrelated and all the trees were small (one had only five people).
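Spotting names like Inglis or McKenzie recurring across several small trees is essentially a frequency count. The sketch below is illustrative only: it assumes the surnames from each match’s tree have been typed out by hand, and the tree contents and match names are hypothetical, not the real Light Blue data.

```python
from collections import Counter


def surname_counts(trees: dict[str, list[str]]) -> Counter:
    """Count in how many matches' trees each surname appears.

    Surnames are compared case-insensitively (so 'Mckenzie' and 'McKenzie' agree),
    and each surname is counted at most once per tree.
    """
    counts: Counter = Counter()
    for surnames in trees.values():
        counts.update({s.strip().casefold() for s in surnames})
    return counts


# Hypothetical trees for three Light Blue matches (not the real match data).
trees = {
    "match_1": ["Buchan", "Inglis", "Smith"],
    "match_2": ["Mckenzie", "Brown"],
    "match_3": ["McKenzie", "Inglis", "Inglis"],
}
counts = surname_counts(trees)
shared = sorted(s for s, n in counts.items() if n >= 2)
print(shared)   # ['inglis', 'mckenzie']
```

Counting each surname once per tree matters: a name repeated many times in one large tree should not outweigh a name that turns up once in each of several unrelated trees.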



 

I did not know how to use the ScotlandsPeople website at that time, but another very transferable strategy is to look for other trees on Ancestry containing the names in a match’s tree. This can be done from within the match’s DNA profile page. An amazing death certificate was attached to one of these non-match trees. A George Buchan died in 1865, and the informant for his death was his brother Robert. George was born about 1802, so he was of the right age to be a brother of my Robert Buchan (my x3 great-grandfather). George had a lot of descendants, so several trees named his parents as George Buchan and Janet (or Jean or Jane) Johnson. Similarly, I found trees showing an Isabella Buchan with the same parents, who married Daniel Inglis. I was confident that I had found Robert’s parents.

Descendancy research took over, as I documented all of Robert’s siblings and found his second marriage and an illegitimate daughter born just a year or two after the emigrant Robert. I now know to search for siblings by searching the death records on ScotlandsPeople using the parents’ names. All but one of the siblings died after 1855, when statutory registration began in Scotland, so detailed death certificates exist for eight of the children (including Robert himself, and one who died outside Midlothian), all naming George Buchan and Jean Johnson as parents. As was common in Scotland in the early 1800s, only three of these siblings have baptism records.



Step 4 – search for the McCray members of the cluster

Just as I did to identify the Buchan-only cluster, I needed to identify and label the now-recognised descendants of George and Jean. These matches got an Orange dot and the label 'Buchan+Johnson'. Once again, I had to expand this cluster by adding those of their shared matches who had not already been collected, all of whom had small cM values. Ancestry’s Pro Tools allowed me to find matches below 20 cM, down to 8 cM. Not knowing which of them were significant, I included them all. Matches in the Orange group who were not descendants of George and Jean were given a Pink dot labelled 'McCray+?'.
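The expand-then-subtract step can be sketched as one pass of cluster expansion followed by a set difference. This is a hedged illustration, not an Ancestry tool: it assumes each recognised descendant’s shared matches have been noted down by hand with the cM each shares with the test taker, and the 8 cM floor simply mirrors the figure above. The function name and all data are hypothetical.

```python
def expand_cluster(
    seeds: set[str],
    shared: dict[str, list[tuple[str, float]]],
    min_cm: float = 8.0,
) -> set[str]:
    """One pass of cluster expansion: add every shared match of every seed.

    shared maps a match to its shared matches, each given as
    (name, cM shared with the test taker). Shared matches below min_cm are ignored.
    """
    cluster = set(seeds)
    for seed in seeds:
        for other, cm in shared.get(seed, []):
            if cm >= min_cm:
                cluster.add(other)
    return cluster


# Hypothetical data: two recognised descendants of George and Jean and their shared matches.
descendants = {"desc_1", "desc_2"}
shared = {
    "desc_1": [("desc_2", 41.0), ("unknown_X", 15.0), ("unknown_Y", 6.5)],
    "desc_2": [("desc_1", 41.0), ("unknown_Z", 9.0)],
}
orange = expand_cluster(descendants, shared)   # Buchan+Johnson after expansion
pink = orange - descendants                     # McCray+?: Orange matches not descended from George and Jean
print(sorted(pink))   # ['unknown_X', 'unknown_Z']
```

A single pass matches the description of adding the shared matches of the recognised descendants; repeating the expansion from the new members would pull in ever more distant people, so it is deliberately not iterated here.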

I now knew that there were half 3C and 4C matches to Pat in the cluster, and that these same people were half 4C and 5C to me. The average shared cM for a half 3C is 48, for a 4C 35 and for a 5C 25, so such matches could be found in the match list.

In 2026 I have 49 matches in this Pink group using Pat’s test. With no family lines to guide me, I chose the highest match and collected all of their shared matches into what I called the Target A group, which got a Purple dot. Similar to doing a Leeds analysis, I then found the next highest Pink match not already in Target A and collected all their shared matches to form the Target B group, and so on, until I had five mutually exclusive clusters. Two clusters turned out to be Bain+Matheson groups that I had missed when forming that cluster from Pat’s list, apparently because Pat has a lot less Bain DNA than Buchan DNA.
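The Target A, B, C… procedure can be written as a short greedy loop. This is a sketch of the sequential grouping described above, similar in spirit to a Leeds analysis but not the Leeds method itself. It assumes each Pink match’s shared matches are known as a set, and only shared matches that are themselves in the Pink pool are collected; the function name, match names and cM values are hypothetical.

```python
def sequential_clusters(
    pool_cm: dict[str, float],
    shared: dict[str, set[str]],
) -> list[set[str]]:
    """Split a pool of matches into mutually exclusive groups.

    Repeatedly take the highest-cM match not yet placed, and group it with those of
    its shared matches that are in the pool and not already placed.
    """
    pool = set(pool_cm)
    placed: set[str] = set()
    groups: list[set[str]] = []
    for match in sorted(pool_cm, key=pool_cm.get, reverse=True):
        if match in placed:
            continue
        group = ({match} | (shared.get(match, set()) & pool)) - placed
        placed |= group
        groups.append(group)
    return groups


# Hypothetical Pink pool (cM shared with the test taker) and shared-match sets.
pool_cm = {"P1": 40.0, "P2": 30.0, "P3": 25.0, "P4": 20.0, "P5": 12.0}
shared = {
    "P1": {"P3", "Q9"},   # Q9 is not in the Pink pool, so it is ignored
    "P2": {"P4"},
    "P3": {"P1"},
}
groups = sequential_clusters(pool_cm, shared)
for label, group in zip("ABCDE", groups):
    print(f"Target {label}: {sorted(group)}")
```

Because each new group takes only matches not yet placed, the groups are mutually exclusive by construction; the trade-off is that which group a borderline match lands in depends on the order in which the seeds are taken.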

No amount of tree building has yet found definitive answers. In the Target A group I can take six matches back to a couple in Lanarkshire in the 1820s. But these six are three pairs of closely related people, and the 1820s is too late to identify Janet McCray, who was likely born about 1813 (when Robert was born). Moreover, the parents of this 1820s couple remain elusive.




Summary

Clusters allowed me to find the parents of Robert Buchan senior, as well as his other children, but not the parents of Janet McCray. Was Janet the only one of her siblings to have descendants, so that I am actually searching for a generation even further back?

In total I made ten clusters, as shown above. Although these Buchan families now populate ThruLines, still nothing independent emerges for Janet McCray.

One of the most useful tools I have found for understanding a cluster is to draw out the DNA family, and this is also one of the clearest ways to explain to others what all the cMs mean. This can be done on paper, in Lucidchart, in Excel, or with other diagramming programs; I use Lucidchart. In the following chart of the DNA descendants of Robert, the only one of the Buchan siblings to have more than one partner, DNA matches are in cream boxes, a red border indicates I have access to segment data, and a blue box indicates the match has the surname Buchan. For simplicity I have omitted cM values.



I also highly recommend writing up your cluster development and findings as either a research note or a story (or a talk!). Not only will this step expose your own missteps, it also reminds you how you got where you are. In the ever-changing world of DNA analysis, with new matches, new tools and new objectives, it is easy to fall into a mass of confusion. Text and graphs are really helpful for avoiding such confusion. [It works for me.]
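For anyone who prefers text to a drawing program, a DNA family chart like the Lucidchart one described above could also be generated as Graphviz DOT and rendered with Graphviz’s `dot` command. This is a minimal sketch under my own assumptions: the `Person` fields, colour names (Graphviz’s `cornsilk` standing in for cream, `lightblue` for the blue Buchan boxes) and the example people are hypothetical, not a reproduction of the real chart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    parent: Optional[str] = None      # parent shown in the chart; None for the top ancestor
    is_match: bool = False            # a DNA match
    has_segment_data: bool = False    # segment data is available
    surname: str = ""


def to_dot(people: list[Person]) -> str:
    """Render a descendancy chart as Graphviz DOT text.

    Matches get a cream (cornsilk) fill, or light blue if their surname is Buchan;
    matches with segment data also get a red border.
    """
    lines = ["digraph dna_family {", "  node [shape=box, style=filled, fillcolor=white];"]
    for p in people:
        attrs = []
        if p.is_match:
            fill = "lightblue" if p.surname.casefold() == "buchan" else "cornsilk"
            attrs.append(f'fillcolor="{fill}"')
        if p.has_segment_data:
            attrs.append('color="red", penwidth=2')
        extra = f" [{', '.join(attrs)}]" if attrs else ""
        lines.append(f'  "{p.name}"{extra};')
    for p in people:
        if p.parent is not None:
            lines.append(f'  "{p.parent}" -> "{p.name}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical three-generation fragment, not the real chart.
people = [
    Person("Robert Buchan"),
    Person("Child 1", parent="Robert Buchan"),
    Person("Match A", parent="Child 1", is_match=True, has_segment_data=True, surname="Buchan"),
    Person("Match B", parent="Child 1", is_match=True, surname="Smith"),
]
print(to_dot(people))
```

Saving the output to a file such as `chart.dot` and running `dot -Tpng chart.dot -o chart.png` should produce an image; a drawing program like Lucidchart remains easier for hand-adjusting the layout.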
