Buchans as a case study in Clustering
I have used clustering extensively in my Buchan study, and it illustrates well the principles of using clusters to solve mysteries (or to break down brick walls). Diahan Southard paid me to write about it on her YourDNAGuide blog, and last year she used it at a conference, and gave me permission to use her slides.
My ‘nearest’ family history mystery was the parents of my
x2 great-grandfather Robert Buchan. My x2 great-grandparents independently
emigrated to Australia from Scotland in the 1850s, married in 1860 and had a
son Robert in 1863, as well as nine other children. Robert’s wife was Margaret
Bain, a lass from the Highlands, and I could trace her family easily. From
their marriage certificate I knew Robert’s parents to be Robert Buchan and
Janet McCray, but not where they came from. The emigrating Robert was said to
be born about 1834 in Edinburghshire, now Midlothian in Scotland. But I could
not find his baptism or their marriage anywhere in Scotland. Nor did I find any
siblings.
Although I knew the names of my x3 great-grandparents, in
practice I was looking for my x4 great-grandparents Mr Buchan and his wife (I
might presume) and Mr McCray and his wife. McCray is a very rare name in
Scotland, and so I expected she might have been a McRae.
I solved the mystery within three weeks while doing Diahan Southard’s DNA Skills
Workshop in 2022. It all came down to using the right clusters.
What were they? The family structure in a DNA sense is shown here:
I needed to use the DNA of Robert’s descendants (THEIRS) to
isolate a cluster of matches who were descendants of HIS (the Buchan family). It
should be clear that the Buchan family was really the Buchan and McCray
families. In forming this cluster, I would be gathering matches who might
descend from either of Robert’s parental sides.
The first clustering was therefore based on Robert and
Margaret’s descendants. From extensive existing research on this family, I knew
of three people in Ancestry who were a generation closer to Robert and Margaret
than I was. One of these was my father’s first cousin (1C) Pat, my 1C1R, whose test
I managed. I was able to use her because an older 1C1R shares only my great-grandparents (who were her grandparents). Robert and Margaret had named their second son Robert, the
third in the line (to date at least). They had broken with Scottish naming
traditions, and later knowledge might explain why.
Prior to the availability of auto-clusters at Ancestry, I used the three older matches to build a cluster of about 60 people. The three matched each other as expected since they were full 2C to each other. In practical terms this cluster was given a Dark Blue dot and called Buchan+Bain. Some of these matches were descendants of Margaret Bain’s known sisters (i.e. HERS matches) and these needed to be eliminated. So, the second cluster I needed was one of Bain DNA.
Once again, this family was already well researched, so I
could identify quite a few Bain-only descendants in the original cluster. I started a Bain-only cluster de novo as I needed to go beyond the several
Bain descendants in the original cluster, to find all Bain-only
descendants, because the purpose of this new cluster was to eliminate them from
a Buchan-only cluster. There would be matches that might be under 75cM whom I
did not recognise and might therefore have left in as Buchans when really they
were not! In fact, there were Bain-only descendants in Scotland that I was
certainly unaware of till I fully tackled this family in 2025, and some were
found as matches. This second cluster was given a Yellow dot and called
Bain+Matheson.
When I did this work in 2022 there were no ThruLines for
Robert Buchan senior or for Janet McCray. Now that I have built up the family tree,
Ancestry provides these, and occasionally new matches are added.
Step 2 – create a Buchan-only cluster
It is necessary to identify the Buchan-only cluster with a
new Light Blue dot, which was allocated if a match had a Dark Blue dot but NOT
a Yellow dot. This is because you can only filter matches on Ancestry (and on
MyHeritage) by having a dot, not by ‘not having a dot’. It may seem tiresome,
but in fact it makes things much clearer. Buchan-only matches may have both
the Dark Blue dot and the Light Blue dot, but smaller matches after extending
the cluster, might have only the Light Blue dot. The label given was 'Buchan+McCray'.
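The dot logic in this step is a set difference: everything with a Dark Blue dot, minus everything with a Yellow dot. A minimal sketch in Python, assuming the match names have been copied out of the match list by hand; the names below are invented placeholders and the function name is only illustrative:

```python
def buchan_only(dark_blue: set[str], yellow: set[str]) -> set[str]:
    """Matches that carry the Dark Blue dot but not the Yellow dot."""
    return dark_blue - yellow


# Invented placeholder names, for illustration only.
dark_blue = {"Pat", "Match 1", "Match 2", "Match 3", "Match 4"}  # Buchan+Bain
yellow = {"Match 2", "Match 4", "Match 9"}  # Bain+Matheson, built separately

# These are the matches that would receive the new Light Blue dot.
light_blue = buchan_only(dark_blue, yellow)
print(sorted(light_blue))
```

Because the Yellow cluster was started from scratch, it can contain people (here "Match 9") who never had a Dark Blue dot. The set difference simply ignores them.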
Some of the matches with a Light Blue dot were THEIRS, like me and my siblings and Pat and all the known
descendants of Robert and Margaret. But the ones who are not THEIRS must be
HIS, must be descendants of Robert’s siblings if he had any (and I only ever
found half siblings), or descendants of Robert’s 1Cs, therefore
sharing Robert’s grandparents. If the latter was the case, then these matches
would be 4C to Pat, a relationship with a 50% chance of matching if they had tested. Even so,
this was much better than using myself and my siblings, for whom any such matches would
be 5C, with only a 10% chance of matching if they had tested. It was critical that I had Pat’s test, and later when I was investigating the
McCray family, I was advised to gather more matches of her generation to widen
the pool of McCray DNA (as you’ll see later). KJL shared her match list, while
KP did not return my messages and is now well into his 90s if he is still
alive. As a cautionary tale, he once messaged me with a ‘be in touch’ reply,
but because I was not actively researching the family I failed to do so. I now
think his test is in limbo.
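To see why extra testers of Pat’s generation matter, the quoted chances can be turned into a rough calculation. This is only a sketch: it uses the 50% (4C) and 10% (5C) figures from the text, and it assumes each tester matches a given cousin independently, which is a simplification. The function name is illustrative.

```python
def chance_any_tester_matches(p_single: float, n_testers: int) -> float:
    """Chance that at least one of n testers shares DNA with a given cousin.

    Assumes every tester has the same chance and that results are
    independent of each other (a simplification for related testers).
    """
    return 1 - (1 - p_single) ** n_testers


P_4C = 0.50  # figure quoted in the text for a fourth cousin
P_5C = 0.10  # figure quoted in the text for a fifth cousin

for n in (1, 2, 3):
    at_pats_level = chance_any_tester_matches(P_4C, n)
    at_my_level = chance_any_tester_matches(P_5C, n)
    print(f"{n} tester(s): 4C {at_pats_level:.1%}, 5C {at_my_level:.1%}")
```

Under these assumptions, three testers at Pat’s level do far better than three at my level, which is the reason for chasing KJL’s and KP’s match lists.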
Diahan also calls the Light Blue group ‘Leftovers’ as shown
in her seminal graphic of her Plan:
For all Diahan's great educational resources see yourdnaguide.com
Step 3 – examine the left-over Light Blue matches for common surnames and locations
At this stage I had 11 matches, six with trees. I was heartened that one male match had the
surname Buchan, and his tree extended to his x2 great grandfather George Buchan
who was from Midlothian in Scotland. I was more heartened when four matches
included the name Inglis or Mckenzie, though all were seemingly unrelated and all the trees were small (one had only five people).
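Spotting surnames that recur across several small trees can be done by eye, but a tally makes it harder to miss one. A minimal sketch, assuming the surnames from each match’s tree have been typed in by hand; the match labels and tree contents below are invented for illustration:

```python
from collections import Counter


def shared_surnames(trees: dict[str, set[str]], min_trees: int = 2) -> list[tuple[str, int]]:
    """Return surnames that appear in at least `min_trees` different trees."""
    counts = Counter()
    for surnames in trees.values():
        # Fold case so that 'Mckenzie' and 'McKenzie' are counted together,
        # and use a set so each tree counts a surname only once.
        counts.update({name.strip().casefold() for name in surnames})
    return [(name, n) for name, n in counts.most_common() if n >= min_trees]


# Invented example trees, not the real match data.
trees = {
    "Match A": {"Buchan", "Smith"},
    "Match B": {"Inglis", "Mckenzie"},
    "Match C": {"McKenzie", "Brown"},
    "Match D": {"Inglis", "Buchan"},
}

for surname, n_trees in shared_surnames(trees):
    print(surname, n_trees)
```

The same tally works for birthplaces: swap the sets of surnames for sets of parishes or counties.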
I did not know how to use the ScotlandsPeople website at
that time, but another very transferrable strategy is to look for other trees
on Ancestry with the names in the match’s trees. This can be done from within
the match’s DNA profile page. An amazing death certificate was attached to one
of these non-matches’ trees. A George Buchan died in 1865, and the informant for
his death was his brother Robert. George was born about 1802, so was of the
right age to be a brother to my Robert Buchan (my x3 great-grandfather). This
man had a lot of descendants, so there were several trees naming his parents as
George Buchan and Janet (or Jean or Jane) Johnson. Similarly, I found trees
showing an Isabella Buchan who married Daniel Inglis, and this Isabella had the
same parents. I was confident that I had found Robert’s parents.
Descendancy research took over, as I documented all of Robert’s siblings, found his second marriage and an illegitimate daughter born just a year or two after the emigrant Robert. Now I know to search for siblings by conducting a search of the death records on ScotlandsPeople using the parents’ names. All but one of his siblings died after 1855, when civil registration began in Scotland, and therefore detailed death certificates exist for eight siblings (including Robert's, and one outside of Midlothian), all naming George Buchan and Jean Johnson as their parents. As was common in Scotland in the early 1800s, only three of these siblings have baptism records.
Step 4 – search for the McCray members of the cluster
Just as I did to identify the Buchan-only cluster, I needed
to identify and label the now-recognised descendants of George and Jean. These
matches got an Orange dot and the label was 'Buchan+Johnson'. Once again, I had to expand this cluster by adding
the shared matches of these people who had not already been collected, and they
all had small cM values. ProTools allowed me to find matches below 20cM down to
8cM. Without knowing who was significant, I included them all. Matches in the Orange group who were not descendants of George and Jean were given a Pink dot called 'McCray+?'
I now knew that there were half 3C and 4C matches to Pat in
the cluster, and these same people were half 4C and 5C to me. The average cM
value for half 3C is 48, 4C is 35 and 5C is 25, so they were findable in the
match list.
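The ‘findable’ claim can be checked with simple arithmetic against the cM cut-offs mentioned above. This sketch uses only the averages quoted in the text. Treating 20cM as the floor without ProTools and 8cM as the floor with it is an assumption based on the wording of this post.

```python
# Average shared cM quoted in the text for each relationship.
AVERAGE_CM = {"half 3C": 48, "4C": 35, "5C": 25}

# Assumed floors, taken from the wording above.
DEFAULT_FLOOR_CM = 20   # without ProTools
PROTOOLS_FLOOR_CM = 8   # with ProTools


def margin_above(floor_cm: int) -> dict[str, int]:
    """How far each average sits above a given cM floor."""
    return {rel: avg - floor_cm for rel, avg in AVERAGE_CM.items()}


print("Without ProTools:", margin_above(DEFAULT_FLOOR_CM))
print("With ProTools:   ", margin_above(PROTOOLS_FLOOR_CM))
```

These are averages only. Individual matches vary widely around them, so an average 5C sitting just above a 20cM floor is a reminder that many real 5Cs will fall below it and only appear once the floor drops to 8cM.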
In 2026 I have 49 matches in this Pink group using Pat’s test.
With no family lines to guide me, I chose the highest match and collected all
of their shared matches into what I called Target A group which got a Purple dot.
Similar to doing a Leeds analysis, I then found the next highest match without
a Pink dot, collected all their shared matches to form the Target B group. And so on, so that I was able to find five mutually exclusive clusters. Two clusters turned out to be Bain+Matheson
groups, missed when I formed these from Pat’s list, because it seems Pat has a
lot less Bain DNA than Buchan DNA.
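The Target A, Target B and so on procedure can be sketched as a greedy loop. This is a simplified illustration, not a description of any Ancestry feature: it assumes the cM values and shared-match lists have been recorded by hand, and that each new seed is the highest match not yet placed in a Target group. All names and numbers are invented.

```python
def target_groups(cm: dict[str, float], shared: dict[str, set[str]]) -> list[list[str]]:
    """Greedy, Leeds-style grouping into mutually exclusive target groups.

    cm      maps each match in the pool to the cM shared with the tester.
    shared  maps each match to the set of its shared matches.
    """
    def by_cm(name: str):
        # Highest cM first; name breaks ties so the order is repeatable.
        return (-cm[name], name)

    unassigned = set(cm)
    groups = []
    for seed in sorted(cm, key=by_cm):
        if seed not in unassigned:
            continue  # already collected into an earlier group
        # The seed plus those of its shared matches not yet placed anywhere.
        members = {seed} | (shared.get(seed, set()) & unassigned)
        unassigned -= members
        groups.append(sorted(members, key=by_cm))
    return groups


# Invented example pool.
cm = {"M1": 62, "M2": 41, "M3": 33, "M4": 30, "M5": 18, "M6": 12}
shared = {
    "M1": {"M3", "M5"},
    "M2": {"M4"},
    "M3": {"M1", "M5"},
    "M4": {"M2", "M6"},
    "M5": {"M1", "M3"},
    "M6": {"M4"},
}

for label, group in zip("ABCDE", target_groups(cm, shared)):
    print(f"Target {label}: {group}")
```

Because a match is only ever placed in the first group that claims it, the groups cannot overlap. The price is that a match shared by two seeds is credited to the higher one only, and a small match such as M6 can end up in a group of its own.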
No amount of tree building has yet found definitive answers.
In Target A group I can take six matches back to a couple in Lanarkshire in the
1820s. But these six are three pairs of closely related people, and the 1820s
is too late to identify Janet McCray who was likely born about 1813 (when
Robert was born). As well, the parents of this 1820s couple remain elusive.
Summary
Clusters allowed me to find the parents of Robert Buchan senior and his other children, but not to find the parents of Janet McCray. Was Janet the only one of her siblings to have descendants, so that I am actually searching a generation even further back?
In total I made ten clusters as shown above. Although these Buchan families now populate ThruLines, still nothing emerges independently for Janet McCray.
One of the most useful tools I find for understanding a
cluster is to draw out the DNA family, and this is one of the clearest ways to
explain to others what all the cMs mean. This can be done on paper, in Excel, or in a diagramming program. I use Lucidchart.
In the following chart of the DNA descendants of Robert, the only one of the
Buchan siblings to have more than one partner, DNA matches are in cream boxes,
a red border indicates I have access to segment data, and a blue box indicates
the match has the surname of Buchan. For simplicity I have omitted cM values.
I also highly recommend writing your cluster development and
findings into either a research note or a story (or a talk!). Not only will this
step clarify your own mis-steps, it helps to remind you of how you got where
you are. In the ever-changing world of DNA analysis, new matches, new tools,
new objectives, it is easy to fall into a mass of confusion. Text and graphs
are really helpful to avoid such confusion. [It works for me].






