The design represents a genomics perspective on the biodiversity of the Earth.
The design is based on the shapes of the continents. These were taken from the award-winning AuthaGraph map, a projection that closely preserves shapes and areas. Wired calls it “not perfect, but close”.
I rearranged the AuthaGraph map to fit the aspect ratio of the cover image (nearly 1:1) and to leave enough room for text on the cover. To make things interesting, I put Antarctica in the center, so north is everywhere. Given how the image is constructed (read below), I had to remove some islands. Sorry, Madagascar. You're a star.
Australia has the job of representing all of Oceania, such as Inaccessible Island.
For each continent and ocean, a travelling salesman problem (TSP) was solved to find the shortest (or close to it) path that covers the area. The path joins points that were sampled at a level of detail that I thought would strike a good balance between detail and legibility.
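The solver used for the cover is not named, so the following is only a minimal Python sketch of one common way to get a short (not necessarily shortest) open path through sampled points: a nearest-neighbour start followed by 2-opt improvement. The random points and all function names are illustrative assumptions, not the actual data or code behind the design.

```python
import math
import random

def path_length(points, order):
    """Total length of the open path that visits points in the given order."""
    return sum(math.dist(points[order[i]], points[order[i + 1]])
               for i in range(len(order) - 1))

def nearest_neighbour(points, start=0):
    """Greedy starting path: always step to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def two_opt(points, order):
    """Reverse segments while doing so shortens the path (endpoints stay fixed)."""
    order = order[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 2):
            for k in range(i + 1, len(order) - 1):
                a, b = points[order[i - 1]], points[order[i]]
                c, d = points[order[k]], points[order[k + 1]]
                # Change in length if edges (a,b) and (c,d) become (a,c) and (b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    order[i:k + 1] = reversed(order[i:k + 1])
                    improved = True
    return order

# Hypothetical stand-in for points sampled inside one continent.
random.seed(1)
points = [(random.random(), random.random()) for _ in range(60)]

greedy = nearest_neighbour(points)
improved = two_opt(points, greedy)
print(round(path_length(points, greedy), 3), round(path_length(points, improved), 3))
```

A heuristic like this gives a path that is "close to" the shortest, which matches the spirit of the description. With tens of thousands of points, as in the ocean, a dedicated TSP solver would be the more realistic choice.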
Some continents do not have full coverage by the path: some small islands and peninsulas were excluded (or extended). This needed to be done to give room for the ocean path between continents and to avoid situations where the ocean path crossed a landmass. The shortest solution isn't always the most appropriate.
The number of points visited for each area was

 1,575  africa
   848  antarctica
 2,818  asia
   475  australia
   713  europe
 1,674  namerica
17,354  ocean
   872  samerica
26,329  ALL
I then smoothed the path using a J-spline curve.
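For readers unfamiliar with J-splines, here is a hedged Python sketch of one refinement step of the subdivision rule as it is usually described: a family with a parameter `s` that reduces to the interpolating four-point scheme at `s = 0` and to the cubic B-spline at `s = 1`. The value of `s`, the number of refinement levels, and the treatment of the path as a closed polygon are all assumptions here; the write-up does not state how the smoothing for the cover was configured.

```python
def j_spline_refine(points, s):
    """One J-spline subdivision step on a closed polygon of (x, y) points.

    Each original vertex yields an adjusted vertex ("even") and a new
    vertex on the following edge ("odd"), doubling the point count.
    """
    n = len(points)
    refined = []
    for j in range(n):
        a = points[j - 1]           # P[j-1], wraps around at j = 0
        b = points[j]               # P[j]
        c = points[(j + 1) % n]     # P[j+1]
        d = points[(j + 2) % n]     # P[j+2]
        even = tuple((s * a[k] + (8 - 2 * s) * b[k] + s * c[k]) / 8
                     for k in range(2))
        odd = tuple(((s - 1) * a[k] + (9 - s) * b[k]
                     + (9 - s) * c[k] + (s - 1) * d[k]) / 16
                    for k in range(2))
        refined.extend([even, odd])
    return refined

def j_spline(points, s=1.0, levels=3):
    """Apply several refinement steps; more levels give a smoother curve."""
    for _ in range(levels):
        points = j_spline_refine(points, s)
    return points

# Hypothetical control polygon standing in for a short piece of the path.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
smooth = j_spline(square, s=1.0, levels=3)
print(len(smooth))
```

Values of `s` between 0 and 1 trade off how closely the curve passes through the original points against how smooth it is, which is useful when a path has to stay clear of coastlines.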
The DNA double helix is a trope — let's embrace it.
And before you start pixel peeping: the helix doesn't have handedness. It's composed of three independent stacked layers: one layer per strand and one layer with bonds.
Once the helix was drawn, I added bases as circles and resized to increase visibility while avoiding overlap.
The number of base pairs per area is
               nbp
australia      596
europe         872
antarctica   1,024
samerica     1,064
africa       1,911
namerica     2,033
asia         3,403
ocean       21,208
            ------
            32,111
I mapped species sequenced as part of the Earth BioGenome Project to the path that corresponded to their habitat location. This was done using a combination of automated searches through the Global Biodiversity Information Facility (GBIF) and manual corrections.
For a given path, the length of the sequence of each species was fixed and depended on the length of the area's double helix and number of species assigned to it.
For example, in Africa we have 1,911 bases to distribute across 87 species, giving us `1911/87-1=20` bases per species. The value is rounded down to the nearest integer and the `-1` term allows for a gap between each sequence.
The ocean area is very large and has relatively few species, so we can have long sequences — 396 bases per species.
               nbp  species  bp_per_species
australia      596       71               7
europe         872      113               6
antarctica   1,024        2             511
samerica     1,064       93              10
africa       1,911       87              20
namerica     2,033      182              10
asia         3,403      206              15
ocean       21,208       52             396
            ------      ---
            32,111      806
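The allocation rule can be checked with a few lines of Python. This is a sketch using the base and species counts listed above for the land areas; the function names are my own. It reproduces the land values from the rule "integer division, minus one base for the gap" and shows how many bases are left over for gaps. Applied to the ocean counts (21,208 bases, 52 species), the same rule gives 406 rather than the listed 396, so the ocean value may have been set with an additional adjustment that is not described here.

```python
# (bases on the helix, species assigned) per land area, from the table above.
land_areas = {
    "australia":  (596, 71),
    "europe":     (872, 113),
    "antarctica": (1024, 2),
    "samerica":   (1064, 93),
    "africa":     (1911, 87),
    "namerica":   (2033, 182),
    "asia":       (3403, 206),
}

def bases_per_species(n_bases, n_species):
    """Equal share per species, rounded down, minus one base reserved as a gap."""
    return n_bases // n_species - 1

def gap_bases(n_bases, n_species):
    """Bases on the helix that carry no species sequence."""
    return n_bases - n_species * bases_per_species(n_bases, n_species)

for area, (n_bases, n_species) in land_areas.items():
    print(area, bases_per_species(n_bases, n_species), gap_bases(n_bases, n_species))
```

For Africa this gives 20 bases per species and 171 gap bases (1,911 minus 87 × 20).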
Any bases left over on the helix were assigned to gaps. Initially, I thought of concatenating the sequences without any gaps. While this would allow me to represent (negligibly) longer sequences, I was persuaded against this idea (thank you Emma!).
First, the Earth BioGenome Project sequencing isn't complete (nor will it ever be), so gaps in the sequence are a nice metaphor for this. Second, the gaps offer some hope of actually reading the sequences off for each species. Below is a view of one of the strands with the backbone and base pair locations within gaps shown in magenta.
In the final figure, the helix backbone in gaps is faded.
In the final figure, bases are colored by nucleotide (A, T, C, G) and region (land vs ocean).
You can download each species's sequence and its location.
Sequences were sampled from the GenBank records starting at the first base after any leading run of a repeated base. This was a quick way to avoid spamming the design with polytails.
For example, the sequence for the Hanuman langur
>PVII010000001.1:1-1000 Semnopithecus entellus isolate BS30 SemEnt_scaffold_0, WGS
AAAAA AAAAA AAAAA AAAAA AAAAA AAATT ATTTT GCTAA GGTCT GAGAA CTCCA GAAGG TGGTG TCGTG
----- ----- ----- ----- ----- ---++ +++++ +++++ +++++ +++++ +++++ +++++ +++++ +++++
                                 ** ***** ***** ***
was sampled by skipping the first 28 bases (A's) and uses the sequence indicated by * above.
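The sampling step for this example can be sketched in Python as follows. The function names are hypothetical, and the rule is my reading of the description: skip the leading run of the first base, then take as many bases as the area allows (15 per species in Asia). The sequence is the start of the record quoted above.

```python
def sample_after_leading_run(seq, n_bases):
    """Skip the leading run of one repeated base, then take n_bases bases."""
    seq = seq.replace(" ", "").upper()
    if not seq:
        return 0, ""
    skipped = len(seq) - len(seq.lstrip(seq[0]))
    return skipped, seq[skipped:skipped + n_bases]

def as_drawn(sample):
    """First base capitalized, remaining bases lowercase, as in the figure."""
    return sample[:1].upper() + sample[1:].lower()

langur = ("AAAAA AAAAA AAAAA AAAAA AAAAA AAATT ATTTT GCTAA GGTCT GAGAA "
          "CTCCA GAAGG TGGTG TCGTG")

skipped, sample = sample_after_leading_run(langur, 15)
label = as_drawn(sample)
print(skipped, sample, label)
```

The 15 sampled bases line up with the positions marked by * in the example.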
The position and sequence for each species are shown below. The first base of the sequence is capitalized and the remaining bases are lowercase. The numeric code below the species name is the taxon ID, a unique taxonomy identifier used by GenBank.
A version of the above without the sequence is shown below.