2026 π Day latest news buy art
I'm not real and I deny I won't heal unless I cry.Cocteau Twinslet it gomore quotes
very clickable
data visualization + art
The Personal OncoGenomics Program (POG) is a research initiative to study the impact of embedding genomic sequencing into real-time treatment planning for BC patients with metastatic cancers. Based out of the BC Cancer Research Centre and the GSC, POG is a large world-class clinical research collaboration of BC Cancer oncologists, pathologists and other clinical staff, researchers and technical personnel throughout BC healthcare facilities.
Interested in more art based on the POG570 cohort from the Personal OncoGenomics Program? Check out our 5-year POG anniversary posters and desktops.

Science Magazine cover — Human Genome Special Issue

The issue explores lessons from past studies of human genomics, with an eye toward future research efforts | 24 September 2021 | Volume 373 | Issue 6562

data sources
1. Amberger JS, Bocchini CA, Schiettecatte FJM, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Research 43:D789–98 (2015).
2. Pleasance E, Titmuss E, Williamson L et al. (2020) Pan-cancer analysis of advanced patient tumors reveals interactions between therapy and genomic landscapes. Nature Cancer 1:452–468.

genome as a spiral

The length of the spiral was chosen so that the scale on the cover is exactly 1,000,000 bases per centimetre. The spiral is given by $$ r = \frac{r_0}{2\pi} \left( 1 - \frac{\theta ^{k}}{n_{\textrm{turns}}} \right) $$

where `r_0 = 490.06321`, `n_{\textrm{turns}} = 60` and `k = 1.01`. The angle starts at `\theta = 0` (corresponds to 12 o'clock in the image) and progresses clockwise until `r = 0`.

The parameter `k` allows me to slightly perturb the rate of winding of the spiral so that the spiral can be of the exact length, all the while fully filling the cover.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The spiral on the cover has 56.53 turns and a length of 3,088.27 cm. Cover size is 594 pt × 756 pt (20.96 cm × 26.67 cm).
Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
If the genome were to be drawn as a set of straight lines then it would take 147.37 widths of the cover to achieve the same scale as the spiral (1 Mb/cm).

chromosomes

The lengths of the chromosomes in the latest version of the human genome assembly (hg38, December 2013) are as follows.

 1 len 248,956,422 clen   248,956,422
 2 len 242,193,529 clen   491,149,951
 3 len 198,295,559 clen   689,445,510
 4 len 190,214,555 clen   879,660,065
 5 len 181,538,259 clen 1,061,198,324
 6 len 170,805,979 clen 1,232,004,303
 7 len 159,345,973 clen 1,391,350,276
 8 len 145,138,636 clen 1,536,488,912
 9 len 138,394,717 clen 1,674,883,629
10 len 133,797,422 clen 1,808,681,051
11 len 135,086,622 clen 1,943,767,673
12 len 133,275,309 clen 2,077,042,982
13 len 114,364,328 clen 2,191,407,310
14 len 107,043,718 clen 2,298,451,028
15 len 101,991,189 clen 2,400,442,217
16 len  90,338,345 clen 2,490,780,562
17 len  83,257,441 clen 2,574,038,003
18 len  80,373,285 clen 2,654,411,288
19 len  58,617,616 clen 2,713,028,904
20 len  64,444,167 clen 2,777,473,071
21 len  46,709,983 clen 2,824,183,054
22 len  50,818,468 clen 2,875,001,522
 x len 156,040,895 clen 3,031,042,417
 y len  57,227,415 clen 3,088,269,832

There are other scaffold entries in the assembly file (e.g. chr M and unanchored segments) but these were omitted from the graphic. The mitochondrial chromosome is only 16,569 bp long, which corresponds to only 0.2 mm on the cover.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The spiral on the cover has 56.53 turns and a length of 3,088.27 cm. Cover size is 594 pt × 756 pt (20.96 cm × 26.67 cm).

revenge of the rainbow color palette

The chromosome color palette used in the UCSC genome browser is this

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
UCSC Genome Browser chromosome color palette. (zoom)

One of the things the original UCSC color palette doesn't get exactly right is luminance normalization. Some colors are very bright (e.g. chr10 yellow) while others are very dark (e.g. chr14 blue). Given that bright elements (generally) command more attention than dark ones, most color palettes for ordinal or nominal variables have some kind of threshold to how much luminance (and chroma) can differ.

Below you can see the UCSC color palette normalized to luminance `L = 70, 80, 90`. While this kind of normalization satisifes some technical requirements for perceptual uniformity, these palettes lack vigor. For strict visualization you probably don't want vigor, but for artistic treatments you probably do.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
UCSC Genome Browser chromosome color palette and versions normalized for 70, 80 and 90 luminance. (zoom)

A while back I looked into generating color palettes of maximally distinct colors. These palettes are formed by clustering all possible colors using perceptually uniform metric (e.g. CIE 2000 `\Delta E`) into `n` clusters. The centroid of each cluster gives the color and, as a set, these centroids are reasonably distributed through color space (e.g. Lab) in such a way that the `\Delta E` between any color and its closest neighbour is roughly the same.

I knew that I was going to make chromosomes X and Y grey, so all I needed were 22 maximally distinct colors for chromosomes 1 – 22. To select these colors, I took all colors in the LCH range `L = [60,80]`, `C = [20,100]` and `H = [0,359]` (all hues) and sampled them every `\Delta L = 5`, `\Delta C = \Delta H = 1`. This yielded 24,928 colors.

picking colors lum 60 ncolors 0
picking colors lum 65 ncolors 5715
picking colors lum 70 ncolors 10975
picking colors lum 75 ncolors 15925
picking colors lum 80 ncolors 20602
initial colors 24928

As I was building the list, any colors within `\Delta E_\textrm{00} < 2` a color already in the list were rejected (to speed up the calculations). This reduced the number of colors to cluster from 24,928 to 6,240.

removing nearby colors within dE00 2
filtering colors 0 nfiltered 0
filtering colors 1000 nfiltered 5159
filtering colors 2000 nfiltered 8600
filtering colors 3000 nfiltered 11024
filtering colors 4000 nfiltered 12728
filtering colors 5000 nfiltered 13947
filtering colors 6000 nfiltered 14958
filtering colors 7000 nfiltered 15673
filtering colors 8000 nfiltered 16261
filtering colors 9000 nfiltered 16698
filtering colors 10000 nfiltered 17084
filtering colors 11000 nfiltered 17442
filtering colors 12000 nfiltered 17697
filtering colors 13000 nfiltered 17891
filtering colors 14000 nfiltered 18037
filtering colors 15000 nfiltered 18147
filtering colors 16000 nfiltered 18269
filtering colors 17000 nfiltered 18346
filtering colors 18000 nfiltered 18428
filtering colors 19000 nfiltered 18483
filtering colors 20000 nfiltered 18541
filtering colors 21000 nfiltered 18584
filtering colors 22000 nfiltered 18624
filtering colors 23000 nfiltered 18657
filtering colors 24000 nfiltered 18679
got 6240 colors

Clustering was done using `k`-means with `\Delta E_\textrm{94}` as the distance metric. It's much faster than `\Delta E_\textrm{00}` and only subtly different.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Maximally distinct set of 22 colors used in the graphic. Sorted by hue. (zoom)

This color cluster centroids are

#        R   G   B   L   C   H    nearest & distance
i    0 235 161 135  72  34  46 j  1 dE00 11.64
i    1 255 126  70  68  72  48 j  0 dE00 11.64
i    2 246 158  67  72  64  66 j  3 dE00 12.04
i    3 216 173  84  73  51  83 j  2 dE00 12.04
i    4 187 186  58  73  63 103 j  5 dE00 10.88
i    5 169 184 118  72  35 116 j  4 dE00 10.88
i    6 145 205  64  76  73 123 j  7 dE00  9.54
i    7  88 183  68  66  69 135 j  8 dE00  9.06
i    8  64 214 110  76  73 146 j  7 dE00  9.06
i    9 123 193 144  72  36 151 j  8 dE00 11.02
i   10   0 208 161  73  62 171 j  9 dE00 11.90
i   11  86 195 191  72  32 194 j 12 dE00 12.86
i   12   0 206 221  71  74 201 j 11 dE00 12.86
i   13  79 189 227  72  34 235 j 11 dE00 14.52
i   14 111 176 255  71  51 277 j 15 dE00 13.47
i   15 166 166 249  70  45 295 j 16 dE00 12.71
i   16 210 140 255  70  73 313 j 15 dE00 12.71
i   17 252 123 223  69  67 335 j 18 dE00 11.58
i   18 212 163 219  73  34 323 j 17 dE00 11.58
i   19 243 152 176  72  37   2 j 20 dE00 12.56
i   20 255  83 166  66  78   0 j 17 dE00 12.40
i   21 255 110 117  67  67  24 j 19 dE00 14.34

Each color is indentified by index `i`. Its closest neighbour (in terms of `\Delta E`) is color `j` and the distance between them is `\Delta E_{\textrm{00}}`. You can see that this distance to the nearest neighbour is reasonably constant(ish) (average difference is 12.3).

Note that there are two color pairs with very similar hue (`i = 0,1` and `i = 19,20`).

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Chromosome color legend. (zoom)

There are lots of greenish colors — this reflects the fact that green occupies a relatively large part of the color space.

other palette options

It's worth exploring a few different clustering scenarios, which differ by how colors to be clustered are sampled across luminance and chroma. For example, option "I" clusters 2,544 colors sampled from ranges `L = [60,80]` and `C = [80,100]` (very saturated). If we restrict color sampling to a single luminance and chroma, we get something like option "O", which clusters 104 colors with `L = 60` and `C = 80`.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Other color palette options based on k-means clustering of colors within a given luminance (L) and chroma (C) range.

If we want 22 colors that then it's necessary to sample from a variety of luminance and chroma values. Otherwise, colors vary only by hue and some are difficult to discern. For example, in option "N" which samples from fixed `L = C = 70`, the clustering gives

i    0 255 124  98  70  69  37 j 1 dE00 14.25
i    1 255 141  63  70  69  57 j 2 dE00 14.19
i    2 228 157  35  70  69  75 j 1 dE00 14.19
i    3 193 172  21  70  69  95 j 4 dE00 13.21
i    4 151 183  41  70  69 114 j 5 dE00 10.02
i    5 104 191  72  70  69 133 j 6 dE00  7.50
i    6  40 196 101  70  69 148 j 5 dE00  7.50
i    7   0 199 140  70  69 166 j 8 dE00  8.93
i    8   0 201 174  70  69 182 j 9 dE00  7.97
i    9   0 201 202  70  69 195 j 8 dE00  7.97
i   10   0 200 233  70  69 210 j 11 dE00  6.85
i   11   0 198 255  70  69 224 j 10 dE00  6.85
i   12   0 190 255  70  69 249 j 13 dE00  9.47
i   13   0 181 255  70  69 266 j 14 dE00  8.43
i   14  43 173 255  70  69 278 j 15 dE00  5.53
i   15  99 168 255  70  69 285 j 14 dE00  5.53
i   16 152 159 255  70  69 296 j 15 dE00  9.75
i   17 195 147 255  70  69 308 j 18 dE00  8.99
i   18 230 134 251  70  69 322 j 19 dE00  6.60
i   19 254 122 228  70  69 335 j 18 dE00  6.60
i   20 255 112 195  70  69 351 j 19 dE00  7.76
i   21 255 110 148  70  69  12 j 20 dE00 11.42

If I increase the number of k-means iterations to 10,000 (from a modest 10) (see options N and N' in the figure above), then we get a set where minimum `\Delta E` varies much less. We still have the same average `\textrm{avg} \Delta E = 9.1`.

> more colors.22.txt
i    0 255 121 106  70  69  32 j 1 dE00 10.86
i    1 255 111 140  70  69  15 j 2 dE00 10.28
i    2 255 110 181  70  69 357 j 3 dE00  7.98
i    3 255 118 216  70  69 340 j 2 dE00  7.98
i    4 235 132 247  70  69 324 j 3 dE00  8.37
i    5 195 147 255  70  69 308 j 4 dE00 10.07
i    6 151 159 255  70  69 296 j 7 dE00  9.60
i    7  99 168 255  70  69 285 j 8 dE00  5.53
i    8  43 173 255  70  69 278 j 7 dE00  5.53
i    9   0 181 255  70  69 266 j 8 dE00  8.43
i   10   0 190 255  70  69 249 j 9 dE00  9.47
i   11   0 199 253  70  69 222 j 12 dE00  7.87
i   12   0 201 225  70  69 206 j 11 dE00  7.87
i   13   0 201 192  70  69 191 j 12 dE00  9.32
i   14   0 200 154  70  69 173 j 13 dE00 10.41
i   15  14 196 106  70  69 150 j 16 dE00  8.17
i   16 101 191  74  70  69 134 j 15 dE00  8.17
i   17 157 182  38  70  69 112 j 18 dE00 11.78
i   18 193 172  21  70  69  95 j 19 dE00 10.42
i   19 219 161  29  70  69  80 j 18 dE00 10.42
i   20 242 150  47  70  69  66 j 19 dE00 10.59
i   21 255 137  71  70  69  52 j 20 dE00 10.84

If we're only clustering 102 colors, then 10,000 iterations is practical.

If only one component is varying, clustering isn't very efficient. A better way to select colors in this case would be to step along the component and select adjacent colors based on a fixed `\Delta E`.

cytogenetic bands

Cytogenetic bands are chromosome landmarks obtained by Giemsa staining.

For these colors, I sampled from the grey Brewer palette. Stalks are variable regions in the five acrocentric chromosomes that connect the small arm to the chromosome.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Chromosome cytogenetic band color. (zoom)
Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Cytogenetic bands.

The bands were composited on top of chromosomes at 30% opacity.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Cytogenetic bands at 30% opacity on top of chromosomes.

The the bands and chromosomes were further muted on the cover with a faint white overlay and two gradients were added to provide visibility to text.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Chromosomes and bands were tinted with light gradients.

genes

The genes in the graphic are taken from the OMIM database, which tracks genes associated with Mendelian disorders.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Counts of OMIM genes in 250 kb regions.

In the 2021-08-20 version of the database, there were 17,638 entries. The genomic position of each entry was taken as the midpoint between start and end position and the number of entry in each 250 kb region was tallied.

There were 6,973 regions that had non-zero counts of OMIM entries. The exact count of regions with a given number of entries is below. For example, there were 24 regions with 10 entries.

omim all 6973
omim 1 3097 size 1.0083
omim 2 1607 size 1.49850609149058
omim 3 846 size 1.88934938415614
omim 4 515 size 2.22703610655001
omim 5 290 size 2.53000291563677
omim 6 178 size 2.80789602411182
omim 7 136 size 3.06653415956866
omim 8 84 size 3.30975619521437
omim 9 68 size 3.54025696262142
omim 10 24 size 3.76001664243838
omim 11 42 size 3.97054229000266
omim 12 32 size 4.17301328612888
omim 13 12 size 4.36837373638714
omim 14 8 size 4.55739374975462
omim 15 9 size 4.74071154475013
omim 16 7 size 4.91886325486207
omim 17 6 size 5.09230456765918
omim 18 2 size 5.26142678164251
omim 19 1 size 5.4265689495722
omim 21 1 size 5.74606211035883
omim 22 1 size 5.90090430238019
omim 23 2 size 6.05275922451503
omim 24 1 size 6.2018107994994
omim 25 1 size 6.34822448986466
omim 26 2 size 6.4921498104568
omim 27 1 size 6.63372241603022

The last value in the list above is the radius of the circle used to encode the number of events in a region. The Flannery compensation was used in this encoding $$ r = 1.0083 r_0 \left( c/c_\textrm{min} \right)^{0.5716} $$

where `r_0` is the minimum radius, `c` is the count and `c_\textrm{min}` is the minimum count.

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Size of circles used to encode the number of OMIM records and mutation clusters. (zoom)

The same scheme was used for the mutation cluster circles.

mutation clusters

Mutation clusters were taken from Pan-cancer analysis of advanced patient tumors reveals interactions between therapy and genomic landscapes. (Pleasance E et al. (2020) Nature Cancer 1:452–468).

Martin Krzywinski / Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Counts of mutation clusters in 250 kb regions.

There were 2,596 clusters and their bin counts and circle sizes were as follows

cluster all 1823
cluster 1 1323 size 1.0083
cluster 2 343 size 1.49850609149058
cluster 3 96 size 1.88934938415614
cluster 4 38 size 2.22703610655001
cluster 5 10 size 2.53000291563677
cluster 6 6 size 2.80789602411182
cluster 7 4 size 3.06653415956866
cluster 9 1 size 3.54025696262142
cluster 10 1 size 3.76001664243838
cluster 14 1 size 4.55739374975462
news + thoughts

Propensity score weighting

Mon 04-05-2026

It is not certain that everything is uncertain. —Blaise Pascal

We have already explored how we can mitigate bias caused by confounding variables in observational studies using propensity score (PS) matching (PSM) and propensity score weighting (PSW). However, any statistical model is only as good as its assumptions and, if it is specified incorrectly, it can itself produce biased estimates of the treatment effect.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Nature Methods Points of Significance column: Double Robustness. (read)

This month, we explore double robustness, a powerful statistical concept that provides a valuable “safety net” against the risk of an incorrect model. It offers two opportunities, instead of just one, to obtain a valid estimate of the treatment effect — making it possible to draw credible causal inferences from observational data without having to depend on a single set of modeling assumptions.

Kurz, C.F., Krzywinski, M. & Altman, N. (2026) Points of significance: Double Robustness. Nat. Methods 23:868–869.

Nature Biotechnology cover

Thu 23-04-2026

My cover design on the 7 April 2026 Nature Biotechnology issue shows the dendrogram that represents a cluster of uniquely expressed (or downregulated) genes in human naive stem cells induced from such cells. Within each dendrogram block, the genomic barcode sequence (sampled from Supplementary Table 1) is depicted with a Code 39 barcode. The highlighted barcode is one of those used for cell isolation.

Ishiguro S. et al. A multi-kingdom genetic barcoding system for precise clone isolation (2026) Nature Biotechnology 44:616–629.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
My Nature Biotechnology phylogenetic tree cover (volume 44, issue 4, 7 April 2026). (more)

Browse my gallery of cover designs.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
A catalogue of my journal and magazine cover designs. (more)

Happy 2026 π Day—
Art for the 5%

Fri 13-03-2026

Celebrate π Day (March 14th) and enjoy the art — but only if you're part of the 5%.

Go ahead, see what you can't see.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
2026 π DAY | Art for the 5%. Shown in the style of Ishihara color test plates, the art is visible only to those with colour blindness. (details)

Ishihara's Tests for Colour Deficiency

Sun 08-03-2026

Authentic and accurate images of Ishihara's test plates photographed (and lovingly color-corrected) from the 38-plate Ishihara's Tests for Colour Deficiency.

I also provide the position, size, and color of each circle on each test plate.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
ISHIHARA'S TEST PLATE 6 | This plate is part of the set of transformation plates. If you see 5, you're ok. If you see 2, you're not. (details)
Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
ISHIHARA'S TEST PLATE 18 | This plate is part of the set of mysterious hidden plates. If you don't see anything, you're ok. If you see 5, you're not. (details)

Symmetric alternatives to the ordinary least squares regression

Wed 23-07-2025

What immortal hand or eye, could frame thy fearful symmetry? — William Blake, "The Tyger"

This month, we look at symmetric regression, which, unlike simple linear regression, it is reversible — remaining unaltered when the variables are swapped.

Simple linear regression can summarize the linear relationship between two variables `X` and `Y` — for example, when `Y` is considered the response (dependent) and `X` the predictor (independent) variable.

However, there are times when we are not interested (or able) to distinguish between dependent and independent variables — either because they have the same importance or the same role. This is where symmetric regression can help.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
Nature Methods Points of Significance column: Symmetric alternatives to the ordinary least squares regression. Geometry of quantities minimized in OLS and symmetric regression. OLS minimizes `\Sigma e_y^2` in `Y` ~ `X` and `\Sigma e_x^2` `X` ~ `Y`. Pythagorean regression minimizes AB (magenta). Geometric means regression (GMR) minimizes area of ABP (orange). Orthogonal regression (OR) minimizes HP (blue). (read)

Luca Greco, George Luta, Martin Krzywinski & Naomi Altman (2025) Points of significance: Symmetric alternatives to the ordinary least squares regression. Nat. Methods 22:1610–1612.

Martin Krzywinski | contact | Canada's Michael Smith Genome Sciences CentrePHSA
Google whack “vicissitudinal corporealization”
{ 10.9.234.160 }