
There is no sound in space, but there is music (and genomes)

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The first 12 seconds of a 1-bit encoding of a 128 mel 3-bit spectrogram of Flunk's Down Here / Moon Above

1 · Music as an image

The Sanctuary discs are 10 cm sapphire wafers. Each disc has about 3 billion 1.4 micron pixels that each store 1-bit of information — the pixel is either on or off. Reading the information off the disc is easy — just look at the pixels. Very closely. In other words, each disc is a very high resolution image.

To send music (or any kind of data), we need to convert it to an image. Enter the spectrogram.

2 · The spectrogram

A spectrogram is a 2-dimensional representation of sound. The x-axis is time and the y-axis is frequency. At each `(x,y)` position, the strength of frequency `f=y` at time `t=x` is encoded by the brightness of a pixel. This is a way to "draw the sound" in a form that can be decoded back into audio.
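To make the idea concrete, here is a minimal sketch of a spectrogram computed with nothing but the Python standard library. It is not the code used for the discs (that analysis used librosa). The function name `spectrogram`, the Hann taper, the naive DFT and the test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def spectrogram(samples, window, hop):
    """Return a list of columns (one per time step).

    Each column holds the magnitude of frequency bins 0 .. window // 2.
    """
    # Hann taper: fades each frame in and out to reduce spectral leakage.
    taper = [0.5 - 0.5 * math.cos(2 * math.pi * n / window) for n in range(window)]
    columns = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = [samples[start + n] * taper[n] for n in range(window)]
        column = []
        for k in range(window // 2 + 1):
            # Naive DFT: correlate the frame with a complex sinusoid of bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / window)
                for n in range(window)
            )
            column.append(abs(acc))
        columns.append(column)
    return columns

# Hypothetical test signal: one second of a 100 Hz tone sampled at 800 Hz.
rate = 800
tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]

spec = spectrogram(tone, window=64, hop=32)

# The brightest pixel in the first column should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 64
print(len(spec), "columns,", len(spec[0]), "frequency bins, peak at", peak_hz, "Hz")
```

Each column is one slice of time and each entry in it is one pixel's brightness. A real implementation would use a fast Fourier transform rather than this slow double loop.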

The National Music Centre has an excellent short tutorial on how to interpret spectrograms. And if you're a birder, then spectrograms of bird calls won't be new to you.

And while I realize that aliens (almost certainly) and future humans (quite possibly) might not perceive sound in the same way, I see this as a minor point. I'm sure they'll work it out. You know... science.

I am indebted to Tim Sainburg for providing assistance and code. The analysis uses the librosa music and audio analysis Python library.

3 · Down Here / Moon Above

This song was written for the Moon. It sounds best there.

Flunk's History of Everything Ever album contains two versions of the song: a final vocal mix and an instrumental version, which you get with the purchase of the album.

But there is another. In the intermediate version, when the lyrics weren't quite finalized, Anja sang incomplete phrases and loose vocalizations.

We called this the "gibberish" version and even though the final song was ready before the discs were created, we thought gibberish made for the perfect space language.

Pretend you're an alien or human from the future and give it a listen:

Down Here / Moon Above (Flunk) [gibberish version]

4 · Encoding the song as an image

If we had all the pixels on the Moon, we would encode the spectrogram with a large number of frequencies (e.g. `n = 1,024`) with very fine sampling of time (e.g. 5 ms). The song is about 4 minutes, so this would require an image of 48,000 × 1,024 × 8. The last factor of 8 is for the 8-bit encoding of each pixel.

Although this represents only about 13% of the capacity of a disc (about 200 Mb of genome sequence using our encoding) it's more than we had to spare. There weren't enough pixels on 4 discs to write 4 genomes and the proteome and instructions and the song.
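The arithmetic behind these numbers can be sketched in a few lines. The values come from the text (about 4 minutes, 5 ms steps, 1,024 frequencies, 8 bits, about 3 billion pixels per disc, two pixels per base). The variable names are mine.

```python
song_seconds = 4 * 60          # "about 4 minutes"
time_step_s = 0.005            # 5 ms sampling of time
n_freq = 1024                  # number of frequencies
bits_per_pixel = 8             # 8-bit encoding of each spectrogram pixel
disc_pixels = 3_000_000_000    # about 3 billion 1-bit pixels per disc
pixels_per_base = 2            # two 1-bit pixels encode one base

columns = round(song_seconds / time_step_s)        # 48,000 time steps
total_bits = columns * n_freq * bits_per_pixel     # 393,216,000 1-bit pixels
fraction_of_disc = total_bits / disc_pixels        # about 0.13
bases_displaced = total_bits // pixels_per_base    # 196,608,000, about 200 Mb

print(f"{columns:,} columns, {total_bits:,} bits")
print(f"{fraction_of_disc:.1%} of a disc, {bases_displaced / 1e6:.0f} Mb of sequence")
```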

It was clear that I needed a reasonably small spectrogram. There are two ways to achieve this: larger time bins and fewer frequencies. It turns out that a 50 ms time window that stepped along every 20 ms was sufficient — the music didn't have a lot of fast notes. To make the most out of the frequencies I used a psychoacoustic scale.

The mel scale is based on psychoacoustics. It is a logarithmic frequency scale and reflects the fact that we can discriminate low frequencies better than high ones. In other words, to faithfully reproduce sound you need to include more of the low frequencies than high frequencies.

The conversion from `f` in hertz to `m` in mels is `m = k_0 \log(1 + f/k_1)`, where `k_0` and `k_1` are constants. Because mels are very efficient at spacing frequencies based on perception, I can get away with using very few mels! A-mel-zing!
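Here is a small sketch of the conversion and of how mel spacing distributes frequencies. The text does not give the constants, so I assume one widely used choice: `k_0 = 2595` with a base-10 logarithm and `k_1 = 700` Hz. Other conventions exist, and the constants used for the discs may differ. The 0 to 8,000 Hz range and the function names are also assumptions.

```python
import math

K0 = 2595.0   # assumed constant (used with log10); not stated in the text
K1 = 700.0    # assumed constant in Hz; not stated in the text

def hz_to_mel(f):
    """Convert a frequency in hertz to mels."""
    return K0 * math.log10(1 + f / K1)

def mel_to_hz(m):
    """Invert hz_to_mel."""
    return K1 * (10 ** (m / K0) - 1)

def mel_centres(n_mels, f_min, f_max):
    """Frequencies (Hz) of n_mels points spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels)]

centres = mel_centres(128, 0.0, 8000.0)   # hypothetical frequency range

# Spacing in Hz is tight at the low end and wide at the high end.
low_gap = centres[1] - centres[0]
high_gap = centres[-1] - centres[-2]
print(f"lowest gap {low_gap:.1f} Hz, highest gap {high_gap:.1f} Hz")
```

Equal steps in mels become small steps in hertz at low frequencies and large steps at high frequencies, which is why a modest number of mels can cover the audible range.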

4.1 · 512 mel spectrogram

I started with 512 mels and 1, 2, 3, 4 and 8 bits per mel. In this encoding, each pixel, which encodes how much of each mel is present in the sound, can have one of `2^b` values (e.g. in the 3-bit encoding we can have up to 8 values).
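A minimal sketch of what a `b`-bit encoding does to a single pixel, assuming the spectrogram values have already been scaled to the range 0 to 1. How the real spectrogram was scaled before quantization is not described here, so treat the uniform bins and the function names as assumptions.

```python
def quantize(value, bits):
    """Map a value in [0, 1] to an integer level in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    clipped = min(max(value, 0.0), 1.0)
    # The top edge (value == 1.0) is folded into the highest level.
    return min(int(clipped * levels), levels - 1)

def dequantize(level, bits):
    """Return the midpoint of the bin that a level represents."""
    return (level + 0.5) / 2 ** bits

for bits in (1, 2, 3, 4, 8):
    level = quantize(0.7, bits)
    print(bits, "bits:", 2 ** bits, "levels, 0.7 ->", level,
          "->", round(dequantize(level, bits), 4))
```

With 3 bits the decoded value is never more than 1/16 away from the original. With 1 bit, the pixel only records whether the value is in the upper or lower half of the range.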

512 mel 3-bit encoding of Down Here / Moon Above by Flunk. The original is 11,688 × 512 pixels, shown here sliced into 10 rows and resized to 600 × 1000.

Optimizing the number of bits is really important because I didn't have that much spare space on the discs. Every pixel of music took away from a pixel of genome information. Each bit of each mel requires 11,688 pixels. Thus, going from 3 bits to 4 bits in a 512 mel encoding required an additional 11,688 × 512 = 5,984,256 pixels. Two pixels encoded a base, so this corresponded to about 3 Mb of sequence.
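The trade-off between music and genome can be tabulated with a short sketch. The 11,688 columns and the two-pixels-per-base figure come from the text. The function names are hypothetical.

```python
COLUMNS = 11_688        # time steps in the spectrogram
PIXELS_PER_BASE = 2     # two 1-bit pixels encode one base

def music_pixels(n_mels, bits):
    """Number of 1-bit disc pixels used by an n_mels x bits encoding."""
    return COLUMNS * n_mels * bits

def sequence_cost_mb(pixels):
    """Megabases of sequence that the same pixels could have stored."""
    return pixels / PIXELS_PER_BASE / 1e6

# Cost of one extra bit per pixel in the 512 mel encoding.
extra = music_pixels(512, 4) - music_pixels(512, 3)
print(f"3 -> 4 bits at 512 mels: {extra:,} pixels, "
      f"{sequence_cost_mb(extra):.1f} Mb")

for n_mels in (512, 128):
    for bits in (1, 2, 3, 4, 8):
        px = music_pixels(n_mels, bits)
        print(f"{n_mels:>3} mel {bits}-bit: {px:>11,} pixels, "
              f"{sequence_cost_mb(px):5.1f} Mb")
```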

Here is what the decoding of each spectrogram sounds like. This verifies whether the music is reasonably preserved during the encoding-decoding process.

512 mel, 8-bit
512 mel, 4-bit
512 mel, 3-bit
512 mel, 2-bit
512 mel, 1-bit

You can hear that the 8-bit and 4-bit encodings are very good. Remember, we're talking about music on the Moon here, so manage your expectations.

The 3-bit encoding is great. This is the bit sweet spot.

The 2-bit encoding isn't awesome but it's not horrible. You can definitely make out the music and lyrics but there's a warble to the sound.

The 1-bit encoding amazingly still sounds like something. It's very ghostly. The 1-bit encoding is binary — it stores whether a frequency exists at a given point in time or not. All frequencies have the same strength. I imagine this is what music in space sounds like.

4.2 · 128 mel spectrogram

The 512 mel 3-bit encoding took 17,952,768 pixels (11,688 × 512 × 3), which was about 9 Mb of sequence. Could we do better?

128 mel 3-bit encoding of Down Here / Moon Above by Flunk. The original is 11,688 × 128 pixels, shown here sliced into 10 rows and resized to 600 × 250.

It turns out that 128 mels is all we need. Well, maybe not all we need but all we can get! And while the 128 mel 1-bit and 2-bit encodings are sketchy, the 3-bit is amazingly good.

Just think about how little information is being stored here. For each 20 ms of music, we have 128 frequencies, each of which is specified by one of 8 discrete volume levels (because we have only 3 bits).
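To put a number on "how little", here is a back-of-the-envelope sketch of the data rate. The 128 mels, 3 bits, 20 ms step and 11,688 columns are from the text. The comparison with uncompressed CD audio (44.1 kHz, 16-bit, stereo) is my own addition for scale.

```python
N_MELS = 128
BITS = 3
HOP_MS = 20          # one spectrogram column every 20 ms
COLUMNS = 11_688     # width of the spectrogram

bits_per_column = N_MELS * BITS                      # 384
columns_per_second = 1000 // HOP_MS                  # 50
bits_per_second = bits_per_column * columns_per_second   # 19,200
total_bits = bits_per_column * COLUMNS               # 4,488,192
duration_s = COLUMNS * HOP_MS / 1000                 # about 233.76 s

# Uncompressed CD audio, for scale (not from the text).
cd_bits_per_second = 44_100 * 16 * 2                 # 1,411,200
ratio = cd_bits_per_second / bits_per_second         # 73.5

print(f"{bits_per_second:,} bits/s over {duration_s:.2f} s = {total_bits:,} bits")
print(f"CD audio carries {ratio:.1f}x more bits per second")
```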

128 mel, 8-bit
128 mel, 4-bit
128 mel, 3-bit (this is the one!)
128 mel, 2-bit
128 mel, 1-bit

5 · Decoding instructions

Instructions for how to decode the spectrogram.

6 · Spectrogram on the disc

Because the discs are a 1-bit medium, to store each pixel of the 128 mel 3-bit spectrogram, I needed 3 pixels. This was done by taking the 3-bit pixel and representing it as a column of three 1-bit pixels. Don't worry, everything is explained in the very clear instructions on the discs.
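The 3-bit to 1-bit step can be sketched as follows. The bit order within the column (most significant bit on top) is an assumption made for this example, and the function names are hypothetical. The actual layout is the one given in the instructions on the discs.

```python
def to_bit_column(level, bits=3):
    """Split an integer level into `bits` 1-bit pixels, most significant first."""
    return [(level >> shift) & 1 for shift in range(bits - 1, -1, -1)]

def from_bit_column(column):
    """Rebuild the integer level from a column of 1-bit pixels."""
    level = 0
    for bit in column:
        level = (level << 1) | bit
    return level

def expand(image, bits=3):
    """Turn rows of multi-bit pixels into rows of 1-bit pixels.

    Each input row becomes `bits` output rows, so the image gets
    `bits` times taller and keeps the same width.
    """
    out = []
    for row in image:
        columns = [to_bit_column(value, bits) for value in row]
        for plane in range(bits):
            out.append([col[plane] for col in columns])
    return out

tiny = [[5, 2],
        [7, 0]]              # a 2 x 2 image of 3-bit pixels
one_bit = expand(tiny)       # a 6 x 2 image of 1-bit pixels
for row in one_bit:
    print(row)
```

Under this layout, the 128-row spectrogram becomes a 384-row 1-bit image of the same width.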

Below is the final spectrogram as it appears on the disc, shown here wrapped into 19 rows of 600 pixels, each of which corresponds to 12 seconds of music.

The final 128 mel 3-bit spectrogram encoded in a 1-bit image.
news + thoughts

Symmetric alternatives to the ordinary least squares regression

Wed 23-07-2025

What immortal hand or eye, could frame thy fearful symmetry? — William Blake, "The Tyger"

This month, we look at symmetric regression, which, unlike simple linear regression, is reversible: it remains unaltered when the variables are swapped.

Simple linear regression can summarize the linear relationship between two variables `X` and `Y` — for example, when `Y` is considered the response (dependent) and `X` the predictor (independent) variable.

However, there are times when we are not interested in (or able to make) a distinction between dependent and independent variables, either because they have the same importance or the same role. This is where symmetric regression can help.

Nature Methods Points of Significance column: Symmetric alternatives to the ordinary least squares regression. Geometry of quantities minimized in OLS and symmetric regression. OLS minimizes `\Sigma e_y^2` in `Y` ~ `X` and `\Sigma e_x^2` in `X` ~ `Y`. Pythagorean regression minimizes AB (magenta). Geometric means regression (GMR) minimizes area of ABP (orange). Orthogonal regression (OR) minimizes HP (blue). (read)

Luca Greco, George Luta, Martin Krzywinski & Naomi Altman (2025) Points of significance: Symmetric alternatives to the ordinary least squares regression. Nat. Methods 22:1610–1612.
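As a rough illustration of what "reversible" means here, the sketch below compares slopes when `X` and `Y` are swapped. It uses standard textbook closed forms for the OLS, GMR and OR slopes, not code or formulas taken from the column, and it leaves out Pythagorean regression. The data points are made up.

```python
import math

def moments(x, y):
    """Centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def slopes(x, y):
    """Slopes of y on x for OLS, geometric means and orthogonal regression."""
    sxx, syy, sxy = moments(x, y)
    ols = sxy / sxx
    gmr = math.copysign(math.sqrt(syy / sxx), sxy)
    d = syy - sxx
    orth = (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)
    return {"ols": ols, "gmr": gmr, "or": orth}

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.4]

fwd = slopes(x, y)   # Y ~ X
rev = slopes(y, x)   # X ~ Y

# A symmetric method describes the same line either way, so the two
# slopes are reciprocals and their product is 1.
for name in ("ols", "gmr", "or"):
    print(name, round(fwd[name] * rev[name], 6))
```

For OLS the product of the two slopes is the squared correlation, which is below 1 unless the points lie exactly on a line. For GMR and OR it is 1.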

Beyond Belief Campaign BRCA Art

Wed 11-06-2025

Fuelled by philanthropy, findings into the workings of BRCA1 and BRCA2 genes have led to groundbreaking research and lifesaving innovations to care for families facing cancer.

This set of 100 one-of-a-kind prints explores the structure of these genes. Each artwork is unique, and if you put them all together, you get the full sequence of the BRCA1 and BRCA2 proteins.

Propensity score weighting

Mon 17-03-2025

The needs of the many outweigh the needs of the few. —Mr. Spock (Star Trek II)

This month, we explore a related and powerful technique to address bias: propensity score weighting (PSW), which applies weights to each subject instead of matching (or discarding) them.

Nature Methods Points of Significance column: Propensity score weighting. (read)

Kurz, C.F., Krzywinski, M. & Altman, N. (2025) Points of significance: Propensity score weighting. Nat. Methods 22:638–640.

Happy 2025 π Day—
TTCAGT: a sequence of digits

Thu 13-03-2025

Celebrate π Day (March 14th) and sequence digits like it's 1999. Let's call some peaks.

2025 π DAY | TTCAGT: a sequence of digits. The digits of π are encoded into DNA sequence and visualized with Sanger sequencing. (details)

Crafting 10 Years of Statistics Explanations: Points of Significance

Sun 09-03-2025

I don’t have good luck in the match points. —Rafael Nadal, Spanish tennis player

Points of Significance is an ongoing series of short articles about statistics in Nature Methods that started in 2013. Its aim is to provide clear explanations of essential concepts in statistics for a nonspecialist audience. The articles favor heuristic explanations and make extensive use of simulated examples and graphical explanations, while maintaining mathematical rigor.

Topics range from basic, but often misunderstood, such as uncertainty and P-values, to relatively advanced, but often neglected, such as the error-in-variables problem and the curse of dimensionality. More recent articles have focused on timely topics such as modeling of epidemics, machine learning, and neural networks.

In this article, we discuss the evolution of topics and details behind some of the story arcs, our approach to crafting statistical explanations and narratives, and our use of figures and numerical simulations as props for building understanding.

Crafting 10 Years of Statistics Explanations: Points of Significance. (read)

Altman, N. & Krzywinski, M. (2025) Crafting 10 Years of Statistics Explanations: Points of Significance. Annual Review of Statistics and Its Application 12:69–87.

Propensity score matching

Mon 16-09-2024

I don’t have good luck in the match points. —Rafael Nadal, Spanish tennis player

In many experimental designs, we need to keep in mind the possibility of confounding variables, which may give rise to bias in the estimate of the treatment effect.

Nature Methods Points of Significance column: Propensity score matching. (read)

If the control and experimental groups aren't matched (or, roughly, similar enough), this bias can arise.

Sometimes this can be dealt with by randomizing, which on average can balance this effect out. When randomization is not possible, propensity score matching is an excellent strategy to match control and experimental groups.

Kurz, C.F., Krzywinski, M. & Altman, N. (2024) Points of significance: Propensity score matching. Nat. Methods 21:1770–1772.

Martin Krzywinski | Canada's Michael Smith Genome Sciences Centre | PHSA