π Day 2025 Art Posters - TTCAGT: a sequence of digits
/ Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca

▲ 2025 π DAY | TTCAGT: A sequence of digits. 768 digits of `\pi` as a Sanger sequencing trace of 1,536 peaks.

On March 14th celebrate `\pi` Day. Hug `\pi`—find a way to do it.

Those who favour `\tau=2\pi` will have to postpone celebrations until June 28th. That's what you get for thinking that `\pi` is wrong. I sympathize with this position and have `\tau` day art too!

If you're not into details, you may opt to party on July 22nd, which is `\pi` approximation day (`\pi` ≈ 22/7). It's 20% more accurate than the official `\pi` day!
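The 20% figure can be checked with a few lines of arithmetic. This is only a quick sketch that compares the absolute error of 3.14 with that of 22/7; the variable names are mine.

```python
import math

# Absolute error of each calendar-friendly approximation of pi
err_pi_day = abs(3.14 - math.pi)        # March 14th
err_approx_day = abs(22 / 7 - math.pi)  # July 22nd

# Fraction by which 22/7 reduces the error relative to 3.14
improvement = 1 - err_approx_day / err_pi_day

print(f"|3.14 - pi| = {err_pi_day:.6f}")
print(f"|22/7 - pi| = {err_approx_day:.6f}")
print(f"error reduced by {improvement:.1%}")
```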

Finally, if you believe that `\pi = 3`, you should read why `\pi` is not equal to 3.

Most of the art is available for purchase as framed prints and, yes, even pillows. Sleep's never been more important — I take custom requests.

≡ 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025

Well—well; the sad minutes are moving,
Though loaded with trouble and pain;
And some time the loved and the loving
Shall meet on the mountains again!
—Emily Bronte

Welcome to this year's celebration of `\pi` and mathematics.

The theme this year is Sanger sequencing — old-school, one base at a time.

This year's `\pi` poem is Loud Without The Wind Was Roaring by Emily Bronte.

This year's `\pi` day song is Movements by Luca Musto.

Also, the tabbed menu above is full. Gasp.

▲ TTCAGT: A SEQUENCE OF DIGITS | This year, `\pi` is encoded into a sequence of nucleotides, shown as peaks that appear during Sanger sequencing.

three one four: a number of notes

The 2022 π Day art is a music album composed by Greg Coles for solo piano. It tells stories from the very beginning (314…) to the very (known) end of π (…264) as well as math (Wallis Product) and math jokes (Feynman Point), repetition (nn) and zeroes (null).

This year's sequence of digits reflects on a number of notes from 2022.

1 · Sanger sequencing — one base at a time

1.1 · PCR
1.2 · Chain-terminating nucleotides
1.3 · Gel Electrophoresis
1.4 · Calling peaks
1.5 · Automation

2 · TTCAGT: a sequence of digits

2.1 · Encoding the digits

2.1.1 · Optimal encodings

2.2 · Designing the art

2.2.1 · Signal peaks
2.2.2 · Background peaks

2.3 · Poster layout

1 · Sanger sequencing — one base at a time

Here's a simplified explanation of how Sanger sequencing works.

I'm skipping any detail about primers, reaction conditions and the fact that some sequences will be complementary (e.g. A→T, C→G, G→C, T→A).

1.1 · PCR

Let's suppose we want to determine the sequence in TTCAGT.

To do this, we make use of a DNA copying process called polymerase chain reaction (PCR). But the name here isn't important.

PCR will take our DNA and make millions of copies of it. This kind of PCR is good for one-to-many amplification but, in its basic form, is not that useful for us.

1.2 · Chain-terminating nucleotides

Normally, PCR works by using a template strand of DNA (that which is to be copied) and a protein called DNA polymerase (among others), which synthesizes a new strand on top of the template by stitching together a complementary sequence using free-floating nucleotides in the solution buffer.

However, we can change how the PCR copying process happens by adding a few extra molecular ingredients to the reaction buffer.

We add a small amount of "special" nucleotides (A*, C*, G* and T*), which terminate the PCR copy reaction. These special nucleotides are available to the PCR machinery in the same way that the regular nucleotides A, C, G, T are. However, because the special bases are present at a much lower concentration (e.g. 1/100), they are incorporated into the new strand with low probability.

In this new copy reaction, we will get all the possible subsequences that start at the first base:

T*
TT*
TTC*
TTCA*
TTCAG*
TTCAGT*

For example, in the copied sequence TTC*, PCR has incorporated two regular T's followed by the terminating C*.
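To make the idea concrete, here is a minimal simulation sketch. It assumes that each copied base has the same small probability `p_stop` of being a terminating nucleotide (I use the 1/100 ratio above as that probability, which is a simplification) and, like the rest of this explanation, it ignores complementarity. The function names are mine.

```python
import random

def copy_with_termination(template, p_stop, rng):
    """Copy the template base by base. At each base, a terminating
    nucleotide is incorporated with probability p_stop, ending the copy."""
    copy = ""
    for base in template:
        copy += base
        if rng.random() < p_stop:
            return copy + "*"   # terminated fragment, e.g. TTC*
    return copy                 # reached the end without a terminator

def terminated_fragments(template, n_copies=20000, p_stop=0.01, seed=0):
    """Run many copy reactions and return the distinct terminated
    fragments, shortest first."""
    rng = random.Random(seed)
    copies = (copy_with_termination(template, p_stop, rng)
              for _ in range(n_copies))
    return sorted({c for c in copies if c.endswith("*")}, key=len)

for fragment in terminated_fragments("TTCAGT"):
    print(fragment)
```

Most copies run to the end without a terminator, but with enough copies every fragment in the list above shows up.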

1.3 · Gel Electrophoresis

We now take these fragments (which are all floating around in a solution buffer) and order them by size using gel electrophoresis.

Briefly, this process takes advantage of the fact that (a) DNA molecules are negatively charged and (b) smaller molecules move through a gel matrix faster than larger ones.

We diffuse the DNA molecules through a polyacrylamide gel. But waiting for diffusion would take forever. To speed things up, we apply a voltage across the gel. This pulls the negatively charged DNA molecules towards the positive terminal. Shorter fragments pass through the gel with minimal hindrance, but larger ones occasionally get caught up and temporarily stuck in the gel matrix and thus take longer to pass through.

If all this happens in a capillary, we get a procession of size-ordered fragments coming out the other end.

1.4 · Calling peaks

Finally, remember how I said that these terminating nucleotides were "special"? They fluoresce under a laser. We use this light to detect the fragment — which shows up as a fluorescence peak.

Ideally, if the signal is clean, we will see a (more or less) uniformly spaced series of smudges on the gel. The relative positions of the peaks tell us which DNA fragment comes next (e.g. TT* and TTCAGT* are separated by 3 peaks that correspond to TTC*, TTCA* and TTCAG*).

We are able to tell the bases apart because we run four parallel and independent copy processes, each having access to only one of the terminating nucleotides. For example, the T*, TT* and TTCAGT* peaks would all show up in the T* reaction but not in the other reactions.
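The read-out logic can be sketched in a few lines. This is a toy model, not how a real base caller works: each fragment is assigned to the lane of its terminating base, and the sequence is read by walking through all peaks in order of fragment length. The function names are mine.

```python
def run_lanes(fragments):
    """Assign each terminated fragment (e.g. 'TTC*') to the lane of its
    terminating base and record where it appears (its length)."""
    lanes = {base: [] for base in "ACGT"}
    for fragment in fragments:
        body = fragment.rstrip("*")
        lanes[body[-1]].append(len(body))
    return lanes

def call_bases(lanes):
    """Read the sequence by visiting peaks from shortest to longest
    fragment and noting which lane each peak shows up in."""
    peaks = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in peaks)

fragments = ["T*", "TT*", "TTC*", "TTCA*", "TTCAG*", "TTCAGT*"]
lanes = run_lanes(fragments)
print(lanes["T"])         # peak positions seen in the T* reaction
print(call_bases(lanes))  # sequence read across all four lanes
```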

1.5 · Automation

This all used to be done manually, but by the late 1990s and early 2000s it happened inside automated sequencers.

One of these sequencers was an ABI 3700. Below I show what the screen interface looked like during a run. Traditionally, the color assignments to the bases were A (green), C (blue), G (black/yellow) and T (red). Pure RGB for intensity.

▲ ABI 3700 INTERFACE | Screen closeup of the visualization of the sequencing gel.

▲ VIRTUAL SEQUENCING GEL. | Screen closeup of the visualization of the sequencing gel from an ABI 3700 sequencer.

2 · TTCAGT: a sequence of digits

2.1 · Encoding the digits

The posters show `\pi` up to the Feynman Point, which is the run of six 9s at decimal places 762–767. This position in `\pi` is a great place to stop because of the unexpected pattern of 9s at the end.

Each digit is encoded by two bases:

0 GA
1 CA
2 TC
3 TT
4 GT
5 GC
6 AA
7 CC
8 TA
9 GG

With this scheme, 3.14 reads as TTCAGT. Hence, the title of the art "TTCAGT: a sequence of digits".
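The table above translates directly into a lookup. Here is a small sketch of an encoder and decoder; the function names are mine and the decimal point is simply skipped.

```python
# Digit-to-bases table, copied from the encoding above
ENCODING = {
    "0": "GA", "1": "CA", "2": "TC", "3": "TT", "4": "GT",
    "5": "GC", "6": "AA", "7": "CC", "8": "TA", "9": "GG",
}
DECODING = {bases: digit for digit, bases in ENCODING.items()}

def encode(digits):
    """Encode a string of digits as bases, two bases per digit.
    Non-digit characters (such as the decimal point) are skipped."""
    return "".join(ENCODING[d] for d in digits if d.isdigit())

def decode(bases):
    """Decode a base sequence back into digits, two bases at a time."""
    return "".join(DECODING[bases[i:i + 2]] for i in range(0, len(bases), 2))

print(encode("3.14"))    # the title of the art
print(decode("TTCAGT"))  # and back to the digits
```

Because every digit uses exactly two bases and all ten pairs are distinct, decoding never needs separators.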

▲ TTCAGT: A SEQUENCE OF DIGITS | This year, `\pi` is encoded into a sequence of nucleotides, which appear as peaks during Sanger sequencing. Shown here are the first three rows and the last row that appear in the art poster.

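This encoding was chosen so that the number of bases in the sequence was balanced, to the extent possible. Working from the digit counts listed in section 2.1.1, the number of peaks per base on the trace is A 381, C 381, G 390 and T 384, for a total of 1,536.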

I fixed 9 to be GG because G is traditionally shown as black in Sanger traces and I wanted to end on this color.

I also fixed 3 to be TT (traditionally red) so that the trace starts with two red (or magenta) peaks.
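How balanced the encoding is can be worked out without looking at the digits themselves. The sketch below combines the two-base encoding with the digit counts for the first 768 digits that are listed in section 2.1.1; the names `DIGIT_COUNTS` and `base_counts` are mine.

```python
from collections import Counter

# Two-base encoding used on the posters
ENCODING = {
    0: "GA", 1: "CA", 2: "TC", 3: "TT", 4: "GT",
    5: "GC", 6: "AA", 7: "CC", 8: "TA", 9: "GG",
}

# How often each digit appears in the first 768 digits of pi,
# taken from the table in section 2.1.1
DIGIT_COUNTS = {1: 88, 9: 85, 2: 81, 4: 79, 3: 76,
                6: 75, 8: 72, 7: 71, 0: 71, 5: 70}

def base_counts(encoding, digit_counts):
    """Number of peaks per base when every digit is encoded."""
    totals = Counter()
    for digit, n in digit_counts.items():
        for base in encoding[digit]:
            totals[base] += n
    return totals

totals = base_counts(ENCODING, DIGIT_COUNTS)
for base in "ACGT":
    print(base, totals[base])
print("total", sum(totals.values()))
```

With these counts each base accounts for close to a quarter of the 1,536 peaks.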

2.1.1 · Optimal encodings

There are many other encodings possible.

One kind of encoding is Huffman coding, which builds a tree of unique, prefix-free codes from an alphabet of symbols so that more frequent items get shorter codes. Check out the paper Toward a Better Compression for DNA Sequences Using Huffman Encoding, or try the online Huffman encoder.

Here's one of the optimal Huffman encodings of the first 768 digits of `\pi` into nucleotides.

digit  symbol  bases  count   frequency
1       1      C      88/768  11.5%
9       0      A      85/768  11.1%
2      33      TT     81/768  10.5%
4      32      TG     79/768  10.3%
3      31      TC     76/768   9.9%
6      30      TA     75/768   9.8%
8      23      GT     72/768   9.4%
7      22      GG     71/768   9.2%
0      21      GC     71/768   9.2%
5      20      GA     70/768   9.1%

The most common digits are 1 and 9, so these can be encoded by a single base (C and A, respectively). The remaining digits need two bases.

If we encoded each digit with two bases, then we'd need a string of 1,536 bases. But with the Huffman encoding, we only need 1,363 bases because we now realize a savings of 88 bases for 1 (which is now encoded by one base instead of two) and 85 bases for 9.

For simplicity, the posters use two bases per digit.
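The code lengths and the 1,363-base total can be reproduced from the digit counts in the table above. This is a generic sketch of Huffman coding over a four-letter alphabet (repeatedly merging the four least frequent nodes), not the tool used to make the table. It returns only the code lengths, since the actual base assignments depend on how ties and branches are labelled.

```python
import heapq
from itertools import count

# Digit counts for the first 768 digits of pi, from the table above
DIGIT_COUNTS = {1: 88, 9: 85, 2: 81, 4: 79, 3: 76,
                6: 75, 8: 72, 7: 71, 0: 71, 5: 70}

def huffman_code_lengths(counts, arity=4):
    """Code length (in bases) of each symbol in an arity-ary Huffman code."""
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(n, next(tiebreak), [sym]) for sym, n in counts.items()]
    # Pad with zero-weight dummies so every merge takes exactly `arity` nodes
    while (len(heap) - 1) % (arity - 1) != 0:
        heap.append((0, next(tiebreak), []))
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in counts}
    while len(heap) > 1:
        merged_weight, merged_symbols = 0, []
        for _ in range(arity):
            weight, _order, symbols = heapq.heappop(heap)
            merged_weight += weight
            merged_symbols += symbols
        # Every symbol under the new node moves one level deeper in the tree
        for sym in merged_symbols:
            lengths[sym] += 1
        heapq.heappush(heap, (merged_weight, next(tiebreak), merged_symbols))
    return lengths

lengths = huffman_code_lengths(DIGIT_COUNTS)
total_bases = sum(DIGIT_COUNTS[d] * lengths[d] for d in DIGIT_COUNTS)
print(lengths)
print(total_bases)
```

The digits 1 and 9 end up with one base each and the rest with two, which gives 1,536 − 88 − 85 = 1,363 bases.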

2.2 · Designing the art

The peaks were generated from a simple model that drew each peak as a Gaussian (the shape of a Normal distribution).

Each peak that corresponded to a digit had a mean height, width and position, which were perturbed on a peak-by-peak basis using random values drawn from a Normal distribution.

2.2.1 · Signal peaks

For example, the peak height mean was `\bar{h} = 0.6` times row height with a standard deviation of `\sigma_h = 0.1\bar{h}`. The width of each peak was `\bar{w} = 0.15S`, where `S` is the spacing between peaks, with a standard deviation of `\sigma_w = 0.1\bar{w}`. The position standard deviation was `\sigma_x = 0.075S`.

Over the last 20 peaks (10 digits), the peak height is reduced and the width is increased to taper off the signal.
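Here is a minimal sketch of this kind of peak model using the parameters above. It is not the code that drew the posters. I assume that the peak "width" is the standard deviation of the Gaussian curve, that heights are in units of row height, and I leave out the taper at the end. The names `signal_peaks` and `trace` are mine.

```python
import math
import random

H_MEAN = 0.6         # mean peak height, as a fraction of row height
H_SD = 0.1 * H_MEAN  # standard deviation of peak height

def signal_peaks(bases, spacing=1.0, seed=1):
    """Draw one perturbed peak (position, height, width) per base."""
    rng = random.Random(seed)
    w_mean = 0.15 * spacing
    peaks = []
    for i, base in enumerate(bases):
        peaks.append({
            "base": base,
            "x": rng.gauss(i * spacing, 0.075 * spacing),  # position jitter
            "height": rng.gauss(H_MEAN, H_SD),
            "width": rng.gauss(w_mean, 0.1 * w_mean),
        })
    return peaks

def trace(peaks, base, x):
    """Signal of one base channel at position x: the sum of its
    Gaussian-shaped peaks (width is taken as the Gaussian's sigma)."""
    return sum(
        p["height"] * math.exp(-0.5 * ((x - p["x"]) / p["width"]) ** 2)
        for p in peaks if p["base"] == base
    )

peaks = signal_peaks("TTCAGT")
for p in peaks:
    print(p["base"], round(p["x"], 3), round(p["height"], 3), round(p["width"], 3))
```

Evaluating `trace` on a fine grid of `x` values for each of the four bases gives the four coloured curves of one row.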

2.2.2 · Background peaks

For each signal peak, up to four noise peaks were added to the signal. The peaks were positioned at horizontal offsets of `-2, -1, +1, +2` peak spacings. Neighbour error peaks (offset by `-1` and `+1`) had a 50% probability of being drawn and the next-nearest neighbour error peaks (offset by `-2` and `+2`) had a 25% probability.

The error peaks were on average 10% (neighbours) or 5% (next-nearest neighbours) of the height of the signal peaks.

These background peaks arise during the Sanger reaction for a variety of reasons. Typically the start of the trace is messy, but I don't account for this.
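The noise recipe above can be sketched as follows. The offsets, probabilities and average relative heights come from the text. Two details are my assumptions because the text does not specify them: noise peaks are drawn in the same base channel as their signal peak, and their heights vary with a 10% relative standard deviation (`rel_sd`).

```python
import random

# offset in peak spacings: (probability of being drawn,
#                           mean height relative to the signal peak)
NOISE = {
    -2: (0.25, 0.05),
    -1: (0.50, 0.10),
    +1: (0.50, 0.10),
    +2: (0.25, 0.05),
}

def noise_peaks(index, base, height, rng, rel_sd=0.1):
    """Draw up to four noise peaks around the signal peak at `index`.
    Assumptions: same base channel as the signal peak, and a relative
    height spread of rel_sd."""
    peaks = []
    for offset, (prob, rel_height) in NOISE.items():
        if rng.random() < prob:
            mean = rel_height * height
            peaks.append({
                "base": base,
                "index": index + offset,
                "height": abs(rng.gauss(mean, rel_sd * mean)),
            })
    return peaks

rng = random.Random(0)
for peak in noise_peaks(10, "T", 0.6, rng):
    print(peak)
```

On average this adds 2 × 0.5 + 2 × 0.25 = 1.5 noise peaks per signal peak.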

2.3 · Poster layout

The posters are designed for 50 cm × 50 cm (19.7" × 19.7"). At this size the title font (Futura Medium) is 16 pt and the legend font (Futura Book) is 12 pt.

You can easily display the poster at half this size and still have the legend font readable.

There are 30 rows with up to 52 peaks per row. The first and last rows have fewer peaks.
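As a quick check of the row arithmetic, here is a sketch that assumes the 28 interior rows are completely full (52 peaks each), which the text implies but does not state.

```python
DIGITS = 768
PEAKS = 2 * DIGITS    # two bases (peaks) per digit
ROWS = 30
MAX_PER_ROW = 52

# Assumption: every row except the first and last holds the maximum
interior_rows = ROWS - 2
interior_peaks = interior_rows * MAX_PER_ROW
remaining = PEAKS - interior_peaks  # peaks shared by first and last rows

print(PEAKS, interior_peaks, remaining)
```

Under that assumption, the first and last rows share 80 peaks, which is why both are shorter than the rest.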

Happy 2025 π Day—
TTCAGT: a sequence of digits

Thu 13-03-2025

Celebrate π Day (March 14th) and sequence digits like it's 1999. Let's call some peaks.

▲ 2025 π DAY | TTCAGT: a sequence of digits. The digits of π are encoded into DNA sequence and visualized with Sanger sequencing. (details)
