
Genome Informatics 2010 cover

Genome Informatics, September 15-19, 2010 / Hinxton, UK

1 · The conference program cover

The program cover shows sequences of some of the genes and viruses that appear in the 2010 Genome Informatics conference's abstracts.

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
GENOME INFORMATICS 2010 FRONT COVER | The conference program cover shows sequences of some of the proteins and genes reported in the abstracts drawn as paths
GENOME INFORMATICS 2010 BACK COVER | The conference program cover shows sequences of some of the proteins and genes reported in the abstracts drawn as paths

The booklet was published with a black cover background. Below is an inverted and pinkish take on the cover.

GENOME INFORMATICS 2010 FRONT AND BACK COVER | The conference program cover shows sequences of some of the proteins and genes reported in the abstracts drawn as paths

2 · Design of the cover

2.1 · Sequence as a path

Each sequence is represented by a continuous path. The length of the path is proportional to the length of the sequence.

2.2 · Path color — GC Content

At each point on the path, color is used to show the GC content computed over a window of 20 bases at that position.
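The windowed GC calculation can be sketched as follows. This is a minimal illustration, not the code used for the cover: the function name `gc_content` and the choice to truncate the window at the ends of the sequence are assumptions.

```python
def gc_content(seq: str, pos: int, window: int = 20) -> float:
    """Fraction of G and C bases in a window of `window` bases around `pos`.

    Assumption: the window is roughly centred on `pos` and is simply
    truncated where it would run past either end of the sequence.
    """
    half = window // 2
    start = max(0, pos - half)
    end = min(len(seq), start + window)
    chunk = seq[start:end].upper()
    if not chunk:
        return 0.0
    return sum(base in "GC" for base in chunk) / len(chunk)
```

For example, `gc_content("GGGGGGGGGGAAAAAAAAAA", 10)` covers all 20 bases and gives 0.5.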

GC CONTENT ENCODING | GC content is encoded by color

Because the GC content doesn't vary greatly, values in the range 0.2–0.6 are mapped onto hues 0–300, with GC values outside that range assigned to the start and end hues. To smooth the color mapping, a running average is calculated across 10 adjacent samples.
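A sketch of this mapping, under stated assumptions: the text gives the GC range (0.2–0.6), the hue range (0–300) and the smoothing width (10 samples), but not whether smoothing is applied to GC values or to hues, nor the saturation and brightness. Here the GC values are smoothed first, and full saturation and brightness are assumed. All function names are hypothetical.

```python
import colorsys

def gc_to_hue(gc, lo=0.2, hi=0.6, hue_min=0.0, hue_max=300.0):
    """Linearly map GC in [lo, hi] onto [hue_min, hue_max] degrees.

    Values outside the range are clamped to the start or end hue.
    """
    clamped = min(max(gc, lo), hi)
    return hue_min + (clamped - lo) / (hi - lo) * (hue_max - hue_min)

def running_average(values, width=10):
    """Smooth a list by averaging up to `width` adjacent samples.

    Assumption: the window is truncated at the ends of the list.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - width // 2)
        end = min(len(values), start + width)
        window = values[start:end]
        out.append(sum(window) / len(window))
    return out

def gc_to_rgb(gc):
    """Convert GC content to an RGB triple (components in 0..1).

    Assumption: full saturation and value; the source does not specify them.
    """
    return colorsys.hsv_to_rgb(gc_to_hue(gc) / 360.0, 1.0, 1.0)
```

With these defaults, a GC content of 0.4 falls at the midpoint of the range and maps to a hue of about 150.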

2.3 · Path direction — relative GC content

Direction of the curvature of the path is determined by the GC content relative to the average GC content of the human genome.
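As a small sketch, the direction can be reduced to a sign. Two things here are assumptions, not taken from the cover's code: the reference value of 0.41 (an approximate genome-wide GC fraction for human) and the convention that above-average GC turns one way (+1) and below-average the other (-1).

```python
# Assumed approximate average GC fraction of the human genome.
HUMAN_GENOME_GC = 0.41

def turn_direction(gc, reference=HUMAN_GENOME_GC):
    """Return +1 if GC is at or above the reference, otherwise -1.

    Which sign corresponds to a left or right turn is an arbitrary
    choice made for this illustration.
    """
    return 1 if gc >= reference else -1
```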

2.4 · Path curvature — Repeat content

The magnitude of path curvature is informed by the repeat content near that location, which is calculated by determining the average frequency of 10-mers sampled within a window of 200 bases relative to their frequency in the human exon sequence.

This quantity is expressed relative to the chance of observing these 10-mers randomly and used to inform the angle of the path. Regions that are composed of 10-mers that are relatively rare are straighter than those which contain repetitive regions.
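One way to sketch this calculation is shown below. It is an interpretation of the description, not the original implementation: `reference_counts` stands in for 10-mer counts from human exon sequence, the random expectation is taken to be a uniform 1/4^k per k-mer, and the `turn_angle` mapping (with its `gain` and `max_turn` parameters) is entirely hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k=10):
    """Count every overlapping k-mer in a reference sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def repeat_score(seq, pos, reference_counts, k=10, window=200):
    """Average reference frequency of the k-mers near `pos`,
    expressed relative to the uniform random chance 1 / 4**k.

    Scores well above 1 suggest k-mers that are common in the
    reference (repetitive); 0 means none were seen there at all.
    """
    start = max(0, pos - window // 2)
    chunk = seq[start:start + window].upper()
    kmers = [chunk[i:i + k] for i in range(len(chunk) - k + 1)]
    total = sum(reference_counts.values())
    if not kmers or total == 0:
        return 0.0
    mean_freq = sum(reference_counts[km] for km in kmers) / (len(kmers) * total)
    return mean_freq * 4 ** k

def turn_angle(score, gain=0.01, max_turn=math.pi / 6):
    """Hypothetical mapping from repeat score to turn magnitude (radians)."""
    return min(max_turn, gain * score)
```

With this mapping, a region made of rare 10-mers scores near zero and the path continues almost straight, while a repetitive region receives a larger turn at every step and curls up.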

CURVATURE SHOWS REPEATS | The degree to which the path turns is informed by how much of the sequence at that position is repeated.

The path is confined within a circular area to keep it compact, at the cost of losing translational and rotational invariance of the representation. This limitation is due to the fact that the segments of the path depend on the angle and position at which the path approaches the circular boundary.
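The text does not say how the path is kept inside the circle. The sketch below assumes one plausible rule, a mirror-like reflection of the heading at the boundary, purely to illustrate why the result depends on the angle and position of approach. The function name and parameters are hypothetical.

```python
import math

def step_in_circle(x, y, heading, step=1.0, radius=100.0):
    """Advance one step, keeping the point inside a circle centred at (0, 0).

    Assumed rule: if the step would leave the circle, the heading is
    reflected about the radial direction at the current position. If the
    reflected step would still leave (a near-tangential approach), the
    path is steered toward the centre instead.
    """
    if step > radius:
        raise ValueError("step must not exceed radius")
    nx = x + step * math.cos(heading)
    ny = y + step * math.sin(heading)
    if math.hypot(nx, ny) <= radius:
        return nx, ny, heading

    # Outward unit vector at the current position (r > 0 here because a
    # step from the centre can never leave when step <= radius).
    r = math.hypot(x, y)
    ux, uy = x / r, y / r
    dx, dy = math.cos(heading), math.sin(heading)
    dot = dx * ux + dy * uy
    new_heading = math.atan2(dy - 2 * dot * uy, dx - 2 * dot * ux)
    nx = x + step * math.cos(new_heading)
    ny = y + step * math.sin(new_heading)
    if math.hypot(nx, ny) > radius:
        new_heading = math.atan2(-y, -x)  # fall back: head for the centre
        nx = x + step * math.cos(new_heading)
        ny = y + step * math.sin(new_heading)
    return nx, ny, new_heading
```

Under any rule of this kind, shifting or rotating the starting point changes where the path meets the boundary, so the same sequence can produce a differently shaped path.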

2.5 · Interpreting structure

For genes, the transcribed sequence is shown, which includes both introns and exons.

GENES ARE HIGH-INFORMATION AREAS | Areas of high information are straighter (fewer repeats), whereas sequence outside genes and in repeats tends to curl up on itself.

The overall effect of the path encoding is a qualitative, artistic interpretation of local sequence structure. Two paths can be directly compared to interrogate differences in their corresponding sequence.

3 · Deadly genome series

The Deadly Genomes poster demonstrates how entire genomes appear when encoded as paths. The poster compares the incidence rates and mortality of diseases caused by harmful pathogens, such as malaria, syphilis, AIDS and SARS.

Discover all the things that are not trying to make you stronger.
The cover design uses the same approach to depicting genomes as the Deadly Genomes poster.

As on the conference covers, on the poster each genome is drawn as a path. The length of the path is proportional to the size of the genome. Every fifth base is drawn as a circle whose color is based on the GC content (fraction of guanines and cytosines). The path curvature is proportional to the repeat content and the direction of curvature is determined by whether the GC content is lower or higher than average. Genomes are labeled by disease, organism, size (in bases) and GC content. Updated with the genome of SARS-CoV-2 (Wuhan-Hu-1 isolate) and COVID-19 case statistics as of 3 March 2020.
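The way these encodings combine into a drawn path can be sketched as a simple turtle-style walk. This is an illustration under assumptions, not the poster's code: `trace_path` is a hypothetical name, and its input is taken to be one signed turn angle per base (sign from the relative GC content, magnitude from the repeat content).

```python
import math

def trace_path(turns, step=1.0, marker_every=5):
    """Walk one fixed-length step per base, turning by `turns[i]` radians.

    Returns the path vertices and the subset of positions at which a
    circle would be drawn (every `marker_every`-th base).
    """
    x = y = heading = 0.0
    points = [(x, y)]
    markers = []
    for i, turn in enumerate(turns):
        heading += turn
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
        if i % marker_every == 0:
            markers.append((x, y))
    return points, markers
```

Because every base contributes one step of the same size, the total path length is proportional to the sequence length. A sequence with no turns traces a straight line, and a constant turn traces a circle.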

DEADLY GENOMES | Genomes of harmful bacteria and viruses.

The poster was a finalist in the 2009 National Science Foundation Visualization Challenge.

news + thoughts

Happy 2025 π Day—
TTCAGT: a sequence of digits

Thu 13-03-2025

Celebrate π Day (March 14th) and sequence digits like it's 1999. Let's call some peaks.

2025 π DAY | TTCAGT: a sequence of digits. The digits of π are encoded into DNA sequence and visualized with Sanger sequencing. (details)

Crafting 10 Years of Statistics Explanations: Points of Significance

Sun 09-03-2025

I don’t have good luck in the match points. —Rafael Nadal, Spanish tennis player

Points of Significance is an ongoing series of short articles about statistics in Nature Methods that started in 2013. Its aim is to provide clear explanations of essential concepts in statistics for a nonspecialist audience. The articles favor heuristic explanations and make extensive use of simulated examples and graphical explanations, while maintaining mathematical rigor.

Topics range from basic, but often misunderstood, such as uncertainty and P-values, to relatively advanced, but often neglected, such as the error-in-variables problem and the curse of dimensionality. More recent articles have focused on timely topics such as modeling of epidemics, machine learning, and neural networks.

In this article, we discuss the evolution of topics and details behind some of the story arcs, our approach to crafting statistical explanations and narratives, and our use of figures and numerical simulations as props for building understanding.

Crafting 10 Years of Statistics Explanations: Points of Significance. (read)

Altman, N. & Krzywinski, M. (2025) Crafting 10 Years of Statistics Explanations: Points of Significance. Annual Review of Statistics and Its Application 12:69–87.

Propensity score matching

Mon 16-09-2024

I don’t have good luck in the match points. —Rafael Nadal, Spanish tennis player

In many experimental designs, we need to keep in mind the possibility of confounding variables, which may give rise to bias in the estimate of the treatment effect.

Nature Methods Points of Significance column: Propensity score matching. (read)

If the control and experimental groups aren't matched (or, roughly, similar enough), this bias can arise.

Sometimes this can be dealt with by randomizing, which on average can balance this effect out. When randomization is not possible, propensity score matching is an excellent strategy to match control and experimental groups.

Kurz, C.F., Krzywinski, M. & Altman, N. (2024) Points of significance: Propensity score matching. Nat. Methods 21:1770–1772.

Understanding p-values and significance

Tue 24-09-2024

P-values combined with estimates of effect size are used to assess the importance of experimental results. However, their interpretation can be invalidated by selection bias when testing multiple hypotheses, fitting multiple models or even informally selecting results that seem interesting after observing the data.

We offer an introduction to principled uses of p-values (targeted at the non-specialist) and identify questionable practices to be avoided.

Understanding p-values and significance. (read)

Altman, N. & Krzywinski, M. (2024) Understanding p-values and significance. Laboratory Animals 58:443–446.

Depicting variability and uncertainty using intervals and error bars

Thu 05-09-2024

Variability is inherent in most biological systems due to differences among members of the population. Two types of variation are commonly observed in studies: differences among samples and the “error” in estimating a population parameter (e.g. mean) from a sample. While these concepts are fundamentally very different, the associated variation is often expressed using similar notation—an interval that represents a range of values with a lower and upper bound.

In this article we discuss how common intervals are used (and misused).

Depicting variability and uncertainty using intervals and error bars. (read)

Altman, N. & Krzywinski, M. (2024) Depicting variability and uncertainty using intervals and error bars. Laboratory Animals 58:453–456.

NASA to send our human genome discs to the Moon

Sat 23-03-2024

We'd like to say a ‘cosmic hello’: mathematics, culture, palaeontology, art and science, and ... human genomes.

SANCTUARY PROJECT | A cosmic hello of art, science, and genomes. (details)
SANCTUARY PROJECT | Benoit Faiveley, founder of the Sanctuary project, gives the Sanctuary disc a visual check at CEA LeQ Grenoble (image: Vincent Thomas). (details)
SANCTUARY PROJECT | Sanctuary team examines the Life disc at INRIA Paris Saclay (image: Benedict Redgrove) (details)
Martin Krzywinski | contact | Canada's Michael Smith Genome Sciences Centre | BC Cancer Research Center | BC Cancer | PHSA