
genomics + data mining

ICDM2012 Keynote

Needles in Stacks of Needles: genomics + data mining

Martin Krzywinski @MKrzywinski mkweb.bcgsc.ca
The talk introduces genomics and cancer biology to computer scientists and outlines areas in which data mining methods are being used to further our understanding of the genome. The theme is one of complexity and relevance — computers manage the former, but we are the ultimate judges of the latter. (download talk, ICDM2012)

abstract

In 2001, the first human genome sequence was published. Now, just over 10 years later, we are capable of sequencing a genome in just a few days. Massively parallel sequencing projects now make it possible to study the cancers of thousands of individuals. New data mining approaches are required to robustly interrogate the data for causal relationships among the inherently noisy biology. How does one identify genetic changes that are specific and causal to a disease within the rich variation that is either natural or merely correlated? The problem is one of finding a needle in a stack of needles. I will provide a non-specialist introduction to data mining methods and challenges in genomics, with a focus on the role visualization plays in the exploration of the underlying data.

references

The title of the talk was drawn from the paper

Gregory M. Cooper & Jay Shendure (2011) Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Reviews Genetics 12:628–640.

I will be posting a full list of references for the talk shortly.

news + thoughts

Nature Biotechnology cover

Thu 23-04-2026

My cover design on the 7 April 2026 Nature Biotechnology issue shows the dendrogram that represents a cluster of uniquely expressed (or downregulated) genes in human naive stem cells induced from such cells. Within each dendrogram block, the genomic barcode sequence (sampled from Supplementary Table 1) is depicted with a Code 39 barcode. The highlighted barcode is one of those used for cell isolation.

Ishiguro S. et al. A multi-kingdom genetic barcoding system for precise clone isolation (2026) Nature Biotechnology 44:616–629.

My Nature Biotechnology phylogenetic tree cover (volume 44, issue 4, 7 April 2026). (more)

Browse my gallery of cover designs.

A catalogue of my journal and magazine cover designs. (more)

Happy 2026 π Day: Art for the 5%

Fri 13-03-2026

Celebrate π Day (March 14th) and enjoy the art — but only if you're part of the 5%.

Go ahead, see what you can't see.

2026 π DAY | Art for the 5%. Shown in the style of Ishihara color test plates, the art is visible only to those with colour blindness. (details)

Ishihara's Tests for Colour Deficiency

Sun 08-03-2026

Authentic and accurate images of Ishihara's test plates photographed (and lovingly color-corrected) from the 38-plate Ishihara's Tests for Colour Deficiency.

I also provide the position, size, and color of each circle on each test plate.

ISHIHARA'S TEST PLATE 6 | This plate is part of the set of transformation plates. If you see 5, you're ok. If you see 2, you're not. (details)
ISHIHARA'S TEST PLATE 18 | This plate is part of the set of mysterious hidden plates. If you don't see anything, you're ok. If you see 5, you're not. (details)

Symmetric alternatives to the ordinary least squares regression

Wed 23-07-2025

What immortal hand or eye, could frame thy fearful symmetry? — William Blake, "The Tyger"

This month, we look at symmetric regression, which, unlike simple linear regression, is reversible: the fit remains unaltered when the variables are swapped.

Simple linear regression can summarize the linear relationship between two variables `X` and `Y` — for example, when `Y` is considered the response (dependent) and `X` the predictor (independent) variable.

However, there are times when we are not interested in (or capable of) distinguishing between dependent and independent variables, either because they have the same importance or the same role. This is where symmetric regression can help.

Nature Methods Points of Significance column: Symmetric alternatives to the ordinary least squares regression. Geometry of quantities minimized in OLS and symmetric regression. OLS minimizes `\Sigma e_y^2` in `Y` ~ `X` and `\Sigma e_x^2` in `X` ~ `Y`. Pythagorean regression minimizes AB (magenta). Geometric means regression (GMR) minimizes the area of ABP (orange). Orthogonal regression (OR) minimizes HP (blue). (read)

Luca Greco, George Luta, Martin Krzywinski & Naomi Altman (2025) Points of significance: Symmetric alternatives to the ordinary least squares regression. Nat. Methods 22:1610–1612.
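
To make the reversibility idea concrete, here is a minimal Python sketch (not taken from the column) that compares OLS with two of the symmetric methods named above. It assumes the standard textbook forms: the GMR slope as `sign(r) * sd(Y) / sd(X)` and the OR slope as the direction of the leading eigenvector of the covariance matrix. The function names and the simulated data are hypothetical, chosen only for illustration. A fit is reversible when the slope of `Y` ~ `X` is the reciprocal of the slope of `X` ~ `Y`.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the OLS fit of y on x (minimizes squared vertical residuals)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def gmr_slope(x, y):
    """Geometric means regression slope: sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * y.std() / x.std()

def orthogonal_slope(x, y):
    """Orthogonal regression slope: direction of the leading principal axis."""
    cov = np.cov(x, y)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    vx, vy = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    return vy / vx

# Simulated data (an assumption for illustration, not from the column).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

for name, slope in [("OLS", ols_slope), ("GMR", gmr_slope), ("OR", orthogonal_slope)]:
    b_yx = slope(x, y)   # slope of Y ~ X
    b_xy = slope(y, x)   # slope of X ~ Y
    # For a reversible method the product of the two slopes is 1.
    print(f"{name}: slope(Y~X) * slope(X~Y) = {b_yx * b_xy:.4f}")
```

For OLS the product of the two slopes equals the squared correlation, so it is below 1 whenever the points do not lie exactly on a line. For GMR and OR the product is 1, which is what "remaining unaltered when the variables are swapped" means for the fitted line.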

Beyond Belief Campaign BRCA Art

Wed 11-06-2025

Fuelled by philanthropy, findings into the workings of BRCA1 and BRCA2 genes have led to groundbreaking research and lifesaving innovations to care for families facing cancer.

This set of 100 one-of-a-kind prints explores the structure of these genes. Each artwork is unique; if you put them all together, you get the full sequence of the BRCA1 and BRCA2 proteins.

Martin Krzywinski | contact | Canada's Michael Smith Genome Sciences Centre | PHSA