National Institute on Aging
National Institutes of Health
Laboratory of Genetics
Identifying Sub-cellular Organelles

The 2D HeLa and CHO datasets are intended for training computer programs to automatically identify sub-cellular organelles.

2D HeLa is a dataset of fluorescence microscopy images of HeLa cells stained with various organelle-specific fluorescent dyes. The images cover 10 classes: DNA (Nuclei), ER (Endoplasmic reticulum), Giantin (cis/medial Golgi), GPP130 (cis Golgi), Lamp2 (Lysosomes), Mitochondria, Nucleolin (Nucleoli), Actin, TfR (Endosomes), and Tubulin.
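When preparing the images for a classifier, it can help to keep the class labels and the structures they mark in one lookup table. The following is a minimal sketch: the label strings are copied from the list above, while the names `HELA_CLASSES`, `structure_for`, and `LABEL_TO_ID` are hypothetical, and the folder or file names used in the downloadable archive may differ.

```python
# Class labels and the structures they mark, copied from the list above.
# Mitochondria, Actin and Tubulin are listed without a separate structure
# name, so they map to themselves here.
HELA_CLASSES = {
    "DNA": "Nuclei",
    "ER": "Endoplasmic reticulum",
    "Giantin": "cis/medial Golgi",
    "GPP130": "cis Golgi",
    "Lamp2": "Lysosomes",
    "Mitochondria": "Mitochondria",
    "Nucleolin": "Nucleoli",
    "Actin": "Actin",
    "TfR": "Endosomes",
    "Tubulin": "Tubulin",
}


def structure_for(label):
    """Return the structure associated with a 2D HeLa class label."""
    return HELA_CLASSES[label]


# Integer ids for training. The ordering is arbitrary (an assumption of this
# sketch); it simply follows the order in which the classes are listed.
LABEL_TO_ID = {label: i for i, label in enumerate(HELA_CLASSES)}
```

A classifier trained on these images would then predict one of the 10 integer ids, which can be mapped back to a label and structure for reporting.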
Automated identification of sub-cellular organelles is important when characterizing newly discovered genes or genes with an unknown function. It is possible to fluorescently tag the protein(s) produced by any given gene, and the ability to identify the organelle where the protein resides provides an important clue to its possible function.
Note that human experts have trouble distinguishing Endosomes from Lysosomes, and find the two Golgi proteins in the dataset (Giantin and GPP130) exceedingly difficult to tell apart. Bob Murphy, who pioneered this field at CMU, considers this type of classification problem essentially solved. These datasets nevertheless provide useful benchmarks when characterizing new classifiers.
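One way to check whether a new classifier struggles with the same class pairs that trouble human experts is to count how often two classes are mistaken for each other. The sketch below assumes the true and predicted labels are available as plain lists; the function names are hypothetical and the example labels are made up for illustration, not results from either dataset.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def pair_confusion(counts, a, b):
    """Fraction of images truly in class a or b that were assigned to the other class."""
    swapped = counts[(a, b)] + counts[(b, a)]
    total = sum(n for (true, _), n in counts.items() if true in (a, b))
    return swapped / total if total else 0.0


# Made-up labels purely for illustration; not measurements from 2D HeLa.
true_labels = ["TfR", "TfR", "Lamp2", "Lamp2", "DNA", "DNA"]
predicted = ["TfR", "Lamp2", "Lamp2", "TfR", "DNA", "DNA"]

counts = confusion_counts(true_labels, predicted)
endosome_lysosome_rate = pair_confusion(counts, "TfR", "Lamp2")
golgi_rate = pair_confusion(counts, "Giantin", "GPP130")
```

A high rate for the TfR/Lamp2 or Giantin/GPP130 pair alongside low rates elsewhere would suggest that the remaining errors are concentrated in the pairs described above.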
The highest published performance for 2D HeLa is currently 95.3% (A. Chebira, Y. Barbotin, C. Jackson, T. Merryman, G. Srinivasa, R.F. Murphy, and J. Kovacevic (2007). A multiresolution approach to automated classification of protein subcellular location images. BMC Bioinformatics 8:210).
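To place a new classifier next to the published figure, its overall accuracy on held-out images can be computed as below. This is a sketch under two assumptions: that the 95.3% figure refers to overall classification accuracy, and that the evaluation protocol is comparable (the published number depends on how images were split for training and testing). The names `PUBLISHED_BEST_2D_HELA`, `accuracy`, and `gap_to_published` are hypothetical.

```python
# Figure quoted above for 2D HeLa (Chebira et al., 2007), as a fraction.
PUBLISHED_BEST_2D_HELA = 0.953


def accuracy(true_labels, predicted_labels):
    """Fraction of images whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def gap_to_published(acc):
    """Difference between a measured accuracy and the published figure."""
    return acc - PUBLISHED_BEST_2D_HELA
```

A negative gap means the classifier falls short of the published figure; the comparison is only indicative unless the same train/test protocol is used.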

Journal references:
HeLa - M. V. Boland and R. F. Murphy (2001). A Neural Network Classifier Capable of Recognizing the Patterns of all Major Subcellular Structures in Fluorescence Microscope Images of HeLa Cells. Bioinformatics 17:1213-1223.
CHO - M. V. Boland, M. K. Markey and R. F. Murphy (1998). Automated Recognition of Patterns Characteristic of Subcellular Structures in Fluorescence Microscopy Images. Cytometry 33:366-375.

Sample images for each class are given below:

DNA (Nuclei)

ER (Endoplasmic reticulum)

Giantin (cis/medial Golgi)

GPP130 (cis Golgi)

Lamp2 (Lysosomes)


Nucleolin (Nucleoli)


TfR (Endosomes)


CHO is a dataset of fluorescence microscope images of CHO (Chinese Hamster Ovary) cells. The images were taken using 5 different labels. The labels are: anti-giantin, Hoechst 33258 (DNA), anti-lamp2, anti-nop4, and anti-tubulin.


Hoechst 33258 (DNA)




The source for both datasets is Robert F. Murphy.

Download CHO dataset
Download 2D HeLa dataset

Wndchrm performance report on CHO
Wndchrm performance report on 2D HeLa

Feature file for HeLa (6.0 MB)
Feature file for CHO (2.3 MB)