Instrument Platform

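The Affymetrix GeneChip® Instrument System is a fully integrated platform for expression and DNA analysis using GeneChip® microarrays. It comprises a hybridisation oven, the FS450 fluidics station, the GCS3000 7G confocal laser scanner, and AGCC and Expression Console® software.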

Principle of technology

The basic principle is hybridisation of labelled material from your sample of interest (the 'target') to the oligonucleotide probes on commercially manufactured GeneChip® arrays (the 'probe'). The target is usually total RNA for gene expression arrays (typically converted to labelled cDNA or cRNA by reverse transcription and in vitro transcription with biotin-labelled nucleotides) or genomic DNA for DNA analysis arrays, isolated from your sample of interest.

Following hybridisation of the labelled target to the probes in the hybridisation oven, the array is washed and stained in the fully automated FS450 fluidics station, using optimised and validated, workstation-controlled fluidics protocols specific to each GeneChip® array. The array is then imaged on a GCS3000 7G confocal laser scanner, which detects the fluorescence of the bound target. The GCS3000 7G scanner provides high optical resolution (3.5 μm excitation spot size), robust optical alignment, low system noise (high signal-to-noise ratio) and a broad dynamic range (over 65,000 fluorescence levels) for precise measurement. Affymetrix GeneChip® Command Console (AGCC) and Expression Console® software provide instrument control and generate normalised expression data for initial QC. The raw or normalised data can then be exported to other Affymetrix freeware or to third-party software for further analysis and biological interpretation.

Microarrays

Affymetrix GeneChip® Arrays

Affymetrix GeneChip® arrays enable the analysis of thousands to millions of biomolecules from a single sample in a single assay. Manufacturing a GeneChip® array requires the in situ synthesis of up to millions of unique oligonucleotides at precise locations on a small glass substrate, using photolithography and solid-phase chemistry. This design yields highly specific, sensitive and reproducible data. Sequence information from databases such as UniGene and GenBank is used to select and design the synthetic probes, and each 'target' on a GeneChip® array is typically represented by multiple oligonucleotide probes (a probe set). The combined signal from all probe pairs for a specific 'target' is used to determine the signal intensity for that 'target' in your sample. Each GeneChip® array is supplied in a sealed cartridge, which limits background problems caused by dust.
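As an illustration of how several probe-level signals can be combined into one value per target, the sketch below implements Tukey's median polish, the summarisation step used by RMA (which works on background-corrected, quantile-normalised perfect-match intensities). It is a minimal sketch, assuming a probes-by-arrays matrix of log2 intensities for a single probe set; it is not necessarily the algorithm applied by any particular Affymetrix software (MAS5, for example, uses a different summary based on Tukey's biweight), and the function names are hypothetical.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes-by-arrays matrix of log2 intensities.

    Fits y[i][j] = overall + probe_effect[i] + array_effect[j] + residual.
    Returns (overall, probe_effects, array_effects, residuals).
    """
    z = [list(row) for row in matrix]
    n_probes, n_arrays = len(z), len(z[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    array_eff = [0.0] * n_arrays
    for _ in range(max_iter):
        # Remove row (probe) medians and add them to the probe effects.
        row_med = [median(row) for row in z]
        z = [[v - m for v in row] for row, m in zip(z, row_med)]
        probe_eff = [p + m for p, m in zip(probe_eff, row_med)]
        shift = median(array_eff)
        array_eff = [a - shift for a in array_eff]
        overall += shift
        # Remove column (array) medians and add them to the array effects.
        col_med = [median(z[i][j] for i in range(n_probes)) for j in range(n_arrays)]
        z = [[row[j] - col_med[j] for j in range(n_arrays)] for row in z]
        array_eff = [a + m for a, m in zip(array_eff, col_med)]
        shift = median(probe_eff)
        probe_eff = [p - shift for p in probe_eff]
        overall += shift
        # Stop once a full sweep no longer moves anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, probe_eff, array_eff, z


def summarise_probe_set(matrix):
    """One log2 expression value per array: overall effect + array effect."""
    overall, _, array_eff, _ = median_polish(matrix)
    return [overall + a for a in array_eff]


# Three probes (rows) measured on three arrays (columns), log2 scale.
probes = [[1.0, 2.0, 3.0],
          [2.0, 3.0, 4.0],
          [4.0, 5.0, 6.0]]
print(summarise_probe_set(probes))  # [2.0, 3.0, 4.0]
```

Because medians rather than means are used, a single aberrant probe on one array shifts the per-array summaries far less than a simple average would.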

In addition to global normalisation algorithms for expression arrays, each array contains multiple housekeeping and other reference probes. These allow alternative normalisation of data from multiple experiments where global normalisation may be problematic because of the experimental samples or design. 'Spiking' control material into the target allows the efficiency of labelling and hybridisation to be monitored through the corresponding control probes on the array.
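Global scaling and reference-probe normalisation differ mainly in which probe sets are used to compute the scale factor. The sketch below assumes MAS5-style scaling, in which each array is multiplied by a factor that brings a 2% trimmed mean of its signals to a target intensity (500 here, a commonly used default). The reference_ids parameter is a hypothetical way of restricting that calculation to housekeeping or other reference probe sets; all names are illustrative.

```python
from statistics import mean


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return mean(kept)


def scale_array(signals, target=500.0, reference_ids=None, trim=0.02):
    """Scale one array's probe-set signals (linear scale) towards `target`.

    signals: dict of probe-set ID -> signal.
    reference_ids: hypothetical list of housekeeping/reference probe-set IDs.
        If None, all probe sets are used (global scaling); otherwise only the
        reference probe sets determine the scale factor.
    Returns (scaled_signals, scale_factor).
    """
    ids = signals.keys() if reference_ids is None else reference_ids
    factor = target / trimmed_mean([signals[i] for i in ids], trim)
    return {pid: value * factor for pid, value in signals.items()}, factor


array1 = {"ps1": 100.0, "ps2": 200.0, "ps3": 300.0}
_, global_factor = scale_array(array1)                      # 500 / 200
_, ref_factor = scale_array(array1, reference_ids=["ps1"])  # 500 / 100
print(global_factor, ref_factor)  # 2.5 5.0
```

The scale factor is itself a useful QC metric: arrays processed together would be expected to have broadly similar factors.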

A broad range of GeneChip® arrays is available for expression and DNA analysis. These range from high-density, genome-wide arrays (e.g. Human Exon 1.0 ST Array: 5.5 million features and >1 million exon clusters for exon-level analysis; CytoScan HD: 2.6 million copy number markers; Human SNP 6.0: 1.8 million genetic markers, comprising 906,600 SNPs and 946,000 copy number probes) to low-density targeted arrays (e.g. Universal 3K Tag Arrays for molecular inversion probe (MIP) assays).

Custom Arrays

The My GeneChip Custom Array service provides custom arrays for expression analysis and genotyping of animal, plant and microbial genomes.

Affymetrix offers a Community Array Program, which enables you to design genotyping arrays for your species of interest. The Canine, Arabidopsis, Lettuce and Pepper DNA analysis arrays were designed and made commercially available via this programme. A minimum order volume is typically required for arrays designed within this programme.

Sample QC

We have well-established, robustly validated QC/QA workflows. Quality is assessed at each stage of sample preparation, and only samples that meet QC thresholds proceed to the next step of the protocol. After hybridisation and data generation, the dataset is stringently assessed and any outlier samples are removed (or flagged, as appropriate) before further downstream analysis.
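One simple, illustrative way to flag outlier arrays after data generation is to compare each array's average correlation with the other arrays against the rest of the batch. The sketch below is not our validated QC workflow: the median-absolute-deviation rule, the cut-off k=3 and the function names are assumptions chosen for illustration, and the input is assumed to be normalised log2 probe-set signals in the same order on every array.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def flag_outlier_arrays(arrays, k=3.0):
    """Flag arrays whose mean correlation with the others is unusually low.

    arrays: dict of array name -> log2 signals (same probe-set order).
    An array is flagged when its mean correlation falls more than k median
    absolute deviations (MAD) below the median of all mean correlations.
    """
    names = list(arrays)
    mean_r = {
        n: mean(pearson(arrays[n], arrays[m]) for m in names if m != n)
        for n in names
    }
    centre = median(mean_r.values())
    mad = median(abs(r - centre) for r in mean_r.values())
    return sorted(n for n, r in mean_r.items() if r < centre - k * mad)
```

A flagged array is a prompt for inspection (images, QC metrics, sample history) rather than automatic removal.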

We have a range of equipment for QC during sample preparation:

  • NanoDrop 1000 microvolume UV-Vis spectrophotometer for accurate and sensitive DNA, RNA and protein quantitation
  • Qubit 2.0 fluorimeter for accurate and sensitive DNA, RNA and protein quantitation that is unaffected by free nucleotides, salts and other contaminants
  • Agilent TapeStation System for assessment of RNA and DNA quality
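UV-Vis quantitation rests on conventional A260 conversion factors (about 50 ng/µl per absorbance unit for double-stranded DNA, 40 for RNA and 33 for single-stranded DNA, at a 10 mm path length), with the A260/A280 and A260/A230 ratios used as purity indicators (an A260/A280 of roughly 1.8 for pure DNA and 2.0 for pure RNA). The instrument reports these values directly; the sketch below only shows the arithmetic, and the helper name is hypothetical.

```python
# Conventional A260 conversion factors (ng/uL per absorbance unit, 10 mm path).
CONVERSION_FACTORS = {"dsDNA": 50.0, "RNA": 40.0, "ssDNA": 33.0}


def uv_qc(a260, a280, a230, sample_type="RNA", dilution_factor=1.0):
    """Concentration and purity ratios from absorbance readings (hypothetical helper)."""
    concentration = a260 * CONVERSION_FACTORS[sample_type] * dilution_factor
    return {
        "ng_per_ul": concentration,
        "A260/A280": a260 / a280,
        "A260/A230": a260 / a230,
    }


print(uv_qc(0.5, 0.25, 0.25))  # 20 ng/uL RNA; both ratios 2.0
```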

The TapeStation system provides accurate, reproducible and validated QC of RNA and DNA with minimal sample volumes and high throughput. QC analysis on the Agilent 2100 Bioanalyzer can also be performed on request.

Validation

You should always validate microarray data on biological and technical replicate sets, using Northern blotting, real-time PCR (qPCR), qRT-PCR, sequencing or other methods, as appropriate. For expression analysis, additional studies of protein expression levels for key transcripts in important pathways, as well as mechanistic studies, will strengthen your study. Check that your findings and biological interpretation are consistent with the existing scientific literature.
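For qRT-PCR validation, relative expression is often calculated with the comparative Ct (2^-ΔΔCt) method, which assumes close to 100% amplification efficiency for both the target and the reference gene. The sketch below, with hypothetical function names, returns the log2 fold change (-ΔΔCt) and a minimal check of whether the direction of change agrees between the array and qRT-PCR for each validated gene.

```python
def log2_fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """log2 fold change (test vs control) by the comparative Ct method.

    Assumes ~100% efficiency, so each cycle corresponds to a 2-fold change.
    """
    delta_ct_test = ct_target_test - ct_ref_test  # normalise to reference gene
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return -(delta_ct_test - delta_ct_ctrl)       # -ddCt


def direction_agreement(array_log2fc, qpcr_log2fc):
    """Fraction of genes whose change has the same sign on both platforms."""
    pairs = list(zip(array_log2fc, qpcr_log2fc))
    agree = sum(1 for a, q in pairs if a * q > 0)
    return agree / len(pairs)


# Target Ct 22 vs reference Ct 18 in the test sample; 25 vs 18 in the control.
print(log2_fold_change_ddct(22, 18, 25, 18))  # 3, i.e. an 8-fold increase
```

Agreement in direction is the weakest useful check; comparing the magnitude of the log2 fold changes across the validated genes gives a fuller picture.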

Software

Affymetrix Software

Affymetrix software manuals are available for download, including data analysis fundamentals and an overview of experimental design, statistical analysis and biological interpretation of Affymetrix gene expression data using Affymetrix software. If the library files for the array you are analysing are not installed on your computer, download them for the relevant Affymetrix GeneChip® array.

A range of Affymetrix freeware is available for data transformation, analysis or visualisation of the datasets generated on the relevant GeneChip® arrays.

NetAffx

NetAffx Analysis Centre: This tool provides extensive probe and target sequence, design, annotation and gene ontology information for your data. It brings together publicly available annotation for your gene(s) of interest on the Affymetrix GeneChip® array, facilitating data filtering, supervised analyses, biological interpretation and validation studies. Annotation files for all GeneChip® catalogue arrays can be downloaded from the NetAffx Analysis Centre as tabular CSV or MAGE-ML XML files.
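Downloaded annotation CSV files can be loaded directly into analysis scripts. The sketch below assumes a layout commonly seen in NetAffx CSV downloads: header comment lines starting with "#", a header row with columns named "Probe Set ID" and "Gene Symbol", "---" for missing values and " /// " separating multiple entries. Check these names against the header of the file you download; the function name and example rows are hypothetical.

```python
import csv


def load_gene_symbols(lines):
    """Map probe-set IDs to gene-symbol lists from a NetAffx-style annotation CSV.

    Assumed layout (verify against your file): comment lines start with '#',
    the first remaining line is the header, missing values are '---' and
    multiple symbols are separated by ' /// '.
    """
    rows = csv.DictReader(line for line in lines if not line.startswith("#"))
    mapping = {}
    for row in rows:
        symbols = row["Gene Symbol"].strip()
        mapping[row["Probe Set ID"]] = [] if symbols == "---" else symbols.split(" /// ")
    return mapping


# Illustrative rows only (hypothetical IDs and symbols).
example = [
    "#%example_header=value",
    '"Probe Set ID","Gene Symbol"',
    '"probe_1","GENE_A /// GENE_B"',
    '"probe_2","GENE_C"',
    '"probe_3","---"',
]
print(load_gene_symbols(example))
```

For a real file, pass an open file handle, e.g. opened with open(path, newline="").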

The following functions are available in NetAffx:

  • NetAffx Query - Search probe sets for a term or identifier
  • Batch Query - Retrieve annotations for a probe list
  • BLAST - Find probe sets that align to your sequence(s) using BLAST
  • Probe Match - Find probes that identically match your sequence(s)
  • UCSC Query - Query the UCSC Browser including a custom track for your array of interest

NetAffx Query: This facility allows you to search the contents of each GeneChip® array (e.g. search your selected array for your gene or pathway of interest; the results can then be downloaded into tools for advanced data analysis). It also gives access to extensive annotation from public databases (including GenBank, UniGene, RefSeq, dbEST, SWALL and Swiss-Prot), which can be integrated into a single file for each gene of interest, and lets you visualise probe and target alignments. Full sequence information is available for all probe sets on each GeneChip® array and for all cluster members ('targets') recognised by each probe set.

Batch Query: This facility can be used to query GeneChip® arrays with up to 500 gene names, probe set IDs or accession numbers in a single query, for example to obtain full annotation for your probe sets of interest (e.g. the 100 genes in your predictive gene set).
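Longer identifier lists have to be split to fit the 500-identifier limit. A minimal sketch of a hypothetical helper that drops blank lines and duplicates, then chunks the list for pasting into Batch Query:

```python
def batch_query_chunks(identifiers, size=500):
    """Split an identifier list into chunks of at most `size` entries.

    Blank entries are dropped and duplicates removed (first occurrence kept),
    so no query slot is wasted.
    """
    cleaned = [i.strip() for i in identifiers if i.strip()]
    unique = list(dict.fromkeys(cleaned))
    return [unique[start:start + size] for start in range(0, len(unique), size)]


ids = [f"probe_{n}" for n in range(1201)]  # hypothetical identifiers
chunks = batch_query_chunks(ids)
print([len(c) for c in chunks])  # [500, 500, 201]
```

Each chunk can then be written out one identifier per line and submitted as a separate query.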

Third Party Software

Raw and normalised datasets generated using Affymetrix GeneChip® arrays can be directly imported into many third party software solutions for more comprehensive and sophisticated analysis.

Some commonly used packages are listed below, but many others are available:

Freeware

  • dChip: analysis and visualisation of expression and SNP microarrays
  • MaxD: a suite of programmes for the storage and analysis of microarray data
  • SAM (Significance Analysis of Microarrays): supervised learning software for mining genomic expression data from cDNA and oligonucleotide microarrays; it can also be applied to protein expression, SNP and RNA-Seq data. It correlates gene expression data with a wide variety of clinical parameters, including treatment, diagnosis, survival and time trends, and provides an estimate of the false discovery rate (FDR) for multiple testing. It is available as a convenient Excel add-in
  • Bioconductor: an open-source, open-development software project for the analysis of genomic data. It initially focused on the analysis of microarray data but can be used broadly for genomic data generated with other technologies, and it is useful for bioinformaticians who wish to develop their own algorithms or genomic analysis software. R is the language and environment for statistical computing used in the Bioconductor project. Bioconductor contains many packages, including general, annotation, graphics, pre-processing and differential gene expression packages, and also provides metadata and experimental data submitted by various contributors. The affy and affydata packages provide example datasets for software development
  • RMAExpress: a stand-alone graphical user interface (GUI) program for computing gene expression values from Affymetrix GeneChip® arrays and carrying out QC analysis using probe-level metrics. It uses the same algorithm as affy/Bioconductor (Robust Multi-array Average (RMA) expression summary) but is a Windows-based program that does not require R and is independent of the Bioconductor project, so it is aimed more at the biologist than the bioinformatician
  • GenMAPP (Gene Map Annotator and Pathway Profiler): a graphical tool that groups gene expression data according to gene ontology and displays them on existing or user-generated maps of biological pathways. Metabolic and regulatory pathways are also available in KEGG
  • There are several free web-based packages available for identifying transcription factor motifs in your probe sets of interest and clustering your data according to putative common transcriptional regulatory mechanisms. These include ConSite, Genomatix, Improbizer, MEME/MAST and TESS
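SAM estimates the false discovery rate by permutation. As a simpler, different technique for the same multiple-testing problem, the sketch below computes Benjamini-Hochberg adjusted p-values from a list of per-probe-set p-values. It is not SAM's algorithm; the p-values are assumed to come from whichever statistical test you ran, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.2]
print(benjamini_hochberg(pvals))
```

Probe sets whose adjusted value falls below your chosen FDR (e.g. 0.05) are called significant at that FDR.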

Commercial Software

Many commercial software providers offer solutions for the analysis of GeneChip® data, and many provide a free trial so that you can test the software's suitability for your application.

Affymetrix lists third-party software providers that offer GeneChip®-compatible solutions for expression, DNA and pathway analysis or laboratory management.

Considerations

  • Assess several statistical tests and clustering algorithms to find which analyses your data most accurately (e.g. determine the false call rate of each method). Large datasets are available from Affymetrix in the Data Resource Centre to prospectively assess the power of your statistical test or gene expression analysis software/algorithm
  • Use current scientific knowledge to assess the biological interpretation of your own dataset; its accuracy will also be tested in your subsequent validation studies
  • Check whether software accepts raw Affymetrix data (CEL files), processed data (CHP files) or both
  • 'Proof of principle': e.g. do you see higher expression for known markers of your disease/disorder/treatment?
  • You should have a higher correlation (r) between replicates than between your biological groups. If not, you have a problem!
  • Be aware of outliers
  • Is it real? Look at replicates and absolute signals. You may see a 10-fold change, but if the absolute signal is very low in both samples, the apparent change is unlikely to be meaningful: both signals are below the detection limit of the system and within the noise/background. Always apply a background filter
  • Can you predict which group each sample belongs to, using your 'predictive gene set'?
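Two of the checks above can be sketched in a few lines: comparing the mean correlation among replicates with the mean correlation between groups, and computing fold changes only after a background filter. The floor value is a hypothetical threshold that should be derived from your own data (for example from detection calls or background intensity); signals below it are raised to the floor, so a change from near-zero signal cannot produce an inflated fold change. Function names are illustrative.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def within_vs_between(group_a, group_b):
    """Mean r among replicates of the same group, and mean r across groups.

    group_a, group_b: lists of arrays, each a list of log2 signals.
    """
    within = [pearson(x, y) for g in (group_a, group_b) for x, y in combinations(g, 2)]
    between = [pearson(x, y) for x in group_a for y in group_b]
    return mean(within), mean(between)


def filtered_log2_fc(signals_a, signals_b, floor=100.0):
    """log2(mean B / mean A) per probe set, with a background filter.

    signals_a, signals_b: dict of probe-set ID -> replicate signals (linear).
    Probe sets whose mean is below `floor` in both groups are skipped;
    otherwise means are raised to at least `floor` before taking the ratio.
    """
    result = {}
    for pid, reps_a in signals_a.items():
        mean_a, mean_b = mean(reps_a), mean(signals_b[pid])
        if mean_a < floor and mean_b < floor:
            continue  # both within background: no call
        result[pid] = math.log2(max(mean_b, floor) / max(mean_a, floor))
    return result
```

If the within-group value is not clearly higher than the between-group value, revisit sample QC and experimental design before interpreting fold changes.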

MIAME Compliance

MIAME and Public Repositories

Nature, The Lancet, Cell, Science and most other leading journals require the microarray gene expression data you publish to be available in the public domain. To be published, your gene expression data must be MIAME (Minimum Information About a Microarray Experiment) compliant.

The Functional Genomics Data (FGED) Society is actively involved in establishing standards for the annotation of microarray (as well as other functional genomic and proteomic) data and its release into the public domain. FGED's microarray annotation working group provides guidelines for generating MIAME-compliant and MAGE-compliant data, including a MIAME checklist developed to assist authors, editors and reviewers of microarray papers, to support the development of microarray repositories and data analysis tools. MIAME is under continuous development to keep pace with new genomic technologies and applications.

The MicroArray and Gene Expression (MAGE) group within the FGED Society facilitates the representation of microarray gene expression data using established standards (MAGE and MIAME). MAGE has developed a data exchange object model (MAGE-OM) and a data exchange format (MAGE-ML) for microarray studies, as well as a software toolkit (MAGE-stk) for converting between MAGE-OM and MAGE-ML on various programming platforms.

There are currently three main repositories for submission of MIAME-compliant microarray expression data: ArrayExpress at the EBI, GEO at NCBI (NIH) and CIBEX at DDBJ. Submission to the Stanford Microarray Database (SMD) is restricted to researchers affiliated with Stanford University, but a significant amount of its data is publicly available.