Proteomics Biostats: get the most out of your array data

High-density antibody arrays are powerful tools that have opened a new era in functional proteomics. However, when faced with large volumes of array data, you need comprehensive statistical and data-mining analysis to discover potential biomarker panels and meaningful patterns. Biostatistical analysis also helps to assess the quality of protein expression data by detecting and removing errors and by estimating confidence levels. Because it is easy to get lost among the options, this post illustrates a selection of the analysis packages available, taking the example of the high-density antibody array services performed by tebu-bio laboratories in Europe, in collaboration with their partner RayBiotech. I doubt you will want to do it yourself once you have seen what can be done for you, and at very affordable pricing!

Initial Data Clean-Up

Raw array expression data are first pre-processed to ensure high-quality results. This service includes data normalization and transformation, outlier detection and removal, and data filtering. The best way to describe it is to show you an example of the report generated.
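For readers who like to see what such a clean-up involves, here is a minimal sketch of the three steps in Python. This post does not describe the exact methods behind the service, so everything here is an assumption chosen for illustration: a log2 transformation, median normalization per sample, outlier flagging with a robust z-score, and a simple detection filter. The function name `preprocess` and its parameters are hypothetical.

```python
import numpy as np

def preprocess(raw, mad_cutoff=3.5, min_present=0.5, floor=1.0):
    """Illustrative clean-up; raw has rows = proteins, columns = samples."""
    raw = np.asarray(raw, dtype=float)

    # Transformation: log2 after clipping at a floor to avoid log(0).
    logged = np.log2(np.clip(raw, floor, None))

    # Normalization: shift every sample so that all sample medians coincide.
    sample_medians = np.median(logged, axis=0)
    normalized = logged - sample_medians + np.median(sample_medians)

    # Outlier detection: robust z-score per protein, based on the median
    # absolute deviation (MAD). Assumed method, not the one in the report.
    protein_median = np.median(normalized, axis=1, keepdims=True)
    mad = np.median(np.abs(normalized - protein_median), axis=1, keepdims=True)
    mad = np.where(mad == 0, np.nan, mad)  # constant rows cannot be scored
    robust_z = 0.6745 * (normalized - protein_median) / mad
    outliers = np.abs(robust_z) > mad_cutoff  # NaN compares as False

    # Outlier removal: mask flagged values as missing.
    cleaned = np.where(outliers, np.nan, normalized)

    # Filtering: keep proteins detected above the floor in enough samples.
    present = (raw > floor).mean(axis=1) >= min_present
    return cleaned[present], present
```

The function returns the cleaned matrix together with a boolean mask, so you can see which proteins were filtered out. The cutoffs are arbitrary starting points and would need tuning to a real array platform.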

Cluster Analysis

Cluster analysis identifies groups of markers with similar or distinct expression profiles across sample groups. Two types of analysis are performed, hierarchical clustering and principal component analysis (PCA) plots:

Hierarchical cluster and heatmap of 8 samples, where red represents increased expression level and blue represents decreased expression level

If you want to see more, just download an example of a cluster analysis report.
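To give an idea of what sits behind a clustered heatmap and a PCA plot, here is a small sketch using NumPy and SciPy. It is only one possible way to do it: the choice of Ward linkage, the z-scoring of each protein and the function name `cluster_and_project` are my assumptions, not a description of the service's pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage

def cluster_and_project(expr, n_clusters=2, n_components=2):
    """expr: log-scale matrix, rows = proteins, columns = samples, no NaNs."""
    expr = np.asarray(expr, dtype=float)

    # z-score each protein across samples so that high-abundance proteins
    # do not dominate (this is what heatmap colours usually show).
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    z = (expr - expr.mean(axis=1, keepdims=True)) / np.where(sd == 0, 1.0, sd)

    # Hierarchical clustering of the samples (one row per sample).
    tree = linkage(z.T, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    order = leaves_list(tree)  # sample order along the dendrogram

    # PCA through singular value decomposition of the centred sample matrix.
    x = z.T - z.T.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)

    return {"labels": labels, "order": order,
            "scores": scores, "explained": explained[:n_components]}
```

The `order` array can be used to rearrange heatmap columns, and the first two columns of `scores` are the coordinates you would draw on a PCA plot.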

Pathway Analysis

Pathway analysis identifies the specific protein functions, biological pathways, and physical interactions that are enriched in a particular group. The data are obtained from GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes) and STRING. Your final report will contain the following information:

  • Pathway enrichment
  • List of enriched pathways and FDR values
  • List of the proteins, their known functions and processes, and p-values related to their enrichment
  • Enriched biological pathways
  • Enriched molecular functions
  • Enriched biological processes
  • Protein interaction mapping
  • List of proteins identified in your study that have known interactions with each other
  • Protein interaction map (see figure below)

Protein interaction map, where proteins are represented as nodes and interactions as edges
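If you are curious how pathway enrichment and FDR values can be computed, the sketch below shows one common approach: a hypergeometric over-representation test followed by a Benjamini-Hochberg correction. I am assuming this approach for illustration only. The post does not say which test is used in the reports, and the pathway dictionary here is a toy stand-in for annotations that would in practice come from GO or KEGG. The names `enrich` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np
from scipy.stats import hypergeom

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def enrich(hits, background, pathways):
    """hits: proteins of interest; background: all proteins on the array;
    pathways: dict mapping pathway name -> iterable of member proteins."""
    background = set(background)
    hits = set(hits) & background
    names, overlaps, p_values = [], [], []
    for name, members in pathways.items():
        members = set(members) & background
        k = len(hits & members)
        # P(overlap >= k) when len(hits) proteins are drawn at random
        # from the background without replacement.
        p = hypergeom.sf(k - 1, len(background), len(members), len(hits))
        names.append(name)
        overlaps.append(k)
        p_values.append(p)
    fdr = benjamini_hochberg(p_values)
    return {n: {"overlap": k, "p": p, "fdr": q}
            for n, k, p, q in zip(names, overlaps, p_values, fdr)}
```

Note that the background should be the set of proteins actually measured on the array, not the whole proteome, otherwise the enrichment is overstated.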

Differential Expression Analysis

If each group contains at least 3 samples, statistical analysis identifies the proteins whose expression differs significantly between groups of samples. The types of statistical analyses include:

  • t-test
  • ANOVA (Analysis of Variance)
  • Wilcoxon Rank-Sum
  • SAM (Significance Analysis of Microarrays)

Again, an example report is the best way to see how the data are presented, in the form of volcano plots and jitter or swarm plots.
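As a rough illustration of two of the tests listed above, the sketch below runs a Welch t-test and a rank-sum test for each protein with SciPy, and returns the two quantities a volcano plot is built from (log2 fold change and -log10 p-value). The function name `differential_expression` is hypothetical and the input is assumed to be already log2-transformed. SciPy's `mannwhitneyu` is used for the rank-sum test, as the Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test.

```python
import numpy as np
from scipy import stats

def differential_expression(group_a, group_b):
    """group_a, group_b: log2 expression, rows = proteins, columns = samples."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    # On log2 data the difference of means is the log2 fold change (b vs a).
    log2_fc = b.mean(axis=1) - a.mean(axis=1)

    # Welch t-test per protein (does not assume equal variances).
    t_p = stats.ttest_ind(b, a, axis=1, equal_var=False).pvalue

    # Rank-sum test per protein, looped for clarity.
    rank_sum_p = np.array([
        stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        for x, y in zip(b, a)
    ])

    return {"log2_fc": log2_fc, "t_p": t_p, "rank_sum_p": rank_sum_p,
            "neg_log10_p": -np.log10(t_p)}
```

One practical remark: with only 3 samples per group, an exact two-sided rank-sum test cannot give a p-value below 0.1, so rank-based tests only become informative with larger groups. In a real analysis the p-values would also need correcting for multiple testing across all proteins on the array.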

Biomarker Selection

Biomarker selection uses a variety of models to identify the subset of biomarkers that best differentiates control from test samples. This service is only relevant for group sizes of 10 samples or more. The models used include:

  • Logistic regression
  • Linear discriminant analysis (LDA)
  • Support vector machine (SVM)
  • Random forest
  • Other models that may be used
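To make this more concrete, here is a minimal sketch of one of the listed models, a random forest, used to rank candidate biomarkers with scikit-learn. It is an illustration under my own assumptions (200 trees, 5-fold stratified cross-validation, ranking by impurity-based feature importance) and not the procedure used in the service. The function name `rank_biomarkers` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def rank_biomarkers(X, y, n_top=5, seed=0):
    """X: rows = samples, columns = proteins; y: 0 = control, 1 = test."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    model = RandomForestClassifier(n_estimators=200, random_state=seed)

    # Cross-validated ROC AUC of the model using all proteins.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

    # Fit on all samples and rank proteins by importance, best first.
    model.fit(X, y)
    ranking = np.argsort(model.feature_importances_)[::-1][:n_top]
    return ranking, auc
```

The returned AUC describes the model built on the full protein panel. If you then keep only the top-ranked proteins, the reduced panel should be re-evaluated with the selection step repeated inside each cross-validation fold, otherwise the performance estimate is too optimistic. This is also why group sizes of 10 or more matter here.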

If I have convinced you to outsource your proteomics studies, from antibody arrays through to biostatistical analysis, feel free to contact me and we can discuss your project in more detail!

Written by Isabelle Nobiron, PhD
Isabelle is a Product Manager at tebu-bio, and also the company's ISO Quality Manager.