Data Analysis Framework

Data Analysis Framework (DAF) is an infrastructure program that provides data processing services based on integrated tools. DAF is open source, and its Java source code is available.


Warp2D Web App


We provide each user with 10 GB of disk space and web-based data management for storing input files and the data obtained after processing.

You can access this user space by logging into DAF with your existing ID from common service providers such as Google, Yahoo, OpenID and others.

Warp2D - 2D Time Alignment
Background

Warp2D (ref. 1) is an efficient and fundamentally new approach for correcting nonlinear retention time shifts between complex proteomics and metabolomics LC-MS data sets. Warp2D operates on peak lists. It uses the integral of the overlapping peak volume of the reference and sample chromatograms as the benefit function of the Correlation Optimized Warping (COW) algorithm.
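The overlap idea can be illustrated with the following sketch. It assumes that each peak is a separable 2D Gaussian whose volume is the peak quantity and whose standard deviations are mzwidth and rtwidth. Under that assumption, the overlap of two peaks has a closed form. The integral of the product of two normal densities with equal standard deviation sigma is itself a normal density in the distance between their centers, with variance 2*sigma^2. PeakOverlap is a hypothetical class written for illustration and is not the Warp2D source. The actual benefit function in ref. 1 may be normalized differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a Gaussian peak-overlap score between two peak lists (not Warp2D code). */
final class PeakOverlap {
    /** One LC-MS peak: m/z, retention time and quantity (treated here as the peak volume). */
    static final class Peak {
        final double mz, rt, volume;
        Peak(double mz, double rt, double volume) { this.mz = mz; this.rt = rt; this.volume = volume; }
    }

    /** Integral over x of N(x; a, s^2) * N(x; b, s^2), which equals N(a - b; 0, 2 s^2). */
    static double gaussianProductIntegral(double a, double b, double sigma) {
        double var = 2.0 * sigma * sigma;
        double d = a - b;
        return Math.exp(-d * d / (2.0 * var)) / Math.sqrt(2.0 * Math.PI * var);
    }

    /** Overlap of two peaks modeled as separable 2D Gaussians scaled by their volumes. */
    static double peakOverlap(Peak p, Peak q, double mzWidth, double rtWidth) {
        return p.volume * q.volume
                * gaussianProductIntegral(p.mz, q.mz, mzWidth)
                * gaussianProductIntegral(p.rt, q.rt, rtWidth);
    }

    /** Total overlap between two peak lists: brute-force sum over all peak pairs. */
    static double totalOverlap(List<Peak> a, List<Peak> b, double mzWidth, double rtWidth) {
        double sum = 0.0;
        for (Peak p : a) {
            for (Peak q : b) {
                sum += peakOverlap(p, q, mzWidth, rtWidth);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Peak> ref = new ArrayList<>();
        ref.add(new Peak(651.471, 133.294, 65311.5));
        List<Peak> shifted = new ArrayList<>();
        shifted.add(new Peak(651.471, 133.794, 65311.5)); // same peak, 0.5 time units later
        double self = totalOverlap(ref, ref, 0.2, 0.2);
        double cross = totalOverlap(ref, shifted, 0.2, 0.2);
        System.out.println(cross / self); // about 0.21: the shift reduces the overlap
    }
}
```

COW searches for a piecewise-linear warping of the sample retention time axis that maximizes a benefit of this kind. Undoing the 0.5 time-unit shift in the example would restore the ratio to 1.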

We have developed a user-friendly web page and provide a data processing service through our distributed computational framework, the Data Analysis Framework (DAF), which uses the Dutch Life Science Grid as its computational resource. The service performs time alignment between large numbers of chromatogram peak lists. It evaluates the quality of each alignment using the standardized arithmetic and geometric means of the overlapping peak volumes. This score takes values between 1 (all peaks overlap perfectly) and 0 (no peak overlap) and is calculated both before and after alignment. The difference reflects the improvement achieved by the alignment.

The web page lets users upload and manage peak lists in space-delimited text format and arrange them in lists. A list of peak lists can be selected either as reference or as sample:
- reference: chromatograms to which the sample chromatograms are aligned;
- sample: chromatograms that are aligned to the reference and whose retention times change after time alignment.

The service then performs every time alignment in the cross product of the reference and sample lists of peak lists. All alignments in one run use the same parameters.
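The cross product of the reference and sample lists can be sketched as follows. AlignmentPlan and Job are hypothetical names used for illustration. The sketch only enumerates the (reference, sample) pairs, each of which would be aligned with the same parameter set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: enumerate every (reference, sample) alignment from two lists of peak lists. */
final class AlignmentPlan {
    /** One alignment job: the sample peak list is warped onto the reference peak list. */
    static final class Job {
        final String reference;
        final String sample;
        Job(String reference, String sample) { this.reference = reference; this.sample = sample; }
        @Override public String toString() { return sample + " -> " + reference; }
    }

    /** Cross product: each sample peak list is aligned to each reference peak list. */
    static List<Job> crossProduct(List<String> references, List<String> samples) {
        List<Job> jobs = new ArrayList<>();
        for (String ref : references) {
            for (String sam : samples) {
                jobs.add(new Job(ref, sam));
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        List<String> refs = Arrays.asList("ref1.pks", "ref2.pks");
        List<String> sams = Arrays.asList("s1.pks", "s2.pks", "s3.pks");
        List<Job> jobs = crossProduct(refs, sams);
        System.out.println(jobs.size() + " alignments: " + jobs); // 2 x 3 = 6 alignments
    }
}
```

If the same peak list appears in both lists, this enumeration also produces a trivial self-alignment. The text above does not say whether the service skips such pairs.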

Input file format:
Each Warp2D input file is a peak list: a space-delimited ASCII file containing three columns in the following order:
(1) mass-to-charge ratio;
(2) retention time;
(3) peak quantity, e.g. peak height, area or volume.

Each row contains the data for one peak. Peak lists should have the .pks extension and may contain a header in the first row.

Example input peak list:
MZ RT Quantity
1384.45 106.248 86969.1
651.471 133.294 65311.5
1460.11 100.129 30593.4
1460.13 100.365 30593.4
258.515 67.3362 110894
585.002 121.893 111538
484.041 118.281 69817.1
331.511 130.022 84492.6
230.974 77.1685 30568.9
1168.55 70.3323 58774.5
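A minimal reader for the .pks format described above might look as follows. It makes two assumptions: columns may be separated by any run of whitespace, and a first row that does not parse as three numbers is a header. PksReader is a hypothetical class and is not part of DAF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the three-column .pks peak list format (m/z, RT, quantity). */
final class PksReader {
    static final class Peak {
        final double mz, rt, quantity;
        Peak(double mz, double rt, double quantity) { this.mz = mz; this.rt = rt; this.quantity = quantity; }
    }

    /** Parses "mz rt quantity" rows; skips blank lines and an optional non-numeric header row. */
    static List<Peak> parse(BufferedReader in) throws IOException {
        List<Peak> peaks = new ArrayList<>();
        boolean firstRow = true;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] cols = line.split("\\s+");
            if (cols.length < 3) throw new IOException("Expected 3 columns: " + line);
            try {
                peaks.add(new Peak(Double.parseDouble(cols[0]),
                                   Double.parseDouble(cols[1]),
                                   Double.parseDouble(cols[2])));
            } catch (NumberFormatException e) {
                // Only the first row may be a header such as "MZ RT Quantity".
                if (!firstRow) throw new IOException("Non-numeric row: " + line, e);
            }
            firstRow = false;
        }
        return peaks;
    }

    public static void main(String[] args) throws IOException {
        String text = "MZ RT Quantity\n1384.45 106.248 86969.1\n651.471 133.294 65311.5\n";
        List<Peak> peaks = parse(new BufferedReader(new StringReader(text)));
        System.out.println(peaks.size() + " peaks, first m/z = " + peaks.get(0).mz); // 2 peaks, first m/z = 1384.45
    }
}
```

To read a file from disk instead of a string, pass java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get("sample.pks")) to parse.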

Output files:
Warp2D produces three output files:
1. the warped sample peak list, in the same format as the input peak list, with the .wpks extension;
2. the .tmap file, which contains the retention time transformation function as pairs of old and new retention times; the new retention time of any point can be calculated from these pairs by linear interpolation;
3. the .qual file, which contains the quality measures (the standardized geometric and arithmetic means of the overlapping peak volume) and other parameters.
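Applying the .tmap transformation to an arbitrary retention time is a piecewise-linear interpolation between the stored pairs. The sketch below makes three assumptions: the pairs have already been loaded, they are sorted by strictly increasing old retention time, and retention times outside the table are extrapolated linearly from the first or last segment. The TimeMap class name is also hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: apply a .tmap warping function by piecewise-linear interpolation. */
final class TimeMap {
    private final double[] oldRt;
    private final double[] newRt;

    /** Pairs must be sorted by strictly increasing old retention time (assumption). */
    TimeMap(double[] oldRt, double[] newRt) {
        if (oldRt.length != newRt.length || oldRt.length < 2) {
            throw new IllegalArgumentException("need at least 2 (old, new) pairs of equal length");
        }
        this.oldRt = oldRt.clone();
        this.newRt = newRt.clone();
    }

    /** Maps a sample retention time to its warped value. */
    double warp(double rt) {
        int i = Arrays.binarySearch(oldRt, rt);
        if (i >= 0) return newRt[i];                        // exactly on a stored pair
        int hi = -i - 1;                                    // insertion point: first index with oldRt > rt
        hi = Math.max(1, Math.min(hi, oldRt.length - 1));   // end segments are used for extrapolation
        int lo = hi - 1;
        double t = (rt - oldRt[lo]) / (oldRt[hi] - oldRt[lo]);
        return newRt[lo] + t * (newRt[hi] - newRt[lo]);
    }

    public static void main(String[] args) {
        TimeMap map = new TimeMap(new double[] {0, 50, 100}, new double[] {0, 55, 100});
        System.out.println(map.warp(25.0)); // 27.5
    }
}
```

Binary search keeps each lookup logarithmic in the number of pairs, which matters when warping every peak of a large peak list.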

The quality file contains the following 25 parameters in space-delimited text format:
a) ref_file_name: file name of the reference peak list
b) sam_file_name: file name of the sample peak list
c) output_file_name: base file name of the output files
d) mzwidth: standard deviation of the LC-MS peak, representing the peak width in the mass dimension
e) rtwidth: standard deviation of the LC-MS peak, representing the peak width in the retention time dimension
f) maxmzwidth: maximal peak width in the m/z dimension (applicable only if mzwidth is not used and a peak width is provided for every peak)
g) maxrtwidth: maximal peak width in the retention time dimension (applicable only if rtwidth is not used and a peak width is provided for every peak)
h) winSize: size of the COW segment in points (for more details see ref. 1)
i) slack: slack parameter of the segment size, in points (for more details see ref. 1)
j) maxPeaksSgmt: maximum number of most intense peaks used per segment (for more details see ref. 1)
k) sampShift: constant retention time shift in the time unit of the input peak list (useful if a large constant retention time shift is present between the reference and sample chromatograms)
l) ref_NTimePoints: number of points covering the full retention time domain of the reference peak list used in the COW procedure (for more details see ref. 1)
m) sam_NTimePoints: number of points covering the full retention time domain of the sample peak list used in the COW procedure (for more details see ref. 1)
n) ref_nPeaksTotal: total number of peaks in the reference peak list
o) sam_nPeaksTotal: total number of peaks in the sample peak list
p) ref_unWarped_peaks_vol: sum of peak volumes in the reference peak list before time alignment
q) sam_unWarped_peaks_vol: sum of peak volumes in the sample peak list before time alignment
r) overlap_unWarped_ref_sam_peaks_vol: sum of overlapping peak volumes before time alignment
s) geometricRatio_unWarped: geometric-mean overlapping peak volume ratio before time alignment (see equation 3 in ref. 2)
t) meanRatio_unWarped: arithmetic-mean overlapping peak volume ratio before time alignment (see equation 4 in ref. 2)
u) ref_Warped_peaks_vol: sum of peak volumes in the reference peak list after time alignment
v) sam_Warped_peaks_vol: sum of peak volumes in the sample peak list after time alignment
w) overlap_Warped_ref_sam_peaks_vol: sum of overlapping peak volumes after time alignment
x) geometricRatio_Warped: geometric-mean overlapping peak volume ratio after time alignment (see equation 3 in ref. 2)
y) meanRatio_Warped: arithmetic-mean overlapping peak volume ratio after time alignment (see equation 4 in ref. 2)
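The exact definitions of the two ratios are equations 3 and 4 in ref. 2 and are not reproduced here. The sketch below assumes the natural normalizations overlap / sqrt(refVol * samVol) for the geometric ratio and overlap / ((refVol + samVol) / 2) for the arithmetic ratio. Both give 1 for perfect overlap and 0 for none, which matches the range stated above. The sketch also assumes that a .qual row lists its 25 values in the a) to y) order above, that file names contain no spaces, and that the AlignmentQuality name is hypothetical.

```java
/** Hypothetical sketch of the two overlap-based quality scores and of reading them from a .qual row. */
final class AlignmentQuality {
    /** Assumed form of the geometric ratio: overlap / sqrt(refVol * samVol). */
    static double geometricRatio(double overlap, double refVol, double samVol) {
        return overlap / Math.sqrt(refVol * samVol);
    }

    /** Assumed form of the arithmetic ratio: overlap / ((refVol + samVol) / 2). */
    static double meanRatio(double overlap, double refVol, double samVol) {
        return overlap / ((refVol + samVol) / 2.0);
    }

    // 0-based column positions, assuming the a)..y) order above: t) = 19, y) = 24.
    static final int MEAN_RATIO_UNWARPED = 19;
    static final int MEAN_RATIO_WARPED = 24;

    /** Alignment enhancement: arithmetic-mean ratio after alignment minus the ratio before. */
    static double meanRatioGain(String qualRow) {
        String[] cols = qualRow.trim().split("\\s+");
        if (cols.length < 25) {
            throw new IllegalArgumentException("expected 25 columns, got " + cols.length);
        }
        return Double.parseDouble(cols[MEAN_RATIO_WARPED]) - Double.parseDouble(cols[MEAN_RATIO_UNWARPED]);
    }

    public static void main(String[] args) {
        System.out.println(geometricRatio(40.0, 100.0, 25.0)); // 0.8
        System.out.println(meanRatio(40.0, 100.0, 25.0));      // 0.64
    }
}
```

The arithmetic mean of two volumes is never smaller than their geometric mean. Under these assumed definitions, the arithmetic score therefore never exceeds the geometric one, as in the example (0.64 versus 0.8).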

Parameters:
Warp2D has the following 10 parameters. Optional parameters are enclosed in <>, default values (where present) follow the equals sign, and each parameter's description follows the colon:

1. <-mzwidth=0.2>: standard deviation of LC-MS peak representing the peak width in mass dimension
2. <-rtwidth=0.2>: standard deviation of LC-MS peak representing the peak width in retention time dimension
3. reference peak list: file name of the reference peak list with .pks extension (the reference list contains all reference peak lists)
4. sample peak list: file name of the sample peak list with .pks extension (the sample list contains all sample peak lists)
5. <Job title>: base file name of the output files. These are the peak list with warped retention times (.wpks extension), the warping function file (.tmap extension) and the file containing the quality information (.qual extension).
6. <Window size=50>: size of the COW segment in points (for more details see ref. 1)
7. <Slack=10>: slack parameter of the segment size, in points (for more details see ref. 1)
8. <Max peaks/segment=50>: maximum number of most intense peaks used per segment (for more details see ref. 1)
9. <No. of time points=2000>: number of points covering the full retention time domain used in the COW procedure (for more details see ref. 1)
10. <Const. time shift=0>: constant retention time shift in the time unit of the input peak list (useful if a large constant retention time shift is present between the reference and sample chromatograms)

If you have any difficulty choosing the parameters, please contact us.

Usage
To use the Warp2D time alignment processing service, click Login at the top of the page. You can log in with an account from a popular community portal such as Hyves, AOL, Yahoo, LiveJournal or Google, or with OpenID.
The web interface that then appears has four main parts, accessible via horizontal and vertical tabs:
- Data Management/Data Home is dedicated to user data management. Here logged-in users can upload their initial peak lists, organize them into lists of peak lists and delete files after processing.
- User Home/Submission handles job submission, using separate lists for the reference and sample peak lists, and lets users set the parameters. When the Submission button is pressed, all combinations of the reference and sample lists of peak lists are submitted for time alignment to the Dutch Life Science Grid.
- User Home/Desktop is used to monitor submitted jobs. It shows the processing status and any errors of the submitted jobs through a dedicated messaging system and color coding.
- Data Management/Results is where the results can be downloaded or deleted once processing is complete.

Each submission has a job ID. The output files of each time alignment, together with its console and error output, are arranged in a directory tree under that job ID. In addition to the result directory of each time alignment, the results include the reference and sample lists of peak lists and the concatenated quality files of all time alignments. They also include a heat map of the arithmetic-mean overlapping peak volume ratio before and after time alignment, with rows and columns rearranged by hierarchical clustering of the post-alignment values.

Tutorial and example data set
A tutorial showing how to use the Warp2D online application is accessible via the following link. A video tutorial is in preparation and will be available soon. An example data set is also available. It contains peak lists obtained by QTOF analysis of mouse serum in an experiment designed with the following factors:

(1) different mouse cancer models (Prostate TRAMP, Breast Chodosh, Lung EGFR)
(2) treatment with different depletion techniques (non-depleted, MARS, MARS + Cysteine peptide capture, Glyco peptide capture)
(3) measurements in different laboratories (Lab1, Lab2)
(4) tumor-bearing or healthy mice (tumor, normal).

More details about the original data are available from the National Cancer Institute Mouse Proteomic Technology Initiative. The peak lists were obtained with an in-house peak-picking method based on a geometric peak detection approach. The peak lists can be downloaded here.

APML converter
There is currently no standard format for LC-MS peak lists. The only published format is APML, introduced with the Corra LC-MS data processing framework. EBI is coordinating the development of a new standard format for preprocessed LC-MS data, mzQuantML, but it is not yet ready. For that reason we use the space-separated text format described above. However, we have written a Java command-line converter from APML to our format, which can be used as follows:

java -jar ReadAPML.jar inputFile.apml outputDirectory
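To convert a whole directory of APML files, the documented command can be run once per file. The sketch below is a hypothetical wrapper (ApmlBatch is not part of the distribution). It uses only the argument order shown above, input file then output directory, and assumes that ReadAPML.jar signals failure with a non-zero exit code.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical batch wrapper around "java -jar ReadAPML.jar inputFile.apml outputDirectory". */
final class ApmlBatch {
    /** Builds the converter command line for one APML file. */
    static List<String> command(String jarPath, String apmlFile, String outputDir) {
        return Arrays.asList("java", "-jar", jarPath, apmlFile, outputDir);
    }

    /** Converts every *.apml file in inputDir, stopping at the first non-zero exit code. */
    static void convertAll(File inputDir, File outputDir, String jarPath)
            throws IOException, InterruptedException {
        File[] files = inputDir.listFiles((dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".apml"));
        if (files == null) throw new IOException("Not a directory: " + inputDir);
        Arrays.sort(files); // deterministic processing order
        for (File f : files) {
            Process p = new ProcessBuilder(command(jarPath, f.getPath(), outputDir.getPath()))
                    .inheritIO() // show the converter's console output
                    .start();
            int exit = p.waitFor();
            if (exit != 0) throw new IOException("ReadAPML failed (exit " + exit + ") on " + f);
        }
    }

    public static void main(String[] args) throws Exception {
        // usage: java ApmlBatch <inputDir> <outputDir> [path/to/ReadAPML.jar]
        String jar = args.length > 2 ? args[2] : "ReadAPML.jar";
        convertAll(new File(args[0]), new File(args[1]), jar);
    }
}
```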

Taverna workflow and command line option
The Warp2D time alignment processing service can also be accessed through a web service. To make the web service easier to use, we have developed a Taverna workflow and a Java command-line tool. With these, users can process local files and retrieve the results of the time alignment to their local machine. The Warp2D time alignment service can therefore be incorporated into workflows, programs or scripts while using the processing power of the Dutch Life Science Grid (DLSG).

Running "java -jar warp2d.jar -h" lists all options of the standalone Java tool. Java 1.6 must be available in the user's path. If one or more options are omitted, default values suited to low-resolution ion-trap data are used for the time alignment. When the tool is executed, the libraries in the lib\ directory must be present in the local path or in the tool's execution directory. The following example aligns the Lab2_MARS_BreastChodosh_normal_R1_3c.pks sample LC-MS peak list to the Lab1_MARS_BreastChodosh_normal_R1_3c.pks reference LC-MS peak list:

java -jar warp2d.jar -refPeakList Lab1_MARS_BreastChodosh_normal_R1_3c.pks -sampPeakList Lab2_MARS_BreastChodosh_normal_R1_3c.pks -rtWidth 0.3 -mzWidth 0.01 -outputFileName test_data -winSize 50 -slack 20 -maxPeaksPerSegment 50 -nTimePoints 3000 -sampShift 0
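To script many alignments with the standalone tool, the command line can be assembled programmatically. Warp2dCommand below is a hypothetical helper, not part of the Warp2D distribution. It reuses only the flag names from the example command and fills in the web interface defaults from the Parameters list. Treating those defaults as suitable for the standalone tool is an assumption, because the tool's own defaults target low-resolution ion-trap data and may differ.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for a warp2d.jar command line (flag names taken from the example command). */
final class Warp2dCommand {
    // Defaults from the web interface's Parameters list (assumption: also sensible for the CLI).
    double mzWidth = 0.2;
    double rtWidth = 0.2;
    int winSize = 50;
    int slack = 10;
    int maxPeaksPerSegment = 50;
    int nTimePoints = 2000;
    double sampShift = 0;

    /** Formats 0.30 as "0.3" and 0.0 as "0", matching the style of the example command. */
    private static String fmt(double x) {
        return BigDecimal.valueOf(x).stripTrailingZeros().toPlainString();
    }

    /** Returns the argument list for one alignment, in the same flag order as the example. */
    List<String> build(String refPeakList, String sampPeakList, String outputFileName) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java"); cmd.add("-jar"); cmd.add("warp2d.jar");
        cmd.add("-refPeakList");        cmd.add(refPeakList);
        cmd.add("-sampPeakList");       cmd.add(sampPeakList);
        cmd.add("-rtWidth");            cmd.add(fmt(rtWidth));
        cmd.add("-mzWidth");            cmd.add(fmt(mzWidth));
        cmd.add("-outputFileName");     cmd.add(outputFileName);
        cmd.add("-winSize");            cmd.add(Integer.toString(winSize));
        cmd.add("-slack");              cmd.add(Integer.toString(slack));
        cmd.add("-maxPeaksPerSegment"); cmd.add(Integer.toString(maxPeaksPerSegment));
        cmd.add("-nTimePoints");        cmd.add(Integer.toString(nTimePoints));
        cmd.add("-sampShift");          cmd.add(fmt(sampShift));
        return cmd;
    }

    public static void main(String[] args) {
        Warp2dCommand c = new Warp2dCommand();
        c.rtWidth = 0.3; c.mzWidth = 0.01; c.slack = 20; c.nTimePoints = 3000; // values from the example
        List<String> cmd = c.build("reference.pks", "sample.pks", "test_data");
        System.out.println(String.join(" ", cmd));
        // To run it (requires warp2d.jar and its lib\ directory):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing every option explicitly keeps runs reproducible, regardless of which defaults a given tool version would otherwise apply.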

References
1. Suits, F., Lepre, J., Du, P., Bischoff, R., Horvatovich, P., Two-dimensional method for time aligning liquid chromatography-mass spectrometry data, Anal. Chem., 2008, 80(9), 3095-3104.
2. Ahmad, I., Suits, F., Morris, Swertz, M., Byelas, G., Dijkstra, M., Hooft, R., Katsubo, D., van Breukelen, B., Bischoff, R., Horvatovich, P., A high-throughput, user-friendly processing service for retention time alignment of complex proteomics and metabolomics LC-MS data, application note submitted to Bioinformatics.