PepTracker®: Encyclopedia of Proteome Dynamics

PART OF THE PEPTRACKER® PROJECT

History and Goals of the EPD



The Encyclopedia of Proteome Dynamics (EPD) was conceived in the Lamond Laboratory (www.LamondLab.com) as an "Open Data" tool for sharing, analysing and visualising the results from proteomics projects and other large-scale data sets. The EPD provides access to data generated by the Lamond group and by our collaborators. It integrates the information from multiple, complex data sets and quantitative studies on human cells and model organisms in a single, user-friendly online resource. The EPD is specifically designed to maximise the value of our data for the wider biomedical and biological research communities by making access to all the data simple and convenient, in a consistent format, via an intuitive, custom-designed graphical interface.

The EPD displays data analyses and visualisations, in the form of interactive graphs and plots, appropriate to the different types of experiments described in the database. It has been designed using consistent formats and colour schemes, aiming to convey the information clearly and concisely and to be visually appealing. The EPD also provides convenient links to other relevant databases and online resources that we frequently use, including UniProt, STRING and WormBase. The EPD is part of the PepTracker proteomics software project in the Lamond laboratory.

The main goal in creating the EPD has been to promote an "Open Data" policy and to improve the communication and sharing of large, complex data sets with the broad community of biologists and biomedical researchers, in addition to groups interested in either proteomics or bioinformatics. Thus, we aim not just to make our data technically “available”, but to establish a new "gold standard" for sharing complex, large-scale data as usefully as possible, mindful of the needs and interests of the intended major users, i.e., molecular cell biologists and biomedical researchers.



















The focus of the EPD is therefore not data archiving, nor is it intended to duplicate storage of raw MS files for use by specialist mass spectrometry and bioinformatics groups. As part of our Open Data policy, however, we also share all of our published raw MS data via the PRIDE partner repository of ProteomeXchange, and we provide links to these files within the EPD.

With the EPD, we have concentrated instead on providing complementary data access in the form of interactive plots displaying processed MS data. We believe this is of direct value for the large number of molecular and cellular biologists interested in data describing the dynamic changes in protein function and abundance in human cells and model organisms.

The design and functionality of the EPD remain a work in progress. They are continually reviewed and revised in response to constructive feedback. The scope and scale of the EPD evolve as additional features and new data sets are incorporated. This reflects the burgeoning nature of the proteomics field itself, which is growing rapidly as new technologies and experimental methods are developed. While the scale and complexity of the data are growing, we are committed to retaining our core goal of providing free access to our data in a user-friendly, searchable format.

The EPD was first launched in 2013, when it was featured in our proteomic study on differential rates of degradation of human proteins in different subcellular compartments of U2OS cells (Larance et al., Molecular and Cellular Proteomics, 2013). In this first release, the EPD also incorporated data from several of our previous, large-scale studies of subcellular protein localisation and turnover rates in different human cell lines (Boisvert et al., Molecular and Cellular Proteomics, 2010; Boisvert et al., Molecular and Cellular Proteomics, 2012; Ahmad et al., Molecular and Cellular Proteomics, 2012; Larance et al., Molecular and Cellular Proteomics, 2013).

We subsequently expanded the EPD to include other types of proteomics data, including analysis of human protein complexes, isolated under native conditions and fractionated by HPLC using size-exclusion chromatography (SEC) (Kirkwood et al., Molecular and Cellular Proteomics, 2013), and studies of cell cycle variation in both protein and mRNA levels, analysed in human myeloid NB4 cells (Ly et al., eLife, 2014; 2015).

EPD 1.0 version





EPD 1.2 search section

As the scale and complexity of the data sets within the EPD continued to grow, technical upgrades were required to improve performance and support new features. In August 2014 we released a major update to the EPD (v1.2), featuring a redesign of the user interface and expanded functionality. The EPD backend (i.e. the underlying data management technology) was migrated to a NoSQL structure, using Cassandra, providing improvements in speed and more flexibility in design options. The colour schemes of the interface were also unified and standardised, and numerous other improvements were introduced to enhance the user experience when searching and browsing data.

Migration to EPD version 1.2 involved a doubling of the amount of data available and provided additional functionality for browsing and exploring these data. The raw MS files used to generate all of the EPD data sets were made freely available by depositing them in PRIDE, and every processed data set displayed in the EPD was made available for convenient download. Despite these advances, the EPD in version 1.2 remained ‘protein centric’ in the design of its user interface (i.e. it assumed that most users would interact with the database by performing a search based on a specific protein of interest), while most of the data plots and graphs displayed were limited in their interactivity.




In the latest version (EPD v2), we have implemented a major redesign of the user interface to facilitate exploring complex, "multidimensional" data sets. We used D3 to provide force-directed diagrams as a front end, combined with Neo4J to provide hierarchical navigation as a back end. The result is a set of richly featured, highly interactive plots in which every displayed data point is selectable and annotated with relevant information and links to other related data. Every data plot has now been converted into an interactive element, so all points can be either moused over or clicked on to get extra information about the content being displayed. We believe users of the EPD will find this interactivity intuitive and will be able to find exactly the information they require.

In EPD v2 it is no longer necessary to explore the database by first selecting a specific protein of interest. Instead, the expanded graphical interface provides multiple routes for filtering and exploring the entire data set based on a variety of alternative starting points, e.g. "organism", "experiment type" or "cell type". The option remains, however, to restrict the search so that it displays only data sets containing a selected protein of interest. With the new exploration interface, all the branches of the hierarchy can be navigated to reach the same end point, traversing the data through different paths as best suits the interests of different users.

The latest EPD 2.0 version now supports access from mobile platforms (i.e. tablets and smartphones), as well as from laptop and desktop computers running most widely used browsers (e.g. Firefox, Chrome and Safari). We are committed to releasing further upgrades and improvements to the EPD, and we have in place a roadmap for new features that will be incorporated together with many new data sets that will be uploaded. As always, we welcome user feedback and suggestions for additional features and new functionality.





EPD 2.0 data browser



Technical Notes

Resources and Recognition


We believe that the scientific research community benefits from expanding existing knowledge through sharing, collaboration and open access to data and resources. We have created the EPD according to this “Open Data” concept of promoting the sharing of our results as effectively as possible with the community. We also wish to acknowledge the achievements and hard work of others in the community that we have made use of in creating the EPD.

We would like to acknowledge the UniProt database in particular, which provides a major resource that the EPD makes use of. Thus, the “General Information” section of the EPD is created based upon UniProt data. We also thank the groups who have created and maintained the STRING database, which we have also used extensively. The STRING association network map (example shown on the right) is produced by STRING (thanks due to the Jensen laboratory). We provide a link to their site for each protein queried in the EPD, when available, in the External Links section.

The External Links section of the EPD acknowledges multiple useful resources that provide additional information on specific proteins that are queried in the EPD. For example, C. elegans proteins are linked through to cognate entries in “WormBase”, while human proteins are linked through to entries in the Protein Data Bank and the Human Protein Atlas.

We welcome suggestions for additional useful resources that could be linked to the EPD in future, and we also welcome feedback on any resources that we have used but inadvertently overlooked in the acknowledgements.





Linking to the EPD


The EPD has been significantly updated in this latest release (March 2016). Nonetheless, the link system used in EPD v1.2 is still functional. The format is shown below:


http://peptracker.com/epd/protein/(insert uniprot accession here)/

The current EPD version (2.0) uses a URL format similar to that of version 1.2. It is still based on the UniProt accession; however, the format has been slightly modified to better represent the changes implemented in the EPD.

http://peptracker.com/epd/analytics/?protein_id=(insert uniprot accession here)/
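For anyone generating these links programmatically, the two formats above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `epd_v1_link` and `epd_v2_link` are hypothetical and not part of the EPD, and the accession P04637 is used purely as a sample UniProt accession, with no claim that the EPD holds data for it. The trailing slash in the version 2.0 link is reproduced exactly as the format is written above.

```python
from urllib.parse import quote

# Base address taken from the two link formats shown above.
EPD_BASE = "http://peptracker.com/epd"


def epd_v1_link(accession: str) -> str:
    """Build a link in the EPD v1.2 format (hypothetical helper)."""
    # Strip stray whitespace and percent-encode the accession for use in a path.
    return f"{EPD_BASE}/protein/{quote(accession.strip(), safe='')}/"


def epd_v2_link(accession: str) -> str:
    """Build a link in the EPD v2.0 format (hypothetical helper)."""
    # The trailing slash after the accession mirrors the format given above.
    return f"{EPD_BASE}/analytics/?protein_id={quote(accession.strip(), safe='')}/"


if __name__ == "__main__":
    example = "P04637"  # sample UniProt accession, used only for illustration
    print(epd_v1_link(example))
    print(epd_v2_link(example))
```

Keeping the base address in one constant means that only a single line needs to change if the site address is ever updated.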