CD-CODE: crowdsourcing condensate database and encyclopedia (2023)

Biomolecular condensates are membraneless organelles that selectively concentrate biomolecules (for example, proteins and nucleic acids) in the cell with spatial and temporal precision [1]. In recent years, they have been implicated in several biochemical processes, in both physiology and disease [2]. Consequently, biomolecular condensates are now leveraged as a new class of therapeutic targets [3,4].

Advances in basic science and drug discovery build on published reports, and the rate of new discoveries depends on timely access to relevant data. However, as with every novel paradigm, new terms and concepts emerge and evolve as the field develops. Accordingly, the currently available databases that catalog proteins involved in condensate formation use varying definitions and criteria for condensates and their constituent proteins and RNAs [5,6,7,8]. These are excellent databases that curate proteins that phase separate. Specifically, LLPSDB [7] and PhaSePro [6] collect proteins that are thought to drive liquid–liquid phase separation, with the former curating exclusively in vitro data.

However, these databases do not answer the following questions about biomolecular condensates: Which biomolecular condensates have been discovered and verified to date? What are their known protein components? Which condensates is a given protein known to belong to? What experimental evidence supports the existence of a particular condensate? Our goal is to answer these and other important questions, and to create a community-editable database that facilitates dynamic data updates. Therefore, we designed a condensate-centric database that is based on the scientific literature and provides experimental evidence, scores and references for each condensate–protein relationship (Extended Data Figs. 1 and 2). This database is updated dynamically by contributors to keep up with the growing knowledge in the field. We call our platform CD-CODE. It consists of three main parts: (1) a database of biomolecular condensates and their protein constituents; (2) an encyclopedia of the scientific terms used in condensate biology; and (3) a crowdsourcing web application (Extended Data Fig. 3).

CD-CODE is a ‘living database’ designed for dynamic and rapid addition and review of information about condensates and proteins by users (Fig. 1) and is open to any expert researcher who wishes to contribute. Our user management system supports three types of users: viewers, contributors and maintainers. Viewers can read and download the curated information. Contributors can suggest edits and propose new condensate and protein entries (Extended Data Figs. 4 and 5). Maintainers, who are part of the development team, curate the changes and accept or reject the contributors' suggestions. Contributors are then notified about the status of their suggestions and can engage in further discussion. To keep up with the rapidly evolving definitions, nomenclature and growing scientific evidence, the crowdsourcing platform allows the community to aggregate scientific findings in condensate biology.

Fig. 1. Users can view and search the data, or register as contributors and edit the content of the database via the community-editable web application. The maintainers ensure quality control, and only approved edits become part of the dynamically updated database. Figure created with BioRender.com.
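The review workflow described above can be sketched as a small state model. This is an illustrative sketch only, not CD-CODE's actual implementation: the class and function names (`Role`, `Status`, `Suggestion`, `submit`, `review`) are hypothetical, and letting maintainers submit suggestions themselves is an assumption that the text does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    CONTRIBUTOR = "contributor"
    MAINTAINER = "maintainer"


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    author_role: Role
    description: str
    status: Status = Status.PENDING


def submit(role: Role, description: str) -> Suggestion:
    """Create a pending suggestion; viewers are read-only.

    Assumption: any non-viewer role may submit.
    """
    if role is Role.VIEWER:
        raise PermissionError("viewers can read and download, not edit")
    return Suggestion(role, description)


def review(reviewer: Role, suggestion: Suggestion, accept: bool) -> Suggestion:
    """Only maintainers may accept or reject a suggestion."""
    if reviewer is not Role.MAINTAINER:
        raise PermissionError("only maintainers review suggestions")
    suggestion.status = Status.APPROVED if accept else Status.REJECTED
    return suggestion


def visible_in_database(suggestion: Suggestion) -> bool:
    """Only approved edits become part of the public database."""
    return suggestion.status is Status.APPROVED
```

In this model a suggestion stays out of the public data until a maintainer approves it, which mirrors the quality-control step in the text.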


At the time of this report, CD-CODE (cd-code.org) contains 9,861 proteins linked to 244 unique biomolecular condensates (and 375 in vitro synthetic condensates) across 49 different organisms. Notably, these numbers are continuously changing as contributors add and review more data. CD-CODE, as a semi-manually curated and annotated resource, aggregates information from the primary literature (to date, PubMed references published until 1 June 2022 were manually curated) and other databases [5,6,7,8] (Extended Data Fig. 6 and Extended Data Tables 1 and 2). To promote easy integration with other resources, protein entries are cross-referenced with UniProt [9], Ensembl [10] and the Human Protein Atlas (proteinatlas.org) [11]. Common sequence properties of condensate proteins, such as disorder score [12] and amino acid composition, are also displayed graphically (Extended Data Fig. 7), facilitating the identification of regions that may drive condensate partitioning.

We standardized the names of condensates by creating an ontology from the literature and grouped condensates by functional categories (Supplementary Table 1) to reveal the evolutionary history of condensates. Most known condensates are found in mammals and many are clade-specific (Fig. 2a). Since our current knowledge is sparse and likely biased, the evolutionary origin of condensates remains an open future research direction that CD-CODE can facilitate.

Fig. 2. a, Biomolecular condensates across the tree of life. CD-CODE contains information about 244 condensates across 49 species. Here, only major clades are shown for clarity and condensates were grouped into functional categories. b, Many proteins localize to multiple condensates. There is a large overlap between the proteomes of different biomolecular condensates in humans. The largest condensates in humans are represented as circles and the proteins shared between every pair of condensates are shown (only condensates with >20 connections are shown). c, The distribution of condensate proteome sizes in humans. Most biomolecular condensates have only a few known protein members. The largest condensates contain >1,000 different proteins (inset).


While many proteins undergo liquid–liquid phase separation in vitro, it is unclear which proteins form condensates in cells and which condensates they partition into. To facilitate our understanding of the condensate specificity of proteins, we collected all known condensates in which a given protein was found, and we curated the experimental evidence for the association of each protein with a given condensate (confidence score, corresponding to zero to five stars: 1 star, literature evidence, PubMed identifier (ID); 2 stars, high-throughput; 3 stars, in vitro; 4 stars, in cellulo; and 5 stars, in vivo evidence). Condensates and proteins that have a rating of zero or one star have not been manually curated yet.
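The star scheme can be written down as a simple lookup. The following is a minimal sketch under stated assumptions: the key names are illustrative, and taking the strongest available evidence as the score is an assumption, since the text does not say how multiple evidence types are combined.

```python
# Evidence tiers as listed in the text; the key names are illustrative.
STARS = {
    "literature": 1,       # literature evidence, PubMed ID
    "high_throughput": 2,
    "in_vitro": 3,
    "in_cellulo": 4,
    "in_vivo": 5,
}


def confidence_stars(evidence_types):
    """Return the star rating for one condensate-protein association.

    Assumption: when several kinds of evidence exist, the strongest
    one sets the score. No evidence at all yields zero stars.
    """
    return max((STARS[e] for e in evidence_types), default=0)


def awaits_manual_curation(stars):
    """Entries with zero or one star have not been manually curated yet."""
    return stars <= 1
```

For example, an association supported by both a literature mention and an in vivo experiment would receive five stars under this assumed rule.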

As expected for dynamic cellular compartments, many proteins partition into different condensates and the overlap between condensate proteomes is substantial (Fig. 2b). While many proteins localize to multiple condensates (members), only a few are obligate and essential components (drivers). We annotated 205 driver proteins in specific condensates and provide the corresponding experimental evidence. Our database revealed that several proteins that are drivers in one condensate are nonessential members of another (for example, G3BP1, a driver of stress granules, is also present in processing bodies (P-bodies) and neuronal ribonucleoprotein particle granules). CD-CODE will aid our understanding of the determinants of condensate-specific driver behavior, and of whether a driver protein can be used as a ‘marker’ of a condensate in experiments.
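The pairwise overlap between condensate proteomes can be computed with set intersections. The sketch below uses a toy dataset limited to localizations mentioned in this article (G3BP1 and DCP1A); the `proteomes` dictionary and the function name are hypothetical and do not reflect CD-CODE's data format or API.

```python
from itertools import combinations

# Toy proteomes for illustration only, not an export from CD-CODE.
proteomes = {
    "stress granule": {"G3BP1", "DCP1A"},
    "P-body": {"G3BP1", "DCP1A"},
    "neuronal RNP granule": {"G3BP1"},
    "nucleolus": {"DCP1A"},
}


def shared_proteins(proteomes):
    """Map each pair of condensates to the proteins they have in common.

    Pairs without any shared protein are omitted.
    """
    overlaps = {}
    for a, b in combinations(sorted(proteomes), 2):
        common = proteomes[a] & proteomes[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps


overlaps = shared_proteins(proteomes)
```

Counting the size of each intersection gives the kind of edge weight that a network view of shared proteins would need.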

Marker proteins are used to define the identity of condensates and to inform the design of condensate-targeting drug screening pipelines [3]. They are thought to be uniquely associated with a given condensate and are commonly used to visualize condensates by microscopy, for example, in colocalization experiments that test whether a protein localizes to a condensate. Our database revealed that several known marker proteins are not specific to a single condensate. For example, whilst DCP1A is used as a marker for P-bodies, it also localizes to stress granules and nucleoli. Knowing the specific protein components will facilitate experimental designs for accurate, specific identification of condensates.

CD-CODE enables us to answer the questions posed at the beginning: (1) there are currently 244 unique biomolecular condensates documented in the literature; (2) as an example, P-granules, which are the germ granules of Caenorhabditis elegans, have 190 documented protein components: one of them, PGL-3 (PGL3_CAEEL), is a driver of P-granule formation, and its presence within P-granules is supported by in vivo experimental evidence (5 stars). PGL-3 is reported to be associated exclusively with P-granules; thus, it is a P-granule-specific marker protein.
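The question "which condensates is a given protein known to belong to?" amounts to inverting the condensate-to-protein relationship. A minimal sketch follows, assuming the associations are available as (protein, condensate) pairs. The pairs below restate examples from this article, and the function names are hypothetical.

```python
from collections import defaultdict

# (protein, condensate) associations taken from the examples in the text.
memberships = [
    ("DCP1A", "P-body"),
    ("DCP1A", "stress granule"),
    ("DCP1A", "nucleolus"),
    ("pgl-3", "P-granule"),
]


def condensates_by_protein(pairs):
    """Build a protein -> set of condensates index."""
    index = defaultdict(set)
    for protein, condensate in pairs:
        index[protein].add(condensate)
    return dict(index)


def is_specific_marker(protein, index):
    """A protein is condensate-specific if exactly one condensate lists it.

    Caveat from the text: absence of a record may only mean that the
    association has not been studied or curated yet.
    """
    return len(index.get(protein, ())) == 1


index = condensates_by_protein(memberships)
```

With this toy input, pgl-3 qualifies as a specific marker while DCP1A does not, matching the examples given in the article.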

Databases that curate proteins undergoing liquid–liquid phase separation have facilitated the development of machine learning algorithms to predict phase separation [13,14,15] and the discovery of the protein properties that drive phase separation [15]. The next open question is which biomolecular condensate a specific protein belongs to. Our database contains a curated list of condensate proteomes (Fig. 2c), which can facilitate investigations of protein recruitment into specific condensates. Our resource can provide high-quality benchmarking data for machine learning algorithms aimed at predicting the protein components of condensates.

Furthermore, our comprehensive curation of condensate types, their respective composition in multiple species and the level of experimental support provides a valuable resource for drug hunters, which can inform the design of assays and screening pipelines. For example, in high-content imaging phenotypic screens, the protein or protein combination chosen for monitoring should be selective for the target condensate [3]. Additionally, through regular updating of the database by the community and via curation of new publications, CD-CODE supports and accelerates the nomination of new condensate-associated drug targets.

The field of biomolecular condensates is highly transdisciplinary and ever-developing, where definitions and terms keep changing, creating a need for constant updates that require consensus within the community. The encyclopedia, as a standalone wiki, serves as a platform to aggregate knowledge about condensate research. In the future, we are planning weekly updates to integrate new data from the users, and yearly updates with new features and data points that become relevant to store, as the research field develops.

The main feature of CD-CODE is that it contains experimentally validated entries. However, caution should be exercised by users when interpreting lack of data on a particular protein, condensate or species, as this may simply reflect the biased interest of the community towards particular model systems and biological pathways. Any missing information could mean that (1) the protein or condensate has not been studied yet; (2) there is a research paper but the information has not been added to the database yet; (3) the condensate truly does not exist; or (4) the protein truly does not belong to a given condensate. As such, CD-CODE aims to highlight the unknowns in the field to guide future research questions to fill the gaps. These gaps in experimental evidence can be bridged by computational predictions [16,17], which are beyond the scope of CD-CODE. Evolution of the CD-CODE database through ongoing curation of new experimental evidence will lead to a progressive increase in high-scoring condensate entries.

In summary, we present CD-CODE, a semi-manually curated condensate database, and a community-editable web application. The crowdsourcing platform allows the community to further scrutinize definitions and evidence as the field evolves. This will ensure that the ever-growing knowledge on condensate research is integrated into the database and into the encyclopedia in a timely manner.

FAQs

What is CD-CODE (CrowDsourcing COndensate Database and Encyclopedia)?

CD-CODE (CrowDsourcing COndensate Database and Encyclopedia) is a comprehensive, semi-manually curated crowdsourcing database of biomolecular condensates and their constituents as well as an encyclopedia for the scientific terms used to describe them.




Author: Errol Quitzon

Last Updated: 06/05/2023
