RCSB PDB Help
3D Similarity Search
Introduction
The functions of biological molecules are determined by their 3D shape, which means that molecules with similar 3D structures often share similar functions.
The Protein Data Bank (PDB) continues to grow each year, with an increasing number of experimental and integrative structures of varying size and complexity. Many of these structures represent assemblies, which may consist of multiple proteins or multiple copies of the same protein. The coordinates for these assemblies may include:
- Deposited coordinates or specific subsets of the model
- Coordinates derived from symmetry operations applied to the deposited model
When comparing the shapes of complexes, it is important to consider the full reconstructed assembly, since a single PDB entry may include multiple biologically relevant assemblies.
In addition, RCSB.org provides access to over a million Computed Structure Models (CSMs). Unlike experimental structures, the coordinates of CSMs do not include symmetry-related information, so the model and assembly coordinates are identical.
Finding and classifying 3D structures is essential for understanding functional and evolutionary relationships. While sequence-based searches can identify conserved domains in proteins, many biological examples show that proteins can have similar shapes and functions despite sequence variations. Additionally, a single protein may adopt multiple conformations, such as open and closed forms of an enzyme, which cannot be detected through sequence-based searches alone. These cases require structure similarity search methods.
Many proteins are also stabilized or function as part of assemblies, interacting with one or more copies of themselves or with other proteins. Structure similarity searches allow you to identify similar assemblies, enabling exploration of both the shape of individual proteins and their interactions within complexes.
How 3D Similarity Search Works
The 3D Similarity search option allows you to query the PDB archive using the three-dimensional (3D) shape of a protein structure. This method, developed by RCSB PDB (Segura et al., 2026), represents proteins as fixed-length vector embeddings rather than just atomic coordinates and chain connectivities.
Protein embeddings are calculated by combining the ESM3 Protein Language Model with a transformer neural network that aggregates residue-level embeddings into a fixed-length vector. These vectors are designed to approximate the TM-scores between the protein structures that they encode.
In combination with vector databases, the embedding search enables fast and accurate identification of structurally similar proteins.
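The idea of ranking structures by comparing fixed-length vectors can be illustrated with a small sketch. It is not the RCSB PDB implementation: the toy three-dimensional vectors, the use of cosine similarity, the brute-force scan in place of a vector database, and the function names `cosine_similarity` and `top_matches` are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(query: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of `targets`."""
    query_unit = query / np.linalg.norm(query)
    target_units = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    return target_units @ query_unit


def top_matches(query: np.ndarray, targets: np.ndarray, k: int = 3):
    """Return (row_index, similarity) pairs for the k most similar target vectors."""
    scores = cosine_similarity(query, targets)
    order = np.argsort(-scores)[:k]  # indices sorted by descending similarity
    return [(int(i), float(scores[i])) for i in order]


# Toy "embeddings" (hypothetical data): one row per target structure.
targets = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
])
query = np.array([1.0, 0.05, 0.0])

for index, score in top_matches(query, targets, k=2):
    print(index, round(score, 4))
```

Because every structure is reduced to a vector of the same length, comparing a query against many targets becomes a nearest-neighbor lookup rather than a series of pairwise structure alignments.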
Documentation
You can access the 3D Similarity search by opening Advanced Search and clicking on (+) 3D Similarity from the list of available search tools, or go directly to the search using this link: 3D Similarity Search.
How to Provide a Query
The 3D Similarity search offers three ways to provide a query structure:
Select an existing PDB or CSM structure
Choose a structure directly from the PDB archive or available Computed Structure Models (CSMs). The selected structure will be loaded automatically for use as the query.
Upload a local coordinates file
Upload a file in a supported format (e.g., PDB, mmCIF, or binaryCIF) from your computer. The uploaded coordinates will be loaded automatically for use as the query.
Files with the extensions “.cif”, “.bcif”, “.pdb”, and “.ent”, as well as their gzipped (“.gz”) versions, are supported. After you select a file, it is automatically uploaded to RCSB PDB servers. Your file will be assigned a unique, randomly generated URL. This URL cannot be guessed by other users; however, anyone who does have the link will be able to access the file.
Uploaded files remain available for 90 days, allowing you to bookmark your search or share it with collaborators during that time. If you need a persistent reference—for example, in a publication, blog post, or any long-lived resource—you should upload your structure to an external file-sharing service (such as Dropbox or Google Drive) and use the URL option instead. This same approach is required for queries saved in MyPDB.
The maximum supported file size is 10 MB. Larger files must be hosted externally and referenced through a URL.
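Before uploading, a local file can be checked against the extension and size rules described above. The following is a minimal sketch, not part of any RCSB PDB tool: the helper names are hypothetical, matching extensions case-insensitively is an assumption, and so is interpreting 10 MB as 10 * 1024 * 1024 bytes.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = (".cif", ".bcif", ".pdb", ".ent")
# Assumption: "10 MB" is taken to mean 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def has_supported_extension(filename: str) -> bool:
    """True if the name ends in a supported extension, optionally followed by .gz."""
    name = filename.lower()  # assumption: extensions are matched case-insensitively
    if name.endswith(".gz"):
        name = name[:-3]
    return name.endswith(SUPPORTED_EXTENSIONS)


def upload_problems(path: str) -> list:
    """Return a list of reasons why the file may be rejected (empty if none found)."""
    problems = []
    if not has_supported_extension(path):
        problems.append("unsupported file extension")
    file = Path(path)
    if file.is_file() and file.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("larger than 10 MB; host the file externally and use the URL option")
    return problems


print(has_supported_extension("model.cif.gz"))
print(upload_problems("model.xyz"))
```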
Use URL to reference coordinates file
Provide a direct link to a structure file hosted online. You will need to click Load for the system to retrieve and load the structure for use as the query.
This option can be used to search for structures similar to a 3D model hosted outside of RCSB.org, such as predictions from AlphaFold, RoseTTAFold, or ESMFold, as well as structures available in other public data resources. By providing a direct URL to the coordinates file, the 3D Similarity search will retrieve and load the model automatically, allowing you to use it as the query without needing to download and re-upload the file manually.
Modifying the Query
Once a structure is loaded using any of the three modes, you can refine the query in several ways:
- Use a specific chain or assembly as the query:
  - Select a chain by its ID or an assembly by its ID.
  - This allows focusing the search on a specific part of the structure rather than the entire molecule.
- Interactive 3D selection:
  - Integration with the 3D viewer enables you to visually select the query chain or assembly directly in the structure.
  - This provides a more intuitive way to define the search query for complex structures.
Advanced Options
Search Targets
Controls whether your query structure is compared against individual chains or full assemblies:
- Assemblies: Use this to match your query to complete assemblies (this is relevant if you are interested in the overall shape of a complex).
- Chains: Use this to match your query to individual chains of protein structures.
Reasonable defaults are applied automatically. For example, if the query is defined using an assembly, the search will look for assemblies; if a chain is selected, the search will target individual chains. However, it may be helpful to adjust these options if your query returns no results or does not produce the expected matches.
Similarity Type
For any structure similarity search, you can choose between two matching modes using the drop-down menu:
- Global: Embedding model scores are scaled based on the relationship between the number of residues in the structures, using the factor: min(query_length, target_length) / max(query_length, target_length). Useful to force global matches between 3D structures.
- Local: Uses unmodified scores from the embedding model. Local structure similarity may rank higher. Useful to find matches between local regions of 3D structures.
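The difference between the two modes can be made concrete with a short sketch. The factor is the one given above; that it is applied by multiplying the embedding-model score, and the function names `length_factor` and `global_score`, are assumptions for illustration.

```python
def length_factor(query_length: int, target_length: int) -> float:
    """min(query_length, target_length) / max(query_length, target_length)."""
    if query_length <= 0 or target_length <= 0:
        raise ValueError("lengths must be positive residue counts")
    return min(query_length, target_length) / max(query_length, target_length)


def global_score(raw_score: float, query_length: int, target_length: int) -> float:
    """Scale a raw (Local) score by the length factor (assumed to be a multiplication)."""
    return raw_score * length_factor(query_length, target_length)


# A 100-residue query matched against a 400-residue target:
raw = 0.9                           # Local mode keeps this score unchanged
scaled = global_score(raw, 100, 400)
print(length_factor(100, 400))      # 0.25
print(round(scaled, 3))             # 0.225
```

Under this reading, a small domain that matches part of a much larger structure keeps its high score in Local mode but is pushed down the ranking in Global mode, while structures of equal length are unaffected because the factor is 1.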
Predicted Score Cutoff
Minimum predicted score threshold used to filter and retain structural matches. The default value of 0.8 provides a practical balance between precision and recall. Decreasing this threshold increases coverage of returned structures but may introduce lower-confidence matches toward the tail of the ranking.
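The effect of the cutoff can be sketched as a simple filter over scored matches. The match identifiers and scores below are made up, the function name `filter_matches` is hypothetical, and treating the threshold as inclusive is an assumption.

```python
DEFAULT_CUTOFF = 0.8  # default value stated in the text


def filter_matches(matches, cutoff=DEFAULT_CUTOFF):
    """Keep (identifier, score) pairs scoring at or above the cutoff, best first."""
    kept = [(identifier, score) for identifier, score in matches if score >= cutoff]
    return sorted(kept, key=lambda match: match[1], reverse=True)


# Hypothetical search hits with predicted scores.
hits = [("match_a", 0.95), ("match_b", 0.79), ("match_c", 0.83), ("match_d", 0.60)]

print(filter_matches(hits))              # default cutoff of 0.8
print(filter_matches(hits, cutoff=0.7))  # lower cutoff returns more, weaker matches
```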
Query By Example
All 3D structures available from RCSB.org (experimental structures and CSMs) have a dedicated Structure Summary page that displays information about the entities and assemblies of that entry. To search for structures similar to any one polymer entity in the structure, click on the 3D Structure link above the details listed for the macromolecule.
Figure: Options to launch a structure-based search from the Structure Summary page (highlighted in a red box).
To search for assemblies similar to a specific assembly of the structure, click on the Find Similar Assemblies link below the snapshot of the assembly on the page.
Figure: Options to launch a search for an assembly from the Structure Summary page. Click on the link highlighted in the red box.
Search Results
Depending on the selected search options, the structure similarity results will list similar Macromolecules (Polymer Entities) or Assemblies.
For chain-based searches, each search result can be superposed on the query chain and viewed interactively in 3D using the Pairwise Structure Alignment tool. Simply click the Align in 3D button next to “Structure Match” to open the alignment.
Note: This button is available only when the search is based on an existing PDB or CSM structure. It does not appear when the query structure is uploaded from a local file or provided via a URL.
For assembly-based searches, each matched assembly is assigned a structure match score, which represents the probability (expressed as a percentage) that the assembly matches the query structure. A score of 1.0 indicates a perfect match, while lower values reflect decreasing levels of structural similarity.
Limitations of 3D Similarity Search
The structure similarity search system has some limitations:
- The method cannot report an RMSD because it bypasses structure alignment and therefore has no information about which residues are paired. Instead, the method outputs a score that indicates the likelihood that the match is relevant.
- The method cannot distinguish between assemblies that share identical subunit composition but adopt different conformations.
- Scores are not calibrated; different queries may require different thresholds to optimize search accuracy.
- The method does not support searches for structures composed exclusively of DNA or RNA. For assemblies containing both proteins and nucleic acids, only protein chains are considered.
Examples
Find chains similar to Myoglobin
- Use the PDB/CSM ID option to select a chain from an existing PDB structure, such as pdb_00001mbn
- Chain A is selected by default
- Click the Search button to return matching polymer entities from the PDB archive
- Enable Include CSM and run the search again to retrieve both matching experimental structures and predicted models
Find chains similar to the open form of hexokinase
- Use a structure of the enzyme hexokinase in an “open” conformation as a query (PDB ID pdb_00002yhx, chain A)
- Click the Search button to return matching polymer entities
Find assemblies similar to the SARS-CoV-2 Spike protein trimer
- The SARS-CoV-2 spike protein is composed of three polymer chains, each of which has a receptor-binding domain that can be in an open (or up) conformation for interacting with cellular receptors or a closed (or down) conformation.
- To find spike structures where all three receptor-binding domains are closed, use PDB ID pdb_00006vxx as a query
- Change the query structure selection to Assembly; Assembly 1 will be selected by default
- Click the Search button to return matching assemblies
Find assemblies similar to Insulin hexamers
- Use PDB ID pdb_00001trz as a query
- Change the query structure selection to Assembly and select Assembly 3
- Click the Search button to return matching assemblies
Find assemblies containing Aspartate Carbamoyltransferase enzyme
- Use PDB ID pdb_00001acm as a query
- Set the query structure selection to Chain A
- Change the Search Targets option to Assemblies and the Similarity Type to Local
- Click the Search button to return matching assemblies
Find assemblies similar to the Chymotrypsin polymer
- Use PDB ID pdb_00001k2i as a query
- Chain A [auth 1] is selected by default
- Change the Search Targets option to Assemblies
- Click the Search button to return matching assemblies
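The examples above use identifiers such as pdb_00001mbn, which appear to be the four-character PDB ID in lowercase, prefixed with "pdb_0000". A small helper can convert a familiar four-character ID into this form. This is a sketch based on the pattern seen in the examples, not an official converter, and the function name `to_extended_pdb_id` is hypothetical.

```python
import re

# Four-character PDB ID: a digit followed by three alphanumeric characters.
LEGACY_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def to_extended_pdb_id(legacy_id: str) -> str:
    """Convert e.g. '1MBN' to 'pdb_00001mbn' (pattern inferred from the examples)."""
    if not LEGACY_PDB_ID.fullmatch(legacy_id):
        raise ValueError(f"not a four-character PDB ID: {legacy_id!r}")
    return "pdb_0000" + legacy_id.lower()


for pdb_id in ["1MBN", "2YHX", "6VXX"]:
    print(pdb_id, "->", to_extended_pdb_id(pdb_id))
```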
References
- Joan Segura, Ruben Sanchez-Garcia, Sebastian Bittrich, Yana Rose, Stephen K Burley, Jose M Duarte (2026) Multi-scale structural similarity embedding search across entire proteomes, https://doi.org/10.1093/bioinformatics/btag058














