PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 90%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity; this view uses 90% identity. The cumulative bars show the growth in unique protein sequences (counted as polymeric entities) over the archive's history. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the counts are calculated with a default precision threshold, so a statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 90% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  10                               23
1978  3                                26
1979  6                                32
1980  4                                36
1981  8                                44
1982  18                               62
1983  11                               73
1984  11                               84
1985  12                               96
1986  9                                105
1987  11                               116
1988  25                               141
1989  45                               186
1990  51                               237
1991  55                               292
1992  67                               359
1993  221                              580
1994  436                              1,016
1995  328                              1,344
1996  391                              1,735
1997  542                              2,277
1998  717                              2,994
1999  843                              3,837
2000  951                              4,788
2001  1008                             5,796
2002  1060                             6,856
2003  1488                             8,344
2004  2046                             10,390
2005  2269                             12,659
2006  2525                             15,184
2007  2865                             18,049
2008  2649                             20,698
2009  2697                             23,395
2010  2737                             26,132
2011  2504                             28,636
2012  2705                             31,341
2013  2927                             34,268
2014  3580                             37,848
2015  2937                             40,785
2016  3424                             44,209
2017  3613                             47,822
2018  3335                             51,157
2019  3685                             54,842
2020  4564                             59,406
2021  4034                             63,440
2022  4915                             68,355
2023  4609                             72,964
2024  4875                             77,839
2025  5553                             83,392
2026  984                              84,376
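
Each total in the table is the running sum of the yearly counts up to and including that year. The minimal Python sketch below reproduces the totals for the first five rows. The function name `cumulative_totals` and the list `yearly` are illustrative only and are not part of any PDB tool or API.

```python
from itertools import accumulate

# (year, new protein sequences) copied from the first five table rows.
yearly = [(1976, 13), (1977, 10), (1978, 3), (1979, 6), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, total) triples; total is the running sum of new."""
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly):
    print(f"{year}  {new:>5}  {total:>7,}")
```

The same check can be applied to any consecutive pair of rows: a year's total minus the previous year's total should equal that year's number of new sequences.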