PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the statistics are calculated with a default precision threshold, so the count shown here may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 70% identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  8                                21
1978  3                                24
1979  2                                26
1980  4                                30
1981  8                                38
1982  17                               55
1983  9                                64
1984  10                               74
1985  12                               86
1986  8                                94
1987  11                               105
1988  23                               128
1989  40                               168
1990  41                               209
1991  50                               259
1992  59                               318
1993  178                              496
1994  347                              843
1995  274                              1,117
1996  328                              1,445
1997  457                              1,902
1998  594                              2,496
1999  710                              3,206
2000  822                              4,028
2001  875                              4,903
2002  932                              5,835
2003  1,316                            7,151
2004  1,819                            8,970
2005  2,000                            10,970
2006  2,232                            13,202
2007  2,455                            15,657
2008  2,286                            17,943
2009  2,298                            20,241
2010  2,300                            22,541
2011  2,053                            24,594
2012  2,207                            26,801
2013  2,334                            29,135
2014  2,824                            31,959
2015  2,292                            34,251
2016  2,570                            36,821
2017  2,676                            39,497
2018  2,606                            42,103
2019  2,793                            44,896
2020  3,451                            48,347
2021  2,815                            51,162
2022  3,534                            54,696
2023  3,429                            58,125
2024  3,465                            61,590
2025  3,934                            65,524
2026  639                              66,163
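
The relationship between the two data columns can be sketched in a few lines of Python: each cumulative total is the running sum of the yearly counts up to that year. This is only an illustration of the arithmetic, not the method the statistics page itself uses. The function name cumulative_totals is hypothetical, and the sample values are the 1976 to 1980 rows of the table.

```python
from itertools import accumulate

# (year, new protein sequences) for 1976-1980, copied from the table
yearly = [(1976, 13), (1977, 8), (1978, 3), (1979, 2), (1980, 4)]


def cumulative_totals(rows):
    """Return (year, new, running_total) for each (year, new) row.

    Hypothetical helper: the running total is the sum of all yearly
    counts up to and including that year.
    """
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly):
    # Thousands separators match the formatting of the totals column
    print(f"{year}  {new:>6,}  {total:>7,}")
```

Running the same helper over every row should reproduce the totals column, which makes it a quick consistency check on a downloaded copy of the data.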