DNA vs. Protein Sequencing

James Prashant Fonseka
3 min read · Feb 4


This year may herald a major breakthrough in single-molecule protein sequencing. This in turn could be one of the most significant technological advancements of the year. To most, it isn’t immediately obvious why that is the case. After all, DNA sequencing has been around for some time and has matured significantly, especially with the past few years’ improvements in cost and speed. Given that DNA and proteins are inextricably linked, with DNA being the template for almost all proteins that exist in our world, why is it important to be able to sequence proteins? The answer, put simply, is vantage.

With DNA, we get the blueprints for all that lives and its subcomponents, including proteins. There are some processes in which proteins are synthesized from specialized types of RNA that were not encoded from DNA, but those are the exceptions that illustrate the general rule that proteins, and specifically their amino acid chains, are encoded in DNA. While the DNA of any cell should theoretically have the information needed to determine the composition of any proteins contained within, that information is very noisy.

Imagine if I gave you a list of all of the outfits of every person attending Paris Fashion Week, over 200,000 people in a typical year. That list contained detailed information about the size, designer, and style of each article of clothing worn by each person. If you were curious about someone’s outfit, you could theoretically figure out exactly what they were wearing. But imagine if that list were also unlabeled by name — there is no way to directly identify which person was wearing which outfit. Based on observable information like, say, the color of a shirt, you could start to go through the master list and figure out which could possibly be the right entry. But at the scale we’re talking about, which in some cases might need an increase of several orders of magnitude to match the actual complexity of cellular instructions, this information may not get one even remotely close to the correct database entry. It would make much more sense to be able to look at a person and decode their outfit. Basically, that’s the value of protein sequencing.

With protein sequencing, we start from the end product, the protein itself, and determine its amino acid makeup. If we’re looking to modify, replicate, categorize, or otherwise research proteins, this becomes tremendously valuable. This is an inherently hard problem given the complex, folded structures of most proteins. DNA is hard enough to sequence, but it starts off as a relatively simple and stable double-helix structure that we can unwind and then decode. There are immensely more degrees of complexity to the structure of a protein, but our ability to manipulate proteins and biological compounds at a molecular level has improved significantly. Until now, it was difficult to understand individual proteins.

In the past, mass spectrometry has been the most effective tool for determining protein composition. This is an inherently fuzzy technique that amounts more to estimation than observation. Proteins with a relatively low occurrence in a group would often not register at all with traditional techniques. Single-molecule sequencing is useful not just for identifying isolated proteins, but also for picking out proteins with a low occurrence in the more standard context of group proteomic analysis.

While not as commonly discussed or understood as DNA sequencing, advancements in protein sequencing will be a big boon to biological research and engineering. Before we build and design in synthetic biology, it is imperative that we have the tools to comprehensively understand what already exists in our world. In service of that aim, single-molecule techniques will meaningfully sharpen the essential tool of protein sequencing.