Many diseases, such as cancer or rheumatism, are linked to genetic defects. Will it be possible to treat them with gene therapy in the future? To move closer to this goal, MUDS doctoral researcher Laura Martens is trying to decipher the gene regulatory code of cells. She draws on data from the international megaproject Human Cell Atlas.
In October 2016, leading scientists from around the world met in London to discuss setting up a Human Cell Atlas. It would be a once-in-a-century project, because an overview of the human cell inventory would make it possible to define the basic cellular principles of health and to describe a wide range of diseases with greater precision. Even though researchers have long been analyzing cells as the most fundamental building blocks of life, we still know very little about them. Yet a precise understanding of the various cell types is exactly what is needed, as it could yield new insights into the development and treatment of very different diseases, from autoimmune diseases like rheumatism, through cardiovascular diseases and chronic inflammation, to cancer.
Since October 2020, Laura Martens, who is based in Munich, has been working to shed light on the secrets of the human cell, using data resources from the Human Cell Atlas. The young physicist from Bremen completed a master’s degree in computational biology in the UK last year and was then looking to return to Germany for her doctorate. That’s when a tweet from Fabian Theis at Helmholtz Zentrum München (HZM) appeared in her Twitter feed, announcing a new call for applications at MUDS, the Munich School of Data Science. The school is part of the Helmholtz Information & Data Science Academy (HIDA), Germany’s largest postgraduate training network in the data sciences. Her application was successful, and Martens is now working at MUDS on her PhD project, titled “Unraveling the gene regulatory code using single-cell multi-omic data.” “Data science combined with biology or biomedicine: that’s exactly what I want to be doing!” she says. Her work is supervised by two principal investigators, both experts in computational biology, whom she refers to as “Fabian and Julien”: Fabian Theis, Head of the Institute for Computational Biology at HZM, and Julien Gagneur, Professor of Computational Molecular Medicine in the Department of Informatics at the Technical University of Munich.
Using the data of a megaproject
Martens isn’t alone in her enthusiasm for biomedical topics: researchers at more than 1,200 institutions on every continent are working to complete the Human Cell Atlas (HCA). Their aim is to describe the human body in greater detail than ever before. In Germany alone, 82 institutes are involved in the project, including four Helmholtz Centers (Helmholtz Zentrum München, German Cancer Research Center, German Center for Neurodegenerative Diseases, and the Max Delbrück Center for Molecular Medicine). The Atlas serves as a repository that makes data from various cell analyses available to researchers for further study. This makes the HCA an outstanding example of open science in practice, a topic that Helmholtz is also actively advancing for the benefit of science and society. A large-scale, global research project like the HCA only became possible through single-cell RNA sequencing, a technology developed roughly 15 years ago. It can be used to analyze which RNA molecules are present in an individual cell, allowing conclusions to be drawn about the specific function, or dysfunction, of that cell.
Before single-cell sequencing was available, researchers could only analyze the genetic material of a tissue sample as a whole. They were unable to look at the differences between individual cells, even though an organism contains many very different types of cells. Martens, whose project draws on data acquired with single-cell methods, has an appealing way of describing the difference: a tissue sample can be imagined as a kind of “smoothie,” a mix of various types of cells. When you analyze this mix as a whole, she says, you get an average of all the cells it contains. “Now, with single-cell,” Martens continues, “you have the smoothie, but you can tell that it contains three strawberries and five blueberries; so now you can look at individual cells to see what’s happening inside them.”
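The smoothie comparison can be made concrete with a few lines of Python. This is a toy sketch, not part of Martens’ project: the expression counts, the gene names GENE_A and GENE_B, and the cell-type labels are hypothetical, chosen to mirror the three strawberries and five blueberries of her metaphor. A bulk measurement corresponds to averaging over every cell, while the single-cell view allows averaging within each cell type instead.

```python
from collections import defaultdict

# Hypothetical toy data: RNA counts of two genes in eight cells, labeled
# after the metaphor (3 "strawberry" cells and 5 "blueberry" cells).
cells = (
    [("strawberry", {"GENE_A": 8, "GENE_B": 0}) for _ in range(3)]
    + [("blueberry", {"GENE_A": 0, "GENE_B": 8}) for _ in range(5)]
)


def bulk_profile(cells):
    """Average each gene over all cells, as a bulk ("smoothie") measurement does."""
    genes = cells[0][1]
    return {g: sum(expr[g] for _, expr in cells) / len(cells) for g in genes}


def per_type_profiles(cells):
    """Average each gene separately within each cell type (single-cell view)."""
    groups = defaultdict(list)
    for cell_type, expr in cells:
        groups[cell_type].append(expr)
    return {
        cell_type: {g: sum(e[g] for e in exprs) / len(exprs) for g in exprs[0]}
        for cell_type, exprs in groups.items()
    }


print(bulk_profile(cells))  # {'GENE_A': 3.0, 'GENE_B': 5.0}
print(per_type_profiles(cells))
# prints (wrapped here):
# {'strawberry': {'GENE_A': 8.0, 'GENE_B': 0.0},
#  'blueberry': {'GENE_A': 0.0, 'GENE_B': 8.0}}
```

In the bulk average, GENE_A looks moderately expressed, although it is in fact active only in the “strawberry” cells. In real single-cell data the cell types are not labeled in advance; they are typically inferred by grouping cells with similar RNA profiles.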
Not all cells are equal
The actual object of her research, the individual cells (or, to stay with the metaphor, the blueberries), is extremely complex, because not all cells are the same. Although almost every cell in the human body contains an identical copy of the genome, around three billion base pairs, cells vary dramatically in their morphology and behavior. Even cells that have so far been assigned to the same type differ from one another. For a long time, it was thought that there are around 300 different cell types in the human body, but thanks to much more detailed knowledge of their biochemical “appearance,” we now know that this figure must be significantly larger: single-cell RNA sequencing makes it possible to divide many cell types into further subcategories. The differences show up in the RNA, which carries the blueprints for proteins encoded in the DNA to the ribosomes, the “workshops” of the cell, where the corresponding proteins are synthesized. Looking at a cell’s complete RNA catalog therefore reveals which genes are active there and offers clues about how the cell is regulated.
Whether a cell becomes a very specific type of cell depends on various gene regulatory mechanisms, many of which have yet to be examined in detail. These mechanisms determine how a gene in the cell’s DNA is “expressed,” for example whether and how often the gene is transcribed into RNA and how quickly that RNA is degraded again. These steps on the way from gene to protein are crucial for how the genetic information in a cell manifests itself: they shape the cell’s function and how it develops. Or, to return to the metaphor: why do the blueberries become blueberries?
Martens now wants to investigate some of these gene regulatory mechanisms and decode how they interact. She started with a process that comes at the very end of an RNA molecule’s life: its degradation. “Of course I’d like to understand everything at once,” says Martens, “but I’m starting with this final step, the degradation of RNA.” Messenger RNA, which carries the genetic information from the DNA and serves as the template for protein synthesis, usually exists only for a relatively short time. But if an RNA molecule persists longer, for hours or even days, it can be used multiple times, and more protein is produced. The rate of RNA degradation therefore influences both gene expression and the many other gene regulatory mechanisms, and, like them, it is encoded in the DNA sequence.
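Why a longer-lived RNA yields more protein can be seen in a standard, highly simplified textbook model of gene expression, shown here as an illustration rather than as the model Martens uses. It assumes that mRNA (amount m) is produced at a constant rate k_s and degraded by first-order decay at rate k_d, and that protein (amount p) is translated from each mRNA at rate k_t and removed at rate δ_p; all of these symbols are introduced here for illustration only.

```latex
\begin{aligned}
\frac{dm}{dt} &= k_s - k_d\, m,
  &\quad m^{*} &= \frac{k_s}{k_d},
  &\quad t_{1/2} &= \frac{\ln 2}{k_d}, \\
\frac{dp}{dt} &= k_t\, m - \delta_p\, p,
  &\quad p^{*} &= \frac{k_t\, m^{*}}{\delta_p} = \frac{k_t\, k_s}{\delta_p\, k_d}.
\end{aligned}
```

Setting each derivative to zero gives the steady-state levels m* and p*. Halving the degradation rate k_d doubles the RNA half-life t_1/2 and, in this model, doubles both m* and p*. Because an mRNA molecule survives on average for a time 1/k_d, it is translated about k_t/k_d times before it is degraded, which is the sense in which a stable RNA is “used multiple times.”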
Unraveling the enigmatic combinatorics of gene regulation with data science
Martens knows that the interconnections between the various layers of genetic information are complex, and she wants to shed further light on them in her project. “You have to imagine it as an enormous, interconnected network, an enormous combinatorial system, and we still don’t even know exactly how it works. It’s incredibly complex, and these various elements can’t even be properly disentangled yet.” She adds: “I feel overwhelmed by the complexity sometimes, too!” But the researcher clearly doesn’t see this as a reason to give up; on the contrary, it’s what really motivates her. A moment later, she’s bubbling with enthusiasm for her new research terrain. “That’s what’s so incredibly fascinating about it at the same time: that this network developed like this through evolution! It’s so complicated that you really can’t get to the bottom of it just by looking at the data. And that’s also why deep learning is coming into play now.”
Wherever data become unmanageable for people, artificial intelligence can help. Martens is using the complex data from single-cell sequencing as the basis for a general computational framework; in other words, she is writing a program. It is designed to enable systematic analyses of the gene regulatory mechanisms specific to each cell type. Martens’ goal is to make it flexible enough to handle different kinds of single-cell sequencing data and to model multiple steps of gene expression, from the start of transcription to the breakdown of the RNA. To make the gene regulatory code of cells readable in the future, Martens also wants to use machine learning models that help interpret the data fed into the program. “The single-cell data reveal which genes are expressed and how strongly. But our goal is to understand why this is the case. So I want to know which input was especially important in a dataset.” Such a program could, for example, be useful for researchers who want to understand the gene regulatory mechanisms of cell types that are important for the functioning of an organ. They include the scientists in Fabian Theis’ research group who are working on the Human Lung Cell Atlas, a project connected to the Human Cell Atlas.
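Martens’ question of “which input was especially important” belongs to the field of model interpretation, or feature attribution. One simple, model-agnostic technique is permutation importance: shuffle a single input feature across all samples and measure how much the model’s error grows. The sketch below is only an illustration under stated assumptions. The article does not say which attribution method Martens uses (deep learning models in genomics often rely on gradient-based attributions instead), and the linear “model,” its three features, and the simulated data are all hypothetical.

```python
import random


def model(x):
    # Hypothetical "trained" model: relies strongly on feature 0,
    # weakly on feature 1, and ignores feature 2 entirely.
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


def mse(predict, X, y):
    """Mean squared error of predict() on inputs X with targets y."""
    return sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Simulated data: 500 samples with three random features and a target
# that follows the hypothetical model plus a little noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
y = [model(x) + data_rng.gauss(0.0, 0.1) for x in X]

importances = permutation_importance(model, X, y)
for j, score in enumerate(importances):
    print(f"feature {j}: importance {score:.3f}")
```

Feature 0 receives by far the highest score, feature 1 a much smaller one, and feature 2, which the model ignores, scores zero. In a genomics setting the features might, purely as an illustrative assumption, stand for sequence elements near a gene, and a high score would flag an element the model relies on to predict expression or RNA stability.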
“We don’t even know what’s broken in many cases”
During the first few months of her PhD, Martens focused mainly on getting the data into a format to which machine learning tools can be applied. Her work involves a lot of sitting in front of a computer, even more so now because of the lockdown in Germany: not only do her day-to-day interactions with the two labs take place online, but so does her regular exchange with the other new MUDS doctoral researchers from a wide range of disciplines. These contacts are something Martens really appreciates about the program. The open-minded doctoral researcher doesn’t fit the cliché of the introverted IT nerd at all, even though, as she says with a laugh, she spends the entire day fiddling with code. “When I started studying, I never would have thought that I would end up spending all my time writing program code,” she admits.
For someone who describes herself as more of a “theoretician,” programming seems to be less an obstacle than her actual calling. Her data science expertise is helping Martens understand at least a small part of the large, complex network of gene regulation, and in the process to support medical research in its search for new therapies. This is what motivates Martens: “Many diseases go back to DNA. Once you understand how its building blocks interact with and influence each other, you can naturally try to repair things in specific places. Right now, we don’t even know what’s broken in many cases.” Gene therapy, for instance through genome editing, is still a distant prospect on the horizon of the Human Cell Atlas. Laura Martens’ contribution to unraveling the gene regulatory code of cells matches the way the initiators of the Human Cell Atlas described their historic research project: “ambitious, but achievable.”