What do cancer, the flu, and breast milk all have in common? They all rely on glycans, large carbohydrate molecules found in all living cells.

While biologists have known about glycans for decades, their diversity and complexity have made it difficult to identify glycan structures and functions. Twenty-first-century tools such as machine learning and artificial intelligence (AI) have made it possible for scientists to delve deeper into the world of glycans. New research by Dr. Daniel Bojar and colleagues at the University of Gothenburg, Sweden, has improved our understanding of glycan structures and functions and opened possibilities for glycan-targeted diagnostic tools and treatments for an array of diseases.

Every cell that scientists have identified is covered by a sugary forest of complex molecules known as glycans1. Glycans are also called complex carbohydrates or polysaccharides.

Figure 1.

Electron micrograph of a bacterial cell. The hair-like structures surrounding the cell are glycans.

[Source: https://commons.wikimedia.org/wiki/File:Bacillus_subtilis.jpg]

Glycans are complex because they are large molecules composed of many smaller molecules. They facilitate cell-to-cell communication, known as cell signaling, and play an important role in the immune system2. Over 60% of all proteins secreted by human cells are modified by glycans, which shape the structure and function of the resulting protein. Yet scientists know very little about glycans compared to other foundational biological molecules like DNA or proteins. This is why Dr. Bojar and others in the field refer to glycans as the “dark matter” of biology.

The building blocks of glycans are small molecules called monosaccharides3. Examples of monosaccharides include glucose and fructose. Monosaccharides are linked together through chemical reactions carried out by specialized proteins called enzymes4.

Figure 2.

Examples of glycan structures in yeast, insects, mammals, plants, and humans. Each shape represents a different monosaccharide as indicated in the figure legend to the left.

[Source: https://en.wikipedia.org/wiki/N-linked_glycosylation]

Monosaccharides are linked together into glycan chains. Researchers have identified over 100 monosaccharides, with thousands of possible combinations. There are 12-20 different ways that two monosaccharides can be linked in a glycan chain. The type of linkage determines the structure and function of the resulting glycan. For example, two glucose molecules can be linked to form a disaccharide5. The resulting disaccharide can be found in honey, caramel, black mold, or anti-freezing agents, depending on the type of linkage between the two monosaccharides. 

Figure 3.

Sucrose molecule, a disaccharide made of one glucose and one fructose molecule. Sucrose is the main component of table sugar.

[Source: https://en.wikipedia.org/wiki/Sucrose]
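The glucose example above can be made concrete with a small lookup table. The sketch below is only an illustration: the sugar names are standard chemistry names that do not appear in the article, the linkage codes ("a" for alpha, "b" for beta, followed by the linked carbon positions) are a simplified shorthand, and the function name `describe_disaccharide` is made up for this example.

```python
# Two glucose (Glc) molecules, joined by different linkages, give different sugars.
# Keys are simplified linkage codes: "a"/"b" = alpha/beta, then the linked carbons.
# Note: trehalose is written "a1-1" here, although both glucose units are alpha-linked.
GLUCOSE_DISACCHARIDES = {
    "a1-1": "trehalose",
    "a1-2": "kojibiose",
    "a1-3": "nigerose",
    "a1-4": "maltose",
    "a1-6": "isomaltose",
    "b1-4": "cellobiose",
}


def describe_disaccharide(linkage: str) -> str:
    """Return a short description of the glucose-glucose sugar for a linkage code."""
    name = GLUCOSE_DISACCHARIDES.get(linkage, "not in this small table")
    return f"Glc({linkage})Glc -> {name}"


if __name__ == "__main__":
    for code in ("a1-4", "b1-4", "a1-1"):
        print(describe_disaccharide(code))
```

The same two building blocks appear in every row; only the linkage changes, and with it the identity of the sugar.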

There is no template or set of instructions for building glycans with specific structures. Rather, a series of enzymes adds or removes a monosaccharide in the chain. The resulting glycans tend to be branched and nonlinear. These glycans can be found alone or in combination with other molecules to form glycoproteins6 and glycolipids7.

Figure 4.

Glycolipids and glycoproteins are commonly found on the cell membrane along with other molecules.

[Source: https://biologydictionary.net/glycoprotein/]

One reason that glycan research lags behind that of other biological molecules is that glycan diversity makes data analysis challenging and time-consuming. However, with twenty-first-century technology, researchers can now train computers to analyze large sets of data to look for patterns. An important part of Dr. Bojar’s research is leveraging these tools to develop improved methods for analyzing glycan data.

In one study, Dr. Bojar and colleagues developed a model to predict whether a given glycan would interact with the human immune system. To do this, the researchers first gathered existing glycan datasets into a large database with 12,674 glycans from 1,726 species of organisms. They broke down the glycans into 1,027 unique monosaccharides and linkages, called “glycoletters,” that formed the building blocks of all the glycans in the database. Combining these building blocks yielded 19,866 “glycowords,” each a chain of three monosaccharides and two linkages.

Figure 5.

Schematic of the process to identify “glycoletters” and “glycowords” by gathering existing glycan datasets into a larger database.

[Source: Bojar et al 2021, Fig 2a]
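The idea of glycoletters and glycowords described above can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' actual code: it assumes glycans are written as text with linkages in parentheses, such as `Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man`, and it only handles unbranched chains, whereas real glycans are often branched. The function names are made up for this example.

```python
import re


def glycoletters(glycan: str) -> list[str]:
    """Split an unbranched glycan string into monosaccharides and linkages, in order."""
    if "[" in glycan or "]" in glycan:
        # Square brackets mark branches, which this simple sketch does not handle.
        raise ValueError("branched glycans are not supported in this sketch")
    # Parentheses surround each linkage, so splitting on them alternates
    # monosaccharide, linkage, monosaccharide, ...
    return [token for token in re.split(r"[()]", glycan) if token]


def glycowords(glycan: str) -> list[tuple[str, ...]]:
    """Return every run of three monosaccharides and the two linkages between them."""
    letters = glycoletters(glycan)
    # Monosaccharides sit at even positions, so step by 2 and take 5 tokens at a time.
    return [tuple(letters[i:i + 5]) for i in range(0, len(letters) - 4, 2)]


if __name__ == "__main__":
    example = "Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man"
    print(glycoletters(example))
    for word in glycowords(example):
        print(word)
```

A chain of four monosaccharides produces two overlapping glycowords, much as a sentence can be read three words at a time.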

Next, Dr. Bojar and colleagues trained a computer model to recognize glycowords that could stimulate an immune response in humans. They did this by adding glycans known to provoke an immune response, called immunogenic8 glycans, to the database. This enabled the computer model to compare a given glycoword against immunogenic glycans to predict whether an unknown glycan might interact with the human immune system. This model was 92% accurate, demonstrating the power of this method for glycan analysis. 
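To give a feel for what "comparing glycowords against immunogenic glycans" could mean, here is a toy sketch. It is not the model from the study, which was a trained machine-learning model; instead it uses simple overlap counting as a stand-in. The sugar names (`SugA`, `SugB`, and so on) are placeholders rather than real data, and the function names are made up for this example.

```python
import re


def glycowords(glycan: str) -> set[tuple[str, ...]]:
    """Collect the glycowords (3 monosaccharides + 2 linkages) of an unbranched glycan."""
    letters = [token for token in re.split(r"[()]", glycan) if token]
    return {tuple(letters[i:i + 5]) for i in range(0, len(letters) - 4, 2)}


def immunogenic_score(glycan: str, known_immunogenic: list[str]) -> float:
    """Fraction of the glycan's glycowords that also occur in known immunogenic glycans.

    This overlap score is a simplified stand-in for a trained model.
    """
    reference = set().union(*(glycowords(g) for g in known_immunogenic))
    words = glycowords(glycan)
    if not words:
        return 0.0  # too short to contain any glycoword
    return len(words & reference) / len(words)


if __name__ == "__main__":
    # Placeholder sequences, not real immunogenicity data.
    known = ["SugA(a1-2)SugB(b1-3)SugC(a1-4)SugD"]
    unknown = "SugB(b1-3)SugC(a1-4)SugD(b1-6)SugE"
    print(immunogenic_score(unknown, known))
```

A higher score means the unknown glycan shares more glycowords with glycans already known to provoke an immune response. A real model learns which glycowords matter most instead of counting them all equally.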

Dr. Bojar encourages anyone interested in this research to practice training computer models using the publicly available glycan database and software he developed, called Glycowork. Glycowork contains over 50,000 unique glycan sequences, including labels based on organism species, tissue type, and disease state. This database is available for anyone to use, including middle and high school students!

Since glycans are intricately involved in human health and illnesses, there are many possible applications of this research to improve diagnostic tools and treatments for a variety of diseases. For example, one recent project applied glycan analysis to the diagnosis and treatment of cancer. 

Figure 6.

Schematic describing the application of glycan analysis to cancer.

[Source: Dr. Bojar]

Dr. Bojar and his team collected data on glycan sequences based on cancer type from the existing literature. The database they used contained glycan sequences from 220 cancer patients with different types of cancer. Analysis of the data showed several key glycan sequences commonly dysregulated or suppressed across cancer types. In the future, this kind of analysis could lead to better methods for diagnosing cancer based on the presence of specific glycan structures known to be found only in cancer cells. It could also open new pathways for targeted cancer treatments.

In another research project, Dr. Bojar focused on glycoproteins used by pathogens9 like viruses10 and bacteria11 to infect their host12. All viruses and most pathogenic bacteria have glycans on their surface that allow them to communicate with host cells and cause infection. Dr. Bojar developed a model to predict which glycan receptors13 on the host cell the pathogen's proteins might bind to. This could lead to new ways to prevent disease by blocking the binding between the glycan receptor and the pathogenic protein.

Predicting glycan structure and function also allows scientists to monitor viruses as they mutate over time. One example was a recognizable shift in glycan binding between the influenza virus that infected birds and the influenza virus that mutated to infect mammals. 

An ongoing project involves the study of breast milk. Glycans are a primary component of breast milk and help infants develop properly and protect them from diseases. Although all mammals make breast milk, its content differs dramatically among species. Dr. Bojar and his team gathered breast milk from nine mammal species: alpaca, beluga whale, black rhinoceros, bottlenose dolphin, impala, L’Hoest’s monkey, pygmy hippopotamus, domestic sheep, and striped dolphin. The breast milk samples were provided by zoos, research foundations, and individuals. Analysis of the samples revealed differences in glycan content between species and identified 393 glycans, including 108 that had never been described. 

Figure 7.

Common glycan features in nine mammal species.

[Source: Jin et al 2023, Fig 7b]

Some of these glycans produced a strong reaction in human immune cells and may be beneficial components to add to baby formula. Dr. Bojar also hopes to understand the evolution of breast milk glycans by mapping the breast milk glycans of as many mammalian species as possible. This research contributes to improving our understanding of glycan diversity and breast milk functions across mammalian species. 

Dr. Daniel Bojar is Associate Senior Lecturer for Bioinformatics at the Department for Chemistry and Molecular Biology & the Wallenberg Centre for Molecular and Translational Medicine at the University of Gothenburg, Sweden. His research focuses on applying novel analytical techniques to improve our understanding of glycans and their role in human diseases. When not in the laboratory, Dr. Bojar enjoys playing board games, writing fantasy stories, hiking, and spending time with friends.


  1. Bojar, D. et al. 2021. “Deep-learning resources for studying glycan-mediated host-microbe interactions.” Cell Host & Microbe, 29(1): 132-144. https://www.cell.com/cell-host-microbe/fulltext/S1931-3128(20)30562-X
  2. Thomès, L. et al. 2021. “Glycowork: A Python package for glycan data science and machine learning.” Glycobiology, 31: 1240-44. https://academic.oup.com/glycob/article/31/10/1240/6311240
  3. Burkholz, R. et al. 2021. “Using graph convolutional neural networks to learn a representation for glycans.” Cell Reports, 35. https://www.cell.com/cell-reports/fulltext/S2211-1247(21)00616-1
  4. Lundstrøm, J. et al. 2023. “Decoding glycomics with a suite of methods for differential expression analysis.” Cell Reports Methods, 3(12). https://www.cell.com/cell-reports-methods/fulltext/S2667-2375(23)00323-5
  5. Jin, C. et al. 2023. “Breast Milk Oligosaccharides Contain Immunomodulatory Glucuronic Acid and LacdiNAc.” Molecular and Cellular Proteomics. https://www.mcponline.org/article/S1535-9476(23)00146-9/fulltext
  1. Bojar Lab. https://dbojar.com/bojar-lab/
  2. Glycowork. https://bojarlab.github.io/glycowork/
  3. Crowell, R. “Researchers Read the Sugary ‘Language’ on Cell Surfaces.” 3 May 2021. https://www.quantamagazine.org/researchers-read-the-sugary-language-on-cell-surfaces-20210503/

Written by Rebecca Kranz with Andrea Gwosdow, PhD at www.gwosdow.com

  1. Glycan: A complex chain of smaller sugar molecules linked together. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/glycan ↩︎
  2. Immune system: A network of organs, tissues, and cells that fight infection and prevent disease. https://www.ncbi.nlm.nih.gov/books/NBK279364/ ↩︎
  3. Monosaccharide: Smallest sugar molecules that form the basis of carbohydrates and glycans. https://www.britannica.com/science/monosaccharide ↩︎
  4. Enzyme: Proteins that speed up chemical reactions. https://www.genome.gov/genetics-glossary/Enzyme ↩︎
  5. Disaccharide: Small sugar molecule composed of two linked monosaccharides. https://www.britannica.com/science/disaccharide ↩︎
  6. Glycoprotein: Large molecules composed of a glycan and a protein. https://www.news-medical.net/health/What-is-a-Glycoprotein.aspx ↩︎
  7. Glycolipid: Large molecules composed of a glycan and a lipid. https://en.wikipedia.org/wiki/Glycolipid ↩︎
  8. Immunogenic: Producing an immune response. https://www.merriam-webster.com/dictionary/immunogenic ↩︎
  9. Pathogen: Any organism that causes disease. https://www.healthline.com/health/what-is-a-pathogen ↩︎
  10. Virus: Infectious organism that replicates inside a host. https://www.genome.gov/genetics-glossary/Virus ↩︎
  11. Bacteria: Single-celled organisms found almost everywhere on earth; most bacteria are harmless, but some can cause disease. https://www.genome.gov/genetics-glossary/Bacteria ↩︎
  12. Host: A living cell that can be invaded by infectious agents like viruses and bacteria. https://www.merriam-webster.com/dictionary/host%20cell ↩︎
  13. Receptor: A molecule on or inside the cell that binds specific substances to cause specific effects within the cell. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/receptor ↩︎