Primary Research | Definition, Types, & Examples

Published on January 14, 2023 by Tegan George. Revised on January 12, 2024.

Primary research is a research method that relies on direct data collection, rather than on data that has already been collected by someone else. In other words, primary research is any type of research that you undertake yourself, firsthand. Using data that has already been collected is called secondary research.

Primary research is often used in qualitative research, particularly in survey methodology, questionnaires, focus groups, and various types of interviews. While quantitative primary research does exist, it’s not as common.

Table of contents

  • When to use primary research
  • Types of primary research
  • Examples of primary research
  • Advantages and disadvantages of primary research
  • Other interesting articles
  • Frequently asked questions

Primary research is any research that you conduct yourself. It can be as simple as a 2-question survey, or as in-depth as a years-long longitudinal study. The only requirement is that the data must be collected firsthand by you.

Primary research is often used to supplement or strengthen existing secondary research. It is usually exploratory in nature, concerned with examining a research question where no preexisting knowledge exists. It is also sometimes called original research for this reason.


Primary research can take many forms, but the most common types are:

  • Surveys and questionnaires
  • Observational studies
  • Interviews and focus groups

Surveys and questionnaires collect information about a group of people by asking them questions and analyzing the results. They are a solid choice if your research topic seeks to investigate something about the characteristics, preferences, opinions, or beliefs of a group of people.

Surveys and questionnaires can take place online, in person, or through the mail. It is best to combine open-ended and closed-ended questions, and the phrasing of each question matters. Be sure to avoid leading questions, and ask related questions in groups, starting with the most basic ones.

Observational studies are an easy and popular way to answer a research question based purely on what you, the researcher, observe. If there are practical or ethical concerns that prevent you from conducting a traditional experiment, observational studies are often a good stopgap.

There are three types of observational studies: cross-sectional studies, cohort studies, and case-control studies. If you decide to conduct observational research, you can choose the one that’s best for you. All three are quite straightforward and easy to design, but beware of confounding variables and observer bias creeping into your analysis.

Like surveys and questionnaires, interviews and focus groups rely on asking questions to collect information about a group of people. However, how this is done is slightly different. Instead of sending your questions out into the world, interviews and focus groups involve two or more people, one of whom is you, the interviewer, who asks the questions.

There are 3 main types of interviews:

  • Structured interviews ask predetermined questions in a predetermined order.
  • Unstructured interviews are more flexible and free-flowing, proceeding based on the interviewee’s previous answers.
  • Semi-structured interviews fall in between, asking a mix of predetermined questions and off-the-cuff questions.

While interviews are a rich source of information, they can also be deceptively challenging to do well. Be careful of interviewer bias creeping into your process. This is best mitigated by avoiding double-barreled questions and paying close attention to your tone and delivery while asking questions.

Alternatively, a focus group is a group interview, led by a moderator. Focus groups can provide more nuanced interactions than individual interviews, but their small sample size means that external validity is low.

Primary Research and Secondary Research

Primary research can often be quite simple to pursue yourself. Here are a few examples of different research methods you can use to explore different topics.

Primary research is a great choice for many research projects, but it has distinct advantages and disadvantages.

Advantages of primary research

Advantages include:

  • The ability to conduct really tailored, thorough research, down to the “nitty-gritty” of your topic. You decide what you want to study or observe and how to go about doing that.
  • You maintain control over the quality of the data collected, and can ensure firsthand that it is objective, reliable, and valid.
  • The ensuing results are yours, for you to disseminate as you see fit. You maintain proprietary control over what you find out, allowing you to share your findings with like-minded individuals or those conducting related research that interests you for replication or discussion purposes.

Disadvantages of primary research

Disadvantages include:

  • To be done well, primary research can be very expensive and time-consuming. If you are constrained in terms of time or funding, it can be very difficult to conduct your own high-quality primary research.
  • Primary research is often insufficient as a standalone research method, requiring secondary research to bolster it.
  • Primary research can be prone to various types of research bias. Bias can manifest on the part of the researcher as observer bias, Pygmalion effect, or demand characteristics. It can occur on the part of participants as a Hawthorne effect or social desirability bias.

The 3 main types of primary research are:

  • Surveys and questionnaires
  • Observational studies
  • Interviews and focus groups

Exploratory research aims to explore the main aspects of an under-researched problem, while explanatory research aims to explain the causes and consequences of a well-defined problem.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.

In restriction, you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching, you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable.

In statistical control, you include potential confounders as variables in your regression.

In randomization, you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.
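As a rough illustration, three of these methods (restriction, matching, and randomization) can be sketched in a few lines of Python. This is only a minimal sketch: the subject records, the field names `age_group` and `smoker`, and the helper functions `restrict`, `match`, and `randomize` are all hypothetical and invented for this example. Statistical control is left out because it needs a regression library.

```python
import random

# Hypothetical subjects; every field name and value is made up for illustration.
subjects = [
    {"id": 1, "age_group": "18-29", "smoker": False},
    {"id": 2, "age_group": "18-29", "smoker": True},
    {"id": 3, "age_group": "30-44", "smoker": False},
    {"id": 4, "age_group": "18-29", "smoker": False},
    {"id": 5, "age_group": "30-44", "smoker": True},
    {"id": 6, "age_group": "18-29", "smoker": False},
]


def restrict(sample, **fixed_values):
    """Restriction: keep only subjects that share the given confounder values."""
    return [s for s in sample if all(s[k] == v for k, v in fixed_values.items())]


def match(treatment, pool, keys):
    """Matching: pair each treated subject with one unused counterpart
    that has the same value on every confounder listed in `keys`."""
    available = list(pool)
    pairs = []
    for t in treatment:
        for c in available:
            if all(t[k] == c[k] for k in keys):
                pairs.append((t, c))
                available.remove(c)  # each counterpart is used at most once
                break
    return pairs


def randomize(sample, seed=None):
    """Randomization: shuffle a copy, then split it into two halves."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


restricted = restrict(subjects, age_group="18-29", smoker=False)
pairs = match([subjects[0]], subjects[1:], keys=["age_group", "smoker"])
treatment_group, comparison_group = randomize(subjects, seed=0)
```

With only six subjects, randomization cannot be expected to balance confounders. The sketch shows the mechanics, not a sample size you would actually use.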

A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analyzing data from people using questionnaires.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g. understanding the needs of your consumers or user testing your website)
  • You can control and standardize the process for high reliability and validity (e.g. choosing appropriate measurements and sampling methods)

However, there are also some drawbacks: data collection can be time-consuming, labor-intensive and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.


George, T. (2024, January 12). Primary Research | Definition, Types, & Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/methodology/primary-research/


Peer Review and Primary Literature: An Introduction: Is it Primary Research? How Do I Know?


Components of a Primary Research Study

As indicated on a previous page, peer-reviewed journals also include non-primary content. Simply limiting your search results in a database to "peer-reviewed" will not retrieve a list of only primary research studies.

Learn to recognize the parts of a primary research study. Terminology will vary slightly from discipline to discipline and from journal to journal.  However, there are common components to most research studies.

When you run a search, find a promising article in your results list and then look at the record for that item (usually by clicking on the title). The full database record for an item usually includes an abstract or summary--sometimes prepared by the journal or database, but often written by the author(s) themselves. This will usually give a clear indication of whether the article is a primary study. For example, here is a full database record from a search for family violence and support in SocINDEX with Full Text:

Although the abstract often tells the story, you will need to read the article to know for sure. Besides scanning the Abstract or Summary, look for the following components: (I am only capturing small article segments for illustration.)

Look for the words METHOD or METHODOLOGY. The authors should explain how they conducted their research.

NOTE: Different journals and disciplines will use different terms to mean similar things. If instead of "Method" or "Methodology" you see a heading that says "Research Design" or "Data Collection," you have a similar indicator that the scholar-authors have done original research.

  

Look for the section called RESULTS. This details what the author(s) found out after conducting their research.

Charts, Tables, Graphs, Maps and other displays help to summarize and present the findings of the research.

A Discussion indicates the significance of findings, acknowledges limitations of the research study, and suggests further research.

References, a Bibliography or a List of Works Cited indicate a literature review and show other studies and works that were consulted. USE THIS PART OF THE STUDY! If you find one or two good recent studies, you can identify some important earlier studies simply by going through the bibliographies of those articles.

A FINAL NOTE:  If you are ever unclear about whether a particular article is appropriate to use in your paper, it is best to show that article to your professor and discuss it with them.  The professor is the final judge since they will be assigning your grade.

  • Last Updated: Aug 21, 2024 10:06 AM
  • URL: https://suffolk.libguides.com/PeerandPrimary

Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

What is Primary Research and How do I get Started?


This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

Primary research is any type of research that you conduct yourself, collecting the data firsthand. Examples include surveys, interviews, observations, and ethnographic research. A good researcher knows how to use both primary and secondary sources in their writing and to integrate them in a cohesive fashion.

Conducting primary research is a useful skill to acquire as it can greatly supplement your research in secondary sources, such as journals, magazines, or books. You can also use it as the focus of your writing project. Primary research is an excellent skill to learn as it can be useful in a variety of settings including business, personal, and academic.

But I’m not an expert!

With some careful planning, primary research can be done by anyone, even students new to writing at the university level. The information provided on this page will help you get started.

What types of projects or activities benefit from primary research?

When you are working on a local problem that may not have been addressed before and little existing research is available to back it up.

When you are working on writing about a specific group of people or a specific person.

When you are working on a topic that is relatively new or original and few publications exist on the subject.

You can also use primary research to confirm or dispute national results with local trends.

What types of primary research can be done?

Many types of primary research exist. This guide is designed to provide you with an overview of primary research that is often done in writing classes.

Interviews: Interviews are one-on-one or small group question and answer sessions. Interviews will provide a lot of information from a small number of people and are useful when you want to get an expert or knowledgeable opinion on a subject.

Surveys: Surveys are a form of questioning that is more rigid than interviews and that involves larger groups of people. Surveys will provide a limited amount of information from a large group of people and are useful when you want to learn what a larger population thinks.

Observations: Observations involve taking organized notes about occurrences in the world. Observations provide you insight about specific people, events, or locales and are useful when you want to learn more about an event without the biased viewpoint of an interview.

Analysis: Analysis involves collecting data and organizing it in some fashion based on criteria you develop. It is useful when you want to find some trend or pattern. One example of analysis would be to record commercials on three major television networks and analyze gender roles.
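To make the analysis idea concrete, here is a minimal Python sketch of how coded observations might be organized and counted. Everything in it is hypothetical: the network names, the role labels, the rows in `coded_commercials`, and the helper functions are invented for illustration, and a real project would use categories you define from your own criteria.

```python
from collections import Counter

# Hypothetical coding sheet: one (network, portrayed_role) entry per recorded
# commercial. Networks, role labels, and rows are invented for illustration.
coded_commercials = [
    ("Network A", "homemaker"),
    ("Network A", "professional"),
    ("Network B", "homemaker"),
    ("Network B", "homemaker"),
    ("Network C", "professional"),
    ("Network C", "homemaker"),
]


def tally_roles(rows):
    """Count how often each role label appears across all commercials."""
    return Counter(role for _, role in rows)


def tally_by_network(rows):
    """Count role labels separately for each network."""
    table = {}
    for network, role in rows:
        table.setdefault(network, Counter())[role] += 1
    return table


overall = tally_roles(coded_commercials)
per_network = tally_by_network(coded_commercials)
```

Comparing `overall` with `per_network` is one simple way to look for a trend or pattern, such as whether one role dominates on a particular network.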

Where do I start?

Consider the following questions when beginning to think about conducting primary research:

  • What do I want to discover?
  • How do I plan on discovering it? (This is called your research methods or methodology)
  • Who am I going to talk to/observe/survey? (These people are called your subjects or participants)
  • How am I going to be able to gain access to these groups or individuals?
  • What are my biases about this topic?
  • How can I make sure my biases are not reflected in my research methods?
  • What do I expect to discover?


Finding Primary Research Articles in the Sciences: Home


This guide goes over how to find and analyze primary research articles in the sciences (e.g. nutrition, health sciences and nursing, biology, chemistry, physics, sociology, psychology). In addition, the guide explains how to tell the difference between a primary source and a secondary source in scientific subject areas.

If you are looking for how to find primary sources in the humanities and social sciences, such as direct experience accounts in newspapers, diaries, artwork and so forth, please see Finding Primary Sources in the Humanities and Social Sciences.

Recommended Databases

To get started, choose one of the databases below.  Once you log in, enter your search terms to start looking for primary articles. 


Search Tips

Keep your search terms simple.

  • No need to type full sentences into the database search box.  Limit your search to 2-3 words.
  • There is no need to type "research article" into the search box.

Use the "Advanced Search" feature of the database.

  • This will allow you to limit your search to only peer reviewed articles or a certain time frame (for example: 2013 or later).
  • Click the red tab above for tips on advanced search strategies.

Re-read the assignment guidelines often

  • Does this article satisfy the scope of the assignment (e.g. a study focused on nutrition)?
  • Does it meet the criteria for the assignment (e.g. an original research article)?

Not finding what you are looking for?

  • Ask a Librarian!


Search and Find a Primary Research Article

Are you looking for a primary research journal article? If so, that is an article that reports on the results of an original research study conducted by the authors themselves.

You can use the library's databases to search for primary research articles.  A research article will almost always be published in a peer-reviewed journal. Therefore, it is a good idea to limit your results to peer-reviewed articles. Click on the  Advanced Search-Databases tab at the top of this guide for instructions. 

The following is _not_ primary research:

Review articles are studies that arrive at conclusions after looking over other studies. Therefore, review articles are not  primary (think "first") research.  There are a variety of review articles, including:

  • Literature Reviews
  • Systematic Reviews
  • Meta-Analyses 
  • Scoping Reviews
  • Topical Reviews
  • A review/assessment of the evidence

Having trouble? Look for a method section within the article. If the method section includes the process used to conduct the research, how the data were gathered and analyzed, and any limitations or ethical concerns of the study, then it is most likely a primary research article. For example, a research article will describe the number of people who participated in the study and from whom data were collected (e.g. 175 adults with celiac disease).

If the method section describes how the authors found articles on a topic using search terms or databases, then it is most likely a secondary review article and not primary research. If there is no method section, it is not a primary research article.

Other sections in a journal: 

Your search may yield these items, too. You can skip these because they are not full write-ups of research:

  • Conference Proceedings 
  • Symposium Publications

Example of a primary research article found in the Library's Academic Search Complete database: (these authors conducted an original research study)

  • Lumia, M., Takkinen, H., Luukkainen, P., Kaila, M., Lehtinen, J. S., Nwaru, B. I., Tuokkola, J., Niemelä, O., Haapala, A., Ilonen, J., Simell, O., Knip, M., Veijola, R., & Virtanen, S. M. (2015). Food consumption and risk of childhood asthma. Pediatric Allergy & Immunology, 26(8), 789–796. https://doi.org/10.1111/pai.12352

Example of a secondary article found in the Library's Academic Search Complete database: (these authors are reviewing the work of other authors)

  • Rachmah, Q., Martiana, T., Mulyono, Paskarini, I., Dwiyanti, E., Widajati, N., Ernawati, M., Ardyanto, Y. D., Tualeka, A. R., Haqi, D. N., Arini, S. Y., & Alayyannur, P. A. (2022). The effectiveness of nutrition and health intervention in workplace setting: A systematic review. Journal of Public Health Research, 11(1), 1–8. https://doi.org/10.4081/jphr.2021.2312

How do I know if this article is primary?

You've found an article in the library databases, but how do you know if it's primary?

Look for these sections: (terminology may vary)

  • abstract  - summarizes paper in one paragraph, states the purpose of the study
  • methods - explains how the experiment was conducted (note: if the method section discusses how a search was conducted, that is _not_ primary research)
  • results  - detailing what happened and providing raw data sets (often as tables or graphs)
  • conclusions  - connecting the results with theories and other research
  • references  - to previous research or theories that influenced the research

Scan the article you found to see if it includes the sections above. You don't have to read the full article (yet). Look for the clues highlighted in the images below. 
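The scan described above can be written down as a rough checklist in Python. This is only a heuristic sketch, not a reliable classifier: the keyword lists in `SECTION_KEYWORDS` and `SEARCH_CUES` and all function names are assumptions made for this example, and real articles vary in their terminology, so you still need to read the article to be sure.

```python
# Hypothetical keyword lists; real journals use many other heading variants.
SECTION_KEYWORDS = {
    "abstract": ("abstract", "summary"),
    "methods": ("method",),  # also matches "Methods" and "Methodology"
    "results": ("result", "finding"),
    "conclusions": ("conclusion", "discussion"),
    "references": ("reference", "bibliography", "works cited"),
}

# Phrases suggesting the methods describe a literature search, not a study.
SEARCH_CUES = ("search terms", "search strategy", "databases were searched")


def missing_sections(headings):
    """Return the expected sections that none of the headings seem to cover."""
    lowered = [h.lower() for h in headings]
    return [
        name
        for name, keywords in SECTION_KEYWORDS.items()
        if not any(kw in h for h in lowered for kw in keywords)
    ]


def methods_describe_search(methods_text):
    """True if the methods text mentions any literature-search cue."""
    text = methods_text.lower()
    return any(cue in text for cue in SEARCH_CUES)


def likely_primary(headings, methods_text):
    """Heuristic: all expected sections present and no literature-search cues."""
    return not missing_sections(headings) and not methods_describe_search(methods_text)


headings = ["Abstract", "Methods", "Results", "Conclusions", "References"]
study = likely_primary(headings, "We surveyed 175 adults with celiac disease.")
review = likely_primary(headings, "Articles were identified using search terms.")
```

Here `study` comes out as true and `review` as false, mirroring the rule that a method section about searching for articles points to a review.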

  • Last Updated: Aug 8, 2024 4:22 PM
  • URL: https://libguides.polk.edu/primaryresearch


Peer-review and primary research.


Primary Research

Identifying a primary research article.


Primary research, or a primary study, refers to a research article that reports an author’s original research and is almost always published in a peer-reviewed journal. A primary study reports on the details, methods and results of a research study. These articles often follow a standard structure called IMRAD, which refers to the sections of an article: Introduction, Methods, Results and Discussion. Primary research studies will start with a review of the previous literature; however, the rest of the article will focus on the authors’ original research. Literature reviews can be published in peer-reviewed journals, but they are not primary research.

Primary studies are part of primary sources but should not be mistaken for primary documents. Primary documents are usually original sources such as a letter, a diary, a speech or an autobiography. They are a first-person view of an event or a period. Typically, if you are a Humanities major, you will be asked to find primary documents for your paper; however, if you are in Social Sciences or the Sciences, you are most likely going to be asked to find primary research studies. If you are unsure, ask your professor or a librarian for help.

A primary research study is empirical research that is published in a peer-reviewed journal. Here are some ways of recognizing whether an article is a primary research article when searching a database:

1. The abstract includes a research question or a hypothesis, methods and results.

2. Studies can have tables and charts representing data findings.

3. The article includes a section for "methods" or "methodology" and "results".

4. The discussion section indicates findings, discusses limitations of the research study, and suggests further research.

5. Check the reference section because it will refer you to the studies and works that were consulted. You can use this section to find other studies on that particular topic.

The following are not to be confused with primary research articles:

- Literature reviews

- Meta-analyses or systematic reviews (these studies make conclusions based on research on many other studies)

  • Last Updated: Feb 15, 2024 2:45 PM
  • URL: https://guides.library.ucmo.edu/peerreview


Animal Science

How to identify peer reviewed journals, how to identify primary research articles.


Peer review is defined as “a process of subjecting an author’s scholarly work, research or ideas to the scrutiny of others who are experts in the same field” (1). Peer review is intended to serve two purposes:

  • It acts as a filter to ensure that only high quality research is published, especially in reputable journals, by determining the validity, significance and originality of the study.
  • Peer review is intended to improve the quality of manuscripts that are deemed suitable for publication. Peer reviewers provide suggestions to authors on how to improve the quality of their manuscripts, and also identify any errors that need correcting before publication.

How do you determine whether an article qualifies as being a peer-reviewed journal article?

  • If you're searching for articles in certain databases, you can limit your search to peer-reviewed sources simply by selecting a tab or checking a box on the search screen.
  • If you have an article, an indication that it has been through the peer review process will be the publication history, usually at the beginning or end of the article.
  • If you're looking at the journal itself, go to the editorial statement or instructions to authors (usually in the first few pages of the journal or at the end) for references to the peer-review process.
  • Look up the journal by title or ISSN in the ProQuest Source Evaluation Aid.
  • Careful! Not all information in a peer-reviewed journal is actually reviewed. Editorials, letters to the editor, book reviews, and other types of information don't count as articles, and may not be accepted by your professor.

What about preprint sites and ResearchGate?

  • A preprint is a piece of research that has not yet been peer reviewed and published in a journal. In most cases, they can be considered final drafts or working papers. Preprint sites are great sources of current research - and most preprint sites will provide a link to a later, peer-reviewed version of an article. 
  • ResearchGate is a commercial social networking site for scientists and researchers to share papers, ask and answer questions, and find collaborators. Members can upload research output including papers, chapters, negative results, patents, research proposals, methods, presentations, etc. Researchers can access these materials, and also contact members to ask for access to material that has not been shared, usually because of copyright restrictions. There is a filter to limit results to articles, but it can be difficult to determine the publication history of ResearchGate items and whether they have been published in peer reviewed sources.

A primary research article reports on an empirical research study conducted by the authors. The goal of a primary research article is to present the results of original research that makes a new contribution to the body of knowledge.

Characteristics:

  • Almost always published in a peer-reviewed journal
  • Asks a research question or states a hypothesis or hypotheses
  • Identifies a research population
  • Describes a specific research method
  • Tests or measures something
  • Often (but not always) structured in a standard format called IMRAD: Introduction, Methods, Results, and Discussion
  • Words to look for as clues include: analysis, study, investigation, examination, experiment, numbers of people or objects analyzed, content analysis, or surveys.

In contrast, the following are not primary research articles (i.e., they are secondary sources):

  • Literature reviews/Review articles
  • Meta-Analyses (studies that arrive at conclusions based on research from many other studies)
  • Editorials & Letters
  • Dissertations

Articles that are NOT primary research articles may discuss the same research, but they are not reporting on original research; they are summarizing and commenting on research conducted and published by someone else. For example, a literature review provides commentary and analysis of research done by other people, but it does not report the results of the author's own study and is not primary research.

  • Last Updated: Aug 24, 2023 2:38 PM
  • URL: https://libguides.berry.edu/ans


Understanding research and critical appraisal


What is primary research?

Quantitative research study designs, qualitative research study designs, mixed methods research study designs.


Primary research articles provide a report of individual, original research studies, which constitute the majority of articles published in peer-reviewed journals. All primary research studies are conducted according to a specified methodology, which will be partly determined by the aims and objectives of the research.

The following sections offer brief summaries of some of the common quantitative, qualitative, and mixed-methods study designs you may encounter. 

Randomised Controlled Trial

A randomised controlled trial (RCT) is a study where participants are randomly allocated to two or more groups. One group receives the treatment that is being tested by the study (treatment or experimental group), and the other group(s) receive an alternative, which is often the current standard treatment or a placebo (control or comparison group). The nature of the control used should always be specified.

An RCT is a good study choice for determining the effectiveness of an intervention or treatment, or for comparing the relative effectiveness of different interventions or treatments. If well implemented, the randomisation of participants in RCTs should ensure that the groups differ only in their exposure to treatment, and that differences in outcomes between the groups are probably attributable to the treatment being studied.

In crossover randomised controlled trials, participants receive all of the treatments and controls being tested in a random order. This means that participants receive one treatment, the effect of which is measured, and then "cross over" into the other treatment group, where the effect of the second treatment (or control) is measured.

RCTs are generally considered to be the most rigorous experimental study design, as the randomisation of participants helps to minimise confounding and other sources of bias.
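The allocation step of an RCT, and the random treatment order of a crossover trial, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the participant IDs, arm names, and the functions `allocate` and `crossover_orders` are hypothetical, and real trials typically use more careful schemes (for example, concealed or stratified allocation) than a plain shuffle.

```python
import random


def allocate(participant_ids, arms, seed=None):
    """Randomly allocate participants to the given arms in near-equal numbers."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    groups = {arm: [] for arm in arms}
    # Deal the shuffled participants out to the arms in turn.
    for i, pid in enumerate(shuffled):
        groups[arms[i % len(arms)]].append(pid)
    return groups


def crossover_orders(participant_ids, treatments, seed=None):
    """Crossover design: every participant receives every treatment,
    each in an independently randomized order."""
    rng = random.Random(seed)
    return {
        pid: rng.sample(treatments, k=len(treatments))
        for pid in participant_ids
    }


# Hypothetical participants numbered 1 to 10.
groups = allocate(range(1, 11), ["treatment", "control"], seed=1)
orders = crossover_orders(range(1, 11), ["treatment", "placebo"], seed=1)
```

Passing a `seed` makes the allocation reproducible, which can help when documenting how groups were formed.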

Cohort study

A cohort study identifies a group of people and follows them over a period of time to see who develops the outcome of interest to the study. This type of study is normally used to look at the effect of suspected risk factors that cannot be controlled experimentally – for example, the effect of smoking on lung cancer.

Also sometimes called longitudinal studies, cohort studies can be either prospective, that is, exposure factors are identified at the beginning of a study and the study population is followed into the future, or retrospective, that is, medical records for the study population are used to identify past exposure factors.

Cohort studies are useful in answering questions about disease causation or progression, or studying the effects of harmful exposures.

Cohort studies are generally considered to be the most reliable observational study design. They are not as reliable as RCTs, as the study groups may differ in ways other than the variable being studied.

Other problems with cohort studies are that they require a large sample size, are inefficient for rare outcomes, and can take long periods of time.

Case-Control Study

A case-control study compares a group of people with a disease or condition, against a control population without the disease or condition, in order to investigate the causes of particular outcomes. The study looks back at the two groups over time to see which risk factors for the disease or condition they have been exposed to.

Case-control studies can be useful in identifying which risk factors may predict a disease, or how a disease progresses over time. They can be especially useful for investigating the causes of rare outcomes.

Case-control studies can be done quickly, and do not require large groups of subjects. However, they rely on retrospective data, which may be incomplete or unreliable (owing to subjects' limited ability to accurately recall information such as the appearance of a symptom).

Cross-Sectional Study

A cross-sectional study collects data from the study population at one point in time, and considers the relationships between characteristics. Cross-sectional studies are also sometimes called surveys or prevalence studies.

Cross-sectional studies are generally used to study the prevalence of a risk factor, disease or outcome in a chosen population.

Because cross-sectional studies do not look at trends or changes over time, they cannot establish cause and effect between exposures and outcomes.

Case Series / Case Reports

A case series is a descriptive study of a group of people, who have either received the same treatment or have the same disease, in order to identify characteristics or outcomes in a particular group of people.

Case series are useful for studying rare diseases or adverse outcomes, for illustrating particular aspects of a condition, identifying treatment approaches, and for generating hypotheses for further study.

A case report provides a study of an individual, rather than a group.

Case series and case reports have no comparative control groups, and are prone to bias and chance association.

Expert opinion

Expert opinion draws upon the clinical experience and recommendations of those with established expertise on a topic.

Grounded theory

Grounded theory studies aim to generate theory in order to explain social processes, interactions or issues. This explanatory theory is grounded in, and generated from, the research participant data collected.

Research data typically takes the form of interviews, observations or documents. Data is analysed as it is collected, and is coded and organised into categories which inform the further collection of data, and the construction of theory. This cycle helps to refine the theory, which evolves as more data is gathered.

Phenomenology

A phenomenological study aims to describe the meaning(s) of the lived experience of a phenomenon. Research participants will have some common experience of the phenomenon under examination, but will differ in their precise individual experience, and in other personal or social characteristics.

Research data is typically in the form of observations, interviews or written records, and its analysis sets out to identify common themes in the participants' experience, while also highlighting variations and unique themes.

Ethnography

Ethnography is the study of a specific culture or cultural group, where the researcher seeks an insider perspective by placing themselves as a participant observer within the group under study.

Data is typically formed of observations, interviews and conversation. Ethnography aims to offer direct insight into the lives and the experiences of the group or the culture under study, examining its beliefs, values, practices and behaviours.

Case study

A case study offers a detailed description of the experience of an individual, a family, a community or an organisation, often with the aim of highlighting a particular issue. Research data may include documents, interviews and observations.

Content analysis

Content analysis is used to explore the occurrence, meanings and relationships of words, themes or concepts within a set of textual data. Research data might be drawn from any type of written document(s). Data is coded and categorised, with the aim of revealing and examining the patterns and the intentions of language use within the data set.
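As a rough illustration of the coding and categorising step, the sketch below counts how often keywords from each category occur in a piece of text. The codebook, its categories, and the sample sentence are invented for the example, and exact-word matching is a deliberate simplification: real content analysis usually relies on human coders or more careful text processing.

```python
import re
from collections import Counter

# Hypothetical codebook: category -> keywords that signal it.
CODEBOOK = {
    "access": {"appointment", "waiting", "travel"},
    "trust": {"trust", "listen", "respect"},
}

def code_document(text, codebook=CODEBOOK):
    """Count how often each category's keywords occur in one document."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, keywords in codebook.items():
            if word in keywords:      # exact-word match only
                counts[category] += 1
    return counts

sample = "The waiting time was long, but I trust my doctor. Travel was hard."
counts = code_document(sample)
```

Counting is only the starting point; interpreting the patterns and the intentions behind the language still depends on the researcher.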

Narrative inquiry

A narrative inquiry offers in-depth detail of a situation or experience from the perspective of an individual or small group. Research data usually consist of interviews or recordings, which are presented as a structured, chronological narrative. Narrative inquiry studies often seek to give voice to individuals or populations whose perspective is less well established, or not commonly sought.

Action research

Action research is a form of research, commonly used with groups, where the participants take a more active, collaborative role in producing the research. Studies incorporate the lived experiences of the individuals, groups or communities under study, drawing on data which might include observation, interviews, questionnaires or workshops.

Action research is generally aimed at changing or improving a particular context, or a specific practice, alongside the generation of theory.

Explanatory sequential design

In an explanatory sequential study, emphasis is given to the collection and analysis of quantitative data, which occurs during the first phase of the study. The results of this quantitative phase inform the subsequent collection of qualitative data in the next phase.

Analysis of the resultant qualitative data is then used to 'explain' the quantitative results, usually serving to contextualise these, or to otherwise enhance or enrich the initial findings.

Exploratory sequential design

In an exploratory sequential study, the opposite sequence to that outlined above is used. In this case, qualitative data is emphasised, with this being collected and analysed during the first phase of the study. The results of this qualitative phase inform the subsequent collection of quantitative data in the next phase.

The quantitative data can then be used to define or to generalise the qualitative results, or to test these results on the basis of theory emerging from the initial findings.

Convergent design

In a convergent study, qualitative and quantitative data sets are collected and analysed simultaneously and independently of one another.

Results from analysis of both sets of data are brought together to provide one overall interpretation; this combination of data types can be handled in various ways, but the objective is always to provide a fuller understanding of the phenomena under study. Equal emphasis is given to both qualitative and quantitative data in a convergent study.

  • Last Updated: Mar 26, 2024 4:38 PM
  • URL: https://libguides.sgul.ac.uk/researchdesign


Science Writing


What is a primary research article? 

If you're writing an empirical article (also known as a primary research article) then you're doing original, typically experimental, research -- you are creating new knowledge and will have original findings. These primary research articles will always have a methodology section where you describe how you conducted your study. It will typically be structured like this: 

  • Introduction

Methodology

How to Write a Primary Research Article

The introduction will include: 

  • A review of the literature (background on your topic & what other research has been done)
  • The question this study will be answering, and why it's important
  • Your approach to answering the question, and your hypothesis

Things to avoid: 

  • Excessive length
  • Leaving out the justification for the study

The methods section is where you detail the materials and experimental approaches that you used in your study. It should be detailed, particularly if the method you're using is novel. A general guideline is that you want to include enough detail so that other researchers could replicate your experiment. When writing it, you should arrange everything chronologically and can use subsections where appropriate. 

Things to avoid: 

  • Switching tenses (it should be in past tense)
  • Insufficient detail
  • Omitting the purpose of the experiment

The results section will include data and your interpretation of the data (but it won't tie it in to the overall literature or bigger implications -- that's what the discussion section is for). You should include your main findings, any other important findings, and your control results. Most data should be in figures or tables, with the text being used to summarize and explain the data. The results section should be organized in a logical way -- for instance, from most important to least important findings. 

Things to Avoid: 

  • Inexact language ("significance" means something very particular in science)
  • Including irrelevant data 
  • Excessive detail (don't include results from anything not discussed in the methods section)

The discussion section will answer your research question by stating and interpreting your findings, including their relevance, meaning, and context. You should tie in elements from your earlier literature review to explain what is new and impactful about your work. The discussion section is also where you can talk about possible limitations of your study and suggest future work that can be done. It should be organized in a way that moves from specific to broad, introducing your particular findings first and then moving to a more general perspective. 

Things to avoid: 

  • Restating your results section
  • Making conclusions outside of the scope of your findings 
  • Criticizing other studies

The conclusion may be the last paragraph of the discussion section, or it can be pulled out into its own section. Either way it should be about a paragraph in length and should recap the most important results and significance of your findings. 

Things to avoid: 

  • Introducing large ideas not already covered in the paper 
  • Excessive length -- conclusion should be brief 

Examples of Primary Research Articles

  • Experimental Exposure to Urban and Pink Noise Affects Brain Development and Song Learning in Zebra Finches
  • Effect of an Enteric-Coated Fish Oil Preparation on Relapses in Crohn's Disease
  • The Effect of Intrinsic Crumpling on the Mechanics of Free-Standing Graphene

Ambio, v.43(5); 2014 Sep

Maximizing Legacy and Impact of Primary Research: A Call for Better Reporting of Results

Neal R. Haddaway

Centre for Evidence-Based Conservation, School of Environment, Natural Resources and Geography, Bangor University, Bangor, LL57 2UW UK

Much of the scientific literature in existence today is based on model systems and case studies, which help to split research into manageable blocks. The impact of this research can be greatly increased in meta-analyses that combine individual studies published over time to identify patterns across studies; patterns that may go undetected by smaller studies and that may not be the main subject of investigation. However, many potentially useful studies fail to provide sufficient data (typically means, true sample sizes, and measures of variability) to permit meta-analysis. Authors of primary research studies should provide these summary statistics as a minimum, and editors should require them to do so. By putting policies in place that require these summary statistics to be included, or even those that require raw data, editors and authors can maximize the legacy and impact of the research they publish beyond that of their initial target audience.

Introduction

Some 8323 scientific journals were listed in Journal Citation Reports in 2013, with tens of thousands more journals unlisted. The vast majority of these journals have been given impact factors in the lower end of the spectrum, giving a classic Poisson distribution with a median of approximately 0.5 and 1.0 (Thomson Reuters 2013). Thus, the majority of journals are typically more applied (i.e., focused on more practical subjects) than their counterparts at the far end of the spectrum, publishing research that targets specific audiences. Much of the research in these publications uses model species and habitats or case studies to simplify more complex systems (e.g., Rantalainen et al. 2008). While these studies are often quite specific, they can inform wider analyses if, for example, used in meta-analyses and systematic reviews (SRs) (Pullin and Knight 2001).

Meta-analyses are statistical methods that combine like studies to create a single study of far greater effective sample size than any of its constituent parts (Glass 1976). These analyses are used where individual studies disagree, or where individual studies are thought to be of insufficient power to identify significant effects. Meta-analyses are powerful tools to increase the value and impact of research. Meta-analysis has been widely used in recent decades in medicine to identify significant patterns in the evidence that may go undetected in individual studies (O’Rourke 2007). Analyzed together, the evidence provided by individual studies is more powerful than the sum of its individual analyses. Furthermore, meta-analyses allow us to examine the effect of modifying factors that may not have been considered in the original research. For example, while individual studies on the effect of drainage on greenhouse gas emissions from peatlands may each have been undertaken in sites with a specific mean annual rainfall and temperature, when studies are combined in a meta-analysis the effect of meteorology on the relationship between land management and emissions can be examined (also referred to as sources of heterogeneity and effect modifiers) (Haddaway et al. 2014).

Meta-analyses in the health sciences have identified significant positive effects of potentially life-saving therapies where individual studies have failed to find an effect. One example of the potential influence of meta-analyses on policy is demonstrated by the review of the use of streptokinase in the treatment of myocardial infarction (commonly known as a heart attack). A meta-analysis that arranged and analyzed studies cumulatively through time over a 30-year period identified a statistically significant reduction in mortality resulting from the therapeutic use of streptokinase following myocardial infarction. This significant effect was clear in the cumulative meta-analysis after only 14 years of research, but streptokinase was not widely recommended until more than a decade later when two large-scale trials (mega trials) identified a significant effect (Lau et al. 1992). This striking example demonstrates the potentially preventable loss of life that results from missing patterns in the evidence identified through pooling studies.

Meta-analyses in medicine, and more recently in environmental management and conservation (Gurevitch et al. 1992), have been developed even further by the establishment of systematic review methodology (Pullin and Stewart 2006; Higgins and Green 2011). Systematic reviews aim to identify all available evidence for a specific question using a detailed, pre-defined methodology. This methodology aims to minimize various biases, such as publication bias and selection bias, that may affect traditional reviews.

The power and utility of meta-analyses, however, are reduced significantly when primary research does not report sufficient data to allow full quantitative analyses. Studies with missing data must be excluded from the analysis despite being relevant and providing some informative results. Broadly speaking, primary research articles should report three key measures to facilitate their inclusion in a meta-analysis: mean effect size, sample size, and a measure of variability (typically standard deviation, standard error, or confidence intervals). Effect sizes are summary statistics that estimate the magnitude of effect of a specific intervention (e.g., application of a pesticide) or exposure (e.g., soil water content). Where studies report their results in the same units, one form of effect size is the raw mean difference: the control sample mean subtracted from the intervention sample mean, which represents the direct additional effect of the intervention in meaningful units. Other examples of effect sizes include correlation coefficients, risk ratios, and specific effect sizes designed for meta-analysis such as Hedges' g. Different effect size types are suitable for different outcome measures and data types (Borenstein et al. 2011). Measures of variability indicate the uncertainty of effect size estimates and are used in meta-analyses to weight studies according to the variability in the data around the sample means, in order to give more weight to more precise studies. A range of possible variability measures can be used in meta-analyses as these are interchangeable. Sample sizes relate to the true sample size of the study and should not include pseudoreplicates. True replicates are those samples that are measured at the same level as that at which the intervention is experienced: if treatments are delivered at the field level, then replicates are fields and NOT plots within fields.
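To make concrete why means, sample sizes, and variability measures are all needed, the following Python sketch computes a raw mean difference, Hedges' g with its approximate variance, and an inverse-variance (fixed-effect) pooled estimate. The formulas are the standard textbook ones (as presented, for example, in Borenstein et al. 2011); the function names and the example numbers are hypothetical, and a real analysis would normally use an established meta-analysis package and often a random-effects model.

```python
import math

def raw_mean_difference(mean_t, mean_c):
    """Intervention mean minus control mean, in the original units."""
    return mean_t - mean_c

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference with Hedges' small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    j = 1 - 3 / (4 * df - 1)                      # approximate correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d                    # (g, variance of g)

def pooled_fixed_effect(effects):
    """Inverse-variance weighted mean of (estimate, variance) pairs."""
    weights = [1 / variance for _, variance in effects]
    estimate = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    return estimate, 1 / sum(weights)             # (pooled estimate, its variance)

# Hypothetical study: treatment mean 10 (SD 2, n 10) vs control mean 8 (SD 2, n 10).
g, var_g = hedges_g(10, 2, 10, 8, 2, 10)
pooled, pooled_var = pooled_fixed_effect([(1.0, 0.5), (3.0, 0.5)])
```

None of these quantities can be computed if a study omits its sample sizes or its measure of variability, which is why such studies drop out of the pooled analysis.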

Where quantitative data for the key measures described above are not presented in the text or tables of relevant studies, this information can often be extracted from figures of summary metrics or raw data (e.g., Tummers 2006). In some cases other data can be included in a meta-analysis. For example, meta-analysis can be performed on p values (Fisher 1932), but such analyses do not consider the magnitude or the direction of effect, and cannot investigate sources of heterogeneity, so should be restricted to use when other options for meta-analysis are exhausted (Jones 1995).
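A common way of combining p values is Fisher's combined probability test, sketched below on the assumption that this is the approach the Fisher (1932) citation refers to. The statistic is −2 times the sum of the natural logs of the p values, compared against a chi-square distribution with 2k degrees of freedom for k studies. The function name is hypothetical, and the inputs must be independent p values strictly greater than zero.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p values using Fisher's method."""
    k = len(p_values)
    statistic = -2 * sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom.
    # For even degrees of freedom it has a closed form:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

statistic, combined = fisher_combined_p([0.05, 0.05])
```

The inputs carry no sign or magnitude, so the combined result says only that an effect exists somewhere among the studies, not how large it is or in which direction it points.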

Where data on key measures are missing from some studies, for example variability measures, it may be appropriate to impute these values (see Harris et al. 2009). Imputing involves replacing a missing value with an appropriate substitute. It enables the inclusion of studies that would otherwise be excluded due to the lack of reported data, thus mitigating the potential impact this would have on the power and bias of the pooled effect (Wiebe et al. 2006; Burgess et al. 2013). For variability measures, the substitute value may be generated in one of a number of ways: it may be based on an understanding of the population being studied; on a mean variability identified from other studies included in the meta-analysis; or on the largest variance reported in other included studies in order to be more conservative. One final option is to perform multiple imputation using several methods and substituting some form of average where the data are missing. Imputing is often appropriate in medicine, where meta-analyses involve large numbers of studies and imputing of a small number of studies’ variability is less influential on the overall analysis. Meta-analysis in environmental sciences, however, rarely involves large sample sizes, and large proportions of the evidence base may be missing data. Three recent systematic reviews highlight this problem. A recent systematic review of the impact of terrestrial protected areas on human well-being identified 281 outcome measures across 49 studies, but 82 percent of these studies reported measures with no variability (Pullin et al. 2013). Another review of the impact of land management on lowland peatland carbon greenhouse gas flux identified 33 of 111 studies that lacked measures of variability, precluding their inclusion in meta-analysis (Haddaway et al. 2014). In a systematic review of the impact of reindeer grazing on arctic and alpine vegetation, currently underway, 30 percent of the included articles could not be included in meta-analysis due to a lack of either variability (10 of 53 studies) or true sample size (6 of 53 studies) (Bernes et al. 2013). Despite the availability and use of imputing methods in the health care discipline, these are not always feasible in the environmental setting, and it is therefore even more imperative that primary studies report variance data. Studied human populations are typically far less variable and more predictable than the range of studied populations included in meta-analyses in the environmental sciences (Haddaway et al. 2014). As a result, imputing in environmental sciences meta-analyses is rarely likely to be appropriate.
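Two of the simple imputation strategies mentioned above (borrowing the mean variability of the other included studies, or the largest reported value as a conservative choice) can be sketched as follows. The record layout, the function name, and the numbers are hypothetical, and the sketch uses standard deviations for simplicity: the largest standard deviation corresponds to the largest variance. It is not a recommendation to impute; as argued above, doing so is rarely appropriate in environmental meta-analyses.

```python
from statistics import mean

def impute_missing_sd(studies, strategy="mean"):
    """Fill missing SDs (None) using SDs reported by the other included studies."""
    if strategy not in ("mean", "max"):
        raise ValueError("strategy must be 'mean' or 'max'")
    reported = [s["sd"] for s in studies if s["sd"] is not None]
    if not reported:
        raise ValueError("no reported SDs to impute from")
    fill = mean(reported) if strategy == "mean" else max(reported)
    # Flag imputed records so that sensitivity analyses can exclude them later.
    return [
        {**s, "sd": fill, "imputed": True} if s["sd"] is None else {**s, "imputed": False}
        for s in studies
    ]

studies = [
    {"id": "A", "sd": 2.0},
    {"id": "B", "sd": None},   # variability not reported
    {"id": "C", "sd": 4.0},
]
by_mean = impute_missing_sd(studies, "mean")
by_max = impute_missing_sd(studies, "max")
```

Keeping the `imputed` flag makes it straightforward to rerun the analysis without the imputed studies and check how much the pooled result depends on them.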

One other solution to the problem of missing values is to contact the authors of relevant studies with a request for supplemental data. Such requests are more successful with recently published manuscripts (Vines et al. 2014), where email addresses are supplied and are still functional. Email requests for data in meta-analyses typically have low success rates (e.g., Gibson et al. 2006), with only a small minority of contacted authors responding with usable data. The process should be encouraged where resources allow, since the increase in usable data is often valuable. For older research, however, such contact is often not feasible. This latter point raises concerns about a possible bias resulting from the presence of more usable data in meta-analyses from more recent research. Such bias should not be ignored, but little can be done to account for it.

In systematic reviews, study results can often still be synthesized narratively in the form of textual descriptions, tabulation, and the production of figures despite being lost from meta-analyses. However, such narrative syntheses are not as powerful as meta-analyses, which should be the main aim of a quantitative (aggregative) systematic review. Furthermore, if some studies are missing effect sizes and statistical results, little use can be made of their results.

Those with experience in meta-analysis and systematic review understand the value of well-reported summary data in primary research articles, and failing this, the provision of raw data. To ensure the legacy of primary research and maximize its value, however, it should be the priority of journal editors and manuscript authors to ensure that all primary research reports quantitative data either in summary or raw form. Summary data should be provided with measures of variability to ensure that they can be included in meta-analyses. Maximizing the use of existing evidence in meta-analyses may also potentially conserve resources that would otherwise be used for additional primary research, where answers already exist in the literature. This policy follows the recommendations made in the CONSORT Statement (BMJ 2010a, b) in medicine that call for better reporting of clinical trials.

Some journals have recently begun to demand the publication of raw data alongside manuscripts. The Public Library of Science (PLOS), for example, amended its data policy in December 2013 to state that “PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction.” Such a policy is a bold move in a competitive publishing market; the majority of other journals, particularly those that are not fully turning to Open Access, may find such a move difficult to implement. Summary data for treatment and control groups in the form of means, sample sizes, and variability measures are a far simpler, yet just as effective, requisite that will maximize the legacy and usability of primary research.

Systematic reviews and meta-analyses include research from a range of time periods, not solely more recent publications. As the publishing world advances and reporting of raw and summary data improves, the historic research that lacks sufficient data to permit meta-analysis could be made useful with the establishment of a universal database for the deposition of raw and summary data. Such a database could mirror the advances in independent post-publication peer review such as www.PubPeer.com. This project would require a significant effort to establish, maintain, and advertise.

Acknowledgments

I thank Claes Bernes, Ruth Lewis, and two anonymous reviewers for their comments on a draft version of the manuscript.

Neal R. Haddaway is a Postdoctoral Research Officer at the Centre for Evidence-Based Conservation, School of Environment, Natural Resources and Geography, Bangor University, Bangor, LL57 2UW, UK.

  • Bernes C, Bråthen KA, Forbes BC, Hofgaard A, Moen J, Speed JD. What are the impacts of reindeer/caribou (Rangifer tarandus L.) on arctic and alpine vegetation? A systematic review protocol. Environmental Evidence. 2013;2:6. doi: 10.1186/2047-2382-2-6.
  • BMJ. CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:332. doi: 10.1136/bmj.c332.
  • BMJ. The new CONSORT statement. BMJ. 2010;340:c1432. doi: 10.1136/bmj.c1432.
  • Borenstein M, Hedges LV, Higgins JP, Rothstein HR. Introduction to meta-analysis. Chichester: Wiley; 2011.
  • Burgess S, White IR, Resch-Rigon M, Wood AM. Combining multiple imputation and meta-analysis with individual participant data. Statistics in Medicine. 2013;32(26):4499–4514. doi: 10.1002/sim.5844.
  • Fisher RA. Statistical methods for research workers. London: Oliver & Boyd; 1932.
  • Gibson CA, Bailey BW, Carper MJ, LeCheminant JD, Kirk EP, Huang G, Drowatzky DuBose K, Donnelly JE. Author contacts for retrieval of data for a meta-analysis on exercise and diet restriction. International Journal of Technology Assessment in Health Care. 2006;22(02):267–270. doi: 10.1017/S0266462306051105.
  • Glass GV. Primary, secondary, and meta-analysis of research. Educational Researcher. 1976;5(10):3–8. doi: 10.3102/0013189X005010003.
  • Gurevitch J, Morrow LL, Wallace A, Walsh JS. A meta-analysis of competition in field experiments. The American Naturalist. 1992;140(4):539–572. doi: 10.1086/285428.
  • Haddaway NR, Burden A, Evans CD, Healey JR, Jones DL, Dalrymple SE, Pullin AS. Evaluating effects of land management on greenhouse gas fluxes and carbon balances in boreo-temperate lowland peatland systems. Environmental Evidence. 2014;3:5. doi: 10.1186/2047-2382-3-5.
  • Harris C, Hedges LV, Valentine JC. Handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009.
  • Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0. Updated March 2011. Retrieved March 1, 2014, from www.cochrane-handbook.org.
  • Jones D. Meta-analysis: weighing the evidence. Statistics in Medicine. 1995;14:137–149. doi: 10.1002/sim.4780140206.
  • Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. New England Journal of Medicine. 1992;327(4):248–254. doi: 10.1056/NEJM199207233270406.
  • O’Rourke K. An historical perspective on meta-analysis: dealing quantitatively with varying study results. Journal of the Royal Society of Medicine. 2007;100(12):579–582. doi: 10.1258/jrsm.100.12.579.
  • Pullin AS, Knight TM. Effectiveness in conservation practice: Pointers from medicine and public health. Conservation Biology. 2001;15(1):50–54. doi: 10.1046/j.1523-1739.2001.99499.x.
  • Pullin AS, Stewart GB. Guidelines for systematic review in conservation and environmental management. Conservation Biology. 2006;20(6):1647–1656. doi: 10.1111/j.1523-1739.2006.00485.x.
  • Pullin AS, Bangpan M, Dalrymple S, Dickson K, Haddaway NR, Healey JR, Hauari H, Hockley N, et al. Human well-being impacts of terrestrial protected areas. Environmental Evidence. 2013;2:19. doi: 10.1186/2047-2382-2-19.
  • Rantalainen ML, Haimi J, Fritze H, Pennanen T, Setala H. Soil decomposer community as a model system in studying the effects of habitat fragmentation and habitat corridors. Soil Biology & Biochemistry. 2008;40(4):853–863. doi: 10.1016/j.soilbio.2007.11.008.
  • Thomson Reuters. 2012 Journal Citation Reports® Science Edition. 2013. Accessed October, 2013.
  • Tummers B. DataThief III. 2006. Retrieved March 1, 2014, from http://datathief.org.
  • Vines TH, Albert AYK, Andrew RL, Débarre F, Bock DG, Franklin MT, Gilbert KJ, Moore J-S, et al. The availability of research data declines rapidly with article age. Current Biology. 2014;24:94–97. doi: 10.1016/j.cub.2013.11.014.
  • Wiebe N, Vandermeer B, Platt RW, Klassen TP, Moher D, Barrowman NJ. A systematic review identifies a lack of standardization in methods for handling missing variance data. Journal of Clinical Epidemiology. 2006;59:342–353. doi: 10.1016/j.jclinepi.2005.08.017.
Primary Research Article

Review article.

Identifying and creating an APA style citation for your bibliography: 

  • Author initials are separated by a period
  • Multiple authors are separated by commas and an ampersand (&)  
  • Title format rules change depending on what is referenced
  • Double check them for accuracy 

Identifying and creating an APA style in-text citation: 

  • e.g., (Smith, 2022) or (Smith & Stevens, 2022)

The structure changes depending on whether you use a direct quote or a parenthetical citation:

Direct Quote: the citation must follow the quote directly and contain a page number after the date

e.g., (Smith, 2022, p. 21)

Parenthetical: the page number is not needed
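
As a rough illustration of the in-text rules above, the following Python sketch builds a citation string from author surnames, a year, and an optional page number. The function name `apa_in_text` and its parameters are hypothetical, and the "et al." shortening for three or more authors follows APA 7th edition rather than anything stated in this guide, so check the result against the official style guide.

```python
def apa_in_text(authors, year, page=None):
    """Build an APA-style in-text citation such as (Smith & Stevens, 2022).

    authors: list of author surnames, in publication order (hypothetical input shape).
    page:    page number, needed only when citing a direct quote.
    """
    if not authors:
        raise ValueError("at least one author surname is required")
    if len(authors) == 1:
        names = authors[0]
    elif len(authors) == 2:
        # Two authors are joined with an ampersand.
        names = f"{authors[0]} & {authors[1]}"
    else:
        # Assumption: APA 7th edition shortens three or more authors to "et al."
        names = f"{authors[0]} et al."
    parts = [names, str(year)]
    if page is not None:
        # Direct quotes carry the page number after the date.
        parts.append(f"p. {page}")
    return "(" + ", ".join(parts) + ")"


print(apa_in_text(["Smith", "Stevens"], 2022))
print(apa_in_text(["Smith"], 2022, page=21))
```

Passing `page` only for direct quotes mirrors the distinction drawn above: a parenthetical citation omits it.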

For more information, take a look at Harvard Library's Citation Styles guide!

A primary research article typically contains the following section headings:

"Methods"/"Materials and Methods"/"Experimental Methods" (different journals title this section in different ways)

"Results"

"Discussion"

If you skim the article, you should find additional evidence that an experiment was conducted by the authors themselves.

Primary research articles provide background on their subject by summarizing previously conducted research, but this summary typically appears only in the Introduction section of the article.

Review articles do not report new experiments. Rather, they attempt to provide a thorough review of a specific subject by assessing either all or the best available scholarly literature on that topic.

Ways to identify a review article: 

  • Author(s) summarize and analyze previously published research 
  • May focus on a specific research question, comparing and contrasting previously published research 
  • Gives an overview of all of the research on a particular topic
  • Does not contain "methods" or "results" type sections
  • Last Updated: Oct 3, 2023 4:16 PM
  • URL: https://guides.library.harvard.edu/PubMed

Identifying Primary and Secondary Research Articles

Primary Research Articles

Primary research articles report on a single study. In the health sciences, primary research articles generally describe the following aspects of the study:

  • The study's hypothesis or research question
  • Some articles will include information on how participants were recruited or identified, as well as additional information about participants' sex, age, or race/ethnicity
  • A "methods" or "methodology" section that describes how the study was performed and what the researchers did
  • Results and conclusion section

Secondary Research Articles

Review articles are the most common type of secondary research article in the health sciences. A review article is a summary of previously published research on a topic. Authors who are writing a review article will search databases for previously completed research and summarize or synthesize those articles,  as opposed to recruiting participants and performing a new research study.

Specific types of review articles include:

  • Systematic Reviews
  • Meta-Analysis
  • Narrative Reviews
  • Integrative Reviews
  • Literature Reviews

Review articles often report on the following:

  • The hypothesis, research question, or review topic
  • Databases searched: authors should clearly describe where and how they searched for the research included in their reviews. Systematic reviews and meta-analyses should provide detailed information on the databases searched and the search strategy the authors used.
  • Selection criteria: the researchers should describe how they decided which articles to include
  • A critical appraisal or evaluation of the quality of the articles included (most frequently included in systematic reviews and meta-analyses)
  • Discussion, results, and conclusions

Determining Primary versus Secondary Using the Database Abstract

Information found in PubMed, CINAHL, Scopus, and other databases can help you determine whether the article you're looking at is primary or secondary.
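
The kind of abstract clues this section describes (for example, the words "systematic review" in a title, or a list of databases searched) can be checked with a simple keyword heuristic. The Python sketch below is only an illustration: the clue lists and the function name `guess_article_type` are assumptions rather than part of any database's interface, and a match is a hint, not a substitute for reading the abstract.

```python
# Hypothetical clue phrases; extend or adjust them for your own field.
REVIEW_CLUES = (
    "systematic review", "meta-analysis", "literature review",
    "databases searched", "data sources", "study selection",
)
PRIMARY_CLUES = (
    "participants", "patients were", "randomly assigned",
    "materials and methods", "we recruited",
)


def guess_article_type(title, abstract):
    """Return 'primary', 'secondary', or 'unclear' based on keyword clues."""
    text = f"{title} {abstract}".lower()
    review_hits = [clue for clue in REVIEW_CLUES if clue in text]
    primary_hits = [clue for clue in PRIMARY_CLUES if clue in text]
    if len(review_hits) > len(primary_hits):
        return "secondary"
    if len(primary_hits) > len(review_hits):
        return "primary"
    # Ties (including no clues at all) need a human reader.
    return "unclear"


print(guess_article_type(
    "Exercise for depression: a systematic review and meta-analysis",
    "Data sources: PubMed and CINAHL. Study selection: randomized trials.",
))
```

Counting clues on both sides, rather than stopping at the first match, keeps a primary study that merely mentions a prior review from being mislabeled.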

Primary research article abstract

  • Note that in the "Objectives" field, the authors describe their single, individual study.
  • In the materials and methods section, they describe the number of patients included in the study and how those patients were divided into groups.
  • These are all clues that help us determine that this abstract is describing a single, primary research article, as opposed to a literature review.

Secondary research/review article abstract

  • Note that the words "systematic review" and "meta-analysis" appear in the title of the article
  • The objectives field also includes the term "meta-analysis" (a common type of literature review in the health sciences)
  • The "Data Source" section includes a list of databases searched
  • The "Study Selection" section describes the selection criteria
  • These are all clues that help us determine that this abstract is describing a review article, as opposed to a single, primary research article.

  • Last Updated: Feb 17, 2024 5:25 PM
  • URL: https://library.usfca.edu/primary-secondary

Primary Research | Definition, Types, & Examples

Published on 15 January 2023 by Tegan George.

Primary research is a research method that relies on direct data collection, rather than on data that’s already been collected by someone else. In other words, primary research is any type of research that you undertake yourself, firsthand, while using data that has already been collected is called secondary research.

Primary research is often used in qualitative research, particularly in survey methodology, questionnaires, focus groups, and various types of interviews. While quantitative primary research does exist, it’s not as common.

Table of contents

  • When to use primary research
  • Types of primary research
  • Examples of primary research
  • Advantages and disadvantages of primary research
  • Frequently asked questions

Primary research is any research that you conduct yourself. It can be as simple as a 2-question survey, or as in-depth as a years-long longitudinal study. The only requirement is that the data must be collected firsthand by you.

Primary research is often used to supplement or strengthen existing secondary research. It is usually exploratory in nature, concerned with examining a research question where no preexisting knowledge exists. It is also sometimes called original research for this reason.

Primary research can take many forms, but the most common types are:

  • Surveys and questionnaires
  • Observational studies
  • Interviews and focus groups

Surveys and questionnaires

Surveys and questionnaires collect information about a group of people by asking them questions and analyzing the results. They are a solid choice if your research topic seeks to investigate something about the characteristics, preferences, opinions, or beliefs of a group of people.

Surveys and questionnaires can take place online, in person, or through the mail. It is best to have a combination of open-ended and closed-ended questions, and how the questions are phrased matters. Be sure to avoid leading questions, and ask any related questions in groups, starting with the most basic ones first.

Observational studies are an easy and popular way to answer a research question based purely on what you, the researcher, observe. If there are practical or ethical concerns that prevent you from conducting a traditional experiment, observational studies are often a good stopgap.

There are three types of observational studies: cross-sectional studies, cohort studies, and case-control studies. If you decide to conduct observational research, you can choose the one that’s best for you. All three are quite straightforward and easy to design – just beware of confounding variables and observer bias creeping into your analysis.

Similarly to surveys and questionnaires, interviews and focus groups also rely on asking questions to collect information about a group of people. However, how this is done is slightly different. Instead of sending your questions out into the world, interviews and focus groups involve two or more people – one of whom is you, the interviewer, who asks the questions.

There are 3 main types of interviews:

  • Structured interviews ask predetermined questions in a predetermined order.
  • Unstructured interviews are more flexible and free-flowing, proceeding based on the interviewee’s previous answers.
  • Semi-structured interviews fall in between, asking a mix of predetermined questions and off-the-cuff questions.

While interviews are a rich source of information, they can also be deceptively challenging to do well. Be careful of interviewer bias creeping into your process. This is best mitigated by avoiding double-barreled questions and paying close attention to your tone and delivery while asking questions.

Alternatively, a focus group is a group interview, led by a moderator. Focus groups can provide more nuanced interactions than individual interviews, but their small sample size means that external validity is low.

Primary research can often be quite simple to pursue yourself. Here are a few examples of different research methods you can use to explore different topics.

Primary research is a great choice for many research projects, but it has distinct advantages and disadvantages.

Advantages of primary research

Advantages include:

  • The ability to conduct really tailored, thorough research, down to the ‘nitty-gritty’ of your topic. You decide what you want to study or observe and how to go about doing that.
  • You maintain control over the quality of the data collected, and can ensure firsthand that it is objective, reliable, and valid.
  • The ensuing results are yours, for you to disseminate as you see fit. You maintain proprietary control over what you find out, allowing you to share your findings with like-minded individuals or those conducting related research that interests you for replication or discussion purposes.

Disadvantages of primary research

Disadvantages include:

  • In order to be done well, primary research can be very expensive and time consuming. If you are constrained in terms of time or funding, it can be very difficult to conduct your own high-quality primary research.
  • Primary research is often insufficient as a standalone research method, requiring secondary research to bolster it.
  • Primary research can be prone to various types of research bias. Bias can manifest on the part of the researcher as observer bias, Pygmalion effect, or demand characteristics. It can occur on the part of participants as a Hawthorne effect or social desirability bias.

The 3 main types of primary research are:

  • Surveys and questionnaires
  • Observational studies
  • Interviews and focus groups

Exploratory research explores the main aspects of a new or barely researched question.

Explanatory research explains the causes and effects of an already widely researched question.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control, and randomisation.

In restriction, you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching, you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable.

In statistical control, you include potential confounders as variables in your regression.

In randomisation, you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.
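
Three of these methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the subject records, the `age_group` confounder, and the function names are all hypothetical, and statistical control is left out because it needs a regression library.

```python
import random

# Hypothetical subjects; "age_group" stands in for a potential confounder.
subjects = [
    {"id": i, "age_group": "young" if i % 2 == 0 else "old"}
    for i in range(200)
]


def restrict(subjects, key, value):
    """Restriction: keep only subjects sharing one value of the confounder."""
    return [s for s in subjects if s[key] == value]


def randomise(subjects, seed=0):
    """Randomisation: split subjects into two groups purely by chance."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def match(treated, pool, key):
    """Matching: pair each treated subject with an unused comparison
    subject that has the same value of the confounder."""
    available = list(pool)
    pairs = []
    for t in treated:
        for c in available:
            if c[key] == t[key]:
                pairs.append((t, c))
                available.remove(c)
                break
    return pairs


young_only = restrict(subjects, "age_group", "young")
treatment, control = randomise(subjects, seed=1)
pairs = match(treatment[:10], control, "age_group")
print(len(young_only), len(treatment), len(control), len(pairs))
```

In this sketch, matching silently skips a treated subject when no comparison subject with the same value remains; a real study would need to decide how to handle such unmatched cases.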

A questionnaire is a data collection tool or instrument, while a survey is an overarching research method that involves collecting and analysing data from people using questionnaires.

When conducting research, collecting original data has significant advantages:

  • You can tailor data collection to your specific research aims (e.g., understanding the needs of your consumers or user testing your website).
  • You can control and standardise the process for high reliability and validity (e.g., choosing appropriate measurements and sampling methods).

However, there are also some drawbacks: data collection can be time-consuming, labour-intensive, and expensive. In some cases, it’s more efficient to use secondary data that has already been collected by someone else, but the data might be less reliable.

Cite this Scribbr article

George, T. (2023, January 15). Primary Research | Definition, Types, & Examples. Scribbr. Retrieved 3 September 2024, from https://www.scribbr.co.uk/research-methods/primary-research-explained/

Primary Research: Methods and Best Practices

Introduction

Conducting research involves two types of data: primary data and secondary data. While secondary research deals with existing data, primary research collects new data. Ultimately, the most appropriate type of research depends on which method is best suited to your research question.

While this article discusses the difference between primary and secondary research, the main focus is on primary research, the types of data collected through primary research, and considerations for researchers who conduct primary research.

Simply put, researchers conduct primary research to gather new information. When existing data cannot address the research inquiry at hand, the researcher usually needs to collect new data to meet their research objectives.

How do you identify primary research?

Primary research uses collected data that hasn't been previously documented. Primary research typically means collecting data straight from the source (e.g., interviewing a research participant, observing a cultural practice or phenomenon firsthand).

Other distinctions to consider include whether to collect quantitative or qualitative data and whether to conduct basic or applied research. Each of these dimensions informs and is informed by your research inquiry.

What are the advantages of primary research?

New data, particularly that which addresses a research gap, can contribute to a novel inquiry and prove compelling to the research audience. When a researcher conducts a literature review and generates a problem statement for their research, they can identify what new data needs to be collected and what primary research method can be used to collect it.

Primary research studies ultimately contribute to theoretical developments and novel insights that an analysis of existing data might not have identified. Research publications in some fields may place a premium on primary research for its potential to generate new scientific knowledge as a result.

What are the disadvantages of primary research?

Primary research is time-consuming and potentially expensive to conduct, considering the equipment and resources needed to collect new data as well as the time required to engage with the field and collect data.

Moreover, primary research relies on new data that has yet to be documented elsewhere, meaning that the research audience is less familiar with the primary data being presented. This might raise issues of transparency and research rigor (e.g., how does the audience know that the data they are shown is trustworthy?).

Primary research is common in various fields of research. Let's look at some typical examples of primary research in three different areas.

Education research

Teaching and learning is a field that relies on evidence-based data to make policy recommendations affecting teachers, learning materials, and even classroom requirements. As a result, there are countless methods for collecting relevant data on the various aspects of education.

Observations, interviews, and assessments are just some of the primary research methods that are employed when studying education contexts. Education research acknowledges the full variety of situated differences found in the diversity of learners and their schooling contexts. This makes collecting data that is relevant to the given context and research inquiry crucial to understanding teaching and learning.

Market research

Businesses often rely on primary research to understand the target market for their products and services. Since competing businesses tend not to share research on customer insights with each other, primary research collecting original data can be a necessity.

Focus groups, surveys, and user research are typical research tools employed by businesses. Within market research, the goal is typically to understand customers' preferences and use cases for specific products and services.

Cultural studies

Fields such as anthropology and sociology count on primary research for understanding cultures and communities. Ethnographic research acknowledges that thick description of cultures and phenomena is more meaningful than only generating universal theories, making the collection of primary data essential to understanding the full diversity of the social world.

Researchers examining culture often collect data through interviews, observations, and photovoice, among other research methods. These methods look at the social world through the eyes of the research participants to generate an immersive view of cultures and groups with which audiences may not be familiar.

Primary research data stands in contrast to secondary research data, which is any data that has been previously collected and documented. In some situations, existing data may be abundant and available, making secondary research a more feasible approach to generating theory and identifying key insights.

Secondary research methods are employed in all fields of research. Market researchers conduct secondary research when there is already existing data about a target market. In particular, secondary market research might look at previous trends in the popularity of products to make predictions about the demand for new products.

Scholarly researchers can use secondary sources such as corpora, news articles, and online videos to make assertions about language and culture. Analytical approaches such as discourse analysis and content analysis can be well suited to analyzing data collected through secondary research methods.

Ultimately, primary and secondary research go hand in hand. The main function of research in building knowledge does not necessarily depend on the use of primary data collection. Rather, it is a matter of whether data needs to be collected in order to address your research inquiry, or relevant data already exists and you can access it.

There are many research methods used to collect data for primary research. The research method that works best for you depends on what you are looking to do with your research project.

This section lists some of the common primary data collection methods that researchers rely on.

One-on-one interviews are useful for capturing perspectives from research participants. Direct interactions can tell researchers what perspectives their research participants have and the thinking behind those perspectives.

Interview research is a complex and detailed methodology that includes several types of interviews to suit various research inquiries. Researchers can choose between structured interviews, semi-structured interviews, and unstructured interviews, depending on the nature of interaction they are looking to establish.

Focus groups

Focus groups are discussions that involve multiple research participants and are led by a moderator. Similar to interviews, the primary goal is to gather information about people's perspectives. Yet focus groups are distinct, because they can capture how people interact and build meaning when discussing a particular topic.

Market researchers may consider conducting a focus group discussion when they want to know more about how a particular group feels about a product or service. Researchers in linguistics and anthropology might be interested in observing how a group of people construct meaning with each other.

Observations

In research involving naturalistic inquiry and the social world, the researcher can gather information directly from the field through observational research methods. Primary data takes the form of field notes, audio and video recordings, their resulting transcripts, and even images of objects of interest.

For quantitative research inquiries, observation entails measuring the amount of activity or the frequency of particular phenomena. Qualitative observations look for patterns in cultural or social practices and document significant events in the field.

When the objective is to capture perspectives from large numbers of people, surveys are a good research method for collecting novel data. In-person questionnaires and online surveys can be used to quickly collect data at scale.

Surveys are used for conducting primary research in both quantitative and qualitative research. Structured (closed-ended) survey questions provide data that can be measured quantitatively, while open-ended survey responses require qualitative data analysis.
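
To make the distinction concrete, here is a minimal Python sketch that tallies a closed-ended rating question and sets the open-ended comments aside for qualitative analysis. The response records, the 1-to-5 `satisfaction` scale, and the function name `summarise` are invented purely for illustration.

```python
from collections import Counter

# Hypothetical survey responses: a 1-5 rating plus an optional free-text comment.
responses = [
    {"satisfaction": 4, "comment": "Checkout was quick."},
    {"satisfaction": 2, "comment": "Delivery took too long."},
    {"satisfaction": 4, "comment": ""},
    {"satisfaction": 5, "comment": "Friendly support team."},
]


def summarise(responses):
    """Return rating counts, the mean rating, and the non-empty comments."""
    scores = [r["satisfaction"] for r in responses]
    counts = Counter(scores)            # quantitative: how often each rating occurs
    mean_score = sum(scores) / len(scores)
    # Qualitative: free-text answers still need to be read and coded by a person.
    comments = [r["comment"] for r in responses if r["comment"].strip()]
    return counts, mean_score, comments


counts, mean_score, comments = summarise(responses)
print(dict(counts), mean_score, len(comments))
```

The ratings can be summarised automatically, whereas the comments are only collected here; interpreting them is the qualitative step.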

Experiments

While the above methods emphasize or are involved with naturalistic inquiry, experiments are a different form of primary research that is far more controlled. When you want to understand the relationship between various elements in a certain context (e.g., the effect of water and fertilizer on plant growth), a controlled experiment is a typical research approach to empirically establish scientific knowledge.

Experiments focus on a specific set of factors from the research phenomenon to understand causal relationships between variables. Experiments are a common primary research method in physical sciences, but they are also extensively used in psychology, education, and political science, among other areas.
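
The water-and-fertilizer example can be sketched as a small two-factor design. In the Python sketch below, the growth measurements are made-up numbers used only to show the arithmetic, and `main_effect` is a hypothetical helper that compares the average outcome at two levels of one factor; it is not a full statistical analysis.

```python
from statistics import mean

# Hypothetical plant growth (cm) for each (water, fertilizer) condition.
growth = {
    ("low", "none"): [4.0, 5.0],
    ("low", "added"): [6.0, 7.0],
    ("high", "none"): [8.0, 9.0],
    ("high", "added"): [12.0, 13.0],
}


def main_effect(growth, factor_index, high, low):
    """Difference in mean growth between two levels of one factor,
    averaged over the levels of the other factor."""
    high_vals = [g for cond, vals in growth.items()
                 if cond[factor_index] == high for g in vals]
    low_vals = [g for cond, vals in growth.items()
                if cond[factor_index] == low for g in vals]
    return mean(high_vals) - mean(low_vals)


water_effect = main_effect(growth, 0, "high", "low")
fertilizer_effect = main_effect(growth, 1, "added", "none")
print(water_effect, fertilizer_effect)
```

Because every combination of water and fertilizer is measured, each factor's effect can be estimated while the other is held balanced, which is what makes the design controlled.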

The decision to conduct a primary or secondary study is a question of whether existing data is sufficient to satisfy the research inquiry at hand. Where data does not exist, primary research should be conducted.

Consider an example research study regarding ideal teaching methods in elementary school contexts in a developing country in Asia. Just because there is abundant data on the same topic in elementary schools in Western countries does not preclude the possibility of novel theoretical developments in schools in Asia. This becomes particularly important if insights based on existing data from other contexts may not be applicable to the present context.

Note that this does not mean that a secondary research study is any less novel than a primary study. Indeed, many fields and methodologies rely extensively on analyzing existing data. For example, studies that employ discourse analysis and content analysis typically (though not always) rely on existing sources of data to facilitate understanding of language use in real-world situations.

As a result, the choice between primary and secondary research can be seen as more of a practical consideration than a matter of a study's potential contribution to scientific knowledge. Novelty in research is as much about the data collection as it is about the resulting analysis. If you require data for your study where none exists, then data from primary research is your best option.

What is Primary Research? Definitions, Methods, Sources, Examples, and More

What is Primary Research? Primary Research Meaning

Primary research is a cornerstone of insightful, accurate, and effective decision-making in both academic and professional settings. At its core, primary research refers to the process of collecting data directly from sources rather than relying on previously gathered information, distinguishing it clearly from secondary research.

The process of directly controlling the collection of the data is pivotal for ensuring the accuracy and relevance of the analysis, enabling researchers to tailor their investigations to specific research questions or problems put to them by stakeholders. This direct involvement contrasts with secondary research, which often involves analyzing existing data.

Primary research serves as a vital component when seeking answers to your business objectives, filling gaps in existing knowledge and providing new data for analysis. It particularly comes into play when solving research problems that demand a high degree of specificity and relevance.

By leveraging primary research, professionals can uncover unique insights, highly specific to their intended target market, tailored to their industry and unique to their product of interest. This level of tailoring is simply not possible through the use of secondary research.

When to Use Primary Research

Selecting the appropriate research method is a critical decision that hinges on the objectives of the study. Primary research is particularly beneficial in scenarios where recent, specific data is required to address a unique problem or question. This method is ideal for:

Academic Research

In the realm of academic research, primary research is indispensable when fresh insights or novel data are necessary to advance knowledge or contribute to scholarly debates. This type of research is crucial for:

  • Exploring New Theories or Models : When a researcher aims to develop or validate a new theory, firsthand data collection is essential. For instance, a psychologist conducting experiments to test a new cognitive behavioral therapy model for anxiety would rely on primary research to gather data on the therapy's effectiveness directly from participants.
  • Filling Knowledge Gaps : Primary research helps fill gaps in existing literature. If a historian is studying a less-documented cultural practice, interviews and observational studies can provide new insights that no secondary sources could offer.
  • Improving Research Rigor : Utilizing primary data enhances the rigor of academic studies. By collecting and analyzing original data, researchers can draw conclusions with greater validity, offering substantial contributions to their fields.

Market Research

Market research utilizes primary research extensively to understand consumer behaviors, preferences, and trends. This method is particularly advantageous for:

  • Product Development : Before launching a new product, companies often use surveys and focus groups to gather consumer feedback on the product concept, design, and functionality. For example, a beverage company considering a new flavor profile might conduct taste tests and focus groups to refine the product based on direct consumer feedback.
  • Customer Satisfaction : To assess and enhance customer satisfaction, businesses frequently employ primary research methods such as customer satisfaction surveys and in-depth interviews. This allows companies to receive real-time feedback and quickly implement changes to improve customer service.
  • Segmentation and Targeting : Through interviews and surveys, companies can identify customer segments and understand their specific needs and preferences. This segmentation enables more effective targeting of marketing efforts and product customization.

Policy Formulation

Primary research is critical in policy formulation, particularly when policies need to be based on up-to-date and specific data regarding public opinion, needs, and conditions. Primary research methods such as public opinion polls and field observations are commonly used:

  • Understanding Public Needs : Governments and organizations use primary research to gauge public opinion on various issues, from healthcare to urban development. For instance, before implementing a new public transport policy, a city council might conduct surveys to understand residents' preferences and concerns regarding transit options.
  • Evaluating Policy Impact : After a policy is implemented, primary research is used to evaluate its effectiveness. This could involve collecting data on user satisfaction, policy usage, and public perception through direct feedback mechanisms like online polls or public forums.
  • Refining Policies : Continuous primary research is necessary to refine and adjust policies based on direct stakeholder feedback. This dynamic approach ensures that policies remain relevant and effective over time.

In each of these contexts, primary research not only provides the specificity needed for tailored insights but also offers the flexibility to adapt to emerging data and trends, thereby enhancing the overall impact and effectiveness of the research efforts.

Types of Primary Research Methods with Examples

Primary research methods are diverse, each tailored to fit specific study objectives and research environments. These methods enable researchers to gather fresh, firsthand data directly related to their study's focus.

Surveys are structured questionnaires designed to collect data from a target audience. They are used widely due to their versatility in capturing a broad spectrum of information, ranging from customer preferences to behavioral patterns. Surveys can be administered online, in person, or via phone, making them adaptable to various research needs. For instance, a company aiming to gauge customer satisfaction might deploy an online survey to understand the factors influencing their product's user experience. This method allows for quick data collection from a large audience, providing valuable insights into customer sentiment. The volume of respondent data collected via this method also enables analysis via a range of statistical methods, allowing us to understand if the answers we receive are robust, or if there are any hidden patterns which emerge from the data.

One to One Interviews

Interviews involve direct, one-on-one conversations where detailed information is solicited from participants. They are particularly useful for gathering qualitative data, offering deep insights into participants' attitudes, experiences, and emotions. Interviews can be structured, semi-structured, or unstructured, giving researchers flexibility in their approach. Imagine a study exploring the impact of remote work on employee well-being. Conducting semi-structured interviews with employees would offer nuanced understandings of personal experiences, challenges faced, and the overall satisfaction with remote work arrangements. The depth of understanding and information gathered via this process is particularly useful when speaking to participants about difficult or challenging topics of conversation.

Focus Groups

Focus Groups are guided discussions with a small group of participants, typically used to explore new ideas or opinions about products, services, or concepts. This method is invaluable for generating rich, detailed data and for observing the dynamics of participants' interactions and consensus-forming processes. Consider a company developing a new smartphone app. Hosting a focus group session with potential users could unveil insights into user expectations, desired features, and usability concerns, directly influencing the app's development trajectory. Due to the small number of respondents involved in the groups, care must be taken to ensure that you are speaking to a representative sample of your intended audience.

Ethnographic Studies

Ethnographic studies involve watching and recording the behavior of subjects in their natural environment without intervention. This method is critical for studies where interaction with the subject might alter the outcome. For example, a retailer interested in improving store layout might observe shoppers in store to track customer navigation patterns, identifying areas of congestion or overlooked products. Ethnographic studies can uncover important behaviors that respondents themselves may be unaware of, because researchers look for unconscious behaviors that other research methods may miss.

Examples of Primary Sources in Research

Primary research data sources are the lifeblood of firsthand research, providing raw, unfiltered insights directly from the source. These include:

Customer Satisfaction Survey Results: Direct feedback from customers about their satisfaction with a product or service. This data is invaluable for identifying strengths to build on and areas for improvement and typically renews each month or quarter so that metrics can be tracked over time.

NPS Rating Scores from Customers: Net Promoter Score (NPS) provides a straightforward metric to gauge customer loyalty and satisfaction. This quantitative data can reveal much about customer sentiment and the likelihood of referrals.

Ad-hoc Surveys: Ad-hoc surveys can cover any topic that requires investigation. They are typically one-off surveys that zero in on one particular business objective. Ad-hoc projects are useful for situations such as investigating issues identified in other tracking surveys, new product development, ad testing, brand messaging, and many other kinds of projects.

A Field Researcher’s Notes: Detailed observations from fieldwork can offer nuanced insights into user behaviors, interactions, and environmental factors that influence those interactions. These notes are a goldmine for understanding the context and complexities of user experiences.

Recordings Made During Focus Groups: Audio or video recordings of focus group discussions capture the dynamics of conversation, including reactions, emotions, and the interplay of ideas. Analyzing these recordings can uncover nuanced consumer attitudes and perceptions that might not be evident in survey data alone.
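To make one of these sources concrete, the sketch below shows how NPS rating scores are commonly turned into a single metric. It assumes the conventional scoring, in which ratings of 9 or 10 count as promoters and ratings of 0 to 6 count as detractors, and the score is the percentage of promoters minus the percentage of detractors. The function name and the sample ratings are hypothetical and used only for illustration.

```python
def net_promoter_score(ratings):
    """Return NPS (from -100 to 100) for a list of 0-10 ratings.

    Assumes the conventional cut-offs: 9-10 promoter, 0-6 detractor.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7-8) count toward the total but not toward either group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical responses to "How likely are you to recommend us?"
sample_ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(sample_ratings))  # 5 promoters, 3 detractors -> 20.0
```

Tracking this figure across survey waves, rather than reading a single value in isolation, is usually what makes it informative.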

Through these examples, it's clear that each primary research method and source serves a distinct purpose, providing unique insights that are crucial for informed decision-making and strategic planning in various contexts.

Advantages and Disadvantages of Primary Research

Primary research, characterized by its ability to gather firsthand information directly from the source, plays a crucial role in the landscape of research methodologies. Despite its invaluable contributions to the acquisition of new, tailored data, primary research comes with its own set of advantages and disadvantages. Understanding these can help researchers and organizations make informed decisions when planning their research strategies.

Advantages of Primary Research

  • Specificity and Relevance : Primary research allows for the collection of data specifically tailored to the research questions or objectives. This targeted approach ensures that the information gathered is highly relevant and directly applicable to the matter at hand, providing clear insights and facilitating informed decision-making.
  • Control Over Data Quality : When conducting primary research, the researcher has complete control over the quality of data collected. This includes the design of the research method, the selection of participants, and the timing of data collection, all of which contribute to the reliability and validity of the research outcomes.
  • Up-to-Date Information : One of the key strengths of primary research is its ability to produce the most current data possible. This is particularly important in fast-moving sectors where timely information can provide a competitive edge or in academic studies where recent data can lead to groundbreaking conclusions.
  • Proprietary Information : The data collected through primary research is exclusive to the researcher or the commissioning organization. This proprietary nature of the data can offer a strategic advantage, especially in commercial contexts where unique insights can differentiate a company from its competitors.
  • Flexibility : Primary research methods are highly flexible, allowing researchers to adjust their approach based on preliminary findings or to explore unexpected avenues. This adaptability can lead to more comprehensive and nuanced understandings of the research topic.

Disadvantages of Primary Research

  • Cost : Conducting primary research is often expensive due to the costs associated with designing and implementing the study, recruiting participants, and collecting and analyzing data. These expenses can be prohibitive for some organizations or individual researchers.
  • Time : Primary research can be time-consuming, from the initial planning stages through to data collection and data analysis . This extended timeline may not be suitable for projects with tight deadlines or where quick decisions are needed.
  • Complexity : Designing and conducting primary research requires a certain level of expertise to ensure that the data collected is valid, reliable, and relevant. This complexity can pose challenges, particularly for those without extensive research experience.
  • Sample Size and Representativeness : Achieving a sample that is both large enough to support statistically reliable estimates and representative of the broader population can be challenging. Missteps in this area can lead to skewed data and potentially unreliable conclusions.
  • Bias : Despite efforts to minimize bias in research design and implementation, primary research is vulnerable to biases introduced by the researcher, participants, or the research context itself. These biases can affect the objectivity and accuracy of the findings.
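The sample size concern above can be made more tangible with a rough calculation. The sketch below uses Cochran's formula for estimating a proportion, with an optional finite population correction. It is a planning aid under simple random sampling assumptions, not a substitute for a full sampling design, and the function name and default values are illustrative choices.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95,
                         proportion=0.5, population=None):
    """Estimate respondents needed to measure a proportion.

    Uses Cochran's formula n0 = z^2 * p * (1 - p) / e^2 and, if a
    population size is given, the finite population correction.
    proportion=0.5 is the most conservative (largest sample) assumption.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Plus or minus 5 points at 95% confidence, large population.
print(required_sample_size(0.05))
# Same precision when the whole population is only 2,000 people.
print(required_sample_size(0.05, population=2000))
```

Note that a large enough sample addresses precision only. It does not correct for a sample that is unrepresentative of the population.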

In conclusion, primary research is a valuable part of any researcher's toolkit, offering detailed, specific insights that are directly relevant to the research question. However, the decision to undertake primary research should be weighed against the potential costs, time requirements, and complexities involved.

Tutorial: Evaluating Information: Primary vs. Secondary Articles

Primary vs. Secondary Research Articles

In the sciences,  primary (or empirical) research articles :

  • are original scientific reports of new research findings (Please note that an original scientific article does not include review articles, which summarize the research literature on a particular subject, or articles using meta-analyses, which analyze pre-published data.)
  • usually include the following sections: Introduction , Methods , Results , Discussion, References
  • are usually  peer-reviewed (examined by expert(s) in the field before publication). Please note that a peer-reviewed article is not the same as a review article, which summarizes the research literature on a particular subject

You may also choose to use some secondary sources (summaries or interpretations of original research) such as books (find these through the library catalog) or review articles (articles which organize and critically analyze the research of others on a topic). These secondary sources, particularly review articles, are often useful and easier-to-read summaries of research in an area. Additionally, you can use the listed references to find useful primary research articles.

Anatomy of a Scholarly Article

[Figure: scholarly article anatomy, from NCSU Libraries' Anatomy of a Scholarly Article]

Types of health studies

In the sciences, particularly the health sciences, there are a number of types of primary articles (the gold standard being randomized controlled trials ) and secondary articles (the gold standard being systematic reviews and meta-analysis ). The chart below summarizes their differences and the linked article gives more information.

[Figure: chart of health study types]

Searching for Primary vs. Secondary Articles

[Figure: primary or secondary article search]

Some scholarly databases will allow you to specify what kind of scholarly literature you're looking for. However, be careful! Sometimes, depending on the database, the Review article type may mean book review instead of, or as well as, review article. You may also have to look under more or custom options to find these choices.

  • Last Updated: Sep 6, 2024 12:35 PM
  • URL: https://guides.library.cornell.edu/evaluate

Primary Research: What It Is, Purpose & Methods + Examples

As we continue exploring the world of research, we’ll come across two approaches to data: primary and secondary. This article will focus on primary research: what it is, how it’s done, and why it’s essential.

We’ll discuss the methods used to gather first-hand data and examples of how it’s applied in various fields. Get ready to discover how this research can be used to solve research problems , answer questions, and drive innovation.

What is Primary Research: Definition

Primary research is a methodology researchers use to collect data directly rather than depending on data collected in previous research. Technically, they “own” the data. Primary research is carried out to address a specific problem that requires in-depth analysis.

There are two forms of research:

  • Primary Research
  • Secondary Research

Businesses or organizations can conduct primary research themselves or employ a third party to conduct it. One major advantage of primary research is that it is “pinpointed”: the research focuses only on a specific issue or problem and on obtaining related solutions.

For example, suppose a brand is about to launch a new mobile phone model and wants to research the looks and features it will soon introduce.

Organizations can select a qualified sample of respondents closely resembling the population and conduct primary research with them to know their opinions. Based on this research, the brand can now think of probable solutions to make necessary changes in the looks and features of the mobile phone.
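One common way to get a sample that closely resembles the population is proportional stratified allocation, where each segment receives a share of the sample equal to its share of the population. The sketch below illustrates the arithmetic only. The age bands and counts are hypothetical, and the largest-remainder rounding used to make the quotas add up exactly is one reasonable choice among several.

```python
def proportional_allocation(population_counts, sample_size):
    """Split sample_size across strata in proportion to population size.

    population_counts maps a stratum label to its population count.
    Fractional quotas are rounded with the largest-remainder method so
    the quotas always sum to sample_size.
    """
    total = sum(population_counts.values())
    if total <= 0 or sample_size < 0:
        raise ValueError("population must be positive and sample_size non-negative")

    exact = {k: sample_size * v / total for k, v in population_counts.items()}
    quotas = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = sample_size - sum(quotas.values())

    # Hand the remaining slots to the strata with the largest fractions.
    by_fraction = sorted(exact, key=lambda k: exact[k] - quotas[k], reverse=True)
    for k in by_fraction[:leftover]:
        quotas[k] += 1
    return quotas


# Hypothetical customer base by age band, and a 200-person study.
customers = {"18-34": 3000, "35-54": 4500, "55+": 2500}
print(proportional_allocation(customers, 200))
```

Within each stratum, respondents would still need to be recruited at random for the resulting sample to resemble the population.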

Primary Research Methods with Examples

In this technology-driven world, meaningful data is more valuable than gold. Organizations or businesses need highly validated data to make informed decisions. This is the very reason why many companies are proactive in gathering their own data so that the authenticity of data is maintained and they get first-hand data without any alterations.

Here are some of the primary research methods organizations or businesses use to collect data:

1. Interviews (telephonic or face-to-face)

Conducting interviews is a qualitative research method to collect data and has been a popular method for ages. These interviews can be conducted in person (face-to-face) or over the telephone. Interviews are an open-ended method that involves dialogues or interaction between the interviewer (researcher) and the interviewee (respondent).

Face-to-face interviews are said to generate a better response from respondents because they are a more personal approach. However, their success depends heavily on the researcher’s ability to ask questions and on their past experience of conducting such interviews. The questions used in this type of research are mostly open-ended questions. These questions help to gain in-depth insights into the opinions and perceptions of respondents.

Personal interviews usually last up to 30 minutes or even longer, depending on the subject of research. If a researcher is running short of time, telephone interviews can also be a helpful way to collect data.

2. Online surveys

Surveys were once conducted with pen and paper, but they have come a long way since then. Today, most researchers use online surveys to gather information from respondents. Online surveys are convenient and can be sent by email or filled out on a website. They can be accessed on handheld devices such as smartphones and tablets.

Once a survey is deployed, respondents are given a stipulated amount of time to answer the survey questions and send them back to the researcher. To get the most information from respondents, surveys should have a good mix of open-ended and closed-ended questions. The survey should not be lengthy, because respondents lose interest and tend to leave long surveys half-done.

It is good practice to reward respondents who complete a survey for their time, effort, and valuable information. Many organizations or businesses give away gift cards from reputable brands that respondents can redeem later.

3. Focus groups

This popular research technique is used to collect data from a small group of people, usually restricted to 6-10 participants. A focus group brings together people who are experts in the subject matter for which research is being conducted.

A focus group has a moderator who stimulates discussion among the members to get greater insights. Organizations and businesses can use this method especially to identify niche markets and to learn about a specific group of consumers.

4. Observations

In this primary research method, there is no direct interaction between the researcher and the person/consumer being observed. The researcher observes the reactions of a subject and makes notes.

Trained observers or cameras are used to record reactions. Observations are noted in a predetermined situation. For example, a bakery brand that wants to know how people react to its new biscuits can have an observer note consumers’ first reactions and then evaluate the collected data to draw inferences.

Primary Research vs Secondary Research – The Differences

Primary and secondary research are two distinct approaches to gathering information, each with its own characteristics and advantages. 

While primary research involves conducting surveys to gather firsthand data from potential customers, secondary market research is utilized to analyze existing industry reports and competitor data, providing valuable context and benchmarks for the survey findings.

Find out more details about the differences: 

1. Definition

  • Primary Research: Involves the direct collection of original data specifically for the research project at hand. Examples include surveys, interviews, observations, and experiments.
  • Secondary Research: Involves analyzing and interpreting existing data, literature, or information. This can include sources like books, articles, databases, and reports.

2. Data Source

  • Primary Research: Data is collected directly from individuals, experiments, or observations.
  • Secondary Research: Data is gathered from already existing sources.

3. Time and Cost

  • Primary Research: Often time-consuming and can be costly due to the need for designing and implementing research instruments and collecting new data.
  • Secondary Research: Generally more time and cost-effective, as it relies on readily available data.

4. Customization

  • Primary Research: Provides tailored and specific information, allowing researchers to address unique research questions.
  • Secondary Research: Offers information that is pre-existing and may not be as customized to the specific needs of the researcher.

5. Control

  • Primary Research: Researchers have control over the research process, including study design, data collection methods, and participant selection.
  • Secondary Research: Limited control, as researchers rely on data collected by others.

6. Originality

  • Primary Research: Generates original data that hasn’t been analyzed before.
  • Secondary Research: Involves the analysis of data that has been previously collected and analyzed.

7. Relevance and Timeliness

  • Primary Research: Often provides more up-to-date and relevant data or information.
  • Secondary Research: This may involve data that is outdated, but it can still be valuable for historical context or broad trends.

Advantages of Primary Research

Primary research has several advantages over other research methods, making it an indispensable tool for anyone seeking to understand their target market, improve their products or services, and stay ahead of the competition. So let’s dive in and explore the many benefits of primary research.

  • One of the most important advantages is that the data collected is first-hand, so there is no dilution of data. This research method can also be customized to suit an organization’s or business’s own requirements and needs.
  • It focuses mainly on the problem at hand, which means entire attention is directed to finding probable solutions to a pinpointed subject matter. Primary research allows researchers to go in-depth about a matter and study all foreseeable options.
  • Data collected can be controlled. It gives a means to control how data is collected and used. It’s up to the discretion of the businesses or organizations collecting the data how best to use it to get meaningful research insights.
  • It is a time-tested method, so one can generally rely on the results obtained from conducting this type of research.

Disadvantages of Primary Research

While primary research is a powerful tool for gathering unique and firsthand data, it also has its limitations. As we explore the drawbacks, we’ll gain a deeper understanding of when primary research may not be the best option and how to work around its challenges.

  • One of the major disadvantages of primary research is it can be quite expensive to conduct. One may be required to spend a huge sum of money depending on the setup or primary research method used. Not all businesses or organizations may be able to spend a considerable amount of money.
  • This type of research can be time-consuming. Conducting interviews and sending and receiving online surveys can be quite an exhaustive process and require investing time and patience for the process to work. Moreover, evaluating results and applying the findings to improve a product or service will need additional time.
  • Sometimes, just using one primary research method may not be enough. In such cases, the use of more than one method is required, and this might increase both the time required to conduct research and the cost associated with it.

All research is conducted with a purpose. Primary research is conducted by organizations or businesses to stay informed of ever-changing market conditions and consumer perception. Excellent customer satisfaction (CSAT) has become a key goal and objective of many organizations.

A customer-centric organization knows the importance of providing exceptional products and services to its customers to increase customer loyalty and decrease customer churn. Organizations collect and analyze data through primary research to draw well-supported results and conclusions. Using this information, organizations are able to make informed decisions based on real, data-driven insights.

QuestionPro is a comprehensive survey platform that can be used to conduct primary research. Users can create custom surveys and distribute them to their target audience , whether it be through email, social media, or a website.

QuestionPro also offers advanced features such as skip logic, branching, and data analysis tools, making collecting and analyzing data easier. With QuestionPro, you can gather valuable insights and make informed decisions based on the results of your primary research. Start today for free!

Finding and Using Primary Resources

Primary sources are those created contemporaneously to whatever period a researcher is studying. In contrast to secondary sources, they don't provide any analysis on a given topic after the fact; instead, they reflect on information or events as they unfolded (for example, a newspaper article, from the time of a particular historical event, discussing the historical event as it happened). Primary sources are especially useful for researchers because they reveal how certain topics and ideas were understood during a specific time and place. The particular primary sources you might use in your research, as well as how you find them, can vary a lot based on your field of study. This guide aims to provide helpful information on where to search for primary sources.

What is a primary resource?

Each academic discipline or field defines and uses primary sources differently. Therefore, the definition of a primary source is contextual and dependent on that specific discipline or field of inquiry. Furthermore, any definition of primary sources also includes distinguishing them from secondary sources. Some disciplines also use the term tertiary sources, which typically compile or summarize both primary and secondary sources.

The humanities and the arts define primary sources as texts, images, artifacts, and architecture (any material) that convey the experience of life at the time they are from.

The sciences define primary sources as original research. The social sciences define primary sources similarly to both the humanities and the sciences, and also include author-created data or evidence. The definition depends on the nature of the inquiry and the research methodology.

The  health sciences  define primary sources as original research.

Examples of Primary Resources

Walden by Henry David Thoreau, in an edition published in 2016, is a primary resource because the text was first published in 1854 and offers insight into life in rural Massachusetts in the mid-19th century.

Rembrandt van Rijn, The Anatomy Lesson of Dr Nicolaes Tulp, 1632. This painting is a good visual example of medical history in 17th-century Holland.

Ledgers of imports and exports, 1731, held by The National Archives, Kew Gardens. This is a digital scan of an original ledger of imports and exports to London in 1731. This can give us a general idea of what trade looked like in 18th-century England.

Tapestry Room from Croome Court, various artists/makers, 1763–71, Metropolitan Museum of Art. This room was designed in 1763–1771. Around 1902 the ninth Earl sold the tapestries and seating to a Parisian dealer. The Samuel H. Kress Foundation purchased the ceiling, floor, chimneypiece, chair rails, doors and door surrounds in 1949; they were donated to the Metropolitan Museum of Art, New York, in 1958. This room provides insight into what an 18th-century country house room might have looked like and helps historians understand domestic life.

  • Last Updated: Sep 5, 2024 1:50 PM
  • URL: https://research.lesley.edu/c.php?g=1400378

Introduction to Special Collections & Archives: Primary vs. Secondary Sources

Primary, Secondary, & Archival Sources

Primary Sources

Primary sources are written or created by people who actually experienced or witnessed an event . This can take the form of scientific data that the author collected themselves, like the U.S. census or data collected during an experiment or study. Primary sources also include qualitative forms, like what people say, do, and experience. These sources can take various forms like written, audio, video, or photographic.

Archival Sources are primary sources that have been created during the course of everyday life and have enduring value as evidence of the past. This enduring value and the ways archives are organized vary by the preserving institution. Archives tend to be organized and labeled differently than other primary sources or secondary sources that can be found in a library. Rather than being grouped by topic, archival materials are grouped by creator in as close to the creator’s organization as possible.

Primary/Archival sources include : speeches and interviews; autobiographies, journals/diaries, letters/emails, blogs, social media, government documents, etc.

Note : Newspapers and magazines could be any of these types of sources, depending on how they are being viewed and used. Both newspapers and magazines contain articles and images of events, and could contain interviews. If the journalist is considered a witness to the event, then they have created a primary source. If this source is deemed to have enduring value, it could become an archival source.

If, however, the article is a commentary or editorial, and the journalist is not considered a witness, then they have created a secondary source.

Secondary Sources

Secondary sources are the types of material most people are familiar with. They are interpretations or analyses of events the author did not personally experience, often based on others' writings.

Secondary sources include : scholarly books and articles, textbooks, commentaries, encyclopedias, etc.

Tertiary Sources and Beyond

Tertiary sources are even further removed from the original event or source of data. These are works that primarily reference secondary sources.

Tertiary sources include: encyclopedias, literature reviews

  • Last Updated: Sep 4, 2024 4:36 PM
  • URL: https://utahtech.libguides.com/specialcollections_archives

  • Open access
  • Published: 06 September 2024

Primary succession of Bifidobacteria drives pathogen resistance in neonatal microbiota assembly

  • Yan Shao   ORCID: orcid.org/0000-0002-8662-0504 1 ,
  • Cristina Garcia-Mauriño 2 ,
  • Simon Clare 1 ,
  • Nicholas J. R. Dawson 1 ,
  • Andre Mu   ORCID: orcid.org/0000-0002-0853-9743 1 ,
  • Anne Adoum 1 ,
  • Katherine Harcourt 1 ,
  • Junyan Liu 1 ,
  • Hilary P. Browne   ORCID: orcid.org/0000-0002-1305-2470 1 ,
  • Mark D. Stares 1 ,
  • Alison Rodger 2 ,
  • Peter Brocklehurst 3 ,
  • Nigel Field 2 &
  • Trevor D. Lawley   ORCID: orcid.org/0000-0002-4805-621X 1  

Nature Microbiology ( 2024 ) Cite this article


  • Metagenomics
  • Microbial ecology

Human microbiota assembly commences at birth, seeded by both maternal and environmental microorganisms. Ecological theory postulates that primary colonizers dictate microbial community assembly outcomes, yet such microbial priority effects in the human gut remain underexplored. Here using longitudinal faecal metagenomics, we characterized neonatal microbiota assembly for a cohort of 1,288 neonates from the UK. We show that the pioneering neonatal gut microbiota can be stratified into one of three distinct community states, each dominated by a single microbial species and influenced by clinical and host factors, such as maternal age, ethnicity and parity. A community state dominated by Enterococcus faecalis displayed stochastic microbiota assembly with persistent high pathogen loads into infancy. In contrast, community states dominated by Bifidobacterium , specifically B. longum and particularly B. breve , exhibited a stable assembly trajectory and long-term pathogen colonization resistance, probably due to strain-specific functional adaptions to a breast milk-rich neonatal diet. Consistent with our human cohort observation, B. breve demonstrated priority effects and conferred pathogen colonization resistance in a germ-free mouse model. Our findings solidify the crucial role of Bifidobacteria as primary colonizers in shaping the microbiota assembly and functions in early life.


Human gut microbiota colonization commences immediately at birth when neonates are exposed to microorganisms from the surrounding environment and maternal sources (for example, gut 1 , 2 , 3 , 4 , 5 , vagina 2 , 3 , 4 , skin 3 , 4 , breast milk 3 , 6 ). We recently reported in the UK Baby Biome Study (BBS) that maternal transmission of primary colonizers, such as commensal Bifidobacterium and Bacteroides species, is disrupted in caesarean-section (CS) and antibiotic-exposed births, instead predisposing the neonatal gut microbiota (NGM) to colonization by antibiotic resistant healthcare-associated pathogens 1 . This observation suggests the possibility of ‘priority effects’ in human gut microbiota assembly, which posits the arrival order of primary colonizer species determines the outcome of the microbiota assembly during a primary ecological succession (from sterile to complex communities) 7 , 8 . The NGM represents the earliest window of opportunity for intervention with probiotics or prebiotics to prevent or restore impaired microbiota development. However, little is known about the ecological priority effects in the NGM assembly due to a lack of high-resolution, longitudinal human microbiome data from the neonatal period (that is, the first month of life).

To comprehensively examine NGM assembly dynamics, we expanded on phase 1 of our BBS cohort (BBS1) 1 , 9 with an additional 688 neonatal participants (primarily day 7) in phase 2 (BBS2), effectively doubling our sampling effort. A large-scale, longitudinal metagenomic characterization of the combined BBS dataset, comprising 2,387 gut microbiota samples from 1,288 healthy UK neonates (≤1 month), enabled us to study neonatal microbiota assembly with unparalleled scale and resolution (Extended Data Fig. 1a,b and Supplementary Tables 1 – 3 ). To investigate the origin and both short-term and long-term stability of the NGM primary colonizers, we utilized three subgroups from the expanded BBS2 cohort. These included (1) 183 neonate–mother pairs (representing 14% of participants), referred to as investigating ‘maternal transmission’; (2) 359 participants with longitudinal sampling within the neonatal period (median = 3 samples per participant on days 4, 7 and 21; representing 28% of participants), referred to as investigating ‘neonatal longitudinal colonization’; and (3) 302 participants with paired samples taken both in the neonatal period and later in infancy (at 8.75 ± 1.98 months; representing 23% of participants), referred to as investigating ‘infancy persistence’ (Extended Data Fig. 1c ).

Complementing the increased sample size, we have also updated extensive, high-quality clinical and sociodemographic metadata harmonized from BBS clinical record forms and hospital electronic records (Methods), thereby facilitating robust statistical and epidemiological assessment of primary succession patterns. Most neonates in this cohort (84.5%, N  = 836) were at least partially breastfed by their mothers, with 44.1% being exclusively breastfed ( N  = 436). A large majority of participants at the time of infancy sampling were still being breastfed (86.2%; N  = 199), with very few fully weaned (0.87%; N  = 2). Only 11.3% ( N  = 123) received postnatal antibiotics during the first week of life (Supplementary Table 4 ).

Three community states in the neonatal gut microbiota

To delineate the primary succession patterns of the NGM, we sought to identify the primary colonizers driving gut microbial community structure during the neonatal period. Applying partitioning around medoids (PAM) clustering to 1,904 BBS neonatal gut metagenomes at the species level revealed an optimal clustering of three within the NGM, hereafter referred to as ‘NGM community states’ 10 (Fig. 1a and Extended Data Fig. 2a,b ). These three community states were further validated by another widely used microbial community typing method: the Dirichlet multinomial mixture (DMM) modelling framework (Extended Data Fig. 2c,d ). Both the PAM and DMM-based approaches showed strong concordance in community state assignments (Cramér’s V correlation of 0.726; Extended Data Fig. 2e ) and core species compositions (Extended Data Fig. 2f ). Notably, these three community states were consistently observed across the three main sampling points in the BBS cohort (days 4, 7 and 21), underscoring their representativeness of the neonatal period, irrespective of the timing of sample collection (Extended Data Fig. 3 ).
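The concordance statistic quoted above, Cramér's V, can be illustrated with a minimal sketch. It assumes the standard uncorrected formula V = sqrt(chi2 / (n * (k - 1))), where k is the smaller table dimension; whether the authors applied a bias correction is not stated here. The function name and the cross-tabulations of PAM versus DMM assignments are hypothetical and are not the study's data.

```python
from math import sqrt


def cramers_v(table):
    """Uncorrected Cramér's V for an r x c contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two assignments
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical tables: rows = PAM state, columns = DMM state
perfect_agreement = [[50, 0, 0], [0, 30, 0], [0, 0, 20]]
no_association = [[10, 10], [10, 10]]

print(cramers_v(perfect_agreement))
print(cramers_v(no_association))
```

V ranges from 0 (no association) to 1 (identical partitions), so the reported 0.726 sits toward the strong end of that scale.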

Fig. 1

a , Principal coordinates analysis (PCoA) plots of 1,904 neonatal gut metagenomes sampled within the first 30 days of life and clustered using the PAM algorithm on the basis of species-level JSD. Three distinct NGM community states (optimal number clusters k  = 3) were identified via PAM clustering. The inset pie chart displays the proportion of the three NGM community states, each labelled according to its primary driver species, namely B. breve (BB, green; N  = 336, 17.6% of the samples), E. faecalis (EF, purple; N  = 827, 43.4% of the samples) and B. longum (BL, orange; N  = 741, 38.9% of the samples). Ellipses encapsulate 67% of the samples within each respective cluster. b , Top 10 driver species contributing to variation observed in the ordination space, as ranked by effect size (‘envfit’ R 2 , false discovery rate (FDR)-corrected two-sided test, P  < 0.05). c , d , Each NGM community state is dominated by a single driver species, as measured by the high relative abundance of the driver species ( c ) and the low alpha diversity ( d ) across the three NGM community states (FDR-corrected, two-sided Wilcoxon test). Boxplot centre line and red point indicate the median and mean, respectively; box limits indicate the upper and lower quartiles; and whiskers indicate 1.5× the interquartile range (BB n  = 336, EF n  = 827, BL n  = 741).
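The caption states that clustering was based on species-level JSD (Jensen-Shannon divergence). As a rough sketch of that quantity, the function below computes the base-2 JSD between two relative-abundance profiles. The function name and the example profiles are hypothetical, and it is an assumption that the divergence itself (rather than its square root, the Jensen-Shannon distance) was the input to PAM.

```python
from math import log2


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two relative-abundance vectors; bounded in [0, 1]."""

    def kl(a, b):
        # Kullback-Leibler divergence; zero-abundance terms contribute nothing
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    # Midpoint distribution between the two profiles
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical three-species profiles (each sums to 1)
sample_a = [0.6, 0.3, 0.1]
sample_b = [0.1, 0.3, 0.6]

print(jensen_shannon_divergence(sample_a, sample_a))      # identical profiles
print(jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0]))  # no shared species
print(jensen_shannon_divergence(sample_a, sample_b))
```

Identical profiles give 0 and profiles with no shared species give 1, which is what makes the measure convenient as a pairwise dissimilarity for clustering.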

Three bacterial species, Bifidobacterium longum subsp. longum (BL), Bifidobacterium breve (BB) and Enterococcus faecalis (EF), acted as the taxonomic drivers of the three community states (Fig. 1b and Extended Data Figs. 2g and 4). Each species dominated its respective NGM community state, with a mean relative abundance of 56.5% for BB, 21.7% for EF and 27.2% for BL (Fig. 1c). Henceforth, they are referred to as NGM driver species, with the acronyms indicating the respective community states.

The observed single-species dominance of either B. breve , B. longum or E. faecalis in very early life can also be consistently observed in other cohorts, albeit underreported owing to the previous undersampling during the neonatal period (the largest sample size being <100). Evidence for this comes from diverse populations and methodologies, including 16S gene or quantitative PCR (qPCR)-based observations in Norway 11 ( N  = 87) and Denmark 12 ( N  = 16), as well as shotgun metagenomic surveys of neonates across industrialized urban populations similar to the UK BBS cohort in Europe (Sweden 13 ), Asia (Israel 14 ) and North America (the TEDDY cohort 15 , 16 ) (Extended Data Fig. 5 ). Importantly, the NGM community states observed across industrialized cohorts are paralleled in non-industrialized populations. In a peri-urban cohort in South Asia (Bangladesh 17 ), although B. breve continues to be a primary NGM driver species, the community states typically driven by B. longum and E. faecalis in industrialized settings are instead represented by closely similar species: B. infantis (closely related to B. longum ) and Escherichia coli (sharing facultative anaerobic and opportunistic pathogenic traits with E. faecalis ). Collectively, these cross-study validations strengthen the generalizability of our results in neonatal populations from different geographical regions and lifestyles beyond the UK, and using different methodologies.

Of note, B. longum subsp. infantis ( B. infantis ), which is closely related to BL and often used as an infant probiotic, was not identified as a driver species. It was rarely detected (~2% prevalence based on 0.5% relative abundance) in the BBS neonates 14 . The near absence of B. infantis in our UK neonatal cohort aligns with findings from other Western industrialized countries, including a recent meta-analysis 14 of cohorts from Israel, Sweden, Finland, Estonia, Italy and the USA 18 , where there is little evidence of B. infantis naturally colonizing the gut microbiota of healthy, full-term infants. This underscores the importance of distinguishing between closely related species that exhibit very different host colonization patterns.

Applying metagenomic strain tracking analysis on the ‘maternal transmission’ subset, only B. longum exhibited evidence of maternal transmission, with all evaluable BL neonates (15 out of 15) harbouring the exact same B. longum strain found in their mothers’ gut microbiota. This result, consistent with a recent global meta-analysis 19 , strongly indicates the maternal gut microbiota as the main source of the BL community state (Extended Data Fig. 6 ). While we could have overlooked maternal transmission of very low-abundance B. breve and E. faecalis below the metagenomic strain detection limit, we consider it more likely that they originate from unsampled maternal (for example, B. breve in breast milk 20 , 21 ) or environmental sources (for example, E. faecalis in the hospital birth environment 22 , 23 ) previously implicated as potential sources of these species in the NGM.

The dominance of a single driver species was particularly pronounced in community state BB, in which B. breve constituted over half of the NGM by mean relative abundance and which exhibited the lowest microbial richness and evenness, as reflected by the alpha (Shannon) diversity (Fig. 1d). In comparison, the other two NGM community states, BL and EF, had higher microbial diversity, and other moderately abundant species frequently co-occurred with the driver species (Extended Data Fig. 2f,g): B. longum with commensal E. coli, Bacteroides and other Bifidobacterium species; E. faecalis with environment- and skin-associated Streptococcus and Staphylococcus spp., as well as the healthcare-associated opportunistic pathogens Enterococcus, Klebsiella and Enterobacter spp. and C. perfringens. Notably, these less-dominant species in EF are also known signatures of hospital CS birth, not only in this UK cohort 1 but also in cohorts from North America 24, 25, Latin America 24 and Europe 13, 24, 26.
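To make the link between single-species dominance and low alpha diversity concrete, here is a minimal sketch of the Shannon index, H = -sum(p_i * ln p_i). The function name and both example communities are hypothetical and only illustrate the direction of the effect; the logarithm base used in the study is not stated here, so natural log is an assumption.

```python
from math import log


def shannon_diversity(abundances):
    """Shannon index (natural log) from raw or relative species abundances."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return sum(-p * log(p) for p in proportions)


# Hypothetical communities with the same five species
dominated = [0.80, 0.05, 0.05, 0.05, 0.05]  # one species holds most of the reads
even = [0.20, 0.20, 0.20, 0.20, 0.20]       # all species equally abundant

print(shannon_diversity(dominated))
print(shannon_diversity(even))
```

With the same richness, the dominated community scores lower than the even one, and the even community reaches the maximum ln(5) for five species.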

Factors influencing the acquisition of the NGM community states

To determine the perinatal factors influencing the acquisition of each NGM community state, we performed epidemiological analyses using 20 high-quality clinical and sociodemographic metadata variables ( N  = 1,108 eligible participants; Fig. 2 and Supplementary Table 5 ). After adjusting for potential confounders in multivariate fixed-effect logistic regression models, we found that the acquisition of an EF community state was independently associated with being born via CS birth (compared to vaginal delivery (VD); adjusted odds ratio (AOR) = 2.30 [95% CI 1.34–3.95], P  = 0.003; 70.5/23.6/40.0% among EF/BL/BB, respectively) and with the mother receiving intrapartum antibiotics during labour (AOR = 3.69 [95% CI 2.11–6.42], P  < 0.001; 80.8/32.7/46.3% among EF/BL/BB, respectively). Conversely, being born via CS birth and labour antibiotics exposure were negatively associated with BL acquisition (AOR for CS vs VD = 0.36 [95% CI 0.21–0.64], P  < 0.001; AOR for receiving antibiotics during labour = 0.46 [95% CI 0.26–0.79], P  = 0.005, respectively).
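The adjusted odds ratios above come from multivariate logistic regression, which is not reproduced here. As a simpler illustration of how an odds ratio and its 95% confidence interval are read, the sketch below computes an unadjusted odds ratio with a Wald interval from a 2x2 table. The function name and the counts are hypothetical and are not taken from the cohort.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: exposure = CS birth, outcome = EF community state
odds_ratio, lower, upper = odds_ratio_wald_ci(a=30, b=20, c=20, d=30)
print(f"OR = {odds_ratio:.2f} [95% CI {lower:.2f}-{upper:.2f}]")
```

An interval that excludes 1 indicates an association at roughly the 5% level. An adjusted estimate can differ substantially from this crude one once confounders are included in the model.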

Fig. 2

a – c , Multivariate associations between clinical and sociodemographic variables and each week-1 NGM community state. Three different models were built: EF vs non-EF ( a ), BL vs non-BL ( b ) and BB vs non-BB ( c ). Likelihood ratio tests (two-sided) were used to calculate P values (without FDR correction), with P  ≤ 0.05 in the multivariate models displayed. Odds ratios (OR) are plotted on a log 10 scale. For details of univariate and multivariate analyses, refer to Supplementary Tables 5 and 6 . The week-1 NGM community state was identified for each eligible participant using the earliest available sample from week 1, either on day 4 ( N  = 64) or day 7 ( N  = 1,044).

Interestingly, several intrinsic host factors, including sex (male with BB), maternal ethnicity (Asian with EF and BB), maternal age (<30 and ≥40 with EF and BL, respectively) and parity (first time giving birth with EF), were also independently associated with specific community states. For example, neonates born to mothers identifying as Asian (compared with white participants) were more likely to acquire BB (AOR = 2.11 [95% CI 1.32–3.38], P = 0.006) but less likely to acquire EF (AOR = 0.63 [95% CI 0.41–0.95], P = 0.04; 9.0/12.1/19.5% among EF/BL/BB, respectively). It is noteworthy that BB is the only community state that was influenced exclusively by host factors and was independent of any clinical factors, including mode of birth and antibiotics, which may suggest a distinct route of BB acquisition that remains unaffected by the perturbations associated with hospital births. These observations align with the hypotheses that maternal factors, such as genetic determinants of breast milk composition (for example, secretor status of the mothers) 27, a history of previous pregnancies or cohabitation with children 28, as well as cross-cultural differences in infant-care-associated behaviours 7, may influence the vertical transmission of maternal microbiota.

Neither postnatal antibiotics nor breastfeeding exposure, whether immediately after birth or within the first week of life, appeared to predispose neonates to any specific community state. This lack of association is probably attributed to the uniformly high-levels of antibiotic-free status (84.7/90.8/89.5% among EF/BL/BB, respectively) and breastfeeding rates (79.1/81.8/88.6% among EF/BL/BB, respectively) during the earliest postnatal window sampled in this cohort. The absence of an association between breastfeeding and EF also aligns with previous reports that, despite its antimicrobial properties, breast milk alone does not inhibit E. faecalis growth in vitro 29 , 30 .

Priority effects in NGM community state stability

We reasoned that the three primary colonizers as NGM drivers could benefit from priority effects, which would be evident through the exclusion of, or replacement by, later-arriving species in the NGM. To search for evidence of such priority effects, we sought to examine the stability and temporal signals of both the NGM community states and their driver species in the ‘neonatal longitudinal’ subset, stratified by birth modes. Most VD neonates who initially acquired a Bifidobacterium -dominated community state (either 92% for BB or 89% for BL, 79% or 72% by considering transient switches between day 4 and 7) during week 1 retained their community state when resampled in week 3 (Fig. 3a and Extended Data Fig. 7a ). By contrast, EF was the most unstable community state, with less than half of the neonates (29% in VD and 39% in CS) remaining in their early EF community state during the neonatal period (EF vs BB AOR 16.2 [95%CI 3.84–68.10], EF vs BL AOR 13.89 [4.02–48.02]; P  < 0.001; Supplementary Table 6 ). Irrespective of birth mode, BB proved more stable than EF (pairwise chi-square test, corrected P  < 0.001), while the sample size was insufficient to be confident about the relative stability of BL in CS neonates (65% versus 48% for EF; pairwise chi-square test, corrected P  = 0.52).

Fig. 3

a , b , Stability of NGM community states ( a ) and levels of three species driving NGM community states ( b ) (week 1, based on the earlier sample of day 4 or 7) in neonates longitudinally sampled from weeks 1 to 3 (day 21, total N  = 306; VD N  = 140; CS N  = 166). The proportion of community states that remained consistent from weeks 1 to 3 is depicted as a percentage of their initial sample size in week 1 (labelled in black). Participants starting with BB or BL on week 1 were significantly more likely to retain their community state in week 3 compared with those with EF (pairwise chi-square tests with FDR correction, P  < 0.001). c – e , Persistence of the dominant abundance of driver species of NGM community states in week 1 ( c , d ) or week 3 ( e ) in the paired longitudinal samples obtained later at week 3 ( c ) and in infancy ( d , e ). f , Persistent carriage of week-1 driver species in paired longitudinal samples obtained later in infancy. Species carriage is defined using a threshold of 0.1% relative abundance. Sample sizes of participants longitudinally sampled for weeks 1 and 3 shown in a – c are: total N  = 306; VD N  = 26/39/75 among BB/EF/BL, respectively; CS N  = 26/114/26 among BB/EF/BL, respectively; for week 1 and infancy (also referred to as the ‘infancy persistence’ group) shown in d and f : total N  = 302; VD N  = 27/43/90 among BB/EF/BL, respectively; CS N  = 17/108/17 among BB/EF/BL, respectively; and for week 3 and infancy shown in e : total N  = 146; VD N  = 12/11/43 among BB/EF/BL, respectively; CS N  = 17/37/26 among BB/EF/BL, respectively. Colour represents NGM community states or driver species: BB and B. breve in green; EF and E. faecalis in purple; BL and B. longum in orange. Boxplots as in Fig. 1 . Statistical differences in abundance between time points ( a ), species ( c – e ) and carriage frequency ( f ) were determined using paired t -tests, Wilcoxon tests and chi-square tests (all two-sided) with FDR correction, respectively.

The stability of the underlying driver species closely mirrored the observed community state dynamics. In contrast to E. faecalis , which rapidly declined throughout the stochastic assembly trajectory of the early community state EF, both B. breve and B. longum retained their high abundance within their respective community states throughout the 3-week neonatal sampling window (Fig. 3b and Extended Data Fig. 7b ). Notably, both species, as late-arriving secondary colonizers (that is, colonized NGM only in week 3), exhibited signs of competitively excluding E. faecalis in CS neonates who initially acquired the EF community state (Fig. 3b ). This competitive exclusion effect seemed most pronounced for B. breve ; in contrast to B. longum , it was able to colonize VD neonates at increasing levels as a late-arriving species (Extended Data Fig. 7b ). Among the primary colonizers that dominated the NGM in the first week, B. breve is the only species conferring durable colonization dominance (relative to the other driver species), which persisted as far as the final neonatal period sampling point at week 3 ( P  < 0.001 in VD and CS; Fig. 3c ).

The stability of the two Bifidobacterium species is also reflected at the strain level (Extended Data Fig. 6 ); most of the neonates retained the same B. longum (79.5%, N  = 35/44 BL neonates) or B. breve (75%, N  = 24/32 BB neonates) strain they initially acquired throughout the neonatal period, in contrast to 62.3% for E. faecalis ( N  = 43/69 EF neonates; the denominators represent longitudinally sampled individuals with detectable strain sharing events).

Together, as primary colonizers, both Bifidobacterium species benefit from priority effects, maintaining a stable NGM assembly trajectory owing to their ability to confer durable species dominance and inhibit the later arrival of opportunistic pathogens such as E. faecalis . In particular, B. breve exhibits stronger priority effects between the two species (that is, only as a primary colonizer), as well as strong deterministic exclusion of E. faecalis (that is, as either a primary or a secondary colonizer).

Stability of NGM driver species into infancy

We also assessed the longer-term engraftment of the NGM driver species in participants resampled 6–12 months beyond the neonatal period using the ‘infancy persistence’ subset. Remarkably, the relative dominance of B. breve (over the other driver species, in VD, P  < 0.05; Fig. 3d ) also extended into infancy when there was still no significant difference in breastfeeding rates between early NGM community states (BB/EF/BL: 88.4%/89.6%/80.5%, chi-square test, P  = 0.18). In addition, the long-term competitive exclusion effect of B. breve was evident in CS neonates who either retained or transitioned into BB (primarily from EF) by week 3. These long-term stability patterns were exclusively observed for B. breve , with its abundance in infancy being almost double in neonates who previously had a BB community state compared with those with other community states (Fig. 3e ).

Although NGM driver species rarely retained their differential abundance later in infancy (except B. breve ), the frequency of carriage for all three driver species was consistently higher in infants stratified by their corresponding NGM community states (Fig. 3f ). As many as 93% of VD (or 77% of CS) neonates with week-1 community state of BB still carried B. breve , compared with 58% and 66% (or 65% of CS) of VD neonates with week-1 community states EF and BL, respectively (pairwise chi-square tests, P  < 0.001). While levels of E. faecalis in community state EF waned over time to non-differential levels later in infancy, neonatal acquisition of EF remains a predisposing factor for longer-term carriage of E. faecalis . This opportunistic pathogen species was still detected in higher proportions (44%) in neonates from the EF community state during their first week (relative to 37–41% in BB and 35–38% in BL) when resampled later in infancy, regardless of their birth mode (pairwise chi-square tests, P  < 0.001; Fig. 3f ).
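The carriage frequencies above rest on the definition given in the Fig. 3 caption, where a species counts as carried at a relative abundance threshold of 0.1%. A minimal sketch of that calculation follows. The function name and the abundance values are hypothetical, and treating the threshold as inclusive (>=) is an assumption, since the source does not say.

```python
CARRIAGE_THRESHOLD = 0.001  # 0.1% relative abundance, per the Fig. 3 caption


def carriage_frequency(relative_abundances, threshold=CARRIAGE_THRESHOLD):
    """Fraction of samples in which the species is at or above the threshold."""
    carriers = sum(1 for abundance in relative_abundances if abundance >= threshold)
    return carriers / len(relative_abundances)


# Hypothetical relative abundances of one species in four infancy samples
infancy_samples = [0.25, 0.0005, 0.002, 0.0]

print(carriage_frequency(infancy_samples))  # 2 of 4 samples meet the threshold
```

Carriage is a presence/absence measure, which is why it can remain elevated in a group even after the species' abundance has fallen to non-differential levels.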

EF state enriched with virulence and antibiotic resistance genes

To determine the functional differences among NGM community states, we leveraged their driver species as proxies for functional analyses, using 1,249 high-quality isolate genomes (N = 133) and metagenome-assembled genomes (N = 1,116) generated from the corresponding community state samples (BB N = 297, EF N = 561, BL N = 391; Supplementary Table 7). We found a striking difference between Bifidobacterium spp. and E. faecalis functional profiles in antimicrobial resistance (AMR) and virulence potential. Importantly, all E. faecalis strain genomes recovered from neonates with EF community states encoded known virulence factors, and 70% were predicted to produce the toxin cytolysin 31. By contrast, both Bifidobacterium driver species genomes displayed markedly reduced levels of AMR and virulence-associated genes, with a burden 10- to 100-fold lower than in EF (median 17 versus 0; Fig. 4a). Further AMR gene screening of the entire gut resistome within each community state revealed a higher carriage of high-risk AMR genes, such as CTX-M-15 linked to extended-spectrum beta-lactamase (ESBL), in both BL and EF community states (Fig. 4b). This underscores the notable pathogenic potential of ESBL-carrying Enterobacteriaceae pathogens co-occurring in non-BB community states. These findings align with our risk factor analyses (Fig. 2), which identified maternal antibiotics exposure during labour (to some VD and all CS neonates) as a strong risk factor for the acquisition of an EF community state that bears increased risk of AMR and virulence.

Fig. 4

a, Counts of detected AMR and virulence genes in driver species genomes, with median values enclosed in brackets. Wilcoxon test (two-sided) with FDR correction; number of genomes (isolates in brackets): BB N = 297 (30), EF N = 561 (54) and BL N = 391 (49). b, Carriage of high-risk AMR genes associated with ESBL in the day-7 NGM community state samples based on raw metagenomic assemblies (BB N = 207, EF N = 498, BL N = 444). The x axis shows the most clinically prevalent ESBL genes belonging to CTX-M, OXA, SHV and TEM families. c, Proportion of species genomes, indicated by a colour gradient, predicted to utilize HMOs or their primary downstream products, lactose and fucose. The actual proportions are labelled for genotypes that are not completely present. The predictions are based on the presence of both the gene and its encoded transporters required for utilization of each substrate. 2′-fucosyllactose (2′-FL) liberates lactose and fucose, which are also present in breast milk. Utilizations of LNnT, LNT and LNB will all liberate lactose. d, NGM driver species BB confers pathogen colonization resistance in vivo. The boxplot depicts the relative abundance of BB compared to the opportunistic pathogen species EF or K. oxytoca (KO). The x axis represents three experimental groups co-colonized as follows: (1) BB type strain DSM 20213 (2′-FL+) with EF; (2) BB natural variant D19 (2′-FL−, isolated from a BBS neonate) with EF; and (3) BB type strain (2′-FL+) with KO (D63). The BB genotype (2′-FL+/−) indicates whether the strain encodes the α-l-fucosidase (GH95) enzyme required for 2′-FL metabolism. In each co-colonization group, one group of mice also received a 2′-FL supplement (50 mg ml⁻¹ per day) in their daily drinking water. The y axis for BB co-colonization with KO is shown on a log scale. Each experimental condition included 3–5 mice per cage and 3 technical replicate cages. Statistical differences between treatment groups were determined using a t-test with Welch’s correction (two-sided). Boxplot centre line indicates the median, box limits indicate upper and lower quartiles, and whiskers indicate 1.5× the interquartile range.

Pathogen resistance of B. breve via metabolic adaptation to HMOs

At the genome-wide functional level, we observed distinct metabolic landscapes of NGM community states based on KEGG orthologues (Extended Data Fig. 8a ), particularly in metabolic repertoire of carbohydrate-active enzymes (Extended Data Fig. 8b ). Both Bifidobacterium community states, in contrast to EF, exhibited an enrichment in carbohydrate-active enzymes associated with metabolizing human milk oligosaccharides (HMOs) abundant and exclusively found in human breast milk. By contrast, EF predominantly possesses genes tailored for utilizing complex dietary glycans such as mannan and chitin, as well as those like starch and cellulose that are commonly found in a plant-based diet usually consumed later in life (Extended Data Figs. 8b and 9 ).

Compared with the limited HMO metabolic capability of the BL community state, BB is capable of utilizing all the major HMO substrates, including lacto-N-tetraose (LNT), lacto-N-neotetraose (LNnT) and lacto-N-biose (LNB), as well as the primary end-products of HMO metabolism, l-fucose and d-lactose, which are naturally present in human breast milk (Fig. 4c). Interestingly, among the three community states, only BB, comprising nearly all B. breve genomes (97.6%, N = 290/297), encodes the enzyme (α-l-fucosidase, GH95 or GH29) required for metabolizing the most abundant HMO component, 2′-fucosyllactose (2′-FL). Although these B. breve strains lack known transporters for importing 2′-FL for intracellular metabolism, previous in vitro experiments have shown that similar strains are capable of growing on 2′-FL 32, 33. Therefore, B. breve might be able to metabolize 2′-FL via a previously uncharacterized pathway. In contrast, such capability is extremely rare among BL (5.0%, N = 19/391) and completely absent in EF (Fig. 4c). Notably, the species-level variations in HMO utilization observed in the study strains are representative of BB/BL/EF species, exhibiting patterns consistent with those previously reported 34. These patterns are not influenced by breastfeeding rates in this neonatal cohort, which are uniformly high and statistically indistinguishable among the community states (79.1%, 81.8% and 88.6% for EF, BL and BB, respectively).

Given that opportunistic pathogens including E. faecalis, E. faecium, Klebsiella oxytoca, K. pneumoniae, Enterobacter cloacae and Clostridium perfringens , which are enriched in the EF community state, lack the capability to metabolize HMOs and their by-products, we hypothesize that B. breve ’s versatility in utilizing these predominant neonatal dietary components substantially enhances its fitness against opportunistic pathogens in vivo. Considering that all neonates in the study would have been exposed to the same level of HMOs through a predominantly breast milk-based diet, regardless of their community state, we reason that the metabolic capability to utilize HMOs, including but not limited to 2′-FL, not only contributes to the dominance and stability of the BB community state but also enables B. breve to outcompete pathogenic species that cannot utilize HMOs. Supporting our hypothesis, we demonstrate in a gnotobiotic mouse model, co-colonized with B. breve and the opportunistic pathogen driver species E. faecalis , that B. breve dominates, and this dominance is amplified by dietary 2′-FL supplementation (Fig. 4d ). The 2′-FL-mediated pathogen resistance in vivo phenotype of B. breve also extends to the Gram-negative enteropathogen K. oxytoca , albeit to a lesser extent. Importantly, the anti-pathogen effect was absent in mice colonized with a natural B. breve variant isolated from a BBS neonate lacking the α- l -fucosidase (GH95) enzyme necessary for 2′-FL metabolism. These findings suggest that B. breve ’s strain-specific and gene-dependent utilization of HMOs could have a crucial role in enhancing resistance to pathogen colonization by inhibiting pathogen growth.

In presumably the largest neonatal gut metagenome study ever undertaken, we discovered three distinct NGM community states in over 1,000 healthy, full-term neonates drawn from the general UK population, representing diverse ethnicities and sociodemographic backgrounds. Factors that may influence the maternal gut microbiota, such as maternal age, ethnicity and parity, as well as events that influence its vertical transmission to the neonatal gut during the perinatal period (for example, CS and maternal antibiotics), serve as independent determinants of the acquisition of primary colonizers. The presence of a highly unstable community state (EF) with AMR-enriched opportunistic pathogens underscores the hospital environments and practices, such as maternal antibiotics during labour and elective CS births, as important risk factors 1 , 35 , 36 , 37 , 38 . Although antibiotics after birth and breastfeeding are known important factors shaping the later infant-stage microbiome development 13 , 15 , 39 , 40 , these postnatal factors had no observable effect on very early NGM dynamics on either the acquisition or the switching of the NGM community states. Together, our findings highlight that the NGM assembly outcome is highly dependent on the succession of primary colonizer species, with prenatal and perinatal factors associated with birth exerting profound influences.

Although the early-life microbiota is thought to be highly dynamic as reflected by high inter-individual variation 1 , here we describe an undisturbed, native primary succession pattern in microbiota assembly driven by a single Bifidobacterium species. B. longum is strongly linked to factors that promote maternal gut microbiota transmission at birth, such as vaginal delivery and absence of antibiotics. While B. breve seems unaffected by these factors, its independent association with maternal ethnicity (Asian) could be linked to the mother’s FUT2 secretor status, which determines the presence of 2′-FL and other HMOs in breast milk and is reportedly more common in Asian participants than in white participants 41 . The pattern of exclusive dominance by either B. breve or B. longum during very early life could also be observed in other cohorts across geographically diverse populations 11 , 12 , 13 , 14 , 15 , 16 . Earlier neonatal cohorts, limited by their smaller sample sizes ( N  < 100 compared with N  > 1,000 in this study) and lack of longitudinal samplings, were unable to report such patterns as distinctly and conclusively as we have in this study. Given that de novo identification of optimal community state clusters is sample size dependent 10 , our expanded BBS dataset—nearly 10 times larger than the previously largest neonatal dataset 13 —provided us with the statistical power to report a distinct tripartite NGM community structure. This includes a previously undescribed at-risk community state (EF) harbouring AMR-carrying opportunistic pathogens, and presumably for the first time, the epidemiological and longitudinal dynamics signatures of each NGM community state. Our findings provide crucial evidence that can guide the rational selection of species and strains for infant interventional trials, as well as the development of next-generation microbiota-based therapeutics. 
Future studies can stratify infants by their earliest gut community states to examine potential associations with longer-term health outcomes.

Both Bifidobacterium community states can drive deterministic and stable assembly trajectories in vivo through optimized utilization of HMOs exclusively present in human breast milk, the predominant diet during the neonatal period. Our human and in vivo data are in agreement with recent observations based on in vitro experiments 42 , 43 , showing that B. breve is functionally better adapted to an HMO-rich diet in very early life and dominates the NGM through priority effects. Here we further demonstrated, in humans and mice, the functional impact of B. breve priority effects, resulting in stronger colonization resistance against AMR-enriched pathogens, including E. faecalis and K. oxytoca.

While the exact origins of opportunistic pathogens such as E. faecalis contributing to EF remain to be confirmed, their strong association with disruptions of natural birth (for example, CS and antibiotics) and their ubiquitous presence in the hospital birth environment 22 , 23 strongly indicate the hospital operating room as the most likely source, with exposure further exacerbated by the lack of maternal microbiota transmission that frequently occurs during natural birth. Although the EF perturbation patterns appear to be largely transient, with the neonatal microbiota naturally recovering from a delayed colonization trajectory 1 , 44 , inadequate pathogen clearance could persist into infancy. Along with the short-term exposure to high AMR and virulence, early acquisition of pathogens represents an increased risk of infection owing to the immature immune system in very early life 45 . Also, delayed or absent exposure to commensal B. breve and/or B. longum as a primary colonizer in the critical neonatal window of immune 45 and neurological 46 development could potentially result in neurodevelopmental and immune-mediated disorders later in childhood 47 . Epidemiological evidence from other independent birth cohorts indicates that a non-Bifidobacterium (for example, EF) community state may predispose neonates to an increased risk of neurological disorders 48 and respiratory diseases (for example, asthma and atopy 49 , 50 ), including respiratory infections 26 , 51 , later in childhood.

Bifidobacterium spp. are known to achieve bifidogenic effects through the provision of HMOs, with a notable focus on B. infantis and its probiotic application as a specialized HMO-utilizing species. Despite its prevalence and dominance in infants from low- to middle-income and non-industrialized settings 17 , 52 , B. infantis is notably absent in this UK cohort and other Western cohorts, suggesting that it may no longer be naturally colonizing newborns in Western, industrialized populations. Its notable absence indicates a potential lack of a reservoir for B. infantis to establish itself as a primary colonizer, despite the considerable selective advantage that extensive exposure to HMOs during the neonatal period would presumably provide. Our results demonstrate that an HMO functional niche could be filled by other species ( B. breve or B. longum ) capable of metabolizing HMO if they are prevalent in the perinatal microbial species pool. The findings of strain-dependent utilization of HMOs, including but not limited to 2′-FL, and colonization resistance phenotypes of B. breve further highlight that the success of primary succession is probably dependent on both the species prevalence and strain-level functional variation.

Maternal seeding of microbial metabolizers of the specialized bioactives in breast milk probably represents an evolutionarily conserved strategy to prime human gut microbiota assembly with the primary colonizers most likely to exert priority effects, such as B. breve and, to a lesser extent, B. longum. While both species have been associated with maternal origins 53 , strain transmission analyses from both our work and that of others 19 have identified only B. longum as the most frequently transmitted species from the mother’s gut. Although B. breve did not appear to originate from the maternal gut microbiota, we cannot rule out the possibility of vertical transmission of very low-abundance B. breve strains. Recent cultivation-based evidence has confirmed that such transmission can occur below the limits of metagenomic strain detection 54 . Other unsampled maternal or environmental sources could also be involved in seeding B. breve. One likely source is breast milk microbiota, where B. breve has been detected and implicated in the entero-mammary pathway, a retrograde mechanism for milk inoculation 21 . Future research should investigate the global strain reservoir and transmission patterns of Bifidobacterium species, especially for the poorly understood B. breve. Considering the limited success of probiotic-derived B. infantis strains in natural engraftment of neonatal gut microbiota in both industrialized and non-industrialized populations 18 , 55 , comprehensive strain-level functional characterizations of naturally prevalent and stable primary colonizers, such as B. breve, are vital. This effort will expedite the discovery of infant probiotics that are better optimized for local populations.

Study population

The Baby Biome Study (BBS) participants were recruited at the Barking, Havering and Redbridge University Hospitals NHS Trust, the University Hospitals Leicester NHS Trust and the University College London Hospitals NHS Foundation Trust from May 2014 to December 2017. The study was approved by the NHS London City and East Research Ethics Committee (REC reference 12/LO/1492). Mothers provided written informed consent for their participation and the participation of their children in the study. The study was performed in compliance with all relevant ethics regulations.

Whole-genome sequencing and analysis

The study participants, drawn from a general population of women giving birth in hospitals in the UK without any clinical inclusion or exclusion criteria as per the BBS protocol 56 , are predominantly healthy, full-term neonates. The study dataset comprised 2,387 metagenomes, with 1,679 from the previously published 1 BBS phase 1 (BBS1) and 708 new neonatal gut metagenomes in BBS phase 2 (BBS2), totalling 1,288 participants. The aim of BBS2 was to sequence all the remaining neonatal samples collected from the original BBS study. The study sample size was predicated on detecting differences by mode of birth rather than providing statistical power to discern differences in microbial community states. The sampling and data processing protocols, ranging from sample collection to sequence data generation, quality control (low-quality trimming and human decontamination) and processing, remained unchanged from those previously described 1 for BBS1. In brief, faecal samples were collected at home by parents from neonates in the first 3 weeks of life (primarily on days 4, 7 and 21) and later in infancy. Paired maternal faecal samples were taken at the hospital around the time of birth. Most new samples in BBS2 were collected on day 7 of life. The only change was an institute-wide upgrade in the Illumina sequencing platform, transitioning from HiSeq 2500-v4 (2 × 125 bp) to HiSeq 4000 (2 × 151 bp). A multiplexing strategy was employed to ensure that the target depth remained consistent with BBS1.
While the upgraded sequencing platform has resulted in a marginal increase in sequencing depth for BBS2 (from 19.3 to 20.4 million reads per sample post-quality control, calculated with seqkit (v.2.4.0) 57 , P  < 0.001, two-sided t -test), it did not impact either the community state assignment ( P  = 0.4731, likelihood ratio test via multinomial logistic regression) or the recovery of high-quality genomes (proportion of the total genome bins) for NGM driver species ( P  = 0.9716, Mantel–Haenszel chi-squared test, stratified by species).

Read-based taxonomic classification was performed against the Genome Taxonomy Database (GTDB, RS207) representative bacterial and archaeal species genomes ( N = 65,703) using bowtie2 (v.2.3.5) 58 and inStrain (v.1.3.0) 59 ‘profile’ with the recommended ‘--database_mode’ option and a 50% genome breadth (covered by ≥1 read) cut-off, as previously described 52 , 59 . The R package phyloseq (v.1.12.0) 60 was used for metagenomic data analysis, and results were processed and visualized using tidyverse (v.2.0.0) in RStudio (v.4.1.0).

Strain sharing analysis was performed using StrainPhlAn4 (ref. 61 ), following the workflow and species-specific strain identity thresholds previously described 19 . Where appropriate, multiple testing corrections were applied to all statistical tests using the Benjamini–Hochberg FDR method with a significance threshold of 5%, unless otherwise specified.
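The Benjamini–Hochberg adjustment named above can be illustrated with a minimal sketch. This is not the authors' code (their analyses used R and Stata); the function name and the example P values are hypothetical, and only the 5% threshold comes from the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values from five independent tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)
# Apply the 5% FDR threshold stated in the text.
significant = [q <= 0.05 for q in q_values]
```

Note that the third and fourth tests have raw P values below 0.05 but do not pass the 5% FDR threshold after adjustment, which is the behaviour the correction is meant to provide.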

Cultivation and whole-genome sequencing of the NGM species isolates were performed using the previously established workflow 1 for BBS1. In brief, the NGM species in driver NGM samples were cultured from corresponding frozen faecal samples using selective media: Bifidobacterium selective media (Sigma-Aldrich) for B. longum and B. breve, and Enterococcus selective agar (Sigma-Aldrich) for E. faecalis. Purified bacterial isolates were sequenced on the Illumina HiSeq X or NovaSeq 6000 system (2 × 151 bp), and assembled and quality-controlled using shovill (v.1.1.0; https://github.com/tseemann/shovill ) and CheckM2 (ref. 62 ), respectively.

Clinical and sociodemographic metadata sources and management

Participant data were collected using a clinical record form at enrolment by the BBS research midwives or from available clinical records at birth. Hospital maternity electronic records with pregnancy and perinatal clinical information were obtained directly from the hospital trusts, and databases containing the variables of interest were merged. Variables were harmonized where possible across different databases. For discrepancies, data from the BBS clinical record forms were given priority, and hospital electronic data were used to complete missing data. At the time of stool sample collection, mothers completed a short form on feeding mode and antibiotic exposure. A total of 20 variables were included in the final analyses on the basis of clinical relevance, quality of data and completeness ( N = 6 maternal, N = 8 perinatal or at time of delivery, N = 5 postnatal, N = 1 at the time of stool sample collection variables). Ten variables had no missing or <1% missing data. Four had between 1% and 5% missing data (index of multiple deprivation (IMD), maternal smoking, prolonged rupture of membranes (PROM) and neonatal labour antibiotics after birth), two had between 5% and 15% missing data (maternal ethnicity and feeding mode at the time of stool sample collection), and one had >30% missing data (skin to skin).

We used participant postcode to determine IMD 63 , an area-level relative deprivation score that provides a measure of socioeconomic status and that we organized into quintiles from 1 (least deprived) to 5 (most deprived). The score considers seven individually weighted domains (income, employment, education, health, crime, barriers to housing and services, and living environment). Prophylactic antibiotics were administered to all mothers undergoing caesarean section in this cohort, as well as to newborns displaying risk factors or clinical indicators of early-onset neonatal infection, in accordance with local trust policies and UK national guidelines at the time 64 , 65 . To our knowledge, no participants were given antibiotics for treating bloodstream infections of E. faecalis. Skin-to-skin contact is defined as contact of mother and baby immediately after birth for at least 1 h or until the next feeding 66 . Feeding mode at the time of stool sample collection was determined through a questionnaire that included three categories: exclusive breastfeeding, exclusive bottle feeding, or both (that is, mixed feeding). For comparisons involving (non)exclusive breastfeeding, the latter two categories were merged into a single ‘non-exclusive breastfeeding’ category.

Statistical analyses

No statistical methods were used to pre-determine sample sizes, but this study already represents the largest dataset of longitudinal faecal metagenomes ( n = 1,904; n = 2,387 including infancy samples) of newborns ( n = 1,288). No data were excluded unless they failed quality control steps. Microbiome data collection and analysis were not randomized or performed blind to the conditions of the experiments, as this is an observational study. Biological counting experiments were blinded by a person other than the experimenter before counting to avoid experimental bias. For mouse experiments, treatments were randomized by cage by researchers blinded to treatment conditions. Unless otherwise stated, non-parametric statistical tests were used, except where tests for normality and equal variances showed that parametric assumptions were met.

For the epidemiological analyses of NGM community states in the first week of life, BBS participants with sufficient metadata were explored (90.4%, N  = 1,108 of 1,225 participants with week-1 sampling). The week-1 NGM community state was determined for each eligible participant by using the earliest available sample from week 1, collected either on day 4 ( N  = 64) or day 7 ( N  = 1,044).

To ascertain risk factors for specific NGM community states: BB versus non-BB, EF versus non-EF, and BL versus non-BL, univariate analyses using fixed-effect logistic regression models were initially performed. Subsequent multivariate models were constructed, also using fixed-effect logistic regression, and included only participants with complete datasets while excluding variables with over 15% missing data. Likelihood ratio tests were employed to calculate all P values. A hierarchical framework was applied in building the multivariate models. Variables were organized in a sequential order into either distal (maternal) or more proximal categories (delivery, postnatal care and the first week of life). Variables were considered potential confounders if they occurred simultaneously with or before exposure variables 67 . Within each category, all variables from that category or previous categories were incorporated into the model to account for confounding.
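For a single binary exposure, the univariate logistic regression and its likelihood ratio test described above reduce to a closed form, because the fitted probabilities equal the observed proportions in each exposure group. The sketch below illustrates only that special case; it is not the Stata workflow used in the study, multivariate models require iterative fitting, and the function name and counts are hypothetical. All four cell counts are assumed to be non-zero.

```python
import math


def lrt_binary_exposure(a, b, c, d):
    """Likelihood ratio test for outcome ~ one binary exposure.

    a, b: outcome-positive / outcome-negative counts among the exposed
    c, d: outcome-positive / outcome-negative counts among the unexposed
    Returns (odds_ratio, lr_statistic, p_value). Assumes all counts > 0.
    """
    def loglik(pos, neg, p):
        # Binomial log-likelihood without the constant term.
        return pos * math.log(p) + neg * math.log(1 - p)

    # Full model: each exposure group has its own fitted probability.
    ll_full = loglik(a, b, a / (a + b)) + loglik(c, d, c / (c + d))
    # Null model: intercept only, one shared probability.
    p0 = (a + c) / (a + b + c + d)
    ll_null = loglik(a + c, b + d, p0)

    # Guard against tiny negative values caused by rounding.
    lr_stat = max(2 * (ll_full - ll_null), 0.0)
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(lr_stat / 2))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, lr_stat, p_value


# Hypothetical counts: EF versus non-EF among exposed and unexposed neonates.
odds_ratio, lr_stat, p_value = lrt_binary_exposure(30, 70, 10, 90)
```

The statistic is compared against a chi-squared distribution with one degree of freedom because the full model has exactly one more parameter than the intercept-only model.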

Sensitivity analyses were conducted to identify factors associated with NGM community state switching between weeks 1 and 3. This included a subset of ‘neonatal longitudinal’ participants with sufficient metadata (87.6%, N = 268 of 306, corresponding to Fig. 2a ). Both univariate and subsequent multivariate analyses were conducted using fixed-effect logistic regression in the same manner as described above. Multivariate models were further adjusted for the week-1 community state (that is, EF, BB or BL) to discern whether any associations were driven by the baseline community state. There was no strong evidence of association, other than for the baseline community state itself. These analyses could not be extended to individual community state switches owing to insufficient sample size. All analyses were conducted using Stata (v.17.0).

Community state assignment

The NGM community state assignment was applied to all neonatal samples ( N = 1,904) using two popular methods, namely, the original clustering-based PAM method described in ref. 68 and the probabilistic modelling-based Dirichlet multinomial mixtures (DMM) approach described previously 69 . In accordance with the original protocols, PAM clustering was applied to the species-level relative abundance distance measured by the Jensen–Shannon divergence (JSD) using the R packages ‘cluster’ (v.2.1.4) and ‘vegan’ (v.2.6.4), and DMM models were fitted on the species-level relative abundance matrix, modelled by the Dirichlet multinomial distribution, using the R package ‘DirichletMultinomial’ (v.1.4). For both methods, the optimal number of clusters of three was determined on the basis of the Calinski–Harabasz index for PAM clustering and the model fit score based on Laplace approximation for DMM. The community states were named according to the top taxonomic driver (species) that contributed the most to microbial community variation (‘envfit’ R², P < 0.05) in PAM and to each Dirichlet component (cluster) in DMM. The strength of association between the PAM and the DMM-based community states was 0.726 (Cramér’s V). For downstream analyses, the PAM-based community state assignment was selected because it maximized both the sample size of community states BB and BL (Extended Data Fig. 2e ) and the mean relative abundance of the driver species in the respective community state ( B. breve in BB, E. faecalis in EF; Extended Data Fig. 2f ).
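To make the distance underlying the PAM clustering concrete, the following minimal sketch computes the Jensen–Shannon divergence between species-level relative abundance profiles. It is an illustration rather than the R workflow used in the study. Two choices are assumptions not stated in the text: the base-2 logarithm, and taking the square root of the JSD as the distance (a common convention in enterotyping workflows). The sample profiles are hypothetical.

```python
import math


def jensen_shannon_divergence(p, q):
    """JSD (base-2 logarithm) between two abundance vectors of equal length."""
    def kl(x, y):
        # Kullback-Leibler divergence; zero-abundance terms contribute 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    # Normalize each profile so that it sums to 1.
    total_p, total_q = sum(p), sum(q)
    p = [v / total_p for v in p]
    q = [v / total_q for v in q]
    # Midpoint distribution; it is positive wherever p or q is positive.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def jsd_distance(p, q):
    """Square root of the JSD (assumed convention for the distance)."""
    return math.sqrt(max(jensen_shannon_divergence(p, q), 0.0))


# Hypothetical profiles over three species (for example, B. breve,
# B. longum, E. faecalis).
profiles = {
    "sample_1": [0.90, 0.05, 0.05],
    "sample_2": [0.85, 0.10, 0.05],
    "sample_3": [0.05, 0.05, 0.90],
}
names = list(profiles)
distance = [[jsd_distance(profiles[a], profiles[b]) for b in names]
            for a in names]
```

A pairwise matrix of this kind is the input that a PAM (k-medoids) routine would cluster; the two samples dominated by the same species sit much closer to each other than to the third.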

To validate the single-species dominance in external neonatal cohorts, the same workflow for community state type assignment was independently applied to four public gut metagenomic datasets with a comparable sampling window (<6 months) to the BBS cohort, including partial or exclusive sampling of the neonatal period (0–1 month). The earliest sampling windows were from cohorts derived from diverse geographical populations and lifestyles, including Sweden 13 , 42 ( PRJEB6456 , days 4–12, N  = 37), Israel 14 ( PRJNA994433 , weeks 1–24, N  = 60), the USA (TEDDY cohort 15 , 16 , PRJNA400115 , months 2–6, N  = 69) and Bangladesh 17 ( PRJNA806984 , months 0–2, N  = 234).

Metagenome assembly and functional analyses

Quality-controlled, raw paired-end reads were first assembled with SPAdes (v.3.13.5) 70 with the option --meta. Unassembled reads were then extracted by mapping raw reads back to metaSPAdes 71 -assembled contigs using bwa-mem (v.0.7.17) 72 , followed by re-assembly with MEGAHIT (v.1.1.3) 73 using default parameters. Subsequently, the metaSPAdes and MEGAHIT assemblies were combined and sorted, and short contigs (<1,500 bp) were removed. The resulting assemblies were then independently binned with MetaBAT 2 (v.2.13) 74 , MaxBin2 (v.2.2.4) 75 and CONCOCT (v.0.4) 76 using default parameters and a minimum contig length threshold of 1,500 bp (option --minContig 1500). The depth of contig coverage required for the binning was inferred by mapping the raw reads back to their assemblies with bwa-mem (v.0.7.17) and then calculating the corresponding read depths of each individual contig with samtools 77 (‘samtools view -Sbu’ followed by ‘samtools sort’) together with the ‘jgi_summarize_bam_contig_depths’ function from MetaBAT 2.
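The step in which the two assemblies are combined, sorted and stripped of short contigs can be sketched as follows. This is a hypothetical illustration, not the pipeline used in the study: the function names are invented, the sort key (contig length, longest first) is an assumption because the text does not specify one, and only the 1,500 bp threshold is taken from the description above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def combine_assemblies(assemblies, min_length=1500):
    """Merge contigs from several assemblies, drop short ones, sort by length."""
    contigs = [c for lines in assemblies for c in read_fasta(lines)]
    # Contigs shorter than min_length are removed (<1,500 bp in the text).
    kept = [(h, s) for h, s in contigs if len(s) >= min_length]
    # Assumed ordering: longest contig first.
    return sorted(kept, key=lambda c: len(c[1]), reverse=True)


# Toy input standing in for the metaSPAdes and MEGAHIT outputs.
metaspades = [">s1", "A" * 2000, ">s2", "A" * 1000]
megahit = [">m1", "C" * 1500, ">m2", "C" * 1000, "C" * 2000]
combined = combine_assemblies([metaspades, megahit])
kept_headers = [h for h, _ in combined]
```

In practice the inputs would be opened FASTA files rather than in-memory lists; any iterable of lines works with `read_fasta`.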

Thereafter, individual genome bin sets produced by the three binning programs were consolidated into a refined bin set consisting of the best version of each bin, chosen by genome completeness and contamination metrics among all seven versions of hybridized bin sets (MetaBAT 2, MaxBin2, CONCOCT, MetaBAT 2 + MaxBin2, MetaBAT 2 + CONCOCT, MaxBin2 + CONCOCT, MetaBAT 2 + MaxBin2 + CONCOCT) as estimated by CheckM (v.1.0.7) 78 using the metaWRAP (v.1.2) 79 ‘bin_refinement’ pipeline. In total, 22,668 prokaryotic metagenome-assembled genomes (MAGs) met the criteria of having >50% completeness and <5% contamination, as determined by CheckM2 (ref. 62 ). These MAGs were subsequently taxonomically assigned using the GTDB 80 (R214) taxonomy with GTDB-Tk (v.2.3.0) 81 .

For genome analyses of the three NGM driver species, data were derived from samples as either cultivated isolate genomes or MAGs when cultured strains were unavailable. Only near-complete, high-quality MAGs were used in the functional analyses ( N = 1,116). All genomes met strict quality control criteria, which included ≥90% completeness, ≤5% contamination, an N50 value of ≥10 kb, passing the GUNC test, an average contig length of ≥5 kb and ≤500 contigs, as previously described 82 . Genome annotation of metabolic function was performed using DRAM (v.1.4.5) 83 , which integrates annotations from multiple databases, including Pfam, KEGG (KOfam), UniProt, dbCAN (carbohydrate-active enzymes) and MEROPS (peptidases). The functional gene counts from KEGG and CAZy annotations were used to generate a PCA plot using the R package ‘pcaMethods’, employing conventional singular value decomposition with imputation. The genome-based prediction of HMO substrate utilization was based on KEGG and CAZy annotations mapped against a list of manually curated relevant genes and pathways as described recently 42 , 43 . The genes corresponding to HMO substrates (enzymes; transporters) were: 2′-FL (GH95 and/or GH29; FL1_Blon0341-0343 and/or FL2_Blon2202-2204), lactose (GH2; LacS), fucose (FumC/D/E/F/G; FucP), LNT (GH42 or GH136; GltABC), LNnT (GH20; Bbr_1554) and LNB (GH112; GltABC). In silico screening of AMR and virulence factor genes was performed at the species level with species MAGs and at the sample level with raw metagenome assemblies as input for ABRicate against the NCBI AMRFinderPlus and VFDB databases as previously described 1 . The AMR genes encoding the extended-spectrum β-lactamase (ESBL) phenotype were annotated using the curated antibiotic subclass of the NCBI Pathogen Detection Reference Gene Catalog (as of 1 October 2023).
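The genome quality criteria listed above translate directly into a filter. The sketch below is a hypothetical illustration: the class, field names and example genomes are invented, and the per-genome statistics are assumed to have been computed beforehand (for example, by CheckM2 and GUNC). Only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class GenomeStats:
    name: str
    completeness: float        # percent
    contamination: float       # percent
    n50: int                   # bp
    mean_contig_length: float  # bp
    n_contigs: int
    passes_gunc: bool


def is_high_quality(g):
    """Apply the quality thresholds stated in the Methods text."""
    return (
        g.completeness >= 90
        and g.contamination <= 5
        and g.n50 >= 10_000
        and g.mean_contig_length >= 5_000
        and g.n_contigs <= 500
        and g.passes_gunc
    )


# Hypothetical genomes; none of these values come from the study.
genomes = [
    GenomeStats("isolate_A", 99.1, 0.4, 250_000, 60_000.0, 35, True),
    GenomeStats("mag_B", 92.0, 6.1, 40_000, 9_000.0, 210, True),   # too contaminated
    GenomeStats("mag_C", 95.3, 1.2, 8_000, 5_500.0, 480, True),    # N50 too low
    GenomeStats("mag_D", 90.0, 5.0, 10_000, 5_000.0, 500, True),   # exactly at limits
]
high_quality = [g.name for g in genomes if is_high_quality(g)]
```

Because every threshold in the text is inclusive (≥ or ≤), a genome sitting exactly on all limits, such as `mag_D`, is retained.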

Bacterial strains and reagents

The bacterial strains used in this study were either part of the in-house (HMIL) culture collection cultivated from the BBS faecal samples or requested from public collections (DSMZ). Specific strains were: B. breve strains (type strain DSM 20213 and D19 isolated from a BBS neonate), E. faecalis (D13 isolated from a BBS neonate) and K. oxytoca (D63 isolated from a BBS neonate). Purified HMO 2′FL (GlyCare 2FL 9000, batch 20156002) and LNnT were purchased from Glycom, DSM.

Mouse experiment

Wild-type C57BL/6N mice were maintained under germ-free conditions at the Wellcome Sanger Institute Home Office-approved facility, with all procedures carried out in accordance with the UK Animals (Scientific Procedures) Act of 1986 under Home Office approval (PPL no. 80/2643). Germ-free mice were housed under a 12 h light/12 h dark cycle and ambient temperature and humidity conditions in positive-pressure isolators (Bell), with faeces tested by culture, microscopy and PCR to ensure sterility. Consumables were autoclaved at 121 °C for 15 min before introduction into the isolators. For experimentation, 6-week-old mice of both sexes were randomly assigned to treatment groups. Cages were opened in a vaporized hydrogen peroxide-sterilized, class II cabinet (Bioquell), with mono-colonized gnotobiotic lines generated by oral gavage on day 1 (B. breve) at the concentration of 10⁹ colony-forming units (c.f.u.) per ml and day 4 (challenged by opportunistic pathogen species E. faecalis or K. oxytoca at the concentration of 10⁴ c.f.u. per ml). Materials were prepared in Dulbecco’s PBS at 100 mg ml⁻¹ immediately before administration under anaerobic conditions (10% H₂, 10% CO₂, 80% N₂) in a Whitley DG250 workstation at 37 °C. Mice were maintained in sterile ISOcages (Tecniplast) and housed on ISOrack for the period of the experiment.

Control groups of mice colonized with BB, EF or KO without any treatment were also included to confirm mono-colonization. One of the two groups of co-colonized mice (for both BB + EF and BB + KO experiments) was exposed to 2′-FL via daily drinking water (50 mg ml⁻¹ per day) throughout the experiment. Faecal samples were collected on each oral gavage day and plated to test for contamination. Mice were killed on day 11 (7 days post inoculation on day 4), with faecal samples collected and plated for colony counts on yeast extract casitone fatty acids (YCFA) medium under aerobic conditions (to select for E. faecalis or K. oxytoca) and on YCFA with mupirocin under anaerobic conditions (to select for Bifidobacterium spp.). YCFA is a complex, broad-range medium 84 . Each experimental condition included 3–5 mice per cage and 3 technical replicate cages.

DNA was extracted from faeces using the FastDNA Spin Kit for Soil (MPBio) according to the manufacturer’s instructions, and DNA was eluted into 100 µl of double-distilled H₂O. Eluted DNA was then diluted 1:50 and qPCR was performed using SYBR Green chemistry (Thermo Fisher). The absolute bacterial load in each faecal sample was determined by qPCR using a calibration curve generated with genomic DNA and taxon-specific primer sequences (E. faecalis, F: 5′-CCCTTATTGTTAGTTGCCATCATT-3′, R: 5′-ACTCGTTGTACTTCCCATTGT-3′; Bifidobacterium spp., F: 5′-CTCCTGGAAACGGGTGG-3′, R: 5′-GGTGTTCTTCCCGATATCTACA-3′; K. oxytoca, F: 5′-GGACTACGCCGTCTATCGTCAAG-3′, R: 5′-TAGCCTTTATCAAGCGGATACTGG-3′). As previously described 85 , the relative abundance of each target species was estimated by normalizing to the signal from a universal bacterial 16S primer pair (F: 5′-GTGSTGCAYGGYTGTCGTCA-3′, R: 5′-ACGTCRTCCMCACCTTCCTC-3′).
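The calibration-curve quantification described above can be sketched as a linear fit of quantification cycle (Cq) against log10 template copies, inverted for unknown samples. This is a generic illustration under stated assumptions, not the authors' analysis: the standard-curve values are hypothetical, the function names are invented, and relative abundance is assumed here to be the ratio of the taxon-specific estimate to the universal 16S estimate. Only the 1:50 dilution factor comes from the text.

```python
def fit_standard_curve(log10_copies, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_cq(cq, slope, intercept, dilution_factor=50):
    """Invert the curve and correct for the 1:50 dilution of eluted DNA."""
    return dilution_factor * 10 ** ((cq - intercept) / slope)


# Hypothetical tenfold dilution series of genomic DNA standards.
standards_log10 = [3, 4, 5, 6, 7]
standards_cq = [30.0, 26.7, 23.4, 20.1, 16.8]
slope, intercept = fit_standard_curve(standards_log10, standards_cq)

# Hypothetical sample: taxon-specific and universal 16S reactions.
target_load = copies_from_cq(23.4, slope, intercept)
total_load = copies_from_cq(20.1, slope, intercept)
relative_abundance = target_load / total_load
```

A slope near -3.3 corresponds to roughly 100% amplification efficiency, so the fitted slope doubles as a quick sanity check on the standards. In a real analysis each primer pair would have its own standard curve; a single curve is reused here only to keep the example short.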

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Shotgun metagenomic sequencing data (after quality trimming and human decontamination) of the entire Baby Biome Study cohort have been deposited to the European Nucleotide Archive under study accession number ERP115334 . Bacterial genome assemblies for the three species analysed have been deposited in Zenodo at https://doi.org/10.5281/zenodo.12667210 (ref. 86 ). Sample metadata and participant-level clinical metadata of de-identified study participants are provided in the Supplementary Tables. The raw faecal samples and bacterial isolates are available from the corresponding authors upon request.

Code availability

All software used to perform these analyses is publicly available. Software tools used are listed in the main text and Methods.

Shao, Y. et al. Stunted microbiota and opportunistic pathogen colonization in caesarean-section birth. Nature 574 , 117–121 (2019).

Mitchell, C. M. et al. Delivery mode affects stability of early infant gut microbiota. Cell Rep. Med. 1 , 100156 (2020).

Bogaert, D. et al. Mother-to-infant microbiota transmission and infant microbiota development across multiple body sites. Cell Host Microbe 31 , 447–460 (2023).

Ferretti, P. et al. Mother-to-infant microbial transmission from different body sites shapes the developing infant gut microbiome. Cell Host Microbe 24 , 133–145 (2018).

Yassour, M. et al. Strain-level analysis of mother-to-child bacterial transmission during the first few months of life. Cell Host Microbe 24 , 146–154 (2018).

Fehr, K. et al. Breastmilk feeding practices are associated with the co-occurrence of bacteria in mothers’ milk and the infant gut: the CHILD cohort study. Cell Host Microbe 28 , 285–297 (2020).

Sprockett, D., Fukami, T. & Relman, D. A. Role of priority effects in the early-life assembly of the gut microbiota. Nat. Rev. Gastroenterol. Hepatol. 15 , 197–205 (2018).

Debray, R. et al. Priority effects in microbiome assembly. Nat. Rev. Microbiol. 20 , 109–121 (2022).

Mäklin, T. et al. Strong pathogen competition in neonatal gut colonisation. Nat. Commun. 13 , 7417 (2022).

Costea, P. I. et al. Enterotypes in the landscape of gut microbial community composition. Nat. Microbiol. 3 , 8–16 (2018).

Avershina, E. et al. Bifidobacterial succession and correlation networks in a large unselected cohort of mothers and their children. Appl. Environ. Microbiol. 79 , 497–507 (2013).

Laursen, M. F. & Roager, H. M. Human milk oligosaccharides modify the strength of priority effects in the Bifidobacterium community assembly during infancy. ISME J. 17 , 2452–2457 (2023).

Bäckhed, F. et al. Dynamics and stabilization of the human gut microbiome during the first year of life. Cell Host Microbe 17 , 690–703 (2015).

Ennis, D., Shmorak, S., Jantscher-Krenn, E. & Yassour, M. Longitudinal quantification of Bifidobacterium longum subsp. infantis reveals late colonization in the infant gut independent of maternal milk HMO composition. Nat. Commun. 15 , 894 (2024).

Stewart, C. J. et al. Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature 562 , 583–588 (2018).

Vatanen, T. et al. The human gut microbiome in early-onset type 1 diabetes from the TEDDY study. Nature 562 , 589–594 (2018).

Vatanen, T. et al. A distinct clade of Bifidobacterium longum in the gut of Bangladeshi children thrives during weaning. Cell 185 , 4280–4297.e12 (2022).

Casaburi, G. et al. Metagenomic insights of the infant microbiome community structure and function across multiple sites in the United States. Sci. Rep. 11 , 1472 (2021).

Valles-Colomer, M. et al. The person-to-person transmission landscape of the gut and oral microbiomes. Nature 614 , 125–135 (2023).

Martín, R. et al. Isolation of bifidobacteria from breast milk and assessment of the bifidobacterial population by PCR-denaturing gradient gel electrophoresis and quantitative real-time PCR. Appl. Environ. Microbiol. 75 , 965–969 (2009).

Kordy, K. et al. Contributions to human breast milk microbiome and enteromammary transfer of Bifidobacterium breve . PLoS ONE 15 , e0219633 (2020).

Brooks, B. et al. Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants. Microbiome 2 , 1 (2014).

Brooks, B. et al. Strain-resolved analysis of hospital rooms and infants reveals overlap between the human and room microbiome. Nat. Commun. 8 , 1814 (2017).

Song, S. J. et al. Naturalization of the microbiota developmental trajectory of Cesarean-born neonates after vaginal seeding. Med 2 , 951–964.e5 (2021).

Dos Santos, S. J. et al. Maternal vaginal microbiome composition does not affect development of the infant gut microbiome in early life. Front. Cell. Infect. Microbiol. 13 , 303 (2023).

Reyman, M. et al. Impact of delivery mode-associated gut microbiota dynamics on health in the first year of life. Nat. Commun. 10 , 4997 (2019).

Lewis, Z. T. et al. Maternal fucosyltransferase 2 status affects the gut bifidobacterial communities of breastfed infants. Microbiome 3 , 13 (2015).

Martin, R. et al. Early-life events, including mode of delivery and type of feeding, siblings and gender, shape the developing gut microbiota. PLoS ONE 11 , e0158498 (2016).

Schlievert, P. M., Kilgore, S. H., Seo, K. S. & Leung, D. Y. Glycerol monolaurate contributes to the antimicrobial and anti-inflammatory activity of human milk. Sci. Rep. 9 , 14550 (2019).

Sweeney, E. et al. The effect of breastmilk and saliva combinations on the in vitro growth of oral pathogenic and commensal microorganisms. Sci. Rep. 8 , 15112 (2018).

Coburn, P. S. & Gilmore, M. S. The Enterococcus faecalis cytolysin: a novel toxin active against eukaryotic and prokaryotic cells. Cell. Microbiol. 5 , 661–669 (2003).

Bunesova, V., Lacroix, C. & Schwab, C. Fucosyllactose and l -fucose utilization of infant Bifidobacterium longum and Bifidobacterium kashiwanohense . BMC Microbiol. 16 , 248 (2016).

Ruiz-Moyano, S. et al. Variation in consumption of human milk oligosaccharides by infant gut-associated strains of Bifidobacterium breve . Appl. Environ. Microbiol. 79 , 6040–6049 (2013).

Sakanaka, M. et al. Varied pathways of infant gut-associated Bifidobacterium to assimilate human milk oligosaccharides: prevalence of the gene set and its correlation with bifidobacteria-rich microbiota formation. Nutrients 12 , 71 (2019).

Azad, M. B. et al. Impact of maternal intrapartum antibiotics, method of birth and breastfeeding on gut microbiota during the first year of life: a prospective cohort study. BJOG 123 , 983–993 (2016).

Tapiainen, T. et al. Impact of intrapartum and postnatal antibiotics on the gut microbiome and emergence of antimicrobial resistance in infants. Sci. Rep. 9 , 10635 (2019).

Nogacka, A. et al. Impact of intrapartum antimicrobial prophylaxis upon the intestinal microbiota and the prevalence of antibiotic resistance genes in vaginally delivered full-term neonates. Microbiome 5 , 93 (2017).

Li, W. et al. Vertical transmission of gut microbiome and antimicrobial resistance genes in infants exposed to antibiotics at birth. J. Infect. Dis. 224 , 1236–1246 (2021).

Bokulich, N. A. et al. Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci. Transl. Med. 8 , 343ra82 (2016).

Yassour, M. et al. Natural history of the infant gut microbiome and impact of antibiotic treatment on bacterial strain diversity and stability. Sci. Transl. Med. 8 , 343ra81 (2016).

Azad, M. B. et al. Human milk oligosaccharide concentrations are associated with multiple fixed and modifiable maternal characteristics, environmental factors, and feeding practices. J. Nutr. 148 , 1733–1742 (2018).

Ojima, M. N. et al. Priority effects shape the structure of infant-type Bifidobacterium communities on human milk oligosaccharides. ISME J. 16 , 2265–2279 (2022).

Lou, Y. C. et al. Infant microbiome cultivation and metagenomic analysis reveal Bifidobacterium 2′-fucosyllactose utilization can be facilitated by coexisting species. Nat. Commun. 14 , 7417 (2023).

Podlesny, D. & Fricke, W. F. Strain inheritance and neonatal gut microbiota development: a meta-analysis. Int. J. Med. Microbiol. 311 , 151483 (2021).

Olin, A. et al. Stereotypic immune system development in newborn children. Cell 174 , 1277–1292.e14 (2018).

Bethlehem, R. A. I. et al. Brain charts for the human lifespan. Nature 604 , 525–533 (2022).

Torow, N. & Hornef, M. W. The neonatal window of opportunity: setting the stage for life-long host–microbial interaction and immune homeostasis. J. Immunol. 198 , 557–563 (2017).

Beghetti, I. et al. Early-life gut microbiota and neurodevelopment in preterm infants: any role for Bifidobacterium ? Eur. J. Pediatr. 181 , 1773–1777 (2022).

Depner, M. et al. Maturation of the gut microbiome during the first year of life contributes to the protective farm effect on childhood asthma. Nat. Med. 26 , 1766–1775 (2020).

Fujimura, K. E. et al. Neonatal gut microbiota associates with childhood multisensitized atopy and T cell differentiation. Nat. Med. 22 , 1187–1191 (2016).

Alcazar, C. G.-M. et al. The association between early-life gut microbiota and childhood respiratory diseases: a systematic review. Lancet Microbe 3 , e867–e880 (2022).

Olm, M. R. et al. Robust variation in infant gut microbiome assembly across a spectrum of lifestyles. Science 376 , 1220–1223 (2022).

Browne, H. P., Shao, Y. & Lawley, T. D. Mother–infant transmission of human microbiota. Curr. Opin. Microbiol. 69 , 102173 (2022).

Feehily, C. et al. Detailed mapping of Bifidobacterium strain transmission from mother to infant via a dual culture-based and metagenomic approach. Nat. Commun. 14 , 3015 (2023).

Barratt, M. J. et al. Bifidobacterium infantis treatment promotes weight gain in Bangladeshi infants with severe acute malnutrition. Sci. Transl. Med. 14 , eabk1107 (2022).

Bailey, S. R. et al. A pilot study to understand feasibility and acceptability of stool and cord blood sample collection for a large-scale longitudinal birth cohort. BMC Pregnancy Childbirth 17 , 439 (2017).

Shen, W., Sipos, B. & Zhao, L. SeqKit2: a Swiss army knife for sequence and alignment processing. iMeta 3 , e191 (2024).

Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9 , 357–359 (2012).

Olm, M. R. et al. inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains. Nat. Biotechnol. 39 , 727–736 (2021).

McMurdie, P. J. & Holmes, S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8 , e61217 (2013).

Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat. Biotechnol. 41 , 1633–1644 (2023).

Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat. Methods 20 , 1203–1212 (2023).

Ministry of Housing, Communities and Local Government. English indices of deprivation 2019 (GOV.UK, 2019).

Caesarean Birth NICE guideline [NG192] (NICE, 30 January 2024); https://www.nice.org.uk/guidance/ng192

Neonatal Infection: Antibiotics for Prevention and Treatment NICE guideline [NG195] (NICE, 19 March 2024); https://www.nice.org.uk/guidance/ng195

Widström, A., Brimdyr, K., Svensson, K., Cadwell, K. & Nissen, E. Skin‐to‐skin contact the first hour after birth, underlying implications and clinical practice. Acta Paediatr. 108 , 1192–1204 (2019).

Victora, C. G., Huttly, S. R., Fuchs, S. C. & Olinto, M. T. The role of conceptual frameworks in epidemiological analysis: a hierarchical approach. Int. J. Epidemiol. 26 , 224–227 (1997).

Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473 , 174–180 (2011).

Holmes, I., Harris, K. & Quince, C. Dirichlet multinomial mixtures: generative models for microbial metagenomics. PLoS ONE 7 , e30126 (2012).

Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19 , 455–477 (2012).

Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27 , 824–834 (2017).

Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26 , 589–595 (2010).

Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31 , 1674–1676 (2015).

Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7 , e7359 (2019).

Wu, Y.-W., Tang, Y.-H., Tringe, S. G., Simmons, B. A. & Singer, S. W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2 , 26 (2014).

Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11 , 1144–1146 (2014).

Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25 , 2078–2079 (2009).

Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25 , 1043–1055 (2015).

Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6 , 158 (2018).

Parks, D. H. et al. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat. Biotechnol. 38 , 1079–1086 (2020).

Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38 , 5315–5316 (2022).

Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature 568 , 505–510 (2019).

Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res. 48 , 8883–8900 (2020).

Duncan, S. H., Hold, G. L., Harmsen, H. J., Stewart, C. S. & Flint, H. J. Growth requirements and fermentation products of Fusobacterium prausnitzii , and a proposal to reclassify it as Faecalibacterium prausnitzii gen. nov., comb. nov. Int. J. Syst. Evol. Microbiol. 52 , 2141–2146 (2002).

Forster, S. C. et al. Identification of gut microbial species linked with disease variability in a widely used mouse model of colitis. Nat. Microbiol. 7 , 590–599 (2022).

Shao, Y. Bacterial genomes of the Baby Biome Study. Zenodo https://doi.org/10.5281/zenodo.12667210 (2024).

Acknowledgements

This work was funded by the Wellcome Trust and the Wellcome Sanger Institute (WT101169MA, 206194 and 220540/Z/20/A). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. We thank the participating families for their time and contribution to the Baby Biome Study, and the research midwives at the recruiting hospitals for recruitment and clinical metadata collection.

Author information

Authors and affiliations.

Host–Microbiota Interactions Laboratory, Wellcome Sanger Institute, Hinxton, UK

Yan Shao, Simon Clare, Nicholas J. R. Dawson, Andre Mu, Anne Adoum, Katherine Harcourt, Junyan Liu, Hilary P. Browne, Mark D. Stares & Trevor D. Lawley

Institute for Global Health, University College London, London, UK

Cristina Garcia-Mauriño, Alison Rodger & Nigel Field

Birmingham Clinical Trials Unit, University of Birmingham, Birmingham, UK

Peter Brocklehurst

Contributions

Y.S. and T.D.L. conceived and designed the study. Y.S. coordinated the experiments and performed computational analyses with assistance from A.M. Y.S. and M.D.S. cultured bacterial strains and performed DNA extraction. S.C. performed germ-free mouse experiments with assistance from N.J.R.D., A.A., K.H., J.L. and H.P.B. A.R., P.B., N.F. and T.D.L. conceived and designed the Baby Biome Study and obtained funding. N.F., A.R. and P.B. managed participant recruitment and sample collection, and coordinated the clinical metadata collection. C.G.-M. curated the clinical metadata and undertook the clinical epidemiological analyses with N.F. Y.S. and T.D.L. wrote the manuscript with input from H.P.B., A.M., C.G.-M., A.R., P.B. and N.F.

Corresponding authors

Correspondence to Yan Shao or Trevor D. Lawley .

Ethics declarations

Competing interests.

T.D.L. is the co-founder and CSO of Microbiotica. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Microbiology thanks the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of sampling in the Baby Biome Study.

(a-b) Shotgun metagenomes of 2,387 faecal samples from 1,288 neonatal subjects across Phase 1 (Shao et al. 1) and Phase 2 (this paper). The majority of samples (80% or 1,904) are from the neonatal period (b), primarily taken on day 4 (N=360), day 7 (N=1,149), and day 21 (N=350). (a) Rows represent subjects with paired maternal samples (for ‘maternal transmission’ analysis), longitudinal samples taken during the neonatal period (for ‘neonatal longitudinal’ analysis), and samples from the infancy period (for ‘infancy persistence’ analysis). These relationships are indicated by lines linking the samples, with summarised proportions in (c).

Extended Data Fig. 2 Consistency of NGM community state assignment across typing methods.

(a-d) Identification of three NGM community states using both (a) Partitioning Around Medoids (PAM) clustering of Jensen-Shannon divergence (JSD), with statistical support from the Calinski-Harabasz (CH) index, and (c) Dirichlet Multinomial Mixture (DMM) modelling using the Laplace approximation. (b, d) PCoA plots, representing 1,904 neonatal gut metagenomes, are color-coded by community state assignments and based on species-level Bray-Curtis distances. (e-g) Concordance between PAM-based and DMM-based community state assignments: (e) Correlation between community state assignments, with a Cramér's V of 0.726. The proportions of community states assigned by each method are labelled. The breakdown of community states BB/EF/BL is 336/827/741 in PAM and 252/1,097/555 in DMM. (f) Overlap in the dominant core species (≥1% mean abundance) in each community state, grouped at the genus level, with exceptions for the driver species B. breve, E. faecalis and B. longum. PAM-based assignment was chosen for downstream analyses given the higher relative abundances of these driver species in their respective community states (versus DMM-based assignment): B. breve 67.9% vs 56.5% (p<0.001), E. faecalis 21.7% vs 16.8% (p<0.001), and B. longum 27.25% vs 29.90% (p=0.24). Wilcoxon test (two-sided) with FDR correction. (g) The top 10 driver species for each DMM-based community state are displayed, ranked by their assignment strength, as indicated on the y-axis.
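The concordance statistic in panel (e), Cramér's V, can be illustrated with a minimal sketch. The cross-tabulation below is a made-up toy table, not the study's PAM-versus-DMM counts (which the caption does not give), and the function name is hypothetical.

```python
import math


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical counts: rows are states from one method, columns from the other.
toy_table = [
    [40, 5, 5],
    [5, 40, 5],
    [5, 5, 40],
]
v = cramers_v(toy_table)
print(f"Cramér's V = {v:.3f}")
```

A value of 1 means the two assignments agree perfectly up to relabelling, and a value of 0 means they are independent.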

Extended Data Fig. 3 Consistency of NGM community state assignment across neonatal time points.

Identification of three NGM community states using PAM-based clustering across three major time points in the neonatal period (day 4, N=360; day 7, N=1,149; day 21, N=350). PCoA plots are color-coded by community state assignments and based on species-level JSD. Ellipses encapsulate 67% of the samples within each respective cluster.
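As a rough illustration of the distance underlying these plots, the sketch below computes the Jensen-Shannon divergence between two species-abundance profiles. It assumes base-2 logarithms and returns the divergence itself; the caption does not state whether the divergence or its square root (the JS distance) was used, and the function name and inputs are hypothetical.

```python
import math


def jensen_shannon_divergence(p, q):
    """JSD (base 2) between two non-negative abundance vectors of equal length."""
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]  # normalise to relative abundances
    q = [x / q_total for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # midpoint distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero abundance contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Toy profiles over three species (made-up values).
sample_a = [0.7, 0.2, 0.1]
sample_b = [0.1, 0.2, 0.7]
jsd_ab = jensen_shannon_divergence(sample_a, sample_b)
```

With base-2 logarithms the result lies between 0 (identical profiles) and 1 (profiles sharing no species).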

Extended Data Fig. 4 Abundance and co-occurrence of the NGM community state driver species.

PCoA plots depicted in Fig. 1, with arrows illustrating the scale and direction of core NGM species (>1% mean abundance) driving the formation of NGM community states (clusters). The length of each arrow is scaled to reflect the degree of contribution to the variation in NGM composition, and each arrow points towards increasing species abundance. Species that frequently co-occur with the NGM driver species within their respective community states share the same arrow direction.

Extended Data Fig. 5 Validation of NGM community states and driver species across geographies and lifestyles.

All three NGM community states and the driver species (B. breve, B. longum or B. infantis, and E. faecalis) were independently detected in infant gut metagenomic cohorts (0–6 months) from diverse geographical regions and lifestyles. These include Europe (Sweden, days 4–12, N=37), the United States (TEDDY cohort, months 2–6, N=69), the Middle East (Israel, weeks 1–24, N=60), and South Asia (Bangladesh, months 0–2, N=234). In the Bangladeshi cohort, which is a non-industrialised and non-urban population, the B. infantis- and E. coli-driven clusters are representative of the B. longum (closely related to B. infantis) and E. faecalis (also a facultatively anaerobic opportunistic pathogen) community states, respectively. The analysis and visualization methods are consistent with those described in Fig. 1a, b.

Extended Data Fig. 6 Strain-level dynamics and stability across NGM species.

(a) Frequency of study participants in whom the same strains as in their mother's faecal samples were detected (in grey, otherwise in white) across NGM community states. To delineate transmission trends, the chart is categorized by birth mode and the three NGM driver species. The frequency of strain-sharing events (for example, maternal transmission in a mother-baby pair or strain persistence within an individual's longitudinal samples) is presented as the raw count of detectable strain-sharing events normalized by the total number of subjects per birth mode and NGM community state (week 1). (b) Bar plots counting strain-sharing events across three settings: (Left) Maternal transmission in mother-infant dyads (183 subjects; 167 transmissions from 213 evaluated species-sample pairs). (Middle) Neonatal persistence via neonatal longitudinal sampling (359 subjects; 700 transmissions from 938 evaluated pairs). (Right) Infancy persistence from the neonatal into the infancy period (302 subjects; 464 transmissions from 920 evaluated pairs). When longitudinal samples were considered, strain-sharing events were counted only once per subject per setting, using the time point with the highest counts. Only species with ≥20 strain-sharing events detected across the three settings are shown. The three community state driver species are highlighted in boxes. Transmission patterns often align with phylogeny: Actinomycetota/Actinobacteria (pink) and Bacteroidota/Bacteroidetes (green) typically transmit maternally during vaginal birth and persist into infancy. Conversely, Bacillota/Firmicutes (purple) and Pseudomonadota/Proteobacteria (orange) show lower maternal transmission rates and reduced neonatal persistence. Notable outliers include E. coli and B. breve. The size of each bubble represents the transmissibility of the species, defined as the ratio of detected to potential strain-sharing events, as determined by StrainPhlAn4. Only subject pairs with sequencing depth sufficient for StrainPhlAn strain-level analyses are displayed; data points not shown are non-evaluable.
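The detected-to-evaluated ratio can be worked through with the aggregate counts quoted in panel (b). This is only a sketch: the caption defines transmissibility per species, whereas the figures below pool all species within each setting, and the function and variable names are hypothetical.

```python
# Counts quoted in the caption of Extended Data Fig. 6b:
# (detected strain-sharing events, evaluated species-sample pairs).
settings = {
    "maternal transmission": (167, 213),
    "neonatal persistence": (700, 938),
    "infancy persistence": (464, 920),
}


def sharing_rate(detected, evaluated):
    """Ratio of detected to evaluated (potential) strain-sharing events."""
    return detected / evaluated


rates = {name: sharing_rate(d, e) for name, (d, e) in settings.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Pooled across species, the ratios come to roughly 78%, 75% and 50% for the three settings.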

Extended Data Fig. 7 Colonisation dynamics in neonatal longitudinal samples.

(a) Overview of NGM community states of all subjects individually sampled at the major neonatal sampling points (day 4, 7 or 21), stratified by birth mode. In VD, N=176/602/156 on day 4, 7, and 21, respectively; in CS, N=184/547/194 on day 4, 7, and 21, respectively. Total samples N=1,859. (b) Longitudinal shifts in NGM community states and the levels of driver species from week 1 to week 3, based on subjects longitudinally sampled across days 4, 7, and 21 (N=234; VD, N=111; CS, N=123). Community states that remained consistent from the first (day 4) to the final neonatal longitudinal sampling (day 7 and/or 21) are depicted as a percentage of their starting pool size (labelled in black). Subjects that began with either the BB or BL community state on day 4 were significantly more likely to remain in the same community state on day 7 and 21 than those that began with EF (pairwise chi-squared tests with FDR correction, q-values < 0.01). However, this trend was not observed as early as day 7 (global chi-squared test, p=0.7043). The colour scheme represents the community states or driver species: BB and B. breve in green; EF and E. faecalis in purple; BL and B. longum in orange. Statistical differences in species abundance between longitudinal samples were determined using paired ANOVA test (two-sided) with FDR correction. Boxplot center line and red point indicate the median and mean, respectively; box limits indicate the upper and lower quartiles; and whiskers indicate 1.5× the interquartile range.
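Several captions report q-values after FDR correction. The sketch below shows one common way to obtain them, the Benjamini-Hochberg procedure; the captions do not name the exact method, so this choice is an assumption, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values from a set of pairwise tests.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_q = benjamini_hochberg(toy_p)
significant = [q < 0.01 for q in toy_q]
```

Each comparison is then called significant when its q-value falls below the chosen threshold, for example the q < 0.01 cut-off quoted in panel (b).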

Extended Data Fig. 8 Species-driven functional divergence in NGM community states.

(a-b) Principal Component Analysis (PCA) of community state driver enterotype species genomes. Groupings are based on the presence of genes tied to the full metabolic repertoire using (a) KEGG orthologs (KOfams) and carbon metabolism via (b) Carbohydrate-Active enZYmes (CAZymes). Each dot denotes an individual strain: B. longum (BL, N=342) in orange, B. breve (BB, N=267) in green, and E. faecalis (EF, N=507) in blue. Ellipses encapsulate the 95% confidence intervals. Arrows showcase the contribution of select CAZy genes to principal components (details in Extended Data Fig. 9). CAZy genes for human milk oligosaccharide (HMO) utilisation are highlighted in red.

Extended Data Fig. 9 Carbon metabolism of NGM community state driver species.

(a) A heatmap displays the clustering of carbohydrate-active enzymes (CAZymes) across the genomes of three driver species. Genes are coloured based on their corresponding carbohydrate substrate categories. (b-d) Volcano plots depict differentially enriched CAZymes in each driver species, comparing (b) BB vs. EF, (c) BL vs. EF, and (d) BB vs. BL. The effect size represents the difference in the proportion of genes between species. P-values were calculated using Fisher's exact test (two-sided) and adjusted with FDR correction. Genes related to HMO metabolism are marked in red. Significantly enriched genes are labelled for clarity. Arrows at the top indicate the direction of species enrichment in each comparison. B. longum (BL, N=342) is shown in orange, B. breve (BB, N=267) in green, and E. faecalis (EF, N=507) in blue.
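The enrichment test behind the volcano plots can be sketched for a single gene as a 2×2 comparison. The code assumes the effect size is the share of genomes carrying the gene in one species minus the share in the other, which is one reading of the caption; the counts and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: genomes with / without the gene in species 1
    c, d: genomes with / without the gene in species 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene-positive genomes in species 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def proportion_difference(a, b, c, d):
    """Effect size: gene prevalence in species 1 minus prevalence in species 2."""
    return a / (a + b) - c / (c + d)


# Made-up counts: the gene is present in 3/3 genomes of one species, 0/3 of the other.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
effect = proportion_difference(3, 0, 0, 3)
```

In a full analysis, the p-values from all genes would be collected and FDR-adjusted before being plotted against the effect size.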

Supplementary information

Reporting summary, supplementary data.

Table of contents tab. Supplementary Tables:
1. Sample_accession. ENA accessions of the 2,387 samples of the entire BBS cohort analysed in this manuscript: BBS1 (N = 1,679) and BBS2 (N = 708).
2. Neonatal_subject. Clinical metadata of the BBS neonates included in the statistical analysis (N = 1,108); cohort characteristics are described in Extended Data Table 1.
3. Neonatal_sample. Sample metadata (age/day and community states) of the neonatal samples included in the analyses (N = 1,904).
4. Epi_metadata_summary. Descriptive table of the BBS neonatal population with available metadata (N = 1,108/1,288, 90%).
5. Epi_result_neonatal_state. Clinical and sociodemographic variables associated with the acquisition of NGM community state measured in the first week of life (N = 1,108).
6. Epi_result_neonatal_switch. Clinical and sociodemographic variables associated with NGM community state switching between week 1 and week 3 (N = 306).
7. Genome_accession. Sample accessions of the species genomes generated from the BBS samples and analysed in the functional analyses.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Shao, Y., Garcia-Mauriño, C., Clare, S. et al. Primary succession of Bifidobacteria drives pathogen resistance in neonatal microbiota assembly. Nat Microbiol (2024). https://doi.org/10.1038/s41564-024-01804-9

Received : 22 April 2024

Accepted : 05 August 2024

Published : 06 September 2024

DOI : https://doi.org/10.1038/s41564-024-01804-9

  • Open access
  • Published: 06 September 2024

Comprehensive analysis of POLH-AS1 as a prognostic biomarker in hepatocellular carcinoma

  • Yan Dong 1,
  • Xinyi Chen 2,
  • Shen Yang 3,
  • Yilong Fu 3,
  • Liangyu Wang 3,
  • Xueping Gao 4,
  • Di Chen 5 &
  • Lixia Xu 3

BMC Cancer volume 24, Article number: 1112 (2024)

Hepatocellular carcinoma (HCC), a prevalent primary malignant tumor, is notorious for its high mortality rate. Despite advancements in HCC treatment, patient outcomes remain suboptimal. This study endeavors to assess the potential prognostic significance of POLH-AS1 in HCC.

In this research, we gathered RNA-Seq information from individuals with HCC in The Cancer Genome Atlas (TCGA). We analyzed the levels of POLH-AS1 expression in both HCC cells and tissues using statistical tests. Additionally, we examined various prognostic factors in HCC using advanced methodologies. Furthermore, we employed Spearman’s rank correlation analysis to examine the association between POLH-AS1 expression and the tumor’s immune microenvironment. Finally, the functional roles of POLH-AS1 in HCC were validated in two HCC cell lines (HEP3B and HEPG2).

Our analysis revealed elevated POLH-AS1 expression across various cancers, including HCC, with heightened expression correlating with HCC progression. Notably, POLH-AS1 expression emerged as a potential biomarker for HCC patient survival and prognosis. Mechanistically, we identified the involvement of POLH-AS1 in tumorigenesis pathways such as herpes simplex virus 1 infection, interactions with neuroactive receptors, and the cAMP signaling pathway. Lastly, inhibition of POLH-AS1 was discovered to hinder the proliferation, invasion and migration of HEP3B and HEPG2 HCC cells.

Conclusions

POLH-AS1 emerges as a promising prognostic biomarker and therapeutic target for HCC, offering potential avenues for enhanced patient management and treatment strategies.

Hepatocellular carcinoma (HCC) is a significant malignant tumor originating from liver cells and ranks as one of the most lethal cancers globally [ 1 , 2 ]. In 2020, approximately 906,000 individuals were diagnosed with HCC worldwide, making it the third leading cause of cancer-related deaths. The prognosis remains grim, with a five-year relative survival rate of around 18% [ 3 ]. Currently, early surgical resection remains the foremost therapeutic strategy, aiming to curtail mortality rates associated with HCC [ 4 ]. Despite advancements in medical science, the anticipated therapeutic efficacy for patients with HCC has not yet achieved optimal levels. The challenges are attributed to the highly invasive, heterogeneous, and drug-resistant nature of HCC, resulting in a poor prognosis [ 4 , 5 , 6 ]. Understanding the molecular mechanisms underlying HCC initiation and progression is crucial for improving clinical interventions and patient outcomes. Therefore, a deeper exploration of these molecular processes is essential to develop more effective treatment strategies and enhance survival rates in HCC patients.

Long non-coding RNAs (lncRNAs) have emerged as critical players in cancer biology, influencing key cancer characteristics including cell growth, motility, immortality, angiogenesis, and survival [ 7 ]. Various studies have highlighted the dual role of lncRNAs, which can act as either tumor promoters or suppressors within the tumor microenvironment [ 8 ]. Their significant role in cancer-related pathways suggests their potential as biomarkers for tumor diagnosis, treatment, and prognosis [ 9 ]. Consequently, the exploration of lncRNAs as biomarkers has become a prominent area of research in oncology.

POLH antisense RNA 1 (POLH-AS1) is a long non-coding RNA transcribed from the strand opposite the POLH gene. Recent studies have highlighted its pivotal role as a master regulator in HCC, significantly impacting patient prognosis by modulating various emerging cell death pathways, including necroptosis, ferroptosis, and cuproptosis [ 10 , 11 , 12 ]. Furthermore, a set of necrosis-associated lncRNAs, including POLH-AS1, has been proposed to guide the prognosis of HCC and inform immunotherapeutic approaches [ 13 ]. However, the precise regulatory mechanisms through which POLH-AS1 affects HCC progression remain largely unexplored.

In this study, we spotlight a promising lncRNA, POLH-AS1, demonstrating its potential to forecast prognosis and guide the choice of immunotherapy for HCC patients. Furthermore, we tentatively examined the expression levels of POLH-AS1 across HCC cell lines and human normal liver cell lines. Finally, we delved into the functional relevance of POLH-AS1 in HCC progression, unveiling that its inhibition resulted in attenuated cell proliferation, migration, and invasion. In aggregate, our findings underscore POLH-AS1 as a noteworthy prognostic biomarker and a viable target for tailored HCC treatment.

Data acquisition and processing

HCC data and RNA-sequencing transcriptome data were extracted from The Cancer Genome Atlas (TCGA) database. After individuals with missing clinical data were excluded, 374 HCC samples and 50 normal tissue samples were included in the study. We then explored the relationship between POLH-AS1 expression levels and the survival of individuals diagnosed with HCC.

Tumor samples collection

Between October 2021 and March 2022, 8 HCC tissue samples and 8 normal liver tissue samples were collected from patients at the First Affiliated Hospital of Zhengzhou University, Henan, China. The clinicopathological characteristics of the patients with HCC are summarized in Supplementary Table S1. HCC tissue samples were stored in liquid nitrogen after resection, and mRNA expression levels were evaluated using quantitative reverse transcription polymerase chain reaction (RT-qPCR).

Prognostic model development and evaluation

Univariate and multivariate Cox regression analyses were carried out to evaluate the potential of POLH-AS1 as an independent prognostic indicator at a significance level of p < 0.05. The analyses incorporated clinicopathological variables such as age, gender, histologic grade, histologic type, pathologic stage, and alpha-fetoprotein (AFP). Time-dependent receiver operating characteristic (ROC) curve analysis was performed using the survivalROC software [14]. Further, we developed a nomogram to predict clinical outcomes for HCC patients [15].

Functional enrichment analysis

Functional enrichment analysis was performed as described in our previous studies [16, 17]. All HCC samples in the TCGA dataset were divided into high and low expression groups, using the median expression level of POLH-AS1 as the cutoff value. The 'edgeR' package was used to detect differentially expressed genes (DEGs) between HCC tissues with low and high POLH-AS1 expression levels, with the criteria of adjusted p < 0.05 and |log2 fold change (FC)| > 1. Next, the DEGs underwent Gene Ontology (GO) analysis to identify the most significantly enriched biological functions. To determine the enriched signaling pathways, the DEGs underwent Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis [16].
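The grouping and filtering rules described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the edgeR workflow itself: the function names, the sample identifiers, and the choice to place samples exactly at the median in the low group are assumptions made for the example.

```python
from statistics import median


def split_by_median(expression):
    """Map each sample id to 'high' or 'low' using the median as the cutoff.

    Assumption: samples exactly at the median are placed in the 'low' group.
    """
    cutoff = median(expression.values())
    return {
        sample: ("high" if value > cutoff else "low")
        for sample, value in expression.items()
    }


def classify_deg(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    Applies the criteria adjusted p < 0.05 and |log2FC| > 1.
    """
    if adj_p < p_cutoff and log2_fc > fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2_fc < -fc_cutoff:
        return "down"
    return "ns"


# Hypothetical expression values, for illustration only
groups = split_by_median({"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0})
print(groups)
print(classify_deg(log2_fc=1.5, adj_p=0.01))
```

Keeping the two thresholds as named parameters makes it easy to see how the DEG counts would change under stricter or looser cutoffs.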

Immune infiltration and immune checkpoint analysis

Immune infiltration and immune checkpoint analyses were conducted as described previously [16, 17]. Gene Set Enrichment Analysis (GSEA) was used to evaluate the infiltrating immune cells linked to POLH-AS1. Marker genes for 24 distinct types of immune cells were taken from a previous study on immunity [18]. The relationships between POLH-AS1 and these 24 cell types were investigated using Spearman's rank correlation [17]. In the subsequent analysis of the relationship between POLH-AS1 and immune checkpoints, p < 0.05 was considered statistically significant.
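As a reminder of what Spearman's rank correlation measures, the following minimal sketch ranks both variables (averaging tied ranks) and then computes the Pearson correlation of the ranks. It is a plain-Python illustration, not the R code used in the study; the function names and input values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical lncRNA expression vs. an immune-cell enrichment score
print(spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0]))
```

Because only ranks enter the calculation, any monotonic relationship yields a coefficient close to 1 or -1, even when it is not linear.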

Cell culture

For the cell culture, we used a previously published protocol [16, 17]. The Chinese Academy of Sciences (Shanghai, China) provided the HEP3B and HEPG2 human HCC cell lines, along with normal human liver cells (NCs). These cells were maintained in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 2 mM L-glutamine at 37 °C in an incubator with 5% CO2.

RNA extraction and RT-qPCR

RNA extraction and RT-qPCR were performed as described previously [16, 17]. RNA was extracted from the specified cell lines and tissue samples using the RNeasy Mini Kit (QIAGEN, USA). GAPDH mRNA levels were used for data normalization, and the 2^(-ΔΔCT) method was used for quantification. The primer sequences are shown in table S2.
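The 2^(-ΔΔCT) calculation mentioned above can be written out as a short Python sketch. The Ct values below are hypothetical and only illustrate the arithmetic; the function and parameter names are assumptions for this example, with GAPDH playing the role of the reference gene.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    # Normalize the target gene to the reference gene within each condition
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the sample to the control condition
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (not taken from the study)
fold = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
print(fold)  # 4.0
```

A lower Ct means more template, so a target that crosses the threshold two cycles earlier than in the control (after normalization) corresponds to a four-fold higher relative expression.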

Cell transfection

The small interfering RNA (siRNA) plasmid targeting POLH-AS1 and the negative control (si-NC) were purchased from GenePharma (Shanghai, China). The full-length POLH-AS1 sequence synthesized by GenePharma was subcloned into a lentivirus vector. HEP3B and HEPG2 cells were transfected using Lipofectamine® 3000 (Invitrogen, USA) according to our previously published article and the manufacturer's protocol [16]. The siRNA sequences are shown in table S3.

Cell counting kit-8 (CCK-8) assay

The CCK-8 assay was performed to assess cell proliferation as previously reported [16, 19]. HCC cells were seeded in 96-well plates at a density of 3 × 10^3 cells per well. Next, 10 µL of CCK-8 solution was added to the medium at 0, 24, 48, and 72 h, followed by incubation for two hours. The optical density (OD) at 450 nm was measured with a SpectraMax i3x device. A proliferation curve was then plotted from the absorbance values measured up to the 72-hour mark.

Transwell migration and invasion assays

Transwell assays were used to evaluate the migratory and invasive potential of HCC cells, as previously described [16, 19]. For the migration experiment, the bottom chambers were filled with 500 µL of culture medium containing 10% FBS. In the top chamber, 3 × 10^4 HCC cells per well were seeded in 250 µL of serum-free medium. After 48 h, the cells remaining in the top compartment were removed with swabs. The migrated cells were fixed with 95% ethanol, stained with a 0.5% crystal violet solution, photographed under an Olympus microscope (Tokyo, Japan), and counted. For the invasion experiment, the filter was pre-coated with Matrigel (BD Biosciences, San Jose, CA, USA) before cell seeding. The subsequent steps were the same as for the migration experiment.

Wound healing assay

A wound healing assay was conducted based on previous studies [16]. Cells were seeded in 6-well plates and cultured until reaching confluence. A wound was then made in the center of the plate using pipette tips, followed by a switch to serum-free medium. After 48 h, images were captured, and the closure of the wound was subsequently assessed.
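The text does not state how wound closure was quantified. One common convention, shown here purely as an assumption, expresses closure as the percentage reduction in wound width (or area) between the initial image and the 48 h image. The function name and measurements below are hypothetical.

```python
def wound_closure_percent(initial_width, final_width):
    """Percentage of the initial wound that has closed.

    Assumed formula: (initial - final) / initial * 100.
    Both measurements must use the same unit (for example, micrometers).
    """
    if initial_width <= 0:
        raise ValueError("initial_width must be positive")
    return (initial_width - final_width) / initial_width * 100.0


# Hypothetical measurements: 800 um at 0 h, 200 um at 48 h
print(wound_closure_percent(800.0, 200.0))  # 75.0
```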

Statistical analysis

Statistical analysis was conducted using R software version 3.6.3. Differences in POLH-AS1 levels between the two groups were compared using Fisher's exact test, the Mann-Whitney test, and the Chi-square test. The Wilcoxon rank-sum or Kruskal-Wallis test was used to evaluate the relationship between POLH-AS1 levels in patients with HCC and their clinical information. The Kaplan-Meier method was used to analyze survival. p < 0.05 was considered statistically significant.
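For readers unfamiliar with the Kaplan-Meier method, the sketch below implements the product-limit estimator in plain Python. It covers only the survival curve, not the significance test between groups, and it is not the R code used in the study; the function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event (death) was observed at times[i],
    and 0 if the observation was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; the second patient is censored
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
# [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Censored patients leave the risk set without lowering the curve, which is why the survival probability drops only at times 1, 3, and 4 in this example.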

Features of HCC patients

The TCGA databases provided 374 RNAseq data sets of HCC patients with clinical resources including their age, histologic grade, histological type, pathologic stage, AFP, and vascular invasion. Clinical data is displayed in Table  1 .

The high expression level of POLH-AS1 in HCC tissues

Initially, we examined the expression levels of POLH-AS1 in various tumor tissues, utilizing data from TCGA. The analysis revealed that POLH-AS1 expression was elevated in several malignancies, with HCC showing particularly high levels compared to corresponding normal tissues (Fig.  1 A). As shown in Fig.  1 B and C, the RNA levels of POLH-AS1 in HCC tissues were consistently higher than those in normal liver tissues. Furthermore, we performed RT-qPCR on POLH-AS1 levels in eight individual HCC cases and their corresponding normal liver tissues. The qPCR results confirmed that POLH-AS1 expression was significantly higher in HCC tissues than in the matched normal tissues (Fig.  1 D).

Fig. 1. The expression level of POLH-AS1 in different types of tumors. (A) The expression of POLH-AS1 in normal tissues and cancer samples in the TCGA database. (B) The expression of POLH-AS1 in normal tissues and HCC tissues in the TCGA database. (C) The expression of POLH-AS1 in 50 pairs of HCC tissues and non-cancerous adjacent tissues in the TCGA database. (D) The expression of POLH-AS1 in 8 HCC tissues and 8 normal liver tissues, assessed by RT-qPCR assay. *p < 0.05, **p < 0.01, ***p < 0.001

The expression of POLH-AS1 correlates with clinicopathological characteristics of HCC

Utilizing data from the TCGA cohort, we explored the association between POLH-AS1 levels and various clinical factors, including histologic grade, histologic type, pathological stage, and AFP concentration. The analysis revealed that POLH-AS1 expression was significantly elevated in G3&G4 HCC compared to G1&G2 HCC (Fig.  2 A). Additionally, POLH-AS1 expression was higher in hepatocholangio carcinoma (mixed type) than in fibrolamellar carcinoma and hepatocellular carcinoma (Fig.  2 B). Furthermore, elevated expression of POLH-AS1 was observed in Stage III&IV compared to Stage I&II (Fig.  2 C). A positive correlation between POLH-AS1 expression and AFP concentration was also established, with higher POLH-AS1 expression observed in patients with elevated serum AFP levels (Fig.  2 D).

Fig. 2. The correlations between POLH-AS1 expression and clinicopathological characteristics of HCC. (A)-(D) The relationship between POLH-AS1 expression and histological grade (A), histological type (B), pathological stage (C), and AFP (D). (E) ROC analysis of POLH-AS1 expression shows promising discrimination power between normal samples and HCC tissues. (F) Time-dependent ROC curves and AUC values for 1-year, 3-year, and 5-year OS prediction. *p < 0.05, ***p < 0.001

Moreover, ROC curve analysis was performed to evaluate the diagnostic potential of POLH-AS1 in HCC patients. POLH-AS1 discriminated HCC tissue from normal tissue well, with an AUC value of 0.864 (95% CI = 0.823–0.906) (Fig. 2E). Additionally, time-dependent ROC curve analysis yielded AUC values of 0.677, 0.624, and 0.608 for predicting 1-year, 3-year, and 5-year survival in HCC patients, respectively (Fig. 2F), indicating that POLH-AS1 has moderate value as a prognostic marker for HCC survival.
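To make the AUC values easier to interpret, the sketch below computes the AUC as the probability that a randomly chosen positive sample (for example, a tumor) has a higher marker level than a randomly chosen negative sample (for example, normal tissue), with ties counted as one half. This is a generic illustration rather than the software used in the study, and the function name and expression values are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties contribute 0.5.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical marker levels in tumor vs. normal samples
print(auc([2.0, 5.0], [1.0, 3.0]))  # 0.75
```

On this scale, 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which puts values of 0.864 and 0.608–0.677 in context.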

High POLH-AS1 expression indicated poor prognosis in HCC patients

The prognostic value of POLH-AS1 in HCC was assessed using Kaplan-Meier survival analysis with RNA-seq data from TCGA. The findings revealed a significant inverse correlation between POLH-AS1 expression levels and overall survival ( p  = 0.005), disease-specific survival (DSS, p  = 0.011), and progression-free interval (PFI, p  = 0.015) (Fig.  3 A-C). Further univariate and multivariate analyses confirmed that POLH-AS1 upregulation is an independent prognostic factor in HCC (Fig. S1 A-B and table S4 ).

Fig. 3. The relationship between POLH-AS1 and the survival of patients with HCC. (A)-(C) K-M survival analysis showing the effect of POLH-AS1 expression level on OS (A), DSS (B), and PFI (C) in patients with HCC in the TCGA cohort. (D) Nomogram for predicting the probability of 1-, 3-, and 5-year OS for patients with HCC. (E) The calibration curves showing the concordance between the prediction by nomogram and actual survival

A nomogram model incorporating clinical characteristics was developed to predict overall survival rates for HCC patients at 1, 3, and 5 years (Fig.  3 D). Calibration curves were then used to evaluate the accuracy of the nomogram’s predictions, showing a strong concordance between the predicted survival rates and actual outcomes (Fig.  3 E).

Identification of DEGs and functional enrichment analysis

To explore the potential mechanisms of POLH-AS1 in HCC, we categorized HCC patients into high- and low-POLH-AS1 expression groups based on the median expression level of POLH-AS1. We identified 2,183 upregulated and 682 downregulated genes, applying the criteria of |log2FC| > 1 and adjusted p  < 0.05 (Fig.  4 A). The relative expression levels of the top 30 DEGs between the two groups are depicted in Fig.  4 B.

Fig. 4. Identification of DEGs and functional enrichment analysis of POLH-AS1 in HCC. (A) Volcano plot of differentially expressed genes. Red and green indicate up-regulated and down-regulated genes, respectively (|log2 fold change (FC)| > 1 and p < 0.05). (B) Heatmap showing the top 30 co-expressed differential genes in the POLH-AS1 low and high expression groups. (C) Bubble plot showing the GO functional enrichment analysis results (BP, biological process; CC, cellular component; MF, molecular function). (D) Bubble plot showing the results of KEGG enrichment analysis

To assess the functional significance of these DEGs in HCC, we conducted KEGG and GO enrichment analyses. GO analysis highlighted significant enrichment in processes such as organelle fission, ion channel complex formation, and passive transmembrane transporter activity (Fig.  4 C). KEGG analysis revealed enrichment in pathways linked to carcinogenesis, including herpes simplex virus 1 infection, neuroactive ligand-receptor interaction, cAMP signaling pathway, and proteoglycans in cancer (Fig.  4 D). These findings strongly suggest the involvement of these DEGs in the development and progression of HCC.

Examination of the relationship between POLH-AS1 and immune infiltration in HCC

Immune infiltration plays a crucial role in HCC progression and provides valuable insights for potential immunotherapies [20]. Utilizing the ssGSEA technique, we explored the correlation between POLH-AS1 levels and the presence of 24 unique immune cell populations within HCC. The results demonstrated that POLH-AS1 expression was significantly positively correlated with Th2 cells (R = 0.276, p < 0.001) and T helper cells (R = 0.192, p < 0.001). Conversely, POLH-AS1 expression was negatively associated with DCs (R = -0.383, p < 0.001), neutrophils (R = -0.300, p < 0.001), pDCs (R = -0.286, p < 0.001), and cytotoxic cells (R = -0.285, p < 0.001) (Fig. 5A-B and Fig. S2A-F). Additionally, we explored the relationship between POLH-AS1 and immune checkpoints. The analysis revealed a significant positive correlation between POLH-AS1 expression and several immune checkpoint molecules, including CD276 (R = 0.4, p < 0.001), TNFSF4 (R = 0.37, p < 0.001), TNFSF15 (R = 0.3, p < 0.001), and NRP1 (R = 0.27, p < 0.001) (Fig. 5C and Fig. S2G). Collectively, these findings suggested that POLH-AS1 expression is closely related to the immune microenvironment in HCC, potentially influencing tumor immune cell infiltration and the expression of immune checkpoints, which could hold significant implications for immunotherapy strategies in HCC.

Fig. 5. The association between POLH-AS1 expression and immune infiltration in HCC. (A) The infiltration levels of 24 immune cell subtypes in the high and low POLH-AS1 expression groups. (B) The correlation between the 24 immune cell subtypes and POLH-AS1 expression level. (C) The correlation between POLH-AS1 and immune checkpoint genes. *p < 0.05, **p < 0.01, ***p < 0.001

Inhibition of POLH-AS1 impeded cell proliferation in HCC

To further investigate the role of POLH-AS1 in the initiation and progression of HCC, we examined its expression in HEP3B and HEPG2 cell lines. RT-qPCR analysis revealed significantly elevated expression of POLH-AS1 in both HEP3B and HEPG2 cells compared to normal liver cells (NC) (Fig. 6A). Subsequently, POLH-AS1 expression in HEP3B and HEPG2 cells was knocked down using small interfering RNA (siRNA), and the silencing was confirmed by RT-qPCR analysis (Fig. 6B-C). CCK8 assays demonstrated that knockdown of POLH-AS1 significantly inhibited the proliferation of HEP3B and HEPG2 cells (Fig. 6D-E), whereas overexpression of POLH-AS1 markedly promoted proliferation (Fig. S3A-C).

Fig. 6. Inhibition of POLH-AS1 impeded cell proliferation in HCC. (A) RT-qPCR analysis showing the expression of POLH-AS1 in two HCC cell lines (HEP3B and HEPG2) and a normal liver cell line (NC). (B)-(C) RT-qPCR analysis showing the efficiency of si-POLH-AS1 transfection in HEP3B and HEPG2 cells. (D)-(E) CCK8 assays showing proliferation of HEP3B (D) and HEPG2 (E) cells transfected with control (si-NC) or si-POLH-AS1. Data are presented as the mean ± SD. ***p < 0.001

Suppression of POLH-AS1 hindered migration and invasion of HCC

Our study extended beyond the effects of POLH-AS1 on HCC cell growth to examine its role in cell migration and invasion. Transwell assays demonstrated that knockdown of POLH-AS1 significantly impaired the migration and invasion capabilities of HEP3B and HEPG2 cells (Fig.  7 A-D), while overexpression of POLH-AS1 markedly enhanced these cellular behaviors (Fig. S3 D-G). Additionally, wound healing assays revealed that knockdown of POLH-AS1 significantly reduced the wound closure rate in HCC cells (Fig.  7 E-H). Collectively, these findings provide compelling evidence that suppression of POLH-AS1 hinders the migration and invasion of HCC cells, suggesting that targeting POLH-AS1 may offer therapeutic benefits for HCC.

Fig. 7. Knockdown of POLH-AS1 inhibited cell migration and invasion in HCC. (A)-(D) Transwell migration and invasion assays showing the migratory and invasive ability of POLH-AS1-deficient HEP3B (A, B) and HEPG2 (C, D) cells. Scale bar, 100 µm. Data are presented as the mean ± SD. (E)-(H) Wound healing migration assays showing HEP3B (E, F) and HEPG2 (G, H) cell migration of control cells compared to POLH-AS1-depleted cells. Scale bar, 100 µm. Data are presented as the mean ± SD. ***p < 0.001

HCC is a common yet highly aggressive malignant tumor, often progressing silently and resulting in a grim prognosis for patients. Various treatment modalities are available, including radiation, chemotherapy, and surgery, but each has inherent limitations, and the overall prognosis remains poor, with a 5-year survival rate of just 18% [21]. The unfavorable prognosis, emerging drug resistance, and significant side effects associated with current treatments underscore the urgent need for novel therapeutic strategies.

Recent research has revealed a significant correlation between altered lncRNA expression levels and poor prognosis in HCC, underscoring the potential of these biomarkers in predicting both diagnosis and prognosis [ 11 ]. These pioneering discoveries offer renewed hope for advanced HCC patients by opening new avenues for treatment. LncRNAs are crucial regulatory elements that influence cancer aggressiveness by modulating its progression, particularly in HCC. The depletion of oncogenic lncRNAs has been shown to induce apoptotic cell death and cause cell cycle arrest in HCC [ 22 ], whereas their overexpression substantially increases the proliferation of cancer cells [ 23 ]. For instance, researchers exploring immunotherapy for HCC have identified several lncRNAs that can predict patient prognosis [ 24 , 25 ]. From a selection of potentially significant lncRNAs, we have focused on POLH-AS1 to investigate its relationship with HCC and to assess its potential utility in predicting outcomes and guiding treatment strategies for HCC patients.

In our research, we found that POLH-AS1 was highly expressed in HCC and that this elevated expression was associated with more advanced clinicopathological features. Further investigation into the relationship between POLH-AS1 expression and the prognosis of HCC patients revealed that increased POLH-AS1 expression may be linked to poorer outcomes. Additionally, the results from univariate and multivariate analyses, ROC curve analysis, and Kaplan-Meier survival analysis all support the notion that POLH-AS1 can serve as an independent prognostic marker in HCC. Finally, we explored the functional relevance of POLH-AS1 in HCC progression and found that its inhibition attenuated cell proliferation, migration, and invasion. Taken together, our findings suggest that POLH-AS1 is a potential prognostic factor for patients with HCC. However, the mechanism by which POLH-AS1 leads to poor prognosis in HCC is unclear and needs further investigation.

To explore the role of POLH-AS1 in the malignant progression of HCC, we performed a GSEA using RNA-Seq data from TCGA. GSEA is widely used to reliably uncover potential molecular mechanisms underlying specific genes involved in disease pathology [ 26 ]. Our analysis revealed significant enrichment of POLH-AS1 in pathways related to herpes simplex virus 1 infection, neuroactive ligand-receptor interaction, and cAMP signaling, all of which are known to impact the prognosis and treatment of HCC patients. Emerging evidence suggests that these pathways are critically involved in HCC tumorigenesis and progression [ 27 , 28 , 29 ]. For example, Lam et al. identified an efficient and safe herpes simplex virus type 1 amplicon vector for transcriptionally targeted therapy in human hepatocellular carcinomas [ 28 ]. Similarly, neuroactive ligand-receptor interaction has been shown to play a pivotal role in HCC cell proliferation and invasion [ 29 ]. Moreover, vasoactive intestinal peptide was found to induce apoptosis in hepatocellular carcinoma by inhibiting the cAMP/Bcl-xL signaling pathway [ 27 ]. These findings collectively indicate that POLH-AS1 may influence the prognosis of HCC through its involvement in these cancer-related signaling pathways.

A growing body of research has highlighted that immune infiltration, a key component of the tumor microenvironment, plays a crucial role in oncogenesis and tumor progression, as well as influencing the response to immunotherapy [30]. However, no studies had previously reported a correlation between POLH-AS1 and immune infiltration in HCC. In our study, we identified a negative association between POLH-AS1 expression and dendritic cells (DCs) in HCC. Previous research has shown that local ablation of hepatocellular carcinoma can activate dendritic cells, thereby inducing sustained anti-tumor immune responses and ultimately reducing tumor progression and recurrence [31, 32]. This suggests that POLH-AS1 may promote HCC progression by impairing DC function. Moreover, we explored the relationship between POLH-AS1 expression and immune checkpoints, including CD276, TNFSF4, TNFSF15, and NRP1, and found a significant positive co-expression correlation between POLH-AS1 and these immune checkpoints. Immune checkpoints are a class of immunosuppressive molecules that restrain the immune response against HCC [30]. These findings suggest that POLH-AS1 plays a role in the tumor immune microenvironment primarily by regulating DC function and immune checkpoints, and that POLH-AS1 may influence patient prognosis by modulating the immune microenvironment in HCC.

Nevertheless, our study has certain limitations. Firstly, the sample data were exclusively sourced from TCGA databases, with no clinical information from external cohorts to validate the findings. Additionally, the molecular mechanisms through which POLH-AS1 affects HCC growth, migration, and invasion remain inadequately elucidated. Further investigation into the regulatory mechanisms of POLH-AS1 will be pursued both in vivo and in vitro.

Our investigation showcased the potential of POLH-AS1 as both a prognostic determinant and a viable target for therapeutic intervention in HCC patients. A deeper comprehension of its impact on cell growth regulation could pave the way for clinical innovations aimed at enhancing the prognostic outlook for individuals with HCC.

Data availability

Data information from this research is available in the TCGA repositories (http://cancergenome.nih.gov) and UCSC Xena (http://xenabrowser.net/datapages/) platform.

Ma H, Kang Z, Foo TK, Shen Z, Xia B. Disrupted BRCA1-PALB2 interaction induces tumor immunosuppression and T-lymphocyte infiltration in HCC through cGAS-STING pathway. Hepatology. 2023;77(1):33–47.

Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and Mortality Worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.

Vogel A, Meyer T, Sapisochin G, Salem R, Saborowski A. Hepatocellular carcinoma. Lancet. 2022;400(10360):1345–62.

Llovet JM, Kelley RK, Villanueva A, Singal AG, Pikarsky E, Roayaie S, Lencioni R, Koike K, Zucman-Rossi J, Finn RS. Hepatocellular carcinoma. Nat Rev Dis Primers. 2021;7(1):6.

Craig AJ, von Felden J, Garcia-Lezana T, Sarcognato S, Villanueva A. Tumour evolution in hepatocellular carcinoma. Nat Rev Gastroenterol Hepatol. 2020;17(3):139–52.

Xu LX, He MH, Dai ZH, Yu J, Wang JG, Li XC, Jiang BB, Ke ZF, Su TH, Peng ZW, et al. Genomic and transcriptional heterogeneity of multifocal hepatocellular carcinoma. Ann Oncol. 2019;30(6):990–7.

Schmitt AM, Chang HY. Long noncoding RNAs in Cancer pathways. Cancer Cell. 2016;29(4):452–63.

Zhang Y, Dong X, Guo X, Li C, Fan Y, Liu P, Yuan D, Ma X, Wang J, Zheng J, et al. LncRNA-BC069792 suppresses tumor progression by targeting KCNQ4 in breast cancer. Mol Cancer. 2023;22(1):41.

Pan Y, Zhang Q, Zhang H, Kong F. Prognostic and immune microenvironment analysis of cuproptosis-related LncRNAs in breast cancer. Funct Integr Genomics. 2023;23(1):38.

Wang W, Wang L, Song C, Mu T, Hu J, Feng H. Prognostic signature constructed of seven ferroptosis-related lncRNAs predicts the prognosis of HBV-Related HCC. J Gastrointest Cancer 2023.

Hashemi M, Mirzaei S, Zandieh MA, Rezaei S, Amirabbas K, Dehghanpour A, Esmaeili N, Ghahremanzade A, Saebfar H, Heidari H, et al. Long non-coding RNAs (lncRNAs) in hepatocellular carcinoma progression: Biological functions and new therapeutic targets. Prog Biophys Mol Biol. 2023;177:207–28.

Liu X, Cheng W, Li H, Song Y. Identification and validation of cuproptosis-related LncRNA signatures as a novel prognostic model for head and neck squamous cell cancer. Cancer Cell Int. 2022;22(1):345.

Wang W, Ye Y, Zhang X, Ye X, Liu C, Bao L. Construction of a necroptosis-Associated Long non-coding RNA signature to Predict Prognosis and Immune Response in Hepatocellular Carcinoma. Front Mol Biosci. 2022;9:937979.

Song J, Wang L, Ng NN, Zhao M, Shi J, Wu N, Li W, Liu Z, Yeom KW, Tian J. Development and validation of a machine learning model to explore tyrosine kinase inhibitor response in patients with stage IV EGFR variant-positive Non-small Cell Lung Cancer. JAMA Netw Open. 2020;3(12):e2030442.

Wang W, Ye Y, Zhang X, Sun W, Bao L. An angiogenesis-related three-long non-coding ribonucleic acid signature predicts the immune landscape and prognosis in hepatocellular carcinoma. Heliyon. 2023;9(3):e13989.

Xu L, Chen S, Li Q, Chen X, Xu Y, Zhou Y, Li J, Guo Z, Xing J, Chen D. Integrating bioinformatics and experimental validation to unveil disulfidptosis-related lncRNAs as prognostic biomarker and therapeutic target in hepatocellular carcinoma. Cancer Cell Int. 2024;24(1):30.

Zhu X, Chen D, Sun Y, Yang S, Wang W, Liu B, Gao P, Li X, Wu L, Ma S, et al. LncRNA WEE2-AS1 is a diagnostic biomarker that predicts poor prognoses in patients with glioma. BMC Cancer. 2023;23(1):120.

Wang L, Cao Y, Guo W, Xu J. High expression of cuproptosis-related gene FDX1 in relation to good prognosis and immune cells infiltration in colon adenocarcinoma (COAD). J Cancer Res Clin Oncol. 2023;149(1):15–24.

Chen D, Xu Y, Gao X, Zhu X, Liu X, Yan D. A novel signature of cuproptosis-related lncRNAs predicts prognosis in glioma: evidence from bioinformatic analysis and experiments. Front Pharmacol. 2023;14:1158723.

Bekric D, Ocker M, Mayr C, Stintzing S, Ritter M, Kiesslich T, Neureiter D. Ferroptosis in Hepatocellular Carcinoma: mechanisms, drug targets and approaches to clinical translation. Cancers (Basel) 2022, 14(7).

Paskeh MDA, Asadi A, Mirzaei S, Hashemi M, Entezari M, Raesi R, Hushmandi K, Zarrabi A, Ertas YN, Aref AR, et al. Targeting AMPK signaling in ischemic/reperfusion injury: from molecular mechanism to pharmacological interventions. Cell Signal. 2022;94:110323.

Zhong X, Huang S, Liu D, Jiang Z, Jin Q, Li C, Da L, Yao Q, Wang D. Galangin promotes cell apoptosis through suppression of H19 expression in hepatocellular carcinoma cells. Cancer Med. 2020;9(15):5546–57.

Luo Y, Lin J, Zhang J, Song Z, Zheng D, Chen F, Zhuang X, Li A, Liu X. LncRNA SNHG17 Contributes to Proliferation, Migration, and Poor Prognosis of Hepatocellular Carcinoma. Can J Gastroenterol Hepatol 2021, 2021:9990338.

Fang C, Liu S, Feng K, Huang C, Zhang Y, Wang J, Lin H, Wang J, Zhong C. Ferroptosis-related lncRNA signature predicts the prognosis and immune microenvironment of hepatocellular carcinoma. Sci Rep. 2022;12(1):6642.

Zhang Z, Zhang W, Wang Y, Wan T, Hu B, Li C, Ge X, Lu S. Construction and validation of a ferroptosis-related lncRNA signature as a Novel Biomarker for Prognosis, Immunotherapy and targeted therapy in Hepatocellular Carcinoma. Front Cell Dev Biol. 2022;10:792676.

Liu T, Yang K, Chen J, Qi L, Zhou X, Wang P. Comprehensive Pan-cancer Analysis of KIF18A as a marker for prognosis and immunity. Biomolecules 2023, 13(2).

Hara M, Takeba Y, Iiri T, Ohta Y, Ootaki M, Watanabe M, Watanabe D, Koizumi S, Otsubo T, Matsumoto N. Vasoactive intestinal peptide increases apoptosis of hepatocellular carcinoma by inhibiting the cAMP/Bcl-xL pathway. Cancer Sci. 2019;110(1):235–44.

Lam PY, Sia KC, Khong JH, De Geest B, Lim KS, Ho IA, Wang GY, Miao LV, Huynh H, Hui KM. An efficient and safe herpes simplex virus type 1 amplicon vector for transcriptionally targeted therapy of human hepatocellular carcinomas. Mol Ther. 2007;15(6):1129–36.

Li C, Jia Y, Li N, Zhou Q, Liu R, Wang Q. DNA methylation-mediated high expression of CCDC50 correlates with poor prognosis and hepatocellular carcinoma progression. Aging. 2023;15(15):7424–39.

Liu B, Liu Z, Wang Y, Lian X, Han Z, Cheng X, Zhu Y, Liu R, Zhao Y, Gao Y. Overexpression of GINS4 is associated with poor prognosis and survival in glioma patients. Mol Med. 2021;27(1):117.

Ali MY, Grimm CF, Ritter M, Mohr L, Allgaier HP, Weth R, Bocher WO, Endrulat K, Blum HE, Geissler M. Activation of dendritic cells by local ablation of hepatocellular carcinoma. J Hepatol. 2005;43(5):817–22.

Cabillic F, Toutirais O, Lavoué V, de La Pintière CT, Daniel P, Rioux-Leclerc N, Turlin B, Mönkkönen H, Mönkkönen J, Boudjema K, et al. Aminobisphosphonate-pretreated dendritic cells trigger successful Vgamma9Vdelta2 T cell amplification for immunotherapy in advanced cancer patients. Cancer Immunol Immunother. 2010;59(11):1611–9.

Acknowledgements

Not applicable.

This work was funded by the Henan Medical Science and Technology Joint Building Program (no. LHGJ20190255).

Author information

Yan Dong, Xinyi Chen and Shen Yang contributed equally to this work.

Authors and Affiliations

Department of Hepatobiliary Surgery, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China

Department of Gynecological Oncology, Sun Yat-sen University Cancer Center, Guangzhou, Guangdong, China

Department of Infectious Diseases, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China

Shen Yang, Yilong Fu, Liangyu Wang & Lixia Xu

Department of Laboratory Medicine, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China

Xueping Gao

Department of Neurosurgery, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China

Contributions

LXX, YD, and XYC were responsible for designing the project and writing the manuscript. SY, DC and XPG downloaded and analyzed the data. YLF and LYW collected samples and processed the data. The final manuscript was reviewed and approved by all writers.

Corresponding authors

Correspondence to Xueping Gao , Di Chen or Lixia Xu .

Ethics declarations

Ethics approval and consent to participate.

Approval for the study was granted by the Ethics Committee of the First Affiliated Hospital of Zhengzhou University, in accordance with the principles of the Declaration of Helsinki. Informed consent was obtained from all the participants in the study.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article

Dong, Y., Chen, X., Yang, S. et al. Comprehensive analysis of POLH-AS1 as a prognostic biomarker in hepatocellular carcinoma. BMC Cancer 24, 1112 (2024). https://doi.org/10.1186/s12885-024-12857-8

Received: 09 February 2024

Accepted: 27 August 2024

Published: 06 September 2024

DOI: https://doi.org/10.1186/s12885-024-12857-8

ISSN: 1471-2407

Information Technology

Primary Research Articles

How Can I Find Primary Research Articles?

Many of the recommended databases in this subject guide contain primary research articles (also known as empirical articles or research studies). Search in databases like ScienceDirect, MEDLINE, and Health Source: Nursing/Academic Edition.

Primary Research Articles: How Will I Know One When I See One?

To conduct and publish an experiment or research study, an author or team of authors designs an experiment, gathers data, then analyzes the data and discusses the results of the experiment. A published experiment or research study will therefore look very different from other types of articles (newspaper stories, magazine articles, essays, etc.) found in our library databases. The following guidelines will help you recognize a primary research article, written by the researchers themselves and published in a scholarly journal.

Structure of a Primary Research Article

Typically, a primary research article has the following sections:

  • Abstract: The author summarizes her article
  • Introduction: The author discusses the general background of her research topic; often, she will present a literature review, that is, summarize what other experts have written on this particular research topic
  • Method: The author describes the study she designed and conducted
  • Results: The author presents the data she gathered during her experiment
  • Discussion: The author offers ideas about the importance and implications of her research findings, and speculates on future directions that similar research might take
  • References: The author gives a References list of sources she used in her paper

The structure of the article will often be clearly shown with headings: Introduction, Method, Results, Discussion.

A primary research article will almost always contain statistics and numerical data presented in tables. Also, primary research articles are written in very formal, very technical language.

  • Last Updated: Aug 1, 2024 5:09 PM
  • URL: https://libguides.umgc.edu/information-technology

CrystEngComm

The fracture stress of 8-inch silicon carbide during the PVT growth

The primary challenge for preparing SiC crystals during the PVT process is the fractures, especially with the increase in diameter (≧8 inches). In this study, the elastic-plastic behaviors of the azimuthal component (σφφ) of primary stress causing SiC crystal fractures have been investigated to research the fracture mechanism. It is concluded that the plastic deformations caused by prismatic plane slips contribute positively to σφφ. Besides, the magnitude of the plastic component of σφφ is determined by the magnitude of resolved shear stresses (RSS) on each prismatic slip system over the same period, according to the Alexander-Haasen (AH) model. Further, the evolution of the magnitude of RSS (|RSS|) during the cooling process has been simulated, and the effects of the experimental conditions on the elastic-plastic behaviors of σφφ have been investigated qualitatively. Control experiments of SiC crystal growth have been carried out simultaneously, and our conclusions have been verified by the experimental results. Lastly, the maximum |RSS| can be a criterion for the fracture of SiC crystal, giving a critical value near 30 MPa under our experimental conditions. The analysis of the relation between the growth conditions and fractures guides the growth of perfect 8-inch SiC crystals.

Article information

B. Xu, S. Lu, H. Cui, X. Pi, D. Yang and X. Han, CrystEngComm, 2024, Accepted Manuscript, DOI: 10.1039/D4CE00769G

IMAGES

  1. 27 Real Primary Research Examples (2024)

  2. Primary Research

  3. (PDF) First year students benefit from reading primary research articles

  4. FREE 10+ Primary Research Report Samples in PDF

  5. Primary Research: What It Is, Purpose & Methods + Examples

  6. Major findings of 5 Primary Research Articles

VIDEO

  1. Introduction to Research types: Primary & Secondary

  2. How to Find a Primary Research Article

  3. Primary Research

  4. Author Searching Web of Science @ UIC

  5. Research in Primary Care for Nurse Practitioners

  6. The Fourth Lecture

COMMENTS

  1. JSTOR Home

  2. Primary Research

    Learn what primary research is, how to conduct it, and why it is useful for your academic projects. Find out the types, examples, advantages, and disadvantages of primary research methods.

  3. UMGC Library: Sciences: Primary Research Articles

    Primary Research Articles. To conduct and publish an experiment or research study, an author or team of authors designs an experiment, gathers data, then analyzes the data and discusses the results of the experiment. A published experiment or research study will therefore look very different from other types of articles (newspaper stories ...

  4. Finding Scholarly Articles: Home

    Scholarly or primary research articles are peer-reviewed, which means that they have gone through the process of being read by reviewers or referees before being accepted for publication. When a scholar submits an article to a scholarly journal, the manuscript is sent to experts in that field to read and decide if the research is valid and the ...

  5. Is it Primary Research? How Do I Know?

    Simply limiting your search results in a database to "peer-reviewed" will not retrieve a list of only primary research studies. Learn to recognize the parts of a primary research study. Terminology will vary slightly from discipline to discipline and from journal to journal. However, there are common components to most research studies. STEP ONE:

  6. What is Primary Research and How do I get Started?

    Primary research is an excellent skill to learn as it can be useful in a variety of settings including business, personal, and academic. But I'm not an expert! With some careful planning, primary research can be done by anyone, even students new to writing at the university level. The information provided on this page will help you get started.

  7. Finding Primary Research Articles in the Sciences: Home

    Click here to get help from a Polk State Librarian. This guide goes over how to find and analyze primary research articles in the sciences (e.g. nutrition, health sciences and nursing, biology, chemistry, physics, sociology, psychology). In addition, the guide explains how to tell the difference between a primary source and a secondary source ...

  8. Guides: Peer-Review and Primary Research: What is a Primary Study

    Peer-Review and Primary Research

  9. Peer Review & Primary Research Articles

    A primary research article reports on an empirical research study conducted by the authors. The goal of a primary research article is to present the result of original research that makes a new contribution to the body of knowledge. Characteristics: Almost always published in a peer-reviewed journal;

  10. Primary research

    Primary research articles provide a report of individual, original research studies, which constitute the majority of articles published in peer-reviewed journals. All primary research studies are conducted according to a specified methodology, which will be partly determined by the aims and objectives of the research. ...

  11. Research Guides: Science Writing: Primary Research Articles

    What is a primary research article? If you're writing an empirical article (also known as a primary research article) then you're doing original, typically experimental, research -- you are creating new knowledge and will have original findings. These primary research articles will always have a methodology section where you describe how you ...

  12. Approaches to prioritising primary health research: a scoping review

    Introduction. Health research can strengthen health systems, accelerate progress on the Sustainable Development Goals and improve population health. 1-4 The past few years have witnessed increased global calls to make better use of health research in policy-making and practice. 5-7 The global COVID-19 pandemic has reinforced the importance of appropriately identifying the health issues ...

  13. Maximizing Legacy and Impact of Primary Research: A Call for Better

    Those with experience in meta-analysis and systematic review understand the value of well-reported summary data in primary research articles, and failing this, the provision of raw data. To ensure the legacy of primary research and maximize its value, however, it should be the priority of journal editors and manuscript authors to ensure that ...

  14. JSTOR Primary Sources

  15. Identifying Articles

    Primary research articles provide a background on their subject by summarizing previously conducted research; this typically occurs only in the Introduction section of the article. Review Article. Review articles do not report new experiments. Rather, they attempt to provide a thorough review of a specific subject by assessing either all or the ...

  16. Identifying Primary and Secondary Research Articles

    Primary Research Articles. Primary research articles report on a single study. In the health sciences, primary research articles generally describe the following aspects of the study: The study's hypothesis or research question; The number of participants in the study, generally referred to as the "n"

  17. Primary Research

    Primary research is any research that you conduct yourself. It can be as simple as a 2-question survey, or as in-depth as a years-long longitudinal study. The only key is that data must be collected firsthand by you. Primary research is often used to supplement or strengthen existing secondary research.

  18. UMGC Library: Primary Sources: Empirical Research Articles

    Because primary research articles are written in technical language by professional researchers for experts like themselves, the articles can be very hard to understand. However, if you carefully review the introduction, results, and discussion sections, you will usually be able to understand and use one or two main ideas that the author is ...

  19. What is Primary Research?

    Introduction. Conducting research involves two types of data: primary data and secondary data. While secondary research deals with existing data, primary research collects new data. Ultimately, the most appropriate type of research depends on which method is best suited to your research question. While this article discusses the difference ...

  20. What is Primary Research? Definitions, Methods, Sources, Examples, and More

    Definitions, Methods, Sources, Examples, and More. Primary research is a cornerstone of insightful, accurate, and effective decision-making in both academic and professional settings. At its core, it refers to the process of collecting data directly from sources rather than relying on previously gathered information, distinguishing it clearly from ...

  21. Tutorial: Evaluating Information: Primary vs. Secondary Articles

    In the sciences, primary (or empirical) research articles: are original scientific reports of new research findings (Please note that an original scientific article does not include review articles, which summarize the research literature on a particular subject, or articles using meta-analyses, which analyze pre-published data.); usually include the following sections: Introduction, Methods ...

  22. Primary Research: What It Is, Purpose & Methods + Examples

  23. Lesley University Library: Finding and Using Primary Resources: Home

    Primary sources are those created contemporaneously to whatever period a researcher is studying. In contrast to secondary sources, they don't provide any analysis on a given topic after the fact; instead, they reflect on information or events as they unfolded (for example, a newspaper article, from the time of a particular historical event, discussing the historical event as it happened).

  24. Research Guides: Introduction to Special Collections & Archives

    Primary sources also include qualitative forms, like what people say, do, and experience. These sources can take various forms like written, audio, video, or photographic. Archival Sources are primary sources that have been created during the course of everyday life and have enduring value as evidence of the past. This enduring value and the ...

  25. Primary succession of Bifidobacteria drives pathogen resistance in

    Primary colonization by microbial communities dominated by Bifidobacteria contribute to stable gut microbiota assembly and long-term pathogen resistance in neonates. ... Future research should ...

  26. Comprehensive analysis of POLH-AS1 as a prognostic biomarker in

    Hepatocellular carcinoma (HCC), a prevalent primary malignant tumor, is notorious for its high mortality rate. Despite advancements in HCC treatment, patient outcomes remain suboptimal. This study endeavors to assess the potential prognostic significance of POLH-AS1 in HCC. In this research, we gathered RNA-Seq information from individuals with HCC in The Cancer Genome Atlas (TCGA).

  27. Primary Research Articles

    A primary research article will almost always contain statistics and numerical data presented in tables. Also, primary research articles are written in very formal, very technical language. Because primary research articles are written in technical language by professional researchers for experts like themselves, the articles can be very hard to ...

  28. Ultimate Guide to Primary Market Research: Methods, Examples, and Tips

    This article explains primary market research, its methods, and its benefits and drawbacks. Key Takeaways. Primary market research involves direct data collection from target audiences, offering tailored insights that enhance understanding of consumer behaviors and market conditions.

  29. The fracture stress of 8-inch silicon carbide during the PVT growth

    The primary challenge for preparing SiC crystals during the PVT process is the fractures, especially with the increase in diameter (≧8 inches). In this study, the elastic-plastic behaviors of the azimuthal component (σφφ) of primary stress causing SiC crystal fractures have been investigated to research the