E.g., adding the University of Toronto limits a search for Smith, J to authors associated with that institution.
Use ORCID identification
An ORCID is a 16-digit number and is used by editors, funding agencies, publishers, and institutions to reliably identify individuals in the same way that ISBNs and DOIs identify books and articles. Use this to find a specific author.
If you use an ORCID in the search, the values entered for last name, first name, and affiliation are ignored.
Unsure about the spelling of the author's name? Use wildcards to replace the unknown letters:
* | Replaces zero or more characters - e.g., Jo* finds John, Johnston, Jonathan |
? | Replaces a single character - e.g., Jo?n finds John, Joan |
The hyphen is treated as punctuation and is therefore ignored unless it appears inside an exact phrase. Wildcards cannot stand alone and must be attached to a word. If a hyphen is placed between a wildcard and a word, the wildcard is dropped.
You can also search for an author in Advanced Search using their author ID, for example: AU-ID(000000000)
Multiple author IDs can be combined as well, for example: AU-ID(000000000) OR AU-ID(111111111) OR AU-ID(222222222)
Any of the author search field codes can be combined with OR to search for multiple authors.
An organization search returns a list of organizations with links to documents and a summary of the organization's research areas, collaborations, and publications.
To search for documents and authors within those organizations:
Operator | Effect |
---|---|
AND | Both terms must appear - e.g., "Cognitive architecture" AND robots |
OR | At least one term must appear - e.g., liver OR cirrhosis |
AND NOT | Excludes a term - e.g., lung AND NOT cancer |
Use these characters (wildcards) to find variations of a word:
* | Replaces zero or more characters - e.g., Chem* finds Chemistry, Chemicals, Chemists |
? | Replaces a single character - e.g., Nure?berg finds Nuremberg, Nurenberg |
Note: The hyphen is treated as punctuation and is therefore ignored unless it appears inside an exact phrase. Wildcards cannot stand alone and must be attached to a word. If a hyphen is placed between a wildcard and a word, the wildcard is dropped.
Accents are ignored in searches: you can enter Técnicas or Tecnicas, and searching for Tecnicas returns results containing é as well as e.
Here are some different field codes you could use:
Affiliation ID | If you know an affiliation’s ID, type in AF-ID(xxxxxxxx) |
Affiliation | To find documents where your search terms occur in the same affiliation, use: AFFIL(london and hospital). To find documents where both terms appear somewhere in a document's affiliations, but not necessarily the same one, use: AFFIL(london) AND AFFIL(hospital) |
Multiple affiliations | To search for documents from multiple affiliations, use a Boolean operator to combine affiliation queries - e.g., AFFIL(london) OR AFFIL(toronto) |
You can combine two or more searches with the operators OR, AND, and AND NOT using Combine queries.
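For instance, an illustrative combined query (the terms and affiliation here are placeholders, not drawn from this help page) could look like:
(TITLE-ABS-KEY(liver) OR TITLE-ABS-KEY(cirrhosis)) AND AFFIL(toronto) AND NOT TITLE-ABS-KEY(lung)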
Quickly find and gain valuable insights from 300 million+ research papers.
300 million+ scholarly papers
Access over 300 million authoritative and reputable academic papers. Find the most reliable and up-to-date information in one place.
Save over 50% of research time
Instantly find the most relevant papers related to any research topic or question, cutting hours spent searching different databases.
Free access for all
Knowledge should be accessible to all. You can use HIX Scholar's robust search engine and vast scholarly resources without any cost.
Covers all academic disciplines
Whether you are looking for research papers in physics, literature, humanities, or any other field, HIX Scholar has got you covered.
Students
Find reliable sources for research papers and school projects quickly and easily.
Educators
Find relevant academic resources to incorporate into lectures and classes.
Engineers
Find technical papers and studies to support engineering projects and advancements.
Researchers
Conduct efficient and effective literature reviews for research projects.
Legal Professionals
Gather valuable information from scholarly articles to support legal cases and arguments.
Health Practitioners
Access up-to-date medical literature to inform diagnoses, treatments, and patient care.
Search and retrieve relevant information from hundreds of millions of papers.
Directory of Open Access Journals
DOAJ in numbers.
80 languages
135 countries represented
13,752 journals without APCs
20,920 journals
10,461,097 article records
About the directory.
DOAJ is a unique and extensive index of diverse open access journals from around the world, driven by a growing community, and is committed to ensuring quality content is freely available online for everyone.
DOAJ is committed to keeping its services free of charge, including being indexed, and its data freely available.
→ About DOAJ
→ How to apply
DOAJ is twenty years old in 2023.
Fund our 20th anniversary campaign
DOAJ is independent. All support is via donations.
82% from academic organisations
18% from contributors
Support DOAJ
Publishers don't need to donate to be part of DOAJ.
Meet the DOAJ team: Head of Editorial and Deputy Head of Editorial (Quality)
Vacancy: Operations Manager
Press release: PubScholar joins the movement to support the Directory of Open Access Journals
New major version of the API to be released
→ All blog posts
We would not be able to work without our volunteers, such as these top-performing editors and associate editors.
→ Meet our volunteers
Librarianship, Scholarly Publishing, Data Management
Brisbane, Australia (Chinese, English)
Adana, Türkiye (Turkish, English)
Humanities, Social Sciences
Toruń, Poland (Polish, English)
Medical Sciences, Nutrition
Caracas, Venezuela (Spanish, English)
Research Evaluation
Milan, Italy (Italian, German, English)
Social Sciences, Humanities
Ponorogo, Indonesia (Bahasa Indonesia, English, Dutch)
Systematic Entomology
Edirne, Türkiye (English, Turkish, German)
Library and Information Science
Kyiv, Ukraine (Ukrainian, Russian, English, Polish)
DOAJ’s team of managing editors, editors, and volunteers works with publishers to index new journals. As soon as they’re accepted, these journals are displayed on our website, freely accessible to everyone.
→ See Atom feed
→ A log of journals added (and withdrawn)
→ Download all journals as CSV
Lucy Flamm, Social Work Librarian, is available for appointments and workshops relating to social work research, including identifying sources, citations, and data analysis. Lucy can be contacted via email at [email protected].
Additional subject guides across the Library are available to support your research. A full list is available here, with guides most relevant to the social work community listed below:
The latest resurgence in the U.S. of policies aimed at reducing imports and bolstering domestic production has included the expansion of Buy American provisions. While some of these are new and untested, in this paper we evaluate long-standing procurement limitations on the purchase of foreign products by the U.S. Federal Government. We use procurement micro-data to first map and measure the positive employment effects of government purchases. We then calibrate a quantitative trade model adapted to include features relevant to the Buy American Act: a government sector, policy barriers in final and intermediate goods, labor force participation, and external economies of scale. We show that current Buy American provisions on final goods purchase have created up to 100,000 jobs at a cost of between $111,500 and $137,700 per job. However, the recently announced tightening of the policy on the use of foreign inputs will create fewer jobs at a higher cost of $154,000 to $237,800 per job. We also find scant evidence of the use of Buy American rules as an effective industrial policy.
We thank Vidya Venkatachalam and Bohan Wang for excellent research assistance. We appreciate comments by seminar and conference participants at LMU Munich, Boston University, Duke University, Hong Kong University of Science and Technology, the University of Hong Kong, Singapore Management University, Peking University HSBC Business School, Yale University and the CESIfo Venice Summer Institute. We also thank Andrés Rodriguéz-Clare and Steve Tadelis for very helpful suggestions. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.
In addition to working papers, the NBER disseminates affiliates’ latest findings through a range of free periodicals — the NBER Reporter, the NBER Digest, the Bulletin on Retirement and Disability, the Bulletin on Health, and the Bulletin on Entrepreneurship — as well as online conference reports, video lectures, and interviews.
Academic journals, archives, and repositories are seeing an increasing number of questionable research papers clearly produced using generative AI. They are often created with widely available, general-purpose AI applications, most likely ChatGPT, and mimic scientific writing. Google Scholar easily locates and lists these questionable papers alongside reputable, quality-controlled research. Our analysis of a selection of questionable GPT-fabricated scientific papers found in Google Scholar shows that many are about applied, often controversial topics susceptible to disinformation: the environment, health, and computing. The resulting enhanced potential for malicious manipulation of society’s evidence base, particularly in politically divisive domains, is a growing concern.
Swedish School of Library and Information Science, University of Borås, Sweden
Department of Arts and Cultural Sciences, Lund University, Sweden
Division of Environmental Communication, Swedish University of Agricultural Sciences, Sweden
The use of ChatGPT to generate text for academic papers has raised concerns about research integrity. Discussion of this phenomenon is ongoing in editorials, commentaries, opinion pieces, and on social media (Bom, 2023; Stokel-Walker, 2024; Thorp, 2023). There are now several lists of papers suspected of GPT misuse, and new papers are constantly being added (see, for example, Academ-AI, https://www.academ-ai.info/, and Retraction Watch, https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/). While many legitimate uses of GPT for research and academic writing exist (Huang & Tan, 2023; Kitamura, 2023; Lund et al., 2023), its undeclared use—beyond proofreading—has potentially far-reaching implications for both science and society, but especially for their relationship. It therefore seems important to extend the discussion to Google Scholar, one of the most accessible and well-known intermediaries between the public and science, but also between the public and certain types of misinformation. This also responds to legitimate concerns that the discussion of generative AI and misinformation needs to be more nuanced and empirically substantiated (Simon et al., 2023).
Google Scholar, https://scholar.google.com, is an easy-to-use academic search engine. It is available for free, and its index is extensive (Gusenbauer & Haddaway, 2020). It is also often touted as a credible source for academic literature and even recommended in library guides, by media and information literacy initiatives, and by fact checkers (Tripodi et al., 2023). However, Google Scholar lacks the transparency and adherence to standards that usually characterize citation databases. Instead, Google Scholar uses automated crawlers, like Google’s web search engine (Martín-Martín et al., 2021), and its inclusion criteria are based primarily on technical standards, allowing any individual author—with or without scientific affiliation—to upload papers to be indexed (Google Scholar Help, n.d.). It has been shown that Google Scholar is susceptible to manipulation through citation exploits (Antkare, 2020) and by providing access to fake scientific papers (Dadkhah et al., 2017). A large part of Google Scholar’s index consists of publications from established scientific journals or other forms of quality-controlled, scholarly literature. However, the index also contains a large amount of gray literature, including student papers, working papers, reports, preprint servers, and academic networking sites, as well as material from so-called “questionable” academic journals, including paper mills. The search interface does not offer the possibility to filter the results meaningfully by material type, publication status, or form of quality control, such as limiting the search to peer-reviewed material.
To understand the occurrence of ChatGPT (co-)authored work in Google Scholar’s index, we scraped it for publications containing one of two common ChatGPT responses (see Appendix A) that we had encountered on social media and in media reports (DeGeurin, 2024). The results of our descriptive statistical analyses showed that around 62% did not declare the use of GPTs. Most of these GPT-fabricated papers were found in non-indexed journals and working papers, but some cases included research published in mainstream scientific journals and conference proceedings. (Indexed journals here means scholarly journals indexed by abstract and citation databases such as Scopus and Web of Science, where indexation implies high scientific quality; non-indexed journals fall outside this indexation.) More than half (57%) of these GPT-fabricated papers concerned policy-relevant subject areas susceptible to influence operations. To avoid increasing the visibility of these publications, we abstained from referencing them in this research note. However, we have made the data available in the Harvard Dataverse repository.
The publications were related to three issue areas—health (14.5%), environment (19.5%), and computing (23%)—with key terms such as “healthcare,” “COVID-19,” or “infection” for health-related papers, and “analysis,” “sustainable,” and “global” for environment-related papers. In several cases, the papers had titles that strung together general keywords and buzzwords, thus alluding to very broad and current research. These terms included “biology,” “telehealth,” “climate policy,” “diversity,” and “disrupting,” to name just a few. While the study’s scope and design did not include a detailed analysis of which parts of the articles included fabricated text, our dataset did contain the surrounding sentences for each occurrence of the suspicious phrases that formed the basis for our search and subsequent selection. Based on that, we can say that the phrases occurred in most sections typically found in scientific publications, including the literature review, methods, conceptual and theoretical frameworks, background, motivation or societal relevance, and even discussion. This was confirmed during the joint coding, where we read and discussed all articles. It became clear that not just the text related to the telltale phrases was created by GPT; almost all articles in our sample of questionable articles likely contained traces of GPT-fabricated text throughout.
Evidence hacking and backfiring effects
Generative pre-trained transformers (GPTs) can be used to produce texts that mimic scientific writing. These texts, when made available online—as we demonstrate—leak into the databases of academic search engines and other parts of the research infrastructure for scholarly communication. This development exacerbates problems that were already present with less sophisticated text generators (Antkare, 2020; Cabanac & Labbé, 2021). Yet, the public release of ChatGPT in 2022, together with the way Google Scholar works, has increased the likelihood of lay people (e.g., media, politicians, patients, students) coming across questionable (or even entirely GPT-fabricated) papers and other problematic research findings. Previous research has emphasized that the ability to determine the value and status of scientific publications for lay people is at stake when misleading articles are passed off as reputable (Haider & Åström, 2017) and that systematic literature reviews risk being compromised (Dadkhah et al., 2017). It has also been highlighted that Google Scholar, in particular, can be and has been exploited for manipulating the evidence base for politically charged issues and to fuel conspiracy narratives (Tripodi et al., 2023). Both concerns are likely to be magnified in the future, increasing the risk of what we suggest calling evidence hacking—the strategic and coordinated malicious manipulation of society’s evidence base.
The authority of quality-controlled research as evidence to support legislation, policy, politics, and other forms of decision-making is undermined by the presence of undeclared GPT-fabricated content in publications professing to be scientific. Due to the large number of archives, repositories, mirror sites, and shadow libraries to which they spread, there is a clear risk that GPT-fabricated, questionable papers will reach audiences even after a possible retraction. There are considerable technical difficulties involved in identifying and tracing computer-fabricated papers (Cabanac & Labbé, 2021; Dadkhah et al., 2023; Jones, 2024), not to mention preventing and curbing their spread and uptake.
However, as the rise of the so-called anti-vaxx movement during the COVID-19 pandemic and the ongoing obstruction and denial of climate change show, retracting erroneous publications often fuels conspiracies and increases the following of these movements rather than stopping them. To illustrate this mechanism: climate deniers frequently question established scientific consensus by pointing to other, supposedly scientific, studies that support their claims. Usually, these are poorly executed, not peer-reviewed, based on obsolete data, or even fraudulent (Dunlap & Brulle, 2020). A similar strategy is successful in the alternative epistemic world of the global anti-vaccination movement (Carrion, 2018), and the persistence of flawed and questionable publications in the scientific record already poses significant problems for health research, policy, and lawmakers, and thus for society as a whole (Littell et al., 2024). Considering that a person’s support for “doing your own research” is associated with increased mistrust in scientific institutions (Chinn & Hasell, 2023), it will be of utmost importance to anticipate and consider such backfiring effects when designing technical solutions, proposing industry or legal regulation, and planning educational measures.
Recommendations
Solutions should be based on simultaneous considerations of technical, educational, and regulatory approaches, as well as incentives, including social ones, across the entire research infrastructure. Paying attention to how these approaches and incentives relate to each other can help identify points and mechanisms for disruption. Recognizing fraudulent academic papers must happen alongside understanding how they reach their audiences and what reasons there might be for some of these papers successfully “sticking around.” A possible way to mitigate some of the risks associated with GPT-fabricated scholarly texts finding their way into academic search engine results would be to provide filtering options for facets such as indexed journals, gray literature, peer review, and similar on the interfaces of publicly available academic search engines. Furthermore, evaluation tools for indexed journals (such as LiU Journal CheckUp, https://ep.liu.se/JournalCheckup/default.aspx?lang=eng) could be integrated into the graphical user interfaces and the crawlers of these academic search engines. To enable accountability, it is important that the index (database) of such a search engine is populated according to criteria that are transparent, open to scrutiny, and appropriate to the workings of science and other forms of academic research. Moreover, considering that Google Scholar has no real competitor, there is a strong case for establishing a freely accessible, non-specialized academic search engine that is run not for commercial reasons but in the public interest. Such measures, together with educational initiatives aimed particularly at policymakers, science communicators, journalists, and other media workers, will be crucial to reducing the possibilities for and effects of malicious manipulation or evidence hacking. It is important not to present this as a technical problem that exists only because of AI text generators, but to relate it to the wider concerns in which it is embedded. These range from a largely dysfunctional scholarly publishing system (Haider & Åström, 2017) and academia’s “publish or perish” paradigm to Google’s near-monopoly and ideological battles over the control of information and ultimately knowledge. Any intervention is likely to have systemic effects; these effects need to be considered and assessed in advance and, ideally, followed up on.
Our study focused on a selection of papers that were easily recognizable as fraudulent. We used this relatively small sample as a magnifying glass to examine, delineate, and understand a problem that extends beyond the scope of the sample itself and points towards larger concerns requiring further investigation. The work of ongoing whistleblowing initiatives (such as Academ-AI, https://www.academ-ai.info/, and Retraction Watch, https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/), recent media reports of journal closures (Subbaraman, 2024), and GPT-related changes in word use and writing style (Cabanac et al., 2021; Stokel-Walker, 2024) suggest that we only see the tip of the iceberg. There are already more sophisticated cases (Dadkhah et al., 2023) as well as cases involving fabricated images (Gu et al., 2022). Our analysis shows that questionable and potentially manipulative GPT-fabricated papers permeate the research infrastructure and are likely to become a widespread phenomenon. Our findings underline that the risk of fake scientific papers being used to maliciously manipulate evidence (see Dadkhah et al., 2017) must be taken seriously. Manipulation may involve undeclared automatic summaries of texts, inclusion in literature reviews, explicit scientific claims, or the concealment of errors in studies so that they are difficult to detect in peer review. However, the mere possibility of these things happening is a significant risk in its own right that can be strategically exploited, and it will have ramifications for trust in and perception of science. Society’s methods of evaluating sources and the foundations of media and information literacy are under threat, and public trust in science is at risk of further erosion, with far-reaching consequences for society’s ability to deal with information disorders. To address this multifaceted problem, we first need to understand why it exists and proliferates.
Finding 1: 139 GPT-fabricated, questionable papers were found and listed as regular results on the Google Scholar results page. Non-indexed journals dominate.
Most questionable papers we found were in non-indexed journals or were working papers, but we also found some in established journals, publications, conferences, and repositories. In total, we found 139 papers with a suspected deceptive use of ChatGPT or similar LLM applications (see Table 1). Of these, 19 were in indexed journals, 89 were in non-indexed journals, 19 were student papers found in university databases, and 12 were working papers (mostly in preprint databases). Table 1 divides these papers into categories. Health and environment papers together made up around 34% (47) of the sample; of these, 66% were in non-indexed journals.
Publication type | Computing | Environment | Health | Other | Total |
---|---|---|---|---|---|
Indexed journals* | 5 | 3 | 4 | 7 | 19 |
Non-indexed journals | 18 | 18 | 13 | 40 | 89 |
Student papers | 4 | 3 | 1 | 11 | 19 |
Working papers | 5 | 3 | 2 | 2 | 12 |
Total | 32 | 27 | 20 | 60 | 139 |
Finding 2: GPT-fabricated, questionable papers are disseminated online, permeating the research infrastructure for scholarly communication, often in multiple copies. Applied topics with practical implications dominate.
The 20 papers concerning health-related issues are distributed across 20 unique domains, accounting for 46 URLs. The 27 papers dealing with environmental issues can be found across 26 unique domains, accounting for 56 URLs. Most of the identified papers exist in multiple copies and have already spread to several archives, repositories, and social media. It would be difficult, or impossible, to remove them from the scientific record.
As is apparent from Table 2, GPT-fabricated, questionable papers are seeping into most parts of the online research infrastructure for scholarly communication. Platforms on which identified papers have appeared include ResearchGate, ORCiD, Journal of Population Therapeutics and Clinical Pharmacology (JPTCP), Easychair, Frontiers, the Institute of Electrical and Electronics Engineers (IEEE), and X/Twitter. Thus, even if they are retracted from their original source, it will prove very difficult to track, remove, or even just mark them up on other platforms. Moreover, unless regulated, Google Scholar will enable their continued and most likely unlabeled discoverability.
Table 2. The five most common domains hosting the identified papers, by category (number of papers in parentheses):
Environment | researchgate.net (13) | orcid.org (4) | easychair.org (3) | ijope.com* (3) | publikasiindonesia.id (3) |
Health | researchgate.net (15) | ieee.org (4) | twitter.com (3) | jptcp.com** (2) | frontiersin.org (2) |
A word rain visualization (Centre for Digital Humanities Uppsala, 2023), which combines word prominences through TF-IDF scores (term frequency–inverse document frequency, a method for measuring the significance of a word in a document relative to its frequency across all documents in a collection) with semantic similarity of the full texts of our sample of GPT-generated articles in the “Environment” and “Health” categories, reflects the two categories in question. However, as can be seen in Figure 1, it also reveals overlap and sub-areas. The y-axis shows word prominences through word positions and font sizes, while the x-axis indicates semantic similarity. In addition to a certain amount of overlap, this reveals sub-areas, which are best described as two distinct events within the word rain. The event on the left bundles terms related to the development and management of health and healthcare, with “challenges,” “impact,” and “potential of artificial intelligence” emerging as semantically related terms. Terms related to research infrastructures and to environmental, epistemic, and technological concepts are arranged further down in the same event (e.g., “system,” “climate,” “understanding,” “knowledge,” “learning,” “education,” “sustainable”). A second distinct event further to the right bundles terms associated with fish farming and aquatic medicinal plants, highlighting the presence of an aquaculture cluster. Here, the prominence of groups of terms such as “used,” “model,” “-based,” and “traditional” suggests the presence of applied research on these topics. The two events making up the word rain visualization are linked by a less dominant but overlapping cluster of terms related to “energy” and “water.”
The bar chart of the terms in the paper subset (see Figure 2) complements the word rain visualization by depicting the most prominent terms in the full texts along the y-axis. Here, word prominences across the health and environment papers are arranged in descending order; values outside parentheses are TF-IDF scores (relative frequencies) and values inside parentheses are raw term frequencies (absolute frequencies).
Finding 3: Google Scholar presents results from quality-controlled and non-controlled citation databases on the same interface, providing unfiltered access to GPT-fabricated questionable papers.
Google Scholar’s central position in the publicly accessible scholarly communication infrastructure, as well as its lack of standards, transparency, and accountability in terms of inclusion criteria, has potentially serious implications for public trust in science. This is likely to exacerbate the already-known potential to exploit Google Scholar for evidence hacking (Tripodi et al., 2023) and will have implications for any attempts to retract or remove fraudulent papers from their original publication venues. Any solution must consider the entirety of the research infrastructure for scholarly communication and the interplay of different actors, interests, and incentives.
We searched and scraped Google Scholar using the Python library Scholarly (Cholewiak et al., 2023) for papers that included specific phrases known to be common responses from ChatGPT and similar applications built on the same underlying models (GPT-3.5 or GPT-4): “as of my last knowledge update” and/or “I don’t have access to real-time data” (see Appendix A). This facilitated the identification of papers that likely used generative AI to produce text, resulting in 227 retrieved papers. The papers’ bibliographic information was automatically added to a spreadsheet and downloaded into Zotero, an open-source reference manager (https://zotero.org).
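As a rough illustration of this retrieval step, a minimal sketch using the scholarly library is shown below. The authors' exact script is not reproduced here; the result field names and the CSV output are assumptions for illustration.

```python
# Minimal sketch of the retrieval step, assuming the scholarly library
# (Cholewiak et al., 2023). The two phrases are the telltale ChatGPT
# responses named in the text (see Appendix A).
import csv

from scholarly import scholarly

PHRASES = [
    '"as of my last knowledge update"',
    '"I don\'t have access to real-time data"',
]

rows = []
for phrase in PHRASES:
    # search_pubs yields one result dict per matching publication
    for pub in scholarly.search_pubs(phrase):
        bib = pub.get("bib", {})  # bibliographic fields exposed by scholarly
        rows.append({
            "phrase": phrase,
            "title": bib.get("title", ""),
            "year": bib.get("pub_year", ""),
            "venue": bib.get("venue", ""),
            "url": pub.get("pub_url", ""),
        })

# Save to CSV for spreadsheet review and import into a reference manager.
with open("retrieved_papers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["phrase", "title", "year", "venue", "url"])
    writer.writeheader()
    writer.writerows(rows)
```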
We employed multiple coding (Barbour, 2001) to classify the papers based on their content. First, we jointly assessed whether a paper was suspected of fraudulent use of ChatGPT (or similar) based on how the text was integrated into the paper and whether the paper was presented as original research output or the AI tool’s role was acknowledged. Second, in analyzing the content of the papers, we continued the multiple coding by classifying the fraudulent papers into four categories identified during an initial round of analysis—health, environment, computing, and others—and then determining which subjects were most affected by this issue (see Table 1). Out of the 227 retrieved papers, 88 were written with legitimate and/or declared use of GPTs (i.e., false positives, excluded from further analysis), and 139 were written with undeclared and/or fraudulent use (i.e., true positives, included in further analysis). The multiple coding was conducted jointly by all authors of the present article, who collaboratively coded and cross-checked each other’s interpretations of the data in a shared spreadsheet file. This was done to single out coding discrepancies and settle coding disagreements, which in turn ensured methodological thoroughness and analytical consensus (see Barbour, 2001). Redoing the category coding later based on our established coding schedule, we achieved an intercoder reliability (Cohen’s kappa) of 0.806 after resolving obvious discrepancies.
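For reference, the reported reliability statistic can be computed as in the short sketch below, using scikit-learn. The label sequences are invented placeholders, not the study's coding data.

```python
# Illustrative Cohen's kappa computation for two coders' category labels.
from sklearn.metrics import cohen_kappa_score

# Placeholder labels -- NOT the study's actual coding data.
coder_a = ["health", "environment", "computing", "other", "health", "other"]
coder_b = ["health", "environment", "other", "other", "health", "other"]

print(f"Cohen's kappa: {cohen_kappa_score(coder_a, coder_b):.3f}")
```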
The ranking algorithm of Google Scholar prioritizes highly cited and older publications (Martín-Martín et al., 2016). Therefore, the position of the articles on the search engine results pages was not particularly informative, considering the relatively small number of results in combination with the recency of the publications. Only the query “as of my last knowledge update” had more than two search engine result pages. On those, questionable articles with undeclared use of GPTs were evenly distributed across all result pages (min: 4, max: 9, mode: 8), with the proportion of undeclared use being slightly higher on average on later search result pages.
To understand how the papers making fraudulent use of generative AI were disseminated online, we programmatically searched for the paper titles (with exact string matching) in Google Search from our local IP address (see Appendix B) using the googlesearch-python library (Vikramaditya, 2020). We manually verified each search result to filter out false positives—results that were not related to the paper—and then compiled the most prominent URLs by field. This enabled the identification of other platforms through which the papers had been spread. We did not, however, investigate whether copies had spread into SciHub or other shadow libraries, or whether they were referenced in Wikipedia.
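A hedged sketch of this dissemination check is given below, assuming the googlesearch-python interface; the titles are placeholders, and the domain normalization mirrors the description in the following paragraph.

```python
# Sketch of the dissemination check: exact-title Google queries via the
# googlesearch-python library (Vikramaditya, 2020), with result domains
# reduced to second- plus top-level domain (e.g., researchgate.net).
from urllib.parse import urlparse

from googlesearch import search

titles = ["Placeholder title of a suspected GPT-fabricated paper"]

domain_counts = {}
for title in titles:
    # exact-phrase query for the paper title
    for url in search(f'"{title}"', num_results=20):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        domain = ".".join(host.split(".")[-2:])  # normalize the domain name
        domain_counts[domain] = domain_counts.get(domain, 0) + 1

for domain, n in sorted(domain_counts.items(), key=lambda kv: -kv[1]):
    print(domain, n)
```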
We used descriptive statistics to count the prevalence of GPT-fabricated papers across topics and venues, and the top domains by subject. The pandas software library for the Python programming language (The pandas development team, 2024) was used for this part of the analysis. Based on the multiple coding, paper occurrences were counted in relation to their categories, divided into indexed journals, non-indexed journals, student papers, and working papers. The schemes, subdomains, and subdirectories of the URL strings were filtered out while top-level and second-level domains were kept, thereby normalizing the domain names. This, in turn, allowed the counting of domain frequencies in the environment and health categories. To distinguish word prominences and meanings in the environment- and health-related GPT-fabricated questionable papers, a semantically aware word cloud visualization was produced using a word rain (Centre for Digital Humanities Uppsala, 2023) for full-text versions of the papers. Font size and y-axis position indicate word prominence through TF-IDF scores for the environment and health papers (also visualized in a separate bar chart with raw term frequencies in parentheses), and words are positioned along the x-axis to reflect semantic similarity (Skeppstedt et al., 2024), using an English Word2vec skip-gram model space (Fares et al., 2017). An English stop word list was used, along with a manually produced list including terms such as “https,” “volume,” or “years.”
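An analogous TF-IDF computation can be sketched with scikit-learn; the authors used the Word Rain implementation (Centre for Digital Humanities Uppsala, 2023), so this stand-in, with placeholder texts, only illustrates the term-prominence step.

```python
# TF-IDF term prominences over placeholder full texts, combining the
# built-in English stop word list with the manual additions from the text.
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

# Placeholder documents -- not the study corpus.
documents = [
    "telehealth and the potential of artificial intelligence in healthcare",
    "sustainable aquaculture, water management and climate impact",
]

stop_words = list(ENGLISH_STOP_WORDS) + ["https", "volume", "years"]

vectorizer = TfidfVectorizer(stop_words=stop_words)
tfidf = vectorizer.fit_transform(documents)
terms = vectorizer.get_feature_names_out()

# Five most prominent terms per document, by TF-IDF score.
for row in tfidf.toarray():
    top = sorted(zip(terms, row), key=lambda pair: -pair[1])[:5]
    print([term for term, _ in top])
```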
Haider, J., Söderström, K. R., Ekström, B., & Rödl, M. (2024). GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation. Harvard Kennedy School (HKS) Misinformation Review. https://doi.org/10.37016/mr-2020-156
Antkare, I. (2020). Ike Antkare, his publications, and those of his disciples. In M. Biagioli & A. Lippman (Eds.), Gaming the metrics (pp. 177–200). The MIT Press. https://doi.org/10.7551/mitpress/11087.003.0018
Barbour, R. S. (2001). Checklists for improving rigour in qualitative research: A case of the tail wagging the dog? BMJ, 322(7294), 1115–1117. https://doi.org/10.1136/bmj.322.7294.1115
Bom, H.-S. H. (2023). Exploring the opportunities and challenges of ChatGPT in academic writing: A roundtable discussion. Nuclear Medicine and Molecular Imaging, 57(4), 165–167. https://doi.org/10.1007/s13139-023-00809-2
Cabanac, G., & Labbé, C. (2021). Prevalence of nonsensical algorithmically generated papers in the scientific literature. Journal of the Association for Information Science and Technology, 72(12), 1461–1476. https://doi.org/10.1002/asi.24495
Cabanac, G., Labbé, C., & Magazinov, A. (2021). Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals. arXiv. https://doi.org/10.48550/arXiv.2107.06751
Carrion, M. L. (2018). “You need to do your research”: Vaccines, contestable science, and maternal epistemology. Public Understanding of Science, 27(3), 310–324. https://doi.org/10.1177/0963662517728024
Centre for Digital Humanities Uppsala. (2023). CDHUppsala/word-rain [Computer software]. https://github.com/CDHUppsala/word-rain
Chinn, S., & Hasell, A. (2023). Support for “doing your own research” is associated with COVID-19 misperceptions and scientific mistrust. Harvard Kennedy School (HKS) Misinformation Review, 4(3). https://doi.org/10.37016/mr-2020-117
Cholewiak, S. A., Ipeirotis, P., Silva, V., & Kannawadi, A. (2023). SCHOLARLY: Simple access to Google Scholar authors and citation using Python (1.5.0) [Computer software]. https://doi.org/10.5281/zenodo.5764801
Dadkhah, M., Lagzian, M., & Borchardt, G. (2017). Questionable papers in citation databases as an issue for literature review. Journal of Cell Communication and Signaling, 11(2), 181–185. https://doi.org/10.1007/s12079-016-0370-6
Dadkhah, M., Oermann, M. H., Hegedüs, M., Raman, R., & Dávid, L. D. (2023). Detection of fake papers in the era of artificial intelligence. Diagnosis, 10(4), 390–397. https://doi.org/10.1515/dx-2023-0090
DeGeurin, M. (2024, March 19). AI-generated nonsense is leaking into scientific journals. Popular Science. https://www.popsci.com/technology/ai-generated-text-scientific-journals/
Dunlap, R. E., & Brulle, R. J. (2020). Sources and amplifiers of climate change denial. In D. C. Holmes & L. M. Richardson (Eds.), Research handbook on communicating climate change (pp. 49–61). Edward Elgar Publishing. https://doi.org/10.4337/9781789900408.00013
Fares, M., Kutuzov, A., Oepen, S., & Velldal, E. (2017). Word vectors, reuse, and replicability: Towards a community repository of large-text resources. In J. Tiedemann & N. Tahmasebi (Eds.), Proceedings of the 21st Nordic Conference on Computational Linguistics (pp. 271–276). Association for Computational Linguistics. https://aclanthology.org/W17-0237
Google Scholar Help. (n.d.). Inclusion guidelines for webmasters. https://scholar.google.com/intl/en/scholar/inclusion.html
Gu, J., Wang, X., Li, C., Zhao, J., Fu, W., Liang, G., & Qiu, J. (2022). AI-enabled image fraud in scientific publications. Patterns, 3(7), 100511. https://doi.org/10.1016/j.patter.2022.100511
Gusenbauer, M., & Haddaway, N. R. (2020). Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Research Synthesis Methods, 11(2), 181–217. https://doi.org/10.1002/jrsm.1378
Haider, J., & Åström, F. (2017). Dimensions of trust in scholarly communication: Problematizing peer review in the aftermath of John Bohannon’s “Sting” in science. Journal of the Association for Information Science and Technology, 68(2), 450–467. https://doi.org/10.1002/asi.23669
Huang, J., & Tan, M. (2023). The role of ChatGPT in scientific communication: Writing better scientific review articles. American Journal of Cancer Research, 13(4), 1148–1154. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10164801/
Jones, N. (2024). How journals are fighting back against a wave of questionable images. Nature, 626(8000), 697–698. https://doi.org/10.1038/d41586-024-00372-6
Kitamura, F. C. (2023). ChatGPT is shaping the future of medical writing but still requires human judgment. Radiology, 307(2), e230171. https://doi.org/10.1148/radiol.230171
Littell, J. H., Abel, K. M., Biggs, M. A., Blum, R. W., Foster, D. G., Haddad, L. B., Major, B., Munk-Olsen, T., Polis, C. B., Robinson, G. E., Rocca, C. H., Russo, N. F., Steinberg, J. R., Stewart, D. E., Stotland, N. L., Upadhyay, U. D., & Ditzhuijzen, J. van. (2024). Correcting the scientific record on abortion and mental health outcomes. BMJ, 384, e076518. https://doi.org/10.1136/bmj-2023-076518
Lund, B. D., Wang, T., Mannuru, N. R., Nie, B., Shimray, S., & Wang, Z. (2023). ChatGPT and a new academic reality: Artificial Intelligence-written research papers and the ethics of the large language models in scholarly publishing. Journal of the Association for Information Science and Technology, 74(5), 570–581. https://doi.org/10.1002/asi.24750
Martín-Martín, A., Orduna-Malea, E., Ayllón, J. M., & Delgado López-Cózar, E. (2016). Back to the past: On the shoulders of an academic search engine giant. Scientometrics, 107, 1477–1487. https://doi.org/10.1007/s11192-016-1917-2
Martín-Martín, A., Thelwall, M., Orduna-Malea, E., & Delgado López-Cózar, E. (2021). Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: A multidisciplinary comparison of coverage via citations. Scientometrics, 126(1), 871–906. https://doi.org/10.1007/s11192-020-03690-4
Simon, F. M., Altay, S., & Mercier, H. (2023). Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown. Harvard Kennedy School (HKS) Misinformation Review, 4(5). https://doi.org/10.37016/mr-2020-127
Skeppstedt, M., Ahltorp, M., Kucher, K., & Lindström, M. (2024). From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts. Information Visualization, 23(3), 217–238. https://doi.org/10.1177/14738716241236188
Swedish Research Council. (2017). Good research practice. Vetenskapsrådet.
Stokel-Walker, C. (2024, May 1). AI chatbots have thoroughly infiltrated scientific publishing. Scientific American. https://www.scientificamerican.com/article/chatbots-have-thoroughly-infiltrated-scientific-publishing/
Subbaraman, N. (2024, May 14). Flood of fake science forces multiple journal closures: Wiley to shutter 19 more journals, some tainted by fraud. The Wall Street Journal. https://www.wsj.com/science/academic-studies-research-paper-mills-journals-publishing-f5a3d4bc
The pandas development team. (2024). pandas-dev/pandas: Pandas (v2.2.2) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.10957263
Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630), 313. https://doi.org/10.1126/science.adg7879
Tripodi, F. B., Garcia, L. C., & Marwick, A. E. (2023). ‘Do your own research’: Affordance activation and disinformation spread. Information, Communication & Society, 27(6), 1212–1228. https://doi.org/10.1080/1369118X.2023.2245869
Vikramaditya, N. (2020). Nv7-GitHub/googlesearch [Computer software]. https://github.com/Nv7-GitHub/googlesearch
This research has been supported by Mistra, the Swedish Foundation for Strategic Environmental Research, through the research program Mistra Environmental Communication (Haider, Ekström, Rödl) and the Marcus and Amalia Wallenberg Foundation [2020.0004] (Söderström).
The authors declare no competing interests.
The research described in this article was carried out under Swedish legislation. According to the relevant EU and Swedish legislation (2003:460) on the ethical review of research involving humans (“Ethical Review Act”), the research reported on here is not subject to authorization by the Swedish Ethical Review Authority (“etikprövningsmyndigheten”) (SRC, 2017).
This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided that the original author and source are properly credited.
All data needed to replicate this study are available at the Harvard Dataverse: https://doi.org/10.7910/DVN/WUVD8X
The authors wish to thank two anonymous reviewers for their valuable comments on the article manuscript as well as the editorial group of Harvard Kennedy School (HKS) Misinformation Review for their thoughtful feedback and input.
Title: Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Abstract: Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders and performs the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.
Comments: main paper is 20 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Recommended databases
Google Scholar
How to recognize a scholarly, peer-reviewed article
Google Scholar is a simple tool to search for scholarly (peer-reviewed) articles and to see how many times an article has been cited in other research. When you hit a paywall, it can connect you to Webster University Library's resources. If you are using a computer with an off-campus IP address, follow these instructions to connect Google Scholar to library resources:
Scholarly articles are sometimes called "peer-reviewed" or "refereed" articles because they are evaluated by other scholars or experts in the field before being accepted for publication. A scholarly article commonly reports an experimental or research study, or an in-depth theoretical or literature review, and it usually runs many more pages than a magazine article.
The clearest and most reliable indicator of a scholarly article is the presence of references or citations. Look for a list of works cited, a reference list, and/or numbered footnotes or endnotes. Citations are not merely a check against plagiarism. They set the article in the context of a scholarly discussion and provide useful suggestions for further research.
Many of our databases allow you to limit your search to just scholarly articles. This is a useful feature, but it is not 100% accurate in terms of what it includes and what it excludes. You should still check to see if the article has references or citations.
The table below compares some of the differences between magazines (e.g., Psychology Today) and journals (e.g., Journal of Abnormal Psychology).
| Popular magazines | Scholarly journals |
---|---|---|
Reference list, citations | no | yes |
Appearance | flashy cover, photographs, advertisements | mostly text, often graphs and charts of data, few ads |
Titles | short and catchy | long and precise |
Article length | short | long |
Audience | general public | students, professionals, researchers |
Authors | staff writers, journalists | practitioners, theorists, educators |
Peer-review | no | yes |
Publisher | commercial company | educational institution or professional organization |