
Why ChatGPT (still) does not mean the end of academic writing

By far the most-read post on this blog in 2023 was my analysis “Why ChatGPT does not mean the end of academic writing” from January 2023. After a good year dominated by the discussion about generative AI and large language models and by the release of new AI tools, I felt it was time to review the thesis I put forward back then: are large language models still incapable of scientific work, or do the widely reported massive advances in this area call for a reassessment? After all, in December 2023, it was made public that the Faculty of Business Administration at the University of Economics in Prague would like to do without bachelor’s theses in future given the availability of AI tools.

Over the past year, I have continued to experiment with various language models on scientific work (see also my online talk from June 2024, link to YouTube). Without spoiling the results in full detail: one year after my first analysis, large language models are still unable to achieve what is commonly understood as “academic writing”. This suggests that the problem is not a quantitative one that incremental improvements to the models could solve at some point, but a qualitative one: scientific work requires skills that go beyond the mere reproduction and recombination of texts and that presuppose a real understanding of the texts read. Despite all the admittedly impressive developments in modelling, understanding is still a genuinely human ability. With this in mind, the decision of the University of Economics must be criticised as hasty: if learning how to do scientific work is to remain part of academic education, it should also be practised by writing qualification papers.

Examined LLMs

My experiments included ChatGPT-4, Bard and Bing Chat. I also tried Luminous from Aleph Alpha, which (at least in the version provided in Aleph Alpha’s Playground) still makes an immature impression compared to the US bots. My prompt

Write an academic essay on why Christmas is celebrated on December 25, citing reliable sources.

led to the following, not at all academic, response from the largest model, Luminous Supreme (set to 1024 tokens):

Christmas is a Christian holiday that is celebrated on December 25. It is a day of celebration and joy. It is a day of giving and receiving. It is a day of family and friends. It is a day of love and peace. It is a day of hope and faith. It is a day of joy and happiness. It is a day of goodwill and cheer. It is a day of giving and sharing. It is a day of forgiveness and reconciliation. It is a day of thanksgiving and praise. It is a day of remembrance and reflection. It is a day of hope and joy. It is a day of love and peace. It is a day of giving and receiving. It is a day of family and friends. It is a day of goodwill and cheer. It is a day of forgiveness and reconciliation. […]

I would also have liked to test Claude from Anthropic, but it is still not available in Germany. VPN alone is not a solution here, unfortunately, as Anthropic requires SMS verification and I have misplaced my AT&T SIM. If Claude becomes available from Germany in the near future, I will post an update here.

In my tests, ChatGPT-4 consistently delivered the best results. In order not to go beyond the scope of this article, I will therefore concentrate on ChatGPT in the reproduction of my examples. For the sake of comparability, I have set the same tasks as a year ago.

The Christmas date

Let’s take a look at the results in detail. ChatGPT-4 solved the task of writing a scientific essay on the question of why Christmas is celebrated on 25 December as follows:

MW: Write an academic essay on why Christmas is celebrated on December 25, citing reliable sources.

ChatGPT: Christmas, celebrated on December 25, is a Christian holiday commemorating the birth of Jesus Christ. However, the selection of this specific date has historical, religious, and cultural underpinnings that extend beyond the mere acknowledgment of Christ’s birth.

Historical Context

The exact date of Jesus Christ’s birth is not recorded in the New Testament. Early Christians did not focus on the celebration of Christ’s birth but rather on his resurrection, a theme more central to their theology. It wasn’t until the 4th century that the Roman Church officially declared December 25 as the date of Jesus’ birth.

Influence of Pagan Festivals

The selection of December 25 is often thought to have been influenced by pagan festivals. This period marked the winter solstice, a time of rejoicing over the gradual return of sunlight. Major festivals such as Saturnalia, a Roman festival honoring the god Saturn, and the birthday of the Persian sun god Mithra were celebrated. Some historians suggest that the church adopted December 25 to adopt and absorb these pagan practices or provide a Christian alternative to popular pagan celebrations.

Symbolic Interpretation

Theologically, some Christian scholars posit a symbolic reason for the choice. Linking Christ’s birth to the winter solstice, the darkest day of the year, offers a powerful metaphor for Jesus as the light of the world, bringing light into darkness.

Later Developments and Traditions

Over time, Christmas evolved to incorporate various cultural traditions and practices, often unrelated to the religious significance of the holiday. These include the Germanic and Nordic Yule traditions and the story of Saint Nicholas, a 4th-century Greek bishop who inspired the modern-day Santa Claus.

Conclusion

While December 25 was likely not the actual date of Christ’s birth, its designation as Christmas represents a complex interplay of religious, cultural, and historical factors. The date not only aligned with existing pagan festivals, providing a Christian counterpart to these celebrations, but also offered a symbolic reflection of Christian theological themes.

References

  1. “The Origins of Christmas,” by Joseph F. Kelly.
  2. “The Battle for Christmas,” by Stephen Nissenbaum.
  3. “Christmas: A Candid History,” by Bruce David Forbes.

The religious-historical hypothesis of a pagan origin of the date of Christmas is at least controversial in recent research (cf. e.g. Wikipedia), and a scientific essay should definitely mention this. That aside, the cited publications are at least not fictitious. ChatGPT also provides more detailed information on request:

MW: The references are not very detailed. Please include the full bibliography.

ChatGPT: Certainly! Here is a more detailed bibliography for the references mentioned in the essay:

  1. Kelly, Joseph F. “The Origins of Christmas.” Liturgical Press, 2004. This book provides a comprehensive history of the origins and development of the Christmas holiday, exploring both the Christian and pre-Christian roots of December 25.
  2. Nissenbaum, Stephen. “The Battle for Christmas: A Cultural History of America’s Most Cherished Holiday.” Vintage Books, 1997. Nissenbaum’s work delves into the history of Christmas in the United States, tracing its transformation from a raucous, carnival-like holiday into a family-centered day of peace and nostalgia.
  3. Forbes, Bruce David. “Christmas: A Candid History.” University of California Press, 2007. Forbes offers an overview of the history of Christmas, from its pagan and Christian origins to its modern-day manifestation as a global commercial and religious holiday.

However, ChatGPT does not set any footnotes, so it remains unclear which reference is supposed to support which paragraph of the essay. In addition, academic work would necessarily indicate which parts of a book are being referred to. Without that, it remains questionable why a publication on the history of Christmas celebrations in the United States (Nissenbaum) was used for an essay on the date of Christmas. The choice of popular science sources (Forbes & Nissenbaum) for such an essay is also not intuitive.

Thus, the suspicion remains that the essay does not draw on the given literature at all, and that the titles were only output because they fit the general context of Christmas. The request to ChatGPT to provide correct footnotes in the text seems to confirm this suspicion:

MW: Please add references and footnotes to the original text.

ChatGPT: Certainly, let’s add references and footnotes to the original essay for a more academically rigorous presentation.

[…] Initially, early Christians did not emphasize the celebration of Christ’s birth but his resurrection, a more central theme in Christian theology^1. It wasn’t until the 4th century that the Roman Church officially declared December 25 as the date of Jesus’ birth^2. […]

[…] The choice of December 25 is thought to be influenced by pagan festivals^3. This period coincided with the winter solstice, a time of rejoicing over the return of sunlight. Festivals like Saturnalia, a Roman celebration honoring Saturn, and the birthday of Mithra, the Persian sun god, occurred during this period^3. Some historians suggest the church adopted December 25 to coincide with these pagan practices or to offer a Christian alternative^2. […]

[…] Theologically, the selection of this date carries symbolic significance. Linking Christ’s birth to the winter solstice, the year’s darkest day, offers a metaphor for Jesus as the light of the world, bringing light into darkness^1. […]

[…] Over centuries, Christmas evolved to incorporate various cultural traditions. These include Germanic and Nordic Yule traditions and the story of Saint Nicholas, the precursor to Santa Claus^3. […]

^1 Kelly discusses the early Christian focus on resurrection over birth.
^2 Nissenbaum provides insight into the 4th-century church’s decision and its alignment with pagan festivals.
^3 Forbes explores the incorporation of various cultural traditions into the celebration of Christmas.

Let’s ignore the creative way of footnote numbering: Kelly does indeed mention that early Christians placed greater importance on the celebration of Christ’s resurrection than on his birth. Nissenbaum, however, deals exclusively with the history of Christmas in the United States and not with the fourth century and supposedly pagan origins of Christmas. And Forbes does refer to the pagan festivities in December, but does not fulfil the requirements of a “reliable” source that should be consulted in the context of such an essay.

For a scholarly essay, moreover, it would be essential to state where in the cited work the information can be found, for example the page or chapter. In fact, ChatGPT seems to reproduce public information about the publications from abstracts or reviews that were part of the training data. This does not fulfil the requirements of actual scientific work, especially as it is not clear which works were part of the training data and which were not. The bibliography therefore appears arbitrary.
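The irregular footnote numbering noted above can also be checked mechanically. The following is a minimal Python sketch and was not part of my original tests: the function name `audit_footnotes` and the shortened sample text are assumptions made for illustration, and the sketch only understands the caret notation (`^1`) that ChatGPT used in this answer.

```python
import re
from collections import Counter

# Assumption: body markers follow a non-space character ("theology^1"),
# while footnote definitions start a line ("^1 Kelly discusses ...").
BODY_MARKER = re.compile(r"(?<=\S)\^(\d+)")
DEFINITION = re.compile(r"^\^(\d+)\s", re.MULTILINE)


def audit_footnotes(text):
    """Summarise how caret-style footnote markers are used in a text."""
    used = [int(n) for n in BODY_MARKER.findall(text)]
    defined = {int(n) for n in DEFINITION.findall(text)}
    counts = Counter(used)
    return {
        # Marker numbers in the order in which they appear in the body.
        "order": used,
        # Numbers that are attached to more than one passage.
        "reused": sorted(n for n, c in counts.items() if c > 1),
        # Markers that have no matching definition line.
        "undefined": sorted(set(used) - defined),
        # True only if the markers run 1, 2, 3, ... without repetition.
        "sequential": used == list(range(1, len(used) + 1)),
    }


# Shortened excerpt of the ChatGPT answer quoted above (hypothetical sample).
sample = (
    "... a more central theme in Christian theology^1. "
    "... the date of Jesus' birth^2.\n"
    "... influenced by pagan festivals^3. ... occurred during this period^3. "
    "... a Christian alternative^2.\n"
    "... bringing light into darkness^1.\n"
    "... the precursor to Santa Claus^3.\n"
    "^1 Kelly discusses the early Christian focus on resurrection over birth.\n"
    "^2 Nissenbaum provides insight into the 4th-century church's decision.\n"
    "^3 Forbes explores the incorporation of various cultural traditions.\n"
)

print(audit_footnotes(sample))
```

Such a check only reveals formal problems, such as reused or undefined markers. Whether a cited work actually supports the sentence it is attached to still has to be verified by reading the source.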

The 2022 World Cup in Qatar

At least ChatGPT did not hallucinate any sources in the above example – the referenced monographs actually exist. Can this behaviour also be reproduced in other examples? This time the test was performed in German (English translation: MW).

MW: Write an essay with footnotes on the awarding of the 2022 World Cup to Qatar.

ChatGPT:

The awarding of the 2022 FIFA World Cup to Qatar is an issue of great magnitude and controversy, involving various aspects such as international relations, sports politics, human rights and environmental issues. In this essay, I will discuss the key aspects of the decision, the controversies involved and the implications of this choice.

1 Introduction

The awarding of the 2022 World Cup to Qatar by FIFA in December 2010 marked the first time that this prestigious tournament was awarded to a country in the Middle East. This decision caused a worldwide stir and debate.

2 The selection process

FIFA, the governing body of world football, follows a complex process to select the host countries for its tournaments. Qatar beat off competition from the USA, South Korea, Japan and Australia. The decision was accompanied by allegations of corruption and vote-buying from the outset[1].

3 Human rights concerns

One of the biggest concerns regarding the award to Qatar was the human rights situation in the country. Reports of poor working conditions for migrants working on the stadium construction and the general treatment of migrant workers in the country attracted international attention[2].

4 Climatic challenges

Qatar, a country with extreme summer heat, was also criticised for its climatic conditions. The decision to hold the tournament in November and December instead of summer was a direct response to these concerns[3].

5 Political and economic implications

The decision to award the tournament to Qatar must also be seen in the context of the country’s geopolitical and economic ambitions. Qatar has made significant investments in the sports sector, partly as a means to increase its global presence[4].

6 Conclusion

The awarding of the 2022 World Cup to Qatar remains a controversial issue. It raises questions about the criteria that should apply when selecting host countries for major sporting events and emphasises the need for greater consideration of human rights and ethical standards.

Footnotes

[1] “FIFA’s Decision for Qatar 2022: Controversies and Accusations”, Sports Ethics Journal, 2011. Describes the controversial circumstances of FIFA’s decision and the allegations of irregularities in the selection process.

[2] “Human Rights Concerns in Qatar”, Human Rights Watch, 2020, provides a detailed overview of the human rights situation in Qatar, particularly with regard to migrant workers and their conditions during the construction of World Cup stadiums.

[3] “Climate Challenges of the Qatar World Cup”, Environmental Studies Quarterly, 2019. Discusses the challenges and solutions regarding the extreme climatic conditions in Qatar and the decision to organise the tournament in the winter months.

[4] “Qatar’s Economic and Political Ambitions in Sports”, Global Politics Review, 2018. Analyses how Qatar uses sporting events, particularly the World Cup, as part of a broader strategy to strengthen its international standing.

A Google search for the cited articles shows that ChatGPT continues to invent sources: Not a single one of the referenced articles can be found. Nor was I able to find any evidence of the existence of journals with the title “Sports Ethics Journal” or “Environmental Studies Quarterly”. An open access journal called “Global Politics Review” does indeed exist, but the article mentioned was not published there (or elsewhere, according to my research).

When asked about the untraceable sources, ChatGPT responded as follows:

You are absolutely right and I apologise for this mistake. The sources I cited are fictitious and were created by me as part of the example. In a real academic essay, it would be essential to refer to sources that actually exist and are reliable. For valid information about the awarding of the 2022 World Cup to Qatar, you should instead refer to recognised news agencies, academic journals and official reports.

To provide you with accurate and reliable information on this topic, I can conduct an online search to find up-to-date and trustworthy sources. Would you like me to do this?

Of course I accept the offer:

MW: Write an essay with footnotes on the awarding of the 2022 World Cup to Qatar. Carry out an online search and provide evidence for the sources used in the text.

ChatGPT: I did a quick search and here’s what I discovered.

Essay: The awarding of the 2022 World Cup to Qatar

1 Introduction

FIFA’s decision to organise the 2022 World Cup in Qatar has sparked a great deal of controversy since its announcement in 2010. These have centred on both Qatar’s suitability as a host nation and the fairness of FIFA’s bidding process for the World Cup. Qatar, as the first country in the Middle East to host a World Cup, has been at the centre of global discussions about human rights, environmental conditions and the integrity of football.

2 Background to the decision

In 2009, Qatar launched its bid to host the World Cup. With its oil and gas wealth and a successful staging of the 2006 Asian Games in Doha, Qatar presented itself as a financially strong and competent host. The bid was boosted by the idea of air-conditioned stadiums equipped with innovative cooling systems. This technology was part of a promise to organise the World Cup with sustainability in mind. In 2015, however, FIFA made the historic decision to postpone the tournament until November and December due to the extreme summer heat in Qatar. (A link symbol appears at this point, but it is not clickable and leads to an error message).

3 Controversies and criticism

The awarding of the World Cup to Qatar was characterised by controversy from the outset. Points of criticism included allegations of corruption and bribery during the bidding process, human rights concerns regarding the treatment of migrant workers and the kafala system, as well as concerns about LGBT rights and the restriction of freedom of expression in a restrictive country. The intense climate conditions and the lack of a strong football culture in Qatar were also addressed. Former FIFA President Sepp Blatter later described the decision to award Qatar the hosting rights as a “mistake”. (Link to https://www.britannica.com/sports/2022-FIFA-World-Cup)

4 Human rights concerns

One of the main points of criticism was the treatment of migrant labourers involved in the construction of the World Cup infrastructure. The working conditions and the kafala system, which restricted the workers’ freedom of movement, led to international criticism. Despite subsequent efforts by Qatar to reform labour laws and improve conditions for migrant workers, concerns remained about the effectiveness and implementation of these reforms. (Link to https://www.britannica.com/sports/2022-FIFA-World-Cup)

5 Concluding remarks

The awarding of the 2022 FIFA World Cup to Qatar raised important questions about the role of football in global politics and ethics. It emphasised the need for international sporting events such as the FIFA World Cup to be not only a spectacle of sport, but also a forum for the promotion of human rights, social responsibility and ethical action.

The award to Qatar remains an example of how sport, politics and social issues are intertwined in today’s world and offers important lessons for future international sporting events.

Even with the web search included, the requirements for correct scientific work are not met here: the first link generated does not work and leads to an error message. The next two paragraphs are both supposedly supported by the same online article from the Encyclopedia Britannica. However, contrary to what ChatGPT appears to suggest, the linked article does not substantiate Sepp Blatter’s statement that awarding the tournament to Qatar was a mistake. Nor does the article say that “concerns” remained regarding the implementation of the reform efforts; it says that the actual exploitation of workers continued even after 2020.

This shows that although ChatGPT can now access up-to-date data from the web, it does not reference that data reliably. In addition, a reference creates only an illusion of transparency when the text also includes information that obviously does not come from the web source, but from the depths of the language model’s training data.

Scispace and ScholarAI: Are custom GPTs a game changer?

For a few months now, ChatGPT’s paid version has offered the option of building your own GPTs and using existing customised GPTs. To find out whether these make a difference for scientific work, I tested two publicly available custom GPTs: Scispace (formerly ResearchGPT) and ScholarAI. Both offer access to their own catalogue of current scientific articles with abstracts, so that ChatGPT functions as a convenient front end for literature searches in natural language.

How well this works depends on the subject area. In response to my question about the Christmas date (“Write an academic text on why Christmas is celebrated on December 25, citing reliable academic sources”), ScholarAI spat out a mere list of random articles that had nothing to do with the topic: they dealt with the city of Weifang in northern China, terrorism in Nigeria and Humboldt in Venezuela, but not with Christmas. It worked much better with the example of “tobacco heaters vs. cigarettes”:

MW: Write a short essay about the risk of tobacco heaters compared to cigarettes citing current research.

ScholarAI:

Talked to plugin.scholar-ai.net

The risk profile of tobacco heaters compared to traditional cigarettes has been a subject of ongoing research. Recent studies have shed light on various aspects of this comparison, revealing nuanced differences in health risks associated with each.

A study by Rodrigo et al. (2020) suggests that the use of heated tobacco products and electronic cigarettes could potentially reduce cancer and non-cancer risks compared to cigarette smoking (Rodrigo et al.). This indicates that while not risk-free, tobacco heaters and e-cigarettes might offer a less harmful alternative to traditional cigarettes.

Pondel et al. (2022) highlight that tobacco heaters and e-cigarettes are alternative methods of nicotine delivery with potentially different health effects compared to cigarettes (Pondel et al.). This distinction is crucial in understanding the comparative risk profiles of these products.

However, Jachowicz et al. (2023) caution that the aerosol produced by tobacco heaters still contains harmful substances at levels similar to traditional cigarette smoke (Jachowicz et al.). This finding challenges the notion that tobacco heaters are significantly safer than cigarettes.

The study by Karey et al. (2022) reveals that e-cigarette users exhibit different exhalation patterns compared to cigarette smokers, which could imply unique upper respiratory health consequences for users of alternative tobacco products (Karey et al.).

Salazar et al. (2021) found that both cigarettes and smokeless tobacco products pose significant health risks, with smoking-related diseases having higher mortality hazard ratios than smokeless tobacco use (Salazar et al.). This comparison between smokeless tobacco and cigarettes might offer insights into the risk profile of tobacco heaters, which are often marketed as smokeless options.

In summary, while tobacco heaters are often promoted as safer alternatives to traditional cigarettes, the current body of research indicates that they still carry significant health risks. The potentially lower risks of cancer and other diseases compared to cigarettes do not render them safe but might offer a less harmful option for nicotine delivery. It’s important to continue monitoring emerging research to fully understand the health implications of tobacco heaters compared to traditional cigarettes.

All links work, and the articles behind them have been summarised correctly. On closer inspection, however, what ScholarAI produces here is not an essay but a chain of summarised abstracts. This is significantly more than what plain ChatGPT achieves. Still, scientific work goes beyond reproducing and stringing together abstracts: here, all contributions are presented separately, without being related to one another. My tests with Scispace led to very comparable results. So even custom GPTs don’t do the actual scientific work for us.[1]

Scispace and ScholarAI could nevertheless prove to be convenient tools for gaining an initial overview of current literature (in specific subject contexts), a first port of call on the state of research. Interestingly, however, neither the Scispace nor the ScholarAI website makes transparent how its catalogue is composed, that is, which journals are part of it and which are not. In the absence of this transparency, we will still have to carry out manual searches if we want to fully grasp the state of research on a topic.

Conclusion

A lot has happened in just over a year since the first release of ChatGPT in November 2022: ChatGPT has become multimodal and can now also handle images and documents. The response quality of ChatGPT-4 has improved significantly in many cases compared to version 3.5.

What the language models do, however, remains reproduction. And this reproduction can only ever be the first step in scientific work. For all subsequent steps, understanding, analysing and critical thinking remain essential human skills that language models do not possess. Despite all the impressive developments in AI, ChatGPT does not mean the end of scientific work in 2024.

[1] I was also able to achieve a comparable result in the standard version ChatGPT-4, in whose chat I had previously uploaded PDFs of open access articles on a topic: ChatGPT was able to summarise the individual PDFs quite well and list these summaries one after the other in a text. However, it was not able to establish links between the texts.