Experiment tests if teachers can identify AI content

An experiment to see whether teachers could distinguish between AI-generated and student-written content has found that 60% of the teachers involved struggled to tell the two apart.

The experiment, conducted by High Speed Training, involved a focus group of 15 secondary school teachers, each of whom was given two answers to real questions from past exam papers in English Language, Geography and Religious Studies and asked to say whether they believed each answer was written by a student or by AI. A separate group of three GCSE-aged students created the human content, and the same questions were also answered by ChatGPT.

With each participant reviewing one AI answer and one student answer, three in five (60%) teachers misidentified at least one of the answers they were asked to review, and one in three (33%) misidentified both. In total, nearly half (47%) of all the answers reviewed by the focus group were wrongly identified.
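Those headline percentages are internally consistent. As a rough sanity check, assuming the reported group of 15 teachers each reviewing two answers (the per-teacher breakdown below is inferred from the percentages, not stated in the study), a few lines of Python reproduce the 47% figure:

    # Hypothetical reconstruction of the study's headline figures.
    teachers = 15
    total_answers = teachers * 2                     # 30 answers reviewed in total
    both_wrong = round(teachers * 0.33)              # ~5 teachers misidentified both answers
    one_wrong = round(teachers * 0.60) - both_wrong  # ~4 misidentified exactly one
    misidentified = both_wrong * 2 + one_wrong       # 5*2 + 4 = 14 answers
    print(f"{misidentified / total_answers:.0%}")    # 14/30 -> 47%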

Teachers were also asked to give a rough indication of each answer's quality by assigning a numerical 'grade' from 1 to 5. Answers generated by ChatGPT scored an average grade of 4, with teachers generally viewing the content as being of a high standard.

Teachers were also more likely to assign a higher grade to an answer when they believed it was AI. When teachers correctly identified AI content, they assigned an average grade of 4.3, whereas the same answers were graded at an average of 3.7 when the teacher thought they were human-written, suggesting that teachers expect AI to produce high-quality content.

The study also tested whether another AI program could detect whether the answers were written by a human or by AI. When both the student and ChatGPT answers were entered into Google Bard, the software proved not always successful in identifying where AI had been used, as some in the education sector have already found [1]. Bard misidentified a third (33%) of the answers, wrongly labelling one human answer as AI and two AI answers as human.

Dr Richard Anderson, Head of Learning and Development at High Speed Training, comments: “AI and chatbots have been huge topics of discussion recently, with many in the education sector wondering if they could be used to cheat in exams and coursework. We wanted to put this to the test to see whether it actually does pose a risk in schools and learning environments.

“Whilst it’s concerning that 60% of teachers struggled to correctly identify where AI had been used, many of the teachers involved had not encountered AI before, and we’re confident that with awareness and exposure, teachers will be able to correctly spot it more frequently. Free and easy access to software such as ChatGPT and other bots is still a relatively new phenomenon, so there is bound to be a period of adjustment for teachers and educators.

“There are positives from the experiment, including that there are several tell-tale signs that teachers can use to spot where a student may have used AI to create their work. As these technologies continue to evolve, educators will have to continue to develop their skills and training to ensure that children are still receiving the best education they can.”

The teachers provided feedback on each answer; these are the five most common giveaways they shared that the text had been AI-generated:

Americanised language
One of the simplest identifying signs of AI use is Americanised spelling. Whilst this is easy for students to remove if they know what they’re looking for, they may overlook it, leaving the words as small clues a teacher can pick up on.

Lack of personal case studies
Students are instructed to use memorised, previously studied examples to help illustrate their points and reinforce their argument. A total lack of anecdotal evidence and a reliance on the information provided with the question could suggest AI involvement.

Vocabulary used
Whilst the AI was instructed to answer questions in simplified language, several teachers spotted that some of the answers contained language you would not expect from GCSE-aged pupils. Students tend to use more informal language, and regular use of advanced vocabulary is uncommon.

Formulaic structure
AI will try to neatly package an answer and concisely cover every point it is asked to address. The teachers in the study pointed out that some answers seemed to try to fit everything in, whereas many students would be unlikely to address every single point in a question.

It’s a little too perfect
Even the best students make small mistakes in their writing, whether it's spelling, grammar, or a tendency to waffle and include unnecessary words. AI-created content is unlikely to include any of these, and may stand out as being a little too perfect.
