5. Detecting plagiarised work

The role of the academic who will thoroughly read the work is key

The work of an academic entails not only research and teaching but also the prevention and detection of unethical conduct, including plagiarism. This is certainly not an easy task, and no academic takes pleasure in discovering such practices among his or her colleagues or students. Research shows that teachers and students often interpret the motives for cheating differently, which in turn affects how plagiarism is handled in practice.⁹

In an age of widely available information sources, a fundamental problem arises: how to detect plagiarised work effectively and correctly, and how to respond to it. This problem has given rise to a range of software tools designed to help detect plagiarism, and universities also adopt various preventive measures. However, no measure can work unless academics are sufficiently well informed: they must understand plagiarism and its forms, and they should be thoroughly trained in methods of detecting dishonest practices and resolving them.

The expert opinions of the supervisor and of the opponent or opponents (the thesis reviewers) are key. The evaluation covers not only the technical aspects of the work but also its ethical aspects. Evaluating originality is complicated by the fact that the sources used in final theses vary in their reliability and accessibility. It is therefore important that the evaluator is an expert in the field. Both the supervisor and the opponent must verify that all sources can be traced, and in their reviews they must confirm that all rules of integrity in academic writing have been observed.

Plagiarism can easily be avoided if students regularly consult their supervisors and show them drafts of their text. Supervisors who do their job well draw their students' attention to any potential shortcomings in the text and help them to correct these. Final theses are, nevertheless, written by students, and students are responsible not only for their technical quality but also for their ethical aspects. Not all students accept their supervisors' recommendations, and some do not implement them with sufficient care. Despite the supervisor's efforts, the submitted thesis may therefore still contain defects that can no longer be corrected.

Antiplagiarism systems

An antiplagiarism system cannot itself decide whether a work is plagiarised: recognising a correct citation, a chance similarity caused by common phrases, or the legitimate reuse of the author's own text always depends on the judgement of an academic.

A wide variety of antiplagiarism systems is available on the market, ranging from robust tools developed by commercial companies or universities to freely available software of sometimes questionable quality. Some systems also provide tools that support communication between teacher and student for the purpose of teaching academic writing. Most current systems work with text and look for textual similarities. Because this type of plagiarism is also the most relevant to this handbook, the rest of this section deals only with systems for detecting text similarity.

An antiplagiarism system first converts the uploaded document into plain text and removes function words (e.g., prepositions and conjunctions) and sometimes also numbers. Some systems can also process text uploaded as an image, using optical character recognition (OCR). This is followed by a so-called lemmatisation phase, in which words are reduced to their basic (dictionary) form, so that the system can detect similarity between differently declined nouns or conjugated verbs.
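
The preprocessing steps described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular system: the function name `preprocess` and the tiny `FUNCTION_WORDS` and `LEMMAS` tables are hypothetical placeholders. Real systems rely on full stop-word lists and a morphological analyser for each language, and OCR is left out entirely.

```python
import re

# Hypothetical, deliberately tiny word lists; real systems use complete
# stop-word lists and a morphological analyser for each language.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "but", "to", "with"}
LEMMAS = {"students": "student", "wrote": "write", "written": "write",
          "writes": "write", "theses": "thesis", "sources": "source"}

def preprocess(text: str, drop_numbers: bool = True) -> list[str]:
    """Return the normalised word sequence that would be used for comparison."""
    tokens = re.findall(r"\w+", text.lower())
    result = []
    for tok in tokens:
        if tok in FUNCTION_WORDS:
            continue  # function words carry little evidence of copying
        if drop_numbers and tok.isdigit():
            continue  # some systems also ignore numbers
        result.append(LEMMAS.get(tok, tok))  # reduce to the base form if known
    return result

print(preprocess("The students wrote 3 theses with sources."))
# ['student', 'write', 'thesis', 'source']
```

Because "wrote" and "writes" both become "write", two passages that differ only in the form of a word end up identical after this step.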

The system then compares the text in this form with a database. The extent of the database and the range of sources it covers strongly influence how successfully similarities are detected. Some systems search not only their own database of uploaded documents but also sources available on the internet and closed sources made available by agreement (e.g., publishers' databases). Even systems that search online can differ considerably: rather than searching all available websites, they may search only the sources they have indexed. This phase is computationally demanding and can take hours or even days.
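
One common way to implement the comparison phase is to split texts into overlapping word sequences (n-grams, sometimes called shingles) and count how many of the submission's n-grams also occur in each stored document. The sketch below assumes this technique; actual systems may use other or additional methods. The function names, the choice of n = 3 and the in-memory `database` dictionary are illustrative assumptions standing in for a real indexed collection.

```python
import re

def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def compare_with_database(submission: str, database: dict[str, str],
                          n: int = 3) -> dict[str, float]:
    """For each stored document, return the share (0.0 to 1.0) of the
    submission's n-grams that also occur in that document."""
    submitted = shingles(submission, n)
    if not submitted:
        return {}  # text too short to form a single n-gram
    return {doc_id: len(submitted & shingles(text, n)) / len(submitted)
            for doc_id, text in database.items()}

# Hypothetical stand-in for an indexed collection of theses and web pages
database = {
    "doc_a": "Each system compares the text with its own index.",
    "doc_b": "An unrelated sentence about laboratory methods.",
}
print(compare_with_database("The system compares the text with a database.", database))
# {'doc_a': 0.5, 'doc_b': 0.0}
```

A real collection holds millions of documents, so systems index the n-grams in advance instead of recomputing them for every comparison; a source that is missing from the index simply cannot be matched.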

Similarity report

Once the comparison is complete, the system presents the detected similarities in a report, usually in PDF format. Several systems also offer clear interactive reports in their online interfaces. By highlighting similarities and linking each passage to the documents it matches, the system has done its job; any further decisions are up to the evaluator.

When working with a similarity report, bear in mind that the system is not intelligent: it carries out a computationally demanding task, but the intellectual decision as to whether a detected similarity constitutes plagiarism must be made by a human. No system is currently able to determine whether the matched text is cited properly or whether the author referenced the right source. Systems often flag chance similarities in commonly used phrases and long titles, as well as in tables or appendices whose form is prescribed by rules or legislation. A certain degree of overlap is therefore normal, and the system can be expected to report a small percentage of similarity for almost any text.
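
The sketch below shows why a small similarity percentage is normal. It assumes a simple matching rule: a word counts as matched if it belongs to any sequence of four consecutive words that also appears in the source. The function `similarity_report` and the value n = 4 are hypothetical, not taken from any real system. A common phrase such as "on the other hand" already gives the 20-word example sentence a score of 20%, and nothing in the calculation can tell whether a matched passage is properly cited.

```python
import re

def similarity_report(submission: str, source: str, n: int = 4) -> tuple[float, list[str]]:
    """Return (percentage of submission words matched, matched passages)."""
    sub_words = re.findall(r"\w+", submission.lower())
    src_words = re.findall(r"\w+", source.lower())
    src_ngrams = {tuple(src_words[i:i + n]) for i in range(len(src_words) - n + 1)}

    # Mark every submission word that lies inside an n-gram shared with the source
    covered = [False] * len(sub_words)
    for i in range(len(sub_words) - n + 1):
        if tuple(sub_words[i:i + n]) in src_ngrams:
            for j in range(i, i + n):
                covered[j] = True

    # Join consecutive matched words into passages for the evaluator to inspect
    passages: list[str] = []
    current: list[str] = []
    for word, hit in zip(sub_words, covered):
        if hit:
            current.append(word)
        elif current:
            passages.append(" ".join(current))
            current = []
    if current:
        passages.append(" ".join(current))

    percent = 100 * sum(covered) / len(sub_words) if sub_words else 0.0
    return percent, passages

submission = ("On the other hand, the results of this thesis are new and were "
              "obtained by the author in the laboratory.")
source = "On the other hand, previous studies used different methods."
print(similarity_report(submission, source))
# (20.0, ['on the other hand'])
```

The list of passages, not the percentage, is what the evaluator actually has to judge.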

The more advanced systems can also recognise similarity where the text has not been copied verbatim: most of them can handle different word forms (thanks to the lemmatisation mentioned earlier) and different sentence structures, and a few changed or added words usually do not fool them. However, most current systems cannot recognise text that has been paraphrased or translated into another language.¹⁰ Developers of advanced systems are concentrating precisely on these functions, so they may become available soon.
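
The tolerance to a few changed words, and its limits when text is paraphrased, can be illustrated with word-level sequence matching from Python's standard `difflib` module. This is only an analogy under that assumption; production systems use their own, more elaborate algorithms, and `word_similarity` is a hypothetical helper. `SequenceMatcher.ratio()` returns 2M/T, where M is the number of matched words and T the total number of words in both texts.

```python
import re
from difflib import SequenceMatcher

def word_similarity(text_a: str, text_b: str) -> float:
    """Similarity of two texts compared word by word (0.0 to 1.0)."""
    words_a = re.findall(r"\w+", text_a.lower())
    words_b = re.findall(r"\w+", text_b.lower())
    return SequenceMatcher(None, words_a, words_b).ratio()

original = "The results show a strong correlation between the two variables."
edited = "The results clearly show a significant correlation between the two variables."
paraphrase = "The two variables are strongly related."

print(round(word_similarity(original, edited), 3))      # 0.857
print(round(word_similarity(original, paraphrase), 3))  # 0.375
```

One added and one substituted word leave the score high, whereas the paraphrase, which keeps the meaning but not the wording, scores much lower; detecting it would require comparing meaning rather than word sequences.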

Use your own head, not just the system

When evaluating student work, it is important to use one's own expertise rather than relying solely on the results of software analysis. A first warning sign may be a weak link between the information presented and the original sources. This may result from careless use of sources or from deliberate concealment of primary sources. In such a case, the text must be compared carefully with the primary source to establish the extent of the similarity. If the source is not referenced and the text is suspected of lacking originality, the original source must be found, for example with an internet search engine. Entering several randomly chosen sentences into a search engine should be part of every marker's standard procedure, regardless of the output of the antiplagiarism system.
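
A marker who wants to make this spot check systematic could choose the sentences with a short script like the one below. `sample_sentences` is a hypothetical helper, and it assumes that sentences end with ".", "!" or "?" followed by whitespace. Very short sentences are skipped because they tend to match many unrelated pages. The chosen sentences are printed in double quotes, since many web search engines treat quoted text as an exact phrase.

```python
import random
import re
from typing import Optional

def sample_sentences(text: str, k: int = 3, min_words: int = 8,
                     seed: Optional[int] = None) -> list[str]:
    """Pick up to k random sentences that have at least min_words words."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Short sentences tend to match many unrelated pages, so skip them
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))

# Hypothetical excerpt standing in for the full text of a thesis
thesis_text = ("Too short. "
               "The first long sentence contains well over eight separate words. "
               "The second long sentence also contains more than eight words. "
               "Another short one. "
               "A third sentence that is long enough to be sampled here.")
for sentence in sample_sentences(thesis_text, k=2, seed=1):
    print(f'"{sentence}"')  # quoted, ready to paste as an exact-phrase search
```

Passing a fixed `seed` makes the selection repeatable, which helps if the marker later needs to document which sentences were checked.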

Once both the original source and the suspected plagiarised work are available, further evidence of copying can be sought. The probability that two independently written texts contain the same grammatical errors and typos is virtually zero.
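
Shared typos of this kind can be pointed out mechanically by listing the words that occur in both texts but are missing from a reference word list. The sketch below assumes such a list is available; `shared_unknown_words` and the tiny `dictionary` are hypothetical, and in practice a full spell-checking dictionary would be used. Every word it reports still has to be checked by the evaluator in context.

```python
import re

def shared_unknown_words(text_a: str, text_b: str, dictionary: set[str]) -> list[str]:
    """Words that occur in both texts but are missing from the dictionary."""
    # [^\W\d_]+ matches runs of letters, including accented ones such as "č"
    words_a = set(re.findall(r"[^\W\d_]+", text_a.lower()))
    words_b = set(re.findall(r"[^\W\d_]+", text_b.lower()))
    return sorted((words_a & words_b) - dictionary)

# Hypothetical, tiny word list; in practice a full spell-checking dictionary
dictionary = {"the", "results", "were", "measured", "in", "laboratory",
              "and", "recorded"}
source = "The results were mesured in the labratory and recorded."
thesis = "The results were mesured in the laboratory and recorded."
print(shared_unknown_words(source, thesis, dictionary))
# ['mesured']
```

"labratory" is not reported because it appears only in the source; only errors present in both texts are of interest here.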

A master's thesis by a former Czech Minister of Justice showed significant similarity to another master's thesis. The author had copied substantial passages of text, sometimes paraphrasing them slightly. Plagiarism was proved partly on the basis of identical sentences containing the same typos. In her second master's thesis, plagiarism was proved through identical source references and a copied table that contained incorrect data resulting from a change in the order of its rows.

When considering plagiarism, the text must be evaluated comprehensively. A work certainly cannot be declared plagiarised on the basis of a single similarity percentage. In final theses in particular, several students may work on the same or similar topics (e.g., analysing the same problem in a laboratory using identical methods but applying different formulae; mapping different territories using identical methods; or using the same software, which describes and illustrates the chosen methodology, for different tasks). Such theses will naturally show a higher rate of similarity in their methodology sections: students write the text in their own words, but they cannot avoid using identical technical terminology. The teacher must examine this and comment on the similarity in his or her review.

Procedure when plagiarism is detected

Where a high rate of similarity with an original source is detected and the source is not cited correctly (which meets the definition of plagiarism), this must be thoroughly substantiated in the review. The teacher, and subsequently the defence committee during the defence, must state whether or not the case involves evident intentional disregard of the rules of integrity. A teacher evaluating a seminar paper must bear in mind that, by writing texts, students learn not only about the topic but also the skill of academic writing. When assessing a transgression, the marker will therefore view an unintentional omission in a seminar paper differently from the same mistake in a master's thesis.

Where the rules for working with sources are breached systematically and the student evidently intends to present someone else's ideas as his or her own, there is no room for leniency. The procedure to follow when plagiarism is detected is discussed in detail in Chapter 6.

In most of the cases publicised in the media, plagiarism was suspected only several years after the work was written: when the work or the plagiarised text was published, when a new antiplagiarism tool was applied, or when another researcher working on a similar topic noticed discrepancies. For people active in public life, the impetus to re-examine a final thesis usually comes from journalists. It is fully justified to require moral integrity from people who take part in leading institutions or who, for example, decide on the use of financial resources. Students writing a paper or creating an artwork should therefore be warned about the potential consequences, so that they are not accused of plagiarism in ten or twenty years' time, which could end their careers prematurely.

  9. Foltýnek, T. Vědecký smrtelný hřích. Plagiátorství: příčiny, důsledky, prevence [A scientific mortal sin. Plagiarism: causes, consequences, prevention]. Dějiny a současnost, 2019, 8, 10–12.
  10. Foltýnek, T., Dlabolová, D., Anohina-Naumeca, A., Razi, S., Kravjar, J., Kamzola, L., Guerrero-Dib, J., Çelik, Ö., Weber-Wulff, D. Testing of Support Tools for Plagiarism Detection. International Journal of Educational Technology in Higher Education, 2020, 17:46. DOI: 10.1186/s41239-020-00192-4.