How Plagiarism Software Found a New Shakespeare Play

Plagiarism-detection software was created with lazy, sneaky college students in mind — not the likes of William Shakespeare. Yet the software may have settled a centuries-old mystery over the authorship of an unattributed play from the late 1500s called The Reign of Edward III. Literature scholars have long debated whether the play was written by Shakespeare — some bits are incredibly Bard-like, but others don’t resemble his style at all. The verdict, according to one expert: the play is likely a collaboration between Shakespeare and Thomas Kyd, another popular playwright of his time.

Sir Brian Vickers, a literature professor at the University of London, came to his conclusion after using plagiarism-detection software, as well as his own expertise, to compare writing patterns between Edward III and Shakespeare’s body of work. Plagiarism software isn’t new; college professors have been using it to catch cheats for more than a decade. It is, however, growing increasingly sophisticated, enabling a scholar like Vickers to investigate the provenance of unattributed works of literature. With a program called Pl@giarism, Vickers detected 200 strings of three or more words in Edward III that matched phrases in Shakespeare’s other works. Usually, works by two different authors will have only about 20 matching strings. “With this method we see the way authors use and reuse the same phrases and metaphors, like chunks of fabric in a weave,” says Vickers. “If you have enough of them, you can identify one fabric as Scottish tweed and another as plain gray cloth.”
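For readers curious about what "matching strings of three or more words" means in practice, the basic idea can be sketched in a few lines of Python. This is only an illustration of shared three-word sequences, not a description of how the software Vickers used actually works. The function names and the two sample sentences are made up for the example, although they borrow a phrase quoted in this article.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of all n-word sequences in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_ngrams(text_a, text_b, n=3):
    """Return the n-word sequences that occur in both texts."""
    return word_ngrams(text_a, n) & word_ngrams(text_b, n)

# Hypothetical toy inputs, not real lines from any play.
text_a = "the king shall come in person hither to greet the author of my blood"
text_b = "bid him come in person hither at dawn"

for match in sorted(shared_ngrams(text_a, text_b)):
    print(" ".join(match))
```

A real study would run a comparison like this across whole plays and then, as Vickers stresses, rely on a human reader to judge which overlaps are meaningful and which are just common turns of phrase.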

Among Shakespeare’s recycled bits of phrases: “come in person hither,” “pale queene of night,” “thou art thy selfe,” “author of my blood” and even the whole phrase “lilies that fester smell far worse than weeds.” Other matching strings are less compelling, but are nevertheless an essential part of distinguishing the author’s linguistic fingerprint, says Vickers. The professor also matched more than 200 strings of words between Edward III and Kyd’s earlier works; at that point in his career, Kyd had only three plays to his name. According to Vickers, Kyd should get top billing on the play: about 60% of Edward III was likely written by him, the remaining 40% by Shakespeare. Using the plagiarism software, Vickers has also attributed four more anonymous plays to Kyd.

So why would the Bard at this stage in his career, aged 32 and well established by the time Edward III was published in 1596, need to collaborate on a play? Simply because, as literature scholars have documented, the London theaters of the day were competing for audiences and had to churn out material as quickly as possible to stay ahead of one another. To do so, they often used groups of authors to write playbooks in a matter of weeks, paying each author by the scene. The theater companies would then often advertise themselves, rather than the authors, on the published playbooks.

“In Edward III, it’s quite a typical arrangement; Shakespeare writes three scenes near the beginning and one later on, presumably to guarantee some kind of continuity,” says Vickers. “It’s a very good play, but it suffers from some inconsistencies — characters who appear in some of Shakespeare’s scenes don’t appear later on.”

Vickers and his colleagues hope that by using plagiarism software, which they’re currently applying to a study of British playwright John Ford’s works, scholars may yet be able to settle many of the literature world’s greatest authorship questions. But don’t try this at home — this isn’t something just anyone can do. Vickers has spent more than four decades studying Shakespeare, and he’s devoted countless hours over the past two years reaching his verdict on Edward III. “You have to go on hunches — you can’t just feed in all the numbers on every play and sit back,” he says. “But what I’m hoping to do is bring about a marriage between human reading and machine reading. If you distrust computers, you won’t advance at all; if you have just computers and know nothing about literature, you’re likely to go wrong as well.”

While Vickers says his research proves the co-authorship of Edward III beyond a doubt, he’s yet to convince all of his fellow Shakespeare experts. Says Stanley Wells, chairman of the Shakespeare Birthplace Trust, the largest Shakespeare preservation group in Britain, “I’m not yet sure we’ve reached the stage yet that we can be sure of authorship without attacking it from many different angles,” such as investigating metrics, classical allusions and signature abbreviations. “One of the problems of this sort of thing is that it’s not easy to pronounce on the evidence without doing all the work again yourself.”

Scholars have applied other quantitative analysis techniques to authorship studies over the past century. For example, statisticians have been able to distinguish the writings of James Madison and Alexander Hamilton in the Federalist Papers by manually calculating occurrences of a set of marker words. But dramatic writing, with its more constrained nature, has until now proven more difficult to crack.
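The marker-word approach is simple enough to sketch as well. The Python below counts how often chosen words appear per 1,000 words of a text. It is a minimal illustration, not the statisticians' actual procedure: the function name, the sample sentence, and the choice of "upon" and "whilst" as markers are assumptions made for the example.

```python
import re
from collections import Counter

def marker_rates(text, markers):
    """Return occurrences of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {m: 0.0 for m in markers}
    counts = Counter(words)
    return {m: 1000 * counts[m] / total for m in markers}

# Hypothetical ten-word sample, not a passage from the Federalist Papers.
sample = "upon reflection the plan rests upon consent whilst others object"
rates = marker_rates(sample, ["upon", "whilst"])
print(rates)
```

Comparing such rates between a disputed essay and each candidate's known writing gives a rough sense of whose habits the disputed text follows, which is far easier to do with a computer than by hand.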

Although Vickers believes he’s the first scholar to use plagiarism software on authorship studies, he’s not the first to use software of any kind to analyze the linguistic patterns of literature. Homeric scholar Martin Mueller of Northwestern University, who has lauded Vickers’ work, has used the “search and display” function in his computerized database to analyze Homer’s works. With the tool, he’s been able to highlight distinctive phrasal repetitions in the poet’s verse, which may come as no surprise to those who found the Iliad and the Odyssey a bit repetitive. Here again, as with Vickers’ work, computers are coming in handy to help prove what smart scholars have long sensed, but they’re not making any literary discoveries on their own. At least not yet.
