Reviewed by Annabel
Imagine that you train a computer to read and analyse books, feed it a mix of hundreds and ask it to predict which are most likely to be bestsellers, and amongst the results, it gives one book a 100% score. Well, in essence, that’s what this book is about and, no, I’m not going to tell you which book the computer picked!
It’s an absolutely fascinating journey to get there, even thrilling, involving many books and authors that have spent months and months in the bestseller lists – we’re all familiar with most of them (even if we haven’t read them): The Da Vinci Code, The Girl with the Dragon Tattoo, Gone Girl and the phenomenon that was Fifty Shades of Grey all feature, along with many titles by James Patterson, John Grisham, Danielle Steel and Stephen King, just to mention a few.
Lest you think that this book is only concerned with those books that typically appear on the supermarket shelves, there are plenty of other books that would usually be described in more literary terms that will be found within The Bestseller Code’s pages. Anthony Doerr’s Pulitzer-winning All the Light We Cannot See for instance was one of the bestsellers selected.
Archer, a former editor at Penguin, decamped to study at Stanford, where she worked with Jockers, who ran Stanford’s Literary Lab. He’d been using the computer to analyse The Book of Mormon for evidence of multiple authorship. They decided to apply the science to bestsellers and this project was born, using the New York Times bestseller lists as their benchmark definition of which books make the grade.
They made a list of 5,000 books, of which 500 were bestsellers. The next thing was to teach the computer to ‘read’ and analyse the texts. The following chapters take the different elements of a novel in turn, and throw up some fascinating insights. First, they taught the machine to recognise nouns and to categorise them into themes to see if certain topics were more prevalent in bestsellers.
If you recommend a book to a friend or if you are a writer yourself and you mention your work, the first question you’ll be asked is likely going to be, “What is it about?” It is rarely – unless you are a biographer – who is it about, or where is it about, or when is it about. An interest in subject is what comes first. Which begs the question, what is the killer topic?
The computer identified several, but it is more complicated than that. A novel may tick the boxes for high-scoring topics – but they must be in the right proportions. Around three of the top topics should make up around 30% of the novel, then several secondaries, and so on. Analysed like this, sex and drugs and rock’n’roll score minuscule amounts of <0.003% – these topics sell to niche markets.
There is a tendency, especially with first-time authors, to cram too much into their novels, to have too many topics jostling for position as the key theme of a novel. This is one reason why John Grisham and Danielle Steel craft bestsellers. They mainly concentrate on a single signature theme.
Telling the heart of a story with fewer topics implies focus. It implies lack of unneeded subplots. It implies a more organized and precise writerly mind. It implies experience.
Next, they look at plot trajectory. Their work broadly confirms Christopher Booker’s assertion that there are only seven plots that work, but again it’s more complicated than that. Within the overall arc, there need to be regular ups and downs – that bookish roller-coaster ride – and they offer a fascinating graph charting the plots of Fifty Shades of Grey and The Da Vinci Code. They also examine whether it’s better to end on an up note (not necessarily applicable in a series).
Authorial style is the next subject to be considered. The computer analysed punctuation use, elisions such as I’d and you’re, the frequency of adjectives and adverbs, and the frequency of words like the, of, very, that. This threw up some fascinating statistics about author gender. Analysis of her linguistic fingerprint was how J.K. Rowling got outed as Robert Galbraith. Stylometrics, as it has become known, can identify with reasonable success whether an author is male or female, but in the case of bestsellers it can also identify their writing background: whether the author comes from a professorial background or a journalistic one – guess which most bestselling authors come from?
This section also looks at the impact of some first lines, and it was hilarious to see Jane Austen compared with Jackie Collins – not obvious – but so similar when looked at side by side!
Character is the final novel component to be examined – through the verbs attributed to the protagonists. One clear area where the computer agreed with writing professors is dialogue tags; anything other than ask or say is just distracting. But there is one verb that differentiates bestsellers – it’s the one that drives the action and thence plot. A character has to have a need, to want to do something, to be active not passive. This is where all the ‘girl’ novels scored so highly.
The reports for each book subjected to this analysis ran to fifteen pages, and it took a huge amount of computer power to do the work. The last main chapter brings it all together, and tells us which book the machine thought was the perfect bestseller. I’d steeled myself not to cheat – and when I found out which title it was, I was truly surprised!
Appended is the list of the top 100, of which I’ve read just eleven. It’s actually an intriguing list, with plenty to explore for all tastes, which is a nice result.
Could you take all the information in this book, and use it to write a bestseller? There is plenty of good advice within, so you could try, but Archer and Jockers’ book is not a creative writing manual. It is an extremely entertaining and forensic examination of the world of novel writing seen through the eyes of the least snobbish reader you could ever imagine – the computer. Absolutely fascinating.
Annabel is one of the Shiny Eds and is still waiting for inspiration to strike before she can write that novel!
Jodie Archer and Matthew L Jockers, The Bestseller Code (Allen Lane, 2016, Penguin paperback July 2017) ISBN 9780141982489, paperback, 256 pages.