Since this is a reprint by popular demand – gosh – of the first book in a series that will, eventually, contain at least ten, there’s a very good chance that you already know what happens after this book, which is more than I did when I wrote it.
By now, I’ve had a couple of forays into rating and ranking:
- “Borda Beats Batman” created a meta-list of the top-10 Batman graphic novels.
- “Top-10 Caped Crusader Creators” looked at the best Batman writers.
Any rating or ranking system1 has problems – it oversimplifies a complex question. But as a society we do love a top-10 list.
I thought I would illustrate one of those problems here by comparing my ratings with those from Goodreads. And I am on a Discworld marathon at the moment, so I will try it out on the works of Sir Terry Pratchett (STP). If you want, you can just skip to the results.
I talked about forming meta-ratings from a set of rankings before. I like the approach because it takes a step away from the idea of asking individuals to “rate” items. It’s much harder to get consistent ratings from people than it is to ask them “Do you prefer A to B?” That’s why an optometrist doesn’t show you six lenses and ask you to rate them from 1-10; she just asks you “Better or worse?”
If you take enough “Better or worse?” answers you can then create your own rating. That’s what I am doing. I am taking a bunch of people’s ranked lists, e.g., their top 10, and using these to create my own meta-rating (and hence my own ranking as well).
The process is a lot like an election. We seek to combine individual preferences into a collective decision.
Having just been through one of the most interesting elections in living memory, i.e., Trump v Biden, this seems like something worth commenting on. A lot of people on both sides of the fence have complained that the US system for electing the president is unfair. In 2016 the system elected Trump despite him losing the popular vote. In 2020 Biden won, but Trump is still calling foul.
It’s worth saying again that any election system must have flaws. There’s math that proves it. Arrow’s Impossibility Theorem shows that every possible election system can lead to a result that someone can call unfair. Kenneth Arrow (1921-2017) proved that a few simple properties everyone could agree would make an election fair cannot all hold together in a real system.
Having said that, there have been centuries of effort devoted to deriving better systems, and the one I am using here – a modified Borda count – has a long precedent going back to a French mathematician – Jean-Charles, Chevalier de Borda – from the 18th century. My modification is described here.
Essentially, I take a set of ranked lists (in this case the 41 Pratchett Discworld novels), and a book ranked number \( n \) receives \( 1/n^\alpha \) points. When we total the points we get a rating. When we sort the books in ratings-order, we get a ranking.
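To make the scoring concrete, here is a minimal sketch of that modified Borda count in Julia. This is my own illustration, not the code from the post: the function names are mine, and the value of the exponent \( \alpha \) is left as a free parameter.

```julia
# Minimal sketch of the modified Borda count described above (illustrative only).
# Each input is one ranked list of titles; the book in position n of a list
# earns 1/n^alpha points. The exponent alpha is a free parameter here.
function borda_ratings(ranked_lists::Vector{Vector{String}}; alpha::Float64 = 1.0)
    ratings = Dict{String,Float64}()
    for list in ranked_lists
        for (n, title) in enumerate(list)
            ratings[title] = get(ratings, title, 0.0) + 1.0 / n^alpha
        end
    end
    return ratings
end

# Sorting by the total points turns the ratings into a ranking.
meta_ranking(ratings) = sort(collect(ratings); by = last, rev = true)
```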
Our input data is a set of ranked lists of the 41 Pratchett Discworld novels. Just as with the Batman data, this data isn’t perfect:
- most people don’t rank all of the books;
- the rankings are based on different people’s (or groups’) tastes;
- some people only provide a “to read” list with no ranking;
- the lists were published at different times, some before the latest novels were finished.
One of the great things about the modified Borda count is that it deals with all of those issues.
The ranking data I have used is here. Each file includes a source. In total there are 11 inputs. Note that Goodreads itself is one of the sources, but that
- it has no more precedence than any of the other rankings; and
- I only use the ranking data, not the actual scores.
Each file includes a comment at the start giving its source URL, as well as the year in which it was published.
The Julia code to create the meta-ranking from these is essentially the same as the code for Batman. It isn’t very profound code, but it does include all the other types of meta-rating procedures I’ve talked about before. And I plan to add a few more in the future.
I want to compare my ratings to a “standard” and I don’t know a better one than Goodreads2. The scores there are an average of, in some cases, hundreds of thousands of users’ scores.
Goodreads kindly collects a list of all the Discworld books here. The listing includes all of the Discworld novels, plus a number of other Discworld books such as The Science of Discworld and its sequels (though I am just looking at the novels here).
The other great thing they do is provide an API to access their data (for non-commercial uses). I guess they were thinking of app developers when they created this, but it’s terrific for data scientists; so thanks hugely!!!
It’s a godsend because scraping Goodreads web pages isn’t easy – they are fairly complex. The API, on the other hand, is pretty simple.
My Julia code starts from a list of the Discworld books. Careful with this file – it has lots of comments in it, indicated by a “#”. Most good CSV readers (including CSV.jl) will allow you to ignore these when reading the file.
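As a hedged illustration, reading such a file with CSV.jl might look like the following; the file name matches the book_data.csv mentioned later in the post, but the column names are my guesses.

```julia
using CSV, DataFrames

# Skip the "#" comment lines while reading; CSV.jl's `comment` keyword handles this.
books = CSV.read("book_data.csv", DataFrame; comment = "#")

titles = books.title   # assumes a column called `title`; adjust to the real headers
```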
The titles of the books are used to construct the Goodreads query. However, the titles contain characters such as exclamation marks, e.g., “Guards! Guards!” These are reserved for special meanings in a URL so we have to replace them with an encoding, e.g., “!” goes to “%21”. You can find out more about this at https://en.wikipedia.org/wiki/Percent-encoding or just try out this Julia function. It’s ugly. I should do this with an array and input the data from an encoding file, but I wanted it to be stand-alone and for it to be blindingly clear how it works (and it doesn’t have to be fast). Note that it assumes the input hasn’t been partially encoded already, or the first replacement may mess it up (in math terms the function is not idempotent).
```julia
function convert_to_url( s::AbstractString )
    # convert reserved symbols into codes
    s = replace( s, "%" => "%25")   # need to do this one first
    s = replace( s, "+" => "%2B")   # need to do this one 2nd
    s = replace( s, " " => "+")
    s = replace( s, "!" => "%21")
    s = replace( s, "#" => "%23")
    s = replace( s, "\$" => "%24")
    s = replace( s, "&" => "%26")
    s = replace( s, "'" => "%27")
    s = replace( s, "(" => "%28")
    s = replace( s, ")" => "%29")
    s = replace( s, "*" => "%2A")
    s = replace( s, "," => "%2C")
    s = replace( s, "/" => "%2F")
    s = replace( s, ":" => "%3A")
    s = replace( s, ";" => "%3B")
    s = replace( s, "=" => "%3D")
    s = replace( s, "?" => "%3F")
    s = replace( s, "@" => "%40")
    s = replace( s, "[" => "%5B")
    s = replace( s, "]" => "%5D")
    return s
end
```
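For example (my own check, not output from the post), the title mentioned above encodes like this:

```julia
convert_to_url("Guards! Guards!")   # returns "Guards%21+Guards%21"
```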
Once we have the titles converted to URL strings we construct the query string using the following code. Note I separate it into two steps: the base is the same for any of my queries, and the final URL incorporates the specifics for a particular query:
```julia
url_base = "https://www.goodreads.com/book/title.xml?key=$(goodreads_key)"
url = "$(url_base)&author=$(author)&title=$(title)&format=xml"
```
Then we query Goodreads using the following code:
```julia
using HTTP

try
    global r = HTTP.get(url)
catch
    println(" $url failed.")
end
```
The response body r.body is XML, which I then parse (more on how to do that in a future post). From the XML I extract everything I can, and it’s all summarised in a results file, but for the moment I am only using the Rating field. The file includes everything on the Goodreads list, not just the novels, but I am only playing with the novels here.
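Since the post defers the parsing details to a later write-up, here is one way it could be done with EzXML.jl. This is a sketch only: the average_rating element name is my assumption about the shape of the Goodreads response, not something taken from the post.

```julia
using EzXML

doc = parsexml(String(r.body))                    # r is the HTTP.jl response from above
node = findfirst("//average_rating", root(doc))   # assumed element name in the Goodreads XML
avg_rating = node === nothing ? missing : parse(Float64, nodecontent(node))
```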
OK, you stuck with it for this long, so here are the results. First is my meta-rating for each book. The following figure shows the books with their ratings sorted from top to bottom. You might have to resize your browser to see everything, or you can mouse over a bar to see the exact values. The graph (created with the PlotlyJS package in Julia) can also be zoomed if you want.
You’ll probably find you disagree with some of the details here – I did. But overall I was really pleased. And at the very least I can now recommend, to friends who haven’t read any Pratchett, where they might like to start, i.e., with Small Gods rated at 6.35. Small Gods also has the advantage that it is a stand-alone novel, not a sequel.
The lowest rating is The Shepherd’s Crown. More on that later.
Once we have ratings we can start to look for patterns. STP wrote for a LONG time – for almost four decades (1983-2015) – so we can look at trends via the average ratings per decade.
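A toy sketch of how those per-decade averages can be computed with DataFrames.jl is below; the numbers are placeholders standing in for the real (year, rating) pairs, and the column names are my own.

```julia
using DataFrames, Statistics

# Stand-in data: (year, meta-rating) pairs; the real values come from the meta-rating step.
df = DataFrame(year   = [1983, 1992, 2001, 2009, 2015],
               rating = [2.0, 6.35, 3.1, 2.2, 0.67])

df.decade = 10 .* (df.year .÷ 10)                                    # e.g. 1992 -> 1990
by_decade = combine(groupby(df, :decade), :rating => mean => :mean_rating)
```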
The trend shows some features most fans won’t find surprising. He always had his hits and “misses” (even the misses weren’t bad), but his standard overall was notably consistent over the first three decades, the 80s, 90s and 00s.
But in the final decade the ratings decline. Sadly STP contracted a rare form of Alzheimer’s (posterior cortical atrophy) in 2007. Although he was still amazingly productive, and a wonderful writer, it must have been much harder for him particularly during the years near his death in 2015.
We can look at other patterns as well. The Discworld books are loosely grouped into subseries. My book_data.csv file also lists the subseries for each book, so we can also plot these averages. Again there aren’t too many surprises here: the series based around Death, the Watch, and the Witches are often cited as his best.
We might be mildly surprised by how well the Ancient Civilisations subseries does here, but there are only two books in this group, and Small Gods, the highest-rated book, is one of them, so we can put that down to a small-sample effect.
The one surprise for me was that the Tiffany Aching books rate poorly. I liked this series. The problem, I think, is that three of the five books are from STP’s last decade. And as I commented earlier, this was an era when writing was more of a challenge for him.
Finally, I want to do the comparison that I started out thinking about: my meta-ratings compared to Goodreads. The following plot shows each book on the two scales. Mouse over them to see the titles.
If the two approaches to rating were fundamentally the same3, we would expect the points to fall along a diagonal line. We don’t. There are quite a few books that sit near that theoretical line (roughly from Eric up to Night Watch), but there are also a few clusters that don’t.
That fact lies at the heart of why I think Goodreads (and all rating systems like it) is flawed. Goodreads relies on its participants voting with common standards, but no such thing exists. Your five stars might be my three. And there are systematic problems in star systems: inflation, expectation, subjectivity, etc., etc.
We can see this most clearly with The Shepherd’s Crown. This was STP’s last book. It was published after his death. His long-time assistant Rob Wilkins stated publicly in the afterword:
The Shepherd’s Crown has a beginning, a middle, and an end, and all the bits in between. Terry wrote all of those. But even so, it was, still, not quite as finished as he would have liked when he died.
I haven’t read it and I won’t for a while for reasons I outlined here. People I trust have told me it definitely isn’t his best novel, and that isn’t surprising. He didn’t get a chance to polish it, and I guess that shows us how important the old adage “write easy, edit hard” is.
So my meta-rating of 0.67 at the bottom of the list of all the ratings seems reasonable, but on Goodreads the book is scored at a strong 4.36. How come?
It’s as simple as this. Readers love STP. I know I do4. Almost everyone who read The Shepherd’s Crown knew where it came from. They knew the story about his illness. And many of them rated the book not based on its quality as a book, but on their feelings about STP. They gave it a nostalgia rating. Many of them said so explicitly. Here are just a few quotes from the reviewers:
If you are new to Discworld or Terry, don’t start with this one. He was very ill and this was, I think it’s safe to say, not written well, but by Jingo it was told well.
This is really a four-star book but I am giving it five anyway because it was his last and because he was a wonderful author who gave me an incredible amount of enjoyment over the years.
Ignore the star rating. Like many other people reading the last book of a deeply loved and missed author, objectivity is impossible.
This really isn’t a five star book, but I can’t bring myself to give Terry Pratchett’s final Discworld novel anything but five stars.
It feels wrong to give this a star rating, because my response to it – like that of all Pratchett’s regular readers – is far more nuanced and conflicted than that. In terms of actual quality, it’s no higher than a three, but the emotional response it elicited from me deserves a five.
I totally support everyone who did this. I probably would do so myself. But there’s the rub. If people rate based on their own criteria (be it nostalgia or any other feature) rather than purely on quality, then this is one (just one) indication that these types of scores aren’t a rational foundation upon which we can build useful meta-ratings.
Another example of ratings not actually rating the thing you want is Hogfather. This gets a very strong 4.58 in Goodreads, but only 2.23 in my meta-rating. My guess about this one comes down to the TV mini-series. For me, that series was so good it overwrote my memory of the book. It has some of my favourite actors: Marc Warren, Tony Robinson, Ian Richardson and Nigel Planer, and they put together a wonderful movie. Pratchett himself has a cameo. I can’t say this for sure, but it seems very likely that we often rate a book more highly if we enjoyed it on the screen.
What do you do instead? Well, I just showed you. Get users to provide rankings not ratings. A ranking is a more robust, more consistent way to get user input.
At its simplest, just ask “Better or worse?” Don’t ask readers to give a number, just ask them, for instance, “Was this book better or worse than Equal Rites?” Ask enough questions like that, and we can build a much more accurate picture of which Discworld books people really liked.
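To make the idea concrete, here is a toy sketch of my own (not from the post) that turns a handful of hypothetical “Better or worse?” answers into a ranking by simply counting wins; a real system would use something more careful, such as a Bradley–Terry model.

```julia
# Each pair is (preferred, other): the answer to "Was A better than B?" (hypothetical data).
comparisons = [("Small Gods", "Equal Rites"),
               ("Night Watch", "Equal Rites"),
               ("Small Gods", "Night Watch")]

wins = Dict{String,Int}()
for (winner, loser) in comparisons
    wins[winner] = get(wins, winner, 0) + 1
    get!(wins, loser, 0)                  # make sure books that never win still appear
end

pairwise_ranking = sort(collect(wins); by = last, rev = true)
```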
This didn’t happen when I wrote the last post about STP, but reading the comments people wrote about him left me close to tears (again). I don’t weep, or cry or otherwise strongly emote as a general rule. But some things are just bad. The loss of STP is one of them. He is sorely missed.
Apart from that, the point I am leading up to (or maybe ran past in the mist) is that I don’t believe the methodology behind most modern ratings is valid or useful. They assume too much about people that just isn’t true. We use ratings more and more. Every web page5, every online service, every ride share company, and every online social network lets you give stars, or likes, or some indication of value. Even universities now ask students to rate their teachers on a regular basis. But they are (almost) all meant to be an absolute score, and none are. If we swapped to relative metrics we would get much more informative results.
A quick thanks again to Goodreads for making their data accessible.
Just a quiet thanks to the people who have been helping me edit these blogs, notably my wife.
Formally a ranking is just an ordering of a set of items, whereas a rating is a numerical score assigned to each item. Often they are confused, because a ranking can be thought of as a score (e.g., from 1-10) and a rating automatically implies a ranking. But there are subtle differences. Notably, a ranking has no absolute scale (1 is better than 2, but when comparing two lists we have no information about whether the two 1’s are comparable). Also, a ranking conveys nothing about how close two participants are; they could be neck and neck or miles apart.↩
I didn’t say Goodreads’ ratings were good. It’s often assumed that with so many inputs the output must be valid, but I think there are some problems with such scores. The problems in that type of rating system are one of the topics here. But at least this one is better than any other I am aware of.↩
Goodreads rates from 1-5 and my ratings aren’t bounded, so they will never be identical, but one could be related to the other by a simple linear transform, in which case they are essentially identical except for “units”.↩
You know what I mean. Don’t get weird about this.↩
But not this one.↩