Spoilers Warning


This is a blog about data science and superheroes. I think there is some serious data science hidden in the pages of comics, and that’s what we’re here to discuss.

I also think math and stats are the under-represented underpinnings of good data science. Drew Conway’s data-science Venn diagram expresses that best, I think.

Hacking and domain knowledge without mathematics is Drew’s Danger Zone.

I’ve read and watched superhero comics and movies for years, so I have just a little domain expertise. I’m going to do some coding here – mostly in Julia – you can decide if it counts as hacking skills.

The third crucial part is the math component, which leads to my title: aleph zero is (mathematically speaking) the cardinality of the set of natural numbers, which is a pretty mathsy way of saying “infinity”. So the name, aleph-zero-heroes, loosely translates as infinite heroes, a name inspired by my first topic, the Marvel Cinematic Universe (MCU) and its six infinity gems. And the symbol we use for infinite cardinalities is the Hebrew letter aleph, so aleph-zero is written \(\displaystyle \aleph_0.\)

Why superheroes? I’ll give some more background in the “about page” of the blog, but broadly speaking, stories, legends, myths, and fictions form a key part of the human experience, and so understanding these better will never be wasted time. The Marvel Universe is particularly interesting because of its vast scope.

In honour of the six infinity gems, I plan to get started with some pieces about Marvel’s Cinematic Universe (a tiny fraction of all that Marvel have published). The authors of this cinematic franchise have wrought something remarkable, both in the superhero genre and in movies in general. And whether its underlying structure was unconscious or deliberate, it’s also mathematically interesting. Or maybe it is so successful because it is mathematically interesting.

The Marvel Cinematic Universe (MCU) 

First the basics: the core of the MCU is the series of 20 movies (as of Feb 2019), starting with Iron Man in 2008; the most recent release is Ant-Man and the Wasp (2018). More movies are planned for the immediate future. The full list is given below. If you’re a fan you probably don’t need me to tell you the list, but I will discuss the code for extracting this info from OMDb in my next post, and provide the data there for those who want it. You can also look up details of the movies’ financial success at Box Office History for Marvel Cinematic Universe Movies.

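 # | Title                               | Release Year [1] | Phase | Runtime (mins) | IMDb Rating | Metascore
 1 | Iron Man                            | 2008             | 1     | 126            | 7.9         | 79
 2 | Iron Man 2                          | 2010             | 1     | 124            | 7.0         | 57
 3 | The Incredible Hulk                 | 2008             | 1     | 112            | 6.8         | 61
 5 | Captain America: The First Avenger  | 2011             | 1     | 124            | 6.9         | 66
 6 | The Avengers                        | 2012             | 1     | 143            | 8.1         | 69
 7 | Iron Man 3                          | 2013             | 2     | 130            | 7.2         | 62
 8 | Thor: The Dark World                | 2013             | 2     | 112            | 7.0         | 54
 9 | Captain America: The Winter Soldier | 2014             | 2     | 136            | 7.8         | 70
10 | Guardians of the Galaxy             | 2014             | 2     | 121            | 8.1         | 76
11 | Guardians of the Galaxy Vol. 2      | 2017             | 3     | 136            | 7.7         | 67
12 | Avengers: Age of Ultron             | 2015             | 2     | 141            | 7.4         | 66
14 | Captain America: Civil War          | 2016             | 3     | 147            | 7.8         | 75
15 | Black Panther                       | 2018             | 3     | 134            | 7.4         | 88
16 | Spider-Man: Homecoming              | 2017             | 3     | 133            | 7.5         | 73
17 | Doctor Strange                      | 2016             | 3     | 115            | 7.5         | 72
18 | Thor: Ragnarok                      | 2017             | 3     | 130            | 7.9         | 74
19 | Avengers: Infinity War              | 2018             | 3     | 149            | 8.5         | 68
20 | Ant-Man and the Wasp                | 2018             | 3     | 118            | 7.1         | 70
21 | Captain Marvel [2]                  | 2019             | 4     | 128            |             |
22 | Avengers: Endgame [2]               | 2019             | 4     |                |             |
23 | Spider-Man: Far From Home [2]       | 2019             | 4     |                |             |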
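
For anyone who wants to poke at the table before the data is posted, here is a minimal sketch in Python (not the Julia code promised for the next post). It types in the released rows exactly as listed above, totals the runtimes, and averages the IMDb ratings by phase. The tuple layout and the names `MOVIES`, `total_minutes`, `ratings_by_phase` and `mean_rating` are assumptions made for this illustration. Entries 4 and 13 do not appear in the table as reproduced here, and the unreleased movies are left out, so the total is only a lower bound.

```python
from collections import defaultdict

# Rows copied from the table above:
# (title, release_year, phase, runtime_mins, imdb_rating, metascore).
# Only released movies that actually appear in the table are included.
MOVIES = [
    ("Iron Man", 2008, 1, 126, 7.9, 79),
    ("Iron Man 2", 2010, 1, 124, 7.0, 57),
    ("The Incredible Hulk", 2008, 1, 112, 6.8, 61),
    ("Captain America: The First Avenger", 2011, 1, 124, 6.9, 66),
    ("The Avengers", 2012, 1, 143, 8.1, 69),
    ("Iron Man 3", 2013, 2, 130, 7.2, 62),
    ("Thor: The Dark World", 2013, 2, 112, 7.0, 54),
    ("Captain America: The Winter Soldier", 2014, 2, 136, 7.8, 70),
    ("Guardians of the Galaxy", 2014, 2, 121, 8.1, 76),
    ("Guardians of the Galaxy Vol. 2", 2017, 3, 136, 7.7, 67),
    ("Avengers: Age of Ultron", 2015, 2, 141, 7.4, 66),
    ("Captain America: Civil War", 2016, 3, 147, 7.8, 75),
    ("Black Panther", 2018, 3, 134, 7.4, 88),
    ("Spider-Man: Homecoming", 2017, 3, 133, 7.5, 73),
    ("Doctor Strange", 2016, 3, 115, 7.5, 72),
    ("Thor: Ragnarok", 2017, 3, 130, 7.9, 74),
    ("Avengers: Infinity War", 2018, 3, 149, 8.5, 68),
    ("Ant-Man and the Wasp", 2018, 3, 118, 7.1, 70),
]

# Total viewing time for the rows listed above.
total_minutes = sum(runtime for _, _, _, runtime, _, _ in MOVIES)

# Group IMDb ratings by phase, then average each group.
ratings_by_phase = defaultdict(list)
for _, _, phase, _, rating, _ in MOVIES:
    ratings_by_phase[phase].append(rating)

mean_rating = {
    phase: sum(ratings) / len(ratings)
    for phase, ratings in sorted(ratings_by_phase.items())
}

print(f"{len(MOVIES)} movies, {total_minutes} minutes ({total_minutes / 60:.1f} hours)")
for phase, rating in mean_rating.items():
    print(f"Phase {phase}: mean IMDb rating {rating:.2f}")
```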

The movies are already the most successful movie franchise ever. They have grossed almost as much as the next two most successful franchises (Harry Potter and Star Wars) combined. The only franchise with more films is James Bond, which has 26 movies (and comes 4th in the revenue list), but Bond movies have been made since 1962, whereas the MCU dates only from 2008. The rate of production of movies in the MCU has been roughly four times that of the Bond series. And three more movies will be released this year, followed by (most likely) another three in 2020, so Bond won’t keep his spot for long. The franchise has been so successful that I expect there will be another decade at least to come, so maybe another 30 movies. Can you imagine a movie franchise of 50 movies?
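
A quick back-of-envelope check of the production-rate comparison, sketched in Python using only the film counts and start years quoted above. Taking 2019 as the end point for both series is an assumption, as are the names `films_per_year`, `mcu_rate`, `bond_rate` and `ratio`; counting the years differently shifts the answer a little.

```python
# Figures quoted in the text.
MCU_FILMS, MCU_START = 20, 2008
BOND_FILMS, BOND_START = 26, 1962

# Assumption: measure both series up to the time of writing (2019).
AS_OF = 2019


def films_per_year(films: int, start: int, end: int) -> float:
    """Average number of films released per year between start and end."""
    return films / (end - start)


mcu_rate = films_per_year(MCU_FILMS, MCU_START, AS_OF)
bond_rate = films_per_year(BOND_FILMS, BOND_START, AS_OF)
ratio = mcu_rate / bond_rate

print(f"MCU:  {mcu_rate:.2f} films per year")
print(f"Bond: {bond_rate:.2f} films per year")
print(f"MCU rate is about {ratio:.1f} times the Bond rate")
```

On these assumptions the ratio comes out at roughly four.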

So the MCU is successful; who cares? Why look at what are often considered just escapist ephemera? I talk about this in more detail in this blog’s “about page”, but to summarise: it is cogently argued in Harari’s book Sapiens that modern Homo sapiens is set apart from our cousins by the ability to tell stories. Stories, myths, legends, narratives, or fictions – whatever names you use – allow us to build social constructs larger than Dunbar’s number (around 150). Stories allow us to change and adapt as a species more quickly than our genes possibly could. Stories let us plan ahead, and consider hypothetical situations – fictions.

Stories have made us the dominant species on the planet in the blink of an eye (on evolutionary time scales). So, to follow the reasoning further, it is not just the cleverness or intellectual value of a story that is important, but also its reach. Popular fiction, often ignored by serious academics, is vastly more important than serious thinkers would have you believe. And what is the most popular fiction on the planet at this point? Well, we could argue that the MCU fits that bill.

The MCU also extends far beyond this set of movies to a truly amazing set of media including:

  • Graphic novels (the films are based on a large corpus of such, but there are also direct tie-ins that specifically link to the movies – often preludes to the movies).

  • Novelisations of the movies, tie-in novels and guidebooks.

  • Short films: the Marvel one-shots, e.g., “The Consultant”, …

  • TV series: e.g., “The Defenders” (a series of 5 interwoven sub-series), “Agents of SHIELD”, “Agent Carter”, “Cloak and Dagger”, …

  • Theme park exhibits.

  • Video and board games.

  • The usual merchandising, e.g., toys, posters, and so on.

The MCU involves countless (aleph-one for those who care) writers and artists, all connecting into one universe of interconnected plot lines.

And this doesn’t even include other Marvel movies and content, e.g., the X-Men series, and the series of previous Spider-Man movies, both notable successes in their own right.

The MCU is remarkable both for its scope and because, despite obsessive attention from dedicated fans, there are relatively few consistency gaffes [3], though there are a few:

  • The technology of the tapes Quill has (from his Mum) postdates his abduction by Yondu (in 1988).
  • They fluff the notional timing of Spider-Man: Homecoming relative to the other movies.
  • Captain America’s shield makes a cameo in Iron Man 2 at the same time as it is supposedly frozen in the ice.

Plus a few more. But I can’t imagine doing better myself.

MCU Literature 

There is also a vast literature written about the MCU by journalists, fans and even academics, up to and including a university course at the University of Baltimore. I’ve read a lot, but definitely not everything. Here’s a quick list of links to other people’s analysis of the MCU and comics in general.

Why JUST the MCU? 

There is no need to tell me that the picture at the start of this article – Spider-Girl – doesn’t come from the MCU, but from the larger Marvel Universe. Given the number of commentaries on gender bias in comics and the MCU, maybe there should be a Spider-Girl movie. I’d definitely watch it.

Regardless, I’m going to start with the MCU. And what’s more, I’m looking at the movies (the “cinematic” part of the universe). I’d like to look at more, but there is so much stuff out there that even cataloguing it isn’t easy. And as part of this work I’m going to have to sit down and watch everything again. Not that I mind that, but the current movies alone represent more than 44 hours of viewing. If I tried to add in even just Agents of SHIELD, I wouldn’t have time to sleep. (That’s a joke. Who has time to sleep anymore?) But maybe we can make some side trips for a good cause later on.

I guess that isn’t a good enough excuse though, and I will look at the whole corpus of Marvel or DC or something else at some point, but the MCU is appealing scientifically because it avoids some of the problems of the wider ‘verse, for instance:

  • The same person may adopt multiple hero personas over time (Julia Carpenter is first Spider-Woman and then Arachne).

  • The same hero persona may be embodied in different people who take on the mask and the mantle of that hero (Captain America was given life by Steve Rogers, William Naslund and Jeff Mace).

  • Reboots: stories, origins, and events have been rewritten by various authors to update them for current audiences, take account of technological developments in the real world, or allow authors to express their artistic, social or ethical ideas. But reboots create huge problems in, for instance, constructing timelines. One approach adopted is to say events happen in parallel universes, but this really doesn’t help analysis. And sometimes universes cross over as well.

  • Time travel, a common theme of many sci-fi and comics stories, can lead to very complex relationships.

  • Although datasets exist for many comics, these are often very coarse-grained; for instance, the Marvel Chronology Project records which characters appear in which comics, but little else.

All of these issues are (so far) mostly absent from the MCU. Not that it is trivial to analyse the MCU, even so.
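
To make the first two problems in that list concrete, here is a small Python sketch (an illustration only, not the blog's analysis code). It stores who-wears-which-mask as a list of pairs and builds lookups in both directions. The names `ASSIGNMENTS`, `personas_of`, `people_behind`, `ambiguous_personas` and `multi_persona_people` are hypothetical; the pairs are the Spider-Woman/Arachne and Captain America examples from the list above.

```python
from collections import defaultdict

# (person, persona) pairs taken from the examples in the text.
ASSIGNMENTS = [
    ("Julia Carpenter", "Spider-Woman"),
    ("Julia Carpenter", "Arachne"),
    ("Steve Rogers", "Captain America"),
    ("William Naslund", "Captain America"),
    ("Jeff Mace", "Captain America"),
]

personas_of = defaultdict(set)    # person  -> personas they have used
people_behind = defaultdict(set)  # persona -> people who have used it

for person, persona in ASSIGNMENTS:
    personas_of[person].add(persona)
    people_behind[persona].add(person)

# A persona name identifies a person only if exactly one person has used it.
ambiguous_personas = sorted(
    persona for persona, people in people_behind.items() if len(people) > 1
)

# Likewise, a person maps to a single hero only if they have used one persona.
multi_persona_people = sorted(
    person for person, personas in personas_of.items() if len(personas) > 1
)

print("Ambiguous personas:", ambiguous_personas)
print("People with several personas:", multi_persona_people)
```

Because the relationship is many-to-many in both directions, a dataset that records only a hero name per appearance cannot say which person is meant. Any serious analysis of the wider universe would need to track people and personas separately.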


So that’s the quick version of what I want to do with this blog, and why I think the MCU is an interesting starting point. No maths today. More later, though I am not planning to dump anything too hard on you.

Next week, I plan to work on a timeline of the MCU, but with a twist, so stay tuned.


  1. The items in the table are not in "release year" order. They are in the order they are set in the MCU timeline (more on this next week).
  2. Not released at the time of writing.
  3. There are plenty of physics/chemistry/biology mistakes in comics and the MCU. Let's not go down that road. It doesn't go anywhere I want to go.