# Sharp Data Science

### Jan 16, 2020. Last updated on Jan 17, 2020.

# Intro

I recently dug out my old Sharp PC-1500 pocket computer[^1][^2]. It had severe limits, but I learned to program on that thing, and I learned better because of its limits, not despite them.

I got the Sharp circa 1982, which was a very good year. Alan Moore’s V for Vendetta started – one of my favourite comics ever and the origin of the Anonymous movement. Blade Runner came out – the best movie ever. Apparently I was old enough to watch it, though to be honest I’m not sure what my parents were thinking. The Wrath of Khan also came out in 1982, and E.T. and TRON. It was a really good year for science fiction. In music there was Don’t You Want Me by The Human League; Tainted Love, the Soft Cell cover; Should I Stay or Should I Go by The Clash; White Wedding by Billy Idol; and about a dozen other of my favourites (The Police, Psychedelic Furs, Depeche Mode, Roxy Music, and Bowie all had hits). And there was the best/worst film clip ever for I Ran by A Flock Of Seagulls. You need to see it. I may be dating myself, but what the hell.

So, now that we agree 1982 was the best year ever, let’s not forget the PC-1500. It was my first computer ever. It was brilliant. I put batteries in it last week, and it still works! It has a black and white LCD screen 156×7 pixels in size – just big enough for one row of 26 characters. The CPU is an 8-bit CMOS chip called the LH-5801 running at 1.3 MHz. Not amazing specs, but I dug around and found an advertisement for it being sold at 130 British pounds in 1983. That was seriously cheap for a computer in the early 80s (the Apple IIs were more than \$1000). And it was seriously small[^2]. It was programmed in BASIC (more on that in a moment). Aficionados cracked it to build an assembler, and even a C compiler, but BASIC was enough for me at the time.
I taught myself from its manual, which was actually pretty good. It has two modes: in the first you write code, e.g.,

```
10: PRINT "Hello World!"
```

In the second mode you run your code and debug it, though old-school debugging wasn’t user friendly. They hadn’t actually invented “user friendly” coding yet.

It isn’t useful for data science. The only way I have to get data onto it is to type it in by hand[^3]. So why am I bothering you with archaic trivia about an obsolete computer? Let me explain.

It had only 3.5 KB of RAM (memory). Not even all of this was available for programming. A fair chunk was needed for the system, so only 1850 bytes were available for code, and 624 bytes for variables (data). That is tiny by today’s standards. Memory on computers is usually measured in gigabytes, so even a small laptop these days will likely have a million times as much memory. A later model (the PC-1500A) had more memory, and there was an expansion card, but we are still talking about minuscule amounts of memory.

When you write code in small spaces, you learn to write really tight code. You can’t afford to waste a character. That is a great learning experience. I’ll get to that in a second, but first let me show you how you can still write some cool code. I have a little example (below) of the gamma function. It’s instructive for a couple of reasons.

# BASIC Programming

BASIC is (usually) an interpreted imperative language (there are many dialects of BASIC). That is, it follows a sequence of commands in a logically defined order. It was high-level for its time (1964). It was intended to improve programming literacy outside of STEM – which makes you wonder how long that problem has been around. The PC-1500 used Sharp’s own variant of BASIC, oriented around the hardware. To get an idea what it looked like, let’s look at code for calculating the gamma function $$\Gamma(z).$$ The gamma function has nothing to do with Bruce Banner and gamma rays.
Instead it is a useful little mathematical function for which we know good numerical approximations, but no closed-form solution[^4]. It’s defined by

$$\Gamma(z) = \int_0^\infty x^{z-1} e^{-x} \, dx.$$

I explained the integral symbol $$\int$$ in an earlier post on Terry Pratchett’s Discworld. It’s an elongated S, short for “sum.” It means sum everything under the curve $$x^{z-1} e^{-x}$$. There are a few tricks hidden in here (what do you do for negative-valued curves, for instance) but we won’t go into them here. The symbol $$\Gamma$$ is just the uppercase Greek letter gamma.

The gamma function is used in many calculations, e.g., in quantum physics, fluid dynamics and statistics. There is a gamma distribution used, for instance, to model the time between earthquakes. It is so important that there is a book called Gamma, by Julian Havil, which has a chapter – surprise, surprise – just about this particular function[^5]. It has many interesting relationships, e.g., $$\Gamma(3/2) = \sqrt{\pi}/2.$$

It’s an important function, so most modern languages have a way to calculate it. Julia has a function called gamma in the package JuliaMath/SpecialFunctions.jl. I used Julia’s version to generate the plot above. The plot shows that the function isn’t quite straightforward. It goes off to ± infinity at 0, -1 and so on[^6]. Modern programming languages give us easy access to the function, but being able to calculate it with a pocket computer in the 80s was cool. BASIC code for calculating the gamma function on the PC-1500 (from here) is given below[^7].

```
10: INPUT X
20: Z=ABS X
30: G=2.506628275+6.3E-10+(225.5255846+1.9E-8)/Z-(268.2959738+4.1E-8)/(Z+1)
40: G=G+(80.90308069+3.5E-9)/(Z+2)-(5.007578639+7.1E-10)/(Z+3)
50: G=LN (G+(.011468489+5.435E-10)/(Z+4))+(Z-.5)*LN (Z+4.65)-Z-4.65
60: IF X>0THEN 90
70: RADIAN
80: G=LN (π/X/SIN (π*X))-G
90: PRINT EXP G
```

Line numbers are important, as they are used as references for statements such as IF X>0THEN 90, which jumps to line 90 if X is positive.
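To convince yourself the listing works, here is a line-for-line transcription into Python (a sketch: the name `sharp_gamma` is mine, and for negative inputs it prints the magnitude, just as the listing’s final EXP G does). It uses the same split constants as the BASIC code.

```python
import math

def sharp_gamma(x):
    """Transcription of the PC-1500 BASIC listing, lines 20-90."""
    z = abs(x)                                                # line 20
    # Lines 30-40: series with constants split across two terms,
    # to suit the Sharp's limited number storage
    g = (2.506628275 + 6.3e-10
         + (225.5255846 + 1.9e-8) / z
         - (268.2959738 + 4.1e-8) / (z + 1))
    g += (80.90308069 + 3.5e-9) / (z + 2) - (5.007578639 + 7.1e-10) / (z + 3)
    # Line 50: work with logarithms to keep intermediate values in range
    g = (math.log(g + (0.011468489 + 5.435e-10) / (z + 4))
         + (z - 0.5) * math.log(z + 4.65) - z - 4.65)
    if x <= 0:                                                # lines 60-80
        g = math.log(math.pi / x / math.sin(math.pi * x)) - g
    return math.exp(g)                                        # line 90

print(sharp_gamma(5.0))   # close to 4! = 24
print(sharp_gamma(0.5))   # close to sqrt(pi) ~ 1.7724538509
```

Run in double precision, the same constants give roughly the accuracy the Sharp achieved in Table 1 below.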
Line 10 is an input that requests we type the number for which we will calculate the gamma function. Then we do the actual calculations. Note the use of scientific notation, e.g., 6.3E-10 means $$6.3 \times 10^{-10} = 0.00000000063$$. This code was written very carefully to cope with the limitations of number storage on the device. We don’t just write 2.50662827563, we write 2.506628275+6.3E-10, which look like they should be identical, but they involve subtly different approximations when you actually implement them in the machine. Floating point numbers (decimal numbers on computers) are surprisingly tricky!

Lines 20, 50, 80 and 90 use pre-defined functions (like ABS, which takes the absolute value, and LN, which is the “natural” logarithm). This computer’s BASIC has a nice set of these mathematical functions to call on. Line 70 tells it that angles, e.g., in SIN, will be given in radians, not degrees.

We can test this gamma function really easily, because for an integer $$n$$ we know that $$\Gamma(n) = (n-1)!$$ The exclamation mark here means “factorial”, a fancy way to multiply numbers from 1 up, e.g., 5! = 1 x 2 x 3 x 4 x 5 = 120, so we can test our gamma function by comparing it to these simple cases. And there are many other values known to high accuracy. A comparison is given in Table 1. You can see the PC-1500 is accurate; in most cases it gets 8 or 9 figures correct.

Table 1: gamma function calculations

| X    | Gamma(x)        | Sharp        |
|------|-----------------|--------------|
| -0.5 | -3.544907701811 | -3.5449077   |
| 0.5  | 1.772453850906  | 1.772453851  |
| 1.0  | 1.000000000000  | 1.000000000  |
| 1.5  | 0.886226925453  | 0.8862269255 |
| 2.0  | 1.000000000000  | 1.000000000  |
| 2.5  | 1.329340388179  | 1.329340388  |
| 3.0  | 2.000000000000  | 2.000000000  |
| 4.0  | 6.000000000000  | 5.999999999  |
| 5.0  | 24.000000000000 | 24.00000002  |

BASIC was designed in the 60s, and took off in a big way. It was showing its limits in the 80s, and it is widely maligned these days. Other languages like C took over, and few people use BASIC any more, even for teaching. But that isn’t fair.
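The reference column of Table 1 is easy to reproduce today; here is a quick sketch using Python’s standard-library math.gamma (the post itself uses Julia’s SpecialFunctions.jl, but any implementation will do):

```python
import math

# Re-derive the reference column of Table 1
for x in (-0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0):
    print(f"{x:5.1f}  {math.gamma(x):.12f}")

# The integer cases are just shifted factorials: Gamma(n) = (n-1)!
assert math.isclose(math.gamma(5.0), math.factorial(4))  # Gamma(5) = 4! = 24
```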
David Brin has a great article, “Why Johnny can’t code”, from 2006, which outlines why languages like BASIC (if not BASIC alone) were a core part of computer programming education. The logical thought process involved in such languages is at the heart of code design, whatever fancy new paradigm and language you adopt. So I don’t advocate BASIC as a useful language for doing data science, but it was great for learning.

Julia is also great for learning, as well as in many other ways! Julia has exactly what BASIC had, in terms of learning how to program. You can sit and type a logical sequence of operations, and see them work. And it has a lot more to offer.

# Adventures in Space and Time

So Julia has one big plus in my book, but there is another. Languages such as Matlab encourage users to be lazy about number representations. Matlab users typically represent all (real) numbers as double-precision (64-bit) floating point numbers (using the IEEE 754 standard). These are flexible and convenient, but not ideal for all purposes. Being careful about your numbers is an important part of writing good code. The most obvious case is image data, and anyone working on images in Matlab is probably familiar enough with using integer types to avoid the common problems, but there is a continuous drift towards the convenient option in such languages because, for instance, functions don’t specify the types of their input arguments. That’s great when you are getting prototypes working quickly. It’s terrible when you are trying to work with large datasets. And it isn’t always easy to add types to Matlab once you have started.

Other languages don’t have serious control of number types at all. The usual trade-off is that you spend more time to code the critical parts of a program in C or FORTRAN. So you save space (memory) but use up your coding time. The beauty of Julia is that you code your prototypes quickly, and can directly add types after the fact. You get the best of both worlds.
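The space side of that trade-off is easy to see. A minimal sketch in Python (the standard-library array module stands in for typed arrays, and the numbers are illustrative):

```python
from array import array

# 4096 sample pixel values, as you might get from a small grayscale image
pixels = [(i * 7) % 256 for i in range(4096)]

# The "lazy" default: store everything as 64-bit floats
as_doubles = array('d', [float(p) for p in pixels])
# The careful choice: 8-bit unsigned integers are all a pixel needs
as_bytes = array('B', pixels)

print(as_doubles.itemsize * len(as_doubles))  # 32768 bytes
print(as_bytes.itemsize * len(as_bytes))      # 4096 bytes: an 8x saving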
So that brings us back to the Sharp. When you learn to code in such a small space, you always have this in the back of your mind. You think about whether a number should be an integer or a float, and how many bits it needs. Everyone should try programming on one of these (rather limited) devices at least once. Part of the care is in use of space, and part is in knowing about the limits of number representations.

Floating point numbers are used to represent decimals, but they are approximations. And they aren’t at all simple, as we saw above. I’ll write more on floating point numbers in the future, but for the moment note that you can have such oddities as +0 and -0, which are different! There is much more information here. For some quick insights try (in Julia) typing things like

```
using Printf
@printf("%.12f\n", Float32(.1))
```

which converts 0.1 into a 32-bit floating point number, and then prints out the decimal version of the approximation, which in this case is not 0.1, it is 0.100000001490. You can see why I say that the gamma function calculations were accurate (above) when just inputting a number like 0.1 generates errors of a similar magnitude! The BASIC gamma function code above is amazingly clever in how it uses floating point.

# Conclusion

Poets work inside constraints – haiku[^8] have an exact number of syllables for each line. Constraints are where you find beauty, and this is true of writing code in small spaces. Such code has to be elegant. It has to be efficient. Learning to write such code is well worth your time, even if your typical application works with terabytes of data. Especially if your typical application works with terabytes of data.

Oh, and 1982 was the best year ever :)

Oh, and sorry if my koan on short programs was too long.

# Postscript

I only just (Jan 17) came across this post, which has Terry Tao’s first paper, “Perfect Numbers” (1983).
Terry (a Fields Medalist, winner of the inaugural Riemann Prize, and a fellow South Australian) wrote this paper when he was 8! The paper is primarily a BASIC program that calculates perfect numbers (a perfect number is one whose factors add up to itself). I don’t know if it was implemented on a pocket computer, but it is small enough that it could have been.

# Acknowledgements

Thanks go out to Sylvia and Jono for editing this one.

# Resources

General information on the Sharp PC-1500:

- https://www.old-computers.com/museum/computer.asp?st=1&c=965
- http://www.rskey.org/pc1500
- http://pocket.free.fr/html/sharp/pc-1500_e.html
- http://www.vintage-computer.com/sharppc1500.shtml
- http://www.aldweb.com/articles.php?lng=en&pg=25
- http://www.aldweb.com/articles.php?lng=en&pg=26
- https://www.tramsoft.ch/sharp_erweiterungen/index_en.html

Advertisements and reviews from the 80s:

- Australian business has really taken to Sharp, ELECTRONICS Australia, April 1983.
- Calculator or Computer, Popular Mechanics, Aug 1982.
- Portable computers, New Scientist, April 1983.

Programming:

- The Manual
- http://www.kaibader.de/tag/lh5801/
- http://www.gelhaus.net/cgi-bin/page.py?loc:8bit/+content:sharp_pc.html
- https://rkixmiller.dudaone.com/old-hardware-emulated-pockemul-sharp-pc1500

Pocket Computer Programs:

- If you want a play, but don’t want to buy your own pocket computer, PockEmul can apparently emulate it, though I haven’t tried it.

Examples:

- Learning BASIC Like It’s 1983

Floating point weirdness:

- http://www.rskey.org/~mwsebastian/miscprj/results.htm
- https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

Footnotes

[^1]: The Radio Shack TRS-80 Pocket Computer PC-2, sold in America, was a rebadging of the Sharp.

[^2]: It’s 195 x 86 x 25 mm in size. I don’t know anyone with pockets this big. But most computers were pretty clunky back then. And it was an incredibly light 375 grams with the batteries in (4 AAs).
[^3]: It has a 60-pin expansion port that apparently can interface to a tape drive, but good luck with that.

[^4]: “Closed-form expression” is a mathematical way of saying that we know how to compute a function with a finite number of “standard” operations. Standard operations include standard arithmetic – +, -, x, / – and certain functions such as trigonometric functions like cosine. There isn’t a real standard for what is allowed, but for most computational purposes it isn’t a big deal.

[^5]: There are many works on the gamma function. The classic Handbook of Mathematical Functions, edited by Abramowitz and Stegun, has a chapter on the topic, as does Bell’s Special Functions for Scientists and Engineers. Abramowitz and Stegun also includes a set of tables, precisely because the function is hard to calculate (without a computer).

[^6]: The gamma function can also be extended to the complex plane, for those who care.

[^7]: This code is a little limited. It will return an error if you try to calculate Gamma(-1), for instance, which is expected, but it has trouble with smaller values as well.

[^8]: There is actually a question on StackExchange asking people to write an executable haiku that outputs a haiku, so it isn’t just poets.

