Posts in category 'hacking'
- (https://b-ark.ca/eaioyc)
The ai thought leader on Twitter is now one of my favourite things (and also makes me want to play around with GPT2)…
- (https://b-ark.ca/UiEMeu)
After finishing my DIY keyboard project (http://b-ark.ca/MIYO2i) I’ve been trying to decide what I want to try as my next hardware hacking project, and this is definitely on the list…
Fun with themes
For kicks I wanted to build a dark mode for my blog, which led me down the garden path of CSS custom properties and easter eggs…
One of the many things that attracted me to tech, back in the day, was the total DIY freedom of hacking computers to do whatever I wanted. And when you’re a kid, it’s even more fun because you aren’t looking at your pet projects through the lens of “value” or “product market fit” or “differentiators”. You just… do stuff, simply because it’s fun!
Or, put more simply: You play. And as adults, we have play beaten out of us. And that is just a darn shame.
Well, one of the fun things about running your own blog on your own server with software you control is that it’s a wonderful place to play! Heck, the re-design of this blog started off as just me screwing around for the fun of it.
So, while on vacation, I thought it would be fun to build a dark mode theme for my blog using the technique outlined in this post (which it turns out is one of many).
If you want to see the results… well, first off, if your OS is set to use a dark mode theme, you might already be seeing it! Otherwise, the little lightbulb icon in the navbar toggles the themes.
In addition, if you poke around in my site, you might find an easter egg that enables a couple of additional retro themes designed to honour the computers of the past that inspired me and lead me to where I am today!
Continue reading...On Algorithms
So, generally speaking, I’ve typically adhered to the rule that those who develop software should be aware of various classes of algorithms and data structures, but should avoid implementing them if at all possible. The reasoning here is pretty simple, and I think pretty common:
- You’re reinventing the wheel. Stop that, we have enough wheels.
- You’re probably reinventing it badly.
So just go find yourself the appropriate wheel to solve your problem and move on.
Ah, but there’s a gotcha, here: Speaking for myself, I never truly understand an algorithm or data structure, both theoretically (ie, how it works in the abstract, complexity, etc) and practically (ie, how you’d actually implement the thing) until I try to implement it. After all, these things in the abstract can be tricky to grok, and when actually implemented you discover there’s all kinds of details and edge cases that you need to deal with.
Now, I’ve spent a lot of my free time learning about programming languages (the tools of our trade that we use to express our ideas), and about software architecture and design, the “blueprints”, if you will. But if languages are the tools and the architecture and design are the blueprints, algorithms and data structures are akin to the templates carpenters use for building doors, windows, etc. That is, they provide a general framework for solving various classes of problems that we as developers encounter day-to-day.
And, like a framer, day-to-day we may very well make use of various prefabbed components to get our jobs done more quickly and efficiently. But without understanding how and why those components are built the way they are, it can be very easy to misuse or abuse them. Plus, it can’t hurt if, when someone comes along and asks you to show off your mad skillz, you can demonstrate your ability to build one of those components from scratch.
Consequently, I plan to kick off a round of posts wherein I explore various interesting algorithms and data structures that happen to catch my attention. So far I have a couple on the list that look interesting, either because I don’t know them, or because it’s been so long that I’ve forgotten them…
Data Structures
- Skip list
- Fibonacci heap
- Red-Black tree
- Tries
- Radix/PATRICIA Tries
- Suffix Tries
- Bloom filter
Algorithms
- Various streaming algorithms (computations over read-once streams of data):
- Heavy hitters (finding elements that appear more often than a proscribed freqency)
- Counting distinct elements
- Computing entropy
- Topological sort
And I guarantee there’s more that belong on this list, but this is just an initial roadmap… assuming I follow through, anyway.
Sometimes You Actually Have To Rewrite It
So, a couple years back I started doing some subcontracting work for a buddy of mine who runs a little ColdFusion consultancy. As part of that work, I took ownership of one of the projects another sub had built for one of his client, and the experience has been… interesting.
See, like PHP and Perl, ColdFusion has the wonderful property of making it very easy for middling developers to write truly awful code that, ultimately, gets the job done. And so it is with this project. My predecessor was, to be complementary, one of those middling developers. The codebase, itself, is a total mess. Like, if there was a digital version of Hoarders, this code might be on it. But, it does get the job done, and ultimately, when it comes to customers, that’s what matters (well, until the bugs start rolling in).
Of course, as a self-respecting(-ish) developer, this is a nightmare. In the beginning, I dreaded modifying the code. Duplication is rampant, meaning a fix in one place may need to be done in many. Side effects are ubiquitous, so it’s difficult to predict the results of a change. Even simple things like consistent indentation are nowhere to be found. And don’t even dream of anything like automated regression tests.
Worse, feeling no ownership of the code, my strategy was to minimally disturb the code as it existed while implementing new features or bug fixes, which meant the status quo remained. Fortunately, around a year ago I finally got over this last hump and made the decision to gradually start modernizing the code. And that’s where things got fun.
One of the biggest problems with this code is that data access and business logic are littered throughout the code, with absolutely no separation between data and views. And, remember, it’s duplicated. Often. So the first order of business? Build a real data access layer, and do it such that the new code could live beside the old. Of course, this last requirement was fairly easy since there was no pre-existing data access layer to live beside…
So, in the last year, I’ve built at least a dozen CFCs that, slowly but surely, are beginning to encompass large portions of the (thankfully fairly simple) data model and attendant business logic. Then, as I’ve implemented new features or fixed bugs, I’ve migrated old business logic into the new data access layer and then updated old code to use the new object layer. Gradually, the old code is eroding away. Very gradually.
Finally, after a year of this, after chipping away and chipping away, finally, while there’s still loads of legacy code kicking around (including a surprising amount of simply dead code… apparently my predecessor didn’t understand how version control systems work–if you want to remove code, remove it, don’t comment it out!), the tide is slowly starting to turn. More and more often, bugs that need to be fixed are getting fixed in one place. New features are able to leverage the object layer, cutting down development time and bugs. And some major new features coming down the pipe will be substantially easier to build with this new infrastructure in place. It’s really incredibly satisfying, in a god-damn-this-is-how-it-should-be sort of way.
The funny thing is, this kind of approach goes very much against my natural instincts. Conservative by nature, I’m often the last person to start rewriting code. However, if there’s one thing this project has taught me (along with a couple wonderfully excited, eager co-workers), it’s that sometimes you really do have to gut the basement to fix the cracks in the foundation. And sometimes, you just gotta tear the whole house down.
Fun with Open Data
A Simple Problem
So, recently I found myself struggling with a very simple problem: I kept missing our darn garbage pickup date. Yeah, sure, the city sends out a little printed schedule, and it usually makes it to my refrigerator, but given that requires I actually pay attention to the silly thing, it doesn’t end up helping me much.
Now, in the past, I’ve solved this problem by manually punching the pickup dates into Google Calendar. This works, but it’s tedious, error prone, and whenever the new schedule comes out, I have to do it all over again. And if it wasn’t clear, I’m lazy. Real lazy. So last time around I simply didn’t get around to doing it.
And so I miss the pickup dates.
Obviously what I really wanted was an iCalendar formatted file that I could just subscribe to in Google Calendar, at which point I could move on with my life. But, alas, to my knowledge no such resource exists.
An Idea Is Formed
Well, I found myself discussing this with my officemate, Steve, and he pointed out that the pickup schedule is, in fact, available online in raw form as part of the City of Edmonton’s Open Data Catalog. The Open Data Catalogue is a remarkable resource. In it you can find a staggering amount of data about the city, provided in a simple, machine-readable form, and browseable online. It’s really very cool, and it feels like a resource just waiting to be tapped.
Anyway, back to the topic at hand, it turns out the pickup schedule is available right in the catalogue, albeit in a fairly raw form. At this point, the answer seemed obvious: write a tool which could consume this data, and produce an iCal file as output. I could then subscribe to the generated calendar, and voila! No more missed garbage days!
Why Can’t Things Be Easy?
Unfortunately, things got a little tricky once we started digging into the data. It turns out that the city is subdivided into a series of zones. Each of these zones then has one or more garbage pickup days, and so one’s individual pickup schedule varies based on your geographic location in the city. As such, the pickup schedule data is organized as a simple table as follows:
Zone Day Date So, now we have a problem: how do we determine the zone and day for a given household?
Well, it turns out the city provides the zone/day data as a geographic overlay, in the form of a KML data file. KML is a rather complex XML dialect which can be used to represent geographic data, and is used in, among other placed, Google Maps, Google Earth, and so forth. But, it’s a standard, and there are standard tools for handling it, so a gold star for the city!
Alright, so now we have the geographic data, we just need to parse that file, and then based on household location, identify the appropriate zone and day, and extract the correct schedule information. This should be easy…
A Solution
The first question was one of language. I knew I needed a few things:
- Access to a decent web framework, so I could quickly slap together a (preferably REST) web service.
- Something to parse the KML data.
- Something to do point-in-polygon tests, so that, given the boundaries of a zone and a household lat/long, I could determine if the household was in that zone.
- A CSV library would be handy (this is the form the schedule data takes).
- A library for generating iCalendar files.
And, in addition, given this was a quick learning project, I figured it’d be nice to choose something I haven’t written much code in. In the end, I settled on Python, and in particular:
- CherryPy - A simple, lightweight web framework with support for REST built in.
- libkml - A KML library with (poorly documented) Python bindings.
- RosettaCode RayCasting Algorithm - Code that implements the standard raycasting point-in-polygon test (I could’ve written this myself, but… why?)
- Python’s standard CSV module.
- vobject - A python library for parsing and generating iCalendar files.
Now, it turns out the hardest bit was in ingesting the KML file. The libkml bindings, while functional, are awful. They aren’t terribly pythonic, they’re horribly documented, and the examples and tests provide no coverage of the API. As a result, I ended up reading a lot of SWIG binding definition files, in order to determine the method calls available in the object model provided. But, eventually, I was able to figure out how to extract both the zone metadata (name, day, etc), and the polygon definitions, and with that, it was fairly easy to write a small library to identify a household zone based on their latitude and longitude.
After that, the rest was really pretty easy: the code determines the user’s zone, downloads the schedule data in CSV form, identifies the rows for that household, and then spits out an iCalendar file containing that schedule data. Voila, done! The service is available here:
http://api.b-ark.ca/garbage/schedule?latitude=&longitude=
Just enter your lat/long, and you should get a valid iCal file back. Nice! And even better, it took me roughly a day to slap the whole thing together.
Getting Fancy
Upon posting about this on Google+, another friend of mine pointed out that it was rather onerous to expect the user to go look up the lat/long of their household in order to use the service. This got me thinking of alternatives… wouldn’t it be great if the user could just provide their address, and the service could do the rest?
Well, happy days, it turns out Google offers a free REST API for geocoding (the technical term for the process of mapping an address to a latitude/longitude pair (or vice versa)). Just hit the service with a street address, and they’ll hand you back geographic data in the form of a JSON or XML document. Nice!
About ten lines of code later (with the help of Python’s minidom module), I had a service which could take an address, get the lat/long automatically, and return the appropriate iCal file. You can hit this version here:
http://api.b-ark.ca/garbage/schedule?address=
Just append your address to the URL, and you could get an iCal file back. Thanks Google!
Update:
BTW, if you want to take a look at the (quite simple) code, you can check it out on github.