Posts in category 'hacking'

  • Where Pharo Falls Short

    Well, as you might imagine from the title, I figured I’d take a bit of a break from my continual gushing about Seaside to examine some of the areas where I think Pharo/Squeak unfortunately falls behind as a development environment. Of course, keep in mind, I wouldn’t be using these tools if I didn’t think they were an overall win, despite their shortcomings. But perspective is always a good thing, and it’s important to see the bad as well as the good.

    So, with that said, where to begin… well, as I’ve mentioned previously, in general, Smalltalk implementations make use of an image paradigm for storing and managing code. In this world, from the outside, the image is a monolithic blob of binary data, but contained within is essentially a snapshot of the entire Smalltalk environment. Open that image with a VM, and you’re presented with a completely self-contained world, including a windowing system, editors, file managers, and so forth. And in the case of Pharo or Squeak, that entire world is open to unlimited poking and prodding, as the deepest bowels of the system are themselves written in Smalltalk and available for inspection.

    However, this metaphor has influenced Smalltalk in ways that, I think, have proven a detriment to it’s adoption. For example, in general, there is no other way to edit Smalltalk code, save through the editor(s) provided by the environment. Now, granted, because that editor is deeply tied into the system, it’s capable of browsing code in some very impressive ways. But it means the user is given absolutely no flexibility to select a tool of his or her choice. And in the case of Pharo/Squeak, those tools can be a bit primitive (and occasionally buggy), providing little in the way of customizability (well, unless you want to hack the code, which you are, of course, free to do), while failing to provide facilitates that one often takes for granted (macro facilities, multiple copy/paste buffers, regexp-based search/replace… the list goes on). In fact, the Pharo/Squeak editor is little more than a basic Notepad-style editor with syntax highlighting and some primitive auto-indentation capability.

    Additionally, much as there is no option for editors, the choice of code management tools is extremely limited. The current tool of choice for version control in Pharo/Squeak is Monticello, a form of distributed version control system. Unfortunately, compared to, say, git, Monticello is decidedly primitive. Now, granted, there are those attempting to implement Git for Pharo/Squeak, but those projects are only just beginning, and one will still be limited to working in the Pharo/Squeak environment, and the tools available there.

    Lastly, the Pharo/Squeak VM itself can sometimes be rather… frustrating. The VM itself is single-threaded, which means that any long-running piece of code, if not invoked in a background process (implemented as green threads), will hang the VM. Fortunately, the Alt-. hotkey exists to interrupt such operations so they can be terminated, but if the operation is sufficiently nasty (say, accidentally looping and inspecting items in a collection, creating a very large number of windows), it can be very unpleasant to clean up. Moreover, the image itself is only saved upon demand (ie, there are no automatic saves to a separate backup file, like in many editors), and so if something catastrophic does happen to take down the VM, one’s work can be lost.

    Unfortunately, in the end, despite all these shortcomings, the damn language and libraries are so good, I just can’t help but work with it. While I’m sure much time is wasted dealing with the aforementioned issues, so much is gained from sheer productivity that the win is clearly there. Plus, I must admit, Smalltalk is just plain fun to write. In fact, I haven’t had this much fun in years!

  • Persistence in Squeak

    Ah, deliciously punny post title. You’ll see, assuming you make it to the end of this thing… don’t worry, I won’t blame you if you don’t.

    Over the last week or so, I’ve been working on a little toy project, partly to fill a need I have, and partly to fiddle around with developing a web application in Seaside and Smalltalk, and specifically the Squeak implementation of Smalltalk.

    Now, there are many parts needed to build a functional web-based application. Obviously you need a web server to actually serve the application. You need some sort of language to implement the application in. You need a framework in which to actually build that application (okay, sure, back in the days of the wild west, people built their own, but you’d be a fool to do that today given the plethora of frameworks available which simplify the web development process). And last but not least, in all probability, you need a data persistence solution.

    Of course, the first thing that comes to mind when turning ones thoughts to persistence is a good old fashioned relational database, which has been the cornerstone of data persistence for many a decade now. But when one is working in a deeply object-oriented language like Smalltalk, working with a relational database becomes rather cumbersome due to the rather substantial impedance mismatch between relational and object-oriented data modeling. As a result, we as an industry have turned to tools such as automated object-relational mappers (eg, Hibernate, etc) to try and ease the pain of this mismatch, but in general, the results aren’t what I would call pretty.

    Which is why, during my first hack at leveraging a persistence solution for my little Seaside application, I decided to try something entirely different: an object-oriented database called Magma. Unfortunately, it didn’t go too well.

    On Magma

    Magma is a very interesting project. As a persistence solution, it really aims for the same space occupied by Gemstone/S: to act as a completely transparent persistence solution for object-oriented data models. By that I mean the idea is that you hand Magma an object graph, and it persists it to it’s own custom data format on disk. When you pull it back out, Magma reifies parts of the object graph you’re interested in, and when you modify the graph, Magma spots the changes and reflects them back into the persistent store.

    Of course, on the face of it, this seems like absolute magic. You simply work with your objects. When you want to persist a change, you just do something like:

    session commit: [ model doStuff ].
    

    And voila, everything just, well, works. Of course, persistence is about more than just simple object storage, in that you also need to be able to query the data model, and be able to do so in an efficient manner. To that end, Magma provides a few specialized collection objects, such as the MagmaCollection class, which provide interfaces for applying indexes, querying, sorting, and so forth.

    So, on it’s face, Magma looks like a fantastic solution! The transparent persistence model makes it dead easy to manipulate your data model, and you no longer have to jump through all the object-relational modeling hoops that one would normally have to deal with.

    But, alas, it’s just not that easy.

    Unfortunately, Magma has one serious fault that rules it out for all but the most basic data-driven applications: It’s slow. Additionally, because Magma absolutely requires per-attribute indexes for any collection you want to query, the number of indexes in a data model can grow substantially, particularly in data mining/exploration tools. Worse, Magma steps on a rather nasty performance problem in Squeak whereby large numbers of files in a single directory (as in, thousands) causes the FileDirectory class to bog down… and guess what happens when you create a large number of indexes? That’s right, a lot of files get created in a single directory, and so you get utterly dismal performance when any index is initially opened.

    And as if that weren’t enough, in order to really squeeze decent performance out of Magma, you must start tweaking what are called “read strategies”. See, when you start reifying an object graph, you have to make a decision on how deep to go before you stop. After all, if you have a deep tree of objects, unless you plan to traverse that whole tree at some point, it’s a waste of time to load the whole thing all at once. So the “read strategy” dictates at what depth various parts of the object graph are read. But ultimately, what this equates to is deep micromanagement of the database behaviour, and, quite frankly, I have absolutely no interest in that.

    Thus, after many days of fighting, I’ve decided to throw out Magma. Which is rather painful, as I already have an object model built up assuming it’s use. Fortunately, the very nature of Magma means you don’t really tailor the object model too tightly to the database, but things do leak through here and there, and the model itself must, to some extent, be designed to facilitate querying, traversals, and so forth. Thus, any movement away from an RDBMS will necessitate rethinking my data model.

    A Way Forward?

    So what now? Well, I’ve decided to take the hit and switch to a solution based on Glorp, an object-relational mapping system for Smalltalk, and PostgreSQL, that venerable RDBMS. Of course, this will likely come with it’s own issues, first and foremost one of installation…

    Unfortunately, while Squeak package management has taken a step forward with Monticello, the management of dependencies between packages, and inconsistencies between platforms (eg, Pharo vs Squeak) means that things are a lot harder for the user than they need to be. In this particular case, the original Glorp port is rather old. So the folks developing SqueakDBX have worked to port over the latest version to Squeak, with some success. Unfortunately, their installation script doesn’t appear to work in Pharo. So I had to resort to pulling in their loader classes and then manually executing the installation steps by hand. Tedious, to say the least.

    But, on the bright side, I have a Pharo image that seems to have a functional Postgres client and Glorp install, so I can start fiddling with those tools to see if they can meet my needs.

    Which brings me back to the double entendre. Because returning to the Squeak world has reminded me of one thing: Occasionally the tools get in your way as much as they clear it out for you, and so sometimes you really do need to be incredibly… yes, I’m gonna say it, get ready… here it comes… persistent.