Purely Functional Data Structures 427
Purely Functional Data Structures | |
author | Chris Okasaki |
pages | 220 |
publisher | Cambridge University Press |
rating | 8/10 |
reviewer | Andrew Cooke |
ISBN | 0521663504 |
summary | Functional programming for grown-ups. |
In Okasaki's introduction he says that the "[...] benefits of functional languages are well known, but still the vast majority of programs are written in imperative languages such as C. This apparent contradiction is easily explained by the fact that functional languages have historically been slower than their more traditional cousins, but this gap is narrowing."
Indeed, OCaml has a reputation for being "as fast as C," yet it contains automatic memory management and supports object-oriented, as well as functional, programming. It's also probably the most widely used functional language outside academia (except perhaps Lisp/Scheme).
I mention OCaml not just because it's fast, free and popular, but because Okasaki uses a related language - ML - in his book. The ML family of languages are the standard strict, functional languages (Standard ML of New Jersey is perhaps the reference implementation but see also standardml.org). Okasaki also includes an appendix with examples in Haskell, which is the standard lazy functional language.
The difference between lazy and strict languages is the order in which code is evaluated. Most languages are strict. Unlike most languages, Haskell only evaluates something when it is absolutely necessary. Each parameter to a function, for example, is passed as a "thunk" of code, not a value. If the value is not required inside the function, the parameter is left unused; if it is required (say as part of a result that needs to be displayed) then the thunk is evaluated. This evaluation may trigger a whole slew of evaluations of functions that "should" have been called earlier (from a Java programmer's point of view).
Laziness is both good and bad. The bad side is obvious: the order in which code is executed my be very different from the order in which the program was written and some serious book-keeping is necessary in the compiler to juggle the thunks of code and their final values. The reordering of code could cause mayhem for IO operations, for example (in practice, of course, Haskell includes a solution to this problem).
The good side is that laziness can help make programs more efficient and, while the definition of ML doesn't include laziness, individual ML implementations -- including OCaml and SML/NJ -- include it as an extra.
Much of Purely Functional Data Structures (the second of three parts) focuses on how to use laziness to make data structures efficient. Lazy evaluation allows book-keeping actions to be postponed, for example, so that the cost of maintaining the data structure in an efficient form can be averaged across several read/write operations (improving worst case limits - avoiding a very slow response if the data happen to be in a "bad" order).
An understanding of how the efficiency of algorithms is rated (the big-O notation) is one piece of knowledge that this book does assume, along with a basic grasp of what Stacks, Queues, Trees, etc, are.
This lazy boost in efficiency is needed because, even though functional languages may be getting faster, it's not always possible for them to implement the efficient algorithms used in imperative (non-functional) programming.
But I'm getting ahead of myself, because I haven't described what a functional language is, or why it is useful. These are the topics of the first part of the book, which explains how functional languages, which make it impossible to change variable values by direct assignment, support persistent data structures. This section is fairly brief, and while it's a good refresher course for someone who's not had to worry about such things since studying at university, it's not sufficient as an initial introduction to functional programming in general.
There's a good explanation of functional programming in the Wikipedia, but, in all honesty, I don't see how anyone can really "get it" without writing functional code (just as I, at least, couldn't understand how OOP worked until I wrote code that used objects).
So forgive me for not telling you why functional programming is good (This paper is one famous attempt), but perhaps a better question to focus on is "Why should you spend the time to investigate this?" The best answer I can give is that it leads to a whole new way of thinking about programming. Describing functional programming as "excluding assignment to variables" doesn't do justice to the consequences of such a profound change (one I found almost unimaginable - how can you program without loop counters, for example?).
There's a practical side to all this too - learning new ways of thinking about programs makes you a better programmer. This ties in closely with the final part of Okasaki's book, which explores a few fairly esoteric approaches to data structures. Who would have thought that you can design data structures that parallel the way in which you represent numbers? Some of this is pretty heavy going - I can't say I understood it all, but I'm taking this book with me on holiday (it's slim - just over 200 pages) and I'll be bending my brain around some of the points in the last few chapters as I lie in the sun (here in the southern hemisphere it's late summer).
So just who would benefit from this book? It seems to me that it's most valuable as a second book on functional programming. There are a bunch of texts (and online guides) that can get you started in functional programming. This one goes further. It shows how to exploit the features of functional languages to solve real problems in applied computing. Admittedly, they are problems that have already been solved in imperative languages, but you might find that you, too, come to enjoy those famous benefits of functional languages. The algorithms in this book let you enjoy those benefits without paying the price of inefficiency.
Andrew Cooke last reviewed for Slashdot The Aardvark is Ready for War . You can purchase Purely Functional Data Structures from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
O'Camel (Score:4, Interesting)
Functionals (Score:3, Interesting)
This was a great review (Score:5, Interesting)
I guess what I'm saying is that I've used languages like Perl and Python considerably and ignored the functional aspects of the language, probably much to my disadvantage. I think a good study of a purely functional language could really improve my perl, python, or ruby.
Re:Functionals (Score:4, Interesting)
I agree that functional programming languages are quite useful, but speaking as a coder who learned functional programming just last year in class, I can say that functional languages are a lot more complicated than procedural languages.
Sure, once you get good at it you can bang out a functional program easily, and maintenance can be a breeze once you know how to write the code. However, reading and understanding code that ISN'T yours can be damn near impossible sometimes, especially when you're a newbie.
I'm not discounting the use of functional languages, I'm just saying they are harder to learn than procedural languages.
Re:This was a great review (Score:5, Interesting)
> language could really improve my perl,
> python, or ruby.
Right on. Here's a quote from Yukihiro Matsumoto, creator of Ruby [ruby-lang.org]:
There's an Artima article [artima.com] where he gives some of his reasons for this idea. That whole series of interviews with him is pretty good.
Re:The memories... (Score:3, Interesting)
Just to clear up, this is because of the lazy evaluator. Because code is executed until it needs to be, you never know what's happening when. For example:
System.out.println("I have reached this point");
would be meaningless in a functional program, because from where was this code executed? Hence, not only have you got to re-learn how to write your programs, you have to re-learn how to debug them!
(As a footnote, I'd say the advantages (lazy evaluation, advanced pattern matching functions, less code, recursion, etc.) outweigh the disadvantages (hard to debug) by a mile!)
Re:The memories... (Score:3, Interesting)
If you get the right mindset, recursive solutions as employed in scheme are very easy to reason about and get correct. The reason is that you can use formal proofs to _prove_ the correctness of your code. You can use the mathematic principles of induction to prove that your code is correct, but only in a purely functional atmosphere.
It takes a little getting used to if you are an imperative programmer, but its worthwhile.
However, I will say that the indentation practices of most Schemers is dreadful, and is one of the reasons why tab characters should not be directly equivalent with 8 characters. You see, if you make a tab equivalent to "arbitrary indentation of one level", then the user can set their own tab stops, and when a statement gets unwieldly and deep, you can just shorten the indentation to 1 or two spaces. But when you need some whitespace to view the algorithm better, you can expand it to 4
or 5, or even 8.
Re:Functionals (Score:3, Interesting)
People I have met who started out on functional and can write a lisp program as quick as I can write C often have serious problems with even the simplest program imperative languages just as I will get stumped by something stupid when trying to use lisp.
As in natural languages it is normally your first language that you are going to be comfortable with the most although you can obviously learn others. In coding its a bit more complex since you may find that your first language was terrible and pick up another one but people will generally stay in the same paradigm of imperative, functional or OO. It is a challenge to cross over once your mind works in a certain way.
Re:Just got this book (Score:2, Interesting)
I tried to read that part of the tutorial a couple of times but I got my brain fused ever time.
Cool languages, but why... (Score:2, Interesting)
I enjoy learning new programming languages but because of stuff like this, I wonder why I should. I still will because I have done non-trivial stuff in them very easily but it is a big downer. At least they let authors write neat books for us geeks to take on vacation.
Re:Just got this book (Score:4, Interesting)
No no no... I just got the book a few days ago and have been reading it. I've forced myself to play around with OCaml and various functional languages for the last month or so to try out the paradigm, and I must say I'm impressed by the compact expressivity, and the safety of these languages.
I love the idea of writing programs that can't crash (well, they can hang but they can't segfault), and being able to so elegantly describe an algorithm that I might as well just be writing the math.
I'm also not exactly upset by the type-inference (I never have to specify the type of almost *ANYTHING* despite the fact that there's no implicit casting). Also being able to make functions that are polymorphic in extremely complex ways allows me to keep algorithms much more general. Functors, for those that havne't used them, are simply awesome. Think templates but done right.
We need more good book reviews like this. Slashdot should hire the guy
Cheers,
Justin Wick
Functional Programming missed the boat (Score:5, Interesting)
Network effects are what rules the programming tools industry. Network effects are whim to fashion. Fashion is ruled by those with the legitimacy to glamorize.
Re:I heard this mentioned on some previous article (Score:2, Interesting)
Re:Functionals (Score:3, Interesting)
I'm not sure I'd consider most functional languages complicated. Look at the number of types and basic functions in something like SML or OCaml... a few basic types, and a few functions to act on them... There's really not a lot there to learn.
If anything many functional programs are simpler because they directly map to the mathematics of the algorithm given... especially for recursive algorithms!
I believe that often times it's more difficult to program in, but that's hard the same as "complicated".
Sure, once you get good at it you can bang out a functional program easily, and maintenance can be a breeze once you know how to write the code. However, reading and understanding code that ISN'T yours can be damn near impossible sometimes, especially when you're a newbie.
Actually one of the main problems that is noticed within the functional community is that because there's a lot of recursion and tight coupling between different parts of algorithms, they can be hard to maintain. That's why a lot of people are excited about monads [ed.ac.uk]. They allow the use of modularity in a very cool, and appropriately mathematical way! You should check them out of you haven't already, they greatly simplify the task of writing things like interpreters, etc.
I'm not discounting the use of functional languages, I'm just saying they are harder to learn than procedural languages.
I'm not really sure that's true. I know people who's first programming language was functional, and they found it equally hard to move to a procedural language - worrying about things like state, ordering, pointers, globals, memory management, bounds checking, interrupts and callbacks and non-persistent data structures, and even manually-implemented laziness.
Advanced features of any language are difficult to learn, but I'd suggest that anyone who knows a little math can learn to write a small but useful functional program as easy or easier than they could in something like C. Not to mention many algorithms are so naturally recursive, but tail-recursion optimization doesn't seem to work well on many C compilers.
It's certainly different though!
Cheers,
Justin Wick
Re:A coder's hobby (Score:3, Interesting)
Re:Purely *Functional* Data Structures (Score:5, Interesting)
I didn't get a degree, but I did take the hard way of learning comp-sci. I spent years of my time studying the various texts and papers that students *should* be studying. Some people complain that, "well you can't be a *true* comp-sci professional because you didn't pay for this piece of paper." I just shake my head at their insecurity and offer to help them solve whatever their current problem is.
There are my credentials. Take them or leave them. My only recommendation is that you don't underestimate what I can do, or what I have done.
Re:The memories... (Score:2, Interesting)
After just a few weeks of programming seriously in Lisp, I found it easier to debug than languages I've used for years. The reason for this is that everything is a function. You can unit test practically everything. You can try it out from the interactive loop.
My typical hard to find bugs at the beginning were typically because I was using the language wrong, not because I was using the syntax incorrectly. A good example of this would be using a counter combined with a while loop instead of recursion or my favorite, the loop facility.
Took Okasaki's data structures course (Score:5, Interesting)
The course was pretty mind-blowing. He knows his stuff. It was a bit freaky to watch him grading programming assignments by just reading them, not running them, and yet never missing a mistake.
I would recommend the book not just as an introduction to advanced data structures in functional languages, but as a guide to some of the more interesting data structures of the last fifteen years, regardless of implementation language.
-- Phil Gross
Re:Purely *Functional* Data Structures (Score:5, Interesting)
Yes, I can. And that's my point. I learned how when I was given my first book on data structures. It was a little weird at first, but if you're going to do things right, you have to know the math behind them. In fact, modern computers long ago made me give up the desktop calculator. Trying to develop for the processing power of today requires numbers far beyond what my old desktop calculators could do. It's too bad, because it was so convenient to not have to switch windows. And yes, I'm too cheap to get a decent scientific calculator.
Put out a publication comparing your methods with other known methods.
You mean like this one [javagaming.org]? Looking back at it, I should have taken the time to clean up the english. I was so excited about my algo, that I just pushed it out.
Keep an eye on Java Developers Journal for an article on logging. They wouldn't let me publish a pure research paper, but I was able to squeeze in a dissertation on using ThreadLocals for multiplexing a stream.
Does anyone know any *good* comp-sci journals? Dr. Dobbs looks promising, but I had already promised an article to JDJ.
Letting you in on a little secret (Score:4, Interesting)
Encapsulation is a restriction not a benefit.
Just as non-open source software allows your data to be held hostage by the class producer authors.
Reusability is poor because while both Accounting and Genealogy software might use a person object neither is interested in holding the other's baggage.
An object hierarchy becomes a house of cards, built on excruciating desicions of what to include and exclude at each level, and how much baggage to bring along.
Really the only good that ever came out of OOP was a straightforward way of passing around references to the same data, and easily creating and destroying records from classes.
<a href="http://www.lua.org/home.html">Lua</a&g t; is probably the best syntactically designed language out there in terms of both expressive power and readability.
It has an easy to read vb/pascal like syntax without semicolons.
It has full functional powers
No silly restrictions such as immutable variables.
Tables as a replacement to both lists, and classes,
essentially fully associative keyed lists
and since any value can be a function a Table can act as a "method" container.
Powerful extention mechanisms to define class inheritance/relationships within the language itself (rather than the compiler).
great calling convention with multiple assignment/return values.
Lua is a great introductory language, in that it is exceptionally readable. While you will immidiately gravitate to using functional paradigms, you're not forced to using them exclusively. It has elegant OOP and imperative paradigms as well.
Re:Functional Programming missed the boat (Score:3, Interesting)
Furthermore, I think it's also the result of software firms wanting to segment many areas of a program into smaller compartments. OOP makes this easier by enforce abstract and interface types, such that there is reasonable expectation that if someone codes under a given construct, it will work as needed. Based on UML diagrams, it is also far easier to break part programs using an OOP language and blueprint them. Can you imagine the nightmare of doing a large project like Doom 3 in Scheme, with so many functions defined on-the-fly and passed to and from other functions?
Scheme and others have their place in the world in teaching newcomers to think recursively. This is a very useful way of thinking in any language. Common problems often have very elegant solutions in Scheme, and diving into the mindset of functional programming is essential for any serious programmer. But I believe the nature of the language holds it back. Granted, once you know what you're doing, it's not so difficult. But it would seem somewhat unintuitive to plan and design large commercial projects around these languages.
Good book? Yes. Useful? Not really. (Score:3, Interesting)
I'm speaking from experience as both a functional and imperative programmer. While I have enjoyed the book, I didn't find it to be something that I keep returning to over the years.
Prefer databases (Score:3, Interesting)
Relational tables are more stable as requirements change. Relational tables are mostly designed around the quantity of relationships between things, such as one-to-many, many-to-many, etc. They are NOT dictated by how they are actually used in a given application for the most part. This at first sounds bad, but it is good because normalized tables transcend specific uses, and are thus more flexible and change-friendly. Relational tables are to describe nouns and facts about nouns, not reflect specific tasks or usages. Thus, you don't end up with "pointer messes".
For example, if you want to represent a graph with dedicated data structures, you might be tempted to make a list of lists, where one list is the ID's (or pointers) to other nodes (links). But if one is later required to put weights on these links, then a list is no longer appropriate, and you have to overhaul your structures and code that references them. However, a (properly normalized) relational database would use a many-to-many table to represent the links. Adding a weight factor is a trivial column addition. Existing code still works without change (as long as it does not reference or need weight info of course).
However, SQL and tradition has "bulked up" databases beyond what they need to be for many applications. A light-duty "local" relational engine to complement or replace "big-iron" RDBMS would be nice. I used to use "nimble table engines", and they were easy-to-use and relatively quick. SQL is not the ideal relational language, especially for smaller DB engines. I would like to see the industry explore alternative relational languages. Then we could get away from the pressure to use dedicated data structures. I find them archaic, to be frank. Let's move on.
I have implemented some of these in Ocaml (Score:4, Interesting)
Red-black binary tree
Skew-binomial heap
Real-time catenable deque
They're buried in a library containing a lot of other goodies that I haven't ported to all the platforms where Ocaml runs. The data structure modules are pure Ocaml, though- so, you can just lift them. The library is BSD licensed (two-clause), so take all the liberties you want as long as you give me props in your distribution and you can cope with the fact that you get NO WARRANTY from me. (It would be nice if you told me you were using it too-- that would help motivate me to care about timely release of updates.)
* The real-time deque is not technically a pure functional data structure since it uses lazy evaluation for handling concatenation, but- to be fair, a lot of the algorithms in Okasaki's book have a similar property. He is, of course, careful to distinguish the difference between pure and non-pure functional data structures.
The tyranny of suckage: why Ocaml is not popular (Score:2, Interesting)
Here's an example of each:
- (language) lame support for imperative programming -- no equivalent to 'break' for loops, and even no equivalent to 'return' to allow you to return a value from the middle of a loop. Of course, imperative programming isn't the emphasis of OCaml, but it means that that "benefit" of having imperative features available when you need them isn't really quite as strong as you might like
- (implementation) useless compiler errors -- numerous mistakes will give you the unadorned response "Syntax error", and because the language is so lacking in redundancy, mistakes can be syntactically valid for a long way, causing the syntax error to show up 20 lines later; similarly, typecheck errors themselves can be hard to decipher, since the compiler doesn't show you where the inferred types are coming from, leaving you to track them down on your own
I base this opinion on having used Ocaml twice, working with several other programmers on IFCP entries, and from occasionaly pair programming with my officemate, who is probably the only game developer in the world using Ocaml.
Re:Purely *Functional* Data Structures (Score:3, Interesting)
Still, I wouldn't bet on it. It could be that you can only go so far with borrowed aesthetics. But it may be farther than I would think.
Re:Functionals (Score:3, Interesting)
Quicksort in Haskell Quicksort in C
Re:Functionals (Score:1, Interesting)
Later, I tried to learn a number of other languages but Python, a non-functional language, was the only one I could learn. Python doesn't have any gotos and doesn't in any way resemble calculator basic, but it's still very good for beginners. Later, I learned Haskell, which seemed very flexible and high-level but I/O was a whole new language, and very difficult to use at that. Haskell was the best for math and theoretical stuff, though. Its use of static typing while also having generic functions and parametric polymorphism helped avoid many bugs. Lazy evaluation is great but sometimes it can be hard to grasp the flow of a program.
I also learned Io, a language where everything is an object and a prototype. Unlike something like Java, which some advocates of Functional, procedural, or "table-oriented" programming think all OO languages are like, there is no difference between classes and objects, anything about objects can be changed at runtime, and not everything has to be put in objects. Io and other object-oriented programming languages are in some cases better for larger projects, where you might otherwise need to resort to using variable prefixes or internal use of a module system within a module in order to prevent name colision. There are also times when in defining a datatype in a functional language, it seems like it would be more useful to have mutable variables, something which object orientation handles nicely, even if sometimes variables are mutated when new ones should be made.
It would really be inappropriate to say that functional programming is always better or object orientation is always better. Depending on the situation, you need to choose which pardigm is best.
Daniel Ehrenberg
Re:O'Camel (Score:2, Interesting)
The big difference between OCaml and Lisp/Scheme is that even though both are *strongly* typed, that is, you can never, say, reference memory address pi becaus a float is simply not a pointer, they work it in different ways. Lisp/Scheme is dynamically typed, like Python, and stores the types with the values, which are checked at runtime. OCaml is statically typed, which means it does not need to store types in tags, but analyzes your program at compile time to ensure "type safety".
On the one hand, this gives a performance win. On the other hand, it makes programming slightly more painful at times. The expression "if x=3 then 5 else 2.0" is not acceptable to the OCaml compiler, because it cannot determine its type; the expression (if (eq? x 3) 5 2.0) is perfectly acceptable to a Scheme compiler. The advantage of this, it is claimed, is that most programs without well-defined types are buggy anyhow, so it helps you write good code.
Note that the same performance win can be had from not having the language be very precise about types at all, and randomly allowing dereferencing of integers and multiplication of pointers. The archetype for that, of course, is C, which is considered *weakly* typed (a.k.a. "wrongly typed"). This has the disadvantage that programs can behave unpredictably and cause system crashes very easily. If an OCaml or Scheme program ever segfaults, let alone causes a bus error, that's a bug in the compiler. If a C program does that, well you all know what that means...
Just to confuse you a little more, there's yet another distinction to be made, which is between explicitly and implicitly typed. A language such as C or Java is explicitly typed, meaning that you have to declare variables and functions as being of a certain type, although the explicit types in C are mostly sugar and can always be overridden. Python and Scheme are implicitly typed, that is, you don't mention any types, which is not a big deal, since there really is only one type. All right, backtrack. I just said they had type tags, and now I say there's only one type? The answer to that is one of terminology. From a theoretical perspective, if any slot can contain any value, there's only one type (like Java's Object).
Now OCaml is statically, strongly *and* implicitly typed. That means that it will check at compile time that there can be no type errors, and you don't even have to mention the types. Say you write a funtion in OCaml like this: let f x = 2 * x . Then it will derive for you that it has type Int -> Int. This is accomplished through a piece of higher math called Hindley-Milner type inferencing.
If you are interested in this sort of stuff, I strongly recommend Lambda the Ultimate (lambda.weblogs.com), which is always full of programming language reading material, both on a very abstruse theoretical and on a very practical level.