Interview with Jesper Louis Andersen about Erlang, Haskell, OCaml, Go, Idris, the JVM, software and…

This is part II of the interview with Jesper Louis Andersen. You can read part I here. This part of the interview is mostly about Erlang…

Interview with Jesper Louis Andersen about Erlang, Haskell, OCaml, Go, Idris, the JVM, software and…

Interview with Jesper Louis Andersen about Erlang, Haskell, OCaml, Go, Idris, the JVM, software and protocol design — PART II

This is part II of the interview with Jesper Louis Andersen. You can read part I here. This part of the interview is mostly about Erlang, one of my favorite languages. If you want to learn Erlang, take a look at Spawned Shelter, a website I made for Erlang newcomers.

What are the advantages of the Erlang VM over the JVM and vice versa?

From a perspective of history, the choice of building the BEAM VM for Erlang was the correct one. Massive concurrency was less on the radar for many people, and Ericsson needed a platform which they controlled. Furthermore, the BEAM can exploit it is executing functional languages only: the GC needs no generation forward set for instance.

The roll-your-own-design decision has proven to be very valuable, even though compared to the JVM, the OTP team is far smaller. My guess is that for every hour sunk into BEAM, there is at least 15–20 hours of work in the JVM. In turn, there are things which you cannot do efficiently on the BEAM. It is still (2015) bytecode interpreted and has no JIT, which means raw execution of computationally intensive tasks is about 10–20 times slower than typical well-written Java. Projects such as Quasar/Pulsar and Akka promises Erlang-style concurrency on the JVM, but they are recently developed whereas the BEAM has been in production for many years.

The key differing design criteria comes from the design space originating in Bjarne Däcker’s thesis. Most notably the soft real-time constraints and the need for seamless hardware interaction, but also the need for running very large software systems in which feature interaction is complex. It turns out in such a world that the major problems are rarely raw execution speed, but rather how parts of the system operate as a coherent whole. Many of the problems in the design space requires a different approach than sheer execution brute force, especially in a multicore world. Had fast execution been important, it would have been addressed a long time ago. But it turns out every major release of BEAM provides a far more important set of new features. There is a lot of power in having a large industrial company backing the system, as the new features tend to be operational/industrial in nature.

The key difference in implementation is that the BEAM is built from the ground up as a resource sharing system, much like in an operating system. In an OS, two processes A and B are isolated and gets a fair share of the resources. If B is badly written, this has considerably less impact on A. Suppose for instance B has bad GC productivity and allocates a lot. Then the GC of B has to run far more often and B has to pay: either in lower throughput, or worse latency. At the same time, A will keep on running, without B having any bad impact on its operation. The BEAM isolates resources in such a way that a B application cannot directly impact an A application from a resource standpoint.

The contrast are systems where one large shared heap is used. They hedge both A and B on the same GC heap, hoping it is fast enough to power through. But clearly, a badly written B can affect a well written A far more. It matters in very-large-scale development since you cannot hope every part of the system is perfectly written.

If the Erlang VM uses asynchronous I/O how does it do to present a normal api to the developer? Why aren’t callbacks needed like in Node.js?

Node.js uses a cooperative scheduling algorithm as seen in old operating systems such as MacOS 9, Windows 95 running legacy 16-bit code, MS-DOS and so on. The method, in which the program explicitly yields the CPU for the next task, has a number of advantages: it is easy to adapt existing languages and systems to the method. It is highly efficient in throughput. And it allows you to “pack” lots of work into a single process.

The weakness of the cooperative model is its fragility in server settings. If one of the tasks in the task queue monopolizes the CPU, hangs or blocks, then the impact is worse throughput, higher latency, or a deadlocked server. The fragility has to be avoided in large-scale systems, so the Erlang runtime is built preemptively. The normal code is “instrumented” such that any call which may block automatically puts that process to sleep, and switches in the next one on the CPU. Furthermore, Erlang being functional, any process must loop by calling functions. An internal funcall counter measures “reductions” and once 2000 of these has been used, the process is forced off the CPU and the next one is switched in. The process is entirely automatic and follows the modern idea of time-sharing in operating systems: AmigaOS, UNIX, Windows NT+, and so on.

Note that in Erlang there is no way to call blocking operations at all, since they are all handled by the runtime. This means the model is coherent and I can use any library I like without fear of it doing something bad. As an example, BEAM uses its own PCRE Regex library which cooperatively yields expensive NFA traversals. It also breaks up expensive crypto-computations, long-running GCs, and costly term serializations. It even includes a blocking monitor built-in so you can get notified if a monopolization has happened. Calls to foreign code written in C is harder though, since you have to take its blocking behavior into account.

You can get much of the same in any other language by using an appropriate framework. However, this breeds fragmentation: library code in one framework is not usable in another framework without adaptation. And code written for no framework has to be inspected for eventual blocking behavior. In turn, code is colored in different colors, and you can’t mix them.

A far better approach is seen in Quasar/Pulsar: take existing bytecode and rewrite it by instrumentation, much like in Erlang. You can insert forced preemptions into loops, and protect any potentially blocking call by a lightweight fiber switch. If you add your own I/O manager underneath, you almost have a full implementation of the Erlang model. But it takes a serious amount of work to get the lower parts to run fast in this model, since you have to reimplement all of the Erlang VM inside Java.

In practical term what do you gain from using a VM like the Erlang VM that has a preemptive scheduler?

Preemption is an exchange in which you sacrifice explicit control for gains in productivity. First, you don’t have to adapt existing code to your model, which saves time. In languages without proper abstraction barriers, calling the wrong function at the wrong time can block a scheduler. Second, in a post-DevOps world where you run your own systems, productivity is directly tied to the maintenance overhead of existing systems. The lower fragility of the model means you have more time to actually write new code rather than fix old problems. You can afford to let bad behavior live for longer in the system since its impact simply gracefully degrades the system a bit.

Most importantly however, you don’t have to worry about every nook and cranny of the code. Rarely run code can be written slightly less performance-oriented since its cost is automatically amortized over the whole of the program by the scheduler. And even in the critical paths, the smearing operation of the scheduler helps a lot to alleviate spikes. Long running background jobs can simply be started without much worry.

Another gain is had from the observation “N starts out small and then grows” which is to say that over time, a piece of software gets to handle gradually more and more data. Without a preemptive scheduler, this growth very often ends up having a considerable latency or throughput impact. Erlang systems errs on the side of maintaining good low-latency for slightly worse throughput in these situations. As load increases on a system, this is often the desirable behavior as it is less likely to affect other systems directly. The system simply degrades gracefully rather than abruptly.

You wrote:

In Haskell, Type Classes are often the abstraction way to go. In ML, the key is to use modules.

Don’t you miss type classes or ML modules in Erlang?

I do miss them a lot, but they both rely on efficient type worlds in order to function. Adding them to an untyped language like Erlang is going to be rather hard. A far more reasonable way to go at Erlang programs is to look inside the Elixir and Clojure communities. Clojure’s “procotols” would be a good addition I think, and it fits the Erlang design space far more. Another good place for inspiration is Scheme48’s implementation of modules, which took inspiration from Standard ML’s functors among other things. Scheme48 being a R5RS (Revised^5 Report on the Algorithmic Language Scheme) implementation also shares lots of its design space with Erlang.

It is often said a Haskell type class construction can be implemented with an ML module. The correspondence to Erlang would be to use a process. Processes forces code to modularize and isolate, since the communication can only be done through exchange of messages. Granted, it happens at runtime, but used correctly it yields emergent behaviors in the source code: the same module is executed by many isolated processes at runtime, and their interaction defines the behavior of the program. The literature on swarm intelligence provide more information in this area.

Is it possible to implement a rich static type system and also have message passing at the same time?

Yes it is! Most such systems introduces some kind of “box” which can be typed as in a “box containing type X”. The box is often a single-message mailbox, or a channel. The type system doesn’t even need to be rich for this to work out, as Go shows. It works because the box threads a type with it, so when you put things into the box or extract things from the box, you can statically reason about the type of the extracted/inserted value.

Haskell has several highly interesting communication models which enjoy rich typing. There is also John Reppy’s ConcurrentML, which takes communication a step further: not only are channels first class values which can be manipulated, events are first class. CML has an algebra for manipulating events, and combining them. Once you call sync on the event, then the scenario plays out. It is akin being able to have receive-clauses as values which you can join: sync join(Recv1, Recv2). The most mind-altering operation in Reppy’s calculus is the withNack which is an event firing if another event is picked. It is often used for cancellation situations.

Most of these models assume a non-distributed setting though. They assume that communication cannot fail, and that code cannot be upgraded as it is running. CMLs event resolution in a distributed setting is likely to run into the CAP theorem for instance. Most type systems assume total static knowledge/control, something you rarely have in a distributed system. Hence you have to verify interactions between distributed agents, which somewhat defeats the holistic aspect of static typing.

What do you think about monads in erlang? Are they useful? Have you tried out erlando?

I do use a monad construction in Erlang from time to time, when the code is going to flow better with a monad, but I rarely use tools such as erlando when doing so. The packages assumes a monad is a module whereas I often use a record for my monad work. To me, monads have always been a tool best used in special cases, and to understand formal logics. Personally, I much prefer the hybrid approaches of OCaml and Erlang, where the languages are imperative rather than the approach of Haskell and purity above all. The latter “forces” monads upon you in ways I don’t always appreciate.

The monad story in Erlang is somewhat weaker than I would like it to be. In Haskell, it is nice because the compiler can use the types to implicitly infer what monad is to be used and then use that monad. A lot of the power comes from the ability to interweave the monadic bind into your code. A Haskell weakness is when you have monad transformer stacks where your code has to change whenever you reorder to stack, but perhaps the work on Free’er monads by Kiselyov can amend this weakness. I feel the productivity is slightly less so in e.g., OCaml (4.02), which needs explicit reference to the monad in order to use it. In Erlang, simple monads are easy to comprehend. But once you raise a complex system out of abstraction, you really need a proper type system to guide you, which you don’t get in Erlang. In turn, Erlang code tends to be less abstraction heavy.

I don’t necessarily think it is a bad idea to have code which uses relatively few abstractions. My own code style, even in OCaml and Haskell, tend to rely on simple principles rather than high levels of abstraction. If you introduce many abstractions, you are also introducing the need for readers to have that knowledge. It is possible to go overboard here, and end up with code which only a few people in the world can read and understand. It might be very efficient and very elegant, but you won’t get many programmers looking at the code. Hence, I tend to use a couple of abstractions when they really help the code, but I refrain from using them in most other situations, preferring to rewrite code through simpler means.

What do you think about Idris Erlang?

Sam Elliot wrote a thesis on this subject[0], and I take that it is this work you are asking about. Among the contributions is a proof-of-concept Erlang-backend for Idris. It allows you to take Idris modules and compile them as Erlang modules. A proper backend would definitely need more work, and it should target Erlang’s Core rather than concatenating strings together to form an Erlang program. Yet, one has to remember what the goals of a thesis are, which is to say it is not about producing production quality backends for languages. But the thesis also explores another path which was perhaps a bit overlooked: how do we type message communication?

Looking at the Erlang message model, and squinting your eyes a lot, you get what is essentially IP/UDP communication of messages, of arbitrary size. If you know a Process ID, you know it’s “IP” and you have a capability to send that PID a message. You also have the property that message loss might occur but it is very unlikely unless something is going seriously wrong. Since the risk of failure is low, you can usually handle it by restarting from a known good state, rather than implementing complex recovery schemes.

This model is extremely flexible and you can build almost any kind of messaging on top of it. But on the other hand, generality weakens the meta-theoretic properties of any protocol: we can’t say anything about the well-formedness of messages.

What Elliot points out in his thesis is the notion of constraining our communication to certain limited patterns. By giving up the general messaging, we can suddenly define a typed world in which we understand what is going on. In turn, we can prove well-typed patterns enjoy desirable properties. With enough work, we may be able to capture essentially all the healthy patterns of communication, while rejecting the bad ones. If this could lead to removal of subtle concurrency/distribution mistakes, we have come a long way.

Another thesis I have to mention in passing is Simon Fowlers on monitoring communication patterns in Erlang/OTP[1]. Fowler explores the idea of monitoring messaging at runtime through an altered gen_server construct. One of his major observations is that many Erlang message patterns are not captured by a typical two-party session type, let alone a multi-party session type. In particular multi-party session types doesn’t allow for the dynamic introduction of new members nor for the termination of members while the session is ongoing. Fowler proposes several interesting solutions and paths in his thesis on how to adapt the system to handle these problems.

All in all, these observations seem to suggest we need some more research work in the area to understand the full interplay between a language such a Idris, and a highly concurrent ecosystem such as the Erlang BEAM VM. It would seem Erlang is a more generic fabric on top of which we could patch in restrictions which improves the quality of our systems through automatic fault removal. In any case, typing concurrent/distributed programming in a very general communication model such as Erlang seems to be very hard.

We also need to explore messaging patterns in Erlang which are typeable in some type system. Usually such patterns have additional desirable structure embedded in them because the type system enforces rigidity. By researching in this area, one may hope to improve the understanding of why certain patterns are used, or not.

[0] http://lenary.co.uk/publications/dissertation/

[1] http://simonjf.com/writing/msc-thesis.pdf

Since it is very common to use a pool of gen servers in Erlang: Wouldn’t it be a good idea for Erlang/OTP to have a default gen_pool implementation?

Erlangs tenet is “provide tools, not solutions”. That is, provide the tooling for building a pool, but don’t provide a pool. The reason has to do with the fact pools are not alike. Over the last 10 years I’ve come across perhaps 5 different pool implementations and the key observation is they are all valid such implementations. How to hand out resources from the pool differ: some do round-robin proxying. Some queue requests and hand them out in FIFO order. Some uses LIFO order. Some are distributed Job-Idle Queue implementations, whereas some can only run on a single machine. Some provide automatic health checks and reconnections. Some block the caller when there is no work, others errors, and some queues the caller for a while, before erroring.

The other observation is how such pools handle failure, reclaim stale resources, handle errors in the pool implementation itself etc. Again, it is not clear what the implementation should be, and what would be the correct operation. It is mostly a function of context in which the pool is used.

My experience is that this problem space requires either chaos monkeys, concuerror or QuickCheck+PULSE to remove faults, and it is hard to get entirely right. But such tools require a specification of “correct operation”, and there are several such specifications.

Had Erlang/OTP provided a default gen_pool implementation, then people would use it even if it doesn’t fit their problem space. This tend to create subtle hard-to-find faults.