The cake is a lie

Andy Tanenbaum and his Microkernel

I attended a lecture by Andy Tanenbaum last week. What follows is my summary and criticism of his ideas.

Andy talked about his baby OS called Minix. It’s Unix-like, but it has a microkernel instead of the monolithic kernel that Linux employs. The philosophy of the microkernel seems to be that the kernel should be as minimal as possible. Andy wants to reduce the codebase in kernel space to avoid bugs (less code = fewer bugs, surely a fallacy). So, instead of having things like drivers and modules in kernel space (where the code has virtually unlimited control over the system), they’re moved to user space, with some fancy messaging protocol to communicate with kernel-land and administrator-set access limits per module. So, if your video card driver has a bug, it can’t crash the whole system because it doesn’t have that much power. At worst, it can trash its own data in userland. This makes the system much more secure, but it also creates some overhead (Andy claims it’s insignificant, but that’s yet another fallacy) of about 6% to 14%, depending on the procedure. A lot of the overhead is constant-time, so it’s not so bad with more complicated procedures.
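To make the per-module access limits a bit more concrete, here is a toy sketch of the idea in Python. It is not Minix code: `ToyKernel`, `PermissionDenied`, and the call names (`map_framebuffer`, `dma_setup`, `irq_wait`) are all made up for illustration. The only point is that every request from a driver goes through a message, and the kernel checks it against an administrator-supplied allow-list before doing anything.

```python
class PermissionDenied(Exception):
    """Raised when a driver asks for a kernel call it was not granted."""


class ToyKernel:
    """Hypothetical stand-in for a microkernel's message dispatcher."""

    def __init__(self, policy):
        # policy maps a driver name to the set of kernel calls
        # the administrator has allowed for that driver
        self.policy = policy

    def send(self, driver, call):
        # Every request is a message; the kernel checks the allow-list first.
        allowed = self.policy.get(driver, set())
        if call not in allowed:
            raise PermissionDenied(f"{driver} may not use {call}")
        return f"{call} ok for {driver}"


# Invented policy: each driver gets only the calls it needs.
policy = {
    "video": {"map_framebuffer", "irq_wait"},
    "ethernet": {"dma_setup", "irq_wait"},
}
kernel = ToyKernel(policy)
```

With this policy, `kernel.send("video", "irq_wait")` goes through, while `kernel.send("video", "dma_setup")` raises `PermissionDenied`, so a buggy video driver cannot touch anything outside its own list.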

The neat thing about Minix is that it has a “resurrection server” which watches over all the modules and drivers and decides when to kill them (if they’re in an infinite loop or somesuch), or when to resurrect them (if they died). And you can write little shell scripts to tell it how to behave for each module. So if your ethernet driver chokes on a packet and dies, the resurrection server can resurrect it without losing a lot of bandwidth. Benchmarks showed impressive results, like a loss of only 30% of bandwidth if the driver dies every second, and almost no bandwidth loss if it dies every 10 seconds. This is a great idea as a backup system (like for the Mars rover or somesuch), but at the same time, I think this would encourage administrators and driver developers to be lazier. Why make stable drivers when the resurrection server can just reload them if they die, right? (I realize this is a fallacy as well, but a reality nonetheless.)
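The resurrection idea can be sketched in a few lines of Python. Again, this is a toy model and not how Minix implements it: `ToyDriver`, `ToyResurrectionServer`, and the `max_restarts` limit are assumptions of mine, standing in for whatever per-module policy an administrator would actually script.

```python
class ToyDriver:
    """Hypothetical driver that can die and be restarted."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.restarts = 0

    def crash(self):
        self.alive = False

    def restart(self):
        self.alive = True
        self.restarts += 1


class ToyResurrectionServer:
    """Checks drivers and restarts dead ones, up to an assumed limit."""

    def __init__(self, drivers, max_restarts=3):
        self.drivers = drivers
        self.max_restarts = max_restarts
        self.log = []  # record of every action taken

    def check_once(self):
        for driver in self.drivers:
            if driver.alive:
                continue
            if driver.restarts >= self.max_restarts:
                # Stop resurrecting a driver that keeps dying.
                self.log.append(f"{driver.name}: gave up")
                continue
            driver.restart()
            self.log.append(f"{driver.name}: restart #{driver.restarts}")


eth = ToyDriver("ethernet")
server = ToyResurrectionServer([eth], max_restarts=2)
eth.crash()
server.check_once()  # the dead driver is brought back and the restart is logged
```

A real server would run the check continuously and would also need to detect hung drivers, not just dead ones. The log is there because a restart that leaves no trace is exactly the kind that never gets investigated.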

So, how small is this microkernel? Andy predicts that a fully functional version of the kernel will be ~7,000 lines of code (in comparison, there are ~6 million lines of code in the Linux 2.6 kernel). It’s not that 5.99 million lines of code disappear; they’re simply moved over to user space. I have a few problems with this.

  1. Drivers: If it’s all in user space, and access is limited by the administrator, how much more of a hassle will it be to install all of your drivers if you have to manually specify which system calls each driver is allowed to use? Won’t this encourage administrators to just allow everything to all drivers (Windows syndrome)? I suppose some installation procedures could be written to handle this and “guess” the required system calls, but that would still be unreliable and hard to automate.
  2. Diminishing the power of bugs: The easiest bugs to squash are the ones that are right up in your face every step of the way. The hardest are the ones that are unnoticeable and hard to find. Moving bugs to userland and automating recovery (via the resurrection server) makes bugs a lot less noticeable, and a lot less likely to get fixed. Hiding bugs isn’t the way to get good code. Surely this is a good idea, if not a necessity, for mission-critical situations like the Mars rover (where I’m sure similar systems exist). But for average usage, this may prove to be counterproductive to the development of good code.
  3. Andy’s argument for application: He kept saying that the overhead created by this system is insignificant due to the virtually unlimited resources that desktops have today. As computer scientists, we would naturally reject that argument for the sake of optimization, but let’s play along. Towards the end of the lecture, Andy listed the applications as things like embedded devices, cellphones, minimalistic devices (third-world laptops), and mission-critical devices (Mars rover example, again). I completely agree with this list of applications (as stated in argument #2), but these devices don’t have unlimited resources. They are very limited, and such overhead would definitely be something that needs to be considered. This is clearly a faulty argument.
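To put rough numbers on the constant-time overhead mentioned earlier, here is a back-of-the-envelope sketch in Python. The figures are invented for illustration: I assume a fixed messaging cost of 14 time units per call, which matches the quoted 14% only for a procedure that takes 100 units of real work. The function name `relative_overhead` is mine as well.

```python
def relative_overhead(fixed_cost, work):
    """Extra time added by a constant per-call cost, as a fraction of the work."""
    return fixed_cost / work


# Assumed fixed cost of 14 units; only the amount of real work varies.
for work in (100, 1_000, 10_000):
    print(work, f"{relative_overhead(14, work):.2%}")
# prints:
# 100 14.00%
# 1000 1.40%
# 10000 0.14%
```

So the longer the procedure, the less the fixed cost matters. The catch on a small device is the short, frequent calls, where the fixed cost stays a sizeable share of every one of them.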

Argument holes aside, it’s a “good” system. It’s definitely something that needs to be considered for the mainstream, but with large grains of salt. Hats off to Andy Tanenbaum for being one of the most productive people I know of, as this is just one of his many hobbies while he teaches Computer Science at Vrije Universiteit, Amsterdam.
