Virtually the best blog on the web!
Open Source
Best Resources to Learn about Linux Kernel Internals
Feb 7th
The Source
The best resource is the kernel source.
- I prefer reading the cross-referenced lxr, which makes it easy to follow the code.
- Linux Networking Subsystem: Desktop Companion to the Linux Source Code. A commentary on the Linux networking code.
- The Linux Kernel & the File System Subsystem: An Architectural Overview. Covers the VFS in Linux.
- Code Commentary On The Linux Virtual Memory Manager by Mel Gorman.
Books
Obviously, it’s not that easy to dive into thousands of lines of code. I suggest starting with books that explain the Linux kernel in general.
- Understanding the Linux Kernel. This is one of the first books to provide in-depth explanations. Vastly improved over multiple editions, the current one is a very good read.
- Robert Love’s book. Love is a core developer who implemented kernel preemption and other features. I haven’t personally read it, but I have seen good reviews of it.
- If you are specifically looking for networking aspects, this is an excellent book on understanding Linux networking internals.
- For device drivers, similarly, this book is very useful.
Other resources
- LWN’s kernel page has lots of great articles explaining kernel internals
- TLDP’s TLK. Somewhat outdated, but useful.
- Linux Journal’s KernelKorner has some excellent articles, most of which are freely available online.
- The Linux Kernel Hackers’ Guide, compiled by Michael K. Johnson of Red Hat fame. Includes, among other documents, selected Q/As from the linux-kernel mailing list.
HOWTOs
- The Linux Kernel HOWTO by Brian Ward.
- A completely new Kernelhacking-HOWTO at http://www.kernelhacking.org/.
- Various Kernel HOWTOs on specific questions, such as the BogoMips mini-HOWTO by Wim van Dorst.
- Various Linux HOWTOs at TLDP
Lists of links
- There’s a huge index of links at http://jungla.dit.upm.es/~jmseyas/linux/kernel/hackers-docs.html that has links to many resources.
- Another list of links maintained by Chris Gould. Mostly unsorted
Firefox + Google = Evil ?
Jun 6th
Firefox released a 3.0 sneak peek version, and I was happy to get the latest incarnation. I see quite a few UI improvements, and I believe they improved the performance as well. But, I have a big gripe: Why does Firefox come with Google as the default search provider? Second problem: Why doesn’t the list of search providers include Microsoft’s Live search? (see left Figure). 80% of FF money comes from Google. No wonder, FF loves Google!
Google prides itself on its “Don’t be evil” motto, and I find it peculiar for Firefox to not include Live search in the first 5 search providers. On the other hand, MS has finally left its monopoly practices behind and allows you to pick your search provider. Live search is not the default search, btw! You have to pick one when you install XP/Vista. Also, Google is one of the top options.
Come on FF, you guys are open source, so do the right thing:
- Let the user select the search provider
- Add Live search as one of the options along with other popular search providers like AOL search and Ask.
This actually shows a growing problem in high-profile open source projects. Gone are the days when the Linux kernel was developed by people who were never paid for the work they were doing. Today, all the high-profile open source projects (the Linux kernel, KDE, Gnome, Firefox, OpenOffice) are funded by many companies. I wonder how many decisions in the kernel are influenced by IBM and Red Hat!


Finding the Absolute Path of a Running Process
Oct 26th
A friend of mine recently asked me about finding the absolute command path of a process given the pid. Of course, it’s very easy to do it for a particular OS. Doing it platform independently is actually a little tough. There’s no straightforward POSIXy way of doing this (as far as I know). One can certainly do some /proc magic, but that won’t be portable. My suggestion was to just use ps. This works for both Solaris and Linux. So, you get the output from
ps -p <pid> -f
     UID   PID  PPID  C    STIME TTY      TIME CMD
 draganm 17198 17193  0 20:40:40 ?        0:00 csh -c /usr/lib/ssh/sftp-server
and simply do some string manipulation to get the required string (the CMD column above). Any other ideas, folks?
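The string manipulation can be sketched as a small shell function. This is only a sketch under a couple of assumptions: the name cmd_from_ps_f is made up for illustration, and it assumes ps -f prints a header line with CMD as the last column and the process line aligned beneath it. Note that what comes out is the command line as ps reports it, which is an absolute path only if the process was started with one.

```shell
#!/bin/sh
# cmd_from_ps_f (hypothetical helper): read `ps -p <pid> -f` output on stdin
# and print the CMD column of the first process line.
# Assumption: CMD is the last header column and the process line is aligned
# under the header.
cmd_from_ps_f() {
    awk '
        NR == 1 { col = index($0, "CMD") }            # offset of the CMD header
        NR == 2 && col > 0 { print substr($0, col) }  # rest of the process line
    '
}

# Usage:
#   ps -p "$pid" -f | cmd_from_ps_f
```

Where ps supports the POSIX -o option, ps -p "$pid" -o args= should print just that column and avoid the parsing altogether.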
Woes after Killing rpm/yum
Sep 14th
Don’t ever kill rpm/yum while it’s running, especially when it’s doing that rpmdb transaction. rpm won’t release the lock, and it leaves behind some temporary files that are probably related to the transaction. How dumb? How difficult is it to write a signal handler that cleans up the crap? I ran into this problem quite often. I start rpm or yum and do an update or something, and then it freezes forever. After losing my patience, I kill it, and I can’t use the rpm tools anymore. The fix is simple: use brute force.
rm -rf /var/lib/rpm/__db*
rpm --rebuilddb
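A slightly more careful variant of the brute-force fix is sketched below. The function name and the backup naming scheme are made up for illustration; the only things taken from the fix above are the /var/lib/rpm path, the __db* pattern and rpm --rebuilddb. It keeps a copy of the database directory before deleting anything and refuses to run while rpm or yum still shows up in the process list.

```shell
#!/bin/sh
# remove_stale_rpm_locks (hypothetical helper): back up an rpm database
# directory, then delete the leftover __db* files in it.
# Prints the backup location on success.
remove_stale_rpm_locks() {
    dbdir=${1:-/var/lib/rpm}      # default is the path used in the post
    backup="$dbdir.backup.$$"     # assumed naming scheme for the backup copy

    # Do nothing if rpm or yum still appears to be running.
    if pgrep -x rpm >/dev/null 2>&1 || pgrep -x yum >/dev/null 2>&1; then
        echo "rpm or yum is still running; leaving $dbdir alone" >&2
        return 1
    fi

    cp -Rp "$dbdir" "$backup" || return 1
    rm -f "$dbdir"/__db*
    echo "$backup"
}

# As root, the brute-force fix then becomes:
#   remove_stale_rpm_locks /var/lib/rpm && rpm --rebuilddb
```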
Network Programming in the Kernel – my Linux Journal Article
Aug 29th
Ok, it’s official. Grab the latest copy of LJ to read our article about network programming in the kernel. Ravi and I explained in pretty good detail how to create a network connection and read/write data from sockets in kernel mode. The sample code shows a basic FTP client that connects to a given IP address and downloads a file. The code can be downloaded from the LJ FTP site.
There is, of course, always a debate on whether this should be done in the kernel, but I think there are situations where you may want to do this. I have explained a few reasons in the kernelnewbies thread.
I think the article will be freely available online in LJ archives after a few months. Until then, you can devour the code at LJ download site, if you are not subscribed to LJ.
Is TLDP dying?
Aug 26th
TLDP, if you don’t know already, is The Linux Documentation Project. It’s one of my favourite open source projects, as I grew up with it. In the dark ages, when we had to muddle with XF86Config to get X working, TLDP had some cool HOWTOs to help newbies. I fondly remember the days when I scrolled through the HOWTOs in Lynx. I have learnt a lot by reading the HOWTOs and Guides, and it compelled me to contribute a HOWTO for NCURSES.
Ok, back to the topic. Recently, there was a discussion about the lack of author response and outdated HOWTOs on the TLDP discuss mailing list. As always, somebody proposed to set up a Wiki to correct all the problems, and had the audacity to say TLDP might be dying. Naturally, this started a flame war, with others pointing out that this has been discussed at length many times and that it’s tough to find solutions. I have contributed my share of the flames and posted a summary of the Wiki war. Stein Gjoen also did a review of Wiki, and concluded that Wiki is not yet ready for TLDP.
In my opinion, TLDP is a great resource, and even though some HOWTOs are painfully outdated, there is still a lot of documentation that is of high quality and well maintained. Having a Wiki might help to bring a few HOWTOs back from the dead, as mentioned on the discussion list, but it won’t be a replacement for the current state of affairs.
Checkpointing and Prelinking
Jul 29th
I am using blcr (a kernel level checkpointing facility) for my research and it’s quite cool. You don’t have to modify your application, and it’s used as follows
cr_run my_app
cr_checkpoint --term my_app_pid   # creates a context.pid file and kills the process
cr_restart context.pid            # voilà! start from where it was checkpointed
In theory, you can move the checkpoint files to another machine with the same kernel, but I was experiencing segfaults. I contacted Paul, one of the developers (he is a nice guy, and he’s been helping me a lot with patches etc.), and he told me that the libraries on both machines should be the same and should load to the same addresses. The problem is that prelink rewrites the libraries on each machine, so the libraries on the two machines were no longer the same. So, I ran prelink -u to undo the prelinking, and checkpoint/restart is working!
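One way to check whether the libraries really match is to compare checksums of everything the binary links against on both machines. A rough sketch, assuming the usual ldd output format ("name => /path (address)"); the helper name lib_paths and the file names in the usage line are made up for illustration:

```shell
#!/bin/sh
# lib_paths (hypothetical helper): read `ldd <binary>` output on stdin and
# print the absolute path of each shared library, one per line.
# Assumption: lines look like "libc.so.6 => /lib/libc.so.6 (0x...)" or
# "/lib/ld-linux.so.2 (0x...)"; entries without a path are skipped.
lib_paths() {
    awk '
        $2 == "=>" && $3 ~ /^\// { print $3; next }  # name => /path (addr)
        $1 ~ /^\//               { print $1 }        # /path (addr)
    '
}

# Usage, run on each machine and then compare the two files with diff:
#   ldd ./my_app | lib_paths | xargs md5sum > libs.$(hostname)
```

If the checksum lists differ between the two machines, the restart is likely to hit the same kind of mismatch described above.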
Btw, I am working on a fault-tolerant (ft) scheduler for large-scale systems. It’s still a pipe dream. Currently, I have blcr integrated with Torque+Maui, and I can checkpoint/restart serial jobs with my ft scheduler. I am working on MPI jobs and, hopefully, I will have some stuff out to the public by the end of September.
The UNIX Haters Handbook
Jul 22nd
To all those people who looked at me weirdly, when I said UNIX is cool, here’s your chance to fight back
The UNIX-HATERS Handbook. Enjoy! The book basically captures the eternal conflict between “Worse is Better” and “Do the Right Thing”, also known as the New Jersey vs. MIT approaches.
Btw, this is a pretty old handbook and most of the gripes were fixed long ago.
A few amusing tidbits below.
Mail version SMI 4.0 Sat Apr 9 01:54:23 PDT 1988 Type ? for help.
"/usr/spool/mail/chris": 3 messages 3 new
>N 1 chris Thu Dec 22 15:49 19/643 editor saved “trash1”
 N 2 root Tue Jan 3 10:35 19/636 editor saved “trash1”
 N 3 chris Tue Jan 3 14:40 19/656 editor saved “/tmp/ma8”
& ?
Unknown command: "?"
&

fs2# add_client
usage: add_client [options] clients
       add_client -i|-p [options] [clients]
 -i interactive mode - invoke full-screen mode
[other options deleted for clarity]
fs2# add_client -i
Interactive mode uses no command line arguments
About the source,
/* You are not expected to understand this */
originally appeared in UNIX V6.
If a mail is deferred, the error is
Mail Queue (1 request)
--QID-- --Size-- -----Q-Time----- --------Sender/Recipient--------
AA12729 166 Thu Mar 26 15:43 borning
(Deferred: Not a typewriter)
bnfb@csr.uvic.ca
I can only imagine how the user must have felt.
About X,
If the designers of X Windows built cars, there would be no fewer
than five steering wheels hidden about the cockpit, none of which followed
the same principles—but you’d be able to shift gears with your
car stereo. Useful feature, that.
Marcus J. Ranum
Digital Equipment Corporation
I just couldn’t help laughing when I read this,
A shining example is Sun’s Open Windows File Manager, which goes out of its way to display core dump files as cute little red bomb icons. When you double-click on the bomb, it runs a text editor on the core dump. Harmless, but not very useful. But if you intuitively drag and drop the bomb on the DBX Debugger Tool, it does exactly what you’d expect if you were a terrorist: it ties the entire system up, as the core dump (including a huge unmapped gap of zeros) is pumped through the server and into the debugger text window, which inflates to the maximum capacity of swap space, then violently explodes, dumping an even bigger core file in place of your original one, filling up the file system, overwhelming the file server, and taking out the File Manager with shrapnel. (This bug has since been fixed.) ....
About C and C++,
Q. Where did the names “C” and “C++” come from?
A. They were grades.
Jerry Leichter
First Impressions of Open Solaris
Jul 10th
I have decided to nuke my Windows XP partition (I haven’t used Windows in the last two months) and install Open Solaris. Open Solaris requires a Solaris distribution to be already installed, so as per the instructions, I downloaded the Solaris Express Community Release and burnt the CDs.
Initial Blues
I booted with the first CD and was surprised at how sluggish the installation was. I think Solaris was scanning for various devices. Unfortunately, none of my three ethernet cards were configured. The installation options were quite simple. When it asked me about networking, I checked the ‘enable’ box. I thought it would just set up the hostname etc., but Solaris went into what looked like an infinite loop, probably trying to bring the interfaces up. I gave up and rebooted my machine.
Solaris doesn’t like Linux fdisk partitions
So, I skipped the network setup and got to the partition setup. Solaris was not able to recognize the partitions created by Linux and tried to take the whole hard disk. I was initially confused, so I booted into Linux and looked for information on the net. I couldn’t find any specific reasons except that Solaris fdisk is a bit weird. Solaris fdisk probably got confused by the number of partitions (13 in total) I had.
The installation
Finally, I decided to just install it on a new disk and went on with the installation. Installation was pretty smooth and I chose to install everything. I booted Solaris; X was set up with the wrong resolution (800×600), and I was greeted with CDE.
Grub problems
Solaris installed Grub on my second hard disk, and when I tried to multi-boot using my existing Linux partitions on the first disk, booting failed. Grub kept saying “No such partition” for (hd0, 0, a). After a bit of searching, I found various resources on Solaris multi-booting. However, my problem was a little weird. When I installed Solaris, I set up a partition on the second hard disk as the active partition, as Solaris was complaining about active partitions. As a result, the Solaris Grub conf had (hd0, 0) in it, which at that point actually referred to the second hard disk. When I booted using the Linux Grub conf after setting /boot as the active partition, (hd0, 0) no longer pointed to my second hard disk. I fixed /boot/grub/menu.lst in Solaris and everything is working fine.
Conclusion
Well, you can’t do much without networking. I wanted to install Open Solaris, but downloading it in Linux, mounting the partitions and so on is a pain. I have already spent (wasted) four hours doing all this. I was really unimpressed by the long wait at the start of the installation while it probed the devices. Fedora Core 3 takes 20 minutes to install everything on my 3GHz machine, and everything is configured without a hitch. I checked Sun’s hardware compatibility list and didn’t find my network cards. They have a long way to go if they want more people to start using it. I would like to do some hacking in Open Solaris, but somebody has to write a driver for my network card. Maybe I will
GPL vs CDDL – what’s the real story?
Jul 7th
There seems to be a lot of confusion and FUD surrounding Sun’s CDDL. Ulrich Drepper of Red Hat (glibc maintainer) equated Open Solaris to a trap. There are rebuttals and explanations.
There are two prevalent themes from these opposing camps. CDDL proponents say it’s not the CDDL that won’t play with the GPL, it’s the GPL that won’t play with the CDDL; GPL proponents say the CDDL is incompatible with the GPL, end of story. So, what’s wrong with the CDDL? Or, more importantly, what can we gain from it? Here are my observations:
- The CDDL, of course, is incompatible with the GPL. What that means is that you cannot rip off Open Solaris code and put it in the Linux kernel. This is due to the GPL’s restriction (the infamous viral nature of the GPL).
- On the other side, CDDL code can be combined with code under other non-GPL licenses (for example the BSD license). This is because the CDDL is file-based. So, you can have part of your code licensed under the CDDL and have other files licensed under another license. You can even keep the other files proprietary. That’s exactly what Sun wanted, as they want to keep some code (like ZFS) proprietary. Note that the modifications you make to files licensed under the CDDL have to be open sourced.
- The IP protection from CDDL is interesting. Some discussion about this on TechTarget. I am not sure how useful this is, but it adds an interesting twist though.
- But, why should we use GPL? Various reasons are discussed in detail by David Wheeler. The cautionary tale of XFree86 is a must read.
- Meanwhile, people are worried about the proliferation of OSI licenses. CA is trying for a single template license, but this has its own problems.
In my opinion, Sun got what it wanted with the CDDL. They cleverly prevented the rip-off of Solaris code, but the CDDL is not going to help in bringing the open source community to Open Solaris.