Apache Mahout Cookbook Review

I originally posted this on Goodreads, but I think it’s worth cross-posting here :).

I recently finished reading Apache Mahout Cookbook, so I can say a thing or two about it now.

The first thing to notice is that the introduction covers in detail how to install and run Mahout. It also gives a fairly shallow overview of Hadoop, which, in my opinion, the reader should already know. Without that background, some of the file manipulation and other steps might look like magic. Overall, it gives you enough to get started with your very first Mahout run.

Chapter 3 covers how to import data into Hadoop and Mahout from external sources. Again, nothing really special: it’s the kind of stuff you could find with a few minutes of googling, but it is very convenient for a reader who might not know what to google for :).

Chapters 1 to 3 focus on setup and infrastructure in quite a lot of detail, so most first-time Linux/Hadoop/Mahout users will probably find them VERY useful. More experienced Hadoop or Linux users might want to skip them right away. The rest of the book covers the actual machine learning stuff in Mahout.

As for the machine learning algorithms presented in the book, I doubt there is anything special to say about them. They are standard approaches widely used in industry. What makes their presentation in the book nice is that both the Java-based and the command-line ways of running them are always shown. That’s very convenient. I also really liked the references to papers for some of the approaches, which are very useful for more advanced users who might want to read about a method in depth.

What I didn’t like in the book were the overly detailed examples, like the walkthrough of a wizard for class creation. Even more frustrating, this was shown in screenshots multiple times, which I find just a waste of space. It would have been better to describe it once in an appendix. I would expect even Mahout beginners to be able to create a class by themselves without any instructions.

Some of the code isn’t presented nicely, which makes it hard and annoying to read.

To conclude, I found this book useful and helpful. It might not reveal the details behind the machine learning approaches it uses, but that isn’t really the goal of the book (to get that, you would actually need to buy a few 500-page books). I found the recipes themselves clear and useful, though sometimes more practical examples could have been given.

The book is good for an aspiring Mahout user. There is some irritating stuff along the way, but the explanations are good and the examples are ready to use, hence “cookbook”. It is not a book that covers machine learning in depth.