Data Mining: Practical Machine Learning Tools and Techniques, Second Edition Review by E. Kohlwey

Excellent Beginning Text for Software Engineers

I chose this book after looking at a number of options, and I was not disappointed. The text is clearly written for readers with a bachelor's-level education in computer science. The author prefers pseudocode and plain-text explanations of algorithms to equations, and when he does use equations, they are written in clear, commonly understood notation rather than the terse Greek alphabet soup preferred by many of the more mathematically oriented authors.

It should be pointed out that about 10% of the text of this book serves simply as a user manual for an open source MLA package called Weka. When I first realized this I almost flipped; I really didn't want a book devoted to giving a surface understanding of one particular implementation of a set of algorithms. After reading it through, I can tell you it is not that kind of book. All the algorithms are explained well enough that you could implement them yourself and work out simple examples on paper.

I should also note that Weka, like many of the algorithms in this book, doesn't parallelize well (or at least not obviously). This is an excellent starting point for getting your feet wet and doing some exploratory analysis, but if you're past that stage and want to learn about crunching big (TB+) data, you should look elsewhere.

One area that the text does not cover (and for many software engineers this is not a fault) is the mathematics behind some of the algorithms the author presents. For instance, in his description of linear regression using SGD, he glosses over the math of actually calculating the gradient by saying "there's a matrix inversion involved and it's available in prepackaged software." I'm not saying this is bad: if you're a good software engineer, the first thing you'll do is look for an existing implementation that you can alter to fit your needs, so he's right. It just may not be what mathematicians or more theory-oriented computer scientists expect.
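
For readers curious about the kind of math being skipped, here is a minimal sketch. It is not taken from the book or from Weka, and the function names and toy data are made up for illustration. It fits a one-variable least-squares line in two ways: by solving the normal equations directly, which is where a matrix inversion would appear in the general case, and by plain batch gradient descent, which needs only the gradient. A stochastic variant (SGD) would update from one example at a time instead of the full sum.

```python
# Illustrative sketch only; not code from the book or from Weka.
# Fits y = w*x + b by least squares in two different ways.

def fit_closed_form(xs, ys):
    """Solve the 2x2 normal equations directly (the 'matrix inversion' route)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of the 2x2 matrix X^T X
    if det == 0:
        raise ValueError("x values are all identical; no unique fit")
    w = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return w, b

def fit_gradient_descent(xs, ys, lr=0.01, steps=20000):
    """Minimize mean squared error by batch gradient descent (no inversion)."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_cf, b_cf = fit_closed_form(xs, ys)
w_gd, b_gd = fit_gradient_descent(xs, ys)
print(w_cf, b_cf)
print(w_gd, b_gd)
```

On this toy data both routes should agree on a slope of about 2 and an intercept of about 1. The learning rate and step count are assumptions chosen to converge on this particular data set, and in practice a prepackaged solver is the sensible choice, as the review says.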