Services: Data Mining Project Assessment, Data Preparation For Data Mining, Data Mining Model Development, Data Mining Model Deployment, Data Mining Course: Overview for Project Managers, Data Mining Course: Overview for Practitioners, Customized Data Mining Engagements
Insight 1: Find Correlated Variables Prior to Modeling. Topic: Data Understanding and Data Preparation. Sub-Topic: Feature Selection.
Insight 2: Beware of Outliers in Computing Correlations. Topic: Data Preparation. Sub-Topic: Outliers.
Insight 3: Create Three Sampled Data Sets, not Two. Topic: Modeling. Sub-Topic: Sampling.
Insight 4: Use Priors to Balance Class Counts. Topic: Modeling. Sub-Topic: Decision Trees.
Insight 5: Beware of Automatic Handling of Categorical Variables. Topic: Data Understanding and Data Preparation. Sub-Topic: Feature Selection and Creation.
Insight 6: Gain Insights by Building Models from Several Algorithms. Topic: Modeling. Sub-Topic: Algorithm Selection.
Insight 7: Beware of Being Fooled with Model Performance. Topic: Data Evaluation. Sub-Topic: Model Performance.
Upcoming Data Mining Seminars
A Practical Introduction to Data Mining
Upcoming courses (nationwide)
Data Mining Level II: A drill-down of the data mining process, techniques, and applications
Data Mining Level III: A hands-on day of data mining using real data and real data mining software
Anytime Courses
Overview for Project Managers: Train project managers on the data mining process.
Overview for Practitioners: Train practitioners (data analysts, project managers, managers) on the data mining process.
Mr. Abbott is a seasoned instructor who has taught a wide range of data mining tutorials and seminars for a decade to audiences of up to 400, including at DAMA, KDD, AAAI, and IEEE conferences. He is the instructor of well-regarded data mining courses and explains concepts in language readily understood by a wide range of audiences, including analytics novices, data analysts, statisticians, and business professionals. Mr. Abbott has also taught applied data mining courses for major software vendors, covering Clementine (SPSS), Affinium Model (Unica Corporation), and Model 1 (Group1 Software), as well as hands-on courses using S-Plus and Insightful Miner (Insightful Corporation) and CART (Salford Systems).
By External Sources
The Cartoon Guide to Statistics
by Larry Gonick, Woollcott Smith (Contributor)
Paperback - 240 pages, February 25, 1994.
I like cartoons, and this book provides a good, easy, and visual way to learn statistics. For those who are new to statistics, this is a good read to understand the basic ideas.
How to Lie With Statistics
by Darrell Huff, Irving Geis (Illustrator)
Paperback reissued November, 1993.
A classic from the 50s, and still relevant today. You'll never look at newspaper graphics the same way again after learning some of these deceptive practices!
Data Preparation for Data Mining
by Dorian Pyle
Paperback - 540 pages, March 15, 1999.
Excellent resource for the part of data mining that takes the most time. If I were to buy one data mining book, this would be it. Best book on the market for data preparation.
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
by Michael J. A. Berry, Gordon S. Linoff
Paperback - 672 pages, April 5, 2004.
Good overview of data mining from the CRM perspective. Better for "big picture" people than for technical analysts.
Neural Networks for Pattern Recognition
by Christopher M. Bishop
Paperback. November, 1995.
Excellent book for neural network algorithms, including some lesser-known varieties. Described as "Best of the best" by Warren Sarle (Neural Network FAQ).
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
by Eibe Frank, Ian H. Witten, Jim Gray
Paperback - 416 pages. October 13, 1999.
Best book I've found in between highly technical and introductory books. Good coverage of topics, especially trees and rules, but no neural networks.
Pattern Recognition and Neural Networks
by Brian D. Ripley, N. L. Hjort (Contributor)
Hardcover. October, 1995.
Ripley is a statistician who has embraced data mining. This book is not just about neural networks, but covers all the major data mining algorithms in a very technical and complete manner.
Sarle calls this the best advanced book on Neural Networks, and I almost agree (see Hastie, Tibshirani, and Friedman).
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
by Trevor Hastie, Rob Tibshirani, Jerome Friedman
Hardcover. 2001.
Written by three giants of the data mining community. I have read most of the book and can't think of a significant conclusion on which I disagree with them. Very technical, but very complete. It covers topics not usually covered in other books, such as kernel methods, support vector machines, principal curves, and many more. Has become my favorite technical DM book.
The book has 200 color figures and charts. It is the first data mining book I've seen that makes use of color, and it does it right.
Health Club Survey Analysis, Part I: Successful application of data mining by Abbott Analytics
TDWI Data Science Bootcamp Seminar (Austin, TX / Virtual Classroom): September 20 - 22, 2021
PAW for Business (Virtual Classroom): May 20 - 25, 2021
Vafaie, H., D.W. Abbott, M. Hutchins, and I.P. Matkovsky, Combining Multiple Models Across Algorithms and Samples for Improved Results (PDF), The Twelfth International Conference on Tools with Artificial Intelligence, Vancouver, British Columbia, November 13-15, 2000.