Fr. 185.00
Peter C. Bruce, Kenneth C. Lichtendahl, Nitin R. Patel, G Shmueli...
Data Mining for Business Analytics: Concepts, Techniques, and Applications in R
English · Hardcover
Usually ships in 1 to 3 weeks (temporarily unavailable)
Description
About the authors

Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and taught data mining courses since 2004 at the University of Maryland, Statistics.com, the Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 70 publications, including books.

Peter C. Bruce is President and Founder of the Institute for Statistics Education at Statistics.com. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective (Wiley) and co-author of Practical Statistics for Data Scientists: 50 Essential Concepts (O'Reilly).

Inbal Yahav, PhD, is Professor at the Graduate School of Business Administration at Bar-Ilan University, Israel. She teaches courses in social network analysis, advanced research methods, and software quality assurance. Dr. Yahav received her PhD in Operations Research and Data Mining from the University of Maryland, College Park.

Nitin R. Patel, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years.

Kenneth C. Lichtendahl, Jr., PhD, is Associate Professor at the University of Virginia. He is the Eleanor F. and Phillip G. Rust Professor of Business Administration and teaches MBA courses in decision analysis, data analysis and optimization, and managerial quantitative analysis. He also teaches executive education courses in strategic analysis and decision-making, and managing the corporate aviation function.

Blurb

Data Mining for Business Analytics: Concepts, Techniques, and Applications in R presents an applied approach to data mining concepts and methods, using R software for illustration. Readers will learn how to implement a variety of popular data mining algorithms in R (free and open-source software) to tackle business problems and opportunities.

This is the fifth version of this successful text, and the first using R. It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining and network analysis. It also includes:

* Two new co-authors, Inbal Yahav and Casey Lichtendahl, who bring both expertise teaching business analytics courses using R, and data mining consulting experience in business and government
* Updates and new material based on feedback from instructors teaching MBA, undergraduate, diploma and executive courses, and from their students
* More than a dozen case studies demonstrating applications for the data mining techniques described
* End-of-chapter exercises that help readers gauge and expand their comprehension and competency of the material presented
* A companion website (www.dataminingbook.com) with more than two dozen data sets, and instructor materials including exercise solutions, PowerPoint slides, and case solutions

Data Mining for Business Analytics: Concepts, Techniques, and Applications in R is an ideal textbook for graduate and upper-undergraduate level courses in data mining, predictive analytics, and business analytics. This new edition is also an excellent reference for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.

"This book has by far the most comprehensive review of business analytics methods that I h...
Table of contents
Contents

Foreword by Gareth James
Foreword by Ravi Bapna
Preface to the R Edition
Acknowledgments

PART I PRELIMINARIES

CHAPTER 1 Introduction
1.1 What Is Business Analytics?
1.2 What Is Data Mining?
1.3 Data Mining and Related Terms
1.4 Big Data
1.5 Data Science
1.6 Why Are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
    Order of Topics

CHAPTER 2 Overview of the Data Mining Process
2.1 Introduction
2.2 Core Ideas in Data Mining
    Classification
    Prediction
    Association Rules and Recommendation Systems
    Predictive Analytics
    Data Reduction and Dimension Reduction
    Data Exploration and Visualization
    Supervised and Unsupervised Learning
2.3 The Steps in Data Mining
2.4 Preliminary Steps
    Organization of Datasets
    Predicting Home Values in the West Roxbury Neighborhood
    Loading and Looking at the Data in R
    Sampling from a Database
    Oversampling Rare Events in Classification Tasks
    Preprocessing and Cleaning the Data
2.5 Predictive Power and Overfitting
    Overfitting
    Creation and Use of Data Partitions
2.6 Building a Predictive Model
    Modeling Process
2.7 Using R for Data Mining on a Local Machine
2.8 Automating Data Mining Solutions
    Data Mining Software: The State of the Market (by Herb Edelstein)
Problems

PART II DATA EXPLORATION AND DIMENSION REDUCTION

CHAPTER 3 Data Visualization
3.1 Uses of Data Visualization
    Base R or ggplot?
3.2 Data Examples
    Example 1: Boston Housing Data
    Example 2: Ridership on Amtrak Trains
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
    Distribution Plots: Boxplots and Histograms
    Heatmaps: Visualizing Correlations and Missing Values
3.4 Multidimensional Visualization
    Adding Variables: Color, Size, Shape, Multiple Panels, and Animation
    Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
    Reference: Trend Lines and Labels
    Scaling up to Large Datasets
    Multivariate Plot: Parallel Coordinates Plot
    Interactive Visualization
3.5 Specialized Visualizations
    Visualizing Networked Data
    Visualizing Hierarchical Data: Treemaps
    Visualizing Geographical Data: Map Charts
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
    Prediction
    Classification
    Time Series Forecasting
    Unsupervised Learning
Problems

CHAPTER 4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
4.3 Practical Considerations
    Example 1: House Prices in Boston
4.4 Data Summaries
    Summary Statistics
    Aggregation and Pivot Tables
4.5 Correlation Analysis
4.6 Reducing the Number of Categories in Categorical Variables
4.7 Converting a Categorical Variable to a Numerical Variable
4.8 Principal Components Analysis
    Example 2: Breakfast Cereals
    Principal Components
    Normalizing the Data
    Using Principal Components for Classification and Prediction
4.9 Dimension Reduction Using Regression Models
4.10 Dimension Reduction Using Classification and Regression Trees
Problems

PART III PERFORMANCE EVALUATION

CHAPTER 5 Evaluating Predictive Performance
5.1 Introduction
5.2 Evaluating Predictive Performance
    Naive Benchmark: The Average
    Prediction Accuracy Measures
    Comparing Training and Validation Performance
    Lift Chart
5.3 Judging Classifier Performance
    Benchmark: The Naive Rule
    Class Separation
    The Confusion (Classification) Matrix
    Using the Validation Data
    Accuracy Measures
    Propensities and Cutoff for Classification
    Performance in Case of Unequal Importance of Classes
    Asymmetric Misclassification Costs
    Generalization to More Than Two Classes
5.4 Judging Ranking Performance
    Lift Charts for Binary Data
    Decile Lift Charts
    Beyond Two Classes
    Lift Charts Incorporating Costs and Benefits
    Lift as a Function of Cutoff
5.5 Oversampling
    Oversampling the Training Set
    Evaluating Model Performance Using a Non-oversampled Validation Set
    Evaluating Model Performance if Only Oversampled Validation Set Exists
Problems

PART IV PREDICTION AND CLASSIFICATION METHODS

CHAPTER 6 Multiple Linear Regression
6.1 Introduction
6.2 Explanatory vs. Predictive Modeling
6.3 Estimating the Regression Equation and Prediction
    Example: Predicting the Price of Used Toyota Corolla Cars
6.4 Variable Selection in Linear Regression
    Reducing the Number of Predictors
    How to Reduce the Number of Predictors
Problems

CHAPTER 7 k-Nearest Neighbors (kNN)
7.1 The k-NN Classifier (Categorical Outcome)
    Determining Neighbors
    Classification Rule
    Example: Riding Mowers
    Choosing k
    Setting the Cutoff Value
    k-NN with More Than Two Classes
    Converting Categorical Variables to Binary Dummies
7.2 k-NN for a Numerical Outcome
7.3 Advantages and Shortcomings of k-NN Algorithms
Problems

CHAPTER 8 The Naive Bayes Classifier
8.1 Introduction
    Cutoff Probability Method
    Conditional Probability
    Example 1: Predicting Fraudulent Financial Reporting
8.2 Applying the Full (Exact) Bayesian Classifier
    Using the "Assign to the Most Probable Class" Method
    Using the Cutoff Probability Method
    Practical Difficulty with the Complete (Exact) Bayes Procedure
    Solution: Naive Bayes
    The Naive Bayes Assumption of Conditional Independence
    Using the Cutoff Probability Method
    Example 2: Predicting Fraudulent Financial Reports, Two Predictors
    Example 3: Predicting Delayed Flights
8.3 Advantages and Shortcomings of the Naive Bayes Classifier
Problems

CHAPTER 9 Classification and Regression Trees
9.1 Introduction
9.2 Classification Trees
    Recursive Partitioning
    Example 1: Riding Mowers
    Measures of Impurity
    Tree Structure
    Classifying a New Record
9.3 Evaluating the Performance of a Classification Tree
    Example 2: Acceptance of Personal Loan
9.4 Avoiding Overfitting
    Stopping Tree Growth: Conditional Inference Trees
    Pruning the Tree
    Cross-Validation
    Best-Pruned Tree
9.5 Classification Rules from Trees
9.6 Classification Trees for More Than Two Classes
9.7 Regression Trees
    Prediction
    Measuring Impurity
    Evaluating Performance
9.8 Improving Prediction: Random Forests and Boosted Trees
    Random Forests
    Boosted Trees
9.9 Advantages and Weaknesses of a Tree
Problems

CHAPTER 10 Logistic Regression
10.1 Introduction
10.2 The Logistic Regression Model
10.3 Example: Acceptance of Personal Loan
    Model with a Single Predictor
    Estimating the Logistic Model from Data: Computing Parameter Estimates
    Interpreting Results in Terms of Odds (for a Profiling Goal)
10.4 Evaluating Classification Performance
    Variable Selection
10.5 Example of Complete Analysis: Predicting Delayed Flights
    Data Preprocessing
    Model-Fitting and Estimation
    Model Interpretation
    Model Performance
    Variable Selection
10.6 Appendix: Logistic Regression for Profiling
    Appendix A: Why Linear Regression Is Problematic for a Categorical Outcome
    Appendix B: Evaluating Explanatory Power
    Appendix C: Logistic Regression for More Than Two Classes
Problems

CHAPTER 11 Neural Nets
11.1 Introduction
11.2 Concept and Structure of a Neural Network
11.3 Fitting a Network to Data
    Example 1: Tiny Dataset
    Computing Output of Nodes
    Preprocessing the Data
    Training the Model
    Example 2: Classifying Accident Severity
    Avoiding Overfitting
    Using the Output for Prediction and Classification
11.4 Required User Input
11.5 Exploring the Relationship Between Predictors and Outcome
11.6 Advantages and Weaknesses of Neural Networks
Problems

CHAPTER 12 Discriminant Analysis
12.1 Introduction
    Example 1: Riding Mowers
    Example 2: Personal Loan Acceptance
12.2 Distance of a Record from a Class
12.3 Fisher's Linear Classification Functions
12.4 Classification Performance of Discriminant Analysis
12.5 Prior Probabilities
12.6 Unequal Misclassification Costs
12.7 Classifying More Than Two Classes
    Example 3: Medical Dispatch to Accident Scenes
12.8 Advantages and Weaknesses
Problems

CHAPTER 13 Combining Methods: Ensembles and Uplift Modeling
13.1 Ensembles
    Why Ensembles Can Improve Predictive Power
    Simple Averaging
    Bagging
    Boosting
    Bagging and Boosting in R
    Advantages and Weaknesses of Ensembles
13.2 Uplift (Persuasion) Modeling
    A-B Testing
    Uplift
    Gathering the Data
    A Simple Model
    Modeling Individual Uplift
    Computing Uplift with R
    Using the Results of an Uplift Model
13.3 Summary
Problems

PART V MINING RELATIONSHIPS AMONG RECORDS

CHAPTER 14 Association Rules and Collaborative Filtering
14.1 Association Rules
    Discovering Association Rules in Transaction Databases
    Example 1: Synthetic Data on Purchases of Phone Faceplates
    Generating Candidate Rules
    The Apriori Algorithm
    Selecting Strong Rules
    Data Format
    The Process of Rule Selection
    Interpreting the Results
    Rules and Chance
    Example 2: Rules for Similar Book Purchases
14.2 Collaborative Filtering
    Data Type and Format
    Example 3: Netflix Prize Contest
    User-Based Collaborative Filtering: "People Like You"
    Item-Based Collaborative Filtering
    Advantages and Weaknesses of Collaborative Filtering
    Collaborative Filtering vs. Association Rules
14.3 Summary
Problems

CHAPTER 15 Cluster Analysis
15.1 Introduction
    Example: Public Utilities
15.2 Measuring Distance Between Two Records
    Euclidean Distance
    Normalizing Numerical Measurements
    Other Distance Measures for Numerical Data
    Distance Measures for Categorical Data
    Distance Measures for Mixed Data
15.3 Measuring Distance Between Two Clusters
    Minimum Distance
    Maximum Distance
    Average Distance
    Centroid Distance
15.4 Hierarchical (Agglomerative) Clustering
    Single Linkage
    Complete Linkage
    Average Linkage
    Centroid Linkage
    Ward's Method
    Dendrograms: Displaying Clustering Process and Results
    Validating Clusters
    Limitations of Hierarchical Clustering
15.5 Non-Hierarchical Clustering: The k-Means Algorithm
    Choosing the Number of Clusters (k)
Problems

PART VI FORECASTING TIME SERIES

CHAPTER 16 Handling Time Series
16.1 Introduction
16.2 Descriptive vs. Predictive Modeling
16.3 Popular Forecasting Methods in Business
    Combining Methods
16.4 Time Series Components
    Example: Ridership on Amtrak Trains
16.5 Data-Partitioning and Performance Evaluation
    Benchmark Performance: Naive Forecasts
    Generating Future Forecasts
Problems

CHAPTER 17 Regression-Based Forecasting
17.1 A Model with Trend
    Linear Trend
    Exponential Trend
    Polynomial Trend
17.2 A Model with Seasonality
17.3 A Model with Trend and Seasonality
17.4 Autocorrelation and ARIMA Models
    Computing Autocorrelation
    Improving Forecasts by Integrating Autocorrelation Information
    Evaluating Predictability
Problems

CHAPTER 18 Smoothing Methods
18.1 Introduction
18.2 Moving Average
    Centered Moving Average for Visualization
    Trailing Moving Average for Forecasting
    Choosing Window Width (w)
18.3 Simple Exponential Smoothing
    Choosing Smoothing Parameter
    Relation Between Moving Average and Simple Exponential Smoothing
18.4 Advanced Exponential Smoothing
    Series with a Trend
    Series with a Trend and Seasonality
    Series with Seasonality (No Trend)
Problems

PART VII DATA ANALYTICS

CHAPTER 19 Social Network Analytics
19.1 Introduction
19.2 Directed vs. Undirected Networks
19.3 Visualizing and Analyzing Networks
    Graph Layout
    Edge List
    Adjacency Matrix
    Using Network Data in Classification and Prediction
19.4 Social Data Metrics and Taxonomy
    Node-Level Centrality Metrics
    Egocentric Network
    Network Metrics
19.5 Using Network Metrics in Prediction and Classification
    Link Prediction
    Entity Resolution
    Collaborative Filtering
19.6 Collecting Social Network Data with R
19.7 Advantages and Disadvantages
Problems

CHAPTER 20 Text Mining
20.1 Introduction
20.2 The Tabular Representation of Text: Term-Document Matrix and "Bag-of-Words"
20.3 Bag-of-Words vs. Meaning Extraction at Document Level
20.4 Preprocessing the Text
    Tokenization
    Text Reduction
    Presence/Absence vs. Frequency
    Term Frequency-Inverse Document Frequency (TF-IDF)
    From Terms to Concepts: Latent Semantic Indexing
    Extracting Meaning
20.5 Implementing Data Mining Methods
20.6 Example: Online Discussions on Autos and Electronics
    Importing and Labeling the Records
    Text Preprocessing in R
    Producing a Concept Matrix
    Fitting a Predictive Model
    Prediction
20.7 Summary
Problems

PART VIII CASES

CHAPTER 21 Cases
21.1 Charles Book Club
    The Book Industry
    Database Marketing at Charles
    Data Mining Techniques
    Assignment
21.2 German Credit
    Background
    Data
    Assignment
21.3 Tayko Software Cataloger
    Background
    The Mailing Experiment
    Data
    Assignment
21.4 Political Persuasion
    Background
    Predictive Analytics Arrives in US Politics
    Political Targeting
    Uplift
    Data
    Assignment
21.5 Taxi Cancellations
    Business Situation
    Assignment
21.6 Segmenting Consumers of Bath Soap
    Business Situation
    Key Problems
    Data
    Measuring Brand Loyalty
    Assignment
21.7 Direct-Mail Fundraising
    Background
    Data
    Assignment
21.8 Catalog Cross-Selling
    Background
    Assignment
21.9 Predicting Bankruptcy
    Predicting Corporate Bankruptcy
    Assignment
21.10 Time Series Case: Forecasting Public Transportation Demand
    Background
    Problem Description
    Available Data
    Assignment Goal
    Assignment
    Tips and Suggested Steps

References
Data Files Used in the Book
Index
Product details
Authors | Peter C. Bruce, Kenneth C. Lichtendahl, Nitin R. Patel, Galit Shmueli, Luis Torgo, Inbal Yahav |
Publisher | Wiley, John and Sons Ltd |
Language | English |
Format | Hardcover |
Published | 07.11.2017 |
EAN | 9781118879368 |
ISBN | 978-1-118-87936-8 |
Pages | 576 |
Subjects |
Natural sciences, medicine, computer science, engineering
> Mathematics
> Probability theory, stochastics, mathematical statistics
Social sciences, law, business > Business > Economics
Statistics, Computer science, Theory, Data Mining, Business & management, Data Mining Statistics, Database & Data Warehousing Technologies, Decision-making theory, Decision Sciences |
Customer reviews
No reviews have been written for this item yet.