Fr. 146.00
Chantal D. Larose, Daniel T. Larose
Data Science Using Python and R
English · Hardback
Shipping usually within 1 to 3 weeks (not available at short notice)
Description
Learn data science by doing data science!
Data Science Using Python and R will get you plugged into the world's two most widespread open-source platforms for data science: Python and R.
Data science is hot. Bloomberg called data scientist "the hottest job in America." Python and R are the top two open-source data science tools in the world. In Data Science Using Python and R, you will learn step-by-step how to produce hands-on solutions to real-world business problems, using state-of-the-art techniques.
Data Science Using Python and R is written for the general reader with no previous analytics or programming experience. An entire chapter is dedicated to learning the basics of Python and R. Then, each chapter presents step-by-step instructions and walkthroughs for solving data science problems using Python and R.
Those with analytics experience will appreciate having a one-stop shop for learning how to do data science using Python and R. Topics covered include data preparation, exploratory data analysis, preparing to model the data, decision trees, model evaluation, misclassification costs, naïve Bayes classification, neural networks, clustering, regression modeling, dimension reduction, and association rules mining.
Further, exciting new topics such as random forests and general linear models are also included. The book emphasizes data-driven error costs to enhance profitability, which avoids the common pitfalls that may cost a company millions of dollars.
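To make the idea of data-driven error costs concrete, here is a minimal Python sketch. It is not taken from the book: the confusion-matrix counts, the per-error costs, and the helper names `accuracy` and `total_cost` are hypothetical, chosen only for illustration. It shows how two classifiers with identical accuracy can differ sharply in total cost when one kind of error is more expensive than the other.

```python
# Illustrative sketch only: all counts and costs below are made-up assumptions,
# not data or code from the book.

def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Share of records classified correctly."""
    correct = true_pos + true_neg
    return correct / (correct + false_pos + false_neg)


def total_cost(false_pos, false_neg, cost_fp, cost_fn):
    """Total cost of a model's errors, given a cost per error type."""
    return false_pos * cost_fp + false_neg * cost_fn


# Hypothetical costs: a false negative is ten times as costly as a false positive.
COST_FP = 10
COST_FN = 100

# Two hypothetical models evaluated on the same 1000 records.
model_a = {"true_pos": 80, "true_neg": 880, "false_pos": 30, "false_neg": 10}
model_b = {"true_pos": 60, "true_neg": 900, "false_pos": 10, "false_neg": 30}

acc_a = accuracy(**model_a)
acc_b = accuracy(**model_b)
cost_a = total_cost(model_a["false_pos"], model_a["false_neg"], COST_FP, COST_FN)
cost_b = total_cost(model_b["false_pos"], model_b["false_neg"], COST_FP, COST_FN)

print(f"Model A: accuracy={acc_a:.2f}, total error cost={cost_a}")
print(f"Model B: accuracy={acc_b:.2f}, total error cost={cost_b}")
```

Under these assumed numbers both models are equally accurate, yet model A is far cheaper because it makes fewer of the expensive errors. Accuracy alone would not reveal that difference.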
Data Science Using Python and R provides exercises at the end of every chapter, totaling over 500 exercises in the book. Readers will therefore have plenty of opportunity to test their newfound data science skills and expertise. In the Hands-on Analysis exercises, readers are challenged to solve interesting business problems using real-world data sets.
List of contents
Preface
About the Authors
Acknowledgements
Chapter 1 Introduction to Data Science
1.1 Why Data Science?
1.2 What is Data Science?
1.3 The Data Science Methodology
1.4 Data Science Tasks
1.4.1 Description
1.4.2 Estimation
1.4.3 Classification
1.4.4 Clustering
1.4.5 Prediction
1.4.6 Association
Exercises
Chapter 2 The Basics of Python and R
2.1 Downloading Python
2.2 Basics of Coding in Python
2.2.1 Using Comments in Python
2.2.2 Executing Commands in Python
2.2.3 Importing Packages in Python
2.2.4 Getting Data into Python
2.2.5 Saving Output in Python
2.2.6 Accessing Records and Variables in Python
2.2.7 Setting Up Graphics in Python
2.3 Downloading R and RStudio
2.4 Basics of Coding in R
2.4.1 Using Comments in R
2.4.2 Executing Commands in R
2.4.3 Importing Packages in R
2.4.4 Getting Data into R
2.4.5 Saving Output in R
2.4.6 Accessing Records and Variables in R
References
Exercises
Chapter 3 Data Preparation
3.1 The Bank Marketing Data Set
3.2 The Problem Understanding Phase
3.2.1 Clearly Enunciate the Project Objectives
3.2.2 Translate These Objectives into a Data Science Problem
3.3 Data Preparation Phase
3.4 Adding an Index Field
3.4.1 How to Add an Index Field Using Python
3.4.2 How to Add an Index Field Using R
3.5 Changing Misleading Field Values
3.5.1 How to Change Misleading Field Values Using Python
3.5.2 How to Change Misleading Field Values Using R
3.6 Reexpression of Categorical Data as Numeric
3.6.1 How to Reexpress Categorical Field Values Using Python
3.6.2 How to Reexpress Categorical Field Values Using R
3.7 Standardizing the Numeric Fields
3.7.1 How to Standardize Numeric Fields Using Python
3.7.2 How to Standardize Numeric Fields Using R
3.8 Identifying Outliers
3.8.1 How to Identify Outliers Using Python
3.8.2 How to Identify Outliers Using R
References
Exercises
Chapter 4 Exploratory Data Analysis
4.1 EDA Versus HT
4.2 Bar Graphs with Response Overlay
4.2.1 How to Construct a Bar Graph with Overlay Using Python
4.2.2 How to Construct a Bar Graph with Overlay Using R
4.3 Contingency Tables
4.3.1 How to Construct Contingency Tables Using Python
4.3.2 How to Construct Contingency Tables Using R
4.4 Histograms with Response Overlay
4.4.1 How to Construct Histograms with Overlay Using Python
4.4.2 How to Construct Histograms with Overlay Using R
4.5 Binning Based on Predictive Value
4.5.1 How to Perform Binning Based on Predictive Value Using Python
4.5.2 How to Perform Binning Based on Predictive Value Using R
References
Exercises
Chapter 5 Preparing to Model the Data
5.1 The Story So Far
5.2 Partitioning the Data
5.2.1 How to Partition the Data in Python
5.2.2 How to Partition the Data in R
5.3 Validating your Partition
5.4 Balancing the Training Data Set
About the authors
CHANTAL D. LAROSE, PHD, is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics and helped develop data science programs at ECSU and SUNY New Paltz. Her PhD dissertation, Model-Based Clustering of Incomplete Data, tackles the persistent problem of trying to do data science with incomplete data.
DANIEL T. LAROSE, PHD, is a Professor of Data Science and Statistics and Director of the Data Science programs at Central Connecticut State University. He has published many books on data science, data mining, predictive analytics, and statistics. His consulting clients include The Economist magazine, Forbes Magazine, the CIT Group, and Microsoft.
Product details
Authors | Chantal D. Larose, Daniel T. Larose |
Publisher | John Wiley and Sons Ltd |
Languages | English |
Product format | Hardback |
Released | 30.04.2019 |
EAN | 9781119526810 |
ISBN | 978-1-119-52681-0 |
No. of pages | 256 |
Series | Wiley Series on Methods and Applications in Data Mining |
Subjects |
Guides
Natural sciences, medicine, IT, technology > IT, data processing > Application software
Statistics, computer science, data analysis, Python (programming language), R (program), End-User Computing, computer guides, Data Mining & Knowledge Discovery, Database software (Non-Microsoft) |