Statistical data cleaning with applications in R / (Record no. 208625)
| 000 -LEADER | |
|---|---|
| fixed length control field | 10787cam a2200373 i 4500 |
| 020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
| International Standard Book Number | 9781118897140 |
| 020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
| International Standard Book Number | 1118897145 |
| 020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
| International Standard Book Number | 9781118897126 |
| 020 ## - INTERNATIONAL STANDARD BOOK NUMBER | |
| International Standard Book Number | 1118897129 |
| 040 ## - CATALOGING SOURCE | |
| Transcribing agency | CUS |
| 100 1# - MAIN ENTRY--PERSONAL NAME | |
| Personal name | Jonge, Edwin de, |
| 245 10 - TITLE STATEMENT | |
| Title | Statistical data cleaning with applications in R / |
| Statement of responsibility, etc. | Edwin de Jonge, Mark van der Loo. |
| 260 #1 - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT) | |
| Place of publication, distribution, etc. | [Hoboken, NJ] : |
| Name of publisher, distributor, etc. | John Wiley & Sons Ltd, |
| Date of publication, distribution, etc. | [2018] |
| 300 ## - DESCRIPTION | |
| Extent | 1 online resource |
| 505 0# - FORMATTED CONTENTS NOTE | |
| Formatted contents note | Cover -- Title Page -- Copyright -- Contents -- Foreword -- About the Companion Website -- Chapter 1 Data Cleaning -- 1.1 The Statistical Value Chain -- 1.1.1 Raw Data -- 1.1.2 Input Data -- 1.1.3 Valid Data -- 1.1.4 Statistics -- 1.1.5 Output -- 1.2 Notation and Conventions Used in this Book -- Chapter 2 A Brief Introduction to R -- 2.1 R on the Command Line -- 2.1.1 Getting Help and Learning R -- 2.2 Vectors -- 2.2.1 Computing with Vectors -- 2.2.2 Arrays and Matrices -- 2.3 Data Frames -- 2.3.1 The Formula-Data Interface -- 2.3.2 Selecting Rows and Columns -- Boolean Operators -- 2.3.3 Selection with Indices -- 2.3.4 Data Frame Manipulation: The dplyr Package -- 2.4 Special Values -- 2.4.1 Missing Values -- 2.5 Getting Data into and out of R -- 2.5.1 File Paths in R -- 2.5.2 Formats Provided by Packages -- 2.5.3 Reading Data from a Database -- 2.5.4 Working with Data External to R -- 2.6 Functions -- 2.6.1 Using Functions -- 2.6.2 Writing Functions -- 2.7 Packages Used in this Book -- Chapter 3 Technical Representation of Data -- 3.1 Numeric Data -- 3.1.1 Integers -- 3.1.2 Integers in R -- 3.1.3 Real Numbers -- 3.1.4 Double Precision Numbers -- 3.1.5 The Concept of Machine Precision -- 3.1.6 Consequences of Working with Floating Point Numbers -- 3.1.7 Dealing with the Consequences -- 3.1.8 Numeric Data in R -- 3.2 Text Data -- 3.2.1 Terminology and Encodings -- 3.2.2 Unicode -- 3.2.3 Some Popular Encodings -- 3.2.4 Textual Data in R: Objects of Class Character -- 3.2.5 Encoding in R -- 3.2.6 Reading and Writing of Data with Non-Local Encoding -- 3.2.7 Detecting Encoding -- 3.2.8 Collation and Sorting -- 3.3 Times and Dates -- 3.3.1 AIT, UTC, and POSIX Seconds Since the Epoch -- 3.3.2 Time and Date Notation -- 3.3.3 Time and Date Storage in R -- 3.3.4 Time and Date Conversion in R -- 3.3.5 Leap Days, Time Zones, and Daylight Saving Times. |
| 505 8# - FORMATTED CONTENTS NOTE | |
| Formatted contents note | 3.4 Notes on Locale Settings -- Chapter 4 Data Structure -- 4.1 Introduction -- 4.2 Tabular Data -- 4.2.1 data.frame -- 4.2.2 Databases -- 4.2.3 dplyr -- 4.3 Matrix Data -- 4.4 Time Series -- 4.5 Graph Data -- 4.6 Web Data -- 4.6.1 Web Scraping -- 4.6.2 Web API -- 4.7 Other Data -- 4.8 Tidying Tabular Data -- 4.8.1 Variable Per Column -- 4.8.2 Single Observation Stored in Multiple Tables -- Chapter 5 Cleaning Text Data -- 5.1 Character Normalization -- 5.1.1 Encoding Conversion and Unicode Normalization -- 5.1.2 Character Conversion and Transliteration -- 5.2 Pattern Matching with Regular Expressions -- 5.2.1 Basic Regular Expressions -- 5.2.2 Practical Regular Expressions -- 5.2.3 Generating Regular Expressions in R -- 5.3 Common String Processing Tasks in R -- 5.4 Approximate Text Matching -- 5.4.1 String Metrics -- 5.4.2 String Metrics and Approximate Text Matching in R -- Chapter 6 Data Validation -- 6.1 Introduction -- 6.2 A First Look at the validate Package -- 6.2.1 Quick Checks with check_that -- 6.2.2 The Basic Workflow: validator and confront -- 6.2.3 A Little Background on validate and DSLs -- 6.3 Defining Data Validation -- 6.3.1 Formal Definition of Data Validation -- 6.3.2 Operations on Validation Functions -- 6.3.3 Validation and Missing Values -- 6.3.4 Structure of Validation Functions -- 6.3.5 Demarcating Validation Rules in validate -- 6.4 A Formal Typology of Data Validation Functions -- 6.4.1 A Closer Look at Measurement -- 6.4.2 Classification of Validation Rules -- 6.5 Validating Data with the validate Package -- 6.5.1 Validation Rules in the Console and the validator Object -- 6.5.2 Validating in the Pipeline -- 6.5.3 Raising Errors or Warnings -- 6.5.4 Tolerance for Testing Linear Equalities -- 6.5.5 Setting and Resetting Options -- 6.5.6 Importing and Exporting Validation Rules from and to File. |
| 505 8# - FORMATTED CONTENTS NOTE | |
| Formatted contents note | 6.5.7 Checking Variable Types and Metadata -- 6.5.8 Checking Value Ranges and Code Lists -- 6.5.9 Checking In-Record Consistency Rules -- 6.5.10 Checking Cross-Record Validation Rules -- 6.5.11 Checking Functional Dependencies -- 6.5.12 Cross-Dataset Validation -- 6.5.13 Macros, Variable Groups, Keys -- 6.5.14 Analyzing Output: validation Objects -- 6.5.15 Output Dimensionality and Output Selection -- 6.5.15 Exercises for Section -- Chapter 7 Localizing Errors in Data Records -- 7.1 Error Localization -- 7.2 Error Localization with R -- 7.2.1 The Errorlocate Package -- 7.3 Error Localization as MIP-Problem -- 7.3.1 Error Localization and Mixed-Integer Programming -- 7.3.2 Linear Restrictions -- 7.3.3 Categorical Restrictions -- 7.3.4 Mixed-Type Restrictions -- 7.4 Numerical Stability Issues -- 7.4.1 A Short Overview of MIP Solving -- 7.4.2 Scaling Numerical Records -- 7.4.3 Setting Numerical Threshold Values -- 7.5 Practical Issues -- 7.5.1 Setting Reliability Weights -- 7.5.2 Simplifying Conditional Validation Rules -- 7.6 Conclusion -- Chapter 8 Rule Set Maintenance and Simplification -- 8.1 Quality of Validation Rules -- 8.1.1 Completeness -- 8.1.2 Superfluous Rules and Infeasibility -- 8.2 Rules in the Language of Logic -- 8.2.1 Using Logic to Rewrite Rules -- 8.3 Rule Set Issues -- 8.3.1 Infeasible Rule Set -- 8.3.2 Fixed Value -- 8.3.3 Redundant Rule -- 8.3.4 Nonrelaxing Clause -- 8.3.5 Nonconstraining Clause -- 8.4 Detection and Simplification Procedure -- 8.4.1 Mixed-Integer Programming -- 8.4.2 Detecting Feasibility -- 8.4.3 Finding Rules Causing Infeasibility -- 8.4.4 Detecting Conflicting Rules -- 8.4.5 Detect Partial Infeasibility -- 8.4.6 Detect Fixed Values -- 8.4.7 Detect Nonrelaxing Clauses -- 8.4.8 Detect Nonconstraining Clauses -- 8.4.9 Detect Redundant Rules -- 8.5 Conclusion. |
| 505 8# - FORMATTED CONTENTS NOTE | |
| Formatted contents note | Chapter 9 Methods Based on Models for Domain Knowledge -- 9.1 Correction with Data Modifying Rules -- 9.1.1 Modifying Functions -- 9.1.2 A Class of Modifying Functions on Numerical Data -- 9.1.2 Exercises for Section -- 9.2 Rule-Based Correction with dcmodify -- 9.2.1 Reading Rules from File -- 9.2.2 Modifying Rule Syntax -- 9.2.3 Missing Values -- 9.2.4 Sequential and Sequence-Independent Execution -- 9.2.5 Options Settings Management -- 9.3 Deductive Correction -- 9.3.1 Correcting Typing Errors in Numeric Data -- 9.3.1 Exercises for Section -- 9.3.2 Deductive Imputation Using Linear Restrictions -- Chapter 10 Imputation and Adjustment -- 10.1 Missing Data -- 10.1.1 Missing Data Mechanisms -- 10.1.2 Visualizing and Testing for Patterns in Missing Data Using R -- 10.2 Model-Based Imputation -- 10.3 Model-Based Imputation in R -- 10.3.1 Specifying Imputation Methods with simputation -- 10.3.2 Linear Regression-Based Imputation -- 10.3.3 M-Estimation -- 10.3.4 Lasso, Ridge, and Elasticnet Regression -- 10.3.5 Classification and Regression Trees -- 10.3.6 Random Forest -- 10.4 Donor Imputation with R -- 10.4.1 Random and Sequential Hot Deck Imputation -- 10.4.2 k Nearest Neighbors and Predictive Mean Matching -- 10.5 Other Methods in the simputation Package -- 10.6 Imputation Based on the EM Algorithm -- 10.6.1 The EM Algorithm -- 10.6.2 EM Imputation Assuming the Multivariate Normal Distribution -- 10.7 Sampling Variance under Imputation -- 10.8 Multiple Imputations -- 10.8.1 Multiple Imputation Based on the EM Algorithm -- 10.8.2 The Amelia Package -- 10.8.3 Multivariate Imputation with Chained Equations (Mice) -- 10.8.4 Imputation with the mice Package -- 10.9 Analytic Approaches to Estimate Variance of Imputation -- 10.9.1 Imputation as Part of the Estimator -- 10.10 Choosing an Imputation Method -- 10.11 Constraint Value Adjustment. |
| 505 8# - FORMATTED CONTENTS NOTE | |
| Formatted contents note | 10.11.1 Formal Description -- 10.11.2 Application to Imputed Data -- 10.11.3 Adjusting Imputed Values with the rspa Package -- Chapter 11 Example: A Small Data-Cleaning System -- 11.1 Setup -- 11.1.1 Deterministic Methods -- 11.1.2 Error Localization -- 11.1.3 Imputation -- 11.1.4 Adjusting Imputed Data -- 11.2 Monitoring Changes in Data -- 11.2.1 Data Diff (Daff) -- 11.2.2 Summarizing Cell Changes -- 11.2.3 Summarizing Changes in Conformance to Validation Rules -- 11.2.4 Track Changes in Data Automatically with lumberjack -- 11.3 Integration and Automation -- 11.3.1 Using RScript -- 11.3.2 The docopt Package -- 11.3.3 Automated Data Cleaning -- References -- Index -- EULA. |
| 650 #0 - SUBJECT | |
| Keyword | Statistics |
| 650 #0 - SUBJECT | |
| Keyword | R (Computer program language) |
| 650 #7 - SUBJECT | |
| Keyword | MATHEMATICS |
| 650 #7 - SUBJECT | |
| Keyword | MATHEMATICS |
| 650 #7 - SUBJECT | |
| Keyword | R (Computer program language) |
| 650 #7 - SUBJECT | |
| Keyword | Statistics |
| 700 1# - ADDED ENTRY--PERSONAL NAME | |
| Personal name | Loo, Mark van der, |
| 856 40 - ONLINE RESOURCES | |
| url | https://doi.org/10.1002/9781118897126 |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Koha item type | e-Books |
| Home library | Current library | Accession number | Koha item type |
|---|---|---|---|
| Central Library, Sikkim University | Central Library, Sikkim University | E-2701 | e-Books |
