- Abbeel, P., 223
- Abel, E., 66
- Accounting automation avenues and investment management, 265
- Accuracy, data, 7–8
- Actions in holistic workflow framework, 74–78
- production data stage, 77–78
- raw data stage, 74–76
- creating metadata, 75–76
- data ingestion, 75
- refined data stage, 76–77
- Adam optimizer, 222
- Aggregate function, 85, 86, 87
- Aggregation, 78
- Ahmed, F., 225
- AI-based self-driving car,
- about the model, 283, 285
- introduction, 275–277
- algorithm used, 279–280
- environment overview, 277–279
- preprocessing the image/frame, 285–286
- real-time lane detection and obstacle avoidance, 283
- self-driving car simulation, 281
- Alexa, 238
- Altair Monarch, 60, 61f
- Altman, R.B., 161
- Alto, 308
- Amazon, 4
- Amazon Web Services, 99
- Analogue-to-digital conversion, 199
- Analytical input, 201–204
- Analytics,
- big data. see Big data analytics in real time
- and business intelligence in optimization, role, 44–45
- data science, 189
- defined, 189
- descriptive, predictive, diagnostic, and prescriptive, 100
- express, using data wrangling process, 106
- self-service, 50
- AnoGAN, 227
- Anomaly detection algorithm, 227, 244
- Antilock brakes in automobiles, 4
- Anzo, 60, 61, 62f
- Apache Marvin AI, 248
- Architecture of data wrangling, 56–59
- Arjovsky, M., 221, 225
- Array, data structure in R, 125, 136–138
- array() function, 136
- Artés-Rodríguez, A., 55, 67
- Art-GAN, 227
- Artificial control and effective fiduciaries, 264–265
- Artificial intelligence (AI),
- Artificial intelligence in accounting and finance,
- applications of, 256–257
- in consumer finance, 257
- in corporate finance, 257–258
- in personal finance, 257
- benefits and advantages of, 258–259
- accounting automation avenues and investment management, 265
- active insights help drive better decisions, 261–262
- AI machines make accounting tasks easier, 260–261
- artificial control and effective fiduciaries, 264–265
- build trust through better financial protection and control, 261
- changing the human mindset, 259
- consider the “Runaway Effect,” 264
- fighting misrepresentation, 260
- fraud protection, auditing, and compliance, 262–263
- intelligent investments, 264
- invisible accounting, 261
- machines as financial guardians, 263
- machines imitate the human brain, 260
- challenges of, 265–267
- cyber and data privacy, 267
- data quality and management, 267
- institutional issues, 270
- legal risks, liability, and culture transformation, 267–268
- limits of machine learning and AI, 269
- practical challenges, 268
- roles and skills, 269–270
- changing the human mindset, 258–259
- future scope of study, 272
- introduction, 252–254
- suggestions and recommendation, 271
- uses of,
- AI-driven chatbots, 255–256
- audits, 255
- monthly, quarterly cash flows, and expense management, 255
- pay and receive processing, 254
- supplier onboarding and procurement, 255
- Artificial neural network (ANN), 276
- Artwork, 227
- Arús-Pous, J., 227
- Ashok Leyland, 292
- Association, unsupervised learning for, 237
- Attacks, type, 37
- Audits, 255
- Authentication, data, 35
- Auto-encoders, 150, 176–178
- Automotive industry,
- China, 301
- European Union, 301
- Indian; see also Suppliers network on SCM of Indian auto industry,
- COVID-19 on automotive sector, 301–305
- global, 298, 300
- prior pandemic, 294–296
- Japan, 301
- United States, 300–301
- Auxiliary data, 57
- AVERAGEIF(S) function, 28
- AWS, 22
- Backup, data, 35
- Bar graph, 87, 88–89
- Barrejón, D., 55, 67
- Bartenhagen, C., 150
- Batch normalization, concept of, 221
- Bengio, Y., 214
- Berret, C., 67
- Bessel kernel, 165
- Between-class scatter matrix, 163
- Bhatt, P., 293
- Big data, 17, 45
- challenges of, 113
- cost-effective manipulations of, 54
- processing, 99
- 4 V’s of, 2
- Big data analytics in real time,
- applications in commercial surroundings, 196–207
- IoT and data science, 197–204
- predictive analysis for corporate enterprise, 204–207
- aspiration for meaningful analysis, 193–196
- design, structure, and techniques, 191–192
- fundamental infrastructure of, 192
- information management to valuation offerings, transition from, 195–196
- from information to guidance, 194–195
- insights’ constraints, 207–209
- data, fragmented and imprecise, 208
- extensibility, 208
- implementation in real time scenarios, 208–209
- representation of data, 207–208
- technological developments, 207
- IoT and, 190–191
- overview, 188–190
- visualization tools, 193–196
- Binning method, 103
- Biometric authentication, 246
- Bixby, 238
- Bjerrum, E.J., 227
- Blind Source Separation (BSS), 171
- BMW, 292
- Bors, C., 54–55
- Boston Consulting Group, 291
- Bottou, L., 221, 225
- Braun, M.T., 54
- Breaching, data. see Data breaching
- #BreakTheChain, 294
- Bridgewater Associates, 264
- Brzozowski, M., 224
- Buono, P., 54, 81
- Business insights, 32
- Business intelligence (BI) programs, 190
- Business intelligence,
- analytics, 11
- benefits of, 195
- data wrangling-based, 190
- effectiveness of, 191
- in optimization, role, 44–45
- possibilities of, 192
- real-time, 193
- tools, 191
- Cab booking, apps for, 238, 240f
- Caffe, 247
- Canny edge extraction, 276
- Capacity planning, 36
- Carreras, C., 55
- Ceusters, W., 67
- c() function, 127–128
- CGANs (conditional GANs), 218–219
- Character type of atomic vector, 126
- Chatbots, 252, 255–256, 257, 258, 260
- Chen, H., 227
- Chen, X., 223
- Cheung, V., 225
- China, COVID-19 on automotive sector, 301
- Chintala, S., 220, 221, 225, 226
- CIFAR-10 dataset, 221, 225
- City operations map visualizations, Uber’s, 46–47
- Civili, C., 66
- class() function, 127–128
- Classification algorithms, 243, 244f
- Classifiers, used, 179
- Classroom, 31–32
- Cleaning data, 2, 15, 58, 79, 92, 95, 100, 111, 200–201
- Cloud DBA, 22
- Clustering, unsupervised learning for, 237
- Clustering algorithms, 245
- Clustering method, 103, 149
- Clustering technique, 276–277
- Cohan, A., 66
- Colon operator, vectors using, 126
- Column(s),
- addition of, 144–145
- in dataset, changing order of, 82, 83f
- orthonormal matrices, 175
- in relational database, 6, 7
- Complex type of atomic vector, 126
- Compound annual growth rate (CAGR), 290, 295, 306
- Computational modeling, 205
- Computerized reasoning, 253
- CONCATENATE function, 28
- Conditional GANs (cGANs), 218–219
- Conditional-LSTM GAN, 227
- Confirmatory factor analysis, 175
- Conformal Isomap (C-Isomap), 173
- Consolidating data, 100
- Core profiling, types, 79–80
- individual values profiling, 80
- set-based profiling, 80
- Courville, A.C., 214, 225
- Covariance matrix of data, 158, 159, 161, 167, 176
- COVID-19 pandemic, 290, 291, 292, 293, 300
- on automotive sector, 300
- effect on Indian automobile industry, 301–305
- global automobile industry, 298, 300–301
- MSIL during, 296–297
- post COVID-19 recovery, automobile industry scenario, 306
- thump on automobile sector, 294–296
- worldwide economic impact of epidemic, 298, 299t
- Cross-validation folds, data preparation within, 104
- CSV file, data in, 5
- CSVKit, 17, 110, 115, 120
- Customer connection management software, 206
- Custom metadata creation, defined, 6
- Cyber and data privacy, 267
- Cybercriminals, 37, 38, 40
- CycleGANs, 218
- Dashboarding, 11
- Data,
- defined, 2
- design and preparation, 9
- direct value from, 3, 4
- documentation & reproducibility, 111, 114
- extracting insights from, 100
- filtering/scrubbing, 17
- fragmented and imprecise, 208
- indirect value, 3
- input, 5–6
- learnings from, 48
- merging & linking of, 111
- mishandling and its consequences, 39–41
- processing and organizing, 99–100
- quality, 110–111
- representation of, 201, 207–208
- stages
- produced. see Production data
- raw, 4–8, 73, 74–76
- refined. see Refined data
- structuring, 15, 78, 95
- utilization, 92
- warehouse administrator, 21
- workflow structure, 4
- Data accessing, 58
- Data accuracy, 7–8
- Data administrators, 56, 67, 68, 110, 113, 114, 115, 194
- defined, 20
- goal, 29
- practical problems faced by, 54
- responsibilities, 20, 34–37
- capacity planning, 36
- data authentication, 35
- data backup and recovery, 35
- database tuning, 36–37
- data extraction, transformation, and loading, 34
- data handling, 35
- data security, 35
- effective use of human resource, 36
- security and performance monitoring, 36
- software installation and maintenance, 34
- troubleshooting, 36
- roles, 20, 21–22
- skills required, 22–34
- Data analysis, 206–207
- Data analysts. see Data administrators
- Database administrator (DBA),
- Cloud DBA, 22
- concerns for, 37–39
- responsibility, 21, 34–37
- capacity planning, 36
- data authentication, 35
- data backup and recovery, 35
- database tuning, 36–37
- data extraction, transformation, and loading, 34
- data security, 35
- effective use of human resource, 36
- security and performance monitoring, 36
- software installation and maintenance, 34
- troubleshooting, 36
- role, 20, 21–22
- Database systems, data wrangling in, 66
- Database tuning, 36–37
- Data breaching, 37–39, 40
- laws, 41
- long-term effect of, 42
- phases of, 40–41
- Data cleaning, 2, 15, 58, 79, 92, 95, 100, 111, 200–201
- Data collection, 199, 200
- Data deluge, 110
- Data discovery, 14, 111
- Data enrichment, 15, 59, 78–79, 111
- Data errors, 118–119
- Data extraction, 58
- Data frame, 23, 125, 144–145
- accessing, 145
- addition of column, 144–145
- creation, 144
- data.frame() function, 144
- Data gathering, 17
- Data inconsistency, 101
- Data ingestion, 75
- Data integrity, 191
- Data Lake, 110
- Data leakage, 39
- in deep learning, 101–102
- in machine learning, 101–102, 103–104, 113
- in ML for medical treatment, 93–94
- Data management, defined, 110
- Data manipulation, 117, 118–119
- Datamation, 100
- Datameer, 63, 64f
- Data munging. see Data wrangling
- Data optimization, 13
- Data organization, 111
- Data preparation, 92, 93
- within cross-validation folds, 104
- Data preprocessing, 92, 93
- performance of, 102
- use of, 100–101
- Data projects, workflow framework for, 72–74
- Data publishing, 16, 59, 95–96, 111
- Data quality and management, 267
- Data refinement, 13
- Data remediation. see Data wrangling
- Data reshaping, 55
- Data science,
- analytics, 189
- applications in production industry, 197–204
- data transformation, 199–204
- interlinked devices, 199
- defined, 188
- IoT and, 189
- Data scientists, role, 20
- Dataset(s),
- CIFAR-10, 221, 225
- columns, changing order of, 82, 83f
- drug trial, 8
- Fashion MNIST, 225
- granularity, 7
- ImageNet, 225
- MIR Flickr, 219
- MNIST, 219, 223
- red-wine quality, 178, 179, 180t
- scope, 8
- structure, 6–7
- temporality, 8
- training and test, 237
- used, 178
- validation, 104
- Wikiart, 227
- Wisconsin breast cancer, 178, 179, 181t
- YFCC100M, 219
- Data sources, 57
- Data structure in R,
- classification, 124–125
- heterogeneous, 138–145
- dataframe, 144–145
- defined, 138
- list, 139–143
- homogeneous, 124, 125–138
- array, 136–138
- factor, 131–132
- matrix, 132–136
- vectors, 125–131
- overview, 123–125
- Data structuring, 58
- Data theft, 40
- Data transformation, 2, 34, 54, 63, 199–204
- analytical input, 201–204
- cleaning and processing of data, 200–201
- information collection and storage, 200
- representing data, 201
- Data validation, 15, 59, 95, 111
- Data visualizations, 45, 48–49
- DataWrangler, 115
- Data wrangling,
- aims, 3
- application areas, 65–67
- in database systems, 66
- journalism data, 67
- medical data, 67
- open government data, 66
- traffic data, 66–67
- defined, 2, 54, 110
- do’s for, 16
- entails, 110–111
- goals, 114–115
- obstacles surrounding, 113–114
- overview, 2–4
- stages, 94–96
- cleaning, 95
- discovery, 94
- improving, 95
- publishing, 95–96
- structuring, 95
- validation, 95
- steps, 14–16, 111–114
- tools for, 16–17, 59–65, 115–116
- ways for effective, 116–119
- Data wrangling dynamics,
- architecture, 56–59
- accessing, 58
- auxiliary data, 57
- cleaning, 58
- enriching, 59
- extraction, 58
- publication, 59
- sources, 57
- structuring, 58
- validation, 59
- challenges, 55–56
- overview, 53–54
- related work, 54–55
- tools, 59–65
- Altair Monarch, 60, 61f
- Anzo, 60, 61, 62f
- Datameer, 63, 64f
- Excel, 59–60
- Paxata, 63, 64f
- Tabula, 61, 62f
- Talend, 65
- Trifacta, 61, 63
- DDoS attacks, 37
- Decision making, 114
- Decision trees, 246
- Decoder, 177
- Deep Belief Network (DBN), 215
- Deep Boltzmann Machine (DBM), 215
- Deep Convolutional GANs (DCGANs), 218, 220–221
- Deep learning, 8, 20
- -based techniques, for image processing, 246
- data leakage in, 101–102
- in ERP, 91–92, 93
- GANs, 214, 215
- generative and discriminative models, 216–217
- DeepMind, 226, 227
- DeepRay, 226
- De la Torre, F., 168
- De-noising images, 168
- .describe() function, 83, 84f, 86
- Descriptive analytics, 100
- DeShon, R.P., 54
- Diagnostic analytics, 100
- Digital Vidya, 100
- Dijkstra’s algorithm, 173
- Dimensionality,
- curse of, 148
- intrinsic, 148
- reduction. see Dimension reduction techniques in distributional semantics
- Dimension reduction techniques in distributional semantics
- application based literature review, 150–158
- auto-encoders, 150, 176–178
- block diagram of process, 149
- experimental analysis, 178–181
- classifiers used, 179
- datasets used, 178
- observations, 179, 180t
- techniques used, 178–179
- factor analysis (FA), 150, 175–176
- ICA, 150, 171–172
- Isomap, 150, 172–173
- KPCA, 150, 161, 165–169
- LDA, 150, 161–165
- three-class, 162, 163–165
- two-class, 162
- LLE, 150, 169–171
- overview, 148–150
- PCA, 148, 149, 150, 158–161
- SOM, 150, 173–174
- SVD, 150, 174–175
- Discover cross domain relations with GANs (DiscoGANs), 218
- Discovering data, 14
- Discovery, 94
- Discriminative modeling, generative modeling vs, 216–217
- Documentation of data, 111, 114
- Double type of atomic vector, 126
- Downey, D., 66
- Dplyr, 116
- Droom, 297
- Drug trial datasets, 8
- Duan, Y., 223
- Dumoulin, V., 225
- DVDGAN, 226
- E-commerce market, 300
- Economist Intelligence Unit, 194
- E-diagnostics, 292
- EmguCV, 247
- Encoder, 177
- Energy-based GAN, 222
- Engkvist, O., 227
- Eno, 257
- Enrichment, data, 15, 59, 78–79, 111
- Enterprise resource planning (ERP), 91–92, 93
- Enterprise(s),
- applications, big data analytics in real time for. see Big data analytics in real time
- best practices for, 41
- corporate, predictive analysis for, 204–207
- Esmaeilzadeh, H., 224
- Essentials of data wrangling,
- actions in holistic workflow framework, 74–78
- production data stage, 77–78
- raw data stage, 74–76
- refined data stage, 76–77
- case study, 80–84
- core profiling, types, 79–80
- individual values profiling, 80
- set-based profiling, 80
- graphical representation, 86–89
- overview, 71–72
- quantitative analysis, 84–86
- maximum number of fires, 84–85
- statistical summary, 86
- total number of fires, 85–86
- transformation tasks, 78–79
- cleansing, 79
- enriching, 78–79
- structuring, 78
- workflow framework for data projects, 72–74
- Etaati, L., 55
- ETL (extract, transform and load) techniques, 2, 21, 26–27, 34, 54, 66, 71, 117
- Euclidean distance, 161, 172, 173, 174
- European Union, COVID-19 on automotive sector, 301
- Excel, 7, 26, 27, 28, 29, 49, 55, 59–60, 61, 63, 80–81, 99, 100, 115
- Exfiltrate, 41
- Exploratory factor analysis, 175
- Exploratory modelling and forecasting, 11
- Express analytics using data wrangling process, 106
- Extract, transform and load (ETL) techniques, 2, 21, 26–27, 34, 54, 66, 71, 117
- Extruct, 99
- ‘EY Global FAAS,’ 266
- Facebook, 119, 194, 240, 247
- Face recognition, 168, 240
- Factor, data structure in R, 124–125, 131–132
- Factor analysis (FA), 150, 175–176
- factor() function, 131–132
- Fan, H., 224
- Fashion MNIST, 225
- Feature extraction in speech recognition, 169
- Feldman, S., 66
- Fields of record, 6–7
- Fisher GAN, 225
- #FlattenTheCurve, 294
- Flexible discriminant analysis (FDA), 165
- FlexiGan, 224
- Flipkart, 4
- Floyd-Warshall shortest path algorithm, 173
- Ford, 292, 304t, 305t, 306
- Fraud detection, 240, 241f
- Frequency outliers, defined, 7–8
- Furche, T., 54
- Gaming with virtual reality experience, 246
- GANs. see Generative adversarial networks (GANs)
- Gartner, 190
- GauGAN, 227
- Gaussian kernel, 165, 166
- #GearUpForTomorrow, 294
- Geiger, A., 225
- General Motors, 293
- Generative adversarial networks (GANs),
- anatomy, 217–218
- architecture of, 217f
- areas of application, 226–228
- background, 215–217
- generative modeling vs discriminative modeling, 216–217
- overview, 214–215
- shortcomings of, 224–226
- supervised vs unsupervised learning, 215–216
- types, 218–224
- cGANs, 218–219
- DCGAN, 220–221
- InfoGANs, 223–224
- LSGANs, 222–223
- StackGANs, 222
- WGAN, 221–222
- Generative modeling vs discriminative modeling, 216–217
- Generic metadata, creation of, 6, 76
- Genetic algorithms, 246
- Genomic dataset, 194
- Gen Zers, 272
- Geodesic distance, defined, 173
- Geopandas, 98
- GeoTab, 292
- Ghodrati, S., 224
- GitHub, 120
- Global automobile industry, 298, 300–301
- Goharian, N., 66
- Gong, B., 226
- Goodfellow, I.J., 214, 225
- Google, 238, 247
- Google Analytics, 26
- Google Assistant, 236
- Google BigQuery, 99
- Google DataPrep, 115
- Google Scholar, 214
- Google Sheets, 99
- Google translator, 242
- Gool, L.V., 226
- Gopalan, R., 276
- “Gosurge” for surge pricing, 44
- Gottlob, G., 54
- Gradient penalty, LSGANs with, 223
- Granularity,
- of dataset, 7
- issues, refined data, 10
- Graphical representation, 86–89
- Graphs, creating, 24
- Gross value added (GVA) growth, 299t
- groupby() function, 85, 86–87
- Gschwandtner, T., 54–55
- Gulrajani, I., 225
- Gutmann, M.U., 224
- GV, 263
- Handling, data, 35
- .head() function, 82, 83f, 85
- Heer, J., 54, 55, 81
- Hellerstein, J.M., 55
- Hero MotoCorp, 294
- Hessian LLE (HLLE), 170
- Heterogeneous data structure, 124, 125, 138–145
- dataframe, 144–145
- defined, 138
- list, 139–143
- creation, 139
- elements, accessing, 140–142
- elements, manipulating, 142
- elements, merging, 142–143
- elements, naming, 139–140
- Hidden layer(s), 176, 177, 178
- Hillel, A.B., 276
- Homogeneous data structures, 124, 125–138
- array, 136–138
- factor, 131–132
- matrix, 132–136
- assigning rows and columns names, 133
- computation, 135–136
- creation, 132–133
- elements, accessing, 134
- elements, updating, 134–135
- transposition, 136
- vectors, 125–131
- arithmetic operations, 129–130
- atomic vectors, types, 125–126
- element recycling, 130
- elements, accessing, 128–129
- nesting of, 129
- sorting of, 130–131
- using c() function, 127–128
- using colon operator, 126
- using sequence (seq) operator, 127
- Honda, 291, 301, 304t, 305t
- Hortonworks, 50
- Hotstar, 4
- Hough line transformation, 286
- Hough transform, 283
- Houthooft, R., 223
- Hsu, C.Y., 67
- Human resource, effective use of, 36
- Hyperbolic tangent kernel, 165
- #HyundaiCares, 294
- Hyundai Motor Company, 290, 293, 294, 297, 304t, 305t
- Hyundai Motor India Ltd (HMIL), 290
- Hyundai Motors, 290, 301, 306
- iAlert, 292
- IBM Cognos Analytics, 100
- ImageNet, 223, 225
- Imagenet-1k, 221
- Image processing, 173
- ML in, 246–248
- frameworks and libraries for, 246–248
- Image sharpening, 246
- Image synthesis, 226
- Image thresholding, 283
- IM (isometric mapping (Isomap)), 150, 172–173
- Independent component analysis (ICA), 150, 171–172
- India Energy Storage Alliance (IESA), 290
- Indian auto industry, suppliers network on SCM of. see Suppliers network on SCM of Indian auto industry
- Individual values profiling
- semantic constraints, 80
- syntactic constraints, 80
- Industrial revolution 4.0, 189, 197
- Industrial sector, predictive analysis for corporate enterprise applications in, 204–207
- Industry 4.0, data wrangling in
- future directions, 119–120
- goals, 114–115
- overview, 110–111
- steps in, 111–114
- tools and techniques, 115–116
- ways for effective, 116–119
- Informatica Cloud, 75
- Information, defined, 2
- Information collection and storage, 200
- Information management to valuation offerings, transition from, 195–196
- Information maximizing GANs (InfoGANs), 218, 223–224
- Information-theory concept, 223
- Information to guidance, 194–195
- Ingestion process, 75
- Integer type of atomic vector, 126
- International Organization of Motor Vehicle Manufacturers, 291
- Internet of Things (IoT),
- adoption of, 198
- applications in production industry, 197–204
- data transformation, 199–204
- interlinked devices, 199
- big data and, 190–191
- data science and, 189
- defined, 188
- revenue production, 190
- use of, 194
- Intrinsic dimensionality, 148
- Inverse perspective mapping (IPM), 276–277
- IoT. see Internet of Things (IoT)
- iPython, 24, 25
- Ishida, S., 293
- Isomap (isometric mapping), 150, 172–173
- Japan, COVID-19 on automotive sector, 301
- Japanese ATR database, 169
- Java EE, 21
- JDBC, 21, 27
- Jensen-Shannon divergence, 221
- Jia, X., 226
- Johansson, S.V., 227
- Joins, 79
- Journalism data, 67
- JPMorgan Chase, 257
- JSON, data format, 7
- JSOnline, 116
- Jupyter notebooks, 24
- Just in time (JIT) system, 310–311
- Kamenshchikov, I., 225
- Kandel, S., 54, 55, 81
- Kasica, S., 67
- Kennedy, J., 54, 81
- Kernel matrix, 167
- Kernel principal component analysis (KPCA), 150, 161, 165–169
- Kernel trick, 167, 168
- Khaleghi, B., 224
- Kia, 290, 291, 302, 304t, 305t
- Kim, N.S., 224
- Kitamura, T., 168–169
- Kivy packages, 277
- #0KMPH, 294
- Koehler, M., 66
- Kohonen, T., 173
- Konstantinou, N., 66
- Kotsias, P., 227
- KPCA (kernel principal component analysis), 150, 161, 165–169
- KPMG Worldwide, 209, 291
- Krauledat, M., 225
- Krishnaveni, M., 292
- Kuljanin, G., 54
- Kullback-Leibler divergence, 221
- Landmark Isomap (L-Isomap), 173
- Lane detection, 277
- Langs, G., 227
- Laplacian kernel, 165, 166
- Large audiences, 32
- Large scale scene understanding (LSUN), 221
- Latent factors, 175
- LatentGAN, 227
- Lau, R.Y., 222
- LDA (linear discriminant analysis), 150, 161–165
- Leakage of data, 93–94, 101–102, 103–104
- Lean manufacturing, 311
- Learning rate decay, 174
- Learnings from data, 48
- Least Square GANs (LSGANs), 218, 222–223
- LeCun, Y., 214
- Lee, H., 222
- Legal risks, liability, and culture transformation, 267–268
- length() function, 141
- Li, H., 222
- Li, Q., 222
- Libkin, L., 54
- Libraries,
- importing, 81–82
- for ML image processing, 246–248
- Lidar, 276
- Lima, A., 168–169
- Linear dimensionality reduction techniques, 178
- Linear dimension reduction techniques, 148, 150
- Linear discriminant analysis (LDA), 150, 161–165
- Linear kernel, 165
- Line graph, 86, 87f
- List, data structure in R, 125, 139–143
- creation, 139
- elements,
- accessing, 140–142
- manipulating, 142
- merging, 142–143
- naming, 139–140
- Listening skills, 33
- list() function, 139
- Liu, K., 224
- Liu, S., 224
- Liu, Z., 226
- LLE (locally linear embedding), 150, 169–171, 172
- Loading, data, 2, 21, 26–27, 34, 54, 66, 71, 117
- Locally linear embedding (LLE), 150, 169–171, 172
- Local smoothing, 103
- Logeswaran, L., 222
- Logical type of atomic vector, 126
- Logistic regression, disadvantages of, 162
- Loss function, least square, 222–223
- LSGANs (Least Square GANs), 218, 222–223
- Lu, W., 226
- Luk, W., 224
- Ma, L., 226
- MacAvaney, S., 66
- Machine learning (ML) for medical treatment,
- data leakage, 93–94, 101–102, 103–104, 113
- data preparation within cross-validation folds, 104
- data preprocessing
- performance of, 102
- use of, 100–101
- data wrangling, 93–94
- enhancement of express analytics, 106
- examples, 96
- significance of, 96
- tools and methods, 99–100
- tools for Python, 96–99
- use of, 101–104
- data wrangling, stages, 94–96
- cleaning, 95
- discovery, 94
- improving, 95
- publishing, 95–96
- structuring, 95
- validation, 95
- overview, 91–92
- types, 105
- Machine learning (ML) frameworks, in image processing
- application, 236
- frameworks and libraries for, 246–248
- in image processing, 246–248
- overview, 235–236
- solution to problem using, 243–246
- anomaly detection algorithm, 244
- classification algorithms, 243, 244f
- clustering algorithms, 245
- regression algorithm, 244, 245
- reinforcement algorithms, 245, 246
- techniques, applications of, 238, 240–243
- fraud detection, 240, 241f
- Google translator, 242
- personal assistants, 238, 240f
- predictions, 238, 240f
- product recommendations, 242
- social media, 240, 241f
- video surveillance, 243
- types, 236–238
- reinforcement learning (RL), 236, 238, 239t
- supervised learning (SL), 236–237, 239t
- unsupervised learning (UL), 236, 237, 239t
- Magrittr, 116
- Mahindra First Choice Wheels, 297
- Mahindra & Mahindra, 290, 291, 302, 304t, 305t
- Malsburg, C. von der, 173
- Malware attacks, 39
- Mao, X., 222
- Map, defined, 174
- Mapping applications for City Ops teams, Uber, 46–47
- Marketplace forecasting, Uber, 47
- Markov decision process (MDP), 279–280
- Maruti 800, 308
- Maruti Production System (MPS), 311
- Maruti Suzuki India Limited (MSIL); see also Suppliers network on SCM of Indian auto industry
- competitive dimensions, 306–307
- during COVID-19, 296–297, 302, 304t, 305t
- distributors network, 311
- logistics management, 312
- manufacturing, 310–311
- operations and SCM, 308–309
- strategies, 307–308
- suppliers network, 309–310
- Maruti Suzuki True Value, 297
- Maruti Udyog Limited, 290
- MATLAB, 27
- toolbox for image processing, 247
- Matplotlib, 24, 81, 89, 116
- Matrix, data structure in R, 125, 132–136
- assigning rows and columns names, 133
- computation, 135–136
- creation, 132–133
- elements
- accessing, 134
- updating, 134–135
- transposition, 136
- matrix() function, 132
- .max( ) function, 84, 85f
- Medical data, 67
- Medicine, 227
- Meng, J., 224
- Mescheder, L.M., 225
- Metadata, creation of, 75–76
- Metal gauge sensor, 199
- Metaxas, D.N., 225
- Metz, L., 220, 221, 226
- Miao, X., 276
- Microsoft Azure, 22
- Microsoft SQL, 21
- MidiNet, 227
- Miksch, S., 54–55
- MIR Flickr dataset, 219
- Mirza, M., 214, 218
- Mishandling of data, 39–41
- Missing data (inaccurate data), 100–101
- MNIST dataset, 219, 223
- Modelling and forecasting analysis, 11
- Monthly, quarterly cash flows, and expense management, 255
- Mp4 video format, 286
- Mroueh, Y., 225–226
- MS Access database, 204
- MSIL. see Maruti Suzuki India Limited (MSIL)
- Multiclass classification, 243
- Multidimensional scaling (MDS), 172, 173
- Munzner, T., 67
- Murray, P., 164
- Music, 227
- MyDoom, 38
- MySQL, 21, 100, 204
- Nankaku, Y., 168–169
- Natural language processing (NLP), 242, 263
- Nayak, J., 293
- Nearest neighbors, 246
- Neighbourhood size, 174
- .NET, 21
- Netflix, 3, 4
- Network-based attack, 40
- NetworkX, 97, 98f
- Neumayr, B., 66
- Neural language processing, 238
- Neural machine translation, 242
- Neural nets, 246
- Neural networks (NN), 176, 280
- applications, 247
- generative adversarial, 227
- Ng, H., 224
- Nguyen, M.H., 168
- Nissan, 291
- Niu, X., 224
- Noisy data,
- presence of, 101
- process of handling, 103
- Non-linear dimensionality reduction techniques, 148, 149, 150, 179
- Non-linear mapping function, 165
- Non-linear PCA, 161, 165
- Novelty detection, 168
- Nowozin, S., 225
- Numerical Python (NumPy), 23, 81, 115, 279, 285
- Nvidia, 226, 227
- Nym Health, 263
- Object detection, 276
- ObjGAN, 226
- Obstacle avoidance, 283
- ODBC, 21, 27
- Odena, A., 225
- Olmos, P.M., 55, 67
- One-on-one, form of presentation, 31
- Online analytical processing (OLAP), 192
- Online shopping websites, 242
- OpenCV, 247, 283–284
- Open government data, 66
- OpenRefine, 115
- Optimization, data, 13
- Oracle, 21, 100
- Original equipment manufacturers (OEMs), 292
- Orsi, G., 54
- Osindero, S., 218
- Output actions,
- at produced stage, 13–14
- at raw data stage, 6
- at refined stage, 11–12
- Ozair, S., 214
- Pandas, 22, 23–24, 25, 81, 85, 97, 116
- Pan-India automobile market, 306
- Parallel transport unfolding, 173
- PassGAN, 228
- Patil, M.D., 148
- Paton, N.W., 54
- Pattern recognition, 170, 173, 194, 236
- Paxata, 63, 64f
- Pay and receive processing, 254
- PCA (principal component analysis), 148, 149, 150, 158–161
- PepsiCo (case study), 48–50
- Performance monitoring, 36
- Perl, 80–81
- Personal assistants, 238, 240f
- Phased manufacturing program (PMP), 310
- Pie chart, 86, 87, 88f
- Pivoting, 78
- Plaisant, C., 54, 81
- Plotly, 116
- Plots, creating, 24
- Polynomial kernel, 165, 166
- Pouget-Abadie, J., 214
- Power BI, 29–30, 55
- Power Query Editor, 55
- Predictions, apps for, 238, 240f
- Predictive analysis for corporate enterprise, 204–207
- Predictive analytics, 100
- Prescriptive analytics, 100
- Presentation skills, 31–32
- Principal component analysis (PCA), 148, 149, 150, 158–161
- Probabilistic PCA, 161
- Production data, 12–14, 73, 74
- data optimization, 13
- output actions, 13–14
- stage actions, 77–78
- Production industry, IoT and data science applications in, 196–207
- data transformation, 199–204
- analytical input, 201–204
- cleaning and processing of data, 200–201
- information collection and storage, 200
- representing data, 201
- interlinked devices, 199
- predictive analysis for corporate enterprise, 204–207
- Product recommendations, 242
- Profiling, core, 79–80
- individual values profiling, 80
- set-based profiling, 80
- Prykhodko, O., 227
- Publishing, data, 16, 59, 95–96, 111
- Publishing skills, 32–33
- Purrr, 116
- PwC report, 42
- Python, as programming language, 22–25, 96–99, 115–116, 120
- PyTorch, 247, 279
- Qiu, G., 224
- Q-learning, 280
- Quadratic discriminant analysis (QDA), 165
- Que, Z., 224
- R, managing data structure in
- heterogeneous data structures, 138–145
- dataframe, 144–145
- defined, 138
- list, 139–143
- homogeneous data structures, 124, 125–138
- array, 136–138
- factor, 131–132
- matrix, 132–136
- vectors, 125–131
- overview, 123–125
- Radford, A., 220, 221, 225, 226
- Radial Basis Function (RBF) kernel, 165, 166
- Random forest algorithm, 92
- Rattenbury, T., 55
- Raw data, defined, 110
- Raw data stage, 4–8, 73, 74–76
- Raw type of atomic vector, 126
- Raychaudhuri, S., 161
- Real-time business intelligence, 193
- Real-time lane detection and obstacle avoidance, 283
- Records, dataset’s, 6–7
- Recovery, data, 35
- Recycle GAN, 226
- Red-wine quality dataset, 178, 179, 180t
- Reed, Z.A., 222
- Reed gauge, 199
- Refined data, 9–12, 73, 74
- accuracy issues, 10–11
- design and preparation, 9
- granularity issues, 10
- output actions at refined stage, 11–12
- scope issues, 11
- stage actions, 76–77
- structure issues, 9
- Regression-based algorithms, 103, 244, 245
- Regularised discriminant analysis (RDA), 165
- Reinforcement algorithms, 245, 246
- Reinforcement learning (RL), 236, 238, 239t
- Relational database, 6
- ReLU activation function, 221
- Renault, 302, 304t, 305t
- Representational consistency, defined, 6
- Representation of data, 201, 207–208
- Reproducibility of data, 111, 114
- Reputation, diminished, 42
- Resende, F.G., 168–169
- Resource chain management, 206
- Response without thinking, 33
- Responsibilities as database administrator, 20, 34–37
- capacity planning, 36
- data authentication, 35
- data backup and recovery, 35
- database tuning, 36–37
- data extraction, transformation, and loading, 34
- data handling, 35
- data security, 35
- effective use of human resource, 36
- security and performance monitoring, 36
- software installation and maintenance, 34
- troubleshooting, 36
- REST, 21
- Riche, N.H., 54, 81
- Riegling, M., 48–49
- RL (reinforcement learning), 236, 238, 239t
- Robotic Process Automation (RPA), 258
- Robust KPCA, 168
- Robust PCA, 161
- Rows, in relational database, 6, 7
- R programming language, 25–26, 80–81, 116
- RStudio, 120
- Runaway effect, 264
- Russell, C., 224
- SAGAN, 225
- Saini, O., 178
- Salimans, T., 225
- Sallinger, E., 66
- Samadi, K., 224
- Sane, S.S., 148
- Sarveniaza, A., 150
- Saxena, G.A., 173
- Scala, 27–28
- Schiele, B., 222, 226
- Schlegl, T., 227
- Schmidt-Erfurth, U., 227
- Schulman, J., 223
- Scikit-learn, 22, 25
- SciPy, 24–25
- Scipy.integrate, 24
- Scipy.linalg, 24
- Scipy.optimize, 24
- Scipy.signal, 24
- Scipy.sparse, 25
- Scipy.stats, 25
- SCM (supply chain management) of Indian auto industry. see Suppliers network on SCM of Indian auto industry
- Scope of dataset, 8
- Security, 227–228
- data, 35
- performance monitoring and, 36
- Seeböck, P., 227
- Self-driving car simulation, 281
- Self-driving technology, 246
- Self-organising maps (SOMs), 150, 173–174
- Self-service analytics, 50
- Semantic constraints, 80
- Sensors, 199
- Sequence (seq) operator, vectors using, 127
- Sercu, T., 225–226
- Service Mandi, 292
- Set-based profiling, 80
- Shah, M., 276
- Shah, M.K., 294
- Sigmoid kernel, 165, 166
- Single element vector, 125–126
- Singular value decomposition (SVD), 150, 174–175
- Siri, 236, 238
- Skills and responsibilities of data wrangler,
- case studies, 42–50
- PepsiCo, 48–50
- Uber, 42–48
- data administrators
- responsibilities, 34–37
- roles, 20, 21–22
- database administrator (DBA), role, 20, 21–22
- overview, 20
- soft skills, 31–34
- business insights, 32
- issues, 33–34
- presentation skills, 31–32
- response without thinking, 33
- speaking and listening skills, 33
- storytelling, 32
- writing/publishing skills, 32–33
- technical skills, 22–30
- Excel, 28
- MATLAB, 27
- Power BI, 29–30
- Python, 22–25
- R programming language, 25–26
- Scala, 27–28
- SQL, 26–27
- Tableau, 28–29
- SL (supervised learning), 236–237, 239t
- Small intimate groups, 31
- Smart intelligence, examples of, 193
- Smart production, 194
- Smith, B., 67
- Smolley, S.P., 222
- Snore-GAN, 227
- Social attack, 40–41
- Social media using phone, 240, 241f
- Society of Indian Automobile Manufacturers (SIAM), 295, 301
- Soft skills, of data wrangler, 31–34
- business insights, 32
- issues, 33–34
- presentation skills, 31–32
- response without thinking, 33
- speaking and listening skills, 33
- storytelling, 32
- writing/publishing skills, 32–33
- Software installation and maintenance, 34
- Solvexia, 114
- SOMs (self-organising maps), 150, 173–174
- sort() function, 130–131
- Spark, 27, 28
- Sparse KPCA, 168, 169
- Sparse PCA, 161
- Speaking and listening skills, 33
- Spectral normalization, 225
- Spectral regularization technique (SR-GAN), 224–225
- Speech recognition, 168
- Spline kernel, 165
- Splitstackshape, 116
- SQL, 26–27, 55, 117
- SQL DBA, 21
- SQLJ, 21
- Srivastava, A., 224
- SSGAN, 228
- StackGANs, 218, 222
- Statsmodels, 25
- #Stayhomestaysafe, 294
- StormWorm, 38
- Storytelling, 32
- str() function, 132, 141
- Structuring data, 15, 78, 95
- Stuart, J.M., 161
- StyleGAN, 226
- summary() function, 141–142
- Sun, Q., 226
- Supervised dimensionality reduction, 161
- Supervised learning (SL), 236–237, 239t
- Supervised machine learning algorithms, 99, 105
- Supervised vs unsupervised learning, 215–216
- Supplier onboarding and procurement, 255
- Suppliers network on SCM of Indian auto industry
- discussion, 306–312
- competitive dimensions, 306–307
- MSIL distributors network, 311
- MSIL logistics management, 312
- MSIL manufacturing, 310–311
- MSIL operations and SCM, 308–309
- MSIL strategies, 307–308
- MSIL suppliers network, 309–310
- findings, 298–306
- effect on Indian automobile industry, 301–305
- global automobile industry, 298, 300–301
- post COVID-19 recovery, 306
- worldwide economic impact of epidemic, 298, 299t
- literature review, 292–297
- methodology, 297–298
- MSIL during COVID-19, 296–297
- overview, 290–292
- prior pandemic automobile industry, 294–296
- Supply chain management (SCM) of Indian auto industry. see Suppliers network on SCM of Indian auto industry
- Surge pricing, 44–45
- Sutskever, I., 223
- Sutton, C.A., 224
- Suzuki Inc. (Japan), 307
- Suzuki Motor Corporation, 290
- SVD (singular value decomposition), 150, 174–175
- Syntactic constraints, 80
- Tableau, 28–29, 49, 50, 100
- Tabula, 61, 62f, 115
- .tail() function, 83, 84f
- Talend, 65, 75
- Tang, W., 224
- TanH activation function, 221
- Tata Motors, 290–291, 296, 302, 304t, 305t, 306
- Tata Nano, 308
- Technical skills, of data wrangler, 22–30
- Excel, 28
- MATLAB, 27
- Power BI, 29–30
- Python, 22–25
- R programming language, 25–26
- Scala, 27–28
- SQL, 26–27
- Tableau, 28–29
- Temporal difference (TD), 280
- Temporality, 8
- Tenenbaum, J.B., 149
- TensorFlow, 247
- TensorFlow K-NN classification technique, 194
- Tesla, 292
- Test dataset, 237
- Text mining, 192
- t() function, 136
- Theano, 116
- Theft, data, 40
- Thermal imaging sensor, 199
- Tokuda, K., 168–169
- Tomer, S., 294
- Tools, data wrangling, 59–65
- Altair Monarch, 60, 61f
- Anzo, 60, 61, 62f
- basic data munging tools, 115
- cleaning and consolidating data, 100
- Datameer, 63, 64f
- Excel, 59–60
- extracting insights from data, 100
- Paxata, 63, 64f
- processing and organizing data, 99–100
- for Python, 96–99, 115–116
- R tool, 116
- Tabula, 61, 62f, 115
- Talend, 65
- Trifacta, 61, 63
- Toyota, 290, 291, 294, 301, 302, 304t, 305t
- #ToyotaWithIndia, 294
- Traffic data, 66–67
- Training dataset, 237
- Transformation, data, 2, 21, 26–27, 34, 54, 63, 66, 71, 117
- Transformation tasks, in data wrangling, 78–79
- cleansing, 79
- enriching, 78–79
- structuring, 78
- Transpose of matrix, 136
- Trifacta, 49, 50, 55, 61, 63
- Trifacta Wrangler, 55, 61, 66
- Troubleshooting, 36
- Trust, loss of, 42
- Tuytelaars, T., 226
- Twitter, 119, 194
- Uber (case study), 42–48
- UberPOOL, 46
- UL (unsupervised learning), 236, 237, 239t, 245
- Unions, 79
- United States, COVID-19 on automotive sector, 300–301
- Unsupervised learning (UL), 236, 237, 239t, 245
- Unsupervised machine learning algorithms, 99, 105
- VAEs (variational autoencoders), 67, 215, 224
- Validation,
- Valkov, L., 224
- Valuation offerings, information management to, 195–196
- Value-added data system (VADA), 66
- van der Maaten, L.J.P., 148
- van Ham, F., 54, 81
- Varghese, S., 293
- Variances, defined, 159
- Variational autoencoders (VAEs), 67, 215, 224
- Vectors, data structure in R, 124, 125–131
- arithmetic operations, 129–130
- atomic vectors, types, 125–126
- element recycling, 130
- elements, accessing, 128–129
- nesting of, 129
- sorting of, 130–131
- using c() function, 127–128
- using colon operator, 126
- using sequence (seq) operator, 127
- VEEGAN, 224
- Verizon, 42
- Videos, 226
- Vidya, R., 292
- Visa exchange, 257
- Visualization,
- VLOOKUP function, 28
- Volkswagen, 293, 306
- Waldstein, S.M., 227
- Wang, L., 226
- Wang, Z., 222
- WannaCry, 38
- Warde-Farley, D., 214
- Warehouse administrator, 21
- Wasserstein distance, 221
- Wasserstein GANs (WGANs), 218, 221–222
- WebGazer, 247–248
- Websites, online shopping, 242
- Wei, X., 226
- #WePledgeToBeSafe, 294
- WGANs (Wasserstein GANs), 218, 221–222
- Wikiart dataset, 227
- Wisconsin breast cancer dataset, 178, 179, 181t
- Within-class scatter matrix, 163, 164
- Wood inspection, 173
- Workflow framework, holistic,
- actions in, 74–78
- production data stage, 77–78
- raw data stage, 74–76
- refined data stage, 76–77
- for data projects, 72–74
- World Health Organization (WHO), 294
- Wrangler Edge, 61
- Wrangler Enterprise, 61
- Writing skills, 32–33
- Xero, 261
- Xie, H., 222
- XML, data format, 7
- Xu, B., 214
- Xu, T., 222
- Xu, Z., 293
- Yan, X., 222
- Yates, A., 66
- Yazdanbakhsh, A., 224
- YFCC100M dataset, 219
- Yoo, H., 276
- Zaremba, W., 225
- Zen, H., 168–169
- Zeng, C., 224
- Zhang, H., 222, 225
- Zhou, F., 224