Gurpreet Kaur1 and Kamaljit Singh Saini2*
1University Institute of Computing, Chandigarh University, Mohali, India
2University Institute of Engineering, Chandigarh University, Mohali, India
The evolution of artificial intelligence (AI) has changed the 21st century, with technological advances arriving faster than predicted. Among these advances, machine learning (ML) has become one of the most prominent fields of the century. ML is the science of building computers that learn and perform activities as human beings do when data and information are fed into them, without requiring explicit programming. This paper gives a general overview of machine learning concepts. It describes the different types of machine learning methods and explains the differences between them. It also highlights the applications and frameworks used with ML for analyzing data.
Keywords: Machine learning basics, types, applications, analysis, wrangling, ML in image processing, frameworks
ML is a type of AI that creates computers that have the ability to learn and work without explicit programming. ML is all around us in the modern world. It is concerned with developing computer programs that can access datasets and automatically produce detections and predictions. It enables machines to learn continuously from experience: feeding more data into a computer system enables it to improve its results. When trained machines come across new datasets, they grow, develop, learn, and change by themselves [1]. Machine learning applications use the concept of pattern recognition to provide reliable results.
ML deals with computer programs that change their behavior when exposed to new data. In this sense, the machine can learn its own code: it is programmed once, and each time it encounters a problem it can solve that problem by analyzing what it has already learned, so there is no need to program it again and again. It adapts itself to the new scenarios it discovers (in practice, by adjusting the parameters of a learned model rather than by rewriting program text). It learns whatever has to be learned from the provided scenarios, past experiences, and provided values, and it comes up with new solutions [2, 3]. This raises the question, “How can a machine recode itself?” Plenty of research has been done on the ways in which machines learn by themselves.
An ML process first feeds a training dataset into a particular algorithm. The training data, which may be known (labeled) or unknown (unlabeled), trains the ML algorithm [4]. To check that the trained algorithm works properly, it is then exposed to new input data, and the results and predictions are checked. If the results do not meet expectations, the algorithm is trained again, as many times as needed, until it meets the desired result. This enables the algorithm to keep learning on its own and to produce better results, which increases the accuracy of its output over time [5–7]. Today, both personal and professional life depend heavily on technology; Google Assistant and Siri are two examples. This is all made possible by ML and artificial intelligence [8–10].
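The train, check, and retrain cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the cited works: it assumes a simple perceptron as the learning algorithm, and the data points, the function names (`predict`, `accuracy`, `train_until`), and the parameter values are all hypothetical.

```python
# Toy data, made up for illustration: (feature_1, feature_2) -> label 0 or 1.
TRAIN = [((1.0, 1.0), 0), ((2.0, 1.5), 0), ((1.5, 2.0), 0),
         ((4.0, 4.5), 1), ((5.0, 4.0), 1), ((4.5, 5.0), 1)]
# New input data used only to check the trained algorithm.
TEST = [((1.6, 1.6), 0), ((1.375, 1.375), 0),
        ((4.5, 4.5), 1), ((4.375, 4.5), 1)]


def predict(weights, bias, x):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, data):
    """Fraction of examples in `data` that are predicted correctly."""
    correct = sum(1 for x, y in data if predict(weights, bias, x) == y)
    return correct / len(data)


def train_until(target_accuracy, max_rounds=1000, learning_rate=1.0):
    """Repeat training rounds until the test accuracy reaches the target."""
    weights, bias = [0.0, 0.0], 0.0
    for round_no in range(1, max_rounds + 1):
        # One training round: nudge the weights after every wrong prediction.
        for x, y in TRAIN:
            error = y - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
        # Check the results on new data; stop once they are good enough.
        if accuracy(weights, bias, TEST) >= target_accuracy:
            return weights, bias, round_no
    return weights, bias, max_rounds


if __name__ == "__main__":
    weights, bias, rounds = train_until(1.0)
    print("rounds needed:", rounds)
    print("test accuracy:", accuracy(weights, bias, TEST))
```

Calling `train_until(1.0)` returns the learned weights, the bias, and the number of training rounds that were needed before every test example was predicted correctly.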
ML offers various algorithms for training machines to solve a problem, and the approach to the problem determines which algorithm can be used. The different ways in which a machine can learn and analyze data are supervised learning (SL), unsupervised learning (UL), and reinforcement learning (RL) [11]. Figure 11.1 shows the different types of ML algorithms.
SL methods require external assistance. In this type of learning, external supervision is provided for a certain activity so that it can be done correctly. Using the inputs and responses of a training dataset, SL algorithms make predictions for new datasets [12]. The machine is given inputs together with the corresponding answers, and the data is supplied as a training dataset and a test dataset. The algorithm learns different kinds of patterns from the training dataset, and then analyzes the test dataset and makes predictions by applying those patterns to it.
For example, to predict whether it will rain today, the relevant parameters are checked: the humidity and temperature should be above a certain level, and the wind should be blowing in a certain direction. If this scenario holds, it will rain. Similarly, to help children understand a scenario, we give them answers and examples [13–15]. If the data is structured and can be classified on some basis, then SL can be applied to it.
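The rain example can be turned into a small supervised learning sketch. The code below assumes a k-nearest-neighbors classifier, which is one possible SL algorithm and is not prescribed by the text; the observations, the labels, and the names `OBSERVATIONS` and `predict` are invented for illustration, and wind direction is left out to keep the sketch short.

```python
import math
from collections import Counter

# Hypothetical labeled observations: (humidity in %, temperature in °C) -> outcome.
OBSERVATIONS = [
    ((85, 30), "rain"),
    ((90, 28), "rain"),
    ((80, 32), "rain"),
    ((40, 20), "no rain"),
    ((35, 25), "no rain"),
    ((50, 18), "no rain"),
]


def predict(sample, k=3):
    """Label a new sample by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(OBSERVATIONS,
                     key=lambda obs: math.dist(obs[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict((88, 29)))  # a humid, warm day
    print(predict((42, 21)))  # a dry, cooler day
```

Because every training example carries its answer ("rain" or "no rain"), the algorithm only has to compare a new day with the days it has already seen.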
In UL, the methods learn various features from the given data. When new data is introduced, unsupervised methods use the previously learned features to identify the class of the data. Unsupervised learning is mainly used for association and clustering. For example, when a child makes decisions based on their own understanding or on what they have read in a book, the learning is unsupervised. Here the computer is given only the inputs, and it finds the pattern or structure in them. Suppose the computer is given the characteristics of fruits, such as size, color, and taste, but not the names of the fruits. The computer then groups the fruits based on the given characteristics and presents these groups as its output [16, 17]. When the correlation in the data or the structure of the data is not known, as with big data, which is a huge volume of unstructured data, unsupervised learning is used to find the structure. It is the job of the algorithm to find the structure, on the basis of which some decision can be made [18].
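The fruit example can be sketched with k-means, one of the clustering algorithms listed in Table 11.1. This is a minimal illustration under stated assumptions: each fruit is reduced to two made-up numeric features (size in cm and a sweetness score), the first `k` points serve as a naive choice of starting centroids, and the names `FRUITS`, `squared_distance`, and `kmeans` are hypothetical.

```python
# Hypothetical unlabeled fruits: (size in cm, sweetness score). No names are given.
FRUITS = [(2.0, 8.0), (7.5, 6.0), (2.2, 7.5), (8.0, 5.5), (1.8, 8.2), (7.8, 6.2)]


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive start: the first k points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(FRUITS, k=2)
    print(labels)
    print(centroids)
```

The algorithm never sees a fruit name; it only reports which fruits belong together, and a person can then decide what each group means.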
In reinforcement learning, the computer tries to make decisions on its own. For example, if a computer is to be trained to play chess, it is not possible to teach it every move, because the moves can change unpredictably during the game; what one can do instead is tell the computer whether a move was right or wrong. Similarly, when a new situation comes up, a child will act on his own, drawing on his past experiences, and at the end of the action a parent can tell him whether he did well or not. The child then understands whether or not he should repeat the action the next time the same type of scenario arises. A temperature control system, likewise, has to decide whether to increase or decrease the temperature [19]. Using reinforcement learning, it makes this decision from its past experiences, taking into account parameters such as the number of persons in the room and the outside temperature. This type of learning uses the trial-and-error (hit-and-trial) concept, in which the only way to learn is through past experience. Table 11.1 describes the differences between the ML techniques from various perspectives.
As shown in Figure 11.2, Google Assistant, Bixby, Alexa, and Siri are some virtual personal assistants. Using algorithms based on natural language processing, they help in searching for information when asked. Once activated, they can be asked for any type of information, to set a schedule, to call a number, or to send commands to other phone apps to complete tasks. ML plays a significant role in collecting and refining information on the basis of previous experience with the user [8].
GPS navigation services are used all over the world. Whenever such an app is used, a central server saves the users' current locations and velocities to maintain a map of current traffic. This helps in estimating congestion on the basis of daily traffic experience, and one can choose a route accordingly. Cab booking apps also estimate a ride's price and duration with the help of ML. Figure 11.3 shows a few apps used for predictions [9].
Table 11.1 Differences between SL, UL, and RL.
Criterion | Supervised learning | Unsupervised learning | Reinforcement learning |
Introduction | External supervision is provided, with the help of training data, for a certain activity so that it can be done correctly. | Unsupervised methods use previously learned features to identify the class of the data when new data is introduced. | The computer tries to make decisions on its own. |
Deals with problems related to | Regression problems and classification problems. | Problems that require clustering and problems related to anomaly detection. | Problems that use the trial-and-error concept, where the only basis for a decision is experience. |
Required data type | Labeled data | Unlabeled data | No predefined data |
Training requirements | External supervision is required | No external supervision is required | No external supervision is required |
Aim | Forecast outcomes | Discover underlying patterns | Learn a sequence of actions |
Approach | Map labeled input to known output | Understand patterns and discover output | Follow the trial-and-error method |
Algorithm names | Linear Regression, Support Vector Machine, Random Forest | C-Means, K-Means, Apriori | SARSA, Q-Learning |
Applications | Sales forecasting, risk evaluation | Anomaly detection, recommendation systems | Gaming, self-driving cars |
Social media platforms use machine learning for their users' benefit and their own. By learning from experience, Facebook notices your connections with people, your interests, the profiles you often visit, and so on, and then suggests people who could be your friends [9]. Applications such as face recognition and “People You May Know” are very complicated at the back end, but at the front end they appear to be very simple applications of ML [10]. Figure 11.4 is an example of using social media on a mobile phone.
Fraud detection is an important and necessary application of ML. The number of frauds is increasing day by day because there are more payment channels, such as numerous wallets and credit/debit cards. Criminals have also become proficient at finding loopholes. When a person performs a transaction, the ML method searches the profile for suspicious patterns. Problems of this kind are classification problems in machine learning [10].
Gone are the days when it was difficult to communicate in areas where a language other than one's native language is spoken. Figure 11.6 shows the icon of Google Translate. Google's Neural Machine Translation is a machine learning translator that uses natural language processing and works with various languages and dictionaries. It is one of the most widely used applications of ML [10].
Online shopping websites recommend items that match the customer's taste. Websites and apps are able to do so using ML: products are recommended on the basis of past experience of site visits, product selections, brand preferences, and so on [9, 10] (refer to Figure 11.7).
It is quite difficult for a single person to monitor multiple video cameras, so computers are trained to make this job easier. Video surveillance is an application of artificial intelligence that aims to detect crime before it happens. By tracking unusual activities, such as stumbling or someone standing idle for a long time, the system alerts human attendants so that mishaps can be avoided. This task is actually performed with the help of ML at the back end [10] (refer to Figure 11.8).
Data science problems can be categorized in five ways, which can be understood through the five questions given in the diagram.
These types of algorithms classify a record. They can be used for a question with a limited number of answers. If the problem calls for an answer to the first type of question in Figure 11.9, for example, “Is it cold?”, then classification algorithms are used. They work for questions that have a fixed number of answers, such as true/false or yes/no/maybe. The first question in the diagram has two choices, so it is called two-class classification; if a question has more than two choices, it is called multiclass classification [20].
This type of algorithm analyzes a particular pattern and raises an alert when the pattern changes. So, if the problem is to analyze unusual happenings, where one wants to find the anomaly or the odd one out, anomaly detection algorithms are used.
In Figure 11.10, there is a pattern of all blue persons; when one red person appears among them, which can be called an anomaly, the algorithm flags that person because he was not expected [21]. In real life, credit card companies use anomaly detection algorithms to flag any transaction that is unusual according to the company's transaction history, and they send a message to the registered number to confirm that the transaction was made by the authenticated person.
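The credit card scenario can be sketched as follows. The text does not say which method card companies actually use, so this example assumes a simple statistical rule: the modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The transaction amounts and the names `AMOUNTS` and `flag_anomalies` are hypothetical.

```python
from statistics import median

# Hypothetical transaction history for one card (amounts in any currency).
AMOUNTS = [25, 30, 22, 28, 35, 27, 31, 950, 26]


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        return []  # all typical values are identical; the score is undefined
    return [a for a in amounts
            if 0.6745 * abs(a - center) / mad > threshold]


if __name__ == "__main__":
    print(flag_anomalies(AMOUNTS))
```

A median-based score is used here instead of the mean and standard deviation because, in a short history, a single very large transaction inflates the standard deviation enough to hide itself.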
Regression analysis investigates the relationship between one or more independent variables and a dependent variable. Regression algorithms can be used to calculate a continuous value such as weight or salary. These algorithms fall into the supervised learning category and calculate numeric values using formulas. With these algorithms, we deal with questions such as “How many hours should one put in to get a promotion?”, i.e., problems where we want a numeric value [12]. There are different models for regression analysis. The most important regression-based algorithms are linear and logistic regression (logistic regression, despite its name, is typically used for classification).
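As an illustration of calculating a numeric value with a formula, the sketch below fits a straight line by ordinary least squares, using slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The data (hours put in versus a performance score) and the names `HOURS`, `SCORES`, `fit_line`, and `predict` are invented for this example.

```python
# Hypothetical data: hours put in and the resulting performance score.
HOURS = [1, 2, 3, 4, 5]
SCORES = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted numeric value for a new input x."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line(HOURS, SCORES)
    print("slope:", slope, "intercept:", intercept)
    print("predicted score for 6 hours:", predict(6, slope, intercept))
```

Unlike a classifier, the fitted line returns a continuous number, so it can also answer the question in reverse by solving for the number of hours that reaches a desired score.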
Clustering algorithms help to understand the structure of a dataset. These algorithms separate the data into groups, or clusters, to ease the interpretation of the data. Organizing data in this way helps in predicting the behavior of an event. So, when the structure behind a dataset is to be found, clustering algorithms are used [21] (refer to Figure 11.11).
In unsupervised learning, where one tries to establish a structure from unstructured data, clustering algorithms are used. If one feeds data to a computer and then applies a clustering algorithm to it, the algorithm categorizes the data into groups A, B, and C, on the basis of which one can decide what to do with the data.
This type of algorithm deals with problems where many inputs are given to the machine and a decision is to be made on the basis of past experience. These algorithms were modeled on how brains respond to punishment and reward: they learn from past results and then decide on the next action. They are well suited to systems that require small decisions to be made without human assistance.
These algorithms analyze the dataset using the trial-and-error method and predict the output with the higher reward. The three main components of reinforcement learning are the agent, the environment, and the actions. The agent is the learning machine, and the environment is the set of conditions with which the agent interacts; using past experience and predicted data, the agent makes a decision and performs a certain action [19]. Table 11.1 summarizes the differences between the three types of ML techniques on the basis of different criteria.
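The interplay of agent, environment, and actions can be sketched with tabular Q-learning, one of the RL algorithms named in Table 11.1, applied to a toy version of the temperature control example. Everything specific here is an assumption made for illustration: the five temperature levels, the target level, the reward (the negative distance from the target), the hyperparameters, and the names `step`, `train`, and `greedy_policy`.

```python
import random

ACTIONS = (-1, 0, 1)  # decrease, hold, or increase the temperature level
N_LEVELS = 5          # hypothetical temperature levels 0..4
TARGET = 2            # hypothetical comfortable level


def step(level, action):
    """Environment: apply the action and return (next_level, reward)."""
    next_level = min(max(level + action, 0), N_LEVELS - 1)
    return next_level, -abs(next_level - TARGET)  # closer to target is better


def train(episodes=2000, steps=10, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Agent: learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
    for _episode in range(episodes):
        level = rng.randrange(N_LEVELS)
        for _step in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: try a random action
            else:
                # Exploit: pick the action that past experience rates highest.
                a = max(range(len(ACTIONS)), key=lambda i: q[level][i])
            next_level, reward = step(level, ACTIONS[a])
            # Q-learning update: move the estimate toward the observed reward
            # plus the discounted value of the best action in the next state.
            q[level][a] += alpha * (
                reward + gamma * max(q[next_level]) - q[level][a]
            )
            level = next_level
    return q


def greedy_policy(q):
    """Best-known action for every temperature level."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
            for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these assumed settings, the learned policy raises the temperature when it is below the target level, lowers it when it is above, and holds it at the target, without ever being told the correct action in advance.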
Computer vision is a field in which machines recognize videos and images. At the core of this field is image processing, a technology that processes images, analyzes them, and extracts meaningful details from them. It is nowadays used in several areas for purposes such as pattern recognition, visualization, segmentation, and classification. Image processing can be applied using two methods: analogue image processing and digital image processing. The former is used for hard-copy images, for example, scanning printouts. The latter is used to manipulate digital images to extract meaningful information from them. ML and deep learning-based techniques are becoming more popular for image processing. These techniques interpret images in a way that resembles the human brain. Some examples of image processing using ML are biometric authentication, gaming with a virtual reality experience, image sharpening, and self-driving technology. Images have to be processed to make them more suitable as input; for example, images are converted from PNG or JPEG to byte data or an array of pixels for neural networks. Here, the role of computer vision is to generate ideal datasets for ML techniques by processing and manipulating images. For example, to predict whether an image shows a cat or a dog, a collection of cat and dog images is assembled and processed to extract features, which the ML techniques then use to make the prediction. Some popular techniques for this purpose are neural networks, genetic algorithms, nearest neighbors, and decision trees.
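The preprocessing step mentioned above, turning an image into an array of pixels that an ML technique can consume, can be sketched without any imaging library. The example assumes the image has already been decoded into rows of (red, green, blue) tuples with values from 0 to 255; the tiny 2x2 image and the names `to_grayscale` and `to_feature_vector` are hypothetical, and the grayscale weights are the standard ITU-R BT.601 luma coefficients.

```python
# A hypothetical 2x2 image, already decoded into rows of (R, G, B) pixels.
IMAGE = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]


def to_grayscale(rgb_image):
    """Convert each (R, G, B) pixel to one brightness value from 0 to 255."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def to_feature_vector(gray_image):
    """Flatten the image row by row and scale every pixel to the range 0..1."""
    return [pixel / 255 for row in gray_image for pixel in row]


if __name__ == "__main__":
    gray = to_grayscale(IMAGE)
    print(gray)
    print(to_feature_vector(gray))
```

In practice, a library would decode the PNG or JPEG file first; the flattening and scaling shown here are the part that produces a fixed-length numeric input for a model.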
Figure 11.12 shows that ML algorithms learn from training data with specific parameters and then make predictions for unseen data.
Among the many existing programming languages, developers tend to prefer Python for ML applications. However, other languages that suit a particular use case can also be used. The frameworks used for various ML image processing applications are [22]:
ML is a subclass of AI and is one of the most powerful technologies available today. It is a tool for turning information into knowledge. The ample data produced in the last 50 years is of little use until we analyze it and find the hidden patterns in it. ML uses data and results to infer the rules behind a problem. This paper gives an overview of ML basics, types of algorithms, and applications. It also covers some open-source libraries that are used for preprocessing, analyzing, and extracting details from images with the help of ML. Although the paper does not cover this substantial subject exhaustively, it will hopefully clarify the basic concepts and provide useful information.