8
Modified Cross-Sell Model for Telecom Service Providers Using Data Mining Techniques

K. Ramya Laxmi1*, Sumit Srivastava2, K. Madhuravani1, S. Pallavi1 and Omprakash Dewangan3

1Department of CSE, Sreyas Institute of Engineering and Technology, Nagole, Hyderabad, India

2Dept. of Computer Science & Engineering, Birla Institute of Technology, Mesra, Ranchi, India

3CSE, Kalinga University, Naya Raipur, India

Abstract

Intensified competition and frequent shifting of the customer base for fixed-line telecom service providers in recent years have increased the need for better targeting and segmenting of prospects and customers for cross-selling and up-selling of products and services. Telecom service providers now understand that old-fashioned marketing is no longer an option because of the abysmally low hit rates in customer targeting and the consequently low Return on Investment. Decision-makers in most fixed-line telecom operators are now of the view that better and more accurate targeting of customers is only possible with accurate predictive analytics and data mining. A logistic regression algorithm has been used in this case study to identify those customers with the highest propensity to buy new products and services.

Keywords: Cross-sell model, data mining techniques, logistic regression algorithm

8.1 Introduction

The customer base is the gold mine of fixed-line telecom companies. Across the Asia Pacific region, the telecom sector has witnessed dramatic changes in the past 15 years, owing to improvements in technology and socio-economic conditions. As a result, there has been a manifold increase in the customer base of the telecom providers [1].

In the past, the fixed-line telecom operators, particularly in the Asian region, rarely engaged in marketing activities to manage customer relationships. With the emergence of many new market players in the past decade and a half, catering to diversified bouquets and offerings, the need for customer focus has increased greatly. Intensifying competition and a multitude of choices have also resulted in frequent shifting of the customer base in the past few years [2, 3].

By better targeting and segmenting prospects and customers, fixed-line telecom operators can:

  • Identify more sales-ready prospects.
  • Enhance customer relationships and profitability.
  • Capitalize on Marketing Return on Investment.
  • Optimize marketing campaigns and performance.

Decision-makers in most fixed-line telecom operators are now of the view that all of the above is only possible with accurate predictive analytics and data mining [4, 5].

This work analyzes the problem of predicting students’ academic performance, motivation, classroom management, interaction, and related factors, a problem that is increasingly investigated within the Educational Data Mining literature.

The proposed system has the following steps [6, 7]:

  • Dataset
  • Pre-processing
    ◦ Splitting
    ◦ Data Cleaning
    ◦ Data Conversion
  • Data Mining Feature Extraction
    ◦ Itemset
    ◦ Frequent Itemset
    ◦ Closed Frequent Itemset
    ◦ Support
    ◦ Confidence
    ◦ Lift
  • Ranking
    ◦ Entropy
  • Performance Analysis Training
    ◦ DH-DLNN
  • Testing
    ◦ K-Fold Testing

1. Dataset

Data collection tools in the study include a self-structured questionnaire covering all the dimensions related to use, increased access, knowledge building, learning, performance, motivation, classroom management and interaction, collaborative learning, and satisfaction [8].

Pre-processing: The first step in the proposed system is pre-processing of the dataset. The data must be prepared carefully so that the analysis does not produce misleading results [9].

Splitting: This is the first step in the pre-processing phase. Splitting breaks the values into words.

Data Cleaning: Next, unwanted words are removed from the dataset.

Data Conversion: The attribute values in the dataset are in string format, but the system can process only numerical values. Each string in the dataset is therefore converted into a corresponding numerical value [10].
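As an illustration of the data conversion step, the following minimal Python sketch maps string attribute values to integer codes. The column names and values (`motivation`, `satisfaction`) are hypothetical, and the use of simple integer label encoding is an assumption; the chapter does not state which encoding was applied.

```python
def encode_column(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical questionnaire answers; not taken from the chapter's dataset.
records = [
    {"motivation": "high", "satisfaction": "yes"},
    {"motivation": "low", "satisfaction": "no"},
    {"motivation": "high", "satisfaction": "no"},
]

encoded_columns = {}
code_books = {}
for column in records[0]:
    encoded_columns[column], code_books[column] = encode_column(
        [row[column] for row in records]
    )
```

Keeping the code book for each column makes it possible to translate model output back into the original labels.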

2. Data Mining Feature Extraction

After pre-processing, data mining features such as itemset, frequent itemset, closed frequent itemset, support, confidence, and lift are extracted from the dataset [11].
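The three rule measures named above have standard definitions, which the sketch below computes over a small set of transactions. The transactions and item names are hypothetical and serve only to make the arithmetic concrete.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )


# Hypothetical transactions (sets of items); not from the chapter's data.
transactions = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
]

rule_support = support(transactions, {"a", "b"})
rule_confidence = confidence(transactions, {"a"}, {"b"})
rule_lift = lift(transactions, {"a"}, {"b"})
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if independent; a lift below 1, as in this toy data, indicates the opposite.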

3. Ranking

Next, the ranking is calculated from features such as support, confidence, and lift using the Entropy technique [12].
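The chapter does not specify which entropy technique is used. One common choice is the entropy weight method, sketched below under that assumption: each criterion (support, confidence, lift) receives a weight that grows with how much its values differ across rules, and rules are ranked by the weighted sum of their normalized values. The function names and the sample triples are hypothetical, and the sketch assumes strictly positive inputs and more than one rule.

```python
import math


def entropy_weights(rows):
    """Entropy weight method: criteria whose values vary more get larger weights."""
    n = len(rows)
    n_criteria = len(rows[0])
    divergences = []
    for j in range(n_criteria):
        column = [row[j] for row in rows]
        total = sum(column)
        shares = [x / total for x in column]
        # Normalized Shannon entropy in [0, 1]; zero shares contribute nothing.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


def rank_rules(rows):
    """Return rule indices sorted from highest to lowest weighted score."""
    weights = entropy_weights(rows)
    totals = [sum(row[j] for row in rows) for j in range(len(weights))]
    scores = [
        sum(w * x / t for w, x, t in zip(weights, row, totals)) for row in rows
    ]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Hypothetical (support, confidence, lift) triples for three rules.
rules = [
    (0.50, 0.67, 0.89),
    (0.25, 0.90, 1.80),
    (0.10, 0.40, 0.70),
]
weights = entropy_weights(rules)
ranking = rank_rules(rules)
```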

4. Performance Analysis Training

Finally, the performance is analyzed using the DH-DLNN algorithm, which trains on the dataset using the ranked features. The weight values are optimized using the Deer Hunting Optimization Algorithm to reduce the backpropagation problem in the Artificial Neural Network algorithm [13–15].

5. Testing

After the training process is complete, testing is performed. Here, 80% of the data is used for training and 20% for testing [16].
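An 80/20 division corresponds to one fold of 5-fold cross-validation, which may be what the "K-Fold Testing" item in the step list refers to; that connection is an assumption. The sketch below generates the index splits, with `k_fold_indices` being a hypothetical helper name.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=100, k=5))
```

Each sample appears in exactly one test fold, so every record is used for testing once and for training in the remaining folds.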

8.2 Literature Review

This section reviews reports on the growth of the telecom industry, beginning with the report by ASA & Associates on the telecom sector. Over the last few decades, the telecom sector has expanded significantly, with steadily increasing network coverage acting as a catalyst for growth in the subscriber base. The growth story and potential of the telecom industry have attracted new players. See also Top 10 risks in telecommunications 2014, Ernst & Young [17]. According to Predictive analytics for Telecommunications service providers (RED giant Inc., Madison, WI 53713 USA, February 2012), the service provider is responsible for identifying the most desirable segments for economic growth [18].

The paper ‘Predictive Analytics: A Game-Changer for Telcos’ explains how a profitable business can be built and how revenue can be enhanced. Increasing the number of subscribers poses many challenges; modified cross-selling activities help considerably in generating revenue.

The paper ‘Telecom Companies Tap Analytics for Growth’ (Knowledge@Wharton and Wipro, August 2014) notes that telecom service providers across the world face an enormous challenge. Analytics can help provide a solution by monetizing the huge data pools. Telecom service providers are looking at how best they can bundle their services and products and improve revenues and profitability.

‘Cross-Sell and Up-Sell for Telecommunications’ by SAS describes how to determine the next offer for customers using sophisticated analysis. The solution integrates seamlessly with the other SAS intelligence solutions. Up-sell and cross-sell analysis is used in customer retention strategies to support rapid growth in the telecommunication industry.

In the academic literature, the author of [19] examines several studies that attempt to classify students in order to predict their final grade from features extracted from the data logged in educational web-based systems. Combining several classification models contributed to significant progress in classification performance by weighting the feature vectors. The author's line of research in data mining aims to find viable ways to give the managers of higher education institutions enough knowledge to make better decisions within a limited timeframe, which was previously difficult or impossible given the huge datasets and earlier techniques. The aim is therefore to develop an approach for understanding students’ opinions, satisfaction, and dissatisfaction with every element of the educational process, for predicting their preference for particular fields of study, their intention to continue their education, and higher-education dropout, and for establishing an accurate correspondence between their views and the requirements of the labor market. Some of the most interesting data mining initiatives in the education sector are covered in this section. The author applies these ideas to educational problems using specific data mining techniques [20].

Knowledge gives power in many real-world settings, enabling and facilitating the preservation of valuable heritage, new learning, the solving of complex problems, the creation of core competencies, and the opening of new opportunities for both individuals and organizations, now and in the future [21]. The huge amounts of data in databases, which contain large numbers of records and attributes, make manual analysis impractical when searching for useful information and knowledge. All these factors show the need for intelligent and automated data analysis methods that can discover useful knowledge from data. Knowledge discovery in databases and data mining have accordingly become important tools for achieving the goal of intelligent and automated data analysis [22, 23].

Data mining is a carefully planned application of statistical and machine learning techniques and tools within the space of analytical processes, and can be viewed as a process of deciding what will be most useful, promising, and revealing. Detailed surveys of data mining tools and their applications are available in the literature. Broadly, the major tasks of data mining are predictive and descriptive tasks within a discovery-oriented data mining framework [24, 25].

Data mining [26]:

Data mining is the core of the knowledge discovery process. It involves inferring algorithms that explore the data, develop the model, and discover previously unknown patterns. Data mining is a multidisciplinary field that combines statistics, machine learning, artificial intelligence, and database technology to extract high-level knowledge from real-world data sets. It involves determining patterns from, or fitting models to, observed data. It is a complex data analysis technique that focuses on exploration and develops new insights to support decision-making. The extracted information helps to identify trends, build a prediction or classification model, and summarize a database.

Data-Mining Techniques [27, 28]:

The data-mining techniques include:

  • Classification/Regression: the discovery of a model or function that maps objects into predefined classes (classification) or suitable values (regression). The model/function is computed on a training set (supervised learning).
  • Clustering: identification of a finite set of categories or clusters to describe the data.
  • Summarization: finding a compact description for a subset of the data, such as the derivation of summary or association rules and the use of multivariate analysis methods.
  • Dependency Modeling: finding a local model that describes significant dependencies among variables, or among the values of a variable, in a data set or part of a data set.
  • Sequential Patterns: discovery of frequent subsequences in a collection of sequences (sequence database), each representing a set of events occurring at subsequent times. The ordering of the events in the subsequences is relevant.
  • Change and Deviation Detection: identifying the most significant changes in the data relative to previously measured or normative values.

8.3 Methodology and Implementation

8.3.1 Selection of the Independent Variables

In most cross-sell or up-sell predictive modeling situations where either logistic regression or some other competing data mining paradigm is the tool of choice, the analyst has several independent variables to choose from. These variables may already exist in the database or may be derived from the existing variable list [29].

Logistic regression predicts the dependent variable on the basis of continuous and/or categorical independent variables. It is also used to determine the percentage of variance explained and to rank the relative importance of the explanatory (independent) variables [30].

The simplest logistic regression equation is represented as Equation (8.1):

(8.1) [equation image not available in the source]
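For reference, the standard textbook form of the binary logistic regression model is given below. This is a general statement rather than a recovery of the chapter's own Equation (8.1), and the notation is an assumption: $p$ is the probability that the dependent variable equals 1, $x_1, \dots, x_k$ are the independent variables, and $\beta_0, \dots, \beta_k$ are the coefficients to be estimated.

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
```

The left-hand form shows that the log-odds are linear in the independent variables, which is why each exponentiated coefficient $e^{\beta_i}$ can be read as an odds ratio.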

The output of the SAS PROC LOGISTIC procedure provides several tools for fitting regression models to a dataset and assessing the results. It reports both the overall fit of the model and the statistical significance of each independent variable. Since cross-sell and up-sell efforts have an economic component, i.e., what it will cost to cross-sell a product to a client, odds ratios make it much easier to grasp the impact of the parameters on a cross-sell or up-sell effort. The confusion matrix that PROC LOGISTIC generates gives an idea of model accuracy.

Model Process Life Cycle

For this project, we did not have real-time data, so we created dummy data with dummy variables while incorporating real-time logic. The variables required for this model were chosen through extensive research (reviewing many articles and research papers), brainstorming with peers, and our own experience. The model process life cycle is represented in Figure 8.1 [31, 32].

Two types of variables were created for the modeling:

  • Dependent/response variable:
    ◦ Propensity Buy: takes two values, 1 and 0.
      ∙ 1 indicates a customer who will buy new products (no. of products > 1).
      ∙ 0 indicates a customer who will not buy new products (no. of products ≤ 1).
  • Independent variables:
    ◦ The list of independent variables was created based on extensive research and experience. These variables broadly fall under the following categories:
      ∙ Basic customer information
      ∙ Demography
      ∙ Product usage
      ∙ Payment behavior
      ∙ Complaint variables

Figure 8.1 Process life cycle.

  • Data Exploration
    This stage usually starts with data preparation, which involves cleaning data, transforming data, selecting subsets of records, etc. Then, depending on the nature of the analytic problem, this first stage of the data mining process may involve anything from simple to elaborate methods for identifying the most relevant variables. It is also used to determine the complexity and/or general nature of the models that can be considered in the next stage.
  • Sampling
    Data sampling is a statistical analysis technique used to select, manipulate, and analyze a representative subset of data points in order to identify patterns and trends in the larger data set being examined.
  • Logistic regression
    Using the variables short-listed in the above step, a first-cut logistic regression model is developed. The model is developed in a step-wise fashion, introducing the significant variables one by one in the order of their significance. The model retains only those variables that have a significant effect on the probability to buy, controlling for the effects of all other variables in the model.
  • Confusion Matrices
    Misclassification rate: The misclassification rate is the proportion of observations allocated to the incorrect group. It is calculated from the confusion matrix represented in Figure 8.2 as follows:
    (False Positive + False Negative) / Total number of classifications

Figure 8.2 Confusion matrices.
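The misclassification rate formula can be sketched in a few lines of Python. The four counts below are hypothetical values chosen to sum to 3,000; they are not results reported in the chapter.

```python
def misclassification_rate(tp, fp, fn, tn):
    """(False positives + false negatives) divided by all classified cases."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (fp + fn) / total


# Hypothetical confusion-matrix counts for 3,000 customers.
rate = misclassification_rate(tp=300, fp=150, fn=102, tn=2448)
accuracy = 1.0 - rate
```

Accuracy is the complement of the misclassification rate, so either figure can be reported from the same four counts.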

8.4 Data Partitioning

Data partitioning provides mutually exclusive training and validation datasets and helps in validating the built model. Keeping industry best practices in mind, the data has been partitioned into Training and Validation on a 70:30 basis. The buyer and non-buyer counts in the training and validation datasets show approximately similar buyer count percentages (Table 8.1).
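A 70:30 partition of this kind can be sketched as a shuffled split. The code below is an assumption about the mechanics (simple random partitioning with a fixed seed); it uses dummy response flags whose totals match Table 8.1, with `partition` and `buyer_rate` as hypothetical helper names.

```python
import random


def partition(records, train_fraction=0.7, seed=42):
    """Shuffle and split records into mutually exclusive training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def buyer_rate(records):
    """Percentage of records whose response flag is 1."""
    return 100.0 * sum(records) / len(records)


# Dummy response flags: 1,398 buyers among 10,000 customers (totals from Table 8.1).
flags = [1] * 1398 + [0] * 8602
training, validation = partition(flags)
print(f"training: {buyer_rate(training):.2f}% buyers, "
      f"validation: {buyer_rate(validation):.2f}% buyers")
```

Comparing the two buyer percentages is a quick check that neither partition is badly unrepresentative of the full sample.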

8.4.1 Interpreting the Results of Logistic Regression Model

The results of logistic regression on the training data set reveal that, out of the selected independent variables, seven variables are found to be significantly related to the Customer’s propensity to be cross-sold or up-sold (Table 8.2).

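Table 8.1 Buyer counts.

| Category | Training frequency | Training percentage | Validation frequency | Validation percentage |
|---|---|---|---|---|
| Buyer | 996 | 14.23 | 402 | 13.40 |
| Non-buyer | 6,004 | 85.77 | 2,598 | 86.60 |
| Total | 7,000 | 100 | 3,000 | 100 |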

Table 8.2 Analysis of maximum likelihood estimates.

| Parameter | DF | Estimate | Standard error | Wald chi-square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | −7.5185 | 275.7 | 0.0007 | 0.9782 |
| Number of late payments | 1 | −0.5611 | 0.0309 | 329.8319 | <.0001 |
| Proportion of bill paid | 1 | 5.7902 | 0.3353 | 298.2068 | <.0001 |
| Location/Geography | 1 | 0.4348 | 0.0835 | 27.1011 | <.0001 |
| Mean bill complaint closure time | 1 | −1.3771 | 0.0967 | 202.9235 | <.0001 |
| (Repair time/Assurance time) for bill complaint | 1 | −3.4687 | 0.4491 | 59.6525 | <.0001 |
| (Repair time/Assurance time) for service complaint | 1 | −4.0633 | 0.4123 | 97.1184 | <.0001 |
| (Repair time/Assurance time) for technical complaint | 1 | −0.2796 | 0.0906 | 9.5254 | 0.0020 |
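The estimates in Table 8.2 can be turned into odds ratios by exponentiation, and each Wald chi-square can be cross-checked as the squared ratio of estimate to standard error. The sketch below does both for the seven predictors; the shortened parameter labels are for display only, and the odds-ratio interpretation assumes a one-unit increase in the variable with all others held fixed.

```python
import math

# (parameter, estimate, standard error, Wald chi-square) as reported in Table 8.2.
table_8_2 = [
    ("Number of late payments", -0.5611, 0.0309, 329.8319),
    ("Proportion of bill paid", 5.7902, 0.3353, 298.2068),
    ("Location/Geography", 0.4348, 0.0835, 27.1011),
    ("Mean bill complaint closure time", -1.3771, 0.0967, 202.9235),
    ("Repair/assurance time, bill complaint", -3.4687, 0.4491, 59.6525),
    ("Repair/assurance time, service complaint", -4.0633, 0.4123, 97.1184),
    ("Repair/assurance time, technical complaint", -0.2796, 0.0906, 9.5254),
]

summary = []
for name, estimate, std_error, reported_wald in table_8_2:
    odds_ratio = math.exp(estimate)      # multiplicative change in the odds of buying
    wald = (estimate / std_error) ** 2   # Wald chi-square with 1 degree of freedom
    summary.append((name, estimate, odds_ratio, wald, reported_wald))

for name, estimate, odds_ratio, wald, reported_wald in summary:
    print(f"{name}: odds ratio {odds_ratio:.4f}, "
          f"Wald {wald:.2f} (table: {reported_wald})")
```

For example, the estimate of −0.5611 for the number of late payments corresponds to an odds ratio of about 0.57, so each additional late payment is associated with roughly 43% lower odds of buying. The recomputed Wald values differ slightly from the table because the published estimates and standard errors are rounded.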

8.5 Conclusions

The model has been very successful in classifying the best prospects from among the sample dataset of customers for cross-selling and up-sell campaigns. The expected rate of purchase for these prospects is approximately seven times more than the overall expected rate of purchase in the sample data. The predicted probability that these prospects would respond to a dedicated campaign is approximately 0.97, which means that 97 out of 100 prospects are likely to respond to a campaign positively. Most of the parameters that influence the Customer’s propensity to purchase belong to the category of “Customer experience” with the service provider.
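One way to read the seven-fold figure is as the ratio of the 0.97 predicted probability to the overall buyer rate in the sample. The sketch below works through that arithmetic using the counts in Table 8.1; treating the two quantities as directly comparable is an assumption, since the chapter does not show how the ratio was derived.

```python
# Buyer counts from Table 8.1 (training + validation).
buyers = 996 + 402
customers = 7000 + 3000
overall_rate = buyers / customers  # overall purchase rate in the sample

# Predicted response probability for the best prospects, as stated in the conclusions.
top_prospect_rate = 0.97

lift_over_baseline = top_prospect_rate / overall_rate
print(f"overall rate = {overall_rate:.4f}, lift = {lift_over_baseline:.2f}")
```

With an overall buyer rate of 13.98%, the ratio comes out just under 7, consistent with the "approximately seven times" statement.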

References

1. Srivastava, R., Bangle, J., Somaiya, K.J., Role of Competition in Growing Markets: Telecom Sector. Indian J. Market., XXXVI, 9, 50–62, September 2006.

2. Sinha, S.K. and Wagh, A., Analyzing Growth of Cellular Telecom Sector and Understanding Consumer’s Preferences and Choices on the Use of Cell phone. Indian J. Market., VII, 9, 39–47, September 2008.

3. Anderson, J., Developing a route to market strategy for mobile communications in rural-India, Int. J. Emerg. Mark., 3, 2, 22, 2008.

4. Banumathy, S. and Kalaivani, S., Customers’ Attitude Towards Cell phone in Communication System. Indian J. Market., VI, 3, 129–136, March 2006.

5. Shankar, R., Innovation in the Indian Telecom Industry, IJBARR and Banglore, India, Feb 2006.

6. Piatetsky-Shapiro, G. and Masand, B., Estimating campaign benefits and modeling lift. Proceedings of KDD-99 Conference, ACM Press, 1999.

7. Larose, D.T., Data mining methods and models, A John Wiley & Sons, Inc. Publication, Hoboken, NJ, USA, 2006.

8. Rusu, L. and Breşfelean, V.P., Management prototype for universities. Annals of the Tiberiu Popoviciu Seminar, Supplement: International Workshop in Collaborative Systems, vol. 4, Mediamira Science Publisher, Cluj-Napoca, Romania, pp. 287–295, 2006.

9. Shahiri, A.M. and Husain, W., A review on predicting student’s performance using data mining techniques. Proc. Comput. Sci., 72, 414–422, 2015.

10. Sinha, A.P. and Zhao, H., Incorporating domain knowledge into data mining classifiers: An application in indirect lending. Decis. Support Syst., 46, 1, 287–299, 2008.

11. Sørebø, Ø., Halvari, H., Gulli, V.F., Kristiansen, R., The role of self-determination theory in explaining teachers’ motivation to continue to use e-learning technology. Comput. Educ., 53, 1177–1187, 2009.

12. Tomasevic, N., Gvozdenovic, N., Vranes, S., An overview and comparison of supervised data mining techniques for student exam performance prediction. Comput. Educ., 143, 103676, 2020.

13. Tso, G.K.F. and Yau, K.K.W., Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks. Energy, 32, 1761–1768, 2007.

14. Universitatea Babes-Bolyai Cluj-Napoca, Romania. Programul Strategic al Universitatii Babes-Bolyai (2007–2011), Nr.11.366, 1 August 2006.

15. Vandamme, J.P., Meskens, N., Superby, J.F., Predicting Academic Performance by Data Mining Methods. Educ. Econ., 15, 4, 405–419, 2007.

16. Vanderlinde, R., Aesaert, K., van Braak, J., Measuring ICT use and contributing conditions in primary schools. Br. J. Educ. Technol., 46, 5, 1056–1063, 2015.

17. Vidya, Y. and Shemimol, B., Secured Friending in Proximity-based Mobile Social Network. J. Excell. Comput. Sci. Eng., 1, 2, 1–10, 2015.

18. Waheed, H., Hassan, S., Aljohani, N.R., Hardman, J., Alelyani, S., Nawaz, R., Predicting academic performance of students from VLE big data using deep learning models. Comput. Hum. Behav., 104, 106189, 2020.

19. Wan, S. and Lei, T.C., A knowledge-based decision support system to analyze the debris-flow problems at Chen-Yu-Lan River, Taiwan. Knowledge-Based System, 22, 8, 580–588, 2009.

20. Wang, H. and Wang, S., A knowledge management approach to data mining process for business intelligence. Ind. Manage. Data Syst., 108, 5, 622–634, 2008.

21. Witten, I.H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., Morgan Kaufmann series in Data Management Systems, Elsevier Inc., New Zealand, 2005.

22. Xu, J., Moon, K.H., Van Der Schaar, M., A machine learning approach for tracking and predicting student performance in degree programs. IEEE J. Sel. Top. Signal Process., 11, 5, 742–753, 2017.

23. Yadav, R.S., Application of hybrid clustering methods for student performance evaluation. Int. J. Inf. Technol., 21, 1–8, 2018.

24. Yousafzai, B.K., Hayat, M., Afzal, S., Application of machine learning and data mining in predicting the performance of intermediate and secondary education level student. Educ. Inf. Technol., 25, 1–21, 2020.

25. Zughoul, O., Momani, F., Almasri, O.H., Zaidan, B.B., Alsalem, M.A., Albahri, O.S., Hashim, M., Comprehensive insights into the criteria of student performance in various educational domains. IEEE Access, 6, 73245–73264, 2018.

26. Pathak, S., Raja, R., Sharma, V., Ambala, S., ICT Utilization and Improving Student Performance in Higher Education. Int. J. Recent Technol. Eng. (IJRTE), 8, 2, 5120–5124, July 2019.

27. Laxmikant Tiwari, R., Vaibhav Sharma, R., Miri, R., Adaptive Neuro-Fuzzy Inference System Based Fusion Of Medical Image. Int. J. Res. Electron. Comput. Eng., 7, 2, 2086–2091.

28. Sumati Pathak, R., Vaibhav Sharma, R., Ramya Laxmi, K., A Framework Of ICT Implementation On Higher Educational Institution With Data Mining Approach. Eur. J. Eng. Res. Sci, 4.

29. Sumati Pathak, R. and Vaibhav Sharma, R., The Impact of ICT in Higher Education. IJRECE, 7, 1, 130–145, January–March, 2019.

30. Raja, R., Kumar, S., Rashid, Md., Color Object Detection Based Image Retrieval using ROI Segmentation with Multi-Feature Method. Wirel. Pers. Commun. Springer J., 5, 1–24, https://doi.org/10.1007/s11277-019-07021-6.

31. Raja, R., Shishir Sinha, T., Patra, R.K., Tiwari, S., Physiological Trait Based Biometrical Authentication of Human-Face Using LGXP and ANN Techniques. Int. J. Inf. Comput. Secur., 10, 2/3, 303–320, 2018.

32. Raja, R., Patra, R.K., Sinha, T.S., Extraction of Features from Dummy face for improving Biometrical Authentication of Human. Int. J. Lumin. Appl., 7, 3–4, Article 259, 507–512, 2017.

33. Raja, R., Sinha, T.S., Dubey, R.P., Soft Computing and LGXP Techniques for Ear Authentication using Progressive Switching Pattern. Int. J. Eng. Future Technol., 2, 2, 66–86, 2016.

  1. *Corresponding author: [email protected]