Preface

I was first exposed to the term network science in 2008, when I moved to Dublin to conduct post‐doctoral research at Dublin City University in partnership with Eircom. At that time, the field was more commonly referred to as network analysis, or social network analysis. At first glance, I thought it was related to social media, such as Facebook, Twitter, LinkedIn, and Instagram, along with many others. Many people likely share this misunderstanding. When we mention social network analysis, most people immediately think of social media, or of the analysis of social interactions.

Network science involves several disciplines, such as social sciences, graph theory, mathematical modeling, statistical analysis, and optimization, to name a few. Assuming that everything is connected, the main goal is to solve complex problems or to understand a scenario from a unique perspective. When I say everything is connected, I mean that most real‐world problems can be analyzed from a network perspective. Descriptive attributes are linked to constraints or restrictions. These are linked to possible outcomes or targets, which are linked to goals and solutions, and which are ultimately linked to the problems. Even traditional approaches such as predictive modeling can be understood as a network, where input or independent variables are linked to output, target, or dependent variables. How strongly or weakly are the inputs connected to each other? How strongly or weakly are they connected to the target? Surrogate variables can be connected to the original input variables. Network metrics, or centralities, can turn out to be the most important and relevant predictors in supervised modeling. It can definitely be a more complex approach, but it certainly gives us opportunities for a more comprehensive understanding of the problem and of the possible optimal solutions.

For example, in association rules, items are correlated with each other based on the transactions in which they appear together. These correlations create rules. Without any notion of time, the underlying co‐occurrences are symmetric, so the links between items have no direction. Similar correlations can be visualized as a network, where items are connected according to how frequently they appear together, which defines distinct weights for the links among them. The weighted links between items define the importance of the relationships, similar to how confidence and support define the importance of association rules. This is a very straightforward network analysis that can be conducted in a way similar to association rules. As in sequence analysis, if we have the time identifier, or the information about the sequence of the transactions, we can also define the direction of the links between the items. We can then produce an analysis similar to sequence association rules, but again, from a network perspective. Something that we cannot see in association rules or in sequence analysis is the strength of the weak links, or the missing links. Imagine that from the association rules we find two strong rules: Coke associated with Lays, and Pepsi associated with Lays. There is no association between Coke and Pepsi. In network analysis, we would see a “triangle” with a missing link. There would be a link between Coke and Lays and a link between Pepsi and Lays, but no link between Coke and Pepsi. Perhaps this missing link indicates they are surrogate products. Speaking about Coke and Pepsi, it is easy to figure that out. But what about a grocery store with tens of thousands of items?
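The co‐occurrence network and its missing links can be sketched in a few lines. The book's own examples use SAS Viya procedures. The sketch below instead uses plain Python with only the standard library. The baskets, the function names `cooccurrence_network` and `open_triads`, and the `min_weight` threshold are all hypothetical and were made up for illustration. Link weights are simple pair counts (the support count of each pair). An "open triad" here is a hub item strongly linked to two items that never appear together, which is the Coke/Lays/Pepsi pattern described above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(transactions):
    """Build a weighted, undirected item network.

    Edge weight = number of baskets in which both items appear
    (the support count of the pair). Keys are alphabetically sorted pairs.
    """
    weights = Counter()
    for basket in transactions:
        # set() counts each item once per basket; sorting fixes the pair order
        for a, b in combinations(sorted(set(basket)), 2):
            weights[(a, b)] += 1
    return weights


def open_triads(weights, min_weight=1):
    """Return (x, y, hub) where x-hub and y-hub are strong links
    (weight >= min_weight) but x and y never co-occur at all."""
    neighbors = {}
    for (a, b), w in weights.items():
        if w >= min_weight:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    triads = []
    for hub, nbrs in neighbors.items():
        for x, y in combinations(sorted(nbrs), 2):
            # (x, y) is already sorted, so it matches the key format above
            if weights.get((x, y), 0) == 0:
                triads.append((x, y, hub))
    return triads


# Hypothetical baskets, made up for illustration only.
baskets = [
    ["Coke", "Lays"],
    ["Coke", "Lays"],
    ["Pepsi", "Lays"],
    ["Pepsi", "Lays"],
    ["Coke", "Water"],
]
net = cooccurrence_network(baskets)
print(open_triads(net, min_weight=2))  # → [('Coke', 'Pepsi', 'Lays')]
```

With `min_weight=2`, Coke and Pepsi share the strong neighbor Lays but never co‐occur, so the sketch flags that missing link. Whether such a pair really consists of surrogate products is still an interpretation the analyst has to check.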

In business problems there is always a question to be answered, an operational task to be improved, a challenge to be overcome, or an insight to be produced. Are the customers willing to purchase this specific product? How likely are they to purchase it? How do subscribers consume this specific service? Are they willing to increase or decrease their usage? How likely? How can we improve a telecommunications network based on the customers' usage? How can we improve our supply chain across all stores based on past purchases? Can we describe a specific economic scenario considering different countries, states, or cities? Can we explain cause and effect of complex events in politics, international trade relations, or immigration? Network science works nicely to describe social relationships. However, the term social can mean almost anything here. Social can be people, employees, companies, countries, equipment, products, services, governments, or a combination of them. Network science also works well as an exploratory analysis tool, clearly describing complex scenarios, particularly when relations between entities play a key role.

How do we answer business questions? We often answer questions, solve problems, or explain business scenarios by working on the data that describe that problem or scenario. Machine learning models adjust mathematical and statistical equations according to the data available. They rely on the data distribution, the types of the variables, their variation, and many other aspects of the data. Different models work better for specific types of data, but all models require data to find the right correlations between the problem and the solution. The nice thing about network science is that we can create different networks from the same set of data. Each network yields a distinct exploratory model and, in turn, different outcomes. The way we translate the data into nodes and links, the way we use the data to define the relationships between them, and the way we weight the nodes and links all change the input network and therefore the results. For instance, based on the CDRs (Call Detail Records), we can define a network where the mobile phone is a node, a household is a node, or a switch is a node. The way these nodes are related also depends on the way we want to envision the network. The links can be calls and messages, physical connections of the telecommunications network, or people moving around and being handled by different cell towers. This flexibility in building different “inputs” and performing multiple exploratory models gives us more possibilities to describe and understand the problem, and certainly more options to find viable solutions.
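Here is a minimal sketch of building several networks from one dataset, again in plain Python rather than the SAS procedures used in the book. The CDR tuples, the phone‐to‐household lookup, and the `build_network` helper are hypothetical. The same records produce either a phone‐level network or a household‐level network, depending only on the function that maps each phone to a node.

```python
from collections import Counter

# Hypothetical CDRs: (caller, callee, event_type). Made up for illustration.
cdrs = [
    ("p1", "p2", "call"),
    ("p1", "p2", "sms"),
    ("p2", "p3", "call"),
    ("p3", "p4", "call"),
    ("p4", "p3", "sms"),
]

# Hypothetical lookup from phone to household.
household_of = {"p1": "h1", "p2": "h1", "p3": "h2", "p4": "h3"}


def build_network(records, node_of=lambda phone: phone):
    """Build a directed, weighted network from CDR tuples.

    Weight = number of calls plus messages from node u to node v.
    node_of maps each phone to the chosen node type (phone, household, ...).
    event_type is ignored here; it could be used to weight calls and
    messages differently.
    """
    edges = Counter()
    for caller, callee, _event_type in records:
        u, v = node_of(caller), node_of(callee)
        if u != v:  # drop self-loops created by aggregating phones into nodes
            edges[(u, v)] += 1
    return edges


phone_net = build_network(cdrs)                              # phones as nodes
household_net = build_network(cdrs, node_of=household_of.get)  # households as nodes
print(dict(household_net))
```

Dropping self‐loops is a design choice of this sketch. Calls between two phones in the same household disappear from the household network, which may or may not be what a given analysis needs.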

The first real problem I tackled in network science was in Dublin, at Dublin City University, in partnership with Eircom. We had a wonderful challenge: to better understand the churn event at Eircom, a major telecommunications company in Ireland. We asked ourselves whether churn could occur as a viral event. When subscribers decide to leave, do they influence other subscribers to leave as well? To answer this question, we needed to understand the subscribers' relationships. After all, this is what communications providers do, right? They allow people to connect to each other. So we obtained the fundamental data from the carrier: calls and text messages. This data describes in detail when and how one subscriber connects to another. Based on this data, we built the network, considering all subscribers and all relationships. In addition, we considered the churn event over time. By doing this, we could monitor what happened when a subscriber decided to leave. What happened with their friends, relatives, co‐workers, and so on? Did they leave afterwards? We investigated a substantial amount of data, over a reasonable timeframe, to understand the overall viral effect some subscribers could exert over others. We used the traditional transactional and demographic data about the customers. We also described them in terms of their network centralities (what made them valuable as nodes within the network) and in terms of communities (how they were grouped together based on their relationships). In the end, a classification model was trained to estimate the likelihood of each subscriber acting as an influencer in the churn event. We also noticed that some subscribers can be influencers in one specific business event but not in another. That is, the characteristics of influential subscribers differ from one business event to another. The patterns of influence in churn were different from those in purchasing, consumption, or product adoption.

After that experience, I had the privilege and luck to work on many projects involving network science, looking at vastly different types of problems in a variety of industries. Each project is unique in terms of the particular problem, the best possible solution, or the data available at the time. Even so, most of them search for solutions in the same space. The solutions range from traditional business demands, such as avoiding churn and boosting product adoption in communications and retail, to detecting fraud in finance, insurance, communications, taxation, and consumer goods. Other projects involved the following tasks:
- searching for the optimal learning path, based not on the content or subject of the courses but on the relationships among courses created by student enrollments;
- evaluating and understanding the players in economic trading among government agencies;
- searching for the main actors in illicit trade;
- finding the best delivery routing in wholesale, from depot to stores;
- optimizing workforce scheduling in restaurants and hotels;
- finding optimal routes using public transportation systems;
- understanding virus spread and predicting new outbreaks;
- foreseeing population movements and their impact on society;
- evaluating urban mobility to create solutions for specific big events and unexpected situations.

The more I work in network science across different industries with distinct customers, the more I believe it is part of an overall solution for complex problems.

I have worked on many projects in network science. Sometimes network analysis and network optimization were the whole solution to the problem. Sometimes these techniques were part of the solution, either by identifying new inputs to supervised or unsupervised machine learning models or simply by revealing new insights about the problem. After those projects, I decided to extend the work I started back in 2010, when I wrote Social Network Analysis in Telecommunications, also published by Wiley. This book describes in detail most of the important algorithms in subnetwork analysis, centrality metrics, and network optimization. In addition to covering the fundamental math behind the algorithms, I tried to emphasize a holistic explanation of each algorithm: why it is used, how it can be used, the benefits of using it, and so on. Each algorithm in the book is associated with a business description and a possible application to solve complex real‐world problems. Throughout the book, I have included code examples for each algorithm and, most importantly, the analysis of their outcomes. Together they form a step‐by‐step approach that, in my view, makes it easier to understand each technique, how to deploy it, and some possible ways to interpret the results. At the end of the book, I share some real‐world case studies to demonstrate how beneficial network science can be in industry, government, and society.

All code examples in this book are based on SAS Viya, a new in‐memory distributed engine for analytics. This environment is well suited to network science. Most network analysis and network optimization algorithms can benefit from a distributed computing environment, which lets them run faster on large networks. I have been using SAS, mostly as a customer, since 2002. I have been using the network procedures at SAS (formerly OptGraph and OptNet) since 2010. Most of my network science projects took place before I joined SAS at the end of 2015, when I was a customer, a partner, or a researcher. Viya was released around 2016. Since then, I have been truly amazed by how fast and reliable the environment is, and by how much I benefit from it when running network analysis and network optimization algorithms. The new procedures for the Viya platform are named Network and OptNetwork. We are going to use them a lot throughout this book.

For me, at the end of the day, it is hard to think of a problem that cannot benefit from network science, either as a straightforward solution or as part of a broader approach to solving complex problems. Even when I engage in what initially looks like a traditional problem, where machine learning or statistical modeling can be the solution, I always envision a network and always search for the nodes and links. In my opinion, everything is connected. Finding the dots and connecting them can be a great analytical approach to solving real‐world problems.

I hope you enjoy reading this book, and that it will be helpful to you in some way.

Carlos Andre Reis Pinheiro

March 2022
