Preface

Bedir Tekinerdogan (a), Önder Babur (b), Loek Cleophas (b, c), Mark van den Brand (b), Mehmet Akşit (d)
(a) Information Technology Group, Wageningen University, Wageningen, The Netherlands
(b) Eindhoven University of Technology, Eindhoven, The Netherlands
(c) Stellenbosch University, Matieland, Republic of South Africa
(d) University of Twente, Computer Science, Formal Methods & Tools Group, Enschede, The Netherlands

In the context of increasingly large and complex software-intensive systems, modeling and model-based approaches are widely used to tackle their development and maintenance. Cyber-physical and automotive systems are two examples of such domains, where models are central and indispensable artifacts for the corresponding software. The widespread and large-scale application of modeling, in turn, leads to a deluge of models and other related artifacts. This observation leads to the main concern of the book: how to analyze and manage large collections of models in a scalable and efficient manner. Given that this subarea of model-driven and model-based software engineering research is still in its infancy, the book aims to pioneer and promote model management and analytics for large software systems. As a focused collection of chapters on various aspects of the topic, it targets researchers and practitioners, with the aim of fostering the cross-fertilization of new ideas and strengthening model management and analytics as an established research area.

Introduction

The trend of ever-increasing ubiquity of software is continuing, and even picking up pace with new advancements in areas including Industry 4.0, cloud computing, and Artificial Intelligence. Software plays a central role in those systems; as a result, growing systems and systems-of-systems naturally lead to larger and more complex software. This can be traced, for instance, in the evolution of the software in passenger cars: from hardly any software until a few decades ago to the current state, moving towards self-driving cars with hundreds of millions of lines of code and an elaborate network of numerous interacting software components.

Not just the development, but also the maintenance and, more generally, the engineering of those systems proves to be a challenge. Model-based and model-driven approaches have been introduced to tackle this challenge by employing models at higher levels of abstraction, rather than low-level general-purpose code. Indeed, model-* approaches have been successfully adopted in key industries, such as cyber-physical and automotive systems. Models corresponding to several domains and abstraction layers are used for a variety of purposes, ranging from the conceptualization of those large systems to simulation, verification, and automatic generation of software.

As model-* approaches are applied to larger problems and operational contexts, however, the complexity, size, and variety of the models and other related artifacts increase as well. Scalability with respect to (a few) large and complex models has been discussed in the literature, for instance for model comparison, merging, persistence, and transformation. Scalability with respect to model variety and multiplicity (i.e., dealing with a large number of possibly heterogeneous models), on the other hand, has so far remained mostly under the radar.

In this book, we advocate this aspect of scalability as a big challenge for the management, and therefore the broader adoption, of model-* approaches for large-scale systems. We can trace these challenges in open source, where there are tens of thousands of model artifacts in public repositories. On GitHub, for instance, the literature reports tens of thousands of UML models and Eclipse Modeling Framework (EMF) metamodels, with the numbers increasing rapidly over the years. In industry, even within single organizations with long-standing, large-scale model-* adoption, we observe a large (and increasing) number and heterogeneity of artifacts in model-* ecosystems. These include Domain-Specific Languages, the corresponding models, and transformations, all spanning multiple domains, technologies, and organizational units. Two orthogonal factors complicate the situation even more. First, the various artifacts constantly change and (co-)evolve, confirming (at least several of) Lehman's laws of software evolution. Second, for systems with implicit or explicit (e.g., as a Software Product Line) variability, variants act as another amplifying factor, besides versions, for the total number of model-* artifacts to manage. The scalability and management challenges are elaborated further with respect to various aspects in the individual chapters of the book. It is, however, evident that this upward trend towards a deluge of model-* artifacts is continuing. We expect it to accelerate further with recent developments such as low-code platforms and the growing popularity of intelligent techniques such as model learning and process mining in the model-* world.

Having set the scene, we argue that analyzing models to derive relevant information using traditional model management approaches does not scale to the current situation. For other types of artifacts, such as text documents, webpages, and more recently source code, the community recognized the challenge early on and has long developed efficient and scalable techniques for dealing with them. Importantly, model-* artifacts have certain distinguishing features, such as an underlying graph structure, various abstract and concrete syntaxes, and tool-specific representations. This might make it difficult to apply techniques from other domains directly to model-* approaches; therefore, it should be investigated how and to what extent the relevant techniques can be transferred into the model-* technical space.

We next give a nonexhaustive list of related areas that can serve as inspiration for the modeling domain. Statistical approaches can help analyze, classify, and ultimately manage large sets of model-* artifacts: descriptive ones, through empirical analyses that discover characteristics, patterns, and distributions, and predictive ones, together with advanced Machine Learning and data mining techniques. Indexing, storage, search, and retrieval in large repositories can be facilitated by information retrieval techniques adapted to this new type of artifact. Visualization, in the broadest sense, could be another essential ingredient for understanding large and heterogeneous sets of model-* artifacts. Natural language processing would be necessary to deal with the (potentially erroneous and ambiguous) textual content of those artifacts, which is inevitable in real-world applications. Finally, the challenges of computational complexity and large data sizes can be addressed through the effective use of distributed computing infrastructure and techniques, with an eye towards the Big Data implications for the future of model management and analytics.

Why a book on model management and analytics

There is a sizeable community of researchers and practitioners of model-* approaches. On the one hand, there are dedicated conferences such as the International Conference on Model Driven Engineering Languages and Systems (MODELS) and several others covering different aspects of the topic within the Software Technologies: Applications and Foundations (STAF) federation. There are also working groups and initiatives to advance and promote model-* approaches; examples include the Model-Based Systems Engineering initiative promoted by the International Council on Systems Engineering, and many others within the Object Management Group. However, there has been hardly any focused effort tackling the aforementioned problems of model-* approaches. As the topic is quite new, no book has yet been published on it. Moreover, only a handful of scientific papers and PhD theses somewhat related to the topic have been published, and these are too specialized for the broader public. A notable set of activities has been initiated with support from the editors of this book. We have held a local symposium in the Netherlands with high industry participation, and a dedicated international workshop, Analytics and Mining of Model Repositories (AMMoRe), co-located with MODELS in 2018. This book aims to follow up on those activities, to present a concentrated volume of knowledge, and to set the agenda for novel approaches in large-scale model management and analytics.

A closely related topic is (big) data management and data analytics. The approaches in this book build on and partially extend these general topics. The topic of model management and analytics is, however, unique and needs further elaboration, as models are very complex units of (typically graph) data with many special technical and domain-specific characteristics.

We target both academics and practitioners who are interested in model-based development and the analytics of large-scale models. Academics working in the field of model-based development, data analytics, or both could gain novel insights into the topic of model analytics, which goes beyond both model-based development and data analytics. The book specializes in model management and analytics and as such could be used in courses on model-based development, data analytics, and data management. Typically, the book could be used for courses in the later years of academic study, that is, third- or fourth-year bachelor's, master's, or doctoral programmes. The book comprises both chapters that discuss experiences from industry and chapters that are more research-oriented. Practitioners will benefit from the book by identifying the key problems, the solution approaches, and the tools that have been developed or are necessary for model management and analytics. Researchers will benefit from the book by identifying the basic theory and background, the current research topics, the related challenges, and the research directions for model management and analytics.

Given the novelty of the topic, and the fact that this book presents the first concentrated effort around it, we believe that this is a pioneering book that will set the scene for the paradigm and, as such, will be referred to frequently by practitioners and researchers in this field.

Book outline

The book consists of three main parts: concepts and challenges in model management and analytics; methods and tools; and finally industrial applications. Chapter 1 gives a general introduction to model management and analytics (MMA), outlining the problem, the challenges, and the relations to certain established domains. First, the chapter makes the case that a deluge of model-* artifacts can be observed in both industry and open source. Distinguishing those artifacts from other, traditional types of data, the chapter lists several related domains for inspiration. This is done within a reference architecture for a (big) data analytics framework. MMA is discussed with the caveat that techniques from other domains might not be directly applicable to MMA; certain adaptations and extensions might be necessary. This is demonstrated using the SAMOS framework as a real-world example.

Chapter 2 is a contribution by Truong Ho-Quang, Michel Chaudron, Regina Hebig, and Gregorio Robles: the team behind the seminal work on the Lindholmen UML model dataset and the analytics research around it. The authors present a reference architecture for a community infrastructure for evidence-based research in software architecture. Software architecture and design, which are often represented as models, have been popular research areas for decades, with the aim of gaining insights into, as well as improving, industrial software and system development. The authors point out the lack of large-scale empirical research in these areas, where validation is often based merely on industrial experiences and cases. Drawing on their previous experience with model analytics and empirical research on models, they discuss several aspects and challenges of the domain and propose a reference architecture as a community-wide infrastructure for evidence-based research in software architecture and design.

Model clone and pattern detection, which can be considered prominent subdomains of MMA, are discussed in Chapter 3. Matthew Stephan and Eric J. Rapos present the potential and challenges of using model clone detectors as emergent pattern miners. As model-based software engineering approaches gain traction, the size, complexity, and prevalence of models increase. In those approaches, model analysis, and emergent pattern extraction in particular, can be used by analysts to support the software engineering life cycle. Application areas include ensuring standards compliance and quality assurance. The authors present the underlying concepts and the feasibility of their approach along with a conceptual framework called MCPM. The framework is demonstrated on Simulink models using clone detector software, with remarks on the generalizability of the approach to different types of models and tools. The demonstration is followed by a discussion of open challenges and potential benefits of the approach from both researchers' and practitioners' points of view.

In the final chapter of the first part, Chapter 4, Burak Uzun and Bedir Tekinerdogan present a domain-driven analysis of architecture reconstruction methods. The chapter outlines different approaches in the literature for reconstructing system architecture from various sources such as documentation, logs, and code. The focus is on automated approaches, excluding the various manual (and more costly and less scalable) reconstruction techniques. A systematic domain-driven survey of the automated approaches is conducted. As the major outcome, the authors construct a domain model of the area with its key concepts and terms, as well as a business process model and a variability model that address the domain from complementary points of view. The presented generic knowledge and models are supported by an accompanying method for deriving concrete architecture reconstruction methods, illustrated with two cases.

Part 2 consists of five chapters covering various methods and tools for MMA. Chapter 5, by Konstantinos Barmpis, Antonio García-Domínguez, Alessandra Bagnato, and Antonin Abherve, presents the Hawk toolset for large-scale monitoring and analysis of models. The need for large-scale analysis and tool support arises from modeling practices for large and complex systems, where traditional approaches based on single-file persistence do not suffice. The authors further emphasize that existing solutions to this problem, based on model persistence and operating on small model fragments, can lead to other challenges for performance and I/O. They introduce their heterogeneous model indexing framework, Hawk, which has been developed and applied to a variety of scenarios. In this chapter, the authors provide an overview and high-level design of the framework along with next steps, both specifically for Hawk and for model analytics research in general. The chapter also includes a demonstration and evaluation of the framework in terms of its model querying capabilities and performance, applied to real-world industrial datasets.

Chapter 6, by Aydin Kaya, Ali Seydi Keceli, Cagatay Catal, and Bedir Tekinerdogan, addresses software defect prediction in the early stages of software development, noting the advantages of such early detection. Defect prediction models can help identify error-prone components before the testing phase and thus contribute to software quality. The authors apply Machine Learning techniques using design-level metrics and data sampling techniques as an effective means of defect prediction. The study demonstrates a strong correlation between design-level metrics and defect probabilities, and applies advanced ensemble methods and data sampling to yield high-accuracy defect prediction models.

Harald Störrle addresses the structuring and visualization of large models in Chapter 7. The premise of the chapter is the observation that large modeling projects increasingly need better internal structuring to organize their artifacts, and that poor structuring can lower the efficiency and quality of modeling. The author aims to improve the state of the art in model structuring by providing templates, examples, and best practices. This is achieved via a practical visual notation for describing model structure. Based on two distinct case studies from academic and industrial modeling projects with diverse underlying technologies and conceptualizations, a common notation called MONO is presented that covers both modeling structures, thus serving as a validation of the approach. The resulting process and set of artifacts, i.e., the templates and notation, demonstrate the effectiveness and genericity of the approach in the two case studies and provide a basis for further application in other modeling projects.

Chapter 8, by Christopher Pietsch, Christoph Seidl, Michael Nieke, and Timo Kehrer, contributes a comparison of delta-oriented development of model-based software product lines using DeltaEcore and SiPL. The widespread use of model-based development in embedded systems calls for variability management, as customized functionality needs to be delivered to clients. Model-based software product line engineering addresses this by explicitly modeling and managing the variability. Delta modeling is one such technique in the domain, with an emphasis on a core model and delta modules that apply transformations on top of it. While tool support for the approach is limited, it is realized in two major tool suites, i.e., DeltaEcore and SiPL. The authors provide an overview and comparison of the capabilities of these tools. The extensive comparison and demonstration aim to help both researchers and practitioners who are interested in the topic, in terms of the state of practice, research directions, and tool adoption and use.

The OptML framework and its application in model optimization are discussed in Chapter 9, by Guner Orhan and Mehmet Akşit. The authors argue that, with the widespread adoption of model-driven approaches, companies need to deal with ever larger model bases as their assets. Besides the sheer size of model bases, software engineers may additionally need to use configurations of existing models, further expanding the design space with alternatives. The OptML framework addresses this problem by computing optimal models over a number of Ecore-based models based on user-defined criteria. The framework is implemented and validated on a set of models for image processing, and represents a first attempt in the literature at a generic framework for model optimization.

The final part of the book, Part 3, tackles MMA from an industrial application perspective with two extensive industrial studies. Chapter 10, by Benny Akesson, Jozef Hooman, Jack Sleuters, and Adrian Yankov, discusses improving the design-time performance and evolvability of systems using Domain-Specific Languages. The authors note the increasing complexity of cyber-physical systems and the growing need for variability and mass customization, as well as for adaptation to changing requirements and technologies. Model-Based Engineering based on Domain-Specific Languages can be used to remedy many of these challenges in developing such complex systems. The chapter elaborates several aspects of the successful application of Model-Based Engineering in a real-world industrial case study from the defence domain. These aspects include modularity and reuse, evolution, model quality, and generated artifacts.

Chapter 11, by Önder Babur, Aishwarya Suresh, Wilbert Alberts, Loek Cleophas, Ramon Schiffelers, and Mark van den Brand, is the final chapter of the book. The authors explore a wide set of model analytics techniques using the SAMOS framework in the context of real domain-specific model-driven engineering ecosystems in the lithography industry. Given the multidisciplinary and heterogeneous nature of these ecosystems, automated analyses are proposed as a means of managing their artifacts, for instance to obtain an overview of an ecosystem or to detect duplication within it. The case studies involve clone detection on data and control models within one of the ecosystems, cross-language conceptual analysis and language-level clone detection on three ecosystems, and finally architectural analysis and reconstruction on another ecosystem. The authors discuss how model analytics can be used to discover insights in model-driven engineering ecosystems (e.g., via model clone detection and architectural analysis) and to identify opportunities, such as refactoring, to improve them.
