Types of data sources

In a data-intensive application, knowing where your data comes from is essential. Understanding how relevant the collected data is to your processing and analysis helps you reach insights faster.

Data for your processing needs can, and usually will, originate from a variety of sources. Each source usually provides a different type of information, which you can augment or relate to data from other sources to suit your business needs.

Taking our example of intrusion and vulnerability detection, you can collect data from system and application logs. You also need to collect data from services that map IP addresses to domain names, so that you can enrich your information and, for example, check the domains against a list of blacklisted domains. You need to collect vulnerability information about different products from, say, the National Vulnerability Database, so that you can try to plug the gaps. Data thus originates from both internal and external sources.
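The enrichment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table `IP_TO_DOMAIN`, the set `BLACKLISTED_DOMAINS`, and the event fields `source_ip` and `message` are hypothetical stand-ins for a real IP-to-domain service, a real blacklist feed, and a real log format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical external data. A real system would query an IP-to-domain
# mapping service and a maintained blacklist instead of in-memory tables.
IP_TO_DOMAIN = {
    "203.0.113.7": "malware.example",
    "198.51.100.4": "shop.example",
}
BLACKLISTED_DOMAINS = {"malware.example"}


@dataclass
class EnrichedEvent:
    source_ip: str
    message: str
    domain: Optional[str]   # None when the IP cannot be mapped
    blacklisted: bool


def enrich(log_event: dict) -> EnrichedEvent:
    """Combine an internal log event with external domain information."""
    ip = log_event["source_ip"]
    domain = IP_TO_DOMAIN.get(ip)
    return EnrichedEvent(
        source_ip=ip,
        message=log_event["message"],
        domain=domain,
        blacklisted=domain in BLACKLISTED_DOMAINS,
    )


# Internal source: events parsed from system or application logs.
events = [
    {"source_ip": "203.0.113.7", "message": "outbound connection"},
    {"source_ip": "192.0.2.10", "message": "failed login"},
]
enriched = [enrich(event) for event in events]
```

Here the log events are the internal source and the two lookup structures play the role of the external sources; an event whose IP cannot be mapped keeps a `domain` of `None` and is not flagged.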

Transactional data is the first important data source for any organization. It consists of detailed and structured information that captures key transactional characteristics. Transactional data almost always resides in a relational database.

User data and personnel data are another source that usually plays a vital role in preparing business use cases, because these use cases are mostly targeted at a specific set of users.

Social and demographic data about the targeted users comes next in line as that kind of information helps you narrow down your processing algorithm.

Other sources include public and private surveys that target potential end users, unstructured data such as text messages, emails, or photos, and publicly available data.

No matter what type of data source it is, you need to be able to consume it, and the application should be designed so that adding a new source takes minimal effort and causes zero disruption to existing collectors.

Let's now dive into designing a data-collection system from scratch. The idea is to give you a chance to learn what you need to be aware of while designing such a system. As we will see in later sections, you do not always need to build your own data collectors. Going through the process of building a data-collection system will give you enough insight into when and why you should consider building one, and when it is better to use a commercial off-the-shelf (COTS) solution.
