Trojan traffic identification method and Bayes classification

Among probabilistic classification algorithms is the Bayes classification, which is based on Bayes' theorem. It predicts the instance or the class as the one that makes the posterior probability maximal. The risk for Bayes classification is that it needs enough data to estimate the joint probability density more reliably.

Given a dataset D with a size n, each instance or point x belonging to D with a dimension of m, for each Trojan traffic identification method and Bayes classification. To predict the class Trojan traffic identification method and Bayes classification of any x, we use the following formula:

Trojan traffic identification method and Bayes classification

Basing on Bayes' theorem, Trojan traffic identification method and Bayes classification is the likelihood:

Trojan traffic identification method and Bayes classification

Then we get the following new equations for predicting Trojan traffic identification method and Bayes classification for x:

Trojan traffic identification method and Bayes classification

Estimating

With new definitions to predict a class, the prior probability and its likelihood needs to be estimated.

Prior probability estimation

Given the dataset D, if the number of instances in D labeled with class Prior probability estimation is Prior probability estimation and the size of D is n, we get the estimation for the prior probability of the class Prior probability estimation as follows:

Prior probability estimation

Likelihood estimation

For numeric attributes, assuming all attributes are numeric, here is the estimation equation. One presumption is declared: each class Likelihood estimation is normally distributed around some mean Likelihood estimation with the corresponding covariance matrix Likelihood estimation. Likelihood estimation is used to estimate Likelihood estimation, Likelihood estimation for Likelihood estimation:

Likelihood estimation
Likelihood estimation
Likelihood estimation
Likelihood estimation

For categorical attributes, it can also be dealt with similarly but with minor difference.

The Bayes classification

The pseudocode of the Bayes classification algorithm is as follows:

The Bayes classification

The R implementation

The R code for the Bayes classification is listed as follows:

  1 BayesClassifier <- function(data,classes){
  2     bayes.model <- NULL
  3 
  4     data.subsets <- SplitData(data,classes)
  5     cards <- GetCardinality(data.subsets)
  6     prior.p <- GetPriorProbability(cards)
  7     means <- GetMeans(data.subsets,cards)
  8    cov.m <-GetCovarianceMatrix(data.subsets,cards,means)
  9 
 10     AddCardinality(bayes.model,cards)
 11     AddPriorProbability(bayes.model,prior.p)
 12     AddMeans(bayes.model,means)
 13     AddCovarianceMatrix(bayes.model,cov.m)
 14 
 15     return(bayes.model)
 16 }
 17 
 18 TestClassifier <- function(x){
 19     data <- GetTrainingData()
 20     classes <- GetClasses()
 21     bayes.model <- BayesClassifier(data,classes)
 22 
 23     y <- GetLabelForMaxPostProbability(bayes.model,x)
 24 
 25     return(y)
 26 }

One example is chosen to apply the Bayes classification algorithm, in the following section.

Trojan traffic identification method

A Trojan horse, which is a malicious program, surreptitiously performs its operation under the guise of a legitimate program. It has a specific pattern and unique malicious behavior (such as traffic and other operations). For example, it may obtain account information and sensitive system information for further attacks. It can also fork processes for dynamic ports, impersonate software and redirect traffic of affected services to other systems, make them available to attackers to hijack connections, intercept valuable data, and inject fake information or phishing.

Depending on the purpose of Trojans, there are many versatile types of designs for Trojans, each with a certain traffic behavior. With the ability to identify the Trojan traffic, further processing can be performed to protect information. As a result, detecting the traffic of Trojans is one of the main tasks to detect Trojans on system. The behavior of Trojans is an outlier compared to the normal software. So the classification algorithms such as the Bayesian classification algorithm can be applied to detect the outliers. Here is a diagram showing the Trojan traffic behavior:

Trojan traffic identification method

The malicious traffic behaviors include but are not limited to spoofing the source IP addresses and (short and long) term scanning the flow of the address/port that serves as the survey for successive attacks. Known Trojan traffic behaviors are used as the positive training data instances. The normal traffic behaviors are used as the negative data instances in the training dataset. These kinds of datasets are continuously collected by NGOs.

The attributes used for a dataset may include the latest DNS request, the NetBIOS name table on the host machine, ARP cache, intranet router table, socket connections, process image, system ports behavior, opened files updates, remote files updates, shell history, packet TCP/IP headers information, identification fields (IPID) of the IP header, Time To Live (TTL), and so forth. One possible attribute set for a dataset is source IP, port, target IP, target port, number of flows, number of packets, number of bytes, timestamp at certain checkpoint, and the class label for the type of detection. The DNS traffic plays an important role in the Trojans' detection too; the traffics of Trojans has certain a relation with DNS traffic.

Trojan traffic identification method

The traditional technologies for detecting a Trojan often rely on the Trojan's signature and can be deceived by dynamic ports, encrypted messages, and so on. This led to the introduction of mining technologies for the classification of Trojan traffic. The Bayesian classifier is one of the better solutions among others. The preceding diagram is one such possible structure.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.204.96