• News
  • Security
  • Products
  • About Seqrite
Seqrite Blog Blog
  • News
  • Security
  • Products
  • About Seqrite
Home  /  Endpoint Security • Security  /  Machine learning approach for advanced threat hunting
02 February 2018

Machine learning approach for advanced threat hunting

Written by Ajay Shinde
Ajay Shinde
Endpoint Security, Security
  • 5
    Shares
Estimated reading time: 6 minutes

In today’s fast-changing world, the cyber threat landscape is getting increasingly complex and signature-based systems are falling behind to protect endpoints. All major security solutions are built with layered security models to protect endpoints from today’s advanced threats. Machine learning-based detection is also becoming an inevitable component of these layered security models. In this post, we are going to discuss how Quick Heal is incorporating machine learning (ML) in its products to protect endpoints.

Use of ML or AI (artificial intelligence) has already been embraced by the security industry. We, at Quick Heal, are using ML for different use cases. One of them is an initial layer of our multi-level defense. The main objective of this layer is to determine the suspiciousness of a given file or sample. If it finds the file as suspicious, then the file will be scrutinized with the next layers. This filtering reduces the load on the other security layers.

Here, we are going to discuss how we use static analysis of PE files using ML to filter out the traffic sent to the next security layers.

Machine learning overview

In machine learning, we try to give computers the ability to learn from data instead of programmed explicitly. In our problem statement, we can use the power of ML to make a model that is trained on the known data of clean and malicious files and that can further predict the nature of unknown sample files. The key requirement for building ML models is to have data for training. And it must be accurately labeled otherwise results and predictions can be misleading.

The process starts with organizing a trained set by collecting a large number of clean and malicious samples and labeling them. We extract features from these samples (for our models we extracted features from PE headers of all the collected samples). Then the model is trained by providing these feature matrices to it – as shown in the block diagram below.

Machine Learning Training Cycle
Fig 1. Machine Learning Training Cycle

A feature array of an unknown sample is provided to this trained model. It returns a confidence score for that sample suggesting whether this file is similar to a clean or malicious set.

ML Prediction stage
Fig 2. ML prediction stage

 

Feature selection

A feature is an individual measurable property or characteristic of the things being observed. For example, if we want to classify cats and dogs we can take features like “type of fur”, “loves the company of others or loves to stay alone”, “barks or not”,  “number of legs”, “very sharp claws”, etc.

Selecting good features is one of the most important steps in training any ML model. As shown in the above example, if we select “number of legs” as a feature for training, it won’t be possible to classify that both dogs and cats have the same number of legs. But, if we take “bark or not”, it would be a good classifier as dogs bark and cats don’t. Also “sharpness of claws” would be a good feature.

On similar lines, to classify malicious samples from clean ones, we need to have features. So, we start with deriving pe file header attributes which need fewer computations to extract. We have some Boolean kind features like “EntryPoint present in First Section”, “Resource present” or “Section has a special character in its name”. And some count or value-based like “NumberOfSections”, “File TimeDateStamp”, etc. Some of these features are so strong that many times they act as good separator for malicious and benign samples. We will try to explain this by following two examples. As shown in the following graph 1.0, in most of the cases, for clean files, the entry point section is the 0th section, but for malicious files, it may vary.

%(percentage) of Entry point section index for clean v/s malicious
Graph 1. %(percentage) of entry point section index for clean v/s malicious

 

Similarly, for an index of “First empty section” of a file, we get the following plot.

%(percentage) of index of first empty section for clean v/s malicious
Graph 2. %(percentage) of index of the first empty section for clean v/s malicious

 

Feature scaling 

The data we feed to our model must be scaled properly. For example, the value of attribute like Address_OF_EntryPoint = 307584 will have a very large value when compared with the value of attribute like No_Of_Sections = 5. Such a huge difference can cause our model to be poorly trained. We use min_max scaling to scale the data. Here, we bring all the features to the same range in this case 0 to 1.

Dimensionality reduction

The next step is dimensionality reduction. In data, many times there are groups of features that are co-related to each other, i.e. if one feature changes, others also change at some rate. Removing or combining such features saves space and time and improves the performance of machine learning models. This process of reducing the number of unwanted or redundant features is known as dimensionality reduction. This ensures models are trained and tested faster without compromising the accuracy. We tried using PCA (Principal Component Analysis), Select K-Best by various algorithms.

PCA is used to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA tries to find the principal components so as the distance between our samples and the principal component is minimized. The lesser is the distance between samples and principal components, the more variance is retained.

We can make a plot of cumulative variance versus the number of principal components to find the number of Principal Components (PCs) required to cover almost all data. Applying PCA on our data and tuning it gives us 163 PCs for nearly 1000 different features. That means now we can train the model on 163 features that can almost cover the complete data set.

cumulative variance v/s No. of Principal components
Graph 3. cumulative variance v/s no. of principal components

 

We can transform our test data into these 163 Principal Components and can apply classification models. Though PCA works fine, we need to calculate PCs at a client’s end before prediction. So, we settled with Select K-Best using Chi-squared statistics, which gives k number of best features by finding out their importance after comparing whole data.

Training

The next step is about training the model. While training, we need to focus on the following parameters:

Size of Model: The model generated by any algorithm should have an optimal size and updating it should take minimal of efforts.

Training Time: Time and resources utilized by an algorithm while training should be minimal (though it is not a primary requirement).

Prediction Time: Prediction function must be as fast as possible.

Overfitting: Some algorithms work great on small datasets. But, as we grow the dataset the results tend to go down this is called overfitting.

We tried a few algorithms; Logistic Regression worked good on small datasets but it tends to overfit & the accuracy reduces as we increase the training set.

Then we tried Random Forest Classifier. Accuracy here is good but the size of the model that is generated is large as it is a forest of multiple decision trees. And prediction function also takes time as trees need to be traversed. So we moved to SVM (Support Vector Machine).

Support vector machines

Support vector machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is a count of features you have) with the value of each feature being the value of a particular coordinate. Then, we perform a classification by finding the hyperplane that differentiates the two classes properly.

There can be a lot of hyperplanes classifying the data, but SVM targets to find out a hyperplane that has the largest distance from the nearest sample set on both sides of the plane.

For example, here SVM will choose plane C, among A and B.

Linear SVM classifier
Graph 4. Linear SVM classifier

The example above is of Linear SVM. By using kernel functions in SVM, we can classify non linearly distributed data as well. The kernel function is a function which takes two inputs and outputs a number representing how similar they are. Generally used kernels in SVM are Linear kernel, Polynomial kernel, Radial bias function kernel (RBF). We tried with all of these kernels and found that RBF and Linear work well for our data. As linear kernel has an extremely fast prediction function, we finalized it. The prediction function is (w*x + b) where w is a vector of support vectors, x is a feature vector of the test sample and b is intercept or bais.

The SVM model trained using the above steps has a very good accuracy that it nearly reduced 1/3rd of the traffic to next levels. It is helping us greatly in improving performance and accuracy of detection.

Conclusion

Machine learning can be effectively used by the antivirus industry and use cases are huge. Quick Heal is committed to bringing the latest research in ML to reform endpoint security. At Quick Heal, we practice ML at various stages and the above case study is one of them. In the future, we will share write-ups on other case studies.

 Previous PostHow to defend your network against DDoS attacks?
Next Post  CVE-2018-4878 – Adobe Flash Player use after free (Zero Day...
Ajay Shinde
About Ajay Shinde

Ajay Shinde is a senior scan engine developer in Quick Heal Security Labs. He has 5 years of experience in reverse engineering, malware analysis, and development of...

Articles by Ajay Shinde »

Related Posts

  • Seqrite Endpoint Security 7.6 supports macOS Big Sur 11

    Seqrite Endpoint Security 7.6 supports macOS Big Sur 11

    November 14, 2020
  • Seqrite Endpoint Security supports Windows 10 October 2020 Update

    Seqrite Endpoint Security Supports Windows 10 October 2020 Update 20H2

    October 26, 2020
  • Are you doing enough to protect your sensitive data?

    Did you know that adversaries are planning to steal your data?

    September 11, 2020

No Comments

Leave a Reply.Your email address will not be published.

Cancel reply

CAPTCHA Image
Refresh Image

Popular Posts

  • Turn the Page: Cybersecurity Predictions for 2021 & beyond Turn the Page: Cybersecurity Predictions for 2021 & beyond February 18, 2021
  • The Data breach inferno burning big-ticket businesses The Data breach inferno burning big-ticket businesses February 5, 2021
  • Pharma Sector needs to streamline its insides to avoid cyberattacks Pharma Sector needs to streamline its insides to avoid cyberattacks February 12, 2021

Featured Authors

  • Seqrite
    Seqrite

    Follow us for the latest updates and insights related to security for...

    Read more..
  • Viraj Talikotkar
    Viraj Talikotkar

    Viraj is a Lead Technical Writer at Quick Heal Technologies. He is always on...

    Read more..
  • Sanjay Katkar
    Sanjay Katkar

    Sanjay Katkar is the Joint Managing Director and Chief Technology Officer of...

    Read more..

Latest Posts

  • Businesses now worried about the surge in COVID-19 infodemic

    Businesses now worried about the surge in COVID-19 infodemic

    February 26, 2021
  • Turn the Page: Cybersecurity Predictions for 2021 & beyond

    Turn the Page: Cybersecurity Predictions for 2021 & beyond

    February 18, 2021
  • Pharma Sector needs to streamline its insides to avoid cyberattacks

    Pharma Sector needs to streamline its insides to avoid cyberattacks

    February 12, 2021

Stay Updated!

Topics

Antivirus For Linux (10) apt (9) BYOD (9) COVID-19 (10) Cyber-attack (31) cyber-attacks (56) cyberattacks (12) Cybersecurity (279) cyber security (25) Cyber threat (29) cyber threats (44) Data (11) data breach (50) data breaches (27) data loss (28) data loss prevention (33) data protection (21) data security (13) DLP (49) Encryption (16) endpoint security (102) Enterprise security (14) EPS (9) Exploit (12) firewall (11) hackers (9) IoT (10) malware (58) malware attack (22) malware attacks (12) MDM (25) mobile device management (9) Network security (18) Patch Management (12) phishing (16) Ransomware (56) ransomware attack (29) ransomware attacks (30) ransomware protection (12) Seqrite (24) Seqrite Encryption (27) Seqrite EPS (33) Seqrite Services (16) UTM (34) Vulnerability (10)

Products

  • Endpoint Security (EPS)
  • Seqrite Encryption Manager
  • Seqrite Endpoint Security Cloud
  • Cloud Security
  • Seqrite mSuite
  • Seqrite MobiSMART
  • Unified Threat Management
  • Seqrite Secure Web Gateway
  • Antivirus for Server
  • Antivirus for Linux

Resources

  • White Papers
  • Datasheets
  • Threat Reports
  • Manuals
  • Case Studies

About Us

  • Company Overview
  • Leadership
  • Why choose SEQRITE?
  • Awards & Certifications
  • Newsroom

Archives

  • By Date
  • By Category

© 2020 Quick Heal Technologies Ltd. (Formerly Known as Quick Heal Technologies Pvt. Ltd.) Cookie Policies Privacy Policies

Our website uses cookies. Cookies enable us to provide the best experience possible and help us understand how visitors use our website.
By browsing this website, you agree to our cookie policy.