How to Implement Adaptive Authentication Using Machine Learning

Original article: https://www.linkedin.com/pulse/how-implement-adaptive-authentication-using-machine-learning-thomas/

Introduction

Authentication is used in almost every organization, from employees who must prove their identity to access a corporate information system to customers of personalized services such as online banking or social networks. Given its importance, authentication is a frequent target for attackers trying to gain access. Machine learning can help detect potentially dangerous authentication attempts and either block them or alert administrators.

For example, a system administrator may receive an alert about attacks from a specific IP address or about password guessing against a specific account. Alternatively, the authentication system can automatically require an additional authentication step, such as entering a one-time confirmation code sent via SMS or solving a CAPTCHA. The authentication system thus adapts to the user’s behavior (becomes adaptive): if the behavior is suspicious, it can make the authentication process more complex or deny authentication.
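The adaptive step can be sketched as a mapping from a risk score to an action. This is only an illustration: the `Action` names, the `decide` function, and both threshold values are hypothetical, and the risk score is assumed to be a value between 0 and 1 produced by some model.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up"  # ask for a one-time code or a CAPTCHA
    DENY = "deny"


def decide(risk_score: float,
           step_up_threshold: float = 0.5,
           deny_threshold: float = 0.9) -> Action:
    """Map a model's risk score (0 = safe, 1 = risky) to an action.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= deny_threshold:
        return Action.DENY
    if risk_score >= step_up_threshold:
        return Action.STEP_UP
    return Action.ALLOW


low_risk = decide(0.1)      # usual behavior
medium_risk = decide(0.6)   # somewhat unusual behavior
high_risk = decide(0.95)    # highly suspicious behavior
```

In practice the thresholds would be tuned against how many extra confirmation steps legitimate users are willing to tolerate.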

Machine Learning Implementation

Domain analysis

Start by listing the data that is available for each authentication attempt, for example:

  • IP address;
  • User Agent header of a user’s device;
  • a flag marking the device as known or unknown (based on the presence of a cookie from a previous authentication);
  • authentication time;
  • previous authentication time;
  • time that has passed since the last authentication;
  • whether authentication is performed from a trusted network or from the external Internet;
  • etc.
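The attributes listed above could be collected in a single record per authentication attempt. The sketch below is one possible layout, not a prescribed schema: the class name `AuthEvent` and all field names are hypothetical, and `user_id` is an added assumption so that events can be grouped per account.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuthEvent:
    """One authentication attempt; all field names are illustrative."""
    user_id: str                    # assumption: not in the list above
    ip_address: str
    user_agent: str
    known_device: bool              # True if a cookie from a previous authentication was present
    auth_time: datetime
    previous_auth_time: Optional[datetime]  # None for a first authentication
    trusted_network: bool           # False means the external Internet


# Example values are made up; 203.0.113.0/24 is a documentation-only IP range.
event = AuthEvent(
    user_id="alice",
    ip_address="203.0.113.7",
    user_agent="Mozilla/5.0",
    known_device=False,
    auth_time=datetime(2024, 3, 2, 23, 30),
    previous_auth_time=datetime(2024, 3, 1, 9, 0),
    trusted_network=False,
)
```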

Feature generation

For example, if users usually authenticate on workdays during working hours, then an authentication attempt at night on a weekend will be suspicious. Similarly, authentication attempts from different parts of the globe can look suspicious to a machine learning model.
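As a minimal sketch of this idea, time-based features can be derived from the raw timestamps. The function name `time_features`, the feature names, and the 9:00 to 18:00 working-hours window are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional

WORK_START_HOUR = 9   # assumed working hours; adjust to your organization
WORK_END_HOUR = 18


def time_features(auth_time: datetime,
                  previous_auth_time: Optional[datetime] = None) -> dict:
    """Derive simple time-based features from raw timestamps."""
    # Monday is 0, so 5 and 6 are Saturday and Sunday.
    is_weekend = auth_time.weekday() >= 5
    is_working_hours = (not is_weekend) and (
        WORK_START_HOUR <= auth_time.hour < WORK_END_HOUR
    )
    if previous_auth_time is None:
        hours_since_last_auth = None
    else:
        elapsed = auth_time - previous_auth_time
        hours_since_last_auth = elapsed.total_seconds() / 3600
    return {
        "hour": auth_time.hour,
        "weekday": auth_time.weekday(),
        "is_weekend": is_weekend,
        "is_working_hours": is_working_hours,
        "hours_since_last_auth": hours_since_last_auth,
    }


# A Tuesday morning login, one day after the previous one.
weekday_login = time_features(datetime(2024, 3, 5, 10, 15),
                              datetime(2024, 3, 4, 9, 45))
# A Saturday night login with no earlier authentication on record.
weekend_login = time_features(datetime(2024, 3, 2, 23, 30))
```

Location-based features would follow the same pattern but need a source of IP geolocation data, which is outside this sketch.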

Data transformation

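Supervised learning

For supervised learning, which requires events already labelled as legitimate or fraudulent, you can use the following algorithms: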

  • Logistic Regression
  • Neural Networks
  • Decision Trees
  • Gradient Boosting
  • Random Forest
  • etc.

To train a model, randomly split your events into two data sets: a training set and a test set. The training set should contain approximately 70% to 80% of the data, and the test set the remaining 30% to 20%. Then train the algorithm on the training data and estimate model quality on the test data. To achieve a better result, adjust the parameters of the model and remove features that do not correlate with the result and degrade the quality of the model. You can also combine models to achieve the best result in detecting fraud.
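The split, train, and evaluate steps might look like the following sketch. It assumes scikit-learn and NumPy are available, picks Random Forest from the list above only as an example, and uses synthetic data with three made-up feature columns in place of real labelled events.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for labelled events.
# Columns: hour of day, is_weekend, known_device.
n_legit, n_fraud = 900, 100
legit = np.column_stack([
    rng.integers(9, 18, n_legit),   # working hours
    np.zeros(n_legit),              # workdays only
    np.ones(n_legit),               # known devices
])
fraud = np.column_stack([
    rng.integers(0, 6, n_fraud),    # night hours
    rng.integers(0, 2, n_fraud),    # any day
    np.zeros(n_fraud),              # unknown devices
])
X = np.vstack([legit, fraud])
y = np.concatenate([np.zeros(n_legit, dtype=int), np.ones(n_fraud, dtype=int)])

# 80% training set, 20% test set; stratify keeps the fraud ratio equal in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Estimate model quality on data the model has not seen.
y_pred = model.predict(X_test)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision and recall are used here instead of plain accuracy because fraudulent events are usually rare, so a model that labels everything as legitimate would still look accurate. Real data will be far less cleanly separated than this synthetic set.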

Unsupervised learning

For unsupervised learning you can use the following algorithms:

  • Local Outlier Factor
  • One Class Support Vector Machine
  • Isolation Forest
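As an illustration of one of these algorithms, the sketch below fits scikit-learn's `IsolationForest` on unlabelled events and scores two new attempts. The two feature columns, the synthetic data, and the `contamination` value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=0)

# Synthetic unlabelled events.
# Columns: hour of day, hours since the previous authentication.
normal_events = np.column_stack([
    rng.normal(13, 2, 500),   # logins clustered around midday
    rng.normal(24, 4, 500),   # roughly one login per day
])

# contamination is the assumed share of anomalies in the training data.
detector = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
detector.fit(normal_events)

typical = np.array([[13.0, 24.0]])   # midday login, a day after the last one
unusual = np.array([[3.0, 720.0]])   # 3 a.m. login after a month of silence

# predict returns 1 for inliers and -1 for outliers.
typical_label = detector.predict(typical)[0]
unusual_label = detector.predict(unusual)[0]

# score_samples returns lower values for more anomalous points.
typical_score = detector.score_samples(typical)[0]
unusual_score = detector.score_samples(unusual)[0]
print(typical_label, unusual_label, typical_score, unusual_score)
```

No labels are needed here: the model only learns what typical events look like, so it suits cases where confirmed fraud examples are not available.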

Conclusion

Machine learning cannot prevent every attack, for example one that uses a stolen user password, but it can identify suspicious user behavior, recognize common fraud patterns, and increase the overall resistance of the authentication system to attacks. Of course, with deep domain expertise it is possible to build an adaptive authentication system based on empirical data, but machine learning helps to identify hidden patterns that may not be apparent to human analysis.

Open Identity Community member, OSS Enthusiast