Machine Learning for User Authentication Using Keystroke Dynamics

 

Ahmad Ayman Al-Tarawneh
Mu'tah University, 2015

 

This thesis presents a methodology for improving the security of the authentication process using Keystroke Dynamics (KSD). KSD is a behavioral biometric that acts as a second level of security in the log-in process, applied after the user enters a user name and password. It works by observing the way in which the user types.

First, we propose four timing features in addition to the three main features. Together, these features represent the user's typing behavior and are used in the authentication phase. Second, because datasets in this field are scarce and no standard dataset exists, we built a new dataset of 504 records: 9 attempts for each of 56 users.
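As an illustration of what timing features look like, the following minimal sketch computes three quantities commonly used in keystroke dynamics: hold time, down-down latency, and up-down latency. It is an assumption that these correspond to the three main features mentioned above; the abstract does not name them, and the four proposed features are not specified here, so they are not reproduced. The event format and the function name `timing_features` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event format: (key, press_time_ms, release_time_ms).
KeyEvent = Tuple[str, float, float]


def timing_features(events: List[KeyEvent]) -> Dict[str, List[float]]:
    """Compute common keystroke timing features (assumed, not the thesis's exact set)."""
    # Hold time: how long each key stays pressed.
    hold = [release - press for _, press, release in events]
    # Down-down latency: press of one key to press of the next key.
    down_down = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    # Up-down latency: release of one key to press of the next key.
    up_down = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return {"hold": hold, "down_down": down_down, "up_down": up_down}


# Made-up timings for three keystrokes, for illustration only.
sample = [("p", 0.0, 90.0), ("a", 150.0, 230.0), ("s", 310.0, 400.0)]
features = timing_features(sample)
```

Each typing attempt would then yield one feature vector, which is the kind of record a dataset of 9 attempts per user could contain.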

Third, we propose employing KSD in CAPTCHA codes. We considered three attack cases. In the first case, a program breaks the CAPTCHA and sends the code directly. Because no keystroke features accompany the code, detection accuracy is certain to be 100%, so there is no need to build a dataset for this case. In the second case, the attacker is smarter: having inspected the source code, the attacker knows that features must be sent with the code and therefore generates them randomly. In this case, the best accuracy achieved was 98.13%, using the Random Forest and J48 classifiers. In the third case, the attacker is smarter still: having inspected the source code, the attacker knows that the features are related to one another and calculates them accordingly, so that the main features are set randomly and the others are derived from them. In this case, the best accuracy achieved was 93.125%, using a Multi-Layer Perceptron (MLP). We then ran a validation process on a newly generated dataset and obtained a best accuracy of 94.76% using the Random Forest classifier.
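The difference between the second and third attack cases can be sketched as two ways of generating synthetic feature vectors. This is only an illustration under stated assumptions: the feature names, the value ranges, and the relation down-down = hold + up-down (a standard identity for these timing features) are not taken from the thesis, and the thesis evaluates the cases with trained classifiers rather than with the simple rule check shown here. All function names are hypothetical.

```python
import random
from typing import Dict, List

Features = Dict[str, List[float]]


def random_attack(n_keys: int, rng: random.Random) -> Features:
    """Case 2 (assumed form): every feature is drawn independently at random."""
    return {
        "hold": [rng.uniform(50, 200) for _ in range(n_keys)],
        "up_down": [rng.uniform(0, 300) for _ in range(n_keys - 1)],
        "down_down": [rng.uniform(50, 500) for _ in range(n_keys - 1)],
    }


def derived_attack(n_keys: int, rng: random.Random) -> Features:
    """Case 3 (assumed form): main features are random, the dependent one is calculated."""
    hold = [rng.uniform(50, 200) for _ in range(n_keys)]
    up_down = [rng.uniform(0, 300) for _ in range(n_keys - 1)]
    # Down-down latency follows from the other two: press-to-press time
    # equals the hold time of the first key plus the release-to-press gap.
    down_down = [hold[i] + up_down[i] for i in range(n_keys - 1)]
    return {"hold": hold, "up_down": up_down, "down_down": down_down}


def is_consistent(f: Features, tol: float = 1e-6) -> bool:
    """Check whether the features satisfy down_down = hold + up_down."""
    return all(
        abs(f["hold"][i] + f["up_down"][i] - f["down_down"][i]) <= tol
        for i in range(len(f["down_down"]))
    )


rng = random.Random(0)
case2 = random_attack(8, rng)
case3 = derived_attack(8, rng)
```

Vectors like `case2` almost always violate the relation and so are easy to reject, whereas vectors like `case3` satisfy it by construction. That is consistent with the third case being the harder one for the classifiers to detect.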


Finally, for user authentication, we selected 20 users at random. The results of the three classifiers were close to one another: the average accuracy was 94.90% for MLP, 91.53% for Random Forest, and 89.68% for J48.