Machine learning has become increasingly established in everyday life over the past decade. One of the main reasons is that it allows scientists to solve more and more problems that are not manageable with conventional approaches.
Examples include the translation of natural language, the automated recognition of images, and methods for controlling autonomous agents (keyword: reinforcement learning). Motivated by these successes, machine learning is playing an increasingly important role in cyber security. So-called deep fakes illustrate this particularly well.
From The Funny Gimmick To The Real Threat
The name “Deep Fake” is a neologism made up of the English terms “fake” and “deep”. The latter refers to the underlying deep neural networks, whose structure vaguely resembles that of the human brain. Consisting of several million artificial neurons, such a network can solve highly complex tasks, especially the manipulation of image and video material. How? When such networks are trained with video and audio material of a target person, they learn to imitate that person's facial expressions or speech melody.
Use The Opportunities Of Machine Learning Correctly
However, machine learning can also analyze large data streams and identify patterns that indicate attacks or abusive behavior. This pattern recognition takes work off the shoulders of administrators and cyber security experts. A significant challenge, however, is so-called ‘false positives’, i.e. false alarms in situations in which no attack or misuse occurs. Users have a very low tolerance for these false alarms because the software prompts them to act on each one, which creates additional work. The consequence: after repeated false positives, patience runs out, and the user no longer accepts the warning system. With more than a million daily activities,
False Positives: A Perennial Problem Of Security Technologies
Eliminating false positives in AI systems seems easier at first glance than it is in practice, because countless borderline cases that a human can recognize as false alarms with knowledge and instinct quickly overwhelm machine learning algorithms. An example is the credit card behavior of a customer, viewed over time: if the cardholder usually makes an average volume of transactions each month and a four-digit amount is then debited abroad, this movement can trigger an automated alarm. During a manual check, however, a human employee can quickly see whether the customer has simply traveled and, for example, rented a car or paid a hotel bill abroad.
Integrating such human understanding into AI systems is one of the significant current challenges in machine learning research. One solution is more advanced learning methods that do not look at transactions in isolation but correlate them. The customer may, for instance, have paid for a flight booking with the card two months earlier, from which a trip can be inferred.
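The difference between looking at a transaction in isolation and correlating it with earlier ones can be sketched in a few lines of Python. This is only an illustration, not a real fraud model: the `Transaction` fields, the home country, the 1,000 threshold for a "four-digit amount" and the 90-day travel window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Transaction:
    day: date
    amount: float
    country: str
    category: str  # hypothetical labels such as "groceries" or "flight"


# All three values are assumptions chosen for this illustration.
HOME_COUNTRY = "DE"
LARGE_AMOUNT = 1000.0               # "four-digit amount"
TRAVEL_WINDOW = timedelta(days=90)  # how far back a flight booking still counts


def isolated_alarm(tx):
    """Judge a transaction on its own: large and abroad means alarm."""
    return tx.country != HOME_COUNTRY and tx.amount >= LARGE_AMOUNT


def correlated_alarm(tx, history):
    """Suppress the alarm if a recent flight booking suggests a trip."""
    if not isolated_alarm(tx):
        return False
    for past in history:
        recent = timedelta(0) <= tx.day - past.day <= TRAVEL_WINDOW
        if recent and past.category == "flight":
            return False
    return True


history = [
    Transaction(date(2023, 3, 2), 45.90, "DE", "groceries"),
    Transaction(date(2023, 3, 10), 389.00, "DE", "flight"),
]
rental = Transaction(date(2023, 5, 12), 1240.00, "ES", "car_rental")

print(isolated_alarm(rental))             # True: alarm when viewed in isolation
print(correlated_alarm(rental, history))  # False: the flight booking explains it
```

A hand-written rule like this only covers the one borderline case it was built for, which is why the text speaks of learning methods rather than rules.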
The Challenge Of Learning Data – Often Unstructured Or Erroneous
When this is transferred to the network traffic of a company with several hundred or even thousands of users and even more endpoints, it is first necessary to gain a deep understanding of the initial situation. On the one hand, this includes understanding the architecture of the entire system and the behavior of the users. On the other hand, it requires comprehensive knowledge of attack vectors and of the learning algorithms used for anomaly detection. However, for the algorithm to learn, it needs data. A clean database is therefore essential, yet it hardly exists in any organization. Unstructured, erroneous and duplicated data in various formats are more the norm.
This is where the expertise of data scientists comes into play. They are the ones who have to deal with the data and all its errors and weaknesses. The corporate network of a medium-sized company consists of many different interacting systems: web and e-mail servers, databases and applications of all kinds. Although their communication is standardized, this only holds bilaterally, between two systems at a time. The result is a Babylonian confusion of languages in the network, made up of special cases. Within this cacophony, the algorithm has to distinguish normal behavior from deviations.
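What "cleaning" such data can mean in practice is sketched below, using only the Python standard library. The record layout (`ts`, `src`, `bytes`), the two timestamp formats and the decision to treat timestamps without a time zone as UTC are assumptions for illustration. Real log data will look different.

```python
from datetime import datetime, timezone

# Hypothetical raw events showing the typical problems:
# duplicates, differing formats and plainly erroneous entries.
RAW_EVENTS = [
    {"ts": "2023-05-12T08:15:00+0000", "src": "10.0.0.5", "bytes": "1200"},
    {"ts": "2023-05-12T08:15:00+0000", "src": "10.0.0.5", "bytes": "1200"},  # duplicate
    {"ts": "12.05.2023 08:16:30", "src": "10.0.0.7 ", "bytes": 800},         # other format
    {"ts": "not a date", "src": "10.0.0.9", "bytes": "n/a"},                 # erroneous
]

TIMESTAMP_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%d.%m.%Y %H:%M:%S")


def parse_timestamp(text):
    """Try each known format; return a UTC datetime or None."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: timestamps without a zone are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    return None


def clean(events):
    """Normalize events, drop exact duplicates, set aside unparsable ones."""
    seen = set()
    cleaned, rejected = [], []
    for event in events:
        ts = parse_timestamp(str(event.get("ts", "")))
        try:
            size = int(event.get("bytes"))
        except (TypeError, ValueError):
            size = None
        if ts is None or size is None:
            rejected.append(event)
            continue
        record = (ts, str(event.get("src", "")).strip(), size)
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"ts": ts, "src": record[1], "bytes": size})
    return cleaned, rejected


cleaned, rejected = clean(RAW_EVENTS)
print(len(cleaned), "usable events,", len(rejected), "rejected")
```

Rejected records are kept rather than silently discarded, so that a person can still inspect what the pipeline could not interpret.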
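How an algorithm might separate normal behavior from deviation can be hinted at with a deliberately simple stand-in: a statistical baseline per pair of communicating systems. This is not the method of any particular product. The system names, the hourly byte counts and the threshold of three standard deviations are assumptions for this sketch.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Learn mean and standard deviation of traffic per (source, destination) pair."""
    return {
        pair: (mean(values), stdev(values))
        for pair, values in history.items()
        if len(values) >= 2  # stdev needs at least two observations
    }


def is_deviation(baseline, pair, value, threshold=3.0):
    """Flag traffic that is far from the learned baseline for this pair."""
    if pair not in baseline:
        return True  # a pair never seen during learning counts as a deviation
    mu, sigma = baseline[pair]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly byte counts between systems of a company network.
history = {
    ("web", "db"): [100, 110, 95, 105, 90],
    ("mail", "db"): [20, 25, 22, 18, 21],
}
baseline = fit_baseline(history)

print(is_deviation(baseline, ("web", "db"), 104))   # False: within normal range
print(is_deviation(baseline, ("web", "db"), 900))   # True: far above the baseline
print(is_deviation(baseline, ("web", "mail"), 10))  # True: unknown pair
```

Keeping one baseline per pair reflects the point made above: what counts as normal differs for every bilateral conversation in the network.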
Threat Scenario Data Poisoning
As is well known, IT security is a game of hare and hedgehog, and security procedures that use machine learning are no exception. Their targeted manipulation is a further threat scenario, known as data poisoning. This means the ‘poisoning’ of the data sets that the machine uses to learn. Attackers try to manipulate the system during the learning process, for example with viruses and worms that are declared ‘harmless’. Using these as training data would poison the AI system, which would then classify malicious behavior as benign.
This can also work the other way around. If benign behavior is presented as malicious, the system gets entirely out of hand and is switched off by the exasperated user, and is thus ultimately defeated by the attackers. The purity of the training data is therefore essential for all data-based machine learning methods.
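The effect of poisoned training data can be shown with a toy classifier. The following sketch uses a nearest-centroid model on a single invented feature (a "suspiciousness score"). Both the model and all numbers are assumptions chosen to make the mechanism visible, not a description of a real detection system.

```python
from statistics import mean


def train(samples):
    """Compute one centroid (mean feature value) per label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(model, value):
    """Assign the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


# Hypothetical, correctly labeled training data.
clean_data = [
    (1.0, "benign"), (2.0, "benign"), (1.5, "benign"),
    (9.0, "malicious"), (8.0, "malicious"), (8.5, "malicious"),
]
# The attacker slips in malware-like samples declared as harmless.
poison = [
    (7.0, "benign"), (7.5, "benign"), (6.5, "benign"),
    (7.0, "benign"), (7.2, "benign"), (6.8, "benign"),
]

clean_model = train(clean_data)
poisoned_model = train(clean_data + poison)

suspicious = 6.5
print(classify(clean_model, suspicious))     # malicious
print(classify(poisoned_model, suspicious))  # benign
```

The mislabeled samples drag the ‘benign’ centroid towards the malicious range, so the same sample is suddenly waved through. Flipping labels in the other direction would shift the ‘malicious’ centroid towards normal behavior and produce the flood of false alarms described above.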