Comparison of DLP systems. Review of DLP systems on the global and Russian markets

About the problem

Today, information technology is an important component of any modern organization. Figuratively speaking, information technology is the heart of the enterprise: it keeps the business running and increases its efficiency and competitiveness in today's fierce competition. Business process automation systems, such as document management, CRM, ERP, and multidimensional analysis and planning systems, make it possible to quickly collect, systematize and group information, accelerating management decision-making and making the business and its processes transparent to management and shareholders. Clearly, a large amount of strategic, confidential and personal data is an important information asset of the enterprise, and a leak of this information will affect the efficiency of the organization. Today's traditional security measures, such as antiviruses and firewalls, protect information assets from external threats but do nothing to protect them from leakage, distortion or destruction by an internal attacker. Internal threats to information security may be ignored or, in some cases, go unnoticed by management because it does not understand how critical these threats are to the business. That is why protecting confidential data is so important today.

About the solution

Protecting confidential information from leakage is an important component of an organization's information security. DLP systems (data leakage protection systems) are designed to solve the problem of accidental and intentional leaks of confidential data.

A comprehensive data leak protection system (DLP system) is a software or hardware-software complex that prevents the leakage of confidential data.

The DLP system does this using the following main functions:

  • Traffic filtering across all data transmission channels;
  • Deep traffic analysis at the content and context level.
In a DLP system, confidential information is protected at three levels: Data-in-Motion, Data-at-Rest and Data-in-Use.

Data-in-Motion is data transmitted over network channels:

  • Web (HTTP/HTTPS protocols);
  • Instant messengers (ICQ, QIP, Skype, MSN, etc.);
  • Corporate and personal mail (POP, SMTP, IMAP, etc.);
  • Wireless systems (WiFi, Bluetooth, 3G, etc.);
  • FTP connections.
Data-at-Rest is data stored statically on:
  • Servers;
  • Workstations;
  • Laptops;
  • Data storage systems (DSS).
Data-in-Use is data being used on workstations.

Measures aimed at preventing information leaks consist of two main parts: organizational and technical.

Protecting confidential information involves organizational measures to find and classify the data available in the company. During classification, data is divided into four categories:

  • Secret information;
  • Confidential information;
  • Information for official use;
  • Public information.
How confidential information is determined in DLP systems.

In DLP systems, confidential information can be identified by a number of characteristics and in various ways, for example:

  • Linguistic information analysis;
  • Statistical analysis of information;
  • Regular expressions (patterns);
  • Digital fingerprint method, etc.
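As an illustration of the regular-expression method from the list above, here is a minimal Python sketch. The pattern names and the `PRJ-` project-code format are hypothetical examples, not taken from any DLP product; a real policy would describe the organization's own data formats.

```python
import re

# Hypothetical patterns for illustration only; a real DLP policy would be
# configured by the security team for the organization's own data.
PATTERNS = {
    "stamp": re.compile(r"\b(?:confidential|for official use only)\b", re.IGNORECASE),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "project_code": re.compile(r"\bPRJ-\d{4}\b"),  # hypothetical internal ID format
}

def match_patterns(text: str) -> dict[str, list[str]]:
    """Return every pattern that matched, with the matching fragments."""
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits

msg = "CONFIDENTIAL: status of PRJ-0042, contact j.doe@example.com"
print(match_patterns(msg))
```

Patterns like these are cheap to evaluate, but they only recognize formats someone has thought to describe in advance.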
After the information has been found, grouped and systematized, the second part of the measures follows: the technical one.

Technical measures:
Technical protection of confidential information relies on the functionality and technologies of the data leak protection system. A DLP system includes two modules: a host module and a network module.

Host modules are installed on user workstations and control the actions the user performs on classified data (confidential information). In addition, the host module can track user activity by various parameters, such as time spent on the Internet, launched applications, processes, data paths, etc.

The network module analyzes information transmitted over the network and controls traffic leaving the protected information system. If confidential information is detected in the transmitted traffic, the network module stops the transmission.

What does implementing a DLP system provide?

After implementing a data leakage protection system, the company will receive:

  • Protection of information assets and important strategic information of the company;
  • Structured and systematized data in the organization;
  • Transparency of the business and its processes for management and security services;
  • Control of the processes of transfer of confidential data in the company;
  • Reducing the risks associated with loss, theft and destruction of important information;
  • Protection against malware entering the organization from within;
  • Saving and archiving of all actions related to the movement of data within the information system.
Secondary advantages of the DLP system:
  • Monitoring the presence of personnel at the workplace;
  • Saving Internet traffic;
  • Optimization of the corporate network;
  • Control of applications used by the user;
  • Increasing staff efficiency.

Today, the market for DLP systems is one of the fastest growing among all information security tools. However, Belarus is still somewhat behind global trends, so the DLP market in our country has its own characteristics.

What is DLP and how does it work?

Before we talk about the DLP systems market, we need to define what exactly is meant by such solutions. DLP systems are commonly understood as software products that protect organizations from leaks of confidential information. The abbreviation DLP stands for Data Leak Prevention, that is, preventing data leaks.

Systems of this kind create a secure digital "perimeter" around the organization, analyzing all outgoing and, in some cases, incoming information. The system should control not only Internet traffic but also a number of other information flows: documents taken outside the protected security perimeter on external media, printed, sent to mobile devices via Bluetooth, etc.

Because a DLP system must prevent leaks of confidential information, it must have built-in mechanisms for determining the degree of confidentiality of a document detected in intercepted traffic. As a rule, two methods are most common: analyzing special document markers and analyzing the document's content. The second option is now more common because it is resistant to modifications made to the document before it is sent and also makes it easy to expand the set of confidential documents the system can work with.

"Side" tasks of DLP systems

In addition to their main task of preventing information leaks, DLP systems are also well suited for a number of other tasks related to monitoring personnel. Most often, DLP systems are used for the following non-core tasks:

  • Monitoring the use of working time and work resources by employees;
  • Monitoring employee communications in order to identify behind-the-scenes infighting that could harm the organization;
  • Monitoring the legality of employee actions (prevention of printing fake documents, etc.);
  • Identifying employees who are sending out résumés, so that replacements for the positions they may vacate can be found quickly.

Because many organizations consider some of these tasks (especially monitoring of working time) to be a higher priority than protection against information leaks, a number of programs have emerged that are designed specifically for them but can in some cases also work as a means of protecting the organization from leaks. Such programs differ from full-fledged DLP systems in lacking developed tools for analyzing intercepted data: the analysis must be done manually by an information security specialist, which is practical only for very small organizations (up to ten monitored employees). However, since these solutions are in demand in Belarus, they are also included in the comparison table accompanying this article.

Classification of DLP systems

All DLP systems can be divided into several main classes according to a number of characteristics. By their ability to block information identified as confidential, systems with active and passive control of user actions are distinguished. The former can block transmitted information; the latter cannot. Active systems are much better at combating accidental data leaks but can accidentally stop the organization's business processes, while passive systems are safe for business processes but are suitable only for combating systematic leaks. Another classification of DLP systems is based on their network architecture. Gateway DLP systems run on intermediate servers, while host systems use agents that run directly on employee workstations. Today, the most common option is to use gateway and host components together.
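The difference between active and passive control can be modeled in a few lines of Python. This is an illustrative sketch only: `Gatekeeper`, `Mode` and the `is_confidential` flag (which in a real system would come from the content analysis engine) are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    ACTIVE = "active"    # may block the transmission
    PASSIVE = "passive"  # only records incidents

@dataclass
class Gatekeeper:
    mode: Mode
    incidents: list[str] = field(default_factory=list)

    def handle(self, message: str, is_confidential: bool) -> bool:
        """Return True if the message is allowed to leave the perimeter."""
        if not is_confidential:
            return True
        # Both modes record the incident for later investigation.
        self.incidents.append(message)
        # Only an active system actually stops the transfer.
        return self.mode is Mode.PASSIVE

active = Gatekeeper(Mode.ACTIVE)
passive = Gatekeeper(Mode.PASSIVE)
print(active.handle("price list draft", True), passive.handle("price list draft", True))
```

The sketch makes the trade-off visible: a false positive in active mode stops a legitimate transfer, while in passive mode it only adds an incident record.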

Global DLP market

Currently, the main players in the global DLP market are companies widely known for their other information security products: first of all, Symantec, McAfee, Trend Micro and Websense. The total global DLP market is estimated at about $400 million, which is quite small compared to, say, the antivirus market. However, the DLP market is growing rapidly: back in 2009 it was estimated at just over $200 million.

The Belarusian market is strongly influenced by the market of its eastern neighbor, Russia, which is already quite large and mature. The main players there today are Russian companies: InfoWatch, Jet Infosystems, SecurIT, SearchInform, Perimetrix and a number of others. The total volume of the Russian DLP market is estimated at 12–15 million dollars, and it is growing at the same pace as the global market.

The main trend in the DLP market, experts believe, is the transition from "patchwork" systems, consisting of components from various vendors, each solving its own problem, to unified integrated software systems. The reason for this transition is obvious: integrated systems relieve information security specialists of the need to make the various components of a patchwork system compatible with each other, make it easy to change settings for large numbers of client workstations at once, and avoid difficulties when transferring data from one component of the system to another. The movement of developers toward integrated systems is also due to the specifics of information security tasks: if even one channel through which information can leak is left uncontrolled, the organization cannot be considered protected from this kind of threat.

Western DLP vendors that came to the CIS market faced a number of problems related to support for national languages (in the case of Belarus, however, it is more appropriate to talk about supporting Russian rather than Belarusian). Since the CIS market is very interesting to Western vendors, they are now actively working on Russian language support, the lack of which is the main obstacle to their successful development of the market.

Another important trend in DLP is a gradual transition to a modular structure, in which customers can choose the system components they need (for example, if external devices are disabled at the operating system level, there is no need to pay extra for functionality to control them). Industry specifics will also play an important role in the development of DLP systems: we can expect special versions of well-known systems adapted for the banking sector, government agencies, etc., to meet the needs of those organizations.

Another important factor influencing the development of DLP systems is the proliferation of laptops and netbooks in corporate environments. The specifics of laptops (work outside the corporate network, the possibility of information being stolen along with the device itself, etc.) force DLP vendors to develop fundamentally new approaches to protecting them. It is worth noting that today only a few vendors can offer customers laptop and netbook monitoring in their DLP systems.

Application of DLP in Belarus

In Belarus, DLP systems are used in a relatively small number of organizations, but their number was growing steadily before the crisis. However, Belarusian organizations are in no hurry to make public the information collected by DLP systems or to prosecute employees responsible for leaks in court. Although Belarusian legislation contains provisions that allow punishing those who disclose corporate secrets, the vast majority of organizations using DLP systems prefer to limit themselves to internal investigations and disciplinary sanctions and, as a last resort, to dismissing employees who have committed particularly serious offenses. The tradition of "not washing dirty linen in public" is characteristic of the entire post-Soviet space, unlike Western countries, where data leaks are reported to everyone who could have been affected.

Vadim STANKEVICH

If we are strictly consistent in our definitions, we can say that information security began precisely with the advent of DLP systems. Before them, all products dealing with "information security" actually protected not information but infrastructure: the places where data is stored, transmitted and processed. These products protect a computer, application or channel that hosts, processes or transmits sensitive information in exactly the same way as infrastructure that handles innocuous information. In other words, it was with the advent of DLP products that information systems finally learned to distinguish confidential information from non-confidential information. Perhaps, as DLP technologies are integrated into the information infrastructure, companies will be able to save a lot on information protection, for example by using encryption only where confidential information is stored or transmitted and not encrypting information in other cases.

However, this is a matter of the future; at present these technologies are used mainly to protect information from leaks. Information categorization technologies form the core of DLP systems. Each vendor considers its methods of detecting confidential information unique, protects them with patents and comes up with special trademarks for them. After all, the remaining elements of the architecture (protocol interceptors, format parsers, incident management and data storage) are identical for most vendors, and large companies even integrate them with their other infrastructure security products. Essentially, products for protecting corporate information from leaks categorize data using two main groups of technologies: linguistic (morphological, semantic) analysis and statistical methods (Digital Fingerprints, Document DNA, anti-plagiarism). Each technology has its own strengths and weaknesses, which determine its scope of application.

Linguistic analysis

The use of stop words ("secret", "confidential" and the like) to block outgoing email on mail servers can be considered the progenitor of modern DLP systems. Of course, this does not protect against attackers: removing a stop word, which is most often placed in a separate section of the document, is not difficult, and the meaning of the text does not change at all.
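A minimal Python sketch of such a stop-word filter, assuming simple whitespace tokenization, shows both how it works and how easily it is bypassed: deleting the stop word leaves the sensitive content intact. The stop-word list and sample messages are illustrative.

```python
STOP_WORDS = {"secret", "confidential"}

def blocked_by_stop_words(text: str) -> bool:
    # Split on whitespace, strip simple punctuation, compare case-insensitively.
    tokens = {word.strip(".,:;!?\"'()").lower() for word in text.split()}
    return not STOP_WORDS.isdisjoint(tokens)

original = "Confidential. Q3 merger terms: price 42 per share."
stripped = "Q3 merger terms: price 42 per share."
print(blocked_by_stop_words(original), blocked_by_stop_words(stripped))
```

The second message carries exactly the same sensitive terms, yet the filter lets it through.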

The impetus for the development of linguistic technologies came at the beginning of this century from the creators of email filters, primarily those protecting email from spam. Today reputation-based methods predominate in anti-spam technology, but at the beginning of the century there was a real linguistic arms race, a battle of projectile against armor, between spammers and anti-spammers. Remember the simplest tricks for fooling stop-word filters? Replacing letters with look-alike letters from other encodings or with digits, transliteration, random spaces, underscores or line breaks in the text. Anti-spammers quickly learned to deal with such tricks, but then image spam and other cunning types of unwanted mail appeared.
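Countering these tricks mostly means normalizing the text before matching. The Python sketch below is a simplified illustration, not any vendor's method: the look-alike table is a tiny hypothetical sample, and stripping all separators before a substring search can also create false matches across word boundaries.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny substitution table: Cyrillic look-alikes
# and digits commonly used in place of Latin letters.
LOOKALIKES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0441": "c",  # Cyrillic es
    "\u0440": "p",  # Cyrillic er
    "0": "o",
    "3": "e",
})

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.translate(LOOKALIKES)
    # Drop separators inserted between letters: spaces, underscores,
    # line breaks, dots and hyphens.
    return re.sub(r"[\s_.\-]+", "", text)

def contains_stop_word(text: str, stop_words=("secret", "confidential")) -> bool:
    flat = normalize(text)
    return any(word in flat for word in stop_words)

print(normalize("S_E c\nR 3 T"))
```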

However, anti-spam technologies cannot be used in DLP products without serious modification. To combat spam, it is enough to divide the information flow into two categories: spam and non-spam. The Bayesian method used to detect spam gives only a binary result: "yes" or "no". This is not enough to protect corporate data from leaks, because information cannot simply be divided into confidential and non-confidential. You need to be able to classify information by function (financial, production, technological, commercial, marketing) and, within each class, categorize it by access level (public, limited access, for official use, secret, top secret, and so on).
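The Bayesian idea itself can be extended beyond two classes; the modification lies in keeping statistics for every category. The sketch below is a textbook multinomial naive Bayes classifier in Python with add-one smoothing, given only as an illustration: the category names and training phrases are hypothetical, and real linguistic engines also use morphology and semantics, which this sketch ignores. Its `train` method also mirrors the learning ability discussed later: a corrected sample is simply fed back in.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesCategorizer:
    """Multinomial naive Bayes over word counts with add-one smoothing.

    Unlike a spam filter, it keeps statistics per category, so it can
    choose among any number of classes.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of documents
        self.vocabulary = set()

    def train(self, text: str, category: str) -> None:
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text: str) -> dict[str, float]:
        """Log-probability of each category given the text, up to a constant."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[category] = score
        return result

    def classify(self, text: str) -> str:
        scores = self.scores(text)
        return max(scores, key=scores.get)

nb = NaiveBayesCategorizer()
nb.train("quarterly balance sheet profit loss", "financial")
nb.train("invoice payment balance account", "financial")
nb.train("turbine blueprint tolerance assembly", "technical")
nb.train("campaign brand audience launch", "marketing")
print(nb.classify("balance sheet for the account"))
```

A second classifier of the same kind, trained on access levels instead of functional classes, would cover the other dimension the paragraph above describes.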

Most modern linguistic analysis systems use not only contextual analysis (that is, in what context, in combination with what other words a particular term is used) but also semantic analysis of the text. These technologies work better the larger the fragment being analyzed: on a large fragment the analysis is more accurate, and the category and class of the document are more likely to be determined correctly. For short messages (SMS, instant messengers), nothing better than stop words has yet been invented. The author faced such a task in the fall of 2008, when thousands of messages like "we are being laid off", "they will take away our license" and "outflow of depositors" were sent via instant messengers from workstations at many banks and had to be blocked immediately, before they reached the banks' clients.

Advantages of technology

The advantage of linguistic technologies is that they work directly with the content of documents: it does not matter to them where and how a document was created, what stamp it bears or what the file is called, so documents are protected immediately. This is important, for example, when processing drafts of confidential documents or protecting incoming documentation. Documents created and used within the company can be named, stamped or marked in a specific way, but incoming documents may carry stamps and labels that the organization does not use. Drafts (unless, of course, they are created in a secure document management system) may already contain confidential information but not yet carry the necessary stamps and labels.

Another advantage of linguistic technologies is their ability to learn. If you have ever pressed the "Not Spam" button in your email client, you can already imagine the client side of a linguistic engine's training system. Note that you do not need to be a certified linguist or know exactly what will change in the category database: just point out a false positive to the system, and it will do the rest itself.

The third advantage of linguistic technologies is their scalability. Processing time is proportional to the amount of information and does not depend on the number of categories. Until recently, building a hierarchical category database (historically called the BKF, the content filtering database, although this name no longer reflects its real meaning) looked like shamanism practiced by professional linguists, so setting up the BKF could easily be considered a drawback. But with the release of several "autolinguistic" products in 2010, building a primary category database became extremely simple: the system is shown where documents of a certain category are stored, determines the linguistic features of this category itself, and trains itself in case of false positives. So ease of setup has now been added to the advantages of linguistic technologies.

One more advantage of linguistic technologies worth noting is the ability to detect categories in information flows that are not related to documents stored within the company. A tool for monitoring the content of information flows can identify categories such as illegal activities (piracy, distribution of prohibited goods), use of the company's infrastructure for personal purposes, harm to the company's image (for example, spreading defamatory rumors), and so on.

Disadvantages of technology

The main disadvantage of linguistic technologies is their dependence on language. A linguistic engine designed for one language cannot be used to analyze another. This was especially noticeable when American vendors entered the Russian market: they were not ready for Russian word formation and the presence of six encodings. It was not enough to translate the categories and keywords into Russian. In English, word formation is quite simple, and case is expressed with prepositions: when the case changes, the preposition changes, not the word itself. Most English nouns become verbs without changing form. And so on. In Russian, everything is different: one root can give rise to dozens of words in different parts of speech.

In Germany, American vendors of linguistic technologies faced another problem: so-called "compounds", that is, compound words. In German, it is customary to attach modifiers to the main word, producing words that sometimes consist of a dozen roots. There is nothing like this in English, where a word is a sequence of letters between two spaces, so the English linguistic engine could not process unfamiliar long words.

To be fair, these problems have now been largely solved by American vendors. The language engines had to be substantially redesigned (and sometimes rewritten), but the large markets of Russia and Germany are certainly worth it. Multilingual texts are also difficult to process with linguistic technologies. However, most engines cope with two languages, usually the national language plus English, which is quite enough for most business tasks. Although the author has encountered confidential texts containing, for example, Kazakh, Russian and English at the same time, this is the exception rather than the rule.

Another disadvantage of linguistic technologies for controlling the full range of corporate confidential information is that not all confidential information takes the form of coherent text. Although databases store information in text form and extracting text from a DBMS is no problem, the extracted information most often contains proper names (full names, addresses, company names) and numeric data (account numbers, credit card numbers, balances, etc.). Processing such data linguistically is of little use. The same applies to CAD/CAM formats (drawings, which often contain intellectual property), program code and media (video/audio) formats: some text can be extracted from them, but processing it is also ineffective. Just three years ago this also applied to scanned texts, but leading DLP vendors quickly added optical character recognition and dealt with this problem.

But the biggest and most often criticized shortcoming of linguistic technologies is still the probabilistic approach to categorization. If you have ever read an email marked "Probably SPAM", you know what I mean. If this happens with spam, where there are only two categories (spam/not spam), imagine what will happen when several dozen categories and confidentiality classes are loaded into the system. Although training can bring the system to 92-95% accuracy, for most users this means that roughly every twelfth to twentieth transfer of information will be assigned to the wrong class, with all the ensuing business consequences (a leak or the interruption of a legitimate process).
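The arithmetic behind these figures is simple: the expected number of misclassified transfers is the volume multiplied by (1 - accuracy). A short Python check, assuming a hypothetical volume of 1,000 transfers:

```python
def expected_misclassified(transfers: int, accuracy: float) -> float:
    """Expected number of transfers assigned to the wrong class."""
    return transfers * (1 - accuracy)

def one_in_n(accuracy: float) -> float:
    """On average, one transfer in this many is misclassified."""
    return 1 / (1 - accuracy)

for acc in (0.92, 0.95):
    print(f"{acc:.0%}: ~{expected_misclassified(1000, acc):.0f} of 1000 wrong, "
          f"about 1 in {one_in_n(acc):.1f}")
```

At 92% accuracy this is about 80 of every 1,000 transfers (one in 12.5), and at 95% about 50 (one in 20).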

Development complexity is not usually considered a disadvantage of a technology, but it cannot be ignored. Developing a serious linguistic engine that categorizes texts into more than two categories is a knowledge-intensive and technologically complex process. Applied linguistics is a rapidly developing science that received a strong impetus with the spread of Internet search, but today there are only a few workable categorization engines on the market: for Russian there are only two, and for some languages none have been developed yet. Therefore, only a couple of companies in the DLP market can fully categorize information on the fly. It can be assumed that when the DLP market grows to a multi-billion-dollar size, Google will easily enter it: with its own linguistic engine, tested on trillions of search queries across thousands of categories, it will not be difficult for it to immediately grab a serious share of this market.

Statistical methods

The task of searching for significant quotes by computer (why "significant" is explained a little later) interested linguists as far back as the 1970s, if not earlier. The text was broken into pieces of a certain size, and a hash was computed for each of them. If the same sequence of hashes occurred in two texts, then with high probability the texts coincided in those sections.
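The idea of hashing pieces of a text and looking for shared hashes can be sketched in Python. This is a simplified illustration under stated assumptions: it hashes every overlapping window of `size` bytes with BLAKE2b (from the standard `hashlib` module) and reports what fraction of the second object's windows also occur in the first. Because it works on raw bytes, the same code applies to text in any language or to any other digital object.

```python
import hashlib

def chunk_hashes(data: bytes, size: int) -> list[bytes]:
    """Hash every overlapping window of `size` bytes."""
    return [hashlib.blake2b(data[i:i + size], digest_size=8).digest()
            for i in range(len(data) - size + 1)]

def shared_fraction(a: bytes, b: bytes, size: int = 16) -> float:
    """Fraction of b's windows whose hash also occurs in a."""
    ha = set(chunk_hashes(a, size))
    hb = chunk_hashes(b, size)
    if not hb:  # b is shorter than one window
        return 0.0
    return sum(h in ha for h in hb) / len(hb)

sample = b"The board approved the merger with Acme Corp at 42 dollars per share."
print(shared_fraction(sample, b"FYI: " + sample[4:40]))
```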

A by-product of research in this area is, for example, the "alternative chronology" of Anatoly Fomenko, a respected scholar who worked on "textual correlations" and once compared Russian chronicles from different historical periods. Surprised by how much the chronicles of different centuries coincide (more than 60%), in the late 1970s he put forward the theory that our chronology is several centuries shorter. Therefore, when a DLP company entering the market offers a "revolutionary technology for searching quotes", it is highly likely that the company has created nothing but a new brand.

Statistical technologies treat texts not as a coherent sequence of words but as an arbitrary sequence of characters, and therefore work equally well with texts in any language. Since any digital object, be it a picture or a program, is also a sequence of symbols, the same methods can be used to analyze not only text but any digital object. If the hashes in two audio files match, one of them probably contains a quote from the other, so statistical methods are an effective means of protection against audio and video leaks and are actively used by music studios and film companies.

It is time to return to the concept of a "significant quote". The key characteristic of a composite hash taken from a protected object (called a Digital Fingerprint or Document DNA in different products) is the step at which the hash is taken. As the description suggests, such a "fingerprint" is a unique characteristic of the object and at the same time has a size of its own. This matters because if you take fingerprints of millions of documents (the storage volume of an average bank), you will need enough disk space to store all of them. The size of a fingerprint depends on the hash step: the smaller the step, the larger the fingerprint. If you take a hash at one-character increments, the fingerprint will be larger than the sample itself. If you increase the step (for example, to 10,000 characters) to reduce the "weight" of the fingerprint, you also increase the probability that a document containing a 9,900-character quote from a sample, and therefore confidential, will slip through unnoticed.

On the other hand, if you take a very small step of a few characters to increase detection accuracy, the number of false positives may grow to an unacceptable level. For text, this means that you should not take a hash of every letter: all words consist of letters, and the system would treat the mere presence of letters in a text as a quote from the sample. Vendors usually recommend an optimal hash step so that the significant quote is long enough and the fingerprint itself stays light: from 3% (text) to 15% (compressed video). Some products let you change the quote significance threshold, that is, increase or decrease the hash step.
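The trade-off between fingerprint size and the length of a detectable quote can be modeled with a small Python sketch. Assumptions: the sample is cut into non-overlapping pieces of `step` bytes, each stored as an 8-byte BLAKE2b hash, and every `step`-byte window of the checked object is compared against them. Under this model a quote shorter than `step` is never caught, a quote of at least `2 * step - 1` bytes from the hashed part of the sample always contains a whole piece and is caught, and at `step = 1` the fingerprint is eight times larger than the sample.

```python
import hashlib

DIGEST_SIZE = 8  # bytes per stored hash (an assumption for this sketch)

def _h(chunk: bytes) -> bytes:
    return hashlib.blake2b(chunk, digest_size=DIGEST_SIZE).digest()

def fingerprint(sample: bytes, step: int) -> set[bytes]:
    """Hash consecutive, non-overlapping `step`-byte pieces of the sample."""
    return {_h(sample[i:i + step]) for i in range(0, len(sample) - step + 1, step)}

def contains_quote(candidate: bytes, prints: set[bytes], step: int) -> bool:
    """Check every `step`-byte window of the candidate against the fingerprint."""
    return any(_h(candidate[j:j + step]) in prints
               for j in range(len(candidate) - step + 1))

def fingerprint_size(sample_len: int, step: int) -> int:
    """Upper bound on fingerprint size in bytes (one hash per full piece)."""
    return (sample_len // step) * DIGEST_SIZE

sample = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(200))
prints = fingerprint(sample, 100)
print(len(sample), fingerprint_size(len(sample), 100), fingerprint_size(len(sample), 1))
```

Rehashing every window from scratch, as this sketch does, is slow for large objects; rolling hashes (as in the Rabin-Karp algorithm) are the standard way to avoid that.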

Advantages of technology

As the description suggests, detecting a quote requires a sample object. Statistical methods can then say with good accuracy (up to 100%) whether the file being checked contains a significant quote from the sample. In other words, the system takes no responsibility for categorizing documents: that work rests entirely on the conscience of the person who categorized the files before the fingerprints were taken. This greatly simplifies protection if the enterprise stores rarely changed, already categorized files in one or more locations. It is then enough to take a fingerprint of each of these files, and the system will, according to its settings, block the transfer or copying of files containing significant quotes from the samples.
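The workflow described above, in which a person categorizes the files and the system only compares fingerprints, might look like this in Python. `FingerprintIndex`, its methods and the default 64-byte piece size are hypothetical choices for illustration; the category label is whatever the person who registered the sample assigned.

```python
import hashlib

class FingerprintIndex:
    """Maps piece hashes of registered samples to (sample name, category)."""

    def __init__(self, step: int = 64):
        self.step = step
        self.index: dict[bytes, tuple[str, str]] = {}

    def _hash(self, piece: bytes) -> bytes:
        return hashlib.blake2b(piece, digest_size=8).digest()

    def register(self, name: str, category: str, data: bytes) -> None:
        # The category comes from a person; the system only stores it.
        for i in range(0, len(data) - self.step + 1, self.step):
            self.index[self._hash(data[i:i + self.step])] = (name, category)

    def check(self, data: bytes) -> set[tuple[str, str]]:
        """Return the registered samples (with categories) quoted in `data`."""
        hits = set()
        for j in range(len(data) - self.step + 1):
            match = self.index.get(self._hash(data[j:j + self.step]))
            if match:
                hits.add(match)
        return hits

    def allow_transfer(self, data: bytes) -> bool:
        # Policy from the text: block anything quoting a registered sample.
        return not self.check(data)

idx = FingerprintIndex(step=16)
contract = b"This agreement between Alpha LLC and Beta Inc sets the unit price at 17.40 EUR for 2011."
idx.register("contract-17", "confidential", contract)
print(idx.allow_transfer(b"FYI: " + contract[10:60] + b" thanks"))
```

A file that was never registered is invisible to such an index, which is exactly the weakness discussed in the next section.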

The independence of statistical methods from the language of a text, and their ability to handle non-textual information, is also an undeniable advantage. They protect static digital objects of any type well: pictures, audio/video, databases. I will discuss protecting dynamic objects in the "Disadvantages" section.

Disadvantages of technology

As with linguistics, the disadvantages of the technology are the flip side of its advantages. The ease of training the system (point it at a file, and the file is protected) shifts responsibility for training onto the user. If a confidential file ends up in the wrong place or is not indexed through negligence or malicious intent, the system will not protect it. Accordingly, companies that care about protecting confidential information from leaks must establish a procedure for controlling how confidential files are indexed by the DLP system.

Another drawback is the physical size of the fingerprint. The author has repeatedly seen impressive fingerprint-based pilot projects in which the DLP system blocked, with 100% reliability, the transfer of documents containing significant quotes from three hundred sample documents. However, after a year of operation in production, the fingerprint of each outgoing message is compared not with three hundred but with millions of sample fingerprints, which significantly slows down the mail system, causing delays of tens of minutes.

As promised, I will describe my experience protecting dynamic objects with statistical methods. The time it takes to fingerprint a file depends directly on its size and format. For a text document like this article it takes fractions of a second; for a 90-minute MP4 movie it takes tens of seconds. For files that rarely change this is not critical, but if an object changes every minute or even every second, a problem arises: after each change a new fingerprint must be taken. Code a programmer is working on is not the hardest case; databases used in billing, core banking or call centers are much worse. If fingerprinting takes longer than the object stays unchanged, the problem has no solution. This is not such an exotic case: for example, fingerprinting a database of the phone numbers of a federal mobile operator's customers takes several days, while the database changes every second. So when a DLP vendor claims that its product can protect your database, mentally add the word "quasi-static".

Unity and struggle of opposites

As the previous section shows, one technology is strong exactly where the other is weak. Linguistics needs no samples: it categorizes data on the fly and can protect information that was not fingerprinted, whether by accident or by intent. Fingerprints give better accuracy and are therefore preferable for automatic operation. Linguistics works very well with text, while fingerprints also work well with other storage formats.

Therefore, most leading vendors use both technologies in their products, one as the main technology and the other as a supplement. This is because each vendor's products originally used only one technology, which the vendor developed further, and the second was added later in response to market demand. For example, InfoWatch previously used only the licensed Morph-OLogic linguistic technology, and Websense used PreciseID, a digital fingerprint technology; today both companies use both methods. Ideally, the two technologies should be applied not in parallel but sequentially. Fingerprints, for example, are better at determining the type of a document, such as whether it is a contract or a balance sheet; a linguistic database built specifically for that category can then be applied. This saves considerable computing resources.
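The sequential scheme can be sketched as a two-step pipeline: first pick the document category by overlap with reference samples, then score the text with that category's dictionary only. This is a toy illustration, not InfoWatch's or Websense's method; word-pair overlap stands in for a real fingerprint, and the samples, dictionaries and weights are hypothetical.

```python
import re
from typing import Optional

# Hypothetical reference samples, one per document category.
CATEGORY_SAMPLES = {
    "contract": "the parties agree to the terms and conditions of this contract",
    "balance_sheet": "total assets total liabilities and shareholders equity at the end of the period",
}

# Hypothetical linguistic dictionaries (term -> weight), one per category.
CATEGORY_TERMS = {
    "contract": {"penalty": 2.0, "confidential": 3.0, "termination": 1.0},
    "balance_sheet": {"liabilities": 2.0, "forecast": 1.5, "impairment": 2.5},
}


def words(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def word_pairs(text: str) -> set[tuple[str, str]]:
    w = words(text)
    return set(zip(w, w[1:]))


def detect_category(text: str) -> Optional[str]:
    """Step 1 (fingerprint-like): the category whose sample shares the most word pairs."""
    pairs = word_pairs(text)
    best, best_overlap = None, 0
    for category, sample in CATEGORY_SAMPLES.items():
        overlap = len(pairs & word_pairs(sample))
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def score(text: str) -> tuple[Optional[str], float]:
    """Step 2 (linguistic): apply only the dictionary of the detected category."""
    category = detect_category(text)
    if category is None:
        return None, 0.0
    terms = CATEGORY_TERMS[category]
    return category, sum(terms.get(w, 0.0) for w in words(text))
```

The saving comes from step 2: each document is scored against one dictionary instead of all of them.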

Several other technologies used in DLP products are beyond the scope of this article. These include, for example, a structure analyzer that finds formally structured data in objects (credit card numbers, passport numbers, tax identification numbers, etc.) that neither linguistics nor fingerprints can detect. Labels of various kinds are not covered either, from entries in a file's attribute fields or simply special file names to special cryptocontainers. The latter technology is becoming obsolete, since most vendors prefer not to reinvent the wheel but to integrate with DRM systems such as Oracle IRM or Microsoft RMS.
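As an illustration of what a structure analyzer does, the sketch below finds credit card numbers with a pattern plus the public Luhn checksum. It assumes card numbers are 13 to 19 digits, optionally grouped with spaces or hyphens; passport and tax number formats are country-specific and are not covered. The function names are hypothetical.

```python
import re

# Assumption: card numbers are 13-19 digits, optionally grouped with spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass the Luhn check."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

The checksum filters out most random digit runs (order numbers, phone numbers) that happen to have the right length, which keyword or fingerprint matching could not do.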

DLP products are a fast-growing area of information security; some vendors release new versions very often, more than once a year. We look forward to new technologies for analyzing the corporate information field that will make the protection of confidential information more effective.

Before discussing the DLP systems market in detail, we need to define the term. DLP systems usually mean software products designed to protect organizations and enterprises from leaks of confidential information. This is what the abbreviation DLP (in full, Data Leak Prevention) conveys: “prevention of data leaks.”

Such systems create a digital secure “perimeter” and analyze all information leaving or entering it. They control Internet traffic as well as many other information flows: documents carried outside the protected perimeter on external media, printed on a printer, or sent to mobile devices via Bluetooth. Since sending and exchanging information of all kinds is unavoidable today, the importance of such protection is obvious: the more digital and Internet technologies are used, the more security guarantees are needed every day, especially in corporate environments.

How does it work?

Since a DLP system must counter leaks of confidential corporate information, it has built-in mechanisms for determining how confidential any document found in intercepted traffic is. There are two common ways to do this: checking special markers and analyzing the content.

Content analysis is the approach in current use. It is more resistant to modifications that may be made to a file before it is sent, and it makes it easy to expand the set of confidential documents the system can handle.
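The difference in resistance to modification can be shown with a toy comparison: a marker check depends on a label attached to the file, while a content check looks at the text itself. The label key, the term list and the hit threshold are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical vocabulary that marks a text as confidential.
SENSITIVE_TERMS = {"salary", "merger", "password"}


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. a classification label


def confidential_by_marker(doc: Document) -> bool:
    """Marker check: relies entirely on a label attached to the file."""
    return doc.metadata.get("classification") == "confidential"


def confidential_by_content(doc: Document, min_hits: int = 2) -> bool:
    """Content check: looks at the text itself, so it survives copying and relabelling."""
    found = set(re.findall(r"\w+", doc.text.lower())) & SENSITIVE_TERMS
    return len(found) >= min_hits


original = Document("Planned merger: new salary scale attached",
                    {"classification": "confidential"})
pasted_copy = Document(original.text)  # text pasted into a new file: the label is lost
```

Pasting the text into a new file defeats the marker check but not the content check.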

Secondary DLP Tasks

Besides their main function of preventing information leaks, DLP systems are also suitable for many other tasks related to monitoring personnel. Most often they are used for the following:

  • full control over how the organization's personnel use working time and work resources;
  • monitoring employee communications to identify those who could harm the organization;
  • checking the legality of employees' actions (preventing the production of counterfeit documents);
  • identifying employees who are sending out resumes, so that a replacement for the position can be found quickly.

Classification and comparison of DLP systems

DLP systems can be divided into several main types according to certain characteristics, and each type has its own advantages over the others.

By their ability to block information recognized as confidential, systems that continuously monitor user actions are divided into active and passive ones. Active systems can block the transmission of information; passive systems cannot. Active systems are therefore much better at stopping accidental leaks, but they can also interrupt the company's ongoing business processes, which passive systems do not do.
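The practical difference between the two modes fits in a few lines. In this sketch the classification result is passed in as a flag, and the names `Mode` and `handle_transfer` are hypothetical; both modes record the incident, but only the active mode stops the transfer.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"    # may block the transmission
    PASSIVE = "passive"  # only records the incident


def handle_transfer(message: str, is_confidential: bool, mode: Mode,
                    incidents: list[str]) -> bool:
    """Return True if the message is delivered, False if it is blocked."""
    if not is_confidential:
        return True
    incidents.append(message)  # both modes record the incident for review
    # Only an active system stops the transfer: this prevents accidental leaks
    # but can also halt a legitimate business process.
    return mode is not Mode.ACTIVE
```

A passive deployment leaves the decision to a security officer after the fact, which is why it never disrupts business processes but cannot stop a leak in progress.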

DLP systems can also be classified by network architecture. Gateway DLP systems run on intermediate servers, whereas host-based systems use agents running directly on employee workstations. Currently, the more common option is to use host and gateway components together, although host-based components have certain advantages.

Global modern DLP market

Today the leading positions in the global DLP market are held by companies well known in this field, including Symantec, Trend Micro, McAfee and Websense.

Symantec

Symantec maintains its leading position in the DLP market, which is somewhat surprising, since many other companies could take its place. The solution is still built from modular components, which lets Symantec add new capabilities for integrating DLP with other technologies. Its technology roadmap for this year was drawn up using input from its customers and is currently the most progressive on the market. Nevertheless, it is far from the best choice of DLP system.

Strengths:

  • significant improvements to Content-Aware DLP technology for portable devices;
  • improved content retrieval capabilities that support a more comprehensive approach;
  • better integration of DLP capabilities with other Symantec products (the most striking example is Data Insight).

Cautions (significant shortcomings worth considering):

  • although Symantec's technology roadmap is considered progressive, it is often implemented with delays;
  • although the management console is fully functional, it is not as competitive as Symantec claims;
  • customers often complain about the support service's response time;
  • the price is still significantly higher than that of competing products, which, with minor improvements, may eventually take the lead.

Websense

Over the past few years, developers have steadily improved Websense's DLP offering, which can now safely be considered a fully functional solution that provides advanced capabilities.

Strengths:

  • Websense offers a full-featured DLP solution that covers endpoints and data discovery;
  • the drip DLP function detects gradual leaks that are spread over a long period.
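One plausible way to catch such gradual leaks is a per-sender sliding window: each message alone stays under any per-message limit, but the total within the window does not. This is a hypothetical sketch, not Websense's actual drip DLP algorithm; the window length, threshold and class name are assumptions.

```python
from collections import defaultdict, deque


class DripDetector:
    """Sliding-window counter of sensitive records sent by each sender."""

    def __init__(self, window_s: float, threshold: int) -> None:
        self.window_s = window_s      # length of the observation window, seconds
        self.threshold = threshold    # cumulative records that trigger an alert
        self.events = defaultdict(deque)  # sender -> deque of (timestamp, records)
        self.totals = defaultdict(int)    # sender -> records within the window

    def observe(self, sender: str, timestamp: float, sensitive_records: int) -> bool:
        """Record one message; return True if the sender's window total reaches the threshold.

        Assumes timestamps for a given sender arrive in non-decreasing order.
        """
        queue = self.events[sender]
        queue.append((timestamp, sensitive_records))
        self.totals[sender] += sensitive_records
        # Forget messages that have slid out of the window.
        while queue and queue[0][0] <= timestamp - self.window_s:
            _, old = queue.popleft()
            self.totals[sender] -= old
        return self.totals[sender] >= self.threshold


DAY = 86_400
detector = DripDetector(window_s=30 * DAY, threshold=100)
```

A sender who mails 10 records a day is flagged on the tenth day, while one who sends the same 10 records every 40 days never accumulates enough within a 30-day window.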

Cautions:

  • data can be edited only while it is at rest;
  • the technology roadmap is weak.

McAfee DLP

The McAfee DLP security system has also improved considerably. It offers few distinctive features, but its core capabilities are implemented at a high level. Apart from integration with other products managed through the McAfee ePolicy Orchestrator (ePO) console, its key distinguishing feature is that captured data is stored in a centralized database. This database can be used to test new rules against previously captured traffic, which reduces false positives and shortens deployment time.
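Testing a new rule against stored captures can be sketched as follows. This is not McAfee's implementation: the in-memory SQLite table, the `known_sensitive` flag (standing for reviewer-confirmed incidents) and the regex-based rule are hypothetical stand-ins.

```python
import re
import sqlite3


def build_capture_db(messages: list[tuple[str, int]]) -> sqlite3.Connection:
    """Store captured messages with a flag saying whether each is really sensitive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE captured (id INTEGER PRIMARY KEY, body TEXT, known_sensitive INTEGER)")
    conn.executemany("INSERT INTO captured (body, known_sensitive) VALUES (?, ?)", messages)
    return conn


def evaluate_rule(conn: sqlite3.Connection, pattern: str) -> tuple[int, int]:
    """Run a candidate rule over all captured messages; return (hits, false positives)."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = false_positives = 0
    for body, sensitive in conn.execute("SELECT body, known_sensitive FROM captured"):
        if rx.search(body):
            hits += 1
            if not sensitive:
                false_positives += 1
    return hits, false_positives


conn = build_capture_db([("Q3 salary table attached", 1),
                         ("Salary day is Friday, yay", 0),
                         ("Lunch?", 0)])
```

Here a broad rule such as `salary` would produce one false positive on past traffic, while `salary table` would not; the rule can be refined before it ever touches live traffic.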

What attracts you most about this solution?

Incident management is clearly a strength of the McAfee solution: documents and comments can be attached to incidents, which benefits staff at every level. The solution can also detect non-text content, such as images. In addition, McAfee offers a new endpoint protection product that can be deployed, for example, as a stand-alone solution.

Support for emerging platforms such as mobile devices and social networks has performed quite well and gives McAfee an edge over competing solutions. New rules are checked against the database of captured information, which helps reduce false positives and speed up rule deployment. McAfee DLP provides core functionality in virtual environments, although plans for developing it further have not yet been clearly formulated.

Prospects for modern DLP systems

The overview above shows that all these solutions work in broadly the same way. According to experts, the main development trend is that “patchwork” systems, assembled from components by several manufacturers that each solve a specific problem, will be replaced by integrated software packages. This transition is driven by the need to relieve specialists of certain tasks. In addition, existing DLP systems, whose alternatives cannot provide the same level of protection, will continue to be improved.

For example, integrated systems resolve the compatibility problems between the various components of a “patchwork” system. This makes it easy to change settings across very large numbers of client workstations in an organization and removes difficulties with transferring data between the components of a single integrated system. Developers of integrated systems are also focusing more closely on specific information security tasks. No channel should be left uncontrolled, because an uncontrolled channel often becomes the source of a leak.

What will happen in the near future?

Western manufacturers trying to win the DLP market in the CIS countries have had to deal with supporting national languages. Because they are quite actively interested in our market, they are working to support Russian.

The DLP industry is moving towards a modular structure, in which customers will be able to choose the system components they need. The development and implementation of DLP systems also depend on industry specifics. Special versions of well-known systems will most likely appear, adapted for the banking sector or government agencies and taking into account the requirements of specific organizations.

Corporate Security

The use of laptops in corporate environments directly affects the direction in which DLP systems develop. Laptops have many more vulnerabilities than desktop computers and therefore require stronger protection. Because of their specific nature (both the information and the device itself can be stolen), DLP vendors are developing new approaches to securing laptops.

Even the most fashionable IT terms should be used appropriately and as accurately as possible, if only so as not to mislead consumers. It has definitely become fashionable to call oneself a DLP vendor. At the recent CeBIT 2008 exhibition, for example, the label “DLP solution” could often be seen on the stands not only of little-known antivirus and proxy server vendors but even of firewall vendors. At times it seemed that around the next corner one might find a CD ejector (a program that controls the opening of the CD drive) proudly billed as an enterprise DLP solution. Oddly enough, each of these vendors usually had a more or less logical explanation for positioning its product this way (in addition, of course, to the desire to benefit from a fashionable term).

Before examining the market of DLP vendors and its main players, we should decide what we mean by a DLP system. There have been many attempts to define this class of information systems: ILD&P (Information Leakage Detection & Prevention, a term proposed by IDC in 2007), ILP (Information Leakage Protection, Forrester, 2006), ALS (Anti-Leakage Software, E&Y), CMF (Content Monitoring and Filtering, Gartner), and Extrusion Prevention System (by analogy with intrusion prevention systems).

Nevertheless, the name DLP (Data Loss Prevention, or Data Leak Prevention), proposed in 2005, became the established term. In Russian, rather than a direct translation, a similar phrase was adopted: “systems for protecting confidential data from insider threats.” Internal threats here mean abuses (intentional or accidental) of their rights and powers by employees of an organization who have legitimate access to the relevant data.

The most coherent and consistent criteria for classifying a system as DLP were put forward by Forrester Research in its annual study of this market. Forrester proposed four criteria:

  1. Multichannel. The system must be able to monitor several possible channels of data leakage. In a network environment this means at least e-mail, Web and IM (instant messengers), not just scanning mail traffic or database activity. On the workstation, it means monitoring file operations and clipboard use, as well as controlling e-mail, Web and IM.
  2. Unified management. The system must provide unified tools for managing information security policies and for analyzing and reporting on events across all monitoring channels.
  3. Active protection. The system should not only detect violations of the security policy but also, if necessary, enforce compliance with it, for example by blocking suspicious messages.
  4. Content and context. The system should take into account both the content of the data and its context.

Based on these criteria, in 2008 Forrester selected 12 software vendors for review and evaluation. They are listed below in alphabetical order; where a vendor entered the DLP market by acquiring another company, the acquired company is given in brackets:

  1. Code Green;
  2. InfoWatch;
  3. McAfee (Onigma);
  4. Orchestria;
  5. Reconnex;
  6. RSA/EMC (Tablus);
  7. Symantec (Vontu);
  8. Trend Micro (Provilla);
  9. Verdasys;
  10. Vericept;
  11. Websense (PortAuthority);
  12. Workshare.

Today, of the above-mentioned 12 vendors, only InfoWatch and Websense are represented on the Russian market to one degree or another. The rest either do not work in Russia at all, or have only announced their intentions to start selling DLP solutions (Trend Micro).

Considering the functionality of DLP systems, analysts (Forrester, Gartner, IDC) introduce a categorization of protection objects - types of information objects to be monitored. Such categorization makes it possible, to a first approximation, to assess the scope of application of a particular system. There are three categories of monitoring objects.

  1. Data-in-motion: e-mail messages, Internet pagers, peer-to-peer networks, file transfers, Web traffic and other types of messages that can be transmitted over communication channels.
  2. Data-at-rest: information stored on workstations, laptops, file servers, specialized storage systems, USB devices and other storage media.
  3. Data-in-use: information that is currently being processed.

Currently, there are about two dozen domestic and foreign products on our market that have some of the properties of DLP systems. Brief information about them, following the classification above, is given in Tables 1 and 2. Table 1 also includes the parameter “centralized data storage and auditing,” meaning the system's ability to save data from all monitoring channels in a single repository for later analysis and audit. This functionality has recently become particularly important, not only because of the requirements of various laws but also because customers value it (based on the experience of completed projects). All information in these tables is taken from public sources and the marketing materials of the respective companies.

Based on the data in Tables 1 and 2, we can conclude that only three DLP systems are currently available in Russia: those from InfoWatch, Perimetrix and Websense. The recently announced integrated product from Jet Infosystems (SKVT+SMAP) can also be added to this list, since it will cover several channels and provide unified management of security policies.
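A centralized repository can be as simple as one event table shared by all monitoring channels, so that an audit of one employee sees e-mail, removable media and printing together. The sketch below uses an in-memory SQLite database; the schema, channel names and sample events are hypothetical.

```python
import sqlite3


def open_repository() -> sqlite3.Connection:
    """One table for events from every monitored channel (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, channel TEXT, username TEXT, summary TEXT)")
    return conn


def record(conn: sqlite3.Connection, ts: str, channel: str, username: str, summary: str) -> None:
    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)", (ts, channel, username, summary))


def audit_user(conn: sqlite3.Connection, username: str) -> list[tuple[str, str, str]]:
    """Everything one employee did, across all channels, in time order."""
    return conn.execute(
        "SELECT ts, channel, summary FROM events WHERE username = ? ORDER BY ts",
        (username,),
    ).fetchall()


repo = open_repository()
record(repo, "2008-10-01T10:00", "email", "ivanov", "price list sent outside")
record(repo, "2008-10-01T09:00", "usb", "ivanov", "file copied")
record(repo, "2008-10-02T12:00", "print", "petrov", "contract printed")
```

Because every channel writes to the same table, reconstructing an incident (a file copied to USB an hour before a suspicious e-mail) is a single query rather than a correlation across separate logs.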

It is quite difficult to talk about the market shares of these products in Russia, since most of the mentioned manufacturers do not disclose sales volumes, the number of clients and protected workstations, limiting themselves only to marketing information. We can only say for sure that the main suppliers at the moment are:

  • “Dozor” systems, present on the market since 2001;
  • InfoWatch products sold since 2004;
  • WebSense CPS (began selling in Russia and around the world in 2007);
  • Perimetrix (a young company, the first version of whose products was announced on its website at the end of 2008).

In conclusion, I would add that whether or not a product belongs to the class of DLP systems does not make it better or worse; it is simply a matter of classification.

Table 1. Products presented on the Russian market and having certain properties of DLP systems

Company | Product | Data-in-motion protection | Data-in-use protection | Data-at-rest protection | Centralized storage and auditing
InfoWatch | IW Traffic Monitor | Yes | Yes | No | Yes
InfoWatch | IW CryptoStorage | No | No | Yes | No
Perimetrix | SafeSpace | Yes | Yes | Yes | Yes
Jet Infosystems | Dozor Jet (SKVT) | Yes | No | No | Yes
Jet Infosystems | Jet Watch (SMAP) | Yes | No | No | Yes
Smart Line Inc. | DeviceLock | No | Yes | No | Yes
SecurIT | Zlock | No | Yes | No | No
Smart Protection Labs Software | SecretKeeper | No | Yes | No | No
SpectorSoft | Spector 360 | Yes | No | No | No
Lumension Security | Sanctuary Device Control | No | Yes | No | No
Websense | Websense Content Protection | Yes | Yes | Yes | No
Informzashita | Security Studio | No | Yes | Yes | No
Primetek | Insider | No | Yes | No | No
AtomPark Software | StaffCop | No | Yes | No | No
SoftInform | SearchInform Server | Yes | Yes | No | No

Table 2. Compliance of products presented on the Russian market with the criteria for belonging to the class of DLP systems

Company | Product | Multichannel | Unified management | Active protection | Considering both content and context
InfoWatch | IW Traffic Monitor | Yes | Yes | Yes | Yes
Perimetrix | SafeSpace | Yes | Yes | Yes | Yes
Jet Infosystems | Dozor Jet (SKVT) | No | No | Yes | Yes
Jet Infosystems | Dozor Jet (SMAP) | No | No | Yes | Yes
Smart Line Inc. | DeviceLock | No | No | No | No
SecurIT | Zlock | No | No | No | No
Smart Protection Labs Software | SecretKeeper | Yes | Yes | Yes | No
SpectorSoft | Spector 360 | Yes | Yes | Yes | No
Lumension Security | Sanctuary Device Control | No | No | No | No
Websense | Websense Content Protection | Yes | Yes | Yes | Yes
Informzashita | Security Studio | Yes | Yes | Yes | No
Primetek | Insider | Yes | Yes | Yes | No
AtomPark Software | StaffCop | Yes | Yes | Yes | No
SoftInform | SearchInform Server | Yes | Yes | No | No
Infodefense | Infoperimeter | Yes | Yes | No | No
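The conclusion that only three products qualify can be checked mechanically against Table 2. The sketch transcribes the table's rows and applies one interpretation of the Forrester criteria: a product counts as a DLP system only if it meets all four. The names `TABLE_2` and `is_dlp` are hypothetical.

```python
Y, N = True, False

# Rows of Table 2. Columns: multichannel, unified management,
# active protection, considering both content and context.
TABLE_2 = {
    "IW Traffic Monitor": (Y, Y, Y, Y),
    "SafeSpace": (Y, Y, Y, Y),
    "Dozor Jet (SKVT)": (N, N, Y, Y),
    "Dozor Jet (SMAP)": (N, N, Y, Y),
    "DeviceLock": (N, N, N, N),
    "Zlock": (N, N, N, N),
    "SecretKeeper": (Y, Y, Y, N),
    "Spector 360": (Y, Y, Y, N),
    "Sanctuary Device Control": (N, N, N, N),
    "Websense Content Protection": (Y, Y, Y, Y),
    "Security Studio": (Y, Y, Y, N),
    "Insider": (Y, Y, Y, N),
    "StaffCop": (Y, Y, Y, N),
    "SearchInform Server": (Y, Y, N, N),
    "Infoperimeter": (Y, Y, N, N),
}


def is_dlp(criteria: tuple[bool, ...]) -> bool:
    """Interpretation used here: a product is a DLP system only if it meets every criterion."""
    return all(criteria)


dlp_products = [name for name, criteria in TABLE_2.items() if is_dlp(criteria)]
```

Under this reading, the result is IW Traffic Monitor (InfoWatch), SafeSpace (Perimetrix) and Websense Content Protection, the same three systems named in the text.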