Big data measurement system. What is Big Data: characteristics, classification, examples. Tasks related to Big Data

The term "Big Data" may be familiar today, but there is still considerable confusion about what it actually means. In truth, the concept keeps evolving and being redefined, because it remains the driving force behind many ongoing waves of digital transformation, including artificial intelligence, data science and the Internet of Things. So what is Big Data technology, and how is it changing our world? Let's try to explain the essence of Big Data in simple words.

The Amazing Growth of Big Data

It all started with an explosion in the amount of data we have created since the dawn of the digital age. This is largely due to the development of computers, the Internet and technologies that can “capture” data from the world around us. Data in itself is not a new invention. Even before computers and databases, we used paper transaction records, customer records and archive files, all of which are data. Computers, and especially spreadsheets and databases, made it easy for us to store and organize data on a large scale. Suddenly, information was available with just one click.

However, we have come a long way from those early spreadsheets and databases. Today, every two days we create as much data as we created in total from the very beginning until the year 2000. That's right, every two days. And the amount of data we create continues to grow exponentially: by 2020, the amount of available digital information was expected to grow from approximately 5 zettabytes to 20 zettabytes.

Nowadays, almost every action we take leaves a trace. We generate data every time we go online, when we carry our GPS-equipped smartphones, when we talk to our friends through social networks or chats, and so on. In addition, the amount of machine-generated data is growing rapidly. Data is generated and shared when our smart home devices communicate with each other or with their home servers. Industrial equipment in plants and factories is increasingly fitted with sensors that collect and transmit data.

The term "Big Data" refers to the collection of all this data and our ability to use it to our advantage in a wide range of areas, including business.

How does Big Data technology work?

Big Data works on a simple principle: the more you know about a particular subject or phenomenon, the more reliably you can gain new understanding and predict what will happen in the future. As we compare more data points, previously hidden relationships emerge, and these relationships allow us to learn and make better decisions. Most often, this is done by building models from the data we can collect and then running simulations, each time tweaking the values of the data points and tracking how they affect the results. This process is automated: modern analytics technology can run millions of these simulations, tweaking every possible variable until it finds a model, or an insight, that helps solve the problem at hand.
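The model-then-simulate loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the demand model, its coefficients and the function names `simulated_profit` and `sweep` are hypothetical, invented for this example and not fitted to any real data. Each call to the model is one "simulation", and the sweep tweaks two variables (price and advertising budget) over every combination, keeping the best outcome.

```python
import math
from itertools import product


def simulated_profit(price: float, ad_budget: float) -> float:
    """Hypothetical toy model: demand falls with price and rises with ad spend."""
    # Illustrative coefficients only; a real project would fit these from data.
    demand = max(0.0, 1000 - 50 * price + 10 * math.sqrt(ad_budget))
    return price * demand - ad_budget


def sweep(prices, budgets):
    """Run one simulation per combination of inputs and keep the best outcome."""
    best = None
    for price, budget in product(prices, budgets):
        profit = simulated_profit(price, budget)
        if best is None or profit > best[0]:
            best = (profit, price, budget)
    return best


profit, price, budget = sweep(range(1, 41), range(0, 20001, 500))
print(f"best price={price}, ad budget={budget}, profit={profit:.0f}")
```

An exhaustive grid like this is the simplest possible search strategy; with many variables it quickly becomes too expensive, which is why real analytics tools usually rely on smarter search or optimization methods.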

Bill Gates standing over the contents of a single CD printed out on paper

Until recently, data was mostly limited to spreadsheets or databases, and everything was very organized and neat. Anything that could not easily be organized into rows and columns was considered too complex to work with and was ignored. However, advances in storage and analytics mean that we can now capture, store and process large amounts of many different types of data. As a result, “data” today can mean anything from databases to photographs, videos, sound recordings, written texts and sensor data.

To make sense of all this messy data, Big Data projects often use cutting-edge analytics based on artificial intelligence and machine learning. By teaching computers to recognize what a given piece of data represents, for example through pattern recognition or natural language processing, we can teach them to identify patterns much faster and more reliably than we can ourselves.

How is Big Data used?

This ever-increasing flow of sensor data, text, voice, photo and video data means that we can now use data in ways that would have been unimaginable just a few years ago. This is bringing revolutionary changes to the business world in almost every industry. Today, companies can predict with incredible accuracy which specific categories of customers will want to make a purchase and when. Big Data also helps companies carry out their activities much more efficiently.

Even outside of business, projects related to Big Data are already helping to change our world in various ways:

  • Improving healthcare. Data-driven medicine can analyze vast amounts of medical information and images and turn them into models that help detect disease at an early stage and develop new drugs.
  • Predicting and responding to natural and man-made disasters. Sensor data can be analyzed to predict where earthquakes are likely to occur, and patterns of human behavior provide clues that help organizations assist survivors. Big Data technology is also used to track and protect the flow of refugees from war zones around the world.
  • Preventing crime. Police forces increasingly use data-driven strategies that combine their own intelligence with publicly available information to deploy resources more effectively and take preventive action where necessary.

The best books about Big Data technology

  • Everybody lies. Search engines, Big Data and the Internet know everything about you.
  • BIG DATA. All technology in one book.
  • Happiness industry. How Big Data and new technologies help add emotion to goods and services.
  • Revolution in analytics. How to improve your business in the era of Big Data using operational analytics.

Problems with Big Data

Big Data gives us unprecedented insights and opportunities, but it also raises problems and questions that need to be addressed:

  • Data privacy - The Big Data we generate today contains a great deal of information about our personal lives, much of which we have every right to keep private. More and more, we are asked to balance the amount of personal data we disclose against the convenience that Big Data-based apps and services offer.
  • Data security - Even if we are happy for someone to hold our data for a specific purpose, can we trust them to keep it safe and secure?
  • Data discrimination - Once everything about us is known, will it become acceptable to discriminate against people based on data from their personal lives? We already use credit scores to decide who can borrow money, and insurance is also heavily data-driven. We should expect to be analyzed and assessed in ever greater detail, but care must be taken to ensure that this does not make life harder for those with fewer resources and limited access to information.

Addressing these issues is an important part of working with Big Data, and organizations that want to use such data must tackle them. Failing to do so can leave a business vulnerable, not only in terms of its reputation but also legally and financially.

Looking to the future

Data is changing our world and our lives at an unprecedented pace. If Big Data is capable of all this today, just imagine what it will be capable of tomorrow. The amount of data available to us will only increase, and analytics technology will become even more advanced.

For businesses, the ability to apply Big Data will become increasingly critical in the coming years. Only those companies that view data as a strategic asset will survive and thrive. Those who ignore this revolution risk being left behind.




Big data is a set of methods for working with huge volumes of structured or unstructured information. Big data specialists process and analyze it to obtain clear results that people can interpret. Look At Me talked to professionals to find out what the state of big data processing in Russia is, and where and what to study for those who want to work in this field.

Alexey Ryvkin on the main trends in big data, communicating with customers and the world of numbers

I studied at the Moscow Institute of Electronic Technology. The main thing I took away from there was a fundamental grounding in physics and mathematics. Alongside my studies, I worked at an R&D center, where I developed and implemented error-correcting coding algorithms for secure data transmission. After my bachelor's degree, I entered the master's program in business informatics at the Higher School of Economics. After that I wanted to work at IBS. I was lucky: at the time, because of a large number of projects, the company was recruiting additional interns, and after several interviews I started working at IBS, one of the largest Russian companies in this field. In three years, I went from intern to enterprise solutions architect. Currently I am developing expertise in Big Data technologies for customer companies in the financial and telecommunications sectors.

There are two main specializations for people who want to work with big data: analysts, and IT consultants who create the technology for working with big data. The Big Data Analyst profession deserves a separate mention: these are people who work directly with the data on the customer's IT platform. Previously, these were ordinary mathematical analysts who knew statistics and mathematics and used statistical software to solve data analysis problems. Today, in addition to statistics and mathematics, they also need to understand the technology and the data life cycle. In my opinion, this is what distinguishes modern Data Analysts from the analysts who came before.

My specialization is IT consulting: I come up with and offer clients ways to solve business problems using IT. People with different backgrounds come to consulting, but the most important qualities for this profession are the ability to understand the client's needs, the desire to help people and organizations, good communication and teamwork skills (the work always involves the client and a team), and strong analytical skills. Internal motivation is very important: we work in a competitive environment, and the customer expects unconventional solutions and genuine interest in the work.

Most of my time is spent communicating with customers, formalizing their business needs and helping them design the most suitable technology architecture. The selection criteria have a peculiarity of their own: in addition to functionality and TCO (total cost of ownership), non-functional requirements are very important, most often response time and information processing time. To convince the customer, we often use a proof-of-concept approach: we offer to “test” the technology for free on a specific task with a narrow data set, to make sure that it works. The solution should either give the customer a competitive advantage through additional benefits (for example, cross-selling) or solve a specific business problem, say, reduce a high level of loan fraud.


What problems do you face? The market is not yet ready to use big data technologies. It would be much easier if clients came with a ready-made task, but so far they do not realize that a revolutionary technology has appeared that can change the market within a couple of years. This is why we essentially work in startup mode: we don't just sell technologies, we have to convince clients every time that they need to invest in these solutions. This is the position of visionaries: we show customers how they can change their business using data and IT. We are creating a new market, the market for commercial IT consulting in Big Data.

If a person wants to do data analysis or IT consulting in Big Data, the first thing that matters is a mathematical or technical education with solid mathematical training. It is also useful to master specific technologies, for example SAS, Hadoop, the R language or IBM solutions. In addition, you need to take an active interest in applications of Big Data, for example how it can improve credit scoring in a bank or customer lifecycle management. This and other knowledge can be obtained from available sources, for example Coursera and Big Data University. There is also the Customer Analytics Initiative at the Wharton School of the University of Pennsylvania, which has published a lot of interesting material.

A major problem for those who want to work in our field is the clear lack of information about Big Data. You cannot go to a bookstore or a website and get, for example, a comprehensive collection of cases covering all applications of Big Data technologies in banking. No such compendiums exist. Some of the information is in books, some is gathered at conferences, and some you have to figure out on your own.

Another problem is that analysts are comfortable in the world of numbers but not always in the world of business. They are often introverted and find it hard to present research findings convincingly to clients. To develop these skills, I would recommend books such as The Pyramid Principle and Say It with Charts. They help develop presentation skills and teach you to express your thoughts concisely and clearly.

Participating in various case championships while studying at the National Research University Higher School of Economics helped me a lot. Case championships are intellectual competitions in which students study business problems and propose solutions to them. There are two types: championships run by consulting firms such as McKinsey, BCG and Accenture, and independent ones such as Changellenge. Taking part in them, I learned to see and solve complex problems, from identifying and structuring a problem to defending recommendations for solving it.

Oleg Mikhalsky on the Russian market and the specifics of creating a new product in big data

Before joining Acronis, I was already involved in launching new products to market at other companies. It’s always interesting and challenging at the same time, so I was immediately interested in the opportunity to work on cloud services and data storage solutions. All my previous experience in the IT industry, including my own startup project I-accelerator, came in handy in this area. Having a business education (MBA) in addition to a basic engineering degree also helped.

In Russia, large companies such as banks and mobile operators need big data analysis, so there are prospects in our country for those who want to work in this area. However, many current projects are integration projects, built on foreign developments or open-source technologies. Such projects do not create fundamentally new approaches and technologies; rather, they adapt existing developments. At Acronis, we took a different path: after analyzing the available alternatives, we decided to invest in our own development, resulting in a reliable big data storage system that is comparable in cost to, for example, Amazon S3, yet works reliably and efficiently at a significantly smaller scale. Large Internet companies also have their own big data developments, but these focus more on internal needs than on serving external clients.

It is important to understand the trends and economic forces that shape the field of big data. To do this, you need to read a lot, listen to talks by respected IT industry experts, and attend thematic conferences. Almost every conference now has a Big Data section, but each approaches it from a different angle: technology, business or marketing. You can take on project work or an internship at a company that is already running projects in this area. If you are confident in your abilities, it is not too late to start a Big Data startup.


When you are responsible for a new product, a lot of time goes into market analytics and communication with potential clients, partners, and professional analysts who know a lot about clients and their needs. Without constant contact with the market, a new development risks finding no demand. There are always many uncertainties: you have to figure out who the early adopters will be, what to offer them, and how to then attract a mass audience. The second most important task is to formulate and convey to developers a clear, holistic vision of the final product, motivating them to work in conditions where some requirements may still change and priorities depend on feedback from the first customers. So an important task is managing expectations, those of clients on the one hand and developers on the other, so that neither side loses interest and the project is brought to completion. After the first successful project, things get easier, and the main challenge becomes finding the right growth model for the new business.

The day before yesterday there were 3 posts about Big Data in my news feed. Yesterday, a colleague sent me a note on the same topic. Today Beeline called and invited me to a business breakfast on Big Data. I did not go! I am wholeheartedly and open-mindedly in favor of modern digital technologies. What I am against is the profanation of expertise by the highly educated laymen who spread it.

A few words, in terms any marketer will understand, for the devotees of “big data”.

What is Big Data?

A few words for those who are not entirely familiar with the term. Big data is a large stream of poorly structured, unrelated data obtained from unrelated sources, together with its analysis and the event-forecasting models built on it. The term appeared relatively recently: Google Trends shows active growth in the use of the phrase starting in 2011.

To put it more simply, Big Data is data:

  • that cannot be processed in Excel
  • whose interrelationships a person cannot see
  • that keeps arriving for each new period, on top of what we did not manage to process yesterday

Where does this data come from?

Every second, world events, news portals, brands, and their trade and information intermediaries generate megatons of content. Many stores have a sensor at the entrance that registers each new visitor entering the sales area. Online payment systems record transactions, banks record the movement of cash and non-cash money, and stores count receipts and analyze their amounts. Search engines record the number and frequency of Internet queries. Social networks count mentions of particular brands and, from the surrounding context, infer the nature of and reason for each mention, as well as the mood and attitude behind it.

Well, marketers, are your eyes lighting up yet, are your hands trembling in anticipation of “knowing everything”? Easy now! This is not for you! To obtain, combine and make sense of all this messy data, you need cutting-edge analytics based on artificial intelligence, plus monstrous data storage facilities. Storage is a solvable problem, but artificial intelligence still has to be trained. We will return to how hard that is below; for now, let's hold on to the marketer's natural desire to “know everything” about consumers and try to figure it out.

There is a very specific catch with Big Data.

Take a child's scoop, shovel up more sand (excuse me, data) and try to analyze what you have collected

Garbage in, errors out

Anyone familiar with computer science, IT and analytics understands that the quality and reliability of incoming data matter most. Collecting and accumulating data is very easy, but how can you be sure that you collected the right data, from the right place, about the right thing?
  • Want to collect data from the Internet? Wonderful. Any bot can generate orders of magnitude more traffic and data than your target audience ever will! According to the US Association of National Advertisers, in 2015 brands spent $7.2 billion on fake, non-existent traffic, and last year the figure grew to $10 billion.
  • CNBC estimated that in 2016, up to 20% of online advertising budgets were spent on traffic generated by bots (non-human traffic).
  • Sensors at the store entrance? Every employee who periodically steps out for a smoke will register so many visits that the data becomes meaningless. “Non-customers” idly wandering around the shopping center and occasionally stepping into your store will finish off the idea of “counting store traffic” for good. Are you saying the model can predict these “idle” visits and filter them out? Wonderful, but how do you detect and filter out the growth in “idle” visits caused by ATL advertising for the shopping center or its anchor tenant? And what if your own mass advertising is running at the same time?
  • Rule-based machine logic will not account for the fact that people do not always “like” something to express approval. They do it out of pity, on impulse, out of habit, or because they approve of the author rather than the specific opinion; bots leave likes too, and so on.
The biggest problem with big data is the data itself. As for people, you should not take everything they generate on faith. Consumers get confused, employees lie, and contractors motivated by results cheat.

Incomplete data

Let's take a fairly routine marketing task: monitoring (not yet predicting) the behavior of a target group and, more specifically, the social interaction among its members. Why is this necessary? One of the purposes of interaction is to reduce uncertainty: to gain knowledge and dispel doubts when making a choice about your marketing subject. This reduction in uncertainty results from people's joint participation in interpersonal or group activities, for example, communicating on social networks.

The trouble is that no matter how much “big data” we collect, this communication is not limited to social networks. People communicate offline: colleagues at work, members of the target audience chatting by chance on vacation, drinking together in a pub, or talking casually on public transport. Such interaction is beyond the reach of Big Data observers. True completeness of data can only be ensured:

  • With total surveillance of every member of the target group, because such an interaction may happen even in the restroom of a public cinema and never make it into the analysis!
  • In a closed system. Say, you count the users of reusable metro tickets and work out what share of them also use ground public transport. But how would you calculate how many of them did not use taxis?
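Inside a closed system, the metro question reduces to simple set arithmetic. A minimal Python sketch, assuming hypothetical lists of card IDs logged at metro turnstiles and on ground-transport validators; the IDs and the function name `share_also_on_ground_transport` are invented for illustration.

```python
def share_also_on_ground_transport(metro_taps, ground_taps):
    """Fraction of distinct metro cards that were also validated on ground transport.

    Inputs are hypothetical lists of card IDs, one entry per validation.
    Taxi trips never pass through these validators, so they cannot appear here.
    """
    metro_cards = set(metro_taps)  # repeated taps by the same card count once
    if not metro_cards:
        return 0.0
    both = metro_cards & set(ground_taps)  # cards seen in both subsystems
    return len(both) / len(metro_cards)


metro_taps = ["A1", "B2", "C3", "A1", "D4"]
ground_taps = ["B2", "D4", "E5"]
print(share_also_on_ground_transport(metro_taps, ground_taps))  # 0.5
```

The answer is exact only because every trip inside the system leaves a record; the taxi question stays unanswerable, since taxi rides leave no trace in these logs at all.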
The second problem with incomplete data is that such interaction has two sides: objective and subjective. The objective side consists of connections that do not depend on individuals or groups and can be objectively and meaningfully captured in Big Data (for example, a purchase or sale, or the transfer and receipt of a unit of information). These can be counted, calculated and processed, and a model can be built on them.

The subjective side of interaction is the conscious, often emotionally charged attitude of individuals towards each other that arises as they interact: mutual expectations of certain behavior or reactions, personal disposition towards the interaction partner, how pleasant the partner's appearance and voice are, and the like. All of this affects both the interaction itself and its outcome. These aspects are very difficult to trace and analyze. One surrogate that lets us trace the subjective side of interaction at least somehow is likes and emoticons on social networks. From their presence, you can try to gauge engagement, mood and attitude. But, damn it, how do you do this if the people interacting don't use them? And of course, people don't use emojis on the street, in shops or on public transport: people don't live on social networks and communicate differently outside them!

Analyzing the interactions of target group members only by the fact that they occurred (a sale) or only where you want to observe them (the store), without taking into account the subjective quality of the interaction, dooms you to not understanding whether the interaction will continue tomorrow and whether there will be more purchases in that store: whether the customer liked the purchase and was satisfied with the marketed item or not.

A monkey with a grenade, and the grenade is Big Data

Often, when analyzing market data, we observe two phenomena that occur together but are in no way related to each other: a fall in the number of receipts for our product and a rise in the prices of goods in the consumer basket. If such phenomena occur in parallel for a long enough time, a marketing specialist may form the ill-founded assumption that they are somehow connected. There is a concept called an “epiphenomenon”: an error in attributing cause and effect.

N. Taleb in Antifragile says:

If you lecture birds on the theory of flight, they will fly. You don't believe it? It's silly, isn't it? Here's another example: rich countries conduct more scientific research, so we might assume that science generates wealth. That sounds more plausible, right? And it fits perfectly with the worldly saying, “If you're so smart, why are you so poor?” In reality, it was the other way around: some countries first acquired wealth and only then began to develop science. Science is impossible in a poor country.

Targeted advertising using Big Data is still shooting in the dark, writes Forbes. There is no evidence yet that all these techniques, based on analyzing cookies, social media and other clever “targeting”, work consistently. You have encountered this a hundred times yourself: contextual advertising hits you with something that has nothing to do with your interests, or a month after you have bought a product, you are still being shown ads for that very product. Someone is throwing money away on you at that very moment!

In the hands of theory generators, Big Data is an ideal tool for discovering and promoting epiphenomena: accumulating and observing data without clear forecasting models grounded in reality rather than hypotheses can produce a great many such false “discoveries.” Why are hypotheses built on nothing a blessing in science but death in marketing? If a scientist writes a dissertation and makes a mistake, it's fine; it gets forgotten. But if such theories make their way into marketing, “monkey business” will turn out to be profitable.

First learn to predict the weather for tomorrow

Learning to collect data and sift out the “garbage” is a problem, but a minor one compared to the lack of models of human behavior and forecasting algorithms. There's a well-known joke: “I went out for a pink blouse, but couldn't resist this purple handbag.” That is the psychology of consumer behavior, which is about as “easy” to predict as the weekend weather. The ability to predict trends from Big Data has been greatly exaggerated. And it is not even a matter of marketing analysts lacking skill.

Forecast errors are not mathematical errors, but a fundamental problem!

It's all about the notorious "human factor". The likelihood that a change in people's behavior or assessments observed now will be repeated in the future is not that high. People learn on their own faster than a predictive model can be built. At any moment, a new influencing factor may appear in a person’s views, in society, in a market segment, in the response of brands to the activities of competitors, which will break all your hypotheses.

Despite hundreds of computers and an army of meteorologists, no one can predict the weather three days in advance, so why would you expect the future of your market to be predicted three years in advance?...
Jack Trout, Al Ries "22 Immutable Laws of Marketing"
ISBN: 5-17-024999-3, 978-5
And on this point, the masters are completely right.

Want examples?

Of course, any opinion can be refuted, if not now, then in three hundred years, once experience has accumulated and technology has advanced. But today there are examples that confirm the doubts about forecasting with Big Data, and they are quite convincing.

How was the flu predicted?

A favorite example among Big Data adherents was Google Flu Trends: its charts convincingly showed that flu epidemics could be predicted from Internet searches faster and more reliably than by doctors. All you had to do was analyze user queries for drug names, their descriptions and pharmacy addresses. The example wandered from presentation to presentation and from article to article, and eventually made its way into serious books. It worked once; then what? It turned out to be no more accurate than the forecasts of the Russian Hydrometeorological Center. The first sign of trouble came in 2009, when it completely missed the global swine flu pandemic. In 2012, the system failed again: Google Flu Trends overestimated the peak of the next epidemic by more than a factor of two, as Nature magazine reported.

Victory Prediction

During the 2014 Republican congressional primary in Virginia, analysts expected Eric Cantor to win. Indeed, polls showed him 34 points ahead of his competitor. Yet he lost heavily, finishing about 10 points behind the winner. The mistake was that the model focused on “typical voters”, taking into account their voting history, behavior and preferences. This time, however, turnout was much higher than usual, and voters who did not fit the model entered the game. An even more convincing example of how risky Big Data forecasting is: Donald Trump's victory in the presidential election, despite analysts' nearly unanimous forecasts against him.
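The Cantor case is at heart a sampling problem: the model described one electorate, and a different one turned up. A minimal Python sketch with purely hypothetical numbers (not the real 2014 figures) shows how a comfortable lead computed over “typical voters” can flip once an unmodelled group votes; the function name `incumbent_share` is invented for illustration.

```python
def incumbent_share(groups):
    """Overall vote share for the incumbent.

    groups: list of (number_of_voters, share_voting_for_incumbent) pairs.
    """
    total = sum(voters for voters, _ in groups)
    return sum(voters * share for voters, share in groups) / total


# Hypothetical numbers, chosen only to illustrate the effect.
typical_voters = (10_000, 0.67)  # the turnout the model expected
new_voters = (15_000, 0.20)      # higher-than-usual turnout the model ignored

forecast = incumbent_share([typical_voters])
actual = incumbent_share([typical_voters, new_voters])
print(f"forecast: {forecast:.0%}, actual: {actual:.0%}")
```

No amount of extra data about typical voters fixes this error, because the missing information is about people the model never considered.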

Write long texts

...was the advice a few years ago from people who studied Google's search ranking algorithm. Two thousand characters, numbers and bullet points, links to primary sources: these were the few things that promised a site a high ranking. Putting this advice into practice, SEO specialists began writing long, complex texts en masse, even on sites' home pages. If you know the algorithm, you can always influence the results. Likewise, if you know the forecasting algorithm behind a Big Data system, you can easily deceive it.
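Why knowing the rule lets you game it can be shown with a deliberately naive scoring function that rewards length and bullet points, loosely mirroring the “long texts” advice. This is a minimal sketch and not Google's ranking algorithm; `naive_score`, its weights and both sample pages are hypothetical.

```python
def naive_score(text: str) -> int:
    """Hypothetical ranking rule: reward length (capped) and bullet points.

    NOT a real search engine's algorithm; it only mimics the
    "long texts, numbers and bullets" advice.
    """
    bullets = sum(1 for line in text.splitlines() if line.lstrip().startswith("-"))
    return min(len(text), 2000) + 50 * bullets


pages = {
    "useful": "Opening hours: 9:00-18:00. Address: 1 Main Street.",
    "padded": "\n".join("- " + "keyword " * 30 for _ in range(10)),
}
ranking = sorted(pages, key=lambda name: naive_score(pages[name]), reverse=True)
print(ranking)  # ['padded', 'useful']
```

The padded page wins without being more useful to anyone: once a scoring rule is fixed and known, optimizing for the rule becomes easier than optimizing for the reader.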

MTS Gate

Back in 2015, at the Future of Telecom Forum, Vitaly Saginov, head of the Big Data department at MTS, described the company's approach to developing its work with big data. In his report he noted: “in the near future, the company's income from selling data analytics will be comparable to its income from messaging and SMS.” Wonderful, but in May 2018, news feeds and TV channels everywhere reported that Moscow resident Alexey Nadezhin, a client of this cellular operator, had discovered that his gate, installed at the entrance to his gardening partnership, had “independently” subscribed to paid SMS services and was replying to the messages they sent.

The MTS press service then said that “specialists carried out the necessary work to ensure that such an incident does not happen again.” The company’s commentary does not specify what this means. Either the gate itself typed something into the phone, or subscriptions were issued without the subscriber’s consent. Either way, the SIM card installed in the gate automation received many SMS messages from short numbers, and the gate, it turns out, answered them “by itself”, sending SMS in response. So where were the results of many years of work with “big data” that should have prevented this? Declaring the ability to collect, analyze and forecast using Big Data does not mean doing it with adequate quality!

What about Procter&Gamble?

Dmexco’2017 in Cologne is the largest European exhibition and conference on digital marketing. There, Procter&Gamble explained in its presentation that the company had sharply cut its budgets for programmatic buying. For the first time, a transnational advertiser of this scale openly and publicly disputed a thesis of advertising technology (AdTech) companies. Until then, they had argued that user reach matters far more than the source of the advertising traffic. As a result, the company radically shortened the list of websites on which it is willing to place its advertising. Without an adequate forecasting model, there is no point in spending money in the hope of future results.

How Sberbank suffered from artificial intelligence

In February 2019, Sberbank head German Gref spoke at a “Numbers Lesson” at a private school in Moscow. Answering a question about the risks of introducing new technology, he said: “Artificial intelligence, as a rule, makes decisions in large systems. A small error that creeps into an algorithm can lead to very large consequences.” RBC asked about the nature of the losses from introducing artificial intelligence. The Sberbank press service clarified that “we are not talking about direct losses, but rather about lost profits.” However, the head of the bank spoke explicitly about losses: the gist of his statement was that Sberbank had already lost billions of rubles as a result of artificial intelligence errors.

Big Data in marketing - it's time to give up illusions

No matter how many generations of predictors have lived on Earth, the result is the same. Shamans and priests have tried every possible prediction tool, yet money still flows from the pockets of those worried about the future into the pocket of the predictor. Today, armed with super-powerful computers, predictors are trying to do the same thing as the venerable founders of this ancient profession. The idea of a person as a predictable automaton is mistaken. Today, Big Data is yet another fetish and another “crystal ball” in the long, centuries-old list of fortune-tellers’ attributes. All the “compelling examples” of Big Data’s predictive power are falling apart or will be refuted by harsh reality in the coming years.

Some have access to statistics: banks, telephone companies, aggregators. Yesterday they did not know why they needed this data. Today they certainly want to make money on their clients once again by reselling them as columns of numbers.

Tired of the Big Data hype

Of course, it cannot be said that Google Flu Trends does not work at all or that prediction based on Big Data is a scam. A hammer can be used to create something beautiful, but most people use it to fix something broken, and some use it for no good at all. Now it seems that forecasters’ greatly increased pressure around Big Data has begun to get in marketers’ way. This is happening everywhere except Russia, where everything, even the most progressive and fastest-growing trends, arrives with a five-year delay.

In the rest of the world, advertisers are tired of years of hype around Big Data. The first thing that caught my eye on the very first day of Dmexco’2017 was that the term Big Data had practically disappeared from speeches and presentations. This was noticeable, since over the previous four years every second speaker had used the phrase. And the reason why the “Big Data pressure” on the business and Internet communities has eased is clear:

A lot of traffic means a lot of data, a lot of fake traffic generates a lot of fake data, on the basis of which unreliable models are built, and multi-billion-dollar budgets are spent on following them.

What's next?

Until now, IT specialists and analysts were preoccupied with creating databases and with principles for storing and classifying information from different sources. Now that such repositories exist, many brands have realized that accumulating and storing data for its own sake is pointless. Storing large volumes of information and analyzing them superficially costs a lot of money. It is not justified without models for comparing the data: if a brand cannot process statistics and use them to improve sales efficiency, the data has no practical value.

Rethinking the task of “getting access to data in marketing” has given rise to a new term that reflects the growing demand for using data effectively. In their talks at Dmexco’2017, in blogs and at conferences, speakers increasingly discuss big data technologies as part of predictive marketing.

So much for marketing. But what about prediction using Big Data in general?

It will definitely work in the natural sciences. There, data has been accumulated over a long period, and rigorous mathematical models and an understanding of the underlying natural processes have been developed over the same period. It will really work if you analyze trends on a macro scale, such as trends in society. It will really work if you analyze closed, stable micro-scale systems (a village, or a shop at a station where trains with random customers never stop). It can also be applied to assessing the future behavior of a particular person.

But as soon as mutual influence, newly emerging trends and “black swans” (Nassim Taleb) appear, the system under study becomes a “black box”.

The first time I heard the term “Big Data” was from German Gref (head of Sberbank). He said they were actively implementing it because it would help them reduce the time spent working with each client.

The second time I came across the concept was in a client’s online store. We were working on it and expanding the assortment from a couple of thousand to a couple of tens of thousands of items.

The third time, I saw that Yandex was hiring a big data analyst. Then I decided to dig deeper into the topic and, at the same time, write an article about this term that excites the minds of top managers and the Internet.

VVV or VVVVV

I usually start any of my articles with an explanation of what kind of term this is. This article will be no exception.

This is not because I want to show how smart I am, but because the topic is genuinely complex and requires careful explanation.

For example, you can read what big data is on Wikipedia, understand nothing, and then come back to this article to make sense of the definition and its applicability to business. So let’s start with a description and then move on to business examples.

Big data is, literally, big data. Amazing, right? But this definition, one might say, is for dummies.

Important: big data technology is an approach to processing very large amounts of data, which are difficult to handle with conventional methods, in order to obtain new information.

Data can be either processed (structured) or scattered (i.e. unstructured).

The term itself appeared relatively recently. In 2008, a scientific journal predicted this approach as being necessary to deal with large amounts of information that are growing exponentially.

For example, the amount of information on the Internet that has to be stored and, of course, processed grows by 40% every year. Once again: +40% new information on the Internet every year.
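Here is a small Python sketch that gives a feel for what +40% a year means. It assumes the growth rate stays constant every year, which the article does not claim, and the function names are illustrative.

```python
import math

ANNUAL_GROWTH = 0.40  # the article's +40% per year, assumed constant (an assumption)

def volume_after(years: int, start: float = 1.0, rate: float = ANNUAL_GROWTH) -> float:
    """Data volume after `years` of compound growth, relative to `start`."""
    return start * (1 + rate) ** years

def doubling_time(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)

if __name__ == "__main__":
    print(f"Doubling time: {doubling_time():.2f} years")  # Doubling time: 2.06 years
    for n in (1, 5, 10):
        print(f"After {n} years: x{volume_after(n):.1f}")  # x1.4, x5.4, x28.9
```

Under that assumption, the volume roughly doubles every two years and grows almost thirtyfold in a decade.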

Printed documents are easy to handle, and the methods for processing them are clear: convert them to electronic form, file them in one folder, number the pages. But what do you do with information that comes in completely different “media” and volumes:

  • Internet documents;
  • blogs and social networks;
  • audio/video sources;
  • measuring devices.

There are characteristics that allow information and data to be classified as big data.

In other words, not all data is suitable for this kind of analytics. These characteristics capture the key concept of big data, and they all fit into three Vs:

  1. Volume. Data is measured by the physical size of the “document” to be analyzed;
  2. Velocity. Data does not stand still but keeps growing, which is why it has to be processed quickly to obtain results;
  3. Variety. Data may come in different formats: unstructured (scattered), structured or partially structured.

However, from time to time a fourth V (veracity) and even a fifth V are added to VVV (in some cases this is viability, in others it is value).

I have even seen 7V somewhere used to characterize big data. In my opinion, this is like the marketing mix, where new Ps are periodically added even though the original 4 are enough to understand it.


Who needs this?

A logical question arises: how can you use this information at all (big data means hundreds and thousands of terabytes)? Or rather, not even that.

Here is the real question: why was big data invented in the first place? What use is big data in marketing and business?

  1. Conventional databases cannot store and process huge amounts of information (I’m not even talking about analytics yet, just storage and processing).

    Big data solves this main problem: it successfully stores and manages large volumes of information;

  2. It structures information coming from various sources (video, images, audio and text documents) into a single, understandable and digestible form;
  3. It generates analytics and accurate forecasts based on structured and processed information.

That sounds complicated. Put simply, any marketer understands that studying a large amount of information (about you, your company, your competitors, your industry) can bring very decent results:

  • A full understanding of your company and your business in terms of numbers;
  • Insight into your competitors, which in turn makes it possible to get ahead of them;
  • New information about your clients.

It is precisely because big data technology delivers these results that everyone is rushing to adopt it.

Companies are trying to build it into their business in order to increase sales and reduce costs. Specifically:

  1. Increasing cross-selling and additional sales due to better knowledge of customer preferences;
  2. Search for popular products and reasons why people buy them (and vice versa);
  3. Improvement of a product or service;
  4. Improving the level of service;
  5. Increasing loyalty and customer focus;
  6. Fraud prevention (more relevant for the banking sector);
  7. Reducing unnecessary costs.

The most common example, which is given in all sources, is, of course, the Apple company, which collects data about its users (phone, watch, computer).

It is because of its ecosystem that the corporation knows so much about its users and uses this knowledge to make a profit.

You can read these and other examples of use in any other article except this one.

Let's go to the future

I'll tell you about another project. Or rather, about a person who builds the future using big data solutions.

This is Elon Musk and his company Tesla. His main dream is to make cars autonomous. You get behind the wheel, turn on the autopilot from Moscow to Vladivostok and... fall asleep, because you don’t need to drive at all: the car will do everything itself.

Sounds like fantasy? Not at all! Elon simply acted much more wisely than Google, which controls its cars using dozens of satellites, and took a different path:

  1. Every car sold is equipped with a computer that collects all the information.

    “All” means everything: the driver, their driving style, the roads around them, the movement of other cars. The volume of such data reaches 20-30 GB per hour;

  2. Next, this information is transmitted via satellite communication to a central computer, which processes this data;
  3. Based on the big data processed by this computer, a model of an unmanned vehicle is built.

By the way, Google is doing rather poorly, and its cars keep getting into accidents. Musk, because he works with big data, is doing much better: his test models show very good results.

But this is all about economics. Why do we keep talking only about profit? Much of what big data can solve has nothing to do with earnings and money.

Google statistics, based on big data, show an interesting thing.

Before doctors announce the beginning of a disease epidemic in a certain region, the number of search queries about the treatment of this disease in that region increases significantly.

Thus, properly studying and analyzing this data makes it possible to predict the onset of an epidemic, and therefore to start preventing it, much faster than official bodies reach their conclusions and take action.
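The idea can be illustrated with a simple spike detector. This is not how Google Flu Trends actually worked: it fitted query frequencies against official influenza surveillance data. This is a minimal sketch under three assumptions. The weekly counts of treatment-related queries for one region are hypothetical. The function name `flag_spikes` is illustrative. The rule flags a week whose count exceeds the mean of the previous weeks by more than `z` standard deviations.

```python
from statistics import mean, stdev

def flag_spikes(weekly_counts: list[int], window: int = 4, z: float = 3.0) -> list[int]:
    """Return indexes of weeks whose count exceeds the mean of the previous
    `window` weeks by more than `z` sample standard deviations."""
    alerts = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        threshold = mean(baseline) + z * stdev(baseline)
        if weekly_counts[i] > threshold:
            alerts.append(i)
    return alerts

# Hypothetical weekly counts of queries like "flu treatment" in one region
counts = [120, 115, 130, 125, 118, 122, 240, 310, 400]
print(flag_spikes(counts))  # [6]
```

Only the first week of the surge is flagged here. Once the surge enters the rolling window, it inflates the baseline, so a longer or seasonal baseline would be a natural refinement.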

Application in Russia

However, Russia, as always, lags a little behind. The very term “big data” appeared in Russia no more than 5 years ago (I’m talking about ordinary companies now).

And this despite the fact that it is one of the fastest-growing markets in the world (drugs and weapons are nervously smoking on the sidelines): the market for software that collects and analyzes big data grows by 32% every year.

The big data market in Russia reminds me of an old joke: big data is like sex before you turn 18.

Everyone talks about it, there is a lot of hype around it and little real action, and everyone is ashamed to admit that they are not doing it themselves.

Yet back in 2015, the well-known research company Gartner announced that big data was no longer a growing trend (like artificial intelligence, by the way). Instead, it was now a set of fully independent tools for analysis and for developing advanced technologies.

The most active niches where big data is used in Russia are banks/insurance (it’s not for nothing that I started the article with the head of Sberbank), telecommunications sector, retail, real estate and... the public sector.

As an example, I’ll tell you in more detail about a couple of economic sectors that use big data algorithms.

Banks

Let’s start with banks and the information they collect about us and our actions. As an example, I took the top 5 Russian banks that are actively investing in big data:

  1. Sberbank;
  2. Gazprombank;
  3. VTB 24;
  4. Alfa Bank;
  5. Tinkoff Bank.

It is especially pleasant to see Alfa Bank among the Russian leaders. At the very least, it’s nice to know that a bank we are an official partner of understands the need to introduce new marketing tools.

But I want to show examples of successful big data implementation using a bank that I like for its founder’s unconventional views and actions.

I'm talking about Tinkoff Bank. Their main challenge was to develop a system for analyzing big data in real time due to their growing customer base.

Results: the time of internal processes was reduced by at least 10 times, and for some – by more than 100 times.

A small digression: do you know why I mentioned Oleg Tinkov’s unusual antics and actions?

In my opinion, it was they that helped him transform from a mediocre businessman, of whom there are thousands in Russia, into one of the most famous and recognizable entrepreneurs.

Real estate

In real estate, everything is much more complicated. And this is exactly the example I want to give you to understand big data in an ordinary business. Initial data:

  1. Large volume of text documentation;
  2. Open sources (private satellites transmitting data on changes to the Earth’s surface);
  3. A huge amount of uncontrolled information on the Internet;
  4. Constant changes in sources and data.

Based on all this, you need to assess the value of a land plot, for example, near a village in the Urals. It would take a professional about a week.

The Russian Society of Appraisers and ROSEKO implemented big data analysis in software. With it, the same job takes no more than 30 minutes of leisurely work. Compare: a week versus 30 minutes. A huge difference.

And for dessert

Of course, huge amounts of information cannot be stored and processed on simple hard drives.

The software that structures and analyzes the data is usually proprietary and, in each case, a custom development. However, there are tools on which all this beauty is built:

  • Hadoop & MapReduce;
  • NoSQL databases;
  • Data Discovery class tools.
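MapReduce, the model behind Hadoop, splits a job into two phases. A map phase emits key-value pairs. The framework then groups the pairs by key, and a reduce phase aggregates the values for each key. The classic word-count example can be simulated in a single Python process as a sketch. No Hadoop API is used here, the function names are illustrative, and a real cluster would run each phase in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents: list[str]) -> dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

docs = ["big data is big", "data is everywhere"]
print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be distributed independently. That independence is what lets the model scale to very large inputs.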

To be honest, I won’t be able to clearly explain to you how they differ from each other, since getting to know and working with these things is taught in physics and mathematics institutes.

Why, then, do I mention them if I can’t explain them? Remember how in the movies robbers break into a bank and see a huge number of all sorts of hardware connected by wires?

It’s the same with big data. For example, one of the current market leaders is a hardware system of exactly this kind.


The cost of the maximum configuration reaches 27 million rubles per rack. This is, of course, the deluxe version, but it gives you an idea in advance of what building big data into your business involves.

Briefly about the main thing

You may ask: why would a small or medium-sized business like yours need to work with big data?

To this I will answer with a quote from one person: “In the near future, customers will choose the companies that better understand their behavior and habits and suit them best.”

But let’s face it: to implement big data in a small business, you need large budgets for developing and implementing software. You also need budgets for paying specialists: at the very least, a big data analyst and a system administrator.

And that’s not even mentioning that you need to have such data to process in the first place.

OK. The topic is hardly applicable to small businesses. But this does not mean you should forget everything you have read above.

Just study not your own data, but the results of data analytics from well-known foreign and Russian companies.

For example, the retail chain Target used big data analytics to find that pregnant women actively buy unscented products before the second trimester (from the 1st to the 12th week of pregnancy).

Using this data, they send them coupons with limited-time discounts on unscented products.

What if you are just a very small cafe, for example? Very simple: use a loyalty app.

After some time, the accumulated information will let you offer customers dishes relevant to their needs. It will also show your worst-selling and highest-margin dishes in just a couple of mouse clicks.
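Here is a minimal sketch of the kind of report such an app could produce. The order records, menu prices and ingredient costs are hypothetical, as are the function names. A real loyalty app would have its own data format and reports.

```python
from collections import Counter

# Hypothetical loyalty-app export: one (customer_id, dish) pair per order line
orders = [
    ("c1", "latte"), ("c2", "latte"), ("c1", "cheesecake"),
    ("c3", "soup"), ("c2", "latte"), ("c3", "cheesecake"),
]
# Hypothetical menu: dish -> (price, cost of ingredients)
menu = {"latte": (3.5, 0.9), "cheesecake": (4.0, 1.6), "soup": (5.0, 2.5), "salad": (6.0, 2.0)}

def units_sold(orders, menu):
    """Units sold per menu dish, including dishes nobody ordered."""
    counts = Counter(dish for _, dish in orders)
    return {dish: counts[dish] for dish in menu}

def margin(dish, menu):
    """Gross margin of a dish as a share of its price."""
    price, cost = menu[dish]
    return (price - cost) / price

def worst_sellers(orders, menu, n=2):
    sales = units_sold(orders, menu)
    return sorted(menu, key=lambda dish: sales[dish])[:n]

def highest_margin(menu, n=2):
    return sorted(menu, key=lambda dish: margin(dish, menu), reverse=True)[:n]

def favorite_dish(orders, customer_id):
    """A customer's most frequently ordered dish, e.g. for a personal offer."""
    counts = Counter(dish for cid, dish in orders if cid == customer_id)
    return counts.most_common(1)[0][0] if counts else None

print(worst_sellers(orders, menu))   # ['salad', 'soup']
print(highest_margin(menu))          # ['latte', 'salad']
print(favorite_dish(orders, "c2"))   # latte
```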

Hence the conclusion. It is unlikely that a small business should implement big data, but it is imperative to use the results and developments of other companies.

You know this famous joke, right? Big Data is like sex before 18:

  • everyone thinks about it;
  • everyone talks about it;
  • everyone thinks their friends do it;
  • almost no one does this;
  • whoever does it does it badly;
  • everyone thinks it will work out better next time;
  • no one takes precautions;
  • everyone is ashamed to admit that they don’t know something;
  • if someone succeeds at something, there is always a lot of noise about it.

But let’s be honest: any hype brings the usual curiosity about what the fuss is about and whether there is something really important behind it. In short, yes, there is. Details are below. We have selected for you the most amazing and interesting applications of Big Data technologies. This small market study uses clear examples to confront us with a simple fact: the future is not coming, and there is no need to “wait another n years for the magic to become reality.” No, it has already arrived, but it is still invisible to the eye. That is why the singularity has not yet scorched any particular corner of the labor market very much. Let’s go.

1 How Big Data technologies are applied where they originated

Large IT companies are where data science originated, so their internal expertise in this area is the most interesting. Google, the birthplace of the MapReduce paradigm, runs a program whose sole purpose is to train its programmers in machine learning technologies. And this is where its competitive advantage lies: after acquiring new knowledge, employees introduce new methods in the Google projects they constantly work on. Imagine how long the list of areas the company could revolutionize is. One example: neural networks are used.

Apple implements machine learning in all its products. Its advantage is a large ecosystem that includes all the digital devices used in everyday life. This allows Apple to reach a level unattainable for others: the company has more user data than any other. At the same time, its privacy policy is very strict: the corporation has always boasted that it does not use customer data for advertising purposes. Accordingly, user information is encrypted so that neither Apple’s lawyers nor even the FBI with a warrant can read it.

2 Big Data on 4 wheels

A modern car is an information store: it accumulates all the data about the driver, the environment, connected devices and itself. Soon, a single network-connected vehicle will generate up to 25 GB of data per hour.

Vehicle telematics has been used by automakers for many years, but there is now lobbying for a more sophisticated data collection method that takes full advantage of Big Data. This means that technology can now alert the driver to poor road conditions by automatically activating the anti-lock braking and traction control systems.

Other companies, including BMW, use Big Data technology to identify model weaknesses early in production. They combine it with information from prototypes under test, in-vehicle error memory systems and customer complaints. Instead of manual data evaluation, which takes months, a modern algorithm is now used. This reduces errors and troubleshooting costs and speeds up information analysis workflows at BMW.

According to expert estimates, the connected car market will reach $130 billion in turnover by 2019. This is not surprising, given how quickly automakers are integrating technologies that are becoming an integral part of the vehicle.

Using Big Data helps make cars safer and more functional. Toyota, for example, does this by integrating data communication modules (DCM). A Big Data tool then processes and analyzes the data collected by the DCMs to extract further value from it.

3 Application of Big Data in medicine


The implementation of Big Data technologies in the medical field allows doctors to study the disease more thoroughly and choose an effective course of treatment for a particular case. Thanks to the analysis of information, it becomes easier for health workers to predict relapses and take preventive measures. The result is a more accurate diagnosis and improved treatment methods.

The new technique has made it possible to look at patients’ problems from a different perspective, which has led to the discovery of previously unknown sources of those problems. For example, some races are genetically more predisposed to heart disease than other ethnic groups. Now, when a patient complains of a certain disease, doctors take into account data about members of his race who complained of the same problem. Collecting and analyzing data lets us learn much more about patients, from food preferences and lifestyle to the genetic structure of their DNA and the metabolites of cells, tissues and organs. For instance, the Center for Children’s Genomic Medicine in Kansas City analyzes patients’ genetic code for the mutations that cause cancer. An individual approach to each patient that takes his DNA into account will raise the effectiveness of treatment to a qualitatively different level.

Understanding how to use Big Data is the first and very important change in the medical field. When a patient undergoes treatment, a hospital or other healthcare facility can obtain a lot of relevant information about the person. The collected information is used to predict disease relapses with a certain degree of accuracy. Take a patient who has suffered a stroke. Doctors study when the stroke occurred and analyze the intervals between any previous episodes. They pay special attention to stressful situations and heavy physical activity in the patient’s life. Based on this data, hospitals give the patient a clear action plan to prevent a stroke in the future.

Wearable devices also play a role, helping to identify health problems even if a person does not have obvious symptoms of a particular disease. Instead of assessing the patient’s condition through a long course of examinations, the doctor can draw conclusions based on the information collected by a fitness tracker or smart watch.

One recent example: a man was being examined for a new seizure caused by a missed dose of medication. During the examination, doctors discovered that he had a much more serious health problem, atrial fibrillation. The diagnosis was possible because the hospital staff gained access to the patient’s phone, namely to the app connected to his fitness tracker. Data from the app turned out to be the key factor in the diagnosis, because no cardiac abnormalities were detected at the time of the examination.

This is just one of the cases that show why big data plays such a significant role in the medical field today.

4 Data analysis has already become the core of retail

Understanding user queries and targeting is one of the largest and most publicized areas of application of Big Data tools. Big Data helps analyze customer habits in order to better understand consumer needs in the future. Companies are looking to expand the traditional data set with information from social networks and browser search history in order to create the most complete customer picture possible. Sometimes large organizations choose to create their own predictive model as a global goal.

For example, the Target store chain uses in-depth data analysis and its own forecasting system to determine with high accuracy which of its customers are pregnant. Each customer is assigned an ID, which in turn is linked to a credit card, name or email. The ID serves as a kind of shopping cart that stores information about everything the person has ever purchased. The chain’s specialists found that pregnant women actively buy unscented products before the second trimester, and during the first 20 weeks they stock up on calcium, zinc and magnesium supplements. Based on this data, Target sends coupons for baby products to these customers. The discounts on children’s goods are “diluted” with coupons for other products, so that offers to buy a crib or diapers do not look too intrusive.
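Target’s actual model is not public. The sketch below only illustrates the mechanism described above: purchases are grouped by customer ID into a history. Customers whose history contains several “signal” products are then flagged for targeted coupons. The product list, the threshold and the function names are hypothetical assumptions.

```python
from collections import defaultdict

# Hypothetical signal products; Target's real model and weights are not public.
SIGNAL_PRODUCTS = {"unscented lotion", "calcium supplement", "zinc supplement", "magnesium supplement"}

def build_histories(transactions):
    """Group purchases by customer ID: the per-customer 'shopping cart' history."""
    histories = defaultdict(set)
    for customer_id, product in transactions:
        histories[customer_id].add(product)
    return histories

def flag_customers(transactions, min_signals=2):
    """IDs of customers who bought at least `min_signals` distinct signal products."""
    histories = build_histories(transactions)
    return sorted(
        cid for cid, products in histories.items()
        if len(products & SIGNAL_PRODUCTS) >= min_signals
    )

transactions = [
    (101, "unscented lotion"), (101, "zinc supplement"), (101, "bread"),
    (102, "bread"), (102, "unscented lotion"),
    (103, "calcium supplement"), (103, "magnesium supplement"),
]
print(flag_customers(transactions))  # [101, 103]
```

Requiring several distinct signals rather than one keeps a single unrelated purchase, such as one bottle of unscented lotion, from triggering an offer.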

Even government departments have found a way to use Big Data technologies to optimize election campaigns. Some believe that Barack Obama's victory in the 2012 US presidential election was due to the excellent work of his team of analysts, who processed huge amounts of data in the right way.

5 Big Data protects law and order


Over the past few years, law enforcement agencies have been able to figure out how and when to use Big Data. It is a well-known fact that the National Security Agency uses Big Data technologies to prevent terrorist attacks. Other departments are using advanced methodology to prevent smaller crimes.

The Los Angeles Police Department uses a system that does what is commonly called proactive policing. Using crime reports over a period of time, the algorithm identifies areas where crime is most likely to occur. The system marks such areas on the city map with small red squares, and this data is immediately transmitted to patrol cars.
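The LAPD system’s actual model is not described here. Below is a sketch of the simplest version of the idea: count past incidents per grid cell and mark the busiest cells as the “red squares”. The cell size, coordinates and function names are hypothetical. Real predictive-policing models are considerably more elaborate, for example weighting recent incidents more heavily.

```python
from collections import Counter

CELL_SIZE = 150.0  # hypothetical grid cell size in meters (an assumption)

def cell_of(x: float, y: float, size: float = CELL_SIZE) -> tuple[int, int]:
    """Map a point in a local metric coordinate system to a grid cell index."""
    return int(x // size), int(y // size)

def hotspots(incidents, k=2, size=CELL_SIZE):
    """Return the k grid cells with the most past incidents."""
    counts = Counter(cell_of(x, y, size) for x, y in incidents)
    return [cell for cell, _ in counts.most_common(k)]

# Hypothetical incident locations (x, y) in meters
incidents = [(10, 20), (40, 90), (120, 140), (160, 10), (170, 30), (400, 400), (30, 60)]
print(hotspots(incidents))  # [(0, 0), (1, 0)]
```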

Chicago police use Big Data technologies in a slightly different way. Law enforcement in the Windy City also relies on an algorithm, but it aims to outline a “risk circle” of people who could become victims of or participants in an armed attack. According to The New York Times, the algorithm assigns a person a vulnerability rating based on his criminal history (arrests, participation in shootings, membership in criminal groups). The system’s developer says that the system examines a person’s criminal history but does not take into account secondary factors such as race, gender, ethnicity and location.

6 How Big Data technologies help cities develop


Veniam CEO Joao Barros shows a map tracking the Wi-Fi routers on Porto buses

Data analysis is also used to improve many aspects of life in cities and countries. For example, knowing exactly how and when to use Big Data technologies, you can optimize traffic flows. To do this, vehicle movement is tracked in real time, and social media and weather data are analyzed. Today, a number of cities have committed to using data analytics to combine transport infrastructure and other public services into a single whole. This is the concept of a “smart” city, where buses wait for late trains and traffic lights can predict congestion to minimize traffic jams.

Based on Big Data technologies, the city of Long Beach operates smart water meters that are used to stop illegal watering. Previously, they were used to reduce water consumption by private households (the maximum result was a reduction of 80%). Saving fresh water is always a pressing issue. Especially when the state is experiencing the worst drought ever recorded.

The Los Angeles Department of Transportation has also joined the list of Big Data users. Based on data from traffic camera sensors, the authorities monitor the operation of traffic lights, which in turn allows them to regulate traffic. The computerized system controls about 4,500 traffic lights throughout the city. According to official data, the new algorithm helped reduce congestion by 16%.

7 The engine of progress in marketing and sales


In marketing, Big Data tools make it possible to identify which promotional ideas are most effective at a particular stage of the sales cycle. Data analysis shows how investments can improve customer relationship management, what strategy to adopt to improve conversion rates, and how to optimize the customer lifecycle. In cloud businesses, Big Data algorithms are used to figure out how to minimize customer acquisition costs and extend the customer lifecycle.

Differentiating pricing strategies according to the client's level within the system is perhaps the main use of Big Data in marketing. McKinsey found that about 75% of the average firm's revenue comes from core products, 30% of which are mispriced. For an average company, a 1% increase in price results in an 8.7% increase in operating profit.
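Why can a small price change have such a large effect on profit? A minimal sketch, assuming that sales volume and costs stay exactly the same, so that the entire price increase flows straight into operating profit. The operating margin of about 11.5% used below is a hypothetical figure chosen for illustration, not a number from the McKinsey study; under that assumption the relative profit gain is simply the price increase divided by the margin.

```python
def operating_profit_change(price_increase: float, operating_margin: float) -> float:
    """Relative change in operating profit after a price increase.

    Simplifying assumption: volume and costs are unchanged, so every extra
    unit of revenue from the higher price becomes extra operating profit.
    """
    # Normalise revenue to 1.0, so operating profit equals the margin itself.
    revenue = 1.0
    old_profit = operating_margin * revenue
    # With volume and costs fixed, the extra revenue is pure profit.
    extra_profit = price_increase * revenue
    return extra_profit / old_profit


# Hypothetical operating margin of 11.5%, used only for illustration.
change = operating_profit_change(0.01, 0.115)
print(f"{change:.1%}")  # prints "8.7%"
```

The lower a company's margin, the stronger this leverage becomes, which is why even a small share of mispriced products can matter so much. In practice a price rise usually costs some volume, so the real gain would be smaller than this idealized figure.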

The Forrester research team found that data analytics allows marketers to focus on making customer relationships more successful. By analyzing how customer relationships develop over time, specialists can assess customer loyalty and extend the customer life cycle with a specific company.

The biopharmaceutical industry uses geo-analytics to optimize sales strategies and plan the stages of entering new markets. According to McKinsey, drug manufacturers spend an average of 20 to 30% of profits on administration and sales. If these companies made more active use of Big Data to identify the most profitable and fastest-growing markets, costs would fall immediately.

Data analytics is a means for companies to gain a complete picture of key aspects of their business. Increasing revenue, reducing costs and reducing working capital are three challenges that modern businesses are trying to solve with the help of analytical tools.

Finally, 58% of marketing directors say that the impact of Big Data technologies is most visible in search engine optimization (SEO), e-mail and mobile marketing, where data analysis plays the most significant role in shaping marketing programs. Only 4% fewer respondents are confident that Big Data will play a significant role in all marketing strategies for many years to come.

8 Global data analysis

No less curious is... It is possible that machine learning will ultimately be the only force capable of maintaining this delicate balance. Human influence on global warming is still highly controversial, so only reliable predictive models built on the analysis of large amounts of data can give an accurate answer. In the end, reducing emissions will benefit us all: we will spend less on energy.

Big Data is no longer an abstract concept that might find applications in a couple of years. It is a fully working set of technologies that can be useful in almost every area of human activity, from medicine and public order to marketing and sales. The active integration of Big Data into our daily lives has only just begun, and who knows what role Big Data will play in a few years?