George A. Miller. The Magical Number Seven, Plus or Minus Two

On some limits to our ability to process information

One number follows me everywhere. For seven years it has been on my heels: I constantly encounter it in my private affairs, and it stares at me from the pages of our most widely read journals. The number takes many forms, sometimes a little larger and sometimes a little smaller than usual, but it never changes so much that it cannot be recognized. The persistence with which it haunts me is more than mere coincidence. There is some kind of design behind it, some pattern governing its appearances. Either there really is something unusual about this number, or I am suffering from delusions of persecution.

I will begin my story by describing some experiments that tested how accurately people can assign numbers to the magnitudes of various aspects of a stimulus. In the traditional language of psychology, these would be called experiments in absolute judgment. Through an accident of history, however, they acquired a different name, and we now call them experiments on the capacity of people to transmit information. Since these experiments would never have been carried out had information theory not appeared on the psychological scene, and since the analysis of their results involves concepts of information theory, I will have to make a few remarks about the theory before I turn to the topic itself.

Measuring information

The term "amount of information" refers to the same concept that we have had in mind for many years when we have used the term "variance". The two expressions are not identical, but if we hold firmly to the idea that anything that increases the variance also increases the amount of information, we will not stray far from the truth.

The advantages of this new way of talking about variance are quite obvious. Variances are always expressed in particular units of measurement (meters, kilograms, volts, and so on), while the amount of information is a dimensionless quantity. Because the information in a discrete statistical distribution is independent of the unit of measurement, we can extend the concept to situations where we have no metric and would not normally think of using variance. In addition, this approach allows us to compare results obtained under completely different experimental conditions, where it would be meaningless to compare variances expressed in different units. Thus there are compelling reasons to adopt the newer concept.

The similarity between variance and amount of information can be explained as follows: when the variance is large, we know almost nothing about what is going to happen next; if in such a state of ignorance we make an observation, it gives us a great amount of information. On the other hand, if the variance is very small, we know in advance what our observation will show, so we gain very little information by making it.
If you imagine a communication system, it is easy to see that there is great variability both in what enters the system and in what comes out of it. The input and the output of a system can therefore be described in terms of their variance (or their information). In a good communication system there must be a systematic relationship between what goes in and what comes out. In other words, the output depends on the input, or is correlated with it. If we measure this relationship, we can determine how much of the output variance is attributable to the input and how much is due to random fluctuations, or “noise”, introduced by the system during transmission. Thus the measure of transmitted information is simply a measure of the correlation between input and output.

Two simple rules apply from here on: whenever I mention “amount of information”, you should understand “variance”; and whenever I mention “amount of transmitted information”, you should understand “covariance”, or “correlation”.
The situation can be pictured as two partially overlapping circles. The left circle represents the variance of the input, the right circle the variance of the output, and the overlap the covariance of input and output. Equivalently, the left circle is the amount of input information, the right circle the amount of output information, and the overlap the amount of transmitted information.
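The overlapping-circles picture can be made concrete with a small calculation. The sketch below uses the standard information-theoretic identity T = H(input) + H(output) - H(input, output), applied to a table of stimulus-response counts. The function names and the two example tables are illustrative assumptions, not data from any of the experiments discussed here.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def transmitted_information(counts):
    """Transmitted information T = H(input) + H(output) - H(input, output).

    counts[i][j] is the number of times stimulus i drew response j.
    """
    total = sum(sum(row) for row in counts)
    joint = [c / total for row in counts for c in row]
    p_in = [sum(row) / total for row in counts]          # left circle
    p_out = [sum(col) / total for col in zip(*counts)]   # right circle
    return entropy(p_in) + entropy(p_out) - entropy(joint)  # the overlap

# Hypothetical tables for 4 equally likely stimuli (not experimental data).
perfect = [[25, 0, 0, 0], [0, 25, 0, 0], [0, 0, 25, 0], [0, 0, 0, 25]]
guessing = [[25, 25, 25, 25] for _ in range(4)]

print(transmitted_information(perfect))   # all 2 bits of input are transmitted
print(transmitted_information(guessing))  # responses carry nothing about the stimulus
```

With error-free responses the overlap equals the whole input circle; with responses unrelated to the stimuli the circles do not overlap at all.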

In experiments on absolute judgment, the subject is treated as a communication channel. In our diagram the left circle then represents the amount of information in the stimuli, the right circle the amount of information in the subject's responses, and the overlap the stimulus-response correlation, measured by the amount of transmitted information. The experimental task is to increase the amount of input information and to measure the amount of transmitted information. If the subject's absolute judgments are completely accurate, nearly all of the input information will be transmitted and can be recovered from his responses. If he makes errors, the transmitted information may be considerably less than the input. We expect that as the amount of input information increases, the subject will make more and more errors, and we can then probe the limits of the accuracy of his absolute judgments.

If the human observer is a reasonable kind of communication system, then as the amount of input information increases, the transmitted information will increase at first and will eventually level off at some asymptotic value. We take this asymptotic value as the channel capacity of the observer: it represents the greatest amount of information that he can give us about the stimulus on the basis of an absolute judgment. The channel capacity is the upper limit on the extent to which the observer can match his responses to the stimuli presented to him.
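The leveling-off described above can be reproduced with a toy observer. The sketch below is only an illustration under assumptions that are not in the text: stimuli are equally spaced on a unit interval, the observer's percept is the stimulus value plus Gaussian noise of a fixed size (`sigma`, chosen arbitrarily), and the response is the nearest stimulus. It makes no claim about how real observers work.

```python
from math import erf, log2, sqrt

def normal_cdf(x, sigma):
    """Cumulative distribution of a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + erf(x / (sigma * sqrt(2.0))))

def transmitted_bits(n, sigma=0.05):
    """Transmitted information (bits) for n >= 2 equally likely stimuli.

    Assumed toy model: stimuli are equally spaced on [0, 1], the percept is the
    stimulus value plus Gaussian noise, and the response is the nearest stimulus.
    """
    spacing = 1.0 / (n - 1)
    joint = []  # joint[i][j] = P(stimulus i and response j)
    for i in range(n):
        row = []
        for j in range(n):
            # Response j is given when the percept lies within half a step of stimulus j;
            # the two end responses also absorb everything beyond the ends of the scale.
            upper = 1.0 if j == n - 1 else normal_cdf((j - i + 0.5) * spacing, sigma)
            lower = 0.0 if j == 0 else normal_cdf((j - i - 0.5) * spacing, sigma)
            row.append((upper - lower) / n)
        joint.append(row)
    p_in = 1.0 / n
    p_out = [sum(col) for col in zip(*joint)]
    return sum(
        p * log2(p / (p_in * p_out[j]))
        for row in joint
        for j, p in enumerate(row)
        if p > 0
    )

for n in (2, 4, 8, 16, 32, 64):
    print(f"{n:2d} stimuli: input {log2(n):.2f} bits, "
          f"transmitted {transmitted_bits(n):.2f} bits")
```

In this model the transmitted information tracks the input while the stimuli are far apart relative to the noise, then flattens as they crowd together, which is the qualitative shape the text attributes to a channel with finite capacity.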

Only a few words about the binary unit (bit) remain before we turn to some data. One bit of information is the amount of information we need to decide between two equally probable alternatives. If we must decide whether a man is taller or shorter than six feet, and we know that the two possibilities are equally likely, then we need one bit of information. Note that this unit has nothing to do with units of length such as feet, inches, or centimeters. However you measure the man's height, you will still need exactly one bit of information for this decision.

Two bits of information allow us to choose among four equally probable alternatives, three bits among eight, four bits among 16, five bits among 32, and so on. In other words, if there are 32 equally probable alternatives, we must make five successive binary decisions, each worth one bit, before we know which alternative is correct. So the rule is fairly simple: every time the number of alternatives doubles, one bit of information is added.
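The doubling rule can be checked by counting decisions directly. In the sketch below, `binary_decisions` is a hypothetical helper that halves the set of candidates until one remains; for equally probable alternatives its count agrees with the base-2 logarithm whenever the number of alternatives is a power of two.

```python
from math import log2

def binary_decisions(alternatives):
    """Count the yes/no decisions needed to single out one of `alternatives`
    equally likely options by halving the candidate set each time."""
    decisions = 0
    remaining = alternatives
    while remaining > 1:
        remaining = (remaining + 1) // 2  # keep the larger half in the worst case
        decisions += 1
    return decisions

for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} alternatives: {binary_decisions(n)} decisions, log2 = {log2(n):.0f} bits")
```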

There are two ways to increase the amount of input information. We could increase the rate at which we present information to the observer, so that the amount of information per unit of time increases. Alternatively, we could ignore the time variable entirely and increase the amount of input by increasing the number of alternative stimuli. In experiments on absolute judgment we are interested in the second method. We give the observer as much time as he needs to respond; we simply increase the number of alternative stimuli among which he must choose and watch for the point at which errors appear. Errors begin to appear near the level we call the “channel capacity”.

Absolute judgments of unidimensional stimuli

Let us now see what happens when people make absolute judgments of tones. Pollack asked listeners to identify tones by assigning a number to each. The tones differed in frequency and covered the range from 100 to 8000 Hz in equal logarithmic steps. A tone was sounded, and the listener responded by giving its number. After each response the listener was told whether he had identified the tone correctly.

Fig. 1. Pollack's data on the amount of information transmitted by listeners making absolute judgments of pitch. As the amount of input information is increased by raising the number of different tones to be judged from 2 to 14, the amount of transmitted information rises and approaches an upper limit, the channel capacity, of about 2.5 bits per judgment. When only 2 or 3 tones were used, the listeners never confused them. With 4 different tones confusions were extremely rare, and with 5 or more tones they were quite frequent. With 14 tones the listeners made many mistakes.

The experimental data are shown in Fig. 1. The abscissa shows the amount of input information in bits per stimulus. As the number of alternative tones increased from 2 to 14, the input information increased from 1 to 3.8 bits. The ordinate shows the amount of transmitted information. As one would expect, the transmitted information depends on the input information just as it does for a communication channel: it grows linearly up to about 2 bits, then its growth slows and it approaches an asymptote at about 2.5 bits. This value, 2.5 bits, is what we call the channel capacity of the listener for absolute judgments of pitch.

So we have the number 2.5 bits. What does it mean? First of all, note that 2.5 bits corresponds to about 6 equally probable alternatives. The result means that we cannot pick more than 6 different tones that the listener will never confuse. To put it a little differently, no matter how many alternative tones we ask him to judge, the most we can expect is that he will sort them into about 6 different classes without error.
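The conversions behind these figures are short enough to verify. The sketch below assumes equally probable tones, as the text does; the helper names are illustrative. It recovers the 1 to 3.8 bit range of input information for 2 to 14 tones and the number of equally probable alternatives that 2.5 bits represents.

```python
from math import log2

def input_bits(n_tones):
    """Input information, in bits, when one of n_tones equally likely tones is presented."""
    return log2(n_tones)

def equivalent_categories(bits):
    """Number of equally probable alternatives that carries the given number of bits."""
    return 2 ** bits

print(round(input_bits(2), 1), round(input_bits(14), 1))  # 1.0 3.8
print(round(equivalent_categories(2.5), 2))               # 5.66, i.e. about 6
```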

Many will probably be surprised to learn that this number is so small, only 6. Of course, it is well known that a musically gifted person with absolute pitch can identify accurately any one of 50 or 60 different tones. Fortunately, I do not have time to discuss these remarkable exceptions. I say “fortunately” because I do not know how to explain such superior performance. We will therefore stay with the more ordinary fact that each of us can identify any one of 5 or 6 tones before beginning to make mistakes.
It is worth recalling here that psychologists have long used seven-point rating scales, on the intuitive grounds that dividing the scale into finer categories adds little to the usefulness of the ratings. Pollack's results, at least for pitch, provide good support for this intuition.

The question arises, how widely can this result be generalized? Does it depend only on the spacing of the tones, or also on other experimental conditions? Pollack varied these conditions in several ways. The frequency range was changed by a factor of about 20, yet the amount of transmitted information changed by no more than a few percent. Different groupings of the tones reduced the transmitted information, but the loss was very small. For example, if listeners can identify 5 high tones in one series and 5 low tones in another, one might expect that when all 10 tones are combined into a single series they would still identify them all accurately. They do not. The channel capacity for pitch seems to be about 6 tones, and that is the best that can be achieved.

Fig. 2. Garner's data on channel capacity for absolute judgments of loudness.

Let us now turn to Garner's work on the discrimination of tones by loudness. His results are shown in Fig. 2. Garner took great care to space the tones as well as possible over the intensity range from 15 to 110 dB. He used 4, 5, 6, 7, 10, and 20 different intensities. The curve in Fig. 2 takes into account differences between subjects and the influence of the immediately preceding judgment on the current one. Here, too, we find a definite limit. The channel capacity for absolute judgments of loudness is 2.3 bits, or about 5 clearly distinguishable alternatives.
Because these two studies were conducted in different laboratories with quite different equipment and methods of analysis, we cannot say with complete certainty whether the two results, 5 distinguishable loudness levels and 6 distinguishable pitches, differ significantly. The difference probably does reflect the actual state of affairs, with absolute judgments of pitch being somewhat more accurate than judgments of loudness. The important point, however, is that both answers are of the same order of magnitude.

Fig. 3. Data from Beebe-Center, Rogers, and O'Connell on channel capacity for absolute judgments of the saltiness of solutions.

Experiments have also been carried out with taste stimuli. Fig. 3 presents the results obtained by Beebe-Center, Rogers, and O'Connell for absolute judgments of the concentration of salt solutions. The concentrations ranged from 0.3 to 34.7 g of table salt per 100 cm3 of drinking water, in equal subjective steps. The channel capacity turned out to be 1.9 bits, which corresponds to about four distinguishable concentrations. Taste intensities thus seem to be slightly less distinguishable than auditory stimuli, but again the values are of about the same order.
On the other hand, the channel capacity for judgments of visual position turns out to be considerably greater. Hake and Garner asked observers to interpolate the position of a pointer between two scale markers. Their results are presented in Fig. 4. The experiment was run in two ways. In one version, observers could use any number from 0 to 100 to describe the position. In the other, they were limited in their responses to the stimulus values that were actually possible. The results are so similar that we may conclude that the number of responses available to the subject has no effect on the channel capacity, which in this case is 3.25 bits.

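Fig. 4. Data from Hake and Garner on channel capacity for absolute judgments of the position of a pointer on a linear interval.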

Coonan and Klemmer repeated Hake and Garner's experiment. Although they have not yet published their results, I have their permission to report that the channel capacity they obtained ranges from 3.2 bits for very short exposures of the pointer position to 3.9 bits for longer exposures. These values are somewhat higher than those of Hake and Garner, so we must conclude that between 10 and 15 positions can be clearly distinguished on a linear interval. This is the largest channel capacity measured for any unidimensional variable.

At present, these four experiments on absolute judgments of simple unidimensional stimuli are all that have appeared in the psychological journals. However, a good deal of work on other stimulus variables has not yet been published. For example, Eriksen and Hake found that the channel capacity for judging the sizes of squares is 2.2 bits, or about 5 categories, under a wide variety of experimental conditions. In a separate experiment, Eriksen obtained 2.8 bits for size, 3.1 bits for hue, and 2.3 bits for brightness. Geldard measured the channel capacity of the tactile sense by placing vibrators on the chest. A good observer could identify about 4 intensities, about 5 durations, and about 7 locations.

One of the most active groups in this area is the Air Force Operations Research Laboratory. Pollack was kind enough to make available to me the results of their measurements of channel capacity for several aspects of visual displays. The group measured channel capacity for judgments of area, curvature, line length, and line direction. In one series of experiments they used a very short exposure of the stimulus, 1/40 second, and then repeated the measurements with a 5-second exposure. For area they obtained 2.6 bits with the short exposure and 2.7 bits with the long exposure. For line length they obtained about 2.6 bits with the short exposure and about 3.0 bits with the long exposure. For direction, or angle of inclination, the channel capacity was 2.8 bits for the short exposure and 3.3 bits for the long exposure. Curvature proved considerably harder to judge. With constant arc length and short exposure the result was 2.2 bits; when the length of the chord was held constant, the result was only 1.6 bits. This last value is the lowest that anyone has measured to date. I should add, however, that these values are likely to be somewhat too low, because the data from all subjects were pooled before the transmitted information was computed.

Let us now take stock. First, channel capacity seems to be a perfectly natural concept for describing the human observer. Second, the channel capacities measured for these unidimensional variables range from 1.6 bits for curvature to 3.9 bits for the position of a point in an interval. Although the differences among the variables are no doubt real and meaningful, what impresses me more is their considerable similarity. If we take the best estimates obtained in all the experiments mentioned, the mean over all stimulus variables is 2.6 bits and the standard deviation is only 0.6 bit. In terms of distinguishable alternatives, this mean corresponds to about 6.5 categories, one standard deviation spans 4 to 10 categories, and the total range is from 3 to 15 categories. Considering the wide variety of variables that have been studied, this range seems remarkably narrow.

There seems to be some limit built into us, either by learning or by the design of our nervous system, that keeps our channel capacities within this general range. On the basis of the evidence so far, it seems safe to say that we possess a finite and rather small capacity for making such unidimensional judgments, and that this capacity does not vary much from one simple sensory attribute to another.

Absolute judgments of multidimensional stimuli

The reader will probably have noticed that I have been careful to say that this magic number 7 applies to unidimensional judgments. Everyday experience tells us that we can accurately identify any one of several hundred faces, any one of several thousand words, any one of several thousand objects, and so on. The story would certainly not be complete if we stopped here. We must somehow try to understand why judgments of unidimensional stimuli give results in the laboratory that are so different from what we routinely observe outside it.
A possible explanation lies in the number of independently variable attributes of the stimuli being judged. Objects, faces, words, and the like differ from one another in many ways, whereas the simple stimuli we have considered so far differ from one another in only one respect.
Fortunately, we have some data on absolute judgments of stimuli that differ from one another in several ways. Consider first the results obtained by Klemmer and Frick for absolute judgments of the position of a dot in a square, shown in Fig. 5. The channel capacity rises to 4.6 bits, which means that a person can accurately identify any one of 24 positions of the dot inside the square.

Fig. 5. Data from Klemmer and Frick on channel capacity for absolute judgments of the position of a dot in a square.

Locating a dot in a square is clearly a two-dimensional perceptual task: both its horizontal and its vertical position must be identified. It is natural to compare the 4.6-bit capacity for a square with the 3.25-bit capacity for the position of a point on a linear interval. Locating the dot in the square requires two judgments of the interval type. If each of them carried the 3.25 bits measured for intervals, we would obtain 6.5 bits for the square. In fact, adding the second variable raised the capacity only from 3.25 to 4.6 bits.
Another example is provided by the work of Beebe-Center, Rogers, and O'Connell. When subjects were asked to identify both the saltiness and the sweetness of solutions containing various concentrations of salt and sugar, the channel capacity was found to be only 2.3 bits. Since the capacity for saltiness alone was 1.9 bits, we might expect about 3.8 bits if the two aspects of the compound stimulus were judged independently. As with spatial location, the second dimension added only a little to the capacity. A third example comes from Pollack, who asked listeners to judge both the loudness and the pitch of pure tones. Since pitch gives 2.5 bits and loudness gives 2.3 bits, we might hope to get as much as 4.8 bits for pitch and loudness together. Pollack obtained 3.1 bits, which again indicates that the second dimension increases the channel capacity only slightly.
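The three comparisons just described follow one pattern: add the two one-dimensional capacities, then compare the sum with what was actually observed. The sketch below tabulates them using only figures quoted in the text. The 1.9 bits entered for sweetness is the text's own implicit assumption (it doubles the saltiness value to reach 3.8), not a separate measurement, and `shortfall` is an illustrative helper name.

```python
# (label, bits for attribute 1 alone, bits for attribute 2 alone, bits observed for both)
cases = [
    ("position in a square", 3.25, 3.25, 4.6),
    ("saltiness and sweetness", 1.9, 1.9, 2.3),  # sweetness assumed equal to saltiness
    ("pitch and loudness", 2.5, 2.3, 3.1),
]

def shortfall(first, second, observed):
    """Bits lost relative to perfect addition of the two one-dimensional capacities."""
    return (first + second) - observed

for label, first, second, observed in cases:
    print(f"{label}: additive {first + second:.2f}, observed {observed:.2f}, "
          f"shortfall {shortfall(first, second, observed):.2f} bits")
```

In every case the observed capacity exceeds either single-attribute value but falls well short of their sum.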

A fourth example can be taken from the work of Halsey and Chapanis on the confusions among colors of equal luminance. Although their results were not analyzed in information-theoretic terms, they estimate that there are about 11 to 15 identifiable colors, which in our terms is about 3.6 bits. Since the colors varied in both hue and saturation, it is probably fair to regard these stimuli as two-dimensional. If we compare this result with Eriksen's 3.1 bits for hue (leaving aside whether such a comparison is legitimate), we again find that adding a second dimension yields something less than perfect addition.

It is still a long way, however, from these two-dimensional examples to the multidimensional stimuli provided by faces, words, and the like. For such stimuli we have data from only one experiment, with auditory stimuli, conducted by Pollack and Ficks. They chose six different acoustic variables that could be varied over wide ranges: frequency, intensity, rate of interruption, ratio of sound intervals to interruptions, total duration, and spatial location. Each of the six variables could take any one of 5 different values, so that altogether there were 5^6, or 15625, different tones that could be presented. The listeners made a separate judgment on each of the six dimensions. Under these conditions the transmitted information was 7.2 bits, which corresponds to about 150 different categories that could be absolutely identified without error. Here we begin to approach the range of variability that we constantly encounter in everyday experience.
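The arithmetic of this experiment can be laid out in a few lines. The sketch below assumes all tones are equally probable when computing the input information; the variable names are illustrative, and the 7.2 bits is the value quoted in the text.

```python
from math import log2

values_per_dimension = 5
dimensions = 6

stimuli = values_per_dimension ** dimensions  # 5^6 distinct tones
input_bits = log2(stimuli)                    # information needed to pick one of them
transmitted_bits = 7.2                        # value reported in the text
categories = 2 ** transmitted_bits            # equivalent number of error-free categories

print(stimuli, round(input_bits, 1), round(categories))  # 15625 13.9 147
```

Roughly half of the 13.9 bits of input is transmitted, and 2 raised to the power 7.2 is the "about 150" categories mentioned above.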

Suppose we plot all these data on one graph and try to see how the channel capacity changes with the dimensionality of the stimuli. The result is Fig. 6. I have even drawn a dotted line on the graph to indicate schematically the general trend that the data follow.

Fig. 6. The general form of the relation between channel capacity and the number of independently variable attributes of the stimuli.

Clearly, adding independently variable attributes to the stimulus increases the channel capacity, but at a decreasing rate. It is interesting to note that the channel capacity increases even when the variables are not independent. Eriksen reports that when size, brightness, and hue all vary together in perfect correlation, the transmitted information is 4.1 bits, compared with an average of about 2.7 bits when these attributes are varied one at a time. By combining three attributes, Eriksen increased the dimensionality of the input without increasing the amount of input information (since the attribute values varied together). As a result, the channel capacity increased by about the amount that the dotted curve in Fig. 6 would lead us to expect.
What seems to happen is that as we add more variables to the display, we increase the total capacity but decrease the accuracy for any particular variable. In other words, we can make relatively crude judgments of several things simultaneously.

One might argue that in the course of evolution those organisms were most successful that were responsive to the widest range of stimuli in their environment. To survive in a constantly changing world, it is better to have a little information about a great many things than a great deal of information about a small segment of the environment. The compromise that evolution reached appears to be the more adaptive one.

The results obtained by Pollack and Ficks agree very well with an argument put forward recently by linguists and phoneticians. According to linguistic analysis of the sounds of human speech, there are about 8 to 10 dimensions (the linguists call them distinctive features) that distinguish one phoneme from another. These distinctive features are usually binary, or at most ternary, in nature. For example, a binary distinction separates vowels from consonants, another binary decision separates oral from nasal consonants, a ternary decision is made among front, middle, and back phonemes, and so on. This approach gives us quite a different picture of speech perception than we might otherwise obtain from studies of the ear's ability to discriminate relative differences among pure tones. I am personally much interested in this new approach, and I regret that there is not room to discuss it in more detail here.

It was probably with this linguistic theory in mind that Pollack and Ficks conducted a test on a set of tonal stimuli that varied in eight dimensions but required only a binary decision on each dimension. The transmitted information they measured was 6.9 bits, or about 120 distinguishable kinds of sounds. An interesting and still unresolved question is whether new dimensions can be added in this way indefinitely.
In human speech there is clearly a limit to the number of dimensions that we use. It is not known, however, whether this limit arises from the nature of the perceptual mechanism that must distinguish the sounds or from the nature of the speech mechanism that must produce them. Special experiments would be needed to find out. In any case, there is a limit of about 8 or 9 distinctive features in every language that has been studied, so when we talk we must resort to yet another trick to increase our channel capacity. Language uses sequences of phonemes, so that in listening to words and sentences we make several judgments in succession. In other words, we use both simultaneous and successive discriminations to expand the rather rigid limits imposed by the inaccuracy of our absolute judgments of simple magnitudes.

These multidimensional judgments closely resemble Külpe's abstraction experiments. As is well known, he showed that observers report more accurately on an attribute for which they are set than on attributes for which they are not set. For example, Chapman used three different attributes and compared the results of two series of experiments. In the first series, the subjects were told before the tachistoscopic presentation which attribute to report; in the second, they were not told beforehand, and the instruction came only after the presentation. The judgments were more accurate when the instructions were given in advance. When the instruction came after the presentation, the subjects presumably had to judge all three attributes in order to report on any one of them, and the accuracy was correspondingly lower. This agrees fully with the results just discussed, in which the accuracy of judgment on each attribute decreased as more dimensions were added. The point is probably obvious, but I want to stress that the abstraction experiments did not demonstrate that people can judge only one attribute at a time. They showed only that people are less accurate when they must judge more than one attribute simultaneously.

Simultaneous perception

I cannot leave this survey without at least a brief mention of the experiments on the discrimination of number carried out at Mount Holyoke College by Kaufman, Lord, Reese, and Volkmann. Random patterns of dots were flashed on a screen for 1/5 of a second. Anywhere from 1 to 200 dots could appear in a pattern. The subject's task was to report how many dots there were.

The first point to note is that with patterns of up to five or six dots the subjects simply did not make errors. Performance with these small numbers of dots was so different from performance with larger numbers that it was given a special name. Below seven dots the subjects were said to subitize (to grasp the number at a glance); above seven they were said to estimate. As you will have noticed, this is what we once metaphorically called the “span of attention”.
This abrupt break at the number 7 is, of course, suggestive. Is this the same basic process that limits our unidimensional judgments to about seven categories?

In my opinion, this generalization is tempting but unfounded. The data on number estimation have not been analyzed in information-theoretic terms, but on the basis of the published results I suspect that the subjects were transmitting not much more than 4 bits of information about the number of dots. Using the same arguments as before, we might conclude that there are only about 20 or 30 distinguishable categories of number. This greatly exceeds the amount of information that would be expected from a unidimensional display. In fact, it is very much like a two-dimensional display. Although it is not yet clear how to determine the dimensionality of a random pattern of dots, these results come close to Klemmer and Frick's data for a two-dimensional stimulus, the position of a dot in a square. Probably, when the number of dots is being estimated, the two dimensions are the area occupied by the dots and their density. When the subject can subitize, area and density may not be the significant variables, but when the subject must estimate, they are likely to be significant. In any case, this is not as simple a matter as it might seem at first glance.
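The step from "not much more than 4 bits" to "about 20 or 30 categories" can be checked with a short sketch. It assumes equally likely categories, so that the information in one judgment is the base-2 logarithm of the number of categories; the function names are illustrative and are not taken from the paper.

```python
import math

def bits_from_categories(n_categories: int) -> float:
    """Information in a choice among n equally likely categories (assumption)."""
    return math.log2(n_categories)

def categories_from_bits(bits: float) -> float:
    """Number of equally likely categories that `bits` of information can separate."""
    return 2 ** bits

# Seven categories (unidimensional judgment) versus 20 to 30 (number estimation).
for n in (7, 20, 30):
    print(f"{n} categories = {bits_from_categories(n):.2f} bits")
```

Under this assumption, 20 to 30 categories correspond to a little more than 4 bits, while seven categories correspond to less than 3 bits.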

This is one of the areas in which I am haunted by the magic number 7. Here we are faced with two closely related types of experiments, each of which points to the significance of the number 7 as the limit of our abilities. And yet, with a more in-depth study of the problem, there remains, as it seems, a completely justified suspicion that all this can be explained by a simple coincidence.

The span of immediate memory

Let me summarize what has been said in this way: there is a clear limit to the accuracy with which we can identify absolutely (that is, without comparison with a standard) the magnitude of a unidimensional stimulus variable. I would propose to call this limit the span of absolute judgment, and I maintain that for unidimensional judgments this span lies somewhere in the neighborhood of the number 7. We are not completely at the mercy of this limited span, however, because we have many ways to get around it and increase the accuracy of our judgments. Here are three of the most important: (a) make relative rather than absolute judgments, or, if that is not possible, (b) increase the number of dimensions along which the stimuli can differ, or (c) rearrange the task so that a sequence of several absolute judgments is made in a row.

The study of relative judgments is one of the oldest problems in experimental psychology, and I do not intend to review this research here. The second method, increasing the dimensionality of the stimuli, we have just considered in detail. It seems that by adding new dimensions and requiring only crude, binary yes-no judgments on each attribute, we can extend the span of absolute judgment from 7 to at least 150. Judging from our everyday experience, the limit is probably somewhere in the thousands, if indeed there is a limit. In my opinion, we cannot go on compounding dimensions indefinitely. I suspect that there is also a span of perceptual dimensionality and that this span is somewhere around ten, but I must immediately add that I have no objective data to support this suspicion. This question also requires experimental study.
As for the third technique, the use of successive judgments, I would like to dwell on it in a little more detail, because it brings in memory as the handmaiden of discrimination. And since mnemonic processes are no less complex than perceptual processes, we can expect that their interaction will not be easy to understand.

Let us suppose that we begin simply with a slight extension of the experimental procedure that we have already used. So far we have presented the observer with one stimulus and asked him to name it immediately after presentation. We can extend this procedure by requiring the subject to withhold his answer until he has been presented with a sequence of several stimuli. He must give his response at the end of the stimulus sequence. The experimental situation is the same as the one used to measure transmitted information. But now we have moved from experiments on absolute judgment to what are traditionally called experiments on immediate memory.
Before we begin to look at the relevant data, I would like to offer a word of caution to help you avoid some obvious associations that may be misleading. Everyone knows that there is a finite amount of immediate memory and that for most types of test material this amount does not exceed 7 units. I just talked about a span of absolute judgment that corresponds to about 7 distinguishable categories, and a span of attention that is about 6 objects that can be seen at the same time. What could be more natural than the assumption that all these phenomena are different aspects of a single process underlying them? And it is in this assumption that the fundamental error lies. This obsessive, harmful idea haunted me as persistently as the magic number 7.

My mistake went something like this. We have already seen that the amount of information an observer can transmit is an invariant feature of the span of absolute judgment. There are many operational similarities between absolute judgment experiments and immediate memory experiments. If immediate memory is like absolute judgment, then it follows that the amount of information an observer can retain should also be an invariant feature of the span of immediate memory. If the amount of information in the span of immediate memory is constant, then the span should be short when the individual items contain a lot of information and long when they contain little. For example, each decimal digit carries 3.3 bits of information. We can retain about seven decimal digits, which gives a total of 23 bits of information. An isolated English word carries about 10 bits. If the total amount of information remained constant at 23 bits, then we would be able to remember only two or three words chosen at random. In this way, I came to the hypothesis that the span of immediate memory varies with the amount of information per item of test material.
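The arithmetic behind this hypothesis can be laid out as a minimal sketch. It assumes that every item is drawn with equal probability from its vocabulary, so that the information per item is the base-2 logarithm of the vocabulary size; the function names are illustrative only.

```python
import math

def bits_per_item(vocabulary_size: int) -> float:
    """Information per item, assuming all items are equally likely."""
    return math.log2(vocabulary_size)

def predicted_span(total_bits: float, vocabulary_size: int) -> float:
    """Span expected if immediate memory held a constant number of bits."""
    return total_bits / bits_per_item(vocabulary_size)

# Calibrate on the observed span of about seven decimal digits.
total_bits = 7 * bits_per_item(10)

for name, size in (("binary digits", 2), ("decimal digits", 10),
                   ("words from a 1000-word vocabulary", 1000)):
    print(f"{name}: predicted span {predicted_span(total_bits, size):.1f} items")
```

On this hypothesis the predicted span would range from about 23 binary digits down to two or three words, a far wider range than a span that stays near seven items.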
The measurements of memory span available in the literature are suggestive on this question but not definitive. Additional experiments were therefore necessary. Hayes tried to do this using five different types of test material: binary digits, decimal digits, letters of the Latin alphabet, these same letters plus decimal digits, and, in addition, 1000 monosyllabic words. The lists were read aloud at a rate of one item per second, and subjects were given as much time as they needed to respond. The responses were scored by the procedure proposed by Woodworth.

The results are shown by the filled circles in Fig. 7. The dotted line in the same figure shows what the span of immediate memory should have been if the amount of information in the span were constant. The solid lines show the actual data.

Fig. 7. Hayes's data, showing the dependence of the span of immediate memory on the amount of information per item of test material.

Hayes repeated his experiment using test vocabularies of different sizes, all containing only English monosyllabic words. This more homogeneous test material did not significantly change the final results. With binary items, the span of immediate memory is about nine, and although it drops to about five with monosyllabic English words, the difference is much smaller than would be expected on the hypothesis of a constant amount of information in the span of immediate memory.
It should not be thought that any error could have been made in Hayes's experiments, since Pollack repeated them quite carefully and obtained the same results. Pollack paid special attention to measuring the amount of information transmitted, without relying on the traditional method of calculating responses. The results he obtained are presented in Fig. 8. From the figure it is clear that the amount of transmitted information is not a constant value; it increases almost linearly with the increase in the amount of input information per symbol.

Fig. 8. Pollack’s data (16), showing the dependence of the amount of information retained after one presentation on the amount of information per item of test material.

So the final result is quite clear. Although the magic number 7 appears in both cases by coincidence, the span of absolute judgment and the span of immediate memory are two completely different kinds of limitation imposed on our ability to process information. Absolute judgment is limited by the amount of information, and immediate memory is limited by the number of items. To express this difference in a vivid form, I propose to distinguish between bits of information, on the one hand, and chunks of information, on the other. We can then say that the number of bits of information is constant for absolute judgment and that the number of chunks of information is constant for immediate memory. The span of immediate memory seems to be almost independent of the number of bits per chunk, at least within the range that has been studied to date.

Contrasting the terms bit and chunk also highlights the fact that we are not very definite about what constitutes a chunk of information. For example, Hayes's memory span of five words, chosen at random from 1000 English monosyllables, could just as well be called a memory span of 15 phonemes, since each word has approximately three phonemes, and the logical distinction between the two descriptions is not obvious. We are dealing here with a process of organizing or grouping the input into familiar units or chunks, and a great deal of learning goes into the formation of these familiar units.

Recoding

Therefore, to be more precise, we must recognize the importance of grouping or organizing the input sequence into units or chunks of information. Since the memory span is a fixed number of chunks, we can increase the number of bits of information it contains simply by building larger and larger chunks, each containing more information than before.

A person who is just beginning to learn radiotelegraphic code hears each dot and dash as a separate chunk of information. But he soon learns to organize these sounds into letters, and then he deals with the letters as chunks. Then the letters are organized into words, which are still larger chunks, and the operator begins to hear entire phrases. I do not mean that each step is a discrete process or that there must be plateaus in the learning curve, because the different levels of organization are certainly achieved at different rates and overlap one another during learning. I am simply pointing out the obvious fact that during training the dots and dashes are organized into auditory patterns, and that as larger and larger chunks are formed, the amount of message that the operator can remember increases correspondingly. In the terms I am proposing, the operator learns to increase the number of bits per chunk.

In communication theory, this process is called recoding. The input is given in a code that contains many chunks with a small number of bits per chunk. The operator recodes the input into another code that contains fewer chunks, with a larger number of bits per chunk. There are many ways to perform this recoding, but perhaps the simplest is to group the input symbols, assign a new name to the group, and then remember the new name rather than the original input symbols.

Since I am convinced that this process is a very general and important process for psychology, I want to tell you about a demonstration experiment that will illustrate with complete clarity everything that I have said. This experiment was carried out by S. Smith in 1954.

Let's start with the already noted fact that a person can reproduce from memory eight decimal digits but only nine binary digits. Since there is such a large discrepancy in the amount of information reproduced in these two cases, we immediately suspect that a recoding procedure could be used to increase the span of immediate memory for binary digits. The table shows a method for grouping and renaming. The top row shows a sequence of 18 binary digits, which is far more than a subject can reproduce from memory after a single presentation. In the next row these same binary digits are grouped into pairs. Four pairs are possible: 00 is renamed 0, 01 is renamed 1, 10 is renamed 2, and 11 is renamed 3. In other words, we have moved from base-two arithmetic to base-four arithmetic. The recoded sequence now has only 9 digits to be remembered, and this number is almost within the span of immediate memory. In the next row of the table, the original sequence of binary digits is recoded into chunks of 3 digits each. There are 8 possible combinations of 3 binary digits, so we give each combination a new name from 0 to 7. Now we have moved from a sequence of 18 binary digits to a sequence of 6 octal digits, and this number fits well within the span of immediate memory. In the last two rows, the binary digits are grouped by fours and by fives and are given decimal-digit names from 0 to 15 and from 0 to 31.

It is quite obvious that this method of recoding leads to an increase in the number of binary units per piece of information; it also makes it possible to convert the binary sequence into a form that can easily be retained in immediate memory.
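The grouping-and-renaming scheme described above can be sketched in a few lines. The 18-digit sequence below is an arbitrary example chosen for illustration and is not claimed to be the one in the table; the function name `recode` is likewise an assumption.

```python
def recode(binary_digits: str, group_size: int) -> list:
    """Group binary digits into chunks of `group_size` and rename each chunk
    by its numeric value (00 -> 0, 01 -> 1, 10 -> 2, 11 -> 3 for pairs)."""
    return [int(binary_digits[i:i + group_size], 2)
            for i in range(0, len(binary_digits), group_size)]

sequence = "101000100111001110"  # 18 binary digits, illustrative only

for group_size in (2, 3, 4, 5):
    chunks = recode(sequence, group_size)
    # When 18 is not divisible by the group size, the last chunk is shorter.
    print(f"{group_size}:1 recoding -> {len(chunks)} chunks: {chunks}")
```

With 2:1 recoding the 18 binary digits become 9 chunks, and with 3:1 recoding they become 6 chunks, which is the reduction the text relies on.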

Smith recruited 20 subjects and measured their immediate memory capacity for binary and octal digits. It turned out that the amount of immediate memory is 9 for binary and 7 for octal digits. Then each of the coding schemes shown in the table was given to 5 subjects. They studied the recoding process until they reported that they understood it, which lasted from 5 to 10 minutes. He then tested their memory capacity again for binary digits while they tried to apply the recoding schemes they had learned.
In each case, the use of the recoding schemes increased the subjects' span of immediate memory for binary digits. But the increase was not as large as would be expected from their span for octal digits. Since the discrepancy grew as the recoding ratio increased, we can conclude that the few minutes allotted to learning the recoding schemes were clearly not enough. Obviously, translation from one code to the other must be almost automatic, or the subject will lose part of the next group while he is trying to remember the translation of the last group.

Since the 4:1 and 5:1 cases needed further study, Smith decided to follow Ebbinghaus's example and conduct the experiments on himself. With Germanic patience, he drilled himself on each recoding scheme in turn and obtained the results presented in Fig. 9. The results are now very close to what would be expected from his span for octal digits. He was able to remember 12 octal digits. With 2:1 recoding, these 12 chunks are worth 24 binary digits. With 3:1 recoding, they are worth 36 binary digits, and with 4:1 and 5:1 recoding they amount to almost 40 binary digits.
When a person hears 40 binary digits and then reproduces them accurately, it makes an amazing impression. However, if you think of this merely as a mnemonic trick for expanding the memory span, you are missing a much more important point that is implicit in almost all such mnemonic devices. The point is that recoding is an extremely powerful tool for increasing the amount of information that we can process. In our daily lives, we constantly resort to recoding in one form or another.

Fig. 9. Dependence of the span of immediate memory for binary digits on the recoding procedure used. The predicted function was obtained by multiplying the span of immediate memory for octal digits by 2, 3, and 3.3 for recoding into base 4, base 8, and base 10, respectively.
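The prediction can be sketched as a one-line calculation: if the span is a fixed number of chunks, the number of binary digits retained is that span times the bits carried by each chunk. The sketch assumes a span of 12 chunks, the figure reported for Smith's own octal span, and treats it as constant across codes; the function name is illustrative.

```python
import math

def predicted_binary_span(chunk_span: int, base: int) -> float:
    """Binary digits retained if `chunk_span` chunks are held and each chunk
    is one digit in the given base (log2(base) bits per chunk)."""
    return chunk_span * math.log2(base)

chunk_span = 12  # assumed constant across recoding schemes

for base in (4, 8, 10):
    print(f"base {base}: {predicted_binary_span(chunk_span, base):.1f} binary digits")
```

For bases 4 and 8 this gives 24 and 36 binary digits, and for base 10 it gives a value just under 40.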

In my opinion, the most common kind of recoding, which we resort to all the time, is translation into a verbal code. When we want to remember a story, an argument, or an idea, we usually try to retell it in our own words. When we want to remember an event that we have witnessed, we usually make a verbal description of the event and then remember that verbal description. When we reproduce something from memory, we reconstruct by secondary elaboration the details that seem consistent with the particular verbal recoding we happened to make. The existence of this process was confirmed by the well-known experiment of Carmichael, Hogan and Walter, which examined the effect of names on the recall of visual figures.

In forensic psychology, discrepancies in the testimony of eyewitnesses are well known, but such discrepancies and distortions are not arbitrary: they stem from the fact that each witness used his own recoding scheme, which depended on his entire life experience. Our language is exceptionally well suited to repackaging material into a few chunks rich in information. I suspect that imagery is also a form of recoding, but images are much harder to get at operationally and to study experimentally than the more symbolic kinds of recoding.

Studying the process of remembering events in the manner described seems quite possible. The process of memorization can be considered as the process of forming segments of information or groups of symbols combined together until a sufficiently small number of units is formed that we can subsequently reproduce completely from memory. Of particular interest in this regard is the work of Bousfield and Cohen on the formation of word groups when recalling them from memory.

Brief conclusions

Having come to the end of the presentation of my material, I would like to make brief concluding remarks.
Firstly, the amount of information that we can receive, process and remember is limited in some respects by the volume of absolute estimates and the volume of immediate memory. By simultaneously organizing input stimuli along several dimensions and sequentially ordering them into a series of pieces of information, we are able to eliminate or at least significantly weaken this limitation of our information processing processes.
Secondly, the process of recoding is a very important psychological process and deserves much more attention than it has received so far. In particular, the kind of linguistic recoding that people use every minute seems to me to be the very lifeblood of the thought processes. Clinicians, social psychologists, linguists and anthropologists are constantly confronted with recoding processes, and yet, perhaps because recoding is less accessible to experimental study than, for example, work with nonsense syllables or T-mazes, traditional experimental psychology has contributed little or nothing to the analysis of this problem. Nevertheless, experimental techniques can be developed here too, recoding methods can be found, and behavioral indicators can be identified. And I hope that we will be able to discover a very orderly system of relations describing what now seems only a disparate set of loosely connected facts.
Thirdly, the methods and measures proposed by information theory make it possible to take a quantitative approach to some of these problems. Information theory provides us with a unit of measurement for calibrating stimulus material and for measuring the performance of subjects. In the interest of clarity, I have omitted some technical details about how information is measured and have tried to express my thoughts in the simplest possible way. I hope this does not lead you to think that these concepts are of no use in research. Information-theoretic concepts have already proven very useful in the study of discrimination and of language, they promise to be just as useful in the study of learning and memory, and it has even been suggested that they could be applied to the study of concept formation. Many questions that seemed fruitless twenty or thirty years ago may seem important now. Indeed, I feel that I am ending my story here just when it is really getting interesting.

And finally, what about the magic number 7? What can be said about the 7 wonders of the world, the 7 seas, the 7 deadly sins, the 7 daughters of Atlas in the Pleiades, the 7 ages of man, the 7 levels of hell, the 7 primary colors, the 7 notes of the musical scale, or the 7 days of the week? What can be said about the seven-point rating scale, about the 7 categories of absolute judgment, about the 7 objects in the span of attention, and about the 7 items in the span of immediate memory? For the present, I prefer to reserve judgment. Perhaps behind all these sevens there is something deep and profound, calling us to discover its secret. But I suspect that this is just a pernicious Pythagorean coincidence.

Literature

(1) Beebe-Center, J. G., Rogers, M. S., & O’Connell, D. N. Transmission of information about sucrose and saline solutions through the sense of taste. J. Psychol., 1955, 39, 157-160.
(2) Bousfield, W. A., & Cohen, B. H. The occurrence of clustering in the recall of randomly arranged words of different frequencies-of-usage. J. gen. Psychol., 1955, 52, 83-95.
(3) Carmichael, L., Hogan, H. P., & Walter, A. A. An experimental study of the effect of language on the reproduction of visually perceived form. J. exp. Psychol., 1932, 15, 73-86.
(4) Chapman, D. W. Relative effects of determinate and indeterminate Aufgaben. Amer. J. Psychol., 1932, 44, 163-174.
(5) Eriksen, C. W. Multidimensional stimulus differences and accuracy of discrimination. USAF, WADC Tech. Rep., 1954, No. 54-165.

Seven plus or minus two (7 ± 2). Miller's wallet

The “seven plus or minus two” pattern was discovered by the American psychologist George Miller in a series of experiments. It shows that a person's short-term memory can hold, on average:

    nine binary digits,

    eight decimal digits,

    seven letters of the alphabet,

    five one-syllable words.

This psychological pattern was first described in his paper “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information.” It follows that an ordinary person is able to hold 7 ± 2 elements in memory at once. In other words, a person can keep in mind (remember and repeat) no more than 9 elements, and often no more than 5.

Miller's wallet is a person's short-term memory, into which only seven "coins" can be "put" at a time. Moreover, it is important that memory does not try to analyze the meaning of information; only external, general characteristics are important.

In other words, it doesn’t matter what “coins” are in the “wallet”, the main thing is that there are seven of them. And if the number of elements is more than seven (in extreme cases, nine), then the brain breaks the information into subgroups so that their number is from five to nine.

George Miller (1920-2012) was an American psychologist. In the 1940s he received a Bachelor of Arts degree from the University of Alabama, and in 1946 he defended his doctorate in psychology at Harvard.

He later became a professor of psychology at Rockefeller University in New York and at Princeton University. In 1969 he was elected president of the American Psychological Association. George Miller was awarded the William James Book Award for his book The Science of Words, and he also received the United States National Medal of Science from President George H. W. Bush.

His most famous work, The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information, was published in 1956 in Psychological Review. This number is also called the Yngve-Miller number.

Application

This principle is used, for example, in the design of program interfaces. If the number of interface elements (menu items, buttons, tabs) exceeds seven, or at most nine, designers try to group them.

Links

  • George A. Miller. The Magical Number Seven, Plus or Minus Two. // The Psychological Review, 1956, vol. 63, pp. 81-97.
  • DDR for the head, or how our memory works - Article on Habrahabr
