SPSS Basics

Before processing study data in SPSS, you need to organize data entry properly.

Entering research data into the program can be divided into two main stages:

· Preparing the questionnaire template

· Direct data entry

Let's take a closer look at these procedures.

Preparing the questionnaire template. In SPSS, data is entered in a specific format. To prepare a form for entering and then processing data, you must first enter the questionnaire template in a form the program accepts. The general view of the program window is shown in Figure 1.

Fig. 1. General view of the SPSS program after launch.

When the program is launched for the first time, the user is shown an additional dialog box offering actions such as editing an existing database, opening an existing file and so on. In most cases this window is of little use, so we recommend checking the box next to “Don't show this dialog in the future”. The initial appearance of the program is, in principle, standard for most programs developed for the Windows operating system. The navigation bar, window appearance and window management are almost identical to those of most office applications. For this reason, we will focus on the distinctive features of SPSS itself.

Fig.2. SPSS workspace.

SPSS has two fields, organized as tabs similar to those in Excel. However, these fields are far from equivalent. Figure 2 shows the working field of the program, into which the user directly enters data from questionnaires (Data View). Before entering data, however, you need to create the questionnaire template, its basis, in the program. The questionnaire template is entered in the variable definition field, Variable View. When data is entered in SPSS, each variable occupies a column (arranged vertically) and each observation a row (arranged horizontally). Let's take a closer look at the Variable View field (Figure 3).

Fig.3. Variables window view.

Each variable is a question in the questionnaire. By default, SPSS offers 10 basic characteristics that can be used to describe a variable: name, type, width, decimals, label, values, missing, columns, align, and measure. In terms of how important they are to fill in, these characteristics can be divided into those that define the variable itself and those that only affect the convenience of output.

Basic values of the variable parameters:

Name - the variable name that will be displayed in the input field. The program uses the same name to identify the variable. The name must not exceed 8 characters and must use English letters only. (In later versions of the program, Russian text can be used.)

Type - the variable type, in other words, what kind of information is entered as values: number, date, arbitrary value, comma-formatted number, etc. The most commonly used formats are numeric (Numeric), date (Date) and string (text, String). In the first case any number can be taken as a value, in the second a date in a certain format, and in the last text.

Width - the length of the variable: the number of digits that can fit in a cell.

Decimals - the number of digits after the decimal point.

Label - a more detailed, user-defined description of the variable. It is usually formulated exactly as the survey question itself. It is used in reports and allows you to use any font.

Values - labels for the values that the variable can take. In SPSS, data is presented primarily in numerical form, because text is not amenable to statistical analysis. For example, gender can be coded as 1 - male, 0 - female. When entering values for a ranking (ordinal) scale, it is very important to keep the sequence: the values must be in ascending order. An example of incorrect data entry will be discussed below. For a metric scale, the values need not be specified.

Value labels are entered in an additional window.

Fig.4. Determining the type of a variable.

Missing - identification of missing values. They can be set automatically by the system (system-defined missing values) or by the user (user-defined missing values).

Columns - the column width.

Align - alignment within the cell (left, right or center).

Fig. 5. Determining the value of a variable.

Measure - the measurement scale of the variable: Scale - a numeric, metric scale; Ordinal - a ranking scale; Nominal - a nominal scale. This is an extremely important characteristic, since processing depends on the correct choice of scale type. The program provides a graphical hint, an icon next to each type of scale: a ruler (the result of a measurement is a number), an ascending histogram (ranks) and circles of sets (incomparable characteristics denoting disjoint sets).

Fig. 6. Selecting a scale type in SPSS.
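The variable characteristics listed above can be pictured as one record per questionnaire item. The following Python sketch is only an illustration of that idea: the `VariableDef` class, its field names and the missing-value code 9 are assumptions made for this example and are not part of SPSS or of any SPSS API.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDef:
    """One row of Variable View (hypothetical structure, not an SPSS API)."""
    name: str                     # short identifier used by the program
    label: str                    # full question wording shown in reports
    measure: str                  # "scale", "ordinal" or "nominal"
    var_type: str = "numeric"     # "numeric", "date" or "string"
    width: int = 8                # how many digits fit in a cell
    decimals: int = 0             # digits after the decimal point
    values: dict = field(default_factory=dict)   # code -> value label
    missing: set = field(default_factory=set)    # user-defined missing codes

    def decode(self, code):
        """Return the value label for a code, or None if the answer is missing."""
        if code is None or code in self.missing:
            return None
        return self.values.get(code, code)


gender = VariableDef(
    name="gender",
    label="What is your gender?",
    measure="nominal",
    values={1: "male", 0: "female"},
    missing={9},                  # 9 = no answer (illustrative code)
)

raw_answers = [1, 0, 9, 1, None]  # None plays the role of a system-missing value
decoded = [gender.decode(code) for code in raw_answers]
print(decoded)  # ['male', 'female', None, 'male', None]
```

The numeric codes are what gets stored and analyzed; the labels are only attached for output, which is why the text recommends coding answers as numbers.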

Let's look at the types of measuring scales in a little more detail.

In principle, the type of scale is determined by the researcher already when preparing the sociological research program, at the stage of searching for empirical indicators of the characteristics being measured. In its final form the scale is embodied directly in the survey question. It is very important to comply with the requirements for formulating the answer alternatives. From the point of view of SPSS, the most important requirement is that the subsets formed by the alternative answers be disjoint. Otherwise, when entering and processing data, it is quite difficult to determine exactly which interval, that is, which subset, the respondent actually chose.

For example, alternative answers to the age question may include intervals such as up to 15 years, 15-20, 20-25, 25-30, 30 and older. With this formulation, a problem arises when the respondent is 15, 20, 25 or 30 years old, i.e. falls exactly on a boundary. The respondent can mark either interval, the higher or the lower, at random or based on some personal preference. When the data is processed, this can distort the actual picture. The general classification of scales can be presented in the form of the following diagram.

Fig. 7. Classification of scales.

In the figure, the arrows leading to the interval scale are drawn as dotted lines. The reason is that the interval scale is not metric in the strict sense and is classified as non-metric. However, in some cases, for example when the intervals are equal, some of the mathematical operations characteristic of a metric scale can be performed on it.
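The boundary problem in the age example above disappears if the answer alternatives are defined as half-open intervals, so that every age belongs to exactly one category. The Python sketch below shows one way to do this; the interval labels and the `age_category` helper are illustrative assumptions, not wording from the original questionnaire.

```python
from bisect import bisect_right

# Lower bounds of the half-open intervals [15, 20), [20, 25), [25, 30), [30, ...)
LOWER_BOUNDS = [15, 20, 25, 30]
LABELS = ["under 15", "15-19", "20-24", "25-29", "30 and older"]


def age_category(age):
    """Return (code, label); codes ascend with age, as a ranking scale requires."""
    index = bisect_right(LOWER_BOUNDS, age)   # number of lower bounds <= age
    return index + 1, LABELS[index]


for age in (14, 15, 20, 25, 30):
    print(age, age_category(age))
# 14 (1, 'under 15')
# 15 (2, '15-19')
# 20 (3, '20-24')
# 25 (4, '25-29')
# 30 (5, '30 and older')
```

A respondent aged exactly 15, 20, 25 or 30 now has a single valid answer, and the resulting codes 1 to 5 are already in ascending order for entry into SPSS.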

From the point of view of conducting research and processing data, it is very important to understand the possibilities and limitations of each type of measurement scale. Metric scales (the Scale type in SPSS) have the greatest analytical power, because practically all statistical procedures can be applied to them without restrictions. Nominal scales, on the contrary, offer the weakest capabilities: by and large, only a frequency distribution and the mode as a measure of central tendency.
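The difference in analytical power can be illustrated with measures of central tendency. The sketch below follows the common textbook convention (mode for nominal data, median added for ordinal data, mean added for metric data); the function name and the sample data are invented for the example.

```python
from statistics import mean, median, mode


def central_tendency(values, measure):
    """Return the measures of central tendency usually permitted for a scale type."""
    result = {"mode": mode(values)}          # meaningful for every scale
    if measure in ("ordinal", "scale"):
        result["median"] = median(values)    # requires an ordering of values
    if measure == "scale":
        result["mean"] = mean(values)        # requires meaningful distances
    return result


gender = [1, 0, 0, 1, 0]               # nominal codes: 1 = male, 0 = female
satisfaction = [1, 2, 2, 3, 5]         # ordinal ranks from 1 (low) to 5 (high)
age = [19, 22, 22, 30, 41]             # metric values in years

print(central_tendency(gender, "nominal"))        # {'mode': 0}
print(central_tendency(satisfaction, "ordinal"))  # {'mode': 2, 'median': 2}
print(central_tendency(age, "scale"))             # {'mode': 22, 'median': 22, 'mean': 26.8}
```

Computing a mean of the gender codes would run without error, but the number would have no interpretation, which is why the scale type has to be set correctly before analysis.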

In practice, it is extremely important to choose the right measurement scale already at the stage of designing the questionnaire. The more information we want to obtain from a particular question, the more we should strive to use a metric scale. The ideal questionnaire, from the point of view of processing capabilities, is a list of questions each of which is measured quantitatively. In practice this is almost impossible to achieve, both because some variables cannot be “digitized” (for example, it is unrealistic to convert a question about the respondent's gender into a metric scale) and because of the dramaturgy of the questionnaire itself: monotonous questions reduce the respondent's motivation and the reliability of the data obtained.

Returning to the definition of variable parameters in SPSS, the parameters that are mainly responsible for the convenience of presenting information are columns (column width), align (cell alignment) and, to some extent, width (length) and decimals (number of decimal places). In most cases these parameters can simply be left at the proposed values. The remaining parameters require care, since they have a significant impact on entering and processing information.

After defining the variables, you can proceed directly to entering data, which is entered in the Data View field as numbers or other symbols (depending on the type of variable). The next section looks at the detailed algorithm for defining variables and entering values.


Information technologies have recently become widespread in the education system. Obtaining quantitative indicators of the quality of test takers' preparedness requires processing large volumes of mass testing data. Various software environments are used for this purpose, among which the SPSS program, a universal system for statistical analysis and data management, occupies a special place. The main blocks of SPSS are: data editor; viewer; multidimensional pivot tables; high-quality graphics; access to databases; data transformation; help system; command language. SPSS is an effective tool for practical work in sociological and pedagogical analysis and makes it possible to process test results quickly and accurately.

Keywords: Unified State Exam, computer testing, blank testing, mass centralized testing, suitability analysis, factor analysis, nonparametric methods, frequency analysis, SPSS program, latent characteristics, mass testing technology, systematic analysis, final examination, information technology.


Recently, information technologies have become widespread in the education system. They are used for training, control, final certification of graduates, self-study, self-control, etc. The most important condition for improving the quality of education is the systematic analysis of objective data from independent control of educational achievements, and the monitoring and diagnosis of students' readiness to obtain results that correspond to their capabilities and needs. The possibilities of mass testing technologies are attracting growing attention from researchers who seek to solve various problems of education and of the self-management of learning activities.

An important role in the development of monitoring learning outcomes should be played by systematic and continuous assessment, which provides a judgment about the student’s readiness to continue studying and his participation in social and industrial activities. The difficulty lies in the fact that not only high-quality training is required, but also high-quality assessment, high-quality assessment tools and procedures, as well as providing motivation when performing tests so that the manifestation of the latent characteristics of the subjects is maximized. Therefore, assessment should be carried out as a specifically focused and orderly process of determining the set and level of preparedness achieved, and the results should be expressed quantitatively, regardless of how simple or difficult they are to evaluate.

Obtaining quantitative indicators of the quality of test takers' preparedness requires processing large volumes of mass testing data. Various software environments are used for this, among which a special place is occupied by SPSS Statistics. It is the market leader among commercial statistical products for applied research in the social and educational sciences. SPSS is a universal system for statistical analysis and data management. The acronym originally stood for “Statistical Package for the Social Sciences”. It was later given a new interpretation: “Superior Performance Software System” (highest-performance software system).

In the early 1970s, Norman Nie, Dale Bent and Hadlai Hull registered the SPSS® statistical software trademark. The company of the same name was created by them in 1968. In 1975, the company was transformed into a corporation with its main office in Chicago, IL, USA. Over the years, the corporation has developed many software products, including SPSS/PC+™, the first version of which appeared in 1984. In 2009, the package became known as PASW Statistics (Predictive Analytics SoftWare). Since July 2009, the package has been maintained by IBM (International Business Machines) under the name IBM SPSS Statistics. In 2013, the next version of the package was released, IBM SPSS Statistics 22, which runs under the Windows, Mac OS X and Linux operating systems.

By all measures, SPSS is a complex and powerful statistical package. With it you can carry out almost any data analysis, and the latest versions of the program are used in a wide variety of scientific fields, including the educational sciences. Today SPSS is a software product and at the same time a protected trademark of the world-famous American company SPSS Inc., whose board of directors remains in Chicago. The package occupies a leading position among programs designed for statistical processing of information in the social and pedagogical sciences. Together with other software of this kind, it has come a long way: from the first versions of SPSS for mainframe computers, to versions for PC-DOS/MS-DOS, and then to versions running in the Windows environment. SPSS provides a user-friendly interface that makes data entry and statistical analysis accessible to the beginner and convenient for the advanced user. The package's data editor lets you enter and correct input data conveniently, in tabular form. SPSS can produce a variety of high-quality graphs and charts. Using tables, simple menus and dialog boxes, you can, first, analyze huge data files with thousands of variables and, second, do all this without writing commands in a programming language. With SPSS you can: manage data; organize data; transform data and create new variables; analyze data.

Possible areas of application of SPSS: storage and analysis of survey data, marketing and sales research, financial analysis, etc. In sociology and pedagogy, the package makes it possible to automate the creation, storage and processing of databases of various information. The stages of the analytical process implemented in SPSS are: planning; data collection; providing access to data; preparing data for analysis; performing the analysis; generating reports; presenting and disseminating results. In pedagogy, the package makes it possible to automate the processing and interpretation of test results.

The first version of SPSS for Windows was version 5.0. It was followed by versions 6.0, 6.1, 7.0, 7.5, 8.0, 9.0, 10.0, 11.5 and above. Starting with version 7.0, SPSS requires at least Windows 95 (NT).

In addition to its own data format, SPSS can read data from almost any type of file and use it to create reports in the form of tables, graphs and charts, as well as to calculate descriptive statistics and perform complex statistical analysis and modeling.

The package has a modular structure. The package modules are an integrated set of software products that provide comprehensive research - from planning to data management, analysis and presentation of results.

Core SPSS modules: IBM SPSS Statistics Base, IBM SPSS Decision Trees, IBM SPSS Advanced Statistics, IBM SPSS Direct Marketing, IBM SPSS Bootstrapping, IBM SPSS Exact Tests, IBM SPSS Categories, IBM SPSS Forecasting, IBM SPSS Complex Samples, IBM SPSS Missing Values, IBM SPSS Conjoint, IBM SPSS Neural Networks, IBM SPSS Custom Tables, IBM SPSS Regression, IBM SPSS Data Preparation. The composition of the modules depends on the delivery option.

Basic blocks of SPSS:

Data editor - a flexible system, similar in appearance to a spreadsheet, for defining, entering, editing and viewing data.

Viewer - makes it easy to view results by letting you show and hide individual output elements, change the order in which results are displayed, and move presentation-ready tables and charts to and from other applications.

Multidimensional pivot tables - used to display analysis results. You can explore tables by rearranging rows, columns and layers and thus identify important points that may be lost in standard reports. You can also compare groups by splitting the tables so that only one group is displayed at a time.

High-quality graphics - a means of generating full-color, high-resolution charts: pie and bar charts, histograms, scatterplots, 3-D charts and many others.

Database access - a database reading wizard that lets you load data from any source with a few clicks of the mouse.

Data transformation - a tool that helps prepare data for analysis. You can easily subset data, merge categories, and append, aggregate, merge, split and transpose files, as well as perform other transformations.

Help system:

An electronic tutorial offers a detailed overview;

Context-sensitive help in dialog boxes helps you understand specific tasks;

Pop-up definitions in pivot tables explain statistical terms;

A statistics tutor helps you find the necessary procedure;

Analysis examples help in interpreting the results.

Command language - although many tasks can be performed using the mouse and dialog boxes, SPSS also has a powerful command language that lets you save and automate many repetitive tasks. The command language also gives access to some functionality that is not available through menus and dialog boxes. Full documentation for the command language is integrated into the help system and is also available as a separate PDF document, the syntax guide, from the Help menu.

The package structure includes commands for data definition, data transformation, and object selection commands. It implements the following methods of statistical information processing:

  • summary statistics for individual variables;
  • frequencies, summary statistics and graphs for an arbitrary number of variables;
  • construction of N-dimensional contingency tables and obtaining measures of connection; means, standard deviations and sums by group;
  • analysis of variance and multiple comparisons;
  • correlation analysis; discriminant analysis; one-way analysis of variance;
  • general linear model analysis of variance (GLM);
  • factor analysis;
  • cluster analysis;
  • hierarchical cluster analysis;
  • hierarchical log-linear analysis;
  • multivariate analysis of variance; nonparametric tests; multiple regression;
  • optimal scaling methods, etc.

In addition, the package can produce a variety of graphs: bar and pie charts, boxplots, scatterplots, histograms, etc.

Until recently, training and quality control in education were carried out by traditional methods, mainly by the same people who conduct the educational process, which from the point of view of management theory does not contribute to its improvement. Today, mass testing data is processed automatically using numerous computer programs. One of them is SPSS, which makes it possible to process the results of mass testing in any subject quantitatively, accurately and quickly.

Frequency analysis makes it possible to determine: the frequency of each answer option to a test question; the percentage of each answer relative to the total number of respondents (for example, the share of correct answers to a given question, as a percentage of the total number of answers); the valid percentage (the same share with missing values excluded); and the cumulative percentage (the running sum of the valid percentages).
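The four quantities of a frequency table can be reproduced by hand. The following Python sketch is a minimal illustration with invented answers, where `None` stands for a missing value; it is not SPSS output and the function name is an assumption of this example.

```python
from collections import Counter


def frequency_table(answers):
    """Return rows of (value, count, percent, valid percent, cumulative percent).

    None marks a missing answer. Valid and cumulative percentages are
    computed over the non-missing answers only.
    """
    total = len(answers)
    valid = [a for a in answers if a is not None]
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100 * count / total
        valid_percent = 100 * count / len(valid)
        cumulative += valid_percent
        rows.append((value, count, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


answers = [1, 2, 2, 3, 3, 3, 3, None, None, 1]   # 10 respondents, 2 missing
for row in frequency_table(answers):
    print(row)
# (1, 2, 20.0, 25.0, 25.0)
# (2, 2, 20.0, 25.0, 50.0)
# (3, 4, 40.0, 50.0, 100.0)
```

The plain percentages sum to 80 because two respondents gave no answer, while the valid percentages sum to 100.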

SPSS has a wide variety of procedures for analyzing the relationship between two variables. The relationship between variables measured on a nominal scale, or on an ordinal scale with a small number of categories, is best presented in the form of contingency tables. For this purpose, SPSS implements the chi-square test, which checks whether the observed frequencies differ significantly from the expected ones. In addition, various measures of association can be calculated.
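To make the comparison of observed and expected frequencies concrete, here is a small Python sketch of the Pearson chi-square statistic for a contingency table. The counts are invented, and the function illustrates the textbook formula rather than the SPSS procedure itself.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: gender (male, female); columns: answer (yes, no). Invented counts.
observed = [[30, 10],
            [20, 40]]
statistic, df = chi_square(observed)
print(round(statistic, 2), df)  # 16.67 1
```

With one degree of freedom the critical value at the 0.05 level is 3.841, so a statistic of about 16.67 indicates a significant difference between observed and expected frequencies in this invented table.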

The advantage of nonparametric methods is most noticeable when there are outliers (extremely large or small values) in the data. SPSS provides users with a large number of nonparametric tests.

The most commonly used tests compare two or more independent or dependent samples. These are the Mann-Whitney U test, the Kruskal-Wallis H test, the Wilcoxon test and the Friedman test. The one-sample Kolmogorov-Smirnov test also plays an important role, since it can be used to check whether a distribution is normal. Nonparametric tests can, of course, also be used when the values are normally distributed, but in that case they have only 95% efficiency compared to parametric tests. If you want, for example, to make multiple comparisons of two independent samples, where some of the samples follow a normal distribution and some do not, it is recommended to use the Mann-Whitney U test throughout.
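Why rank-based tests cope well with outliers can be seen from how the Mann-Whitney U statistic is built. The Python sketch below computes U from average ranks; the two groups are invented, the helper names are assumptions of this example, and no p-value is calculated.

```python
def average_ranks(values):
    """Rank values starting from 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U statistics for independent samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


group_a = [12, 15, 11, 19, 14]
group_b = [22, 17, 25, 16, 90]            # 90 is an extreme value
print(mann_whitney_u(group_a, group_b))   # 2.0
print(mann_whitney_u(group_a, [22, 17, 25, 16, 26]))  # 2.0, the outlier changes nothing
```

Because only the order of the values enters the calculation, replacing the extreme value 90 with 26 leaves U unchanged, whereas the group mean would shift considerably.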

Factor analysis is a procedure by which a large number of variables relating to the available observations is reduced to a smaller number of independent influencing quantities, called factors. Variables that are highly correlated with each other are combined into one factor, while variables from different factors are weakly correlated with each other. Thus, the goal of factor analysis is to find complex factors that explain the observed relationships between the available variables as fully as possible.

Factor analysis is possible if a number of criteria are met. Qualitative data cannot be factorized. The variables must be independent and their distribution must be close to normal. The relationships between the variables should be approximately linear, the original correlation matrix should contain several correlations greater than 0.3 in absolute value, and the sample of subjects must be large enough.
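The criterion about correlations above 0.3 can be checked before running a factor analysis. The Python sketch below screens every pair of variables with the Pearson coefficient; the score data, variable names and function names are invented for illustration, and the sample is far too small for a real factor analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def strong_pairs(data, threshold=0.3):
    """Return the variable pairs whose |r| exceeds the threshold."""
    pairs = {}
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            pairs[(a, b)] = round(r, 2)
    return pairs


scores = {
    "algebra":  [3, 4, 5, 2, 4, 5],
    "geometry": [3, 5, 5, 2, 3, 5],
    "essay":    [4, 3, 4, 4, 3, 4],
}
print(strong_pairs(scores))  # {('algebra', 'geometry'): 0.88}
```

In this toy data only algebra and geometry correlate strongly, so they are the pair that a factor analysis would be expected to combine into one factor.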

Suitability analysis (also called question analysis or task analysis) helps select questions (tasks) for tests. Using various criteria, it determines which tasks are suitable for a particular test and which are not.

For this purpose, a certain sample of respondents is given a preliminary version of the test containing all the proposed tasks, and these tasks are analyzed. Based on this analysis, unsuitable tasks are eliminated and the remaining ones are included in the final form of the test. Tests are divided according to the characteristic being studied into level of education tests, ability tests and personality tests. A test task consists primarily of two parts: a problem or question, and a solution to the problem or answer.
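The text does not name the criteria used to judge tasks. As one possible illustration, the Python sketch below computes two indicators that are common in task analysis: difficulty (the share of correct answers) and discrimination (the correlation of a task with the total score on the remaining tasks). Both the choice of indicators and the 0/1 response data are assumptions of this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when one of the variables is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def task_analysis(responses):
    """responses: one list of 0/1 task scores per respondent.

    Returns (task number, difficulty, discrimination) for every task.
    """
    report = []
    for i in range(len(responses[0])):
        task = [r[i] for r in responses]
        rest = [sum(r) - r[i] for r in responses]   # total score without this task
        difficulty = sum(task) / len(task)
        discrimination = pearson(task, rest)
        report.append((i + 1, round(difficulty, 2), round(discrimination, 2)))
    return report


# Six respondents, four tasks (1 = solved, 0 = not solved). Invented data.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
for row in task_analysis(responses):
    print(row)
```

In this invented data the fourth task correlates negatively with the rest of the test, which would make it a candidate for elimination from the final test form.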

With the advent of mass centralized testing in our country, forms of independent certification of students appeared: paper-based (blank) and computer testing, teletesting, and the Unified State Exam. The distinctive feature of this kind of control of students' level of preparation is that the procedure is based on a pedagogical test as a measurement tool with certain metric properties: accuracy, reliability, differentiating ability, validity, etc.

Modern testing methods already make it possible to conduct the final certification of graduates at a sufficiently high level, throughout the country, at the same time, using pedagogical measuring instruments of the same level of difficulty, that is, control and measurement materials (CMMs), new-generation tests, with widespread use of information technologies.

In addition, modern technology and software for automated checking of test results significantly increase the objectivity and reliability of educational statistics, simplify the work of inspectors, and make it possible to compare average certification scores in any territory and for any sample of test takers, and thus to analyze the level of training and the reasons behind it. With SPSS, test results can be processed accurately and quickly.

The reliability of the data is ensured by testing the significance of differences with Student's t-test using the computer program "SPSS 17 for Windows".
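As an illustration of what such a significance check computes, the following Python sketch evaluates Student's t statistic for two independent samples with a pooled variance. The score data are invented and the equal-variance form of the test is an assumption of this example; the text does not say which variant was used.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Student's t for two independent samples, assuming equal variances.

    Returns the t statistic and the degrees of freedom.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


control = [60, 65, 70, 75, 80]        # invented test scores
experimental = [70, 75, 80, 85, 90]
t, df = student_t(control, experimental)
print(round(t, 2), df)  # -2.0 8
```

For 8 degrees of freedom the two-sided critical value at the 0.05 level is 2.306, so a difference with |t| = 2.0 would not be counted as significant in this small invented example.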

Conclusion. The SPSS program is an effective tool for practical work in the field of sociological and pedagogical analysis and provides fast and accurate data processing. The main feature of the program is that the results of the analysis can be presented visually as tables and charts of various types, distributed to network users, and embedded in other software systems.

Bibliographic link

Davydova M.A., Usataya I.E. CAPABILITIES OF THE SPSS PROGRAM IN PROCESSING MASS TESTING DATA // International Student Scientific Bulletin. – 2017. – No. 2.;
URL: http://eduherald.ru/ru/article/view?id=16902 (access date: 03/28/2019).



1. Notes on the SPSS program, what kind of program it is, what are its advantages.

1.1. Data analysis in psychological research.

2. According to publications in periodicals, the Internet, etc. select sufficient information for analysis and carry it out with explanation, draw a conclusion.

2.1. An example of using the program when calculating the correlation coefficient

References

Notes on the SPSS program, what kind of program, what are its advantages

An analysis of the literature on mathematical data processing in psychological research, together with the results of the survey, made it possible to identify four main programs used by psychologists: Statistica, SPSS, Stadia and MS Excel. Well-known mathematical programs such as MatLab, Maple, Mathematica and Mathcad are practically not used in psychological research because of their complexity. SPSS Statistics is a more reliable and well-proven program.

SPSS Statistics (an English abbreviation of "Statistical Package for the Social Sciences") is a computer program for statistical data processing and one of the market leaders among commercial statistical products designed for applied research in the social sciences.

SPSS is an integrated data analysis system. It can use data from almost all types of files and generate tabular reports, graphs, distributions and trends, descriptive statistics, and complex statistical analyses.

The program provides a full range of data analysis methods, from descriptive statistics to complex types of analysis (variance, factor, spectral, etc.). The results are presented using various types of charts and histograms, and the user can also create chart templates. The main feature of SPSS, however, is its integration with a large number of external programs (MS Excel, dBASE, Lotus, SQL, SYSTAT, etc.) and formats (XML, HTML, PC, SAS, etc.). Another important feature is support for modern software solutions: the latest version of SPSS is built on a client-server architecture, and it was announced that the new version of the program would be fully compatible with Windows Vista.

Between 2009 and 2010, the SPSS software was renamed PASW (Predictive Analytics SoftWare) Statistics.

On July 28, 2009, the company announced that it was being acquired by IBM for US$1.2 billion. As of January 2010, the company became "SPSS: An IBM Company".

Norman Nie, Hadlai Hull and Dale Bent developed the first version of the system in 1968, after which the package was developed at the University of Chicago. The first user manual was published by McGraw-Hill in 1970, and in 1975 the project became a separate company, SPSS Inc. The first version of the package for Microsoft Windows was released in 1992. At present there are also versions for Mac OS X and Linux.

Features and benefits of the program.

· Data entry and storage.

· Ability to use variables of different types.

· Frequencies of attributes, tables, graphs, contingency tables, charts.

· Primary descriptive statistics.

· Marketing research

· Analysis of marketing research data

IBM SPSS Statistics 18 runs under Windows XP, Windows Vista (32- or 64-bit editions), Windows 7, Mac OS X 10.5, Mac OS X 10.6 and Linux for x86. Requires 800 MB of hard disk space and 1 GB of RAM.

Modern psychology makes wide use of a great variety of statistical methods. They make it possible to describe a phenomenon or process clearly, identify patterns, draw conclusions or make a forecast. As E.V. Sidorenko writes: “It has become customary to use mathematical methods, just as it is customary for a young man to marry if he wants to make a diplomatic or political career...” At the same time, the “fashion” sometimes goes so far that, when planning an experiment, it is proposed to build the hypothesis around the calculation of certain statistical procedures for obtaining, evaluating and analyzing results, and statistical verification of conclusions is considered mandatory.
We can say that SPSS is the most functional of these programs and supports the most modern technologies. However, its price and modular structure mean that SPSS is intended for use in commercial projects.

The main advantage of the SPSS software package, one of the most significant achievements in computerized data analysis, is its very wide coverage of existing statistical methods, combined with many convenient tools for visualizing the results. SPSS has been developing for 35 years; the most recent version, 11, released in May 2002, offers ample opportunities not only in psychology, sociology, biology and medicine, but also in marketing research and product quality management, which significantly expands the applicability of the package.

The proposed book contains the minimum required amount of information on the theory of statistical analysis. The main attention is focused on the features of using individual methods, the opportunities that these methods provide, as well as the interpretation of the results of using these methods. And of course, the book describes the presentation capabilities of SPSS 10/11, which significantly exceed the scope of functions provided by standard business programs such as Excel.

At the end of the book there is a table of correspondence between English and Russian SPSS 10/11 menu items, as well as the names of statistical procedures, in order to facilitate the transition to the Russian version.

The material presented in the book is sufficient for a student or young scientist to take their first steps in summarizing statistical data and searching for hidden patterns, and for experienced professionals to gain another powerful tool that increases the efficiency of their practical work.

The book is intended for a wide range of readers specializing in data processing in marketing, sociology, psychology, biology, and medicine.
Contents

Illustrated tutorial on SPSS

Chapter 1. SPSS Program
Chapter 2. Installation
Chapter 3. Data Preparation
Chapter 4. SPSS for Windows - Overview
Chapter 5. Fundamentals of Statistics
Chapter 6. Frequency Analysis
Chapter 7. Data Selection
Chapter 8. Data Modification
Chapter 9. Statistical Characteristics
Chapter 10. Data Exploration
Chapter 11. Contingency Tables
Chapter 12. Multiple Response Analysis
Chapter 13. Comparison of Averages
Chapter 14. Nonparametric Tests
Chapter 15. Correlations
Chapter 16. Regression Analysis
Chapter 17. Analysis of Variance
Chapter 18. Discriminant Analysis
Chapter 19. Factor Analysis
Chapter 20. Cluster Analysis
Chapter 21. Suitability Analysis
Chapter 22. Standard Graphs
Chapter 23. Interactive Graphs
Chapter 24. Tables Module
Chapter 25. Exporting Output
Chapter 26. Programming
Chapter 27. Innovations in the 11th Version of SPSS
Appendix. Overview of SPSS Procedures

Introduction to SPSS for Windows

Brief information about the program.

SPSS for Windows is a powerful system for statistical analysis and data management. Many of its features are especially useful for those who conduct surveys and market research.

In addition to a simple, mouse-driven interface for statistical data analysis, SPSS for Windows includes:

Data editor. A flexible spreadsheet-like system for defining, entering, editing, and viewing data.

Output window (Viewer). The Output window makes it easy to view results by allowing you to show and hide individual output elements, change the order in which results are displayed, and move presentation-ready tables and graphs from SPSS to other applications.

Table editor. You can explore tables by moving rows, columns, and layers to identify important points that might get lost in standard tables. You can also compare groups, split tables, and more.

Chart editor. High-quality graphics for pie and bar charts, histograms, scatterplots, 3D charts, and many others are included in the base SPSS module.

Command editor. Although many tasks can be accomplished using the mouse and dialog boxes, SPSS also has a powerful command language that allows you to save and automate many repetitive tasks.

Database Wizard. It allows you to load data from any source with just a few clicks of the mouse.

E-mail. A message containing the results of the analysis can be created with one click of the mouse button. You can also export tables and charts in HTML format for distribution over the Internet or an intranet.

Help system. It includes a tutorial offering a detailed overview; context-sensitive help in dialog boxes to help you understand specific tasks; pop-up definitions in pivot tables explaining statistical terms; the Statistics Coach, which helps you find the required procedure; and case studies that help in interpreting the results.

The new add-on module SPSS Complex Samples provides a specialized tool for designing and analyzing data from surveys that have used either simple or complex sampling.


The Data Editor is a window, similar in appearance to a spreadsheet, for creating and editing data files. The Data Editor window opens automatically when you start SPSS.

The editor window contains two sheets for working with data. In the lower left corner of the editor you can see two tabs: “Data” (Data View) and “Variables” (Variable View).

Data. In this mode, you can view and edit actual data values.

Variables. In this view, you can view and edit variable properties, including variable and value labels, data types (for example, text, date, or number), measurement scale types (nominal, ordinal, or scale), and user-defined missing values.

For example, let’s imagine that we are talking about an SPSS data file with the results of a simple employee survey.

In “Data” mode we see the specific answers to questions received from each respondent. Each row in the spreadsheet is an observation, that is, one questionnaire (one respondent), and each column is a variable, that is, a specific question in the questionnaire (or indicator). Each cell contains the answer of an individual respondent to a particular question in the survey.

In “Variables” mode we see a description of the above-mentioned characteristics of each variable, that is, of each survey question (item of the observation program). Each row is a separate variable, or one question. Each column is a specific property of that variable.

Variable properties:

1. Variable name.

The name must begin with a letter and must not end with a period. The name should not contain spaces or special characters (!, ?, *, etc.), and the underscore _ should be avoided at the end of the name. The name length must not exceed 64 characters.

2. Variable type.

It indicates what kind of variable we are talking about: numeric, text, date format, or other options.

3. The number of digits or characters in the variable. Sets the maximum number of characters in the variable value.

4. Number of decimal places. Sets the number of decimal places to display.

5. and 6. Descriptive variable and value labels.

Variable labels explain the content of the variable (essentially the content of the question or indicator itself). They can be up to 256 characters long and may contain spaces and symbols that are not allowed in variable names.

Value labels explain the content of each individual variable value (for example, clarify that 1 means male and 2 means female). They can be up to 60 characters long and do not apply to long text variables.

7. Missing values.

Certain variable values can be set as user-missing values. For example, you may want to summarize the results of a survey on a given question without counting the questionnaires that have no answer to it. Values marked as user-missing are flagged for special processing and excluded from most calculations.

Up to three separate user-missing values can be specified for each variable; ranges of missing values can only be specified for numeric variables.

8. Column width.

9. Alignment of values in a column. Values can be aligned left, right, or center.

10. Measurement scale (important when constructing tables).

You can choose one of three measurement scales:

Quantitative. Data values are numeric values (e.g., age, income).

Ordinal. Data values represent categories (gradations) with some natural ordering (for example: low, medium, high; or: completely dissatisfied, somewhat dissatisfied, somewhat satisfied, completely satisfied). Ordinal variables can be text or numeric values representing different categories (for example: 1 – low, 2 – medium, 3 – high).

Nominal. Data values represent categories (gradations) with no natural ordering (examples include the departments of a company or the constituent entities of the Russian Federation).

All variable properties can be changed by editing the values in the cells of the “Variables” tab. Clicking on a specific cell brings up a window where you can change that property of the variable. Additionally, cell values can be copied and pasted into other cells. This is especially useful when specifying value labels and missing values for multiple variables of the same type.
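The variable properties listed above can be pictured as a small record. The following Python sketch is only an illustration and not part of SPSS: the class `Variable`, the function `is_valid_name`, and the default width and decimals are assumptions made for the example. The name check is a simplified reading of the naming rules given above (it accepts only Latin letters, digits, periods, and underscores, and treats a trailing underscore as invalid).

```python
import re
from dataclasses import dataclass, field

# Simplified pattern: a letter first, then letters, digits, "_" or "."
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")

def is_valid_name(name):
    """Check a name against a simplified version of the rules in the text."""
    return (
        len(name) <= 64
        and NAME_PATTERN.fullmatch(name) is not None
        and not name.endswith((".", "_"))
    )

@dataclass
class Variable:
    name: str
    var_type: str = "numeric"      # numeric, text, date, ...
    width: int = 8                 # assumed default for this sketch
    decimals: int = 2              # assumed default for this sketch
    label: str = ""
    value_labels: dict = field(default_factory=dict)
    missing: tuple = ()            # up to three discrete user-missing values
    measure: str = "scale"         # "scale", "ordinal" or "nominal"

    def __post_init__(self):
        if not is_valid_name(self.name):
            raise ValueError(f"invalid variable name: {self.name!r}")
        if len(self.missing) > 3:
            raise ValueError("at most three discrete user-missing values")

    def valid_values(self, values):
        """Drop user-missing codes before a calculation."""
        return [v for v in values if v not in self.missing]

gender = Variable("gender", decimals=0, label="Respondent's gender",
                  value_labels={1: "male", 2: "female"},
                  missing=(9,), measure="nominal")
print(gender.valid_values([1, 2, 9, 2]))
```

Here the code 9 plays the role of a user-missing value ("no answer"), so it is excluded before any statistics are computed.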


You can enter data directly into the Data Editor in the Data tab in any cell. For variable types other than simple numeric, you must first set the variable type before entering data.

If you enter a value in an empty column, the Data Editor will automatically create a new variable and give it a default name (VAR00001) and a default format (numeric).

In addition, the data can be prepared in advance with other software. SPSS can open and work with data files in many formats. For example, to open a file in *.xls format, choose File...Open...Data...

If the data is stored in a database, open it with the Database Wizard (File...Open database...New query...).


Calculation of variables.

Select from the menu:

Transform

Calculate variable...

Enter the name of the calculated variable. It can be an existing or a new variable. If you select an existing variable, keep in mind that the calculated values will replace the existing ones and the old values cannot be restored. Let's enter, for example, the name «godrab», which will mean “Number of years of work at this place”. We enter this label by clicking “Type and label”.

After pressing the “Continue” button, you can enter the calculation formula. More than 70 built-in functions are available, including arithmetic, statistical, text, and distribution functions. In our example we have the variable «jobtime», the working time since hiring (in months). To convert months into years, we just need to divide this variable by 12. We enter this formula as the expression:

jobtime / 12

After pressing the “OK” button, an additional column with the variable «godrab», containing the number of years worked at this place, appears in the Data Editor, and the new variable is added to the “Variables” tab.

Note that functions and arithmetic expressions handle missing values differently. In the expression:

(var1 + var2 + var3) / 3

the result will be a missing value if the value of at least one of the three variables is a missing value.

In the expression:

MEAN (var1, var2, var3)

the result will be a missing value only if all three variables are missing values.

You can specify the minimum number of arguments that must have non-missing values. For example, the average of three variables can be calculated if at least two of them have values:

MEAN.2 (var1, var2, var3)
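The difference between the three expressions can be reproduced outside SPSS. The sketch below is written in Python and only imitates the behaviour described above; the function names and the use of `None` as a stand-in for a missing value are assumptions of the example.

```python
MISSING = None  # stand-in for a missing value (assumption of this sketch)

def arithmetic_mean3(v1, v2, v3):
    """Like (var1 + var2 + var3) / 3: any missing input makes the result missing."""
    if any(v is MISSING for v in (v1, v2, v3)):
        return MISSING
    return (v1 + v2 + v3) / 3

def mean_fn(*values, min_valid=1):
    """Like MEAN (min_valid=1) or MEAN.n (min_valid=n): average the valid values
    if at least min_valid of them are present, otherwise return missing."""
    valid = [v for v in values if v is not MISSING]
    if len(valid) < min_valid:
        return MISSING
    return sum(valid) / len(valid)

row = (4, MISSING, 8)
print(arithmetic_mean3(*row))                     # missing: one input is missing
print(mean_fn(*row))                              # mean of the two valid values
print(mean_fn(*row, min_valid=2))                 # still computed: two values are valid
print(mean_fn(4, MISSING, MISSING, min_valid=2))  # missing: only one valid value
```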

Using the “If” button, you can make calculations not for all values of the source variable, but only for those for which one or another condition is met.

Recoding variables.

The initially collected data can be recoded using SPSS tools. This is necessary when the initial diversity of source data is not needed for subsequent analysis. Recoding in this case means reducing the amount of information processed.

Select from the menu:

Transform

Recode

Into other variables...

It is best to recode into other variables rather than into the same variables. Imagine that you are converting age from numeric values into intervals. If you recode into the same variable, the original age data will be overwritten by the intervals and it will no longer be possible to restore them.

Enter a name for each output (new) variable and click Change.

Click the Old and new values button and define how the values are to be recoded.

Old value – the value(s) to be recoded:

Value. A single old value that needs to be recoded into a new one.

System-missing (or system- and user-missing). Such values (unfilled numeric fields, respondents' non-responses) sometimes need to be placed in a separate group.

Range. Available only for numeric variables; it lets you combine several old values within the selected range into one new value (interval grouping).

New value – the value into which one or more old values will be recoded. You can choose Copy old value for values that do not need recoding. You can also recode old values of a numeric variable into new text values by selecting New variables - text.
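The idea of recoding into a new variable can be sketched in Python as follows. This is only an illustration: the interval boundaries, the group codes, the code 9 for missing answers, and the variable name `age_group` are all hypothetical.

```python
def recode_age(age):
    """Map a numeric age to an interval code (ranges are hypothetical)."""
    if age is None:
        return 9      # missing answers are placed in a separate group
    if age < 30:
        return 1      # range: lowest through 29
    if age < 50:
        return 2      # range: 30 through 49
    return 3          # range: 50 through highest

cases = [{"age": 23}, {"age": 47}, {"age": None}, {"age": 65}]
for case in cases:
    # The result goes into a new variable, so the original "age" is preserved.
    case["age_group"] = recode_age(case["age"])

print(cases)
```

Because the result is written to a separate variable, the intervals can later be redefined from the untouched original values.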


Sorting observations.

Select from the menu:

Data

Sort observations...

You can select one or more variables. If, for example, you select gender and nationality, the observations are first sorted by gender, and then, within each resulting category, by the values of the variable nationality.
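Sorting by several variables amounts to sorting by a composite key. A minimal Python sketch, with hypothetical variable names and codes:

```python
cases = [
    {"gender": 2, "nationality": "B"},
    {"gender": 1, "nationality": "C"},
    {"gender": 2, "nationality": "A"},
    {"gender": 1, "nationality": "A"},
]

# The first key has priority; the second only orders cases within equal first keys.
sorted_cases = sorted(cases, key=lambda c: (c["gender"], c["nationality"]))

for case in sorted_cases:
    print(case["gender"], case["nationality"])
```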


Transposition.

Select from the menu:

Data

Transpose...

As a result of transposition, a new file is created in which the rows and columns are swapped.
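The effect of transposition can be shown on a small table. In this Python sketch the data values are made up for illustration:

```python
# Two observations (rows) with three variables (columns)
rows = [
    [1, 25, 3],
    [2, 31, 5],
]

# zip(*rows) groups the first items of every row, then the second items, and so on,
# so each former column becomes a row.
transposed = [list(column) for column in zip(*rows)]

print(transposed)
```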

Merging data files.

Files can be combined in two different ways:

– Merge files containing the same variables but different observations

– Merge files containing the same observations, but different composition of variables.

In the first case, select from the menu:

Data

Merge files

Add observations...

After that, select the data file that you want to add to the open data file. From the list Variables in the new working data file, remove all variables that should not be in the merged file. From the list Unpaired variables, add any pairs of variables that represent the same variable but are recorded under different names in the two files.

In the second case, select from the menu:

Data

Merge files

Add variables...

Before merging, you must ensure that the cases in both files are sorted in the same order, especially if you are using a key merge. Variable names in the second data file that match variable names in the working data file are excluded by default because they are assumed to contain the same information.

If one of the files is missing some individual observations, then key variables can be used for correct merging.
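The two kinds of merging can be sketched in Python. This is a simplified illustration, not the SPSS implementation: the function names, the key variable `id`, and the sample data are assumptions, and observations that exist only in the second file are ignored in `add_variables` for brevity.

```python
def add_cases(file_a, file_b, keep):
    """Append the rows of file_b to file_a, keeping only the variables in `keep`."""
    return [{v: row.get(v) for v in keep} for row in file_a + file_b]

def add_variables(file_a, file_b, key):
    """Match rows on `key`; on a name clash the working file's value is kept."""
    lookup = {row[key]: row for row in file_b}
    merged = []
    for row in file_a:
        extra = lookup.get(row[key], {})   # no match: nothing is added
        merged.append({**extra, **row})    # values from file_a win on clashes
    return merged

working = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
more_cases = [{"id": 3, "age": 25}]
more_vars = [{"id": 2, "salary": 500}, {"id": 1, "salary": 700, "age": 99}]

print(add_cases(working, more_cases, keep=["id", "age"]))
print(add_variables(working, more_vars, key="id"))
```

Matching through a key means the second file does not have to be in the same order as the first, and a missing observation in one file does not shift the remaining rows.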

Time series transformations.

Time series transformations assume a data file structure in which each row (observation) represents a set of characteristics at a certain point in time, and the time intervals between observations are equal.

The Set dates procedure generates date variables that can be used to isolate periodic components of a time series.

Observations are. Here you set the time units that will be used to create dates.

First observation. This specifies the start date value that will be assigned to the first observation. Subsequent observations will be assigned sequential values based on the specified time interval.

Select from the menu:

Data

Set dates...

Select a time interval from the Observations are list.

Enter date values in the First observation fields.

Variables created by the Set dates procedure differ from variables with the Date format, which is assigned when setting the properties of the variables. The values of variables created by the Set dates procedure are positive integers, each of which represents the number of days, weeks, hours, or other units of time that have passed since the start time you specified.

Select from the menu:

Transform

Create a time series...

The Create Time Series procedure is used to create new variables that are functions of the existing variables that make up the time series.

Functions for creating time series include differences, moving averages, moving medians, lags, and leads.
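The following Python sketch shows what some of these functions compute. It is an illustration under stated assumptions: the function names, the use of `None` for values that cannot be computed at the ends of the series, and the centered three-point moving average are choices made for this example.

```python
def difference(xs):
    """First difference x[t] - x[t-1]; the first value has no predecessor."""
    return [None] + [b - a for a, b in zip(xs, xs[1:])]

def lag(xs, k=1):
    """Value of the observation k steps earlier (assumes 0 <= k <= len(xs))."""
    return [None] * k + xs[:len(xs) - k]

def lead(xs, k=1):
    """Value of the observation k steps later (assumes 0 <= k <= len(xs))."""
    return xs[k:] + [None] * k

def centered_moving_average(xs, span=3):
    """Mean of `span` neighbouring values centred on each point (span must be odd)."""
    half = span // 2
    result = []
    for i in range(len(xs)):
        if i < half or i + half >= len(xs):
            result.append(None)            # not enough neighbours at the edges
        else:
            window = xs[i - half:i + half + 1]
            result.append(sum(window) / span)
    return result

sales = [10, 12, 15, 11, 14]
print(difference(sales))
print(lag(sales))
print(lead(sales))
print(centered_moving_average(sales))
```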

Some time series analysis procedures do not work when there are missing values. The Replace Missing Values window specifies parameters for new variables that contain time series in which missing values are replaced with estimates that can be calculated in one of several ways.

Select from the menu:

Transform

Replace missing values...

Select the method you want to use to replace missing values.
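Two possible ways of estimating a missing value are the mean of the whole series and linear interpolation between the neighbouring valid values. The Python sketch below illustrates both; the function names and the sample series are assumptions, and the result is written to a new list, just as the procedure creates new variables.

```python
def replace_with_series_mean(xs):
    """Replace every missing value with the mean of all valid values."""
    valid = [x for x in xs if x is not None]
    mean = sum(valid) / len(valid)
    return [mean if x is None else x for x in xs]

def replace_with_linear_interpolation(xs):
    """Fill gaps on a straight line between the nearest valid neighbours.
    Leading or trailing gaps have only one neighbour and stay missing."""
    result = list(xs)
    valid_idx = [i for i, x in enumerate(xs) if x is not None]
    for left, right in zip(valid_idx, valid_idx[1:]):
        step = (xs[right] - xs[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = xs[left] + step * (i - left)
    return result

series = [10, None, 14, None, None, 20]
print(replace_with_series_mean(series))
print(replace_with_linear_interpolation(series))
```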


The Frequencies procedure allows you to calculate statistics and construct charts that are useful for describing many types of variables.

Select from the menu:

Analyze

Descriptive Statistics

Frequencies...


Select one or more categorical or quantitative variables.

Additionally you can:

    Click the Statistics button to specify the calculation of descriptive statistics for quantitative variables (mean, mode, median, etc.).

    Click the Charts button to display bar charts, pie charts, and histograms.

    Click the Format button to specify the order in which the results will be displayed.

Example output: frequency table “Number of years spent in education”, including the columns Valid percentage and Cumulative percentage.
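How the valid and cumulative percentages in such a table are obtained can be sketched in Python. The function name and the sample data are made up for this illustration; `None` stands for a missing answer.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, percent, valid_percent, cumulative_percent)."""
    total = len(values)                           # all cases, including missing
    valid = [v for v in values if v is not None]  # cases with an answer
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        n = counts[value]
        valid_pct = 100 * n / len(valid)          # share among valid cases only
        cumulative += valid_pct                   # running total of valid percent
        rows.append((value, n, 100 * n / total, valid_pct, cumulative))
    return rows

educ = [8, 12, 12, 15, None, 12, 8, 16, None, 12]
for row in frequency_table(educ):
    print(row)
```

The percent column is based on all cases, while the valid and cumulative percentages are based only on cases without missing values.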




The Descriptive Statistics procedure displays univariate summary statistics for multiple variables in a single table.

Select from the menu:

Analyze

Descriptive Statistics

Descriptives...


Example output: the “Descriptive Statistics” table for the variables “Number of years spent in education”, “Starting salary”, “Current salary”, and “Working time from the moment of admission (months)”, with a final row “Valid N (listwise)”.
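The kind of summary shown in such a table, together with the listwise count of valid cases, can be sketched in Python. The choice of statistics, the function names, and the sample values are assumptions of this example; `None` stands for a missing value.

```python
from statistics import mean, stdev

def describe(values):
    """Univariate summary of the valid (non-missing) values of one variable."""
    valid = [v for v in values if v is not None]
    return {
        "N": len(valid),
        "Minimum": min(valid),
        "Maximum": max(valid),
        "Mean": mean(valid),
        "Std. Deviation": stdev(valid),   # sample standard deviation
    }

def valid_n_listwise(*columns):
    """Number of cases with no missing value in any of the given variables."""
    return sum(all(v is not None for v in case) for case in zip(*columns))

educ = [12, 15, None, 8]
salary = [27000, None, 21000, 18000]

print(describe(educ))
print(describe(salary))
print(valid_n_listwise(educ, salary))
```

Each variable is summarized over its own valid values, whereas the listwise count includes only the cases that are complete on all variables in the table.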


The Cross Table procedure (contingency tables) generates two-dimensional and multidimensional tables and also calculates a range of tests and measures of relationship strength for two-dimensional tables. Contingency tables are therefore used for bivariate analysis, in particular when we need to find out whether a relationship exists between two variables.

Select from the menu:

Analyze

Descriptive Statistics

Contingency tables...

Select one or more row variables and one or more column variables.

Additionally you can:

Select one or more variables for layers;

Click on the Statistics button and select the desired tests and measures of relationship strength for two-dimensional tables and subtables;

Click the Cells button to display observed and expected values, percentages, and residuals;

Click the Format button to specify the order in which the categories should be arranged.

Example output:

Belonging to a national minority

Secretariat employee

Mid-level employee