Basics of working with the SPSS program. Basic values of variable parameters

Information technology has recently become widespread in the education system. Obtaining quantitative indicators of the quality of test takers' preparedness requires processing large volumes of mass testing data. Various software environments are used for this purpose, among which the SPSS program, a universal system for statistical analysis and data management, occupies a special place. The main blocks of SPSS are: data editor; viewer; multidimensional pivot tables; high-quality graphics; access to databases; data transformation; help system; command language. With SPSS, test results can be processed quickly and accurately. The program is an effective tool for practical work in the field of sociological and pedagogical analysis.

Unified State Exam.

teletesting

computer testing

blank testing

mass centralized testing

suitability analysis

factor analysis

nonparametric methods

education

frequency analysis

SPSS program

latent characteristics

assessment

mass testing technology

systematic analysis

final examination

monitoring

information technology

Recently, information technologies have become widespread in the education system. They are used for training, control, final certification of graduates, self-study, self-control, etc. The most important condition for improving the quality of education is the systematic analysis of objective data from independent monitoring of educational achievements, together with monitoring and diagnosing students' preparedness so that they obtain results that correspond to their capabilities and needs. The possibilities of mass testing technologies are attracting increasing attention from researchers seeking to solve various problems of education and of the self-management of educational activities.

An important role in the development of monitoring learning outcomes should be played by systematic and continuous assessment, which provides a judgment about the student’s readiness to continue studying and his participation in social and industrial activities. The difficulty lies in the fact that not only high-quality training is required, but also high-quality assessment, high-quality assessment tools and procedures, as well as providing motivation when performing tests so that the manifestation of the latent characteristics of the subjects is maximized. Therefore, assessment should be carried out as a specifically focused and orderly process of determining the set and level of preparedness achieved, and the results should be expressed quantitatively, regardless of how simple or difficult they are to evaluate.

To obtain quantitative indicators of the quality of test takers' preparedness, large volumes of mass testing data must be processed. Various software environments are used for this, among which a special place is occupied by SPSS Statistics. It is a universal system for statistical analysis and data management and a market leader among commercial statistical products for applied research in the social and educational sciences. The acronym originally stood for "Statistical Package for the Social Sciences" and was later given a new interpretation: Superior Performance Software System.

In the early 1970s, Norman Nie, Dale Bent and Hadlai Hull registered the SPSS® statistical software trademark. The company of the same name was created by them in 1968. In 1975, the company was transformed into a corporation with its main office in Chicago, IL, USA. Over the years of its existence, the corporation has developed many software products, including SPSS/PC+™, the first version of which appeared in 1984. In 2009, the package was renamed PASW Statistics (Predictive Analytics SoftWare). Since July 2009, the package has been maintained by IBM (International Business Machines) under the name IBM SPSS Statistics. In 2013, the next version of the package was released: IBM SPSS Statistics 22, which runs on Windows, Mac OS X and Linux.

By all measures, SPSS is a sophisticated and powerful statistical package. It can carry out almost any kind of data analysis, and the latest versions are used in a wide variety of scientific fields, including the educational sciences. SPSS is both a software product and a protected trademark of the world-famous American company SPSS Inc., whose board of directors remains in Chicago. The package occupies a leading position among programs designed for the statistical processing of information in the social and educational sciences. Like other software of this kind, it has come a long way: from the first versions for mainframe computers, to versions for PC-DOS/MS-DOS, and then to versions running under Windows. SPSS has a friendly user interface, which makes data entry and statistical analysis accessible to the beginner and convenient for the advanced user. The data editor lets you enter and correct input data conveniently in tabular form. SPSS can produce a variety of high-quality graphs and charts. Using tables, simple menus and dialog boxes, you can, firstly, analyze huge data files with thousands of variables and, secondly, do all this without writing commands in a programming language. With SPSS you can: manage data; organize data; transform data and create new variables; analyze data.

Possible areas of application of SPSS: storage and analysis of survey data, marketing research and sales, financial analysis, etc. In sociology and pedagogy, the package allows you to automate the process of creating databases of various information, their storage and processing. Stages of the analytical process implemented in SPSS: planning; data collection; providing access to data; preparing data for analysis; performing analysis; generation of reports; presentation and dissemination of results. In pedagogy, the package allows you to automate the processing and interpretation of test results.

The first version of SPSS for Windows was version 5.0. It was followed by versions 6.0, 6.1, 7.0, 7.5, 8.0, 9.0, 10.0, 11.5 and later. Starting with version 7.0, SPSS requires at least Windows 95 (or NT).

In addition to its own data format, SPSS can read data from virtually any type of file and use it to create reports in the form of tables, graphs and charts, calculate descriptive statistics, and perform complex statistical analysis and modeling.

The package has a modular structure. The package modules are an integrated set of software products that provide comprehensive research - from planning to data management, analysis and presentation of results.

Core SPSS modules: IBM SPSS Statistics Base, IBM SPSS Decision Trees, IBM SPSS Advanced Statistics, IBM SPSS Direct Marketing, IBM SPSS Bootstrapping, IBM SPSS Exact Tests, IBM SPSS Categories, IBM SPSS Forecasting, IBM SPSS Complex Samples, IBM SPSS Missing Values, IBM SPSS Conjoint, IBM SPSS Neural Networks, IBM SPSS Custom Tables, IBM SPSS Regression, IBM SPSS Data Preparation. The composition of the modules depends on the delivery option.

Basic blocks of SPSS:

Data editor - a flexible, spreadsheet-like system for defining, entering, editing and viewing data.

Viewer - Makes it easy to view results by allowing you to show and hide individual output elements, change the order in which results are displayed, and move presentation-ready tables and charts to and from other applications.

Multidimensional pivot tables - used to display analysis results. You can explore tables by moving rows, columns and layers and thus identify important points that may get lost in standard reports. You can also compare groups by splitting the tables so that only one group is displayed at a time.

High-quality graphics - a means of generating full-color, high-resolution charts: pie and bar charts, histograms, scatterplots, 3-D charts and many others.

Database access - a database wizard that allows you to load data from any source with a few clicks of the mouse.

Data transformation - tools that help prepare data for analysis. You can easily subset data, merge categories, append, aggregate, merge, split and transpose files, and perform other transformations.

Help system:

An electronic textbook offering a detailed overview;

Context-sensitive help in dialog boxes that helps you understand specific tasks;

Pop-up definitions in pivot tables that explain statistical terms;

A statistics tutor that helps in finding the required procedure, and examples of analysis that help in interpreting the results.

Command language. Although many tasks can be accomplished using the mouse and dialog boxes, SPSS also has a powerful command language that allows you to save and automate many repetitive tasks. The command language also gives access to some functionality that is not available through menus and dialog boxes. Complete command language documentation is integrated into the help system and is also available as a separate PDF document, the syntax guide, from the Help menu.

The package includes commands for data definition, data transformation and case selection. It implements the following methods of statistical information processing:

  • summary statistics for individual variables;
  • frequencies, summary statistics and graphs for an arbitrary number of variables;
  • construction of N-dimensional contingency tables and obtaining measures of association; means, standard deviations and sums by group;
  • analysis of variance and multiple comparisons;
  • correlation analysis; discriminant analysis; one-way analysis of variance;
  • general linear model analysis of variance (GLM);
  • factor analysis;
  • cluster analysis;
  • hierarchical cluster analysis;
  • hierarchical log-linear analysis;
  • multivariate analysis of variance; nonparametric tests; multiple regression;
  • optimal scaling methods, etc.

In addition, the package allows you to obtain a variety of graphs: bar and pie charts, boxplots, scatterplots, histograms, etc.

Until recently, training and quality control in education were carried out by traditional methods, mainly by the same people who conduct the educational process, which from the point of view of management theory does not contribute to its improvement. Today, mass testing data is processed automatically using numerous computer programs. One of them is SPSS, which makes it possible to process the results of mass testing in any subject quantitatively, efficiently and accurately while saving time.

Frequency analysis allows you to determine: the frequency of each answer option to a test question; the percentage of each answer relative to the total number of respondents (for example, the share of correct answers to a given question as a percentage of all answers); the valid percentage (calculated with missing values excluded); and the cumulative percentage (the running sum of the valid percentages).
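The four quantities listed above can be illustrated with a minimal Python sketch. It is not SPSS code: the function name `frequency_table`, the use of `None` to mark a missing answer and the sample answers are all assumptions made for illustration.

```python
from collections import Counter

def frequency_table(answers, missing=(None,)):
    """Return rows of (value, count, percent, valid percent, cumulative percent)."""
    total = len(answers)
    # Valid answers exclude the values treated as missing.
    valid = [a for a in answers if a not in missing]
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        count = counts[value]
        percent = 100.0 * count / total              # share of all respondents
        valid_percent = 100.0 * count / len(valid)   # missing values excluded
        cumulative += valid_percent                  # running sum of valid percents
        rows.append((value, count, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

# Hypothetical answers of ten respondents to one test question.
answers = ["A", "B", "A", None, "C", "A", "B", None, "A", "B"]
table = frequency_table(answers)
for row in table:
    print(row)
```

With two of the ten answers missing, option "A" accounts for 40% of all respondents but 50% of the valid answers, which is the difference between the percentage and the valid percentage.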

SPSS has a wide variety of procedures that can be used to analyze the relationship between two variables. The relationship between variables measured on a nominal scale, or on an ordinal scale with a small number of categories, is best presented in the form of contingency tables. For this purpose, SPSS implements the chi-square test, which checks whether there is a significant difference between the observed and expected frequencies. In addition, various measures of association can be calculated.
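As a rough illustration of what the chi-square test compares, the sketch below computes the Pearson chi-square statistic for a contingency table in plain Python. The function name `chi_square` and the 2x2 table are hypothetical, and the sketch shows the textbook formula rather than the SPSS implementation.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected frequency if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical table: rows are two groups, columns are "passed" / "failed".
crosstab = [[30, 10],
            [20, 40]]
stat, dof = chi_square(crosstab)
print(round(stat, 3), dof)
```

For one degree of freedom the 5% critical value is 3.841, so a statistic of about 16.7 would indicate a significant difference between the observed and expected frequencies.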

The advantage of nonparametric methods is most noticeable when there are outliers (extremely large or small values) in the data. SPSS provides users with a large number of nonparametric tests.

The most commonly used tests compare two or more independent or dependent samples: the Mann-Whitney U test, the Kruskal-Wallis H test, the Wilcoxon test and the Friedman test. The one-sample Kolmogorov-Smirnov test also plays an important role; it can be used to check whether the data follow a normal distribution. Nonparametric tests can, of course, also be applied to normally distributed values, but in that case they have only about 95% of the efficiency of parametric tests. If, for example, you need to make several comparisons of two independent samples, some of which follow a normal distribution and some of which do not, it is recommended to use the Mann-Whitney U test throughout.
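To show why rank-based tests are insensitive to outliers, here is a minimal sketch of the Mann-Whitney U statistic in plain Python. The function name and the two samples are assumptions; the sketch computes only the statistic and leaves the significance level to tables or to a statistical package.

```python
def mann_whitney_u(sample_a, sample_b):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    combined = sorted(list(sample_a) + list(sample_b))
    # Assign ranks 1..N, giving tied values the mean of their ranks.
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[x] for x in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Hypothetical test scores of two independent groups.
group_a = [12, 15, 11, 18, 14]
group_b = [22, 25, 17, 30, 21]
u = mann_whitney_u(group_a, group_b)
print(u)
```

Because only ranks enter the calculation, replacing the score 30 with 300 would leave U unchanged.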

Factor analysis is a procedure by which a large number of variables related to existing observations is reduced to a smaller number of independent influencing quantities, called factors. In this case, variables that are highly correlated with each other are combined into one factor. Variables from different factors are weakly correlated with each other. Thus, the goal of factor analysis is to find complex factors that explain as fully as possible the observed relationships between the available variables.

Factor analysis is possible only if a number of criteria are met. Qualitative data cannot be factorized. The variables must be independent and their distribution must be close to normal. The relationships between the variables should be approximately linear, the original correlation matrix should contain several correlations greater than 0.3 in magnitude, and the sample of subjects must be large enough.
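One of these criteria, the presence of correlations above 0.3 in magnitude, is easy to check before running the analysis. The following sketch assumes hypothetical variable names and scores and uses a hand-written Pearson coefficient; it is a pre-check only and does not perform factor analysis itself.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlated_pairs(variables, threshold=0.3):
    """List the variable pairs whose correlation exceeds the threshold in magnitude."""
    pairs = []
    for name_a, name_b in combinations(variables, 2):
        r = pearson(variables[name_a], variables[name_b])
        if abs(r) > threshold:
            pairs.append((name_a, name_b, round(r, 2)))
    return pairs

# Hypothetical scores of six students on three tasks.
scores = {
    "algebra": [3, 4, 5, 2, 5, 4],
    "geometry": [3, 5, 5, 2, 4, 4],
    "essay": [4, 3, 4, 4, 3, 5],
}
pairs = correlated_pairs(scores)
print(pairs)
```

If the list of pairs comes back empty or nearly empty, the variables have too little in common for factors to be extracted meaningfully.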

Suitability analysis (also called item analysis or task analysis) helps select questions (tasks) for tests. Using various criteria, it is determined which tasks are suitable for a particular test and which are not.

For this purpose, a sample of respondents is given a preliminary version of the test containing all the proposed tasks, and these tasks are then analyzed. Using this analysis, unsuitable items are eliminated and the remaining ones are included in the final form of the test. Tests are classified by the trait being studied: achievement (level of education) tests, ability tests and personality tests. A test item consists of two parts: a problem or question, and its solution or answer.
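The text does not name the selection criteria, so the sketch below assumes one common criterion, item difficulty (the share of respondents who solved the task), with hypothetical cut-offs of 0.2 and 0.8. The function names and the response matrix are illustrative only.

```python
def item_difficulty(responses):
    """Share of respondents who answered each item correctly (1 = correct, 0 = wrong)."""
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_respondents
            for i in range(n_items)]

def select_items(responses, low=0.2, high=0.8):
    """Indices of items whose difficulty lies inside the assumed acceptable range."""
    return [i for i, p in enumerate(item_difficulty(responses))
            if low <= p <= high]

# Hypothetical pilot results: five respondents (rows), four items (columns).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
kept = select_items(responses)
print(item_difficulty(responses), kept)
```

Here the first item is solved by everyone and the third by no one, so neither separates stronger from weaker respondents and both would be dropped from the final form.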

With the advent of mass centralized testing in our country, forms of independent certification of students appeared: blank and computer testing, teletesting, and a unified state exam. A distinctive feature of such control of the level of students’ preparation is the procedure, which is based on a pedagogical test as a measurement tool that has certain metric properties: accuracy, reliability, differentiating ability, validity, etc.

Modern testing methods make it possible to carry out the final certification of graduates at a sufficiently high level, simultaneously throughout the country, using pedagogical measuring instruments of equal difficulty, known as control and measuring materials (CMMs) or new-generation tests, with wide use of information technology.

In addition, modern technology and software for the automated checking of test results significantly increase the objectivity and reliability of educational statistics, simplify the work of inspectors, and make it possible to compare average certification scores in any territory and for any sample of test takers, and thus to analyze the level of training and the factors behind it. Using SPSS, test results can be processed accurately and quickly.

The reliability of the data is ensured by testing the significance of differences with Student's t-test in the computer program "SPSS 17 for Windows".
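For reference, Student's t-test for two independent samples can be sketched as follows. This is the textbook pooled-variance formula written in Python under the assumption of equal variances; the function name and the scores are hypothetical, and the sketch is not the procedure as implemented in SPSS 17.

```python
from math import sqrt
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance (equal variances assumed).
    pooled = ((n_a - 1) * variance(sample_a) +
              (n_b - 1) * variance(sample_b)) / dof
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, dof

# Hypothetical scores of two groups of test takers.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, dof = student_t(group_a, group_b)
print(round(t, 3), dof)
```

With 8 degrees of freedom the two-sided 5% critical value is 2.306, so a statistic of -2.0 would not count as a significant difference.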

Conclusion. The SPSS program is an effective tool for practical work in the field of sociological and pedagogical analysis and provides fast and accurate data processing. Its main feature is that the results of analysis can be presented visually as tables and charts of various types, distributed to network users, and used in other software systems.

Bibliographic link

Davydova M.A., Usataya I.E. CAPABILITIES OF THE SPSS PROGRAM IN PROCESSING MASS TESTING DATA // International Student Scientific Bulletin. – 2017. – No. 2.;
URL: http://eduherald.ru/ru/article/view?id=16902 (access date: 03/28/2019).

Before directly starting to process the data from the study in SPSS, it is necessary to properly organize data entry.

Entering research data into the program can be divided into 2 main stages:

· Preparing the basis of the questionnaire

· Direct data entry

Let's take a closer look at these procedures.

The stage of preparing the basis of the questionnaire. In SPSS, data is entered in a specific format. To prepare a form for entering and further processing data, you must first enter the questionnaire template in a form acceptable to the program. The general view of the program window is shown in Figure 1.

Fig.1. General view of the SPSS program after launch.

When the program is launched for the first time, the user sees an additional dialog box offering actions such as editing an existing database, opening an existing file and so on. In most cases this window is of little use, so we recommend checking the box next to "Don't show this dialog in the future". The general initial appearance of the program is standard for most programs developed for the Windows operating system. The navigation bar, window appearance and window management are almost identical to those of most office applications. For this reason, we will focus on the distinctive features of SPSS itself.

Fig.2. SPSS workspace.

The SPSS window has two fields organized as tabs, similar to sheets in Excel. These fields are far from equivalent, however. Figure 2 shows the working field of the program, Data View, into which the user enters data from questionnaires. Before entering data, you need to create the questionnaire template, its basis, in the variable definition field, Variable View. In SPSS, data is entered in a specific format: variables are arranged vertically (as columns) and observations horizontally (as rows). Let's take a closer look at the Variable View field (Figure 3).

Fig.3. Variables window view.

Each variable is a question in the questionnaire. By default, SPSS has 10 basic characteristics that can be used to describe a variable: name, type, width, decimals, label, values, missing, columns, align, and measure. By significance, these characteristics can be divided into those that define the variable itself and those that are responsible for the convenience of output.

Basic values of the variable parameters:

Name - the variable name that will be displayed in the input field. The program uses the same name to identify the variable. The name must not exceed 8 characters and must use Latin letters only. (In later versions of the program, Russian text can also be used.)

Type - the variable type, in other words, what kind of information is entered as values: number, date, custom value, comma-formatted number, etc. The most commonly used formats are numeric (Numeric), date (Date) and string (text, String). In the first case any number can be taken as a value, in the second a date in a certain format, and in the last text.

Width - the length of the variable: the number of digits that can fit in a cell.

Decimals - the number of decimal places.

Label - a more detailed description of the variable. It is usually worded exactly as the survey question itself. It is used in reports and allows you to use any font.

Values - labels for the values that the variable can take. In SPSS, data is presented primarily in numerical form, because text is not amenable to statistical analysis. For example, gender can be coded as 1 - male, 0 - female. When defining a ranking (ordinal) scale, it is very important to enter the values in sequence: they must be in ascending order. An example of incorrect data entry will be discussed below. For a metric scale, value labels need not be specified.

Value labels are entered in an additional window.

Fig.4. Determining the type of a variable.

Missing - identification of missing values. They can be set automatically by the system (System-defined missing values) or by the user (User-defined missing values).

Columns - the column width.

Align - alignment in the cell (left, right, center).

Fig.5. Determining the value of a variable.

Measure - the measurement scale of the variable: Scale - a numeric, metric scale; Ordinal - a ranking scale; Nominal - a nominal scale. This is an extremely important characteristic, since processing depends on the correct choice of scale type. The program provides a graphic hint, a pictogram next to each type of scale: a ruler (the result of a measurement, a number), an increasing histogram (rank), and circles of sets (incomparable characteristics that denote disjoint sets).

Fig.6. Selecting a scale type in SPSS.
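How the parameters described above fit together can be sketched as a small Python data structure. This is an illustration only, not an SPSS API: the class `VariableDef`, its field names, the code 9 for "no answer" and the sample data are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class VariableDef:
    """A hypothetical stand-in for one row of the Variable View tab."""
    name: str                     # short identifier used by the program
    label: str                    # full wording of the survey question
    measure: str                  # "scale", "ordinal" or "nominal"
    var_type: str = "numeric"
    width: int = 8
    decimals: int = 0
    values: dict = field(default_factory=dict)  # code -> value label
    missing: tuple = ()           # user-defined missing codes

    def decode(self, code):
        """Turn a stored code into its label; missing codes become None."""
        if code is None or code in self.missing:
            return None
        return self.values.get(code, code)

# Gender coded as in the text: 1 - male, 0 - female; 9 is an assumed "no answer" code.
gender = VariableDef(
    name="gender",
    label="Respondent's gender",
    measure="nominal",
    values={0: "female", 1: "male"},
    missing=(9,),
)
decoded = [gender.decode(code) for code in [1, 0, 9, None, 0]]
print(decoded)
```

The sketch keeps the same separation as the program: numeric codes are what is stored and analyzed, while labels and missing-value rules only govern how the codes are interpreted and displayed.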

Let's look at the types of measuring scales in a little more detail.

In principle, the type of scale is determined by the researcher already at the stage of searching for empirical indicators of the characteristics being measured, during the preparation of a sociological research program. In its final form, the scale is embodied directly in the survey question. It is very important to comply with the requirements for wording the alternative answers. From the point of view of SPSS, the most important requirement is that the subsets formed by the alternative answers are disjoint. Otherwise, when processing the data (more precisely, when entering it), it is quite difficult to determine exactly which interval or subset the respondent actually chose.

For example, the alternative answers to a question about age may include intervals such as up to 15 years, 15-20, 20-25, 25-30, 30 and older. With this wording, a problem arises when the respondent is 15, 20, 25 or 30 years old, i.e. falls on a boundary. The respondent can mark either of the adjacent intervals at random or on the basis of personal preference. When the data is processed, this can distort the actual picture. The general classification of scales can be represented in the form of the following diagram.

Fig.7. Classification of scales.

The arrows leading to the interval scale are shown as dotted lines in the figure. The reason is that the interval scale is not metric in the strict sense and is classified as non-metric. However, in some cases, for example when the intervals are equal, some of the mathematical operations characteristic of a metric scale can be applied to it.

From the point of view of conducting research and processing data, it is very important to understand the possibilities and limitations of each type of measurement scale. Metric scales (the Scale type in SPSS) have the greatest analytical power, because practically all statistical procedures can be applied to them without restrictions. Nominal scales, on the contrary, provide the weakest capabilities: by and large, only a frequency distribution and the mode as a measure of central tendency.
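The dependence of the available statistics on the scale type can be sketched in Python. The text names only the mode for nominal data; allowing the median for ordinal data and the mean for metric data follows the usual textbook convention and is an assumption here, as are the function name and the sample values.

```python
from statistics import mean, median, mode

def central_tendency(values, measure):
    """Measures of central tendency that are meaningful for a given scale type."""
    if measure == "nominal":
        return {"mode": mode(values)}
    if measure == "ordinal":
        return {"mode": mode(values), "median": median(values)}
    if measure == "scale":
        return {"mode": mode(values), "median": median(values),
                "mean": mean(values)}
    raise ValueError(f"unknown measurement level: {measure}")

# Hypothetical data: gender codes (nominal) and ages in years (metric).
gender_codes = [1, 0, 0, 1, 0]
ages = [18, 21, 21, 25, 30]
nominal_stats = central_tendency(gender_codes, "nominal")
scale_stats = central_tendency(ages, "scale")
print(nominal_stats, scale_stats)
```

Computing a mean of the gender codes would run without error, which is exactly why the scale type has to be declared correctly: the program cannot tell a meaningful number from an arbitrary code.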

In practice, it is extremely important to choose the right measurement scale already at the stage of designing a questionnaire survey. The more information we want to obtain from a particular question, the more we should strive to use a metric scale. The ideal questionnaire, from the point of view of its processing capabilities, is a list of questions each of which is measured quantitatively. On the other hand, this is practically impossible to achieve, both because some variables cannot be "digitized" (for example, it is unrealistic to convert a question about the respondent's gender into a metric scale) and because of the principles of composing the questionnaire itself: monotonous questions reduce the respondent's motivation and the reliability of the data received.

Returning to the peculiarities of defining variable parameters in the SPSS program, it can be noted that the parameters that are largely responsible for the convenience of presenting information include: columns (column width), align (cell alignment) and, to some extent, width (length) and decimals (number of decimal places). In most cases, these parameters can simply be left unchanged by agreeing with the proposed values. But you need to be careful regarding the remaining parameters for defining variables, since they will have a significant impact on the process of entering and processing information.

After defining variables in SPSS, you can go directly to entering data, which is entered into the data view field in the form of numbers or other symbols (depending on the type of variable). The next section will look at the detailed algorithm for defining variables and entering values.

Test

"STATISTICAL PROCESSING IN PSYCHOLOGICAL RESEARCH"

1. Notes on the SPSS program, what kind of program it is, what are its advantages.

1.1. Data analysis in psychological research.

2. Using publications in periodicals, the Internet, etc., select sufficient information for analysis, carry out the analysis with an explanation, and draw a conclusion.

2.1. An example of using the program when calculating the correlation coefficient

References

Notes on the SPSS program, what kind of program, what are its advantages

An analysis of the literature on mathematical data processing in psychological research, together with the results of the survey, made it possible to identify four main programs used by psychologists: Statistica, SPSS, Stadia and MS Excel. Well-known mathematical programs such as MatLab, Maple, Mathematica and Mathcad are practically not used in psychological research because of their complexity. Of the four, SPSS Statistics is the more reliable and well-proven program.

SPSS Statistics (from the English "Statistical Package for the Social Sciences") is a computer program for statistical data processing, one of the market leaders in the field of commercial statistical products designed for applied research in the social sciences.

SPSS is a comprehensive data analysis system. SPSS can use data from almost all types of files and generate tabular reports, graphs, distributions and trends, descriptive statistics, and perform complex statistical analyses.

The program provides a full range of data analysis methods, from descriptive statistics to complex types of analysis (variance, factor, spectral, etc.). The results are presented using various types of charts and histograms, and users can create their own chart templates. The main feature of SPSS, however, is its integration with a large number of external programs (MS Excel, dBASE, Lotus, SQL, SYSTAT, etc.) and formats (XML, HTML, PC, SAS, etc.). Another important feature is support for modern software solutions: the latest version of SPSS is based on a client-server architecture, and it has been announced that the new version of the program will be fully compatible with Windows Vista.

Between 2009 and 2010, the name of the SPSS software was changed to PASW (Predictive Analytics SoftWare) Statistics.

On July 28, 2009, the company announced that it had been acquired by IBM for US$1.2 billion. As of January 2010, the company became "SPSS: An IBM Company".

Norman Nie, Hadlai Hull and Dale Bent developed the first version of the system in 1968; the package was then developed at the University of Chicago. The first user manual was published in 1970 by McGraw-Hill, and in 1975 the project became a separate company, SPSS Inc. The first version of the package for Microsoft Windows was released in 1992. At present there are also versions for Mac OS X and Linux.

In 2009, SPSS rebranded its statistical package as PASW Statistics (Predictive Analytics SoftWare). On July 28, 2009, SPSS announced that it was being acquired by IBM.

Features and benefits of the program.

· Data entry and storage.

· Ability to use variables of different types.

· Frequency of features, tables, graphs, contingency tables, diagrams.

· Primary descriptive statistics.

· Marketing research

· Analysis of marketing research data

IBM SPSS Statistics 18 runs under Windows XP, Windows Vista (32-bit or 64-bit editions), Windows 7, Mac OS X 10.5, Mac OS X 10.6 and Linux for x86. It requires 800 MB of hard disk space and 1 GB of RAM.

Modern psychology makes wide use of a great variety of statistical methods. They allow you to describe a phenomenon or process clearly, identify patterns, draw conclusions or make a forecast. As E.V. Sidorenko writes: "It has become customary to use mathematical methods, just as it is customary for a young man to marry if he wants to make a diplomatic or political career..." At the same time, the "fashion" sometimes goes so far that, when an experiment is planned, the hypothesis is expected to be built around particular statistical procedures for obtaining, evaluating and analyzing the results, and statistical verification of conclusions is considered mandatory.
We can say that the SPSS program is the most functional and supports the most modern technologies. However, its price and modular structure mean that SPSS is intended for use in commercial projects.

The main advantage of the SPSS software package, as one of the most significant achievements in the field of computerized data analysis, is the widest coverage of existing statistical methods, successfully combined with a large number of convenient tools for visualizing the results. The SPSS package has been in development for 35 years; the most recent version, 11, released in May 2002, provides ample opportunities not only in psychology, sociology, biology and medicine, but also in marketing research and product quality management, which significantly expands the applicability of the package.

The proposed book contains the minimum required amount of information on the theory of statistical analysis. The main attention is focused on the features of using individual methods, the opportunities that these methods provide, as well as the interpretation of the results of using these methods. And of course, the book describes the presentation capabilities of SPSS 10/11, which significantly exceed the scope of functions provided by standard business programs such as Excel.

At the end of the book there is a table of correspondence between English and Russian SPSS 10/11 menu items, as well as the names of statistical procedures, in order to facilitate the transition to the Russian version.

The material presented in the book is sufficient for a student or young scientist to take their first steps in summarizing statistical data and searching for hidden patterns, and for experienced professionals to gain another powerful tool that increases the efficiency of their practical work.

The book is intended for a wide range of readers specializing in data processing in marketing, sociology, psychology, biology and medicine.

Contents

Illustrated tutorial on SPSS

Chapter 1. SPSS Program
Chapter 2. Installation
Chapter 3. Data Preparation
Chapter 4. SPSS for Windows - Overview
Chapter 5. Fundamentals of Statistics
Chapter 6. Frequency Analysis
Chapter 7. Data Selection
Chapter 8. Data Modification
Chapter 9. Statistical Characteristics
Chapter 10. Data Exploration
Chapter 11. Contingency Tables
Chapter 12. Multiple Response Analysis
Chapter 13. Comparison of Averages
Chapter 14. Nonparametric Tests
Chapter 15. Correlations
Chapter 16. Regression Analysis
Chapter 17. Analysis of Variance
Chapter 18. Discriminant Analysis
Chapter 19. Factor Analysis
Chapter 20. Cluster Analysis
Chapter 21. Suitability (Reliability) Analysis
Chapter 22. Standard Graphs
Chapter 23. Interactive Graphs
Chapter 24. Tables Module
Chapter 25. Exporting Output
Chapter 26. Programming
Chapter 27. Innovations in the 11th Version of SPSS
Appendix. Overview of SPSS Procedures

In 1965, two political science students, Norman Nie and Dale Bent, tried to find a computer program suitable for analyzing statistical information. They soon became disillusioned, since the available programs turned out to be more or less unusable, poorly constructed, or did not present the processed information clearly. In addition, the principles of use changed from program to program.

So, without hesitation, they decided to develop their own program, with their own concept and a uniform syntax. At that time they had at their disposal the FORTRAN programming language and an IBM 7090 computer. A year later, the first version of the program was developed, which, another year later, in 1967, could run on the IBM 360. By this time, Hadlai Hull had joined the development team.

As is known from the history of computer science, programs at that time were decks of punched cards. This is precisely what the original name that the authors gave to their product refers to: SPSS is an abbreviation for Statistical Package for the Social Sciences.

In 1970, work on the program continued at the University of Chicago, and Norman Nie founded the corresponding company; by that time sixty installations had already been made. The first user manual described eleven different procedures.

Five years later, SPSS had already been installed six hundred times, under different operating systems. From the very beginning, program versions were assigned consecutive numbers. In 1975, the sixth version (SPSS 6) was developed. Versions 7, 8 and 9 followed until 1981.

The SPSS command language (syntax) at that time was not as well developed as it is now, and was naturally focused on punched cards. Therefore, the so-called SPSS control cards consisted of an identification field (columns 1-15) and a parameter field (columns 16-80).
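The column layout described above can be illustrated with a short sketch. It only assumes the two fields named in the text (columns 1-15 and 16-80); the function name and the sample card contents are hypothetical and are not taken from any SPSS manual.

```python
def parse_control_card(card: str) -> tuple[str, str]:
    """Split one 80-column card image into its two fields.

    Columns 1-15 hold the identification field and columns 16-80
    hold the parameter field (1-based, as in the text above).
    """
    padded = card.ljust(80)[:80]          # normalise to exactly 80 columns
    identification = padded[0:15].strip()  # columns 1-15
    parameters = padded[15:80].strip()     # columns 16-80
    return identification, parameters


# Hypothetical card image, used only to show the split.
card = "FREQUENCIES".ljust(15) + "GENERAL=AGE,SEX"
print(parse_control_card(card))
```

Padding the line to 80 columns first means that short lines are handled the same way as full-width card images.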

In 1983, the SPSS command language was completely redesigned, and the syntax became much more convenient. To mark this fact, the program was renamed SPSSX, where the letter X was intended to serve as both a version number in Roman numerals and an abbreviation for extended.

Since the use of punched cards had already become history by this point, the SPSS program and the information to be processed were stored in separate files on the hard drives of large computers, which were then used everywhere. The number of procedures has constantly increased from year to year.

With the advent of personal computers, a PC version of SPSS was also developed: in 1983 SPSS/PC+ appeared, designed for MS-DOS. Later, with the establishment of the European Trade Office in Gorinchem in the Netherlands in 1984, SPSS became widely used in Europe. It is currently the most widely used statistical analysis software worldwide.

To reflect the program's ability to be used in all areas relevant to statistical analysis, the X has again been removed from the brand name and the original acronym has been given a new meaning: Superior Performance Software System.

While the PC version, SPSS/PC+, was a slightly improved version of the mainframe version, SPSS for the Windows operating system (SPSS for Windows) was a big step forward. Firstly, this version of SPSS has all the capabilities of the version for mainframe computers, and secondly, with a few exceptions, the program can be used without special knowledge of application programming. The required statistical analysis procedures are called using standard Windows techniques, that is, using the mouse and the corresponding dialog boxes.

The first version of SPSS for Windows was version 5. This was followed by versions 6.0 and 6.1 with some innovations in the statistical and graphical areas; version 6.1 was the first statistical program for Windows to use the 32-bit Windows 3.1 architecture. This was noticeable in the higher speed of calculations. Improvements were also made to the user interface. Finally, version 6.1.3 was released, which could run under both Windows 95 and NT.

In early 1996, version 7 of SPSS appeared, first as version 7.0 and then 7.5. Along with the expansion of capabilities in the field of statistics, the difference between these two versions was that in version 7.5 both the menu and the program interface were made not only in English, but also in other most common languages.

The most significant difference between version 7 and previous versions was a completely new approach to displaying information on the screen. Firstly, the so-called Viewer received a new form, and, secondly, the tables of calculation results (pivot tables, called "mobile tables" here) acquired a more pleasant appearance. The mobile table technology allows you to rearrange the resulting tables in various ways.

While the predecessor of this version, version 6.1.3, could work both under the old Windows 3.1 and under the new Windows 95 (NT), SPSS version 7 could only work under Windows 95 (NT).

Version 7.5 was followed by version 8.0, which improved the graphical interface. The ability to create interactive charts provides many advantages over the traditional charts that are standard in many other packages.

Version 9.0 included several new statistical methods, including multinomial logistic regression, and several new graphics capabilities that expand the scope of interactive graphs.

Since 2005, version 13 of the SPSS package has been distributed.

SPSS Modules

The core of the SPSS program is SPSS Base, which provides a variety of data access and data management capabilities. It contains analysis methods that are used most often.

Traditionally, two more modules are supplied with SPSS Base (the base module): Advanced Models and Regression Models. These three modules cover the range of analysis methods included in earlier versions of the program for mainframe computers.

In Appendix A you can find information about which analysis methods belong to which module. A user who has purchased all three of these modules can disregard this appendix.

In addition to the three modules mentioned, there are a number of special additional modules and stand-alone programs, the number of which is constantly growing, so users should regularly check for information about new developments in SPSS.

This book covers the Base module, as well as the Regression Models, Advanced Models, and Tables modules. The purpose of the last module is to compile presentation tables. This book does not cover loglinear models, survival analysis, multidimensional scaling, or presentation procedures.

SPSS Base

SPSS Base is included in the basic package. It includes all data entry, selection, and correction procedures, as well as most of the statistical methods offered in SPSS. Along with simple methods of statistical analysis, such as frequency analysis, calculation of statistical characteristics, contingency tables, correlations and plotting, this module includes t-tests and a large number of non-parametric tests, as well as sophisticated methods such as multivariate linear regression analysis, discriminant analysis, factor analysis, cluster analysis, analysis of variance, suitability analysis (reliability analysis) and multidimensional scaling.

Regression Models

This module includes various methods of regression analysis, such as binary and multinomial logistic regression, nonlinear regression and probit analysis.

Advanced Models

This module includes various methods of analysis of variance (multivariate, taking into account repeated measures), the general linear model, survival analysis, including Kaplan-Meier and Cox regression, and log-linear and logit-log-linear models.

The Tables module is used to create presentation tables. It provides greater capabilities compared to simplified frequency tables and contingency tables that are built in SPSS Base (base module).

Below, in alphabetical order, is a list of other modules and programs offered for expanding SPSS.

Amos (Analysis of Moment Structures) includes analysis methods that use linear structural equations. The purpose of the program is to test complex theoretical relationships between various features of a random process and describe them using suitable coefficients. The test is carried out in the form of causal and path analysis. The user must define, in graphical form, a theoretical model in which, along with directly observed data, so-called latent (hidden) variables can be included. Amos is included in the SPSS extension modules as a successor to LISREL (Linear Structural RELationships).

AnswerTree (decision tree) includes four different methods for automatically dividing data into separate groups (segments). The division is carried out in such a way that the frequency distributions of the target (dependent) variable in different segments differ significantly. A typical example of the application of this method is the creation of characteristic buyer profiles in consumer market research. AnswerTree is the successor to the CHAID program (Chi-squared Automatic Interaction Detector, a chi-square-based interaction detector).

The Categories module contains various methods for analyzing categorical data, namely: correspondence analysis and three different optimal scaling methods (homogeneity analysis, nonlinear principal component analysis, nonlinear canonical correlation analysis).

Clementine is a program for data mining (knowledge extraction), in which the user is offered numerous approaches to building models, for example, neural networks, decision trees, various types of regression analysis. Clementine is an analyst’s “workbench” with which you can visualize the modeling process, double-check models, and compare them with each other. For ease of use of the program, there is an auxiliary environment for implementing results.

Conjoint (joint analysis)

Conjoint analysis is used in market research to study how attractive the consumer properties of products are. The interviewed respondents must arrange the proposed sets of consumer properties in order of preference, at their own discretion. On this basis, so-called detailed indicators of the utility of individual categories of each consumer property can then be derived.

Data Entry

The Data Entry program is designed for quickly creating questionnaires, as well as entering and cleaning data. The questions and response categories specified during the questionnaire creation stage are then used as variable and value labels.

Exact Tests

This module is used to calculate the exact value of the probability of error (p value) when only limited data are available, for the chi-square test and for non-parametric tests. If necessary, the Monte Carlo method can also be used for this purpose.
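The Monte Carlo idea mentioned above can be sketched as follows. This is a generic permutation approach for the chi-square statistic of a contingency table, written with the standard library only; it is not the algorithm used by the Exact Tests module, and the function names, the number of simulations and the sample data are assumptions made for illustration.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p_value(x, y, n_sim=2000, seed=0):
    """Estimate the p value by shuffling y, which keeps both margins fixed."""
    rng = random.Random(seed)
    x_levels, y_levels = sorted(set(x)), sorted(set(y))

    def tabulate(a, b):
        table = [[0] * len(y_levels) for _ in x_levels]
        for u, v in zip(a, b):
            table[x_levels.index(u)][y_levels.index(v)] += 1
        return table

    observed = chi_square(tabulate(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        # Count simulated tables at least as extreme as the observed one.
        if chi_square(tabulate(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed table itself, so p is never zero.
    return (hits + 1) / (n_sim + 1)


# Hypothetical small sample: group membership versus a yes/no answer.
group = ["a"] * 6 + ["b"] * 6
answer = ["yes"] * 6 + ["no"] * 6
print(monte_carlo_p_value(group, answer))
```

Because the estimate depends on random shuffles, the result varies slightly with the seed and becomes more stable as `n_sim` grows.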

The GOLDMineR program contains a special regression model for regression analysis of ordered (ordinal) dependent and independent variables.

Using SamplePower, the optimal sample size can be determined for most statistical analysis methods implemented in SPSS.
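As an illustration of what a sample size calculation involves, the sketch below uses the common normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power}) / d)² per group, where d is the standardized effect size. This is a textbook approximation and not the procedure implemented in SamplePower; the function name and default values are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two means.

    effect_size is the standardized difference (difference / standard deviation).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = standard_normal.inv_cdf(power)          # desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Medium effect (d = 0.5), 5% significance level, 80% power.
print(sample_size_per_group(0.5))
```

Calculations based on the t distribution give slightly larger values, so the result should be read as a lower bound for planning.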

SPSS Missing Value Analysis

This module is used to analyze and reveal the patterns that govern missing values. It provides various options for replacing missing values.

The Trends module contains various methods for time series analysis, such as: ARIMA models, exponential smoothing, seasonal decomposition and spectral analysis.
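Of the methods listed, exponential smoothing is the simplest to show in a few lines. The sketch below implements simple (single) exponential smoothing, s_t = α·x_t + (1 − α)·s_{t−1}, with the first observation as the starting value; the function name, the choice of starting value and the sample series are assumptions and do not reproduce the Trends module.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing of a list of numbers.

    alpha in (0, 1] controls how quickly older observations are forgotten.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical monthly values.
print(exponential_smoothing([10, 20, 30], alpha=0.5))  # [10, 15.0, 22.5]
```

A value of alpha close to 1 follows the data closely, while a small alpha gives a smoother series that reacts more slowly to changes.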