
Introduction

1. Data migration in CIS implementation projects

1.1 Goals and strategy of the data migration process in IS implementation projects

1.2 Stages of the data migration process in IS implementation projects

1.3 Features of developing business requirements for data migration

1.4 Setting the task of developing a methodology for data migration

2. Analysis of project experience in data migration

2.1 Brief description of the IS implementation project

2.2 Identified problems when developing a work and communication plan at the data migration stage

2.3 Identified problems when developing requirements for unloading from the source system

2.4 Identified issues in documenting data migration requirements

2.5 Problems identified when testing loaded data into the receiving system

3. Methodology for organizing data migration

3.1 Sequence of steps for organizing data migration

3.2 Evaluation of the application of the developed methodology for organizing and conducting data migration

Conclusion

List of sources used

Applications

Introduction

Research into the problems of corporate information system (CIS) implementation projects is relevant because of customers' growing interest in CIS implementation projects whose purpose is to replace existing information systems, a trend confirmed by CNews analytics. CNews analysts note the following features and trends in customer behavior:

· Interest in the quality of the project result;

· Focus of projects on optimizing existing business processes, including through improving existing automation tools;

· Increased customer attention to the quality of IS implementation project management.

Another reason for this research is the quality record of completed projects: according to the analytical publication PCWeek, up to 75% of all IT projects fail.

Currently, a large number of customers already operate implemented information systems, so data migration is an integral part of such projects. The results of the data migration stage affect the subsequent stages of the project and, as a rule, are a condition for accepting the system into commercial operation.

Data migration is a required part of most such projects. In particular, under the legislation of the Russian Federation (a detailed analysis of which is given later in this work), public sector organizations cannot discard part of their historical data and are required to transfer electronic archives to the new information system. For non-governmental organizations, the handling of historical data may be governed by internal corporate policies and regulations, as well as by the policies of those organizations' clients. Thus, the problems that arise at the historical data migration stage are relevant and can serve as material for further research on the topic.

A review of analytical materials on data migration reveals an obvious contradiction: individual software solutions and methodological materials in the field of data migration are being developed, yet there is no general approach to developing requirements for the migration process. A number of practical developments exist from the world's leading software vendors, for example IBM best practices or Oracle white papers, but there is no generalized approach to organizing data migration that takes into account the diversity of customers and projects. Accordingly, we can assume that the proposed research problem is insufficiently studied.

The object of this study is projects to modernize or replace already implemented information systems.

The subject of the study is the stage of data migration in projects to replace existing information systems and approaches to organizing work at this stage of projects.

The goal of the work is to develop a methodology for organizing the process of data migration from the operating information system to the information system being implemented. The general goal of the work can be detailed taking into account the stages of data migration, namely: to propose approaches to planning the data migration stage, describe a methodology for developing requirements for data migration, develop approaches to testing and evaluating the results of the data migration stage.

The following thesis is considered as the research hypothesis: a methodology for organizing the data migration stage can be developed that will improve the quality of the business requirements developed for the migration process and reduce the likelihood of the risks of this stage materializing.

To achieve the goal of the work, it is necessary to consistently solve the following research problems:

1. Identify the most significant bottlenecks in the migration process that create risks for the business.

2. Analyze existing data migration methodologies and software vendors' best practices for data migration.

3. Identify the customer characteristics that influence the organization of data migration and the formation of the data migration requirements package.

4. Identify business problems arising during data migration that cannot be solved using existing automation tools.

5. Develop a methodology for organizing the data migration process that eliminates the identified business problems at all stages of migration (planning, requirements gathering, design, implementation, testing).

6. Evaluate the developed approaches and methods for organizing and conducting data migration.

The study takes as a prerequisite the need to migrate the data accumulated in the existing system to the new system. This stage of work is not required in all projects, but for the purposes of this study it is assumed that the customer has such a need.

As the theoretical basis for the study, we use the requirements development methodology formulated by Karl Wiegers in his book “Software Requirements”, the basic principles of the MSF methodology, and the methods for solving design problems at the data migration stage set out in white papers by the software vendors IBM, Oracle, and Microsoft.

The result of the work should be a methodology for organizing the data migration process. The expected result is novel: although each data migration project has individual characteristics, the developed methodology should make it possible to take such project features into account and mitigate the project's potential risks.

In particular, a set of tools for planning and structuring the work of the data migration stage will be proposed, as well as a number of methods for developing software requirements specific to the data migration stage. Such a set of tools will be offered for each stage of data migration.

The expected results of the work are a methodology for organizing data migration, which will be tested in a customer organization of a certain type (state enterprise).

The results obtained can be used when organizing projects that include data migration.

The approaches and methods proposed by the author for solving design problems can be a way to reduce the risks of the business customer that arise from errors made during the migration of data from the operating IS to the implemented IS.

The structure of the work follows the sequence of the research tasks. The first part provides an overview of existing data migration methods. The second part analyzes the subject area of the projects and identifies typical problems at the data migration stage that cannot be solved with the existing tools and methods reviewed in the first part. The third part describes a sequence of steps that optimizes the data migration process and solves the business problems identified in the second part; it also assesses the proposed methodology for organizing and conducting data migration, based on testing the methodology within a real project.

1. Data migration in CIS implementation projects

1.1 Goals and strategy of the data migration process in IS implementation projects

Data migration is the process of transferring information when changing an information system or storage, or when changing an application version. The data migration process is usually part of one of the stages of a corporate information system implementation project, but it may also be the goal of a separate project.

The information system being implemented must replace the system or systems currently in operation; upon completion of the CIS implementation project, as a rule, the existing systems are not expected to remain in operation in the customer organization. In terms of the data migration process, the operating system or systems are the source systems, and the implemented system is the receiver (target) system. The implemented system should replace the automated functions of the source systems; accordingly, it will use the historical data accumulated in the operating systems.

The presence of a data migration stage in CIS implementation projects is determined by the corresponding business requirements. The need to migrate data arises in one of the following cases:

- Creation of a new system as a result of a fundamental change in customer requirements. This situation may arise when there are critical requests for changes in functional requirements, changes in external factors affecting the business, and insufficient flexibility of the solution used.

- The merger of business units (departments, divisions, or organizations) makes it necessary to use a single information system instead of the several systems that the customer’s employees previously worked with.

- Changing the IT infrastructure when using current CIS. Here we are talking about the need to migrate data from disparate systems when creating a common corporate data warehouse for storing data from corporate applications.

When solving the above problems, it is necessary to determine a migration strategy whose principles will guide the data migration work. Oracle experts define two types of strategy: the big bang strategy and the trickle migration strategy.

In the first case, migration is carried out in a single pass, and the source and target systems are stopped for its duration. This approach may seem attractive because it reduces the time required for the migration process, but a “big bang” migration is often a risky decision. First of all, the risks are associated with the difficulty of keeping the organization running while its systems are shut down; as a rule, business customers refuse to stop information systems. To perform an effective migration, at least one test attempt must be made before migrating the actual data. In addition to the time for testing, planning must allow for a possible additional migration date (a reserve day) required for re-migration if the first attempt fails. It is quite difficult to plan such a process without a significant loss of business performance at the moment of migration, so the quality of migrated data under this approach usually suffers from insufficient testing and a lack of time to validate the migration results.

Under the trickle (smooth) migration strategy, the source and target systems work in parallel and applications are not suspended, which customers usually welcome. This approach is likely to add complexity to the migration tools, since the data sets already migrated must be tracked in real time, as must the changes made to the data by users of the source system. With a smooth migration, the data transfer process can be synchronized with the process of switching user groups over to the new system. While groups of users gradually switch to the target system, both systems operate in parallel, which may affect the integrity of the migrated data: changes made to already migrated data during parallel operation will require that data to be migrated again.
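
The bookkeeping that this strategy requires can be illustrated with a minimal watermark sketch: each pass migrates only the records changed since the previous pass, so edits made during parallel operation are picked up by the next iteration. The record structure and field names below are hypothetical, not taken from any particular system.

```python
# Illustrative sketch of watermark-based change tracking for a trickle
# migration; "id" and "modified_at" are assumed field names.

def changed_since(source_rows, watermark):
    """Select records modified after the previous migration pass."""
    return [row for row in source_rows if row["modified_at"] > watermark]

def trickle_pass(source_rows, target, watermark):
    """Run one incremental pass and return the advanced watermark."""
    batch = changed_since(source_rows, watermark)
    for row in batch:
        # A record changed after an earlier pass is simply migrated
        # again - the repeated-migration case noted above.
        target[row["id"]] = dict(row)
    return max((r["modified_at"] for r in batch), default=watermark)
```

Here the watermark can be any monotonically comparable value, for example a modification timestamp persisted between passes.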

When formulating a data migration strategy, there are a number of typical risks that require additional assessment and can lead to problems during the implementation of the migration process. According to Oracle experts, typical problems that arise when formulating a data migration strategy can be divided into several groups:

1. Risks of drawing up a technical specification

The technical specification for data migration includes a description of data sets, types, and formats, as well as migration algorithms. When compiling technical documentation for the migration stage, too little attention is usually paid to studying the documentation on the functionality and information support of the source system. When drawing up specifications, an unrepresentative or insufficient sample of data from the old system may be used, so details may be missed. The focus shifts toward technical rather than business requirements for the new system, which leads to incorrect data mapping and errors during migration.

2. Risks of testing

The risks of the testing phase are most often associated with a lack of time; this situation arises not only during data migration but can be typical for the project as a whole. The migration stage requires careful testing on large volumes of data; the unit tests that are often used may not be sufficient in this case.

3. Risks of the process of obtaining and loading data

When migrating data, the step of validating the data received from the source system is often skipped. This omission can lead to surprises when loading information into the new system, for example when the source system contained mismatches between meta-information and content while the migration algorithms were designed against the content metadata.
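
A lightweight validation step between extraction and loading can catch such mismatches early. The sketch below checks each extracted record against the schema declared in the migration specification; the schema and field names are illustrative assumptions, not part of the source material.

```python
# Sketch of pre-load validation of an extract against a declared
# schema; EXPECTED_SCHEMA and the field names are hypothetical.

EXPECTED_SCHEMA = {"doc_id": int, "title": str, "issued_on": str}

def validate_record(record):
    """Compare one extracted record against the declared schema."""
    errors = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

def split_extract(records):
    """Separate loadable records from those needing manual review."""
    good, bad = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            bad.append((rec, errs))
        else:
            good.append(rec)
    return good, bad
```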

4. Risks of placing data on the target system

Incorrect loading of data during migration can lead to the following negative consequence: the functionality of the target system misbehaves because of conflicts and errors caused by the loaded data. Resolving such problems requires additional time to develop workarounds: improving the functionality of the target system, or cleaning or additionally converting the loaded data. The result of the described failures and errors is reduced efficiency of the new IS, dissatisfied system users, and business losses.

5. Risks associated with the work of the project team

The data migration stage, like any other stage of an information system implementation project, requires a clear distribution of roles in the project team and the involvement of all interested parties. A lack of involvement of the new system's business users in the migration process leads to omissions in requirements gathering and increases the risk of errors. Developing data transfer algorithms requires the involvement of experts who know the structure and purpose of the data to be migrated. Shifting responsibility entirely onto technical specialists who work only with the new target system is therefore a common but irrational decision.

Formulating a strategy and assessing the risks of the migration phase are among the most important steps when implementing data migration. The results of the risk assessment and strategy selection activities are the input to the planning phase of the migration process.

1.2 Stages of the data migration process in IS implementation projects

The data migration process can be one of the stages of an IS implementation project, or it can be organized as a separate project. Within this work, the data migration process means project work covering the full cycle of tasks associated with data migration: from planning the migration work to evaluating the results of the migration stage.

In any case, the data migration process is divided into several interconnected sequential stages; this study examines all steps of the migration process in sequence, following the Oracle and IBM methodologies.

The life cycle of the migration process begins after the strategy is formed and the risks of the data migration stage are assessed. An outline of the migration process is presented in the process diagram.

The goal of any data migration process is to map the information, data types, and formats of the old system to the data types and formats of the new system. In this mapping, the “Data Extraction” stage corresponds to selecting and unloading data from the old system, and the “Data Loading” stage corresponds to transferring the received data and loading it into the new system. The migration process is discussed in more detail below.

After the data migration planning stage is complete, the stage of determining requirements for the migrated data begins. This stage includes developing the customer's requirements and describing them in the relevant design documents. At the requirements gathering stage, the project team role responsible for the stage's result is the business analyst or systems analyst. This stage of migration is covered in more detail in the third chapter of this work. The output of the stage is a description of the structure and composition of the data to be migrated.

The stage of collecting data requirements for migration is, as a rule, closely interconnected with the next stage: developing algorithms for transferring data from the source system to the target system. During the design phase, analysts create detailed specifications describing the data types of the source system and their relationships with the data types of the target system. Such specifications describe the structure of the data to be migrated, its volume, source, and destination. The specification is the source for setting tasks for the developer who will design and build specialized data transfer software. At the design stage, the existing data architecture of the source system is analyzed (“as is”) and the data architecture of the target system is developed (“to be”). When analyzing the existing data architecture, all limitations of the IT infrastructure are identified and taken into account, as well as their impact on how the target system will operate with the migrated data. The output artifacts of data architecture analysis can be documents such as logical data models (ER diagrams, database models), dictionaries and reference books with a detailed description of each element and its attributes, descriptions of business rules for working with data, and information about the systems that interact with the source system for information exchange and integration.
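
A fragment of such a specification can also be kept in machine-readable form, so that the same document drives both review by analysts and the transfer software. The sketch below is a hypothetical example of this idea; the field names and conversion rules are invented for illustration.

```python
# Hypothetical fragment of a mapping specification in executable form:
# each entry relates a source-system field to a target-system field,
# optionally with a conversion rule.

FIELD_MAPPING = [
    # (source field, target field, converter or None)
    ("REG_NUM",  "registration_number", str.strip),
    ("REG_DATE", "registered_on",       lambda s: s[:10]),  # keep the date part
    ("DOC_KIND", "document_type",       None),
]

def transform(source_record):
    """Build a target-system record from one source-system record."""
    target = {}
    for src_field, dst_field, convert in FIELD_MAPPING:
        value = source_record[src_field]
        target[dst_field] = convert(value) if convert else value
    return target
```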

The results of requirements gathering and design are the basis for choosing a method and determining the data migration technology. Migration can be done offline or online; the categorization depends on whether applications remain available during the migration process. The choice of migration method and means is determined by a combination of factors, including the available system downtime, business dependence on partners, data volume, the physical location of the source system's data storage, and the information security policies of the source and target systems.

The analysis and planning stages described above can be combined into a general preparatory stage. The developed migration procedures and mechanisms govern the stages of extracting, transferring, and loading data into the new system, that is, all steps of the ETL process are carried out in sequence. After the data needed for migration is received, the phase of loading it into the target system begins; before it, a separate stage must be singled out: verification of the migrated content.

Checking that the unloaded data complies with the requirements can occur online, directly at the entrance to the target information system, or offline, as an intermediate step in the migration process. After the data is loaded into the target system, an additional check is performed, and both systems are often launched for parallel operation. Test activities for parallel operation are planned when the rules and procedures of the migration process are designed. Within the migration process, the parallel operation of the two systems can be treated as trial operation, whose result may be confirmation that the new system is fully functional with the migrated data. If massive errors are detected during the parallel operation of the source and target systems, a decision may be made to re-migrate the data and reload the content. The agreed migration results are recorded in the trial operation log of the target system with the loaded data; completed test cases and questionnaires can be compiled to check whether the migrated data meets the requirements of the target system.
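
Part of this check lends itself to automation. A minimal reconciliation sketch, assuming flat records with a common business key, compares the source and target record sets and reports discrepancies for the trial operation log:

```python
# Sketch of automated source-to-target reconciliation after loading;
# the key field name "doc_id" is an assumption.

def reconcile(source_records, target_records, key="doc_id"):
    """Compare record sets by business key and report discrepancies."""
    src_keys = {r[key] for r in source_records}
    tgt_keys = {r[key] for r in target_records}
    return {
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
        "counts_match": len(src_keys) == len(tgt_keys),
    }
```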

Testing activities are not limited to the parallel operation of the source and target systems. Tests can be carried out on samples of the migrated data in order to identify errors early and correct them before the development of the migration software begins; eliminating errors early saves budget and avoids repeated data unloads. Testing activities may also include auditing the data during the migration process: a data audit makes it possible to monitor the state of the data and avoid errors caused by changes to content that users may make while migration work is already under way.

After the migration results are agreed, the stage of post-migration work begins, which includes checking, cleaning, and testing the overall operation of the target system after data migration. Cleaning can be done manually or with software; it is performed to remove outdated information and to meet the information requirements of the new system.
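
One cleaning rule of this kind can be sketched as follows; the 20-year horizon echoes the retention example discussed in Chapter 2, and the field names are hypothetical.

```python
# Sketch of a post-migration cleaning rule: flag records that fall
# outside the retention horizon required of the target system.
# "created_on" is assumed to hold a datetime.date value.

RETENTION_YEARS = 20

def outdated(records, current_year):
    """Return records created before the retention horizon."""
    cutoff_year = current_year - RETENTION_YEARS
    return [r for r in records if r["created_on"].year < cutoff_year]
```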

The data migration methodology given above assumes that the biggest bottleneck in organizing this stage of the project is planning and working with the customer’s business requirements, that is, requirements gathering and design, so approaches to solving the problems of these stages are considered in more detail in the following parts of the work. In addition to the planning and business requirements stages, special attention should be paid to the stage of assessing the results of the data migration work, since in accordance with the Deming cycle (PDCA), it is precisely the assessment activities that are a condition for the success of similar work in similar projects.

1.2.1 Features of data migration planning

Data migration planning is the first stage of the process life cycle and is carried out with an understanding of the main risks of the process and the migration strategy. In addition to the migration strategy, the input information may be a data migration section of the technical specification or of the overall project scope document. At the planning stage, the scope of the data migration process is determined, and goals are set that are achievable under the design constraints (data sources, top-level requirements). To define the scope of the migration process, it is advisable to involve business users who understand how the system has worked with the data in the past and how it should work with it in the future. Next, depending on the migration method, a deadline is set and the necessary resources are allocated within the given budget. An important point when planning data migration is identifying the process participants on the customer's side, that is, the customer's business users and technical specialists who are responsible for data management. The output of the planning stage may include the following design artifacts:

Data migration scope document;

Work plan for data migration indicating responsible members of the project team;

Communication plan during the migration stage.

The organization of the migration stage in IS implementation projects begins with the planning stage, where it is necessary to draw up a work plan, calculate the necessary resources and deadlines.

Work packages at the migration stage must correspond to the phases of the process life cycle; the approximate structure of the work schedule could be as follows:

Planning and defining the scope of data migration;

Business analysis and documentation of requirements;

Selection, configuration or design and development of specialized software;

Data transfer;

Validation of migrated data;

Trial operation;

Post-migration cleaning and testing activities;

Coordination of migration results, evaluation and closure of the implementation project stage.

The appointment of responsible members of the project team for the implementation of work packages of the data migration stage occurs at the planning stage after drawing up the work plan.

The selected project roles are matched with clusters - areas of responsibility defined in the MSF methodology. It is worth noting separately that by product management in the context of migration we will understand the management of the quality of migrated data and the performance of the target system after migration. Release management in terms of the migration process - performing iterations of the migration process, receiving and loading migration data.

In accordance with the MSF model, the following distribution of areas of responsibility between role clusters is assumed:

System Analyst - program management, customer satisfaction;

Development Manager - program management, product management, release management;

Developer - development of algorithms or specialized software for transferring data to the target system (if necessary);

Tester - testing, release management.

To clearly show how the involved human resources participate in the activities of the data migration process, a RACI matrix is drawn up; it is given in Appendix 1 to the work (see Appendix 1 - RACI matrix for data migration work).

It should be noted that the development manager (technical manager) is considered as the leader of the team involved in data migration, so he is responsible for carrying out the entire process as a whole. However, if data migration is carried out as part of a large-scale IS implementation project, where a manager for the entire project is appointed, then the technical manager of the migration stage will only be the executor in tasks related to determining deadlines and recruiting personnel. In this case, decisions on personnel, resources and deadlines are made by the project management collectively.

1.3 Features of developing business requirements for data migration

The stage of developing business requirements during data migration is key in preparing data transfer tools, and the quality of the migration process as a whole will depend on how well the requirements of business users were collected and analyzed.

Data migration requirements can be captured in various design artifacts, and guidance on developing such design artifacts will be provided in Chapter 3.

Various standards can serve as a methodological basis for organizing project work to identify and develop business requirements for data migration. For the purposes of this study, the theoretical basis for business analysis is the approach of Karl Wiegers.

First, let's consider using the tools recommended by Wiegers for modeling business requirements in the context of developing requirements for data migration.

Requirements for data migration are identified in parallel with the development of functional requirements for the implemented system. In the process of developing functional requirements, the input data sets of the automated business processes are identified; on this basis, top-level business requirements for the composition of the migrated data can be established.

Wiegers proposes using ER diagrams (entity-relationship diagrams) to define the logical structure of a subject area. An entity-relationship diagram is used to build a domain data model. The logical data structure (conceptual data model, CDM) makes it possible to describe the system's subject area in terms of data objects (entities).

Using a data model in the form of an ER diagram, according to Wiegers, makes it easier to identify requirements for organizing the data structure in the designed system, thanks to the clarity and relative simplicity of the model’s presentation. By working with an “as is” ER diagram, key business users can determine the list of data objects that need to be transferred from the source system to the target system.

In addition to entity-relationship diagrams, another modeling tool can be used when identifying data migration requirements: the state transition diagram. A state transition diagram provides an accurate, complete, and clear view of a finite-state mechanism; it is part of UML (unified modeling language) and makes it possible to visualize the life cycles of data objects. The transition to each next stage of the life cycle is governed by a specific set of business rules. When such a model is developed as part of the functional requirements for the system being designed, it must be compared with the equivalent model for the system currently in operation. When an IS is modernized, some states of data objects may be changed or excluded, and the rules for changing states may be revised. Such cases must be taken into account when developing migration requirements, in order to determine correctly in what state data objects should be transferred to the receiving system. In addition, data migration requirements must take into account logical inconsistencies that may arise from modernized state transition rules. For example, the transition of a data object to the next state may depend on the presence or value of some entity attribute, while in the requirements for the receiving system the attribute composition of entities may be changed so that this attribute is excluded. In that case, changing states for the migrated entity will be impossible, and failures and errors will occur in the functionality of the receiving system. To avoid such errors, each such change in attribute composition must be identified, and the data migration requirements must include a logical check or a rule for processing the situation. An example of a developed state diagram for one of the entities is given in Appendix 3 to the work (see Appendix 3 - Examples of life cycle diagrams for entities in the source system).
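
The check described above can be expressed as a simple guard rule. The sketch below is illustrative only: the states, guard attributes, and handling policy are hypothetical, and a real project would derive them from the state diagrams in the appendices.

```python
# Sketch: deciding the state in which a data object enters the target
# system when a transition guard attribute may have been excluded
# during modernization.

TRANSITIONS = {
    # current state -> (required guard attribute, next state)
    "submitted": ("review_result", "reviewed"),
    "reviewed":  ("approval_date", "approved"),
}

def migration_state(obj):
    """Return the entry state and a flag for manual processing rules."""
    rule = TRANSITIONS.get(obj["state"])
    if rule:
        guard_attr, _next_state = rule
        if not obj.get(guard_attr):
            # The attribute is missing or excluded from the target model:
            # the object cannot advance, so it is flagged for a rule.
            return obj["state"], f"flagged: missing {guard_attr}"
    return obj["state"], "ok"
```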

The above discussed the use of business analysis tools to identify high-level requirements for data migration. We now analyze in more detail the methodological basis for a detailed elaboration of business needs, in particular for developing requirements for the profile of the migrated data.

When interacting with the customer on profiling the data covered by the migration requirements, the analyst needs to obtain information on the following issues (a minimal sketch of automating some of these checks follows the list):

1. What is the source of the data: an enterprise application, system, or sources outside the customer organization?

2. Will the migrated data be input to any business process?

3. What are the data requirements of the target system? Which processes in the new system will use this data? Does the data structure allow you to work with it correctly in the target IS (is it possible to correlate the structure of the migrated data and the target data model)? Are there fields and types in the data structure that cannot be filled in the target structure and vice versa?

4. What is the quality of the data? Does the current level of data quality meet the level required for the functioning of the target system? Are there any critical errors (for example, required fields not filled in, or data type mismatches between the source and destination systems)? Is there a need to develop procedures to improve data quality before the data migration process begins?

5. Does the meta-information of the migrated content allow us to develop migration algorithms? Is it possible to analyze the data structure based on metainformation or is it necessary to analyze the content itself intended for migration?

6. How critical are errors when transferring this data to the target system? Will the data be used in the user interface, or is it sufficient to validate it at the database level?

7. Is the selected data related to historical data? Over some selected period (for example, a year or 5 years), have there been significant changes in the business process that could lead to changes in the structure or storage form of the selected data model element?

8. What are the business rules for working with the selected data?

9. Who is the owner of the data (content, documents, images) and who is responsible for the safety and quality of the selected data?

10. What limitations of the source system are reflected in the data structure, forms of storage and work with data? Will these restrictions apply when working with data on the target system?

11. Is there a need to support online data migration?

12. Will changes be made to the data being migrated during the migration process? Is there a need for an iterative process of uploading and loading data?
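
Several of these questions, in particular 4 and 5, can be answered partly by automated profiling of a data sample. A minimal sketch, assuming flat records and hypothetical field names:

```python
# Sketch: fill rate of required fields and type consistency over a
# sample of the source data; field names are assumptions.

def profile_sample(records, required_fields):
    """Report fill rates and the set of value types for each field."""
    report = {}
    for field in required_fields:
        filled = [r[field] for r in records
                  if r.get(field) not in (None, "")]
        report[field] = {
            "fill_rate": len(filled) / len(records) if records else 0.0,
            # More than one entry here signals a data quality issue.
            "types_seen": sorted({type(v).__name__ for v in filled}),
        }
    return report
```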

The identified business needs should be recorded and described in the form of a package of migration requirements. The relationship between business needs and functional requirements is maintained when the functional requirements are traceable.

In addition to tools for modeling the domain and business processes, Wiegers pays considerable attention to such a concept when working with business requirements as “traceability” of requirements. Requirements traceability also means that it is possible to trace the relationship of the top-level requirements recorded in the project concept or technical specifications with more detailed requirements for subsystems and modules recorded in functional and technical specifications.

When developing requirements for data migration, traceability of requirements must also be respected. Top-level requirements, reflecting the need and strategy for migration, are recorded in the document on the scope of the IS implementation project or terms of reference. The business need to migrate data from an operating system to a designed system is traced to a package of migration requirements: the composition of the migrated data, the method of migration, specific rules and migration scenarios, which are described in the terms of reference for the implementation of the IS or the technical specification of the migration process.

A tool for tracking the traceability of requirements according to Wiegers can be a requirements traceability matrix. Using the traceability matrix, you can visualize the relationships between the requirements for the software being developed.
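
In its simplest form, such a matrix can even be kept as a machine-checkable structure. The sketch below uses invented identifiers (it is not the fragment from Appendix 2) and shows how needs without test coverage can be listed automatically:

```python
# Sketch of a traceability matrix as a plain structure; the identifiers
# and wording are illustrative.

TRACE_MATRIX = [
    {"need": "UN-1: migrate without stopping the business process",
     "requirements": ["MR-1.1: online (trickle) migration method"],
     "tests": []},  # traced to the work plan rather than a test case
    {"need": "UN-2: transfer the full document archive",
     "requirements": ["MR-2.1: extract scope", "MR-2.2: load scope"],
     "tests": ["TC-7", "TC-8"]},
]

def needs_without_tests(matrix):
    """List needs not covered by a test case (allowed exceptions)."""
    return [row["need"] for row in matrix if not row["tests"]]
```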

Appendix 2 to the work (Appendix 2 - Example of tracing requirements for data migration) presents an example - a fragment of the compiled matrix of traceability of requirements for data migration.

The following fragment of the matrix represents the different types of user needs:

· Non-functional requirement for the migration method;

· Requirements directly for the functions of the data migration utility;

· Business requirement for the functionality of the designed system, associated with the functional requirements for the migration utility.

There is no test case in the first line of the example, since this user need is traced to the data migration work plan rather than to a specific element of the migration utility; Wiegers allows such exceptions when compiling requirements traceability matrices. High-level requirements are input information when planning the work of the migration stage. When drawing up the project communications plan, the systems analyst and technical manager should select for the survey not only technical specialists but also business users of both the source system and the target system.

The other user needs and functional requirements listed in the traceability matrix are general examples of business requirements developed as part of a specific migration project. The characteristics of this project, as well as a detailed analysis of the data migration project experience, will be carried out in Chapter 2.

1.4 Setting the task of developing a methodology for data migration

The expected result of the work should be a methodology for organizing and conducting data migration. Such a methodology should consist of a sequence of interrelated steps, the implementation of which will allow you to plan, carry out and test the results of data migration.

Each part of the methodology should contain a description of the sequence of activities for a separate stage of migration, namely: planning, developing requirements, conducting migration (uploading and loading), testing the results. Thus, the methodology proposed in the work should contain recommendations for completing the tasks of each of the stages presented in the data migration life cycle diagram (Fig. 1).

The first part of the methodology should provide and describe ways to develop general requirements for data migration. In particular, this part of the methodology should describe how to deal with the potential risks of the data migration phase. This part of the methodology also describes how to interact with key users to identify migration strategy requirements and other high-level business requirements.

The second part of the methodology should be devoted to tools for planning and organizing the migration process. Based on the analysis of project experience, the biggest bottlenecks of the process are identified, and the methodology offers solutions for them.

The next part of the methodology should cover the stage of surveying and developing requirements for data migration. This part of the methodology should contain instructions for conducting survey activities (meetings, other types of communications), as well as for documenting the results of the stage - documentation templates and examples of completion.

The final part of the migration methodology should contain recommendations for conducting the migration and for testing during the migration stage. Testing at this stage divides into testing the migration software and testing the operation of the receiving system after the migrated data has been placed in it. This part of the methodology should describe ways to control the migration process and to identify possible defects and their causes.

2. Analysis of project experience in data migration

2.1 Brief description of the IS implementation project

The project experience that will be considered in the work is several data migration projects carried out as part of a portfolio of projects for implementing a departmental information system. As part of the analysis of project experience, a brief description of the project will be given and some features of the customer organization will be considered.

As part of this work, the project experience of implementing an information system in a state organization that is part of the system of executive authorities at the regional level will be analyzed. The main project was divided into several subprojects, each of which involved the automation of a group of customer business processes. Accordingly, each project involved the transfer of some part of the customer’s business units to work with the new system. Each of the projects provided for the migration of data accumulated in source systems with which the corresponding business unit works. The main activity of an organization of this type is related to the provision of public services to the population. As part of this analysis, we will consider the stage of data migration in projects for the implementation of information systems that automate exactly this type of activity of the customer organization. In the following parts of the work, to conduct a more in-depth business analysis and demonstrate the practical results of the work, a narrower part of the subject area will be described - at the level of providing a specific public service.

In order to describe the profile of the customer organization in more detail, several concepts must be introduced. An “Applicant”, or “public user of the IS”, is a person who applies for the provision of a public service. A “data object” is one of the entities or artifacts of the subject area of the customer organization's activity. The “receiver system” is the implemented corporate information system, which belongs to the ECM class. The customer operates two separate information systems, not integrated with each other, to store content and automate the main business process; these systems are the data sources (“source systems”) for the process of migrating data into the implemented system. The first of the two existing systems is built on the IBM Lotus Notes platform and automates a business process in one of the customer's functional departments. The second system was developed in-house by the customer's employees and automates business processes in other business units, stores documents in electronic form, and collects and stores reporting information on the organization's business processes.

An organizational feature of the project is the following model of distributing responsibility for project results between the main participants on the customer's side:

The functional customer is responsible for the quality result of the project;

The non-functional customer is responsible for the financial result of the project.

The functional customer organization is the owner of the automated business processes. Its employees know the specifics of the subject area and will be the end users of the implemented information system.

Employees of the non-functional customer may lack deep expertise in the narrow subject area. Their responsibilities include monitoring the IS implementation project from an organizational point of view and organizing post-project technical support for the implemented system on the customer's side. The non-functional customer's area of responsibility may also include a supervisory function over the operational activities of the functional customer organization.

The specific distribution of responsibility led to a conflict of interests between the functional and non-functional customer during the project. This organizational structure is a source of risk when conducting data migration.

2.2 Identified problems when developing a work and communication plan at the data migration stage

When developing a work plan for the data migration stage, project management relied on the MSF project team model, and also used the RACI matrix as a tool for distributing responsibilities. This approach was analyzed above in Chapter 1.

At the same step, a data migration strategy for the project was formulated. The strategy was chosen based on the high-level requirements of the functional customer, who refused to suspend the business process for migration work. Thus, the “big bang” strategy was chosen: a one-time migration of data from two sources to the receiving system.

The initial hierarchical structure of the work of the data migration stage looked as shown.

According to the initial project plan, the work of unloading data from the source systems and loading it into the receiving system should have taken 4 calendar days: 3 days were allocated for receiving the data and loading files into the implemented system, and 1 day was reserved to reduce the risk of the loading work being disrupted. However, the work was not completed on time: several of the risks identified in the theoretical analysis in Chapter 1 materialized during the project.

The failure to meet deadlines in the first data migration project was caused by incorrect planning of communication with stakeholders on the customer's side. As mentioned above, the customer organization has a complex organizational structure. When planning the survey work and selecting users for gathering business requirements, employees of the non-functional customer should have been involved in the customer survey: it is these employees who supervise the activities of the functional customer, and they held the knowledge of the requirements for the time horizon of data unloading from the source systems. The requirements for the unloading horizon could have been derived from an analysis of the regulatory framework and agreed with the non-functional customer's employees, but this work was neither planned nor carried out. Thus, we can speak of three problems of the migration stage at once:

1) Incorrect development of the communication plan with the customer’s employees;

2) Incorrect planning of work in the WBS;

3) Incomplete business requirements for data migration, as a consequence of the first problem. In particular, data was unloaded from the source systems only for the last 2 years of the functional customer's operation, whereas the unload should have covered a much longer period.

The executive authorities of the regions of the Russian Federation are obliged to store historical data (archives) for the periods established by federal law. Thus, the requirements for the composition of the data unloaded from the source systems are determined by the data storage regulations, which in turn refer to federal law. In addition to determining the composition of the unload, it is the requirements for the data storage horizon that largely determine the required volume of data storage in the receiving system.

A narrower subject area in which the functional customer operates is the provision of public services to construction and design organizations in the field of environmental supervision.

In accordance with the Federal Law, the customer organization is obliged to store:

Project documentation for capital construction for 20 years;

Technological and design documentation for 20 years.

The customer organization commissioned the existing information systems in 2005 and 2000, respectively. Thus, the unload range for transferring data to the receiving system includes all documents stored in the source systems, as well as all objects referenced by such documents.
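
The scope conclusion can be checked with simple arithmetic. Taking, purely for illustration, 2016 as the migration year: a system commissioned in 2000 holds documents at most 16 years old, which is within the 20-year retention horizon, so the full archives of both systems are in scope.

```python
# Worked check of the retention reasoning; the migration year is an
# assumption made for the example.

RETENTION_YEARS = 20
COMMISSIONED = {"Lotus Notes system": 2005, "in-house system": 2000}
MIGRATION_YEAR = 2016

for name, year in COMMISSIONED.items():
    oldest_document_age = MIGRATION_YEAR - year
    print(name, "full archive in scope:",
          oldest_document_age <= RETENTION_YEARS)
```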

Similar to the example described above, an analysis of legislative norms for other public sector organizations or commercial organizations can be carried out if the internal regulations of such organizations are drawn up on the basis of federal laws.

In addition to the time horizon of the unload from the source systems, federal law also determines requirements for its composition: the list of document types and the need to transfer document versions and accompanying materials (appendices and addenda to each document) to the new system.

Federal legislation defines the types and composition of documentation that the customer organization must keep. Accordingly, the norms of the Federal Law must be analyzed taking into account the characteristics of the activities of a particular customer and the composition of the documentation that is required to provide a certain type of public services. Artifacts of the customer's subject area must be compared with legal norms and data model objects of the source system being operated. The results of such an analysis will make it possible to determine the composition of the download from the source system and determine the requirements for the receiver, as well as for the software for data transfer.

Returning to the example of an organization that is part of the regional Construction Department, the following objects can be defined as part of the data for transfer from the source system to the destination system:

Electronic images of sets of technological, design and construction documentation for capital construction projects, as well as data objects storing information about relevant documents;

Electronic images of technological and organizational-administrative documentation of construction waste processing and disposal facilities, as well as data objects storing information about relevant documents;

Electronic images of permitting documentation, the history of issuing duplicates of such documents, and the data objects storing information about the relevant documents;

History of work with all types of documentation listed above, stored in the source system.

Thus, the risk of incorrect planning materialized and, as a result, the risk of developing incomplete requirements for data unloading partially materialized as well. The risk of incomplete requirements for unloading from the source systems is considered in more detail in the next part of the work.

2.3 Identified problems when developing requirements for unloading from the source system

The plan for the data migration work did not include checking the data included in the unload from the source systems: it was assumed that the entire volume of data at the input to the receiving system would be validated against the data model of the system being implemented. The quality of the logical check and data validation algorithms is analyzed in the next section of the work. This part analyzes the preparatory operations for obtaining data from the source systems and examining the state of the data in those systems. To examine the state of the data and identify potential problems with placing it in the receiving system, a test unload from the source systems is formed. The goal of this step is to successfully place the test unload in the target system, identify any potential data loading problems, and thus reduce the risk of a failed migration caused by loading invalid data into the target system.
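
The test unload step can be sketched as a trial load into a staging copy of the receiving system, where load errors are collected rather than aborting the run. Here load_record stands in for the receiving system's load interface and transform for the mapping logic; both are hypothetical placeholders.

```python
# Sketch of a trial load of a test unload into a staging environment;
# transform and load_record are assumed callables, "doc_id" an assumed key.

def trial_load(sample_records, transform, load_record):
    """Load a sample; return (record key, error) pairs for failures."""
    failures = []
    for rec in sample_records:
        try:
            load_record(transform(rec))
        except Exception as exc:  # collect every defect, do not abort
            failures.append((rec.get("doc_id"), str(exc)))
    return failures
```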

...


How to implement free software

Migrating to free software is similar to migrating to a newer operating system. As an example, recall the appearance of the first versions of Windows in our country. An equally striking example is the migration to Windows NT, whose ideology differed sharply from Windows 9x. Yet another example: each new version of the MS Office package differs from the previous one not only in its interface but also in its file formats. So the task of migration is relevant even when software from a single vendor is used.

This article offers a general description of the migration methodology, highlighting the essential points. The general principle is to carry out the process thoughtfully and carefully, through gradual changes. Migration consists of several logically complete sections and stages.

Creation of a working group (who does it)

When carrying out migration, it is necessary to provide for the resolution of issues of both a technical and non-technical nature.

It is important to consider legal problems, which have recently become extremely relevant for some CIS (Commonwealth of Independent States) countries, in particular Ukraine. In some cases, it makes sense to discuss the administrative aspects of the employer-user-administrator relationship. Historically, these relationships have not been sufficiently regulated by internal corporate rules and regulations.

In the process of preparing the material, conversations were held with professionals in the field of security, computer law and system administrators. The vast majority stated the need to document the rules for how users work with the organization’s information system.

Proper planning also includes addressing financial issues. It is necessary to assess the costs of legalizing the existing information system and the cost of implementing a new one, and to estimate the cost of ownership for the foreseeable future.

Any project, including a migration project, may face an underestimation of the human factor. Naturally, the application of human resource management methods will be required. Most system administrators and IT managers known to the author are not specialists in the fields of personnel management or finance. Such a complex task cannot be solved by the IT department alone.

The first task on the path to migration to a new target system is to create a migration planning working group. This group is responsible for conducting migration and, therefore, should have fairly broad powers.

The goal of the project is to build an economically viable IT infrastructure. A good candidate for the position of working group leader is a top manager of the enterprise or organization, for example, the financial director. Naturally, the group includes the head of the IT department, who has a vision of the entire IT infrastructure, both as it is now and as it should become. The group must also include an experienced system administrator, preferably with experience in operating free software. The size of the group cannot be fixed in advance: other employees of the company are involved as necessary. It is also possible to bring in an experienced third-party consultant, or a company specializing in such solutions. The result of the work of this group is a detailed migration plan with an estimate of the cost of migration, or a justified conclusion that migration to free solutions is ineffective for this organization.

research (what is)

The first stage should be an audit - a description of the existing (inherited) system.

It is no secret that years of "emergency" informatization, carried out without regard to the return on funds spent on software, have as a rule produced systems that are unbalanced not only economically but also technologically. An enterprise software audit is a revision of installed programs and a determination of their compliance with business requirements.

The results of the audit process are:

Description of the technical characteristics of the installed software;

List of identified risks associated with the use of unlicensed software;

Calculation of the cost of purchasing licenses for installed software;

Calculating the cost of removing unlicensed and installing licensed software;

Determining the feasibility of further use of the software;

List of identified risks associated with the use of the software.

software inventory

Some studies show that most organizational leaders pay far more attention to the functionality of the software they use than to respecting the rights to those products.

Unfortunately, most organizations lack an IT culture. Sometimes even representatives of the IT service do not really know what is installed where on employees' computers, while ordinary employees independently decide to install software products obtained from dubious sources.

Software inventory allows you to identify unlicensed software in an organization. It should be emphasized that this exercise is always beneficial: the inventory result can be used to estimate the costs of software legalization using both free and proprietary software.

Organizational managers are often interested in clear financial planning of the costs of using and developing software. The interest of top managers in such detail is quite understandable - top management of companies is interested in turning software into a company asset that is accounted for, controlled and developed in the same way as other types of assets.

Software audit is a procedure that usually takes quite a lot of time, requiring highly qualified personnel and knowledge of various specific information at the stage of information analysis. A recommendation may be to contact a company specializing in these services. However, it is quite possible to conduct an audit by the IT department.

In some organizations, it is inconvenient (often impossible) to conduct a large-scale software inventory at once. Reasons may be the size of the organization or security policies. It is necessary to find a compromise between the efficiency of inventory and the factors that complicate such a process.

There are two audit options to consider. The first option is a full audit, which involves examining all computing facilities, the local network and peripherals. The advantage of this method is high accuracy, the disadvantage is high cost, high time consumption and inconvenience for users. Additional advantages of this method are the ability to identify software installed by users themselves and study user requirements for software at their workplaces using specially prepared questionnaires. The second option is an audit of some typical computing facilities, local networks and peripherals. In this case, the choice of audit objects is dictated, as a rule, by the functional responsibilities of users. This approximation method significantly reduces the cost of inventory, but has a large error.

Small organizations can conduct a manual audit and enter information about computers and servers, as well as the software installed on them, into a simple spreadsheet. In this case, you should indicate the presence or absence of the necessary licenses, certificates of authenticity and copyright agreements for each of the software products found.

For medium and large organizations, specialized software can be recommended, or a third-party organization specializing in such services can be engaged. In the process of creating this document, tools for automatic inventory of software and hardware were reviewed (the GASP, PC Inventory and MSIA programs are known). A recommendation would be eXponent Navigator (http://www.e-x.ru/pages/expnav.html), produced by eXponent.

Exponent Navigator

The product is designed for auditing equipment and software over a network. Information about computers includes data on components (processor, motherboard, hard drives, memory modules, video card, network cards, printers and other devices), operating system, drivers and software.

According to the creators of the program, after organizing the automatic collection of information about computers, it is possible to view and organize this information, prepare printed reports and web publications, and upload data to Microsoft Excel, XML and other formats. Possibilities:

Automatic diagnostics of computer configurations;

Automatic collection of information about computers over the network;

Determination of installed equipment;

Determination of installed programs;

Determination of file characteristics;

Advanced capabilities for sorting, searching and selecting data;

Preparation of printed reports;

Export data to MS Excel;

Automatic generation of web publications.

The free version of the program is limited to 25 computers; the license costs $1 per computer.

we design (what we want)

The processed results of the study of the existing system are the basis for modeling a new, target system. This question is extremely important and complex. The consideration of this issue is also complicated by the historical lack of familiarity with free software, in particular Linux, by most IT managers and system administrators.

There is a lot of literature about Linux, including in Russian, which describes the advantages of this platform from a technological point of view. However, all these advantages matter only together with the main issue: the existence of a wide range of application software in different areas. There is a long-standing and widespread myth that only a limited amount of application software for corporate use, including office automation, exists for the Linux platform. The vast majority of these myths are created and fueled by the makers and sellers of proprietary software and have little to do with reality. Debunking this myth is not the main purpose of this book. It is worth noting, however, that the alleged narrowness of application software is largely a phantom; the real obstacle is historically established de facto standards for document exchange. For example, the de facto standard office suite is Microsoft Office, the standard raster graphics editor is Adobe Photoshop, and CorelDRAW is extremely common for vector graphics.

Another issue is the often excessive functionality of proprietary products, dictated not by the needs of the market, but by the opinion of marketers. And for this excess functionality the user pays quite a lot of money: the cost of a license for the right to use programs, increasing complexity of operation and increased hardware requirements.

Recently, the situation has changed - a lot of information appears on the application of Linux. Probably the best material will be this document :-), in which it is planned to cover many administrative and technical issues.

However, at the moment the document is just being created and information can be found in different sources. It is impossible to ignore the materials of Valery V. Kachurov, Artem Nesov “Analogs of Windows programs in Linux - correspondence table.” (http://linuxshop.ru/linuxbegin/win-lin-soft/). This resource contains a lot of valuable information. Unfortunately, the authors seem to have abandoned this work. This section of the site is not regularly accessible, but a copy of the table can be found at http://www.blif.net/modules.php?name=LinWin. I can recommend the resources of the Open Source Applications Foundation
(http://www.osafoundation.org/), especially http://www.osafoundation.org/desktop-linux-overview.pdf.

The result is:

Creation of prototype workstations;

Calculation of the cost of licenses for the proposed software;

User training;

Creation of an approximate implementation calendar plan;

List of risks when implementing free software;

Support for free solutions;

Calculation of the economic efficiency of the new system for up to five years.

pilot project (test in action)

Due to the large number of factors in a legacy system and the large number of users potentially impacted by migration, it is recommended that migration be experimented on a small scale - a pilot project. This stage is necessary to test and adjust the migration plans and prototype of the new system. It is the pilot project that is the basis for making a decision to implement a new information system based on open source software. The second value of the pilot project is the opportunity to inform users about the new system and receive feedback from users. The third argument in favor of a pilot project will be the opportunity to experimentally determine the cost of the project more accurately.

When choosing a test object, it is necessary to find the golden mean. First, the selected object must provide reliable data for the assessment. Secondly, the pilot project should not have a critical impact on business.

At this stage, system administrators and end users are also trained: a prototype of teaching materials, documentation, and Internet resources are provided. It is recommended that users who participate in the pilot project be “unloaded” and given the opportunity to use part of their time to master the new system.

The experimental stage is especially in demand:

If the ability to migrate users from the legacy system to the new system has not been proven;

If there is user skepticism that could slow down the migration process;

If the organization lacks a corporate culture (which, unfortunately, is common in the CIS);

If resources for a large-scale migration are limited;

If the organization is large and the pilot project involves relatively few users;

If legacy proprietary systems have been evolving rapidly, both technically and in terms of cost reduction;

If the economic effect of migration is not fully understood.

To succeed, a pilot project:

Should not relate to a critical area of the business;

Must be sufficiently important to the business;

Should not demand excessive resources from people who are already short of time;

Must have a significant support team;

Must provide feedback channels for users (Help Desk systems);

Should not fall in an otherwise constrained period (for example, a time of year critical to the business).

Depending on the size of the organization, one or more experimental stages are possible in order to determine the timelines and cost of migration to free software more accurately. It is important to evaluate the results and determine whether the vision and goals are practical or whether they should be changed or perhaps even abandoned. Data from these pilot projects should be used to adjust plans and calculate final costs. During the experiment, the following points must be clarified:

Description of the workstation prototype;

Description of user-specific settings;

Average deployment costs for station types;

Transferring data from the legacy system to the new one;

User training;

Calculating the cost of software implementation;

Support for free solutions.

planning (what and how)

1. Create a migration plan. The migration plan should answer the questions:

Description of the phases of system construction;

Identification of support needs;

Description of the migration completion.

In fact, the final choice is made on how the migration will take place. The cost estimate for migration is approved.

2. Description of the phases of building the system. An assessment must be made of the phases of system construction that best support the priorities of the user needs.

The plan should answer the following questions:

To what extent and in what stages should the system be installed and deployed to best meet user needs?

What is required for each phase of migration to a new system from the organization's clients and system users?

What will be the impact and risk of using the system at each incremental stage?

The stages of system deployment should be clearly outlined in the migration plan; clients, developers, and users should be aware. Risk assessments must be completed before finalizing the system build plan. It is necessary to ensure that the planning estimates are reasonable, the approach is well designed in accordance with the organization's priorities, and the potential impact on clients and users is acceptable.

3. Determine support needs. The optimal level of support must be provided to help users use the new system. In addition, users often require support to help them understand the general capabilities of the target system.

Issues that need to be addressed include:

What kind of training and operational assistance will users require?

What is the overall level of migration support that users will require to ensure a successful migration?

How to achieve user acceptance of the target system and avoid user resistance to the implementation of a new system?

How will inevitable changes in system features and services be communicated to customers and users?

Is it possible for a free solution to be supported by an organization's IT department or is outsourcing the best option?

The migration plan should focus on addressing these issues, planning to support users in the following areas:

Trouble reporting system;

Technical assistance services for new systems;

Technical assistance to users migrating to new systems;

User manuals for transition and subsequent periods;

Training for users to learn and adapt to the new system, to perform the same types of tasks;

Ability to test the use of new systems;

Demonstrations of the use of new systems to show existing users of legacy systems how the new system works and how they can perform comparable tasks;

Overcoming current operational problems.

During the user education phase, pay special attention to those who are adherents of the old system and/or opponents of new systems.

4. Description of the completion of the migration. At the end of each new deployment and training phase, developers and implementers must ensure that users migrate to the new target system as smoothly as possible. The migration plan should make provisions to expedite the migration effort and to remove the legacy system as soon as possible.

Consideration must be given to the additional effort that may be needed for late adopters and other users who experience unexpected problems. Another aspect of this activity is assessing the time and cost to complete the migration of all users to the new system and to remove older systems built on unlicensed proprietary software.

Organizational leadership, developers and implementers should consider incorporating several approaches to help migrate users of legacy systems:

Communicate to each user group how and when they should migrate their tasks to new systems, and how the workload will change during the migration period from legacy systems;

Establish incentives to fully migrate to a new free system and eliminate dependencies on legacy systems;

Provide assistance (software and additional personnel) to convert legacy data to the new system;

Define the procedure for decommissioning legacy systems;

Archiving data from legacy systems and storing them.

migration (we do)

All that remains at the last stage is to work according to the plan.

Actively manage and control migration processes:

Establish measurement criteria and track migration stages and migration resource costs;

Conduct periodic reviews of the situation and communicate them, in accordance with authority and organizational policy, to interested people (management, project managers and sponsors);

Establish a tracking system to manage process progress, issues, solutions, and other business issues that relate to migration planning and execution.

Vadim Mashkov, UA-FOSS, [email protected]

The problems of database version control and migration between versions have been raised more than once both on Habré and elsewhere on the Internet (mainly in English).

In the first section of this article, I discuss the main problems that arise in programming teams when making any changes to the database structure. In the second section, I tried to highlight the main general approaches to how changes to the database structure can be stored and maintained during development.

Terminology

Database - a collection of all database objects (tables, procedures, triggers, etc.), static data (immutable data stored in lookup tables) and user data (which changes while working with the application).

Database structure - a collection of all database objects and static data. User data is not included in the concept of database structure.

Database version - a certain state of the database structure. Typically a version has a number associated with the version number of the application.

Migration, in this context, is an update of the database structure from one version to another (usually a newer one).

It is in this sense that the term migration is used in many sources (especially the migrations from the Active Record gem included in Ruby on Rails). However, the term is ambiguous: a person who does not know the context is more likely to think we are talking about transferring a database from one DBMS to another (MySQL => Oracle), or even about migrating processes/data between cluster nodes. Therefore, in cases where the context is not obvious, I suggest using a more precise term: versioned database migration.

Why is this necessary?

Developers who have already encountered the problem of database and application versions being out of sync can skip this section. Here I will remind you why you need to maintain parity between application and database versions and what a common problem this creates.

The database version must match the application version

So, imagine the following situation: a team of several programmers is developing an application that actively uses a database. From time to time, an application is released into production - for example, it is a website that is deployed to a web server.
Any programmer, in the process of writing application code, may need to change the structure of the database, as well as the data stored in it. A simple example: suppose one of the tables has a non-nullable string field. This field does not always contain data, in which case an empty string is stored there. At some point, you decide that storing empty strings is semantically incorrect in some cases, and that it is correct to store NULLs. To implement this, you will need the following steps:

1. Change the field type to nullable:

ALTER TABLE myTable CHANGE COLUMN myField myField VARCHAR(255) NULL DEFAULT NULL;

2. Since this table in the production database probably already contains empty strings, you make a judgment call and interpret them as a lack of information. Therefore, you will need to replace them with NULL:

UPDATE myTable SET myField = NULL WHERE myField = '';

3. Change the application code so that when receiving data stored in this field from the database, it responds adequately to NULLs. Now you also need to write NULLs to this field instead of empty strings.

From point 3 you can see that the application and the database are inseparable parts of one whole. This means that when a new version of the application is delivered to production, it is necessary to update the database version, otherwise the application simply will not be able to work correctly. In this example, if only the application is updated to the new version, then at some point a NULL will be inserted into a non-nullable field, and this is an obvious error.

Thus, updating the application version requires correct database version migration.

Is it that simple?

Once you realize that version parity between the database and the application is necessary, you need to make sure that database migrations to the correct version will always be performed correctly. But what is the problem here? After all, at first glance, there is nothing complicated here!

Here again we turn to a living example. Let's say that programmers, during the development process, record their changes to the structure and data of the database in a separate file in the form of SQL queries (both DDL and DML queries). And with each deployment of the latest version of the application, you simultaneously update the database to the latest version, running queries from that same SQL file... But wait, from which version are you updating the database to the latest version? "From the past"? But do you remember so well what exactly the previous version was (it was released 2 months ago)? If not, how are you going to update it? After all, without accurate information about the state of the structure and data, it is impossible to perform a correct migration: if you unintentionally execute queries that have already been executed, this can lead to data loss or violation of its integrity.
A simple example is replacing passwords with their MD5 sums. If you run such a request again, the data can only be restored from a backup. And in general, any UPDATEs, DELETEs, and even INSERTs performed repeatedly can lead to extremely undesirable consequences. Not to mention untimely TRUNCATEs and DROPs (although such cases are much less likely).
By the way, from this point of view, skipping queries (under-execution) is no less dangerous for the application than running them twice.
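To make the danger concrete, here is a minimal sketch of the password example above; the users table and password column are illustrative assumptions. Run once, it hashes the passwords; run a second time, it hashes the hashes, and the original values can only be recovered from a backup.

-- Dangerous when repeated: the second run replaces each hash with the hash of the hash.
UPDATE users SET password = MD5(password);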

Thus, we can conclude that during the version migration process all queries must be executed exactly once and in the correct sequence. Order matters because some changes may depend on others (as in the nullable-field example).

General principles of version migration

In the previous section, we highlighted important criteria that show what is required from the version migration process. This:
  • one-time execution of each change (SQL query);
  • strictly predetermined order of changes.
Now let’s highlight more practical criteria in order to understand what to require from the process of creating and storing migrations. In my opinion, for most development teams it will be important to:
  • so that any version of the database can be updated to any (usually the latest) version;
  • so that a set of SQL queries that implement migration between any two versions can be obtained as quickly and easily as possible;
  • so that you can always create a database from scratch with the structure of the latest version. This is very useful both during development and testing, and when deploying a new production server;
  • so that, in the case of working on different branches, during their subsequent merging, manual editing of database files is minimized;
  • so that rolling back the database to an earlier version is as easy as updating to a newer one.

A baseline for migrations

As it turns out, most approaches share a common principle: they need a baseline, a reference state of the database from which you can build. This concept is described quite well in the article "Versioning Databases - The Baseline" by Scott Allen.

Simply put, the baseline is a dump of the database structure for the version that is considered the base one. With a baseline in hand, you can always create the database from scratch. After applying all the migrations created during development to this database, we obtain a database with the structure of the latest version.

Incremental change method

This method is well described in the article “Versioning Databases – Change Scripts” by the same Scott Allen. A similar approach is also described in the article “Managing SQL scripts and continuous integration” by Michael Balon.

File structure

An example of what a folder with migration files might look like in this case:
Database
|- Baseline.sql
|- 0001.03.01.sql
|- 0002.03.01.sql
|- 0003.03.01.sql
|- 0004.03.02.sql
|- 0005.03.02.sql
|- 0006.03.02.sql
`- 0007.03.02.sql

In this example, the folder stores all the files created during the development of version 03. However, the folder can be common to all versions of the application.

In any case, the very first file to appear in such a folder is the baseline (Baseline.sql). After that, any change to the database is committed to the repository as a new migration file with a name like [file number].[version].[subversion].sql.

In fact, in this example the file name contains the full version number of the database. That is, after executing the migration file named 0006.03.02.sql, the database will be updated from the state corresponding to version 03.02.0005 to version 03.02.0006.

Storing version history

The next step is to add a special table to the database, which will store the history of all changes in the database.
CREATE TABLE MigrationHistory (
    Id INT,
    MajorVersion VARCHAR(2),
    MinorVersion VARCHAR(2),
    FileNumber VARCHAR(4),
    Comment VARCHAR(255),
    DateApplied DATETIME,
    PRIMARY KEY (Id)
);



This is just an example of what the table might look like. If necessary, it can be both simplified and supplemented.

In the Baseline.sql file, you will need to add the first record to this table:

INSERT INTO
MigrationHistory (MajorVersion, MinorVersion, FileNumber, Comment, DateApplied)
VALUES ('03', '01', '0000', 'Baseline', NOW());


After each migration file is completed, a record with all the migration data will be entered into this table.
The current database version can be obtained from the record with the maximum date.
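As a minimal sketch, assuming the MigrationHistory table defined above, the current version can be read with a single query:

-- Read the most recently applied migration; its fields give the current DB version.
SELECT MajorVersion, MinorVersion, FileNumber
FROM MigrationHistory
ORDER BY DateApplied DESC
LIMIT 1;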

Automatically perform migrations

The finishing touch to this approach is a program/script that will update the database from the current version to the latest.

Performing database migration automatically is quite simple: the number of the last completed migration can be obtained from the MigrationHistory table, and after that all that remains is to apply all files with higher numbers. Files can be sorted by number, so there will be no problems with the order in which migrations are performed.

Such a script is also responsible for adding records of completed migrations to the MigrationHistory table.

As an additional convenience, such a script can be able to create the current version of the database from scratch, first rolling onto the database base and then performing the standard migration operation to the latest version.

Pros, cons, conclusions

Pros:

Quick and easy migration to the latest version;

A version numbering mechanism: the current version number is stored directly in the database.

Cons:

Migration automation tools are needed for maximum convenience;

It is inconvenient to add comments to the database structure. If you add them to Baseline.sql, they will disappear in the next version, because the baseline will be generated from scratch again as a dump of the new version of the structure; besides, such comments quickly become outdated;

Problems arise during parallel development in several branches of the repository. Since the numbering of migration files is sequential, files with different DDL/DML queries may appear under the same numbers in different branches. As a result, when merging branches, you will have to either manually edit the files and their sequence, or start the new, merged branch with a fresh Baseline.sql that takes into account changes from both branches.

This method is quite widespread in various forms. In addition, it can be easily simplified and modified to suit the needs of the project.
On the Internet you can find ready-made scripts for incremental migrations and embed them into your project.

Idempotent change method

This method is described in the article "Bulletproof Sql Change Scripts Using INFORMATION_SCHEMA Views" by Phil Haack. A similar approach is also described in an answer to a related question on StackOverflow.

Idempotency here refers to the property of an operation to produce the same result no matter how many times it is applied: the first execution makes the change, and all repeated executions leave the database unchanged.

The main idea of this approach is to write migration files in such a way that they can be executed on the database more than once: the first time any of the SQL commands is executed, the changes are applied; on all subsequent attempts nothing happens.

The easiest way to understand this idea is with an example. Let's say you need to add a new table to the database. If you want the query not to throw an error when the table already exists, MySQL has a short syntax for this purpose:

CREATE TABLE IF NOT EXISTS myTable (
    id INT(10) NOT NULL,
    PRIMARY KEY (id)
);

Thanks to the IF NOT EXISTS keyword, MySQL will only try to create a table if a table with that name does not already exist. However, this syntax is not available in all DBMSs; Moreover, even in MySQL it cannot be used for all commands. Therefore, let's consider a more universal method:
IF NOT EXISTS (
    SELECT *
    FROM information_schema.tables
    WHERE table_name = 'myTable'
    AND table_schema = 'myDb'
) THEN
    CREATE TABLE myTable (
        id INT(10) NOT NULL,
        myField VARCHAR(255) NULL,
        PRIMARY KEY (id)
    );
END IF;

In the last example, the conditional expression parameter is a query that checks whether a table myTable exists in a database named myDb. And only if the table is missing will it actually be created. Thus, the above query is idempotent.

It's worth noting that MySQL allows conditional statements only inside stored programs, so such guarded DDL queries cannot be run directly. This restriction is easy to work around: just wrap all such queries in the body of a temporary stored procedure:

DELIMITER $$

CREATE PROCEDURE sp_tmp()
BEGIN

IF NOT EXISTS
--
-- Condition.
--
THEN
--
-- A query that changes the structure of the database.
--
END IF;

END$$

DELIMITER ;

CALL sp_tmp();

DROP PROCEDURE sp_tmp;

What exactly is information_schema?

Complete information about the database structure can be obtained from special system tables located in the database named information_schema. This database and its tables are part of the SQL-92 standard, so this method can be used on any of the modern DBMSs. The previous example uses the information_schema.tables table, which stores information about all the tables. In a similar way, you can check the existence and metadata of table fields, stored procedures, triggers, schemas, and, in fact, any other objects in the database structure.
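As a hedged illustration of checking column metadata, a guarded column addition might look like the sketch below; the myDb/myTable/newfield names are reused from the running example, and, as noted above, in MySQL this block has to be wrapped in a stored procedure.

-- Add the column only if information_schema shows it does not exist yet.
IF NOT EXISTS (
    SELECT *
    FROM information_schema.columns
    WHERE table_schema = 'myDb'
    AND table_name = 'myTable'
    AND column_name = 'newfield'
) THEN
    ALTER TABLE myTable ADD COLUMN newfield INT(4) NOT NULL;
END IF;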

A complete list of tables with detailed information about their purpose can be found in the text of the standard. A short list can be seen in the above-mentioned article by Phil Haack. But the easiest way, of course, is simply to open this database on any working database server and look around.

Usage example

So, you know how to create idempotent SQL queries. Now let's look at how this approach can be used in practice.

An example of what a folder with sql files might look like in this case:

Database
|- 3.01
|  |- Baseline.sql
|  `- Changes.sql
`- 3.02
   |- Baseline.sql
   `- Changes.sql

This example creates a separate folder for each minor version of the database. Whenever a new folder is created, a baseline is generated and written to Baseline.sql. Then, during development, all necessary changes are written to the Changes.sql file in the form of idempotent queries.

Let’s assume that during the development process, at different times, programmers needed the following changes to the database:
a) create a table myTable;
b) add newfield to it;
c) add some data to the table myTable.

All three changes are written so that they cannot be applied twice. As a result, no matter what intermediate state the database is in, executing the Changes.sql file will always migrate it to the latest version.

For example, one of the developers created a table myTable on his local copy of the database, wrote the change a) to the Changes.sql file stored in the common code repository, and forgot about it for some time. Now if he executes this file on his local DB, change a) will be ignored and changes b) and c) will be applied.
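For instance, change (c) can be made idempotent with MySQL's INSERT IGNORE, which relies on the primary key to skip a row that was already inserted; the concrete values below are illustrative assumptions.

-- Idempotent data change: the row is inserted once; repeated runs are ignored.
INSERT IGNORE INTO myTable (id, myField, newfield)
VALUES (1, 'initial value', 0);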

Pros, cons, conclusions

Pros:

Very convenient migration from any intermediate version to the latest: you just need to execute one file on the database (Changes.sql).

Cons:

Situations are potentially possible in which data will be lost, and this has to be watched for. An example is dropping a table and then creating another table with the same name: if only the name is checked before dropping, both operations (dropping and creating) will occur every time the script is executed, even though they have already been performed;

Making changes idempotent takes more time (and code) to write.

Due to the fact that updating the database to the latest version is very simple and can be done manually, this method works well if you have many production servers and need to update them frequently.

Method of treating the database structure as source code

Unfortunately, I did not find any separate articles devoted to this approach. I would be grateful for links to existing ones, if any. UPD: in his article, Absent describes his experience of implementing a similar approach using a home-grown diff utility.

The main idea of ​​this method is reflected in the title: the database structure is the same source code as PHP, C# or HTML code. Therefore, instead of storing migration files (with queries that change the database structure) in the code repository, you only need to store the actual database structure - in declarative form.

Implementation example

For simplicity of the example, we will assume that each revision of the repository always contains only one SQL file: CreateDatabase.sql. As an aside, by analogy with source code, you could go even further and store the structure of each database object in a separate file. The structure can also be stored as XML or in other formats supported by your DBMS.

The CreateDatabase.sql file will store the CREATE TABLE, CREATE PROCEDURE, etc. commands that create the entire database from scratch. If changes to a table structure are necessary, these changes are made directly to the existing DDL queries that create the tables. The same goes for changes to stored procedures, triggers, and so on.

For example, the current version of the repository already has a table myTable, and in the CreateDatabase.sql file it looks like this:

CREATE TABLE myTable (
    id INT(10) NOT NULL,
    myField VARCHAR(255) NULL,
    PRIMARY KEY (id)
);

If you need to add a new field to this table, you simply add it to the existing DDL query:
CREATE TABLE myTable (
    id INT(10) NOT NULL,
    myField VARCHAR(255) NULL,
    newfield INT(4) NOT NULL,
    PRIMARY KEY (id)
);

After this, the modified sql file is submitted to the code repository.

Performing migrations between versions

In this method, the procedure for updating the database to a newer version is not as straightforward as in the other methods. Since only a declarative description of the structure is stored for each version, the difference for each migration has to be generated in the form of ALTER, DROP and CREATE queries. Automatic diff utilities help with this, such as the Schema Synchronization Tool included in SQLyog, or TOAD, which is available for many DBMSs.
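As a minimal sketch, for the newfield addition in the running example, the output of such a diff utility could look like the statement below (the exact statement a given tool generates may differ):

-- Generated difference between the two revisions of CreateDatabase.sql above.
ALTER TABLE myTable ADD COLUMN newfield INT(4) NOT NULL;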

Development of general requirements for data migration

As part of developing overall data migration requirements, the business analyst should define a data migration strategy. Based on the needs of the business and the operating mode of the customer organization, “big bang” migration or smooth migration should be chosen.

When choosing a "big bang" strategy, it is necessary to determine how much downtime the business can tolerate during data migration. Once this window is known, a data migration schedule can be created. When scheduling data migration, a reserve day or days should be allocated for re-migration in the event of serious errors.

When choosing a smooth migration strategy, all business users of the target system should be divided into groups for which a schedule should be drawn up for switching to working with the target system. Once business users are divided into groups, their corresponding data sets and entities must be identified. When distributing business users into groups, the relationships of these groups within the business process automated in the target system must be taken into account.

In parallel with the choice of data migration strategy, the project management and/or business analyst should identify typical data migration risks. For each identified risk, a strategy (methods) for dealing with the risk must be selected and a set of measures to prevent the risk must be developed. The risk prevention strategy may be one of the following:

  • refusal from excessively risky activities (refusal method);
  • prevention or diversification (reduction method);
  • outsourcing of costly risk functions (transfer method);
  • formation of reserves or stocks (acceptance method).

Table 1 shows a matrix of typical data migration risks and ways to deal with each of these risks. The matrix cells provide examples of activities to deal with each risk within each risk prevention method.

When compiling the table, it was assumed that the refusal strategy was not acceptable for the project team, since a prerequisite for the study was the need to perform the migration work; thus, such work cannot be fully or partially excluded from the project scope.

Table 1. Matrix of typical data migration risks and possible strategies for dealing with them at the migration stage

Risks of writing the technical specification:
  • Reduction: elaboration of migration requirements together with the business, the owners and the consumers of the migrated data;
  • Transfer: involvement of consultants in the preparation of the technical specification;
  • Acceptance: formation of a budget and time reserve on the project for handling potential change requests.

Testing risks:
  • Reduction: development of test plans and scenarios with the involvement of business users;
  • Transfer: handing the testing work package over to a separate team with predetermined conditions for accepting its results;
  • Acceptance: formation of a time and budget reserve for potential additional work due to incomplete testing of the migration software and the loaded data.

Risks of the data extraction and loading process:
  • Reduction: elaboration of requirements for a test unload; performing a "test" migration using the test upload;
  • Acceptance: formation of a budget and time reserve for repeated and iterative loads.

Risks associated with the work of the project team.

Risks of placing data in the target system:
  • Reduction: parallel development of the "as is" and "to be" data models to level out differences in formats and methods of working with data in the source and the receiver;
  • Acceptance: formation of a time and budget reserve for improving the functionality of the receiving system.

Organization of planning of data migration work

The input to the migration work plans is the data migration strategy formulated as part of the development of general data migration requirements.

The result of planning work on data migration should be a set of design artifacts that record the sequence of steps to implement data migration. The outputs of the planning stage are the following project artifacts: a migration work plan and a communication plan for the data migration stage.

The migration work plan defines a set of tasks and those responsible for their implementation, distributed using a RACI matrix. The project plan for migration work must contain packages of tasks and work for each stage of migration, from the preparatory to the post-migration stage. Taking into account the results of the analysis of project experience, a WBS was developed that mitigates the identified risks and corresponds to "reference" theoretical approaches to organizing data migration.

The work structure shown in Fig. 3 covers all stages of migration and can be traced to the lines of the project work plan. As part of the project work plan, the deadline for completing each task must be determined and the outputs of each task must be indicated. When drawing up the work plan, a completion condition must be defined for each task, the fulfillment of which unambiguously determines that the task is done.

An equally important design artifact of the data migration phase is the communications plan. The communications plan at the migration stage determines the goals and types of interaction between project team members, as well as between project team members and customer employees - business users.

The preparation of the communications plan for the migration phase should be based on the previously drawn up work plan. For each task within the project plan, the following attributes must be defined:

  • 1. Subject (topic) of communication.
  • 2. The need for interaction between role groups within the team: who should participate in the communication, and which role groups?
  • 3. What type of interaction is acceptable?
  • 4. What should be the output of the communication?
  • 5. The need for interaction between the project team and business users.
  • 6. Who should participate in the communication from the project team and from the business side?
  • 7. What type of interaction with business users is acceptable?
  • 8. What should be the output of communication with business users?
  • 9. Who should be made aware of the results of the communication?

A communication plan template for the data migration phase is shown in Table 2 below.

Table 2. Template for the communication plan during the data migration phase

  • Communication topic: the main issue or task for which the communication is needed, the relevant section of the migration requirements, and the data objects concerned;
  • Participants from the project team: the role groups and roles involved;
  • Need to interact with the business: a yes/no flag;
  • Business participants: the functional unit and the responsible employee;
  • Interaction type: the type of interaction, for example written, meeting, or workshop.

Develop and document business requirements for data migration

Developing and documenting data migration requirements should be carried out in several stages. As part of the developed methodology, the stage of developing business requirements for data migration will also include the step of conducting and documenting the results of a survey of the customer organization. The purpose of the development stage of business requirements for data migration is to develop a complete package of requirements for specialized software for data migration, as well as for the composition of the data migrated to the target system.

1. Survey of the operating information system

Chronologically, the first stage of work within the development of data migration requirements should be a survey of the operating information system.

A survey of the existing information system as part of developing requirements for data migration can be carried out as part of the business survey of the entire IS implementation project. The survey should be aimed at creating an "as is" data model, which will make it possible to prepare requirements for the composition of the data being migrated from the source system to the destination system.

The "as is" data model describes all data objects that are used in the operating system. After compiling such a model, it will be easier for the customer to select a list of data objects for transfer to the receiving system. The model should name the entities, describe their attribute composition, and describe the relationships between entities, indicating their cardinality.

In addition to the data model of the operating system, at the survey stage, business process diagrams must be drawn up that use the data objects selected for transfer to the target system. Business process models will allow you to identify missing data objects if they were missed when analyzing the data model.

In addition to compiling an “as is” data model, at the survey stage the rules for working with dictionaries and reference books in the operating system must be described. In particular, the following points must be described for each dictionary:

  • - Name of the dictionary or reference book;
  • - Functional unit responsible for administering the dictionary;
  • - Availability of uniqueness checks for the codes and values of dictionary entries;
  • - Availability of rules for regularly updating entries in dictionaries;
  • - Description of the rules for working with dictionary entries that have lost their relevance;
  • - If data migration is carried out from several sources, whether dictionaries are duplicated across the source systems; if there are duplicate dictionaries, a description of the deduplication and record mapping rules for each dictionary is required (a sketch of such a duplicate check is given below the list).
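As a hedged sketch of such a check (the table and column names dict_currency_src1/dict_currency_src2, entry_code and entry_value are illustrative assumptions, not names from the project), conflicting entries that share a code across two source systems can be found with a join:

-- Entries with the same code but different values in two source systems:
-- candidates for deduplication and mapping rules.
SELECT a.entry_code, a.entry_value AS value_src1, b.entry_value AS value_src2
FROM dict_currency_src1 a
JOIN dict_currency_src2 b ON a.entry_code = b.entry_code
WHERE a.entry_value <> b.entry_value;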

The format of the specification of requirements for migrating dictionaries from the source system to the target system, together with an example of how to fill it out, is given in the corresponding table.

2. Development of data profile requirements for migration

The development of data profile requirements for migration should involve key business user groups in accordance with the communications plan developed during data migration planning. A document template and a completed example for recording the survey results are given in Table 4.

Business users are the source for answering the following data profile questions for data migration:

  • 1) Will the migrated data be input to the business process automated in the designed system? The answer to this question should be a table describing the correspondence between data objects of the source system and business processes automated in the target system.
  • 2) What time period should be determined for data migration? In this case, the provisions of regulatory and organizational and administrative documents must be taken into account. The answer to this question should be a table of correspondence between the data object and the time horizon for downloading data from the source system.
  • 3) Which data object of the target system corresponds to each of the data objects of the source system selected for migration? The answer to this question should be a mapping table of data objects of the source system and the destination system.
  • 4) Are there fields or a group of attributes in the attribute composition of data objects of the source system that will not be used in the target system? The answer to this question should be a table describing the attribute composition of data objects with a flag set: used/not used in the target system.
  • 5) Are there fields or groups of attributes in the attribute composition of the data objects of the source system that do not correspond in data type to the fields in the target system? The answer to this question should be a table describing the attribute composition of data objects with a flag and a decoding of inconsistencies.
  • 6) Has there been an upgrade to the source system that has resulted in a significant change in the structure of the data that needs to be migrated? Examples of changes:
    • - Changing the composition of mandatory attributes;
    • - Changing the rules for storing data objects;
    • - Significant change in the structure of a data object, leading to a change in data volumes.

The answer to this question should be a description of the data objects of the source system with a description of the features of working with the data object during the time period for which the data is migrated.

  • 7) Which functional unit or which specific users own the data? The answer to this question should be a table of correspondence between data objects and customer employees.

IT department staff are a resource for answering the following questions about the data profile for migration:

  • 1) Where is the data selected by business users for migration to the target system physically stored? What is the source of the data: an enterprise application, system, or sources outside the customer organization? The answer to this question should be a table of correspondence between data objects and their storage (name and type of database, name of table/tables in the database).
  • 2) Does the meta-information of the migrated content allow us to develop migration algorithms? Is it possible to analyze the data structure based on metainformation?
  • 3) Are there technological limitations of the source system that affect the data structure, forms of storage and work with data? The answer to this question should be a complete description of the technological limitations of the source system in the context of data objects for migration.

IT staff and business users must work together to assess the severity of potential errors when moving data from the source system to the target system. In particular, business users and IT department employees should analyze the relationships of data objects at the level of the database being used to determine the list of possible data integrity checks when transferred to the target system.

The development of requirements for a data profile for migration should be completed by receiving a test download of data from the source system(s) in accordance with the developed data profile. The resulting test download will be used to debug the migration software and evaluate the quality of the data provided.

3. Developing requirements for a migration utility

1) Requirements for data cleaning and logical checks

After developing the requirements for the data profile for migration, it is necessary to develop rules for cleaning or transforming data downloaded from source systems so that they are correctly placed in the target system. For this purpose, it is necessary to correlate the business requirements for the data model of the system being designed and the description of the data model of the operating system.

For the purpose of determining the level of data quality, the following types of logical checks must be applied within each migrated data object (a sketch in SQL is given after the list):

  • - Are all required attributes filled in?
  • - Is the physical data type of each attribute the same in the source and destination systems?
  • - Do the attribute value lengths in the data object satisfy the requirements in the target system?
  • - Does the data storage format (dates, decimal numbers) meet the requirements in the target system?
  • - Do the identifiers of data objects in the source system allow them to be clearly distinguished?
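A minimal sketch of such checks, assuming a hypothetical staging table stg_contract with a contract_number attribute limited to 50 characters in the target system (all names and the limit are illustrative assumptions):

-- Required attribute filled in?
SELECT id FROM stg_contract
WHERE contract_number IS NULL OR contract_number = '';

-- Attribute length within the target system's limit?
SELECT id FROM stg_contract
WHERE CHAR_LENGTH(contract_number) > 50;

-- Do source identifiers distinguish objects unambiguously (no duplicates)?
SELECT source_id, COUNT(*) AS cnt
FROM stg_contract
GROUP BY source_id
HAVING COUNT(*) > 1;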

Requirements for logical checks should be recorded in the functional specification for the data migration utility.

Identified inconsistencies in the unloaded data can be resolved in one of two ways: the data can be removed from the upload if it has no business value, or it can be converted using specific algorithms.

To place data correctly in the target system, the following types of transformation algorithms can be developed for data objects (see the sketch after the list):

  • - Missing required attributes can be filled with predefined "stub" values.
  • - Attribute values ​​whose length does not meet the requirements of the target system must be truncated.
  • - The date format of the source system must be converted to the date storage format of the destination system.
  • - The decimal numeric data storage format must be matched to the decimal numeric data storage format in the target system.
  • - Attributes whose physical data type does not match the requirements of the target system must be converted. For example, text values ​​can be cast to integer values.
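A hedged sketch of such transformations over the same hypothetical stg_contract staging table (the column names, the 50-character limit and the source date format are assumptions for illustration):

-- Fill a missing required attribute with a predefined "stub" value.
UPDATE stg_contract SET contract_number = 'N/A'
WHERE contract_number IS NULL OR contract_number = '';

-- Truncate values that exceed the target system's length limit.
UPDATE stg_contract SET note = LEFT(note, 50)
WHERE CHAR_LENGTH(note) > 50;

-- Convert the source date format (assumed DD.MM.YYYY text) to the target DATE type.
UPDATE stg_contract SET signed_on = STR_TO_DATE(signed_on_text, '%d.%m.%Y');

-- Cast a text attribute to an integer where the target type requires it.
UPDATE stg_contract SET amount_int = CAST(amount_text AS SIGNED);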

Requirements for data cleaning and transformation algorithms, as well as the types of data objects for which they will be applied, must be recorded in the functional specification for the data migration utility.

2) Requirements for naming entities in the receiving system

Requirements for the development of algorithms for naming entities in the destination system should be developed in the event that it is impossible to apply in the target system the algorithms used for naming entities in the source system(s).

Algorithms for naming and numbering data objects should be developed in the following cases (a simple illustration is given after the list):

  • - If data objects with the same business meaning are migrated from two source systems, then a common algorithm must be used in the target system to generate their names and numbers.
  • - If the source system used algorithms to generate entity numbers that are no longer relevant due to changed regulatory or organizational documents.
  • - If the source system used algorithms to generate entity numbers that do not satisfy the business rules for working with entities in the designed system.
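For the first case, a unified numbering rule might, purely as an assumed illustration, combine a source system code with a zero-padded source identifier:

-- One possible unified number for entities merged from two source systems.
SELECT CONCAT('CTR-', source_system_code, '-', LPAD(source_id, 8, '0')) AS target_number
FROM stg_contract;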

Requirements for algorithms for generating names and numbers of migrated entities must be recorded in the functional specification for the data migration utility.

3) Requirements for logging data migration

To track the status of the data migration process, the migration utility must support logging of data loading into the target system. In this case, the log of the migration utility should reflect information about all logical checks performed and the removal of data from the data set for loading into the target system.

Requirements for the migration utility log must be recorded in the functional specification for the migration utility.

The migration utility log should allow you to track each operation as part of the process of loading data into the target system.

Each data migration log entry must have at least the following set of attributes:

  • Type of the migrated data object;
  • Source system;
  • Unique identifier of the data object in the source system;
  • Unique identifier of the data object in the receiving system;
  • Date and time of loading for the data object;
  • The name/number generated by the migration utility for the data object, if a name/number generation algorithm was applied to it;
  • A flag: whether the data object was migrated to the receiving system or excluded from the loaded data;
  • The reason why the object was excluded from the loaded data (the logical check or rule that fired);
  • A description of the error that occurred during the migration of the data object, if any.

The Error Description attribute displays a parameterized message from the data migration utility if the migration utility was unable to load a data object into the target system.

If the "excluded from the migrated data" flag is set for a data object, the "Reason for exclusion" attribute should display a parameterized message describing the data cleaning algorithm according to which the object was removed from the migrated data.

A fragment of the migration utility log is given in Appendix 4.
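A minimal sketch of a log table covering the attribute set above; the names and types are assumptions for illustration, not the utility's actual schema:

CREATE TABLE MigrationLog (
    Id BIGINT NOT NULL AUTO_INCREMENT,
    ObjectType VARCHAR(64) NOT NULL,       -- type of the migrated data object
    SourceSystem VARCHAR(64) NOT NULL,
    SourceObjectId VARCHAR(128) NOT NULL,  -- unique identifier in the source system
    TargetObjectId VARCHAR(128) NULL,      -- unique identifier in the receiving system
    LoadedAt DATETIME NOT NULL,
    GeneratedName VARCHAR(255) NULL,       -- name/number generated by the utility
    IsMigrated TINYINT(1) NOT NULL,        -- 1 = loaded, 0 = excluded
    ExclusionReason VARCHAR(255) NULL,     -- the check or rule that excluded the object
    ErrorDescription VARCHAR(1024) NULL,
    PRIMARY KEY (Id)
);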

Conducting testing as part of data migration work

Testing work consists of the following work packages:

1. Initial testing of the migration software for compliance with the technical specification;
2. Business testing of the migration software for compliance with the business requirements for data transformation algorithms;
3. Test cleaning and loading of data into the receiving system;
4. Testing the functionality of the receiving system after the migrated data has been placed in it.

Below is a sequence of steps that must be taken by the project team to conduct each of the testing phases.

Initial testing of the migration software should identify and eliminate technical failures and defects in the developed migration utility. Such testing should be carried out by a team of testers in accordance with the developed test cases. If errors are found and the software is modified, regression testing of the utility is mandatory.

Business testing of migration software for compliance with business requirements and data transformation algorithms should be performed using the migration utility log to ensure completeness of checks and tests. The procedure for business testing can be as follows:

1. From the migration utility log, determine the data objects that were ignored during loading.
2. Make sure that these objects satisfied, in the source system, the conditions for applying the logical checks or transformation mechanisms.
3. If a discrepancy is detected, record a defect against the migration utility.

Errors found during testing must be recorded in such a way that the source of their occurrence, i.e., the algorithm or rule that worked incorrectly, can be determined unambiguously.

Automated testing tools can be used for business testing of the migration utility. If they are not available, the set of unit tests should cover all types of data objects being migrated to the target system, as well as all checks and transformation mechanisms designed for each type.
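The cross-check described above can be sketched as follows (Python; the rule names, record fields, and source-data access are hypothetical, and a real implementation would re-evaluate the project's actual cleaning rules):

```python
# Sketch of the business-testing step: for every object the migration log
# marks as excluded, verify against the source data that the recorded
# cleaning rule really applied; any discrepancy is a defect.

def rule_applies(rule: str, source_obj: dict) -> bool:
    """Re-evaluate a (hypothetical) cleaning rule against the source object."""
    rules = {
        "empty_mandatory_field": lambda o: not o.get("doc_no"),
        "duplicate_source_id": lambda o: o.get("is_duplicate", False),
    }
    return rules[rule](source_obj)

def check_excluded_objects(log_entries, source_objects_by_id):
    """Return log entries whose exclusion is NOT confirmed by the source data."""
    defects = []
    for entry in log_entries:
        if entry["migrated"]:
            continue  # only excluded objects are of interest here
        source_obj = source_objects_by_id[entry["source_id"]]
        if not rule_applies(entry["exclusion_reason"], source_obj):
            defects.append(entry)  # record a defect against the utility
    return defects
```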

Test cleaning and loading involves working with a test unload, i.e., data obtained from the source system after the package of requirements for the profile of migrated data has been developed. The test unload serves as a tool for checking the following characteristics of the migration software:

· Operation of business algorithms, logical checks, and conversion mechanisms on real data;
· Performance of the migration software on data volumes close to production.

Working with the test unload is the final stage of testing the migration software. After the migration utility has processed the test unload, it becomes possible to estimate the planned time required to load the entire array of migrated data. Successful placement of the test unload in the receiving system should mean internal acceptance of the migration utility and the ability to proceed to loading the production unload, i.e., the full extract from the source system into the target system.
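As an illustration, the planned loading time can be estimated by simple extrapolation from the throughput measured on the test unload (all figures below are made-up examples, and a safety margin is advisable because throughput rarely scales linearly):

```python
# Back-of-the-envelope estimate of the production load window.

test_objects = 50_000            # objects in the test unload
test_duration_h = 2.5            # measured time to process the test unload
production_objects = 1_200_000   # expected size of the full unload

throughput = test_objects / test_duration_h        # objects per hour
estimated_hours = production_objects / throughput  # naive extrapolation
planned_hours = estimated_hours * 1.3              # 30% safety margin (assumption)

print(f"Estimated: {estimated_hours:.1f} h, planned: {planned_hours:.1f} h")
```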

Test cleaning and loading of data into the receiving system involves performing all operations necessary to migrate data to the target system, including testing the functionality of the receiving system. This testing stage is described in more detail below.

Testing the functionality of the receiving system should ensure that all system functions that use the migrated content are operating normally. In this case, such testing can be carried out in two stages:

1. Testing the basic operability of the receiving system;
2. Detailed testing of the functionality of the receiving system.

To check the operability of the receiving system immediately after loading is complete, a short test script must be drawn up that specifies exactly what needs to be checked, for example (an automated sketch of such a script is given after the list):

· a list of screen forms that need to be opened;
· the number of objects in directories (dictionaries) that should be visible to the user;
· the number of objects in the system registries that should be visible to the user;
· a list of system functions that must be launched (report generation, search, access to migrated data objects).
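An automated version of such a short test script might look like the sketch below (Python; the API client, resource names, screen forms, and expected counts are assumptions for illustration only):

```python
# Sketch of a post-load smoke test of the receiving system.

EXPECTED_COUNTS = {
    "dictionaries/contractors": 1_450,  # directory entries visible to the user
    "registries/contracts": 118_000,    # registry objects visible to the user
}

SCREEN_FORMS = ("contract_card", "contractor_card")

def smoke_test(client) -> list:
    """Run the short test script; return a list of detected problems."""
    problems = []
    for resource, expected in EXPECTED_COUNTS.items():
        actual = client.count(resource)  # hypothetical API call
        if actual != expected:
            problems.append(f"{resource}: expected {expected}, got {actual}")
    for form in SCREEN_FORMS:
        if not client.open_form(form):   # hypothetical API call
            problems.append(f"screen form '{form}' failed to open")
    return problems
```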

Detailed testing of the functionality of the receiving system involves running all automated business processes whose input data is the migrated data. If the loaded data includes more than one type of data object, test scripts must be provided for each migrated data type. The results of testing the functionality of the receiving system must be recorded in the test log.

In accordance with the ANSI standard, the data migration testing log should have the following structure:

1. Test case ID;
2. Description of the operation within the test case;
3. Description of any error during the execution of the operation within the test case;
4. The impact of the error on the functionality of the system;
5. The impact of the error on related testing operations.

The description of the operation within the test case must include at least the following items (a sketch of a corresponding log record is given after the list):

a) Incoming data for the operation;

b) Expected results;

c) Results obtained;

d) Date and time of the operation;

e) Number of attempts to perform the operation;

f) Project team members responsible for testing;

g) Environment of the operation within the test case.
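Combining the log structure with the mandatory operation attributes, a test-log record could be represented as in the sketch below (Python; the field names are illustrative and not taken verbatim from the standard):

```python
# Sketch of a test-log record for migration-results testing.

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TestLogRecord:
    test_case_id: str                        # 1. test case ID
    operation: str                           # 2. operation within the test case
    input_data: str                          # a) incoming data for the operation
    expected_result: str                     # b) expected results
    actual_result: str                       # c) results obtained
    performed_at: datetime                   # d) date and time of the operation
    attempts: int                            # e) number of attempts
    tester: str                              # f) responsible team member
    environment: str                         # g) environment of the operation
    error: Optional[str] = None              # 3. error description, if any
    functional_impact: Optional[str] = None  # 4. impact on system functionality
    related_impact: Optional[str] = None     # 5. impact on related operations
```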

A fragment of the functionality testing log of the receiving system is given in Appendix 5 (Appendix 5 - Migration results testing log).

Leif Poulsen for InTech

Systems for automating production and collecting information about production processes are relatively short-lived: they often have to be upgraded or replaced before the process equipment reaches the end of its life. For many companies, managing the replacement or upgrade of such automation systems without stopping production is a real challenge, so the objective need for modernization or replacement is often ignored until something breaks. This article shows how careful planning and organization make it possible to accomplish this task successfully.

Two main factors are behind the need to modernize and replace industrial automation systems and industrial IT systems: the technical degradation of these systems, as well as changes in the requirements of the business processes that these systems support.

The reliability of technical systems will decrease over time if companies ignore the need to modernize operating systems, databases, and application software. The operational risk of equipment failure increases accordingly.

Through careful planning, operational risk can be kept to an acceptable level, while also protecting investments and minimizing life cycle costs. For a typical automation or IT system, only 20-40% of the investment goes into purchasing the system. The remaining 60-80% goes to maintaining its high availability and adapting to periodically changing requirements.

In addition to assessing the activities required to prevent technical degradation, it is necessary to consider new challenges as well as potential business opportunities. The business environment is constantly changing, and all opportunities to improve existing or introduce new technologies must always be considered. Typical business opportunities that may drive the migration of high-cost automation systems are speed to market, competitiveness, growth, quality and regulatory compliance.

Long-term migration plan

Developing a long-term system migration plan allows companies to maintain systemic operational risks at an acceptable level. In addition, it ensures risk management and timely support of business goals. The migration plan must take into account restrictions such as “best manufacturing practices”, technology functionality, and inevitable production downtime.

In general, the approach to long-term planning is depicted in Fig. 1. A migration plan is developed to determine where the company wants to be in five years, what actions need to be taken to get there, and whether the resources required to get there are available. This approach is based on the architectural design principles outlined in the TOGAF standard, which is widely used in the development of system architecture for industrial enterprises.

Figure 1. General approach to creating a long-term migration plan.

It is necessary to distinguish between the existing architecture and the target, desired one. The difference between them reflects the difference between the company's current position and the position it wants to occupy in the future. The migration plan charts the path from the existing architecture to the target architecture - possibly through several transition stages.

Each architecture can be described as a series of "layers" that bridge the gap between business and technology, as shown in Fig. 1. Attention must be paid to the following "layers":

  • Business goals: part of the overall strategic planning effort; they set the right direction for the process.
  • Business model: provides the context in which production and business processes are understood. Typically, it includes a high-level description of material flows and processes.
  • Description of production and business processes: important for the successful application of technologies and for correctly assessing their value from a business point of view.
  • Information, data, and documents: important for connecting processes and applications. Interoperability and the management of information flows between applications are especially important.
  • Descriptions of applications: allow high-level requirements to be formulated and interfaces to be defined.
  • Definition of infrastructure: computing and network requirements (hardware, fault tolerance, performance).
  • Provided services: define requirements for ensuring effective operational management and decision support.

Developing a migration plan

Developing a migration plan for an entire organization, or even a single production site, can be a truly complex task involving many people. It is recommended to divide the development process into several stages, described below.

Part II

Stage 1: Mobilization

Basic goals:

  • achieve a common understanding of tasks and goals
  • mobilize the organization where the project is planned
  • detail the plan by describing the milestones and results of the project phases
  • collect all necessary/available information
  • provide a proper understanding of concepts, practices, and theory

During this stage the following activities are carried out:

  • scheduled meetings
  • a workshop dedicated to the start of the project

Results:

  • detailed consultation plan
  • common goals
  • process overview

Stage 2: Analysis

The goals of the analysis phase are:

  • analysis of business and production processes in order to:

Assess the readiness of personnel servicing IT and automation systems

Understand the data and functionality needs for the future architecture

Identify the key benefits of the future architecture in order to set goals and build the business case

  • analysis of the existing architecture in order to:

Determine the existing production processes and their connection with automation, data collection, and production management systems

Identify the existing business processes and their connections with production automation systems

Identify existing applications, data, logical and physical infrastructure, and technical support services

During this stage the following activities are carried out:

  • seminars and discussions on various processes
  • site visits to obtain contextual information
  • seminars and discussions on existing systems
  • assessing services to determine their maturity and compliance with regulatory requirements

Results:

  • identification of existing infrastructure
  • analysis documentation
  • list of ideas on the challenges and opportunities of the new architecture

Stage 3: Goal

The purpose of this stage is to identify and describe the needs formulated during the analysis stage.

The solution, or target architecture, will describe:

  • future business processes and functionality
  • target application types, with their functionality, users, information and interfaces
  • infrastructure needs and revised support standards

During this stage the following activities are carried out:

  • seminars and discussions on process improvement
  • workshops and discussions on improving architecture

Results:

  • future architecture (presentation)
  • brief description of application types

Stage 4: Justification

The purpose of the justification phase is to provide an initial business case based on rough estimates of the costs and benefits of the project.

The gap between the existing and the desired situation usually gives rise to a number of ideas. Justifying these ideas makes it possible to separate the "necessary" from the "desirable" and then to present and promote them to top management.

During this stage the following activities are carried out:

  • rough estimate of costs and benefits
  • first version of the presentation

Results:

  • common goals
  • prioritization of business ideas
  • assessment of required resources

Stage 5: Plan

The purpose of this stage is to plan the project based on priorities, resources and dependencies:

  • planning the sequence of implementation of the stages of a consolidated project
  • providing the resources and competencies needed for the next steps
  • initiation of project management activities
  • completion of consulting and transfer of results of all stages to the customer

During this stage the following activities are carried out:

  • development of an implementation plan
  • development of an investment plan
  • risk assessment

Results:

  • implementation plan
  • assessment of the workload of personnel involved in the project
  • project risk assessment
  • investment plan (as a first approximation)
  • final version of the project presentation

Case Study

The following example illustrates the application of the described approach in real conditions. To comply with confidentiality requirements, the description is kept anonymous. The company in question is a fairly large enterprise that produces active ingredients for pharmaceutical products. The production facilities were commissioned more than 20 years ago, and although some modernization has been carried out since then, a number of outdated systems require replacement. The building automation systems and the DCS come first, as they are based on outdated technologies that are difficult to maintain. In addition, production must adapt to new business demands, including the discontinuation of some products and the launch of others. Overall, there is a need for a migration plan that covers both technical and business requirements.

First, you need to create a master list of the equipment that is currently used throughout the enterprise. This information is often “hidden” in various documents (and employees’ memories). It must be extracted and visualized so that it becomes the basis for migration planning. For this purpose, we typically create a Process Module Diagram showing the major equipment and raw material movements in each production unit. As separate layers "on top" of the hardware, we show which systems support which hardware.

An example is shown in Fig. 2. Data about the installed systems is also stored in a system repository (or simply in Excel files) and can be used for further analysis and planning.

Figure 2. Automation “layer” allows you to evaluate existing systems

Before discussing a migration plan, it is necessary to identify the main business reasons for changes in production. In this case, management identified the following motives:

1. Consistent and error-free compliance with regulatory requirements

2. Minimum time required to enter the market, flexibility

3. Success, competitiveness, operational excellence

4. Uncompromising quality

5. Growth in production volumes

These goals need to be translated into more specific tasks, the implementation of which can be quantified.

Next, we must find out how well the existing systems support current and future business processes. To do this, we use a standard reference model (based on the ANSI/ISA-95 series of standards). It includes 19 high-level business processes, detailed enough to reveal weaknesses in their practical implementation and the changes needed for an effective business.

In addition, we need to evaluate the technical capabilities of the existing systems to support business processes in the future. This is done systematically, using the information from the system repository described above. For each system covered by the repository (in our case, about 70 systems), the following aspects need to be assessed (a hypothetical scoring sketch is given after the list):

  • Equipment condition (failure history, mean time between failures, equipment age, spare parts availability)
  • Status of the software (vendor support, availability of documentation, personnel with the necessary competencies)
  • System recovery capabilities (redundancy, mean time to repair)
  • Business impact assessment (information provision, data errors, unavailability)
  • Indicative indicators (system reliability, system criticality, etc.)
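Purely as a hypothetical sketch, such an assessment could be aggregated into an indicative score per system (Python; the criteria, weights, and the 1-5 scale are assumptions, not values from the article):

```python
# Sketch: aggregate per-system assessment scores (1 = poor, 5 = good)
# into a weighted indicative score; the lowest-scoring systems are the
# strongest candidates for modernization or replacement.

SYSTEMS = {
    "DCS-01": {"equipment": 2, "software": 1, "recovery": 2, "business_fit": 2},
    "BAS-01": {"equipment": 4, "software": 3, "recovery": 4, "business_fit": 3},
}
WEIGHTS = {"equipment": 0.3, "software": 0.25, "recovery": 0.2, "business_fit": 0.25}

def indicative_score(scores: dict) -> float:
    """Weighted average of the assessment criteria."""
    return sum(WEIGHTS[k] * v for k, v in scores.items())

for name, scores in sorted(SYSTEMS.items(), key=lambda kv: indicative_score(kv[1])):
    print(f"{name}: {indicative_score(scores):.2f}")
```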

The technical assessment identified the need to modernize and replace a number of systems:

  • Process control systems are based on a conventional, outdated DCS and many different PLCs, some of which are already “ripe” for replacement.
  • The building automation system is based on a newer platform, but also requires upgrading to meet new requirements.
  • A number of secondary systems also require modernization, or even replacement.
  • The infrastructure serving all systems requires better segmentation and protection to meet today's security requirements.

Part III

After analyzing the business goals for the future, it became obvious that none of the existing systems fully meet future needs. This understanding gave rise to a number of ideas regarding the introduction of new technologies, as well as the production execution system. As a result of the analysis, 16 different projects were proposed that, if implemented consistently, will help the company meet future technical and commercial requirements.

The scope of technical work and the cost of each project are assessed; for each project, a short one-page summary is prepared for discussion with management (see Figure 3).

Fig. 3. One-page description of a potential migration project

In order to select priority projects, the potential results of each of them are assessed. The results are assessed in terms of business goals, as well as the reliability of the process control system.

Typically, you will need to evaluate multiple implementation scenarios to estimate the overall resource and funding requirements for each plan (Figure 7). One of the main constraints to consider is the windows in the production process during which systems can be replaced or modified. As a rule, these “windows” occur on weekends - and this is a serious bottleneck.

Fig. 7. Consolidated overview of the migration schedule

Since there is always little time to replace systems and set them up, preparation must be very thorough. Everything must be planned out in detail. An important aspect of planning is testing the implemented systems.

In the case we describe, the implementation of the long-term migration plan was carried out in six different streams, see Fig. 8.

Fig. 8. Organizing migration projects in six different streams

Part of the preparation is a thorough assessment and prevention of project risks. Figure 9 shows typical risks associated with migration projects.

Fig. 9. Assessing typical risks of migration projects

Business Support Processes

The lifecycle management and long-term migration planning approach described in this article is driven by business needs. It includes an assessment of current and future business goals, as well as a thorough analysis of how technical systems will be maintained or replaced to best support those goals. The approach is based on TOGAF principles, which provide for sequential project implementation depending on the availability of budgets and qualified personnel. Assessing current and future system architectures is a key element in determining future migration projects. Finally, it is necessary to adhere to organizational change management principles that ensure the timely involvement of key project stakeholders, which is so important to the success of migration projects. The effectiveness of this approach has been repeatedly demonstrated in practice.

Leif Poulsen, leading automation and IT specialist at NNE Pharmaplan, holds a master's degree in process management. At NNE Pharmaplan, Poulsen is responsible for the development of technologies, methods, and competencies in the field of industrial automation and IT, and works as a senior business consultant.