MONICA Manual, Part V: Data Transfer and Analysis


Section 1: Data Transfer to the MONICA Data Centre

November 1998


This section provides the timetable and an explanation of the methods to be used for transfer of data from a MONICA Collaborating Centre to the MONICA Data Centre.



Copyright World Health Organization (WHO) 1999. All rights reserved.

Queries and comments on this section to be addressed to:

Esa Ruokokoski
MONICA Data Centre
Department of Epidemiology and Health Promotion
National Public Health Institute
Mannerheimintie 166
SF - 00300 Helsinki
Fax: + 358 9 4744 8338
Email: esa.ruokokoski@ktl.fi

Earlier versions

Changes made after March 1992 revision


Introduction

This section describes the transfer of data from the MONICA Collaborating Centres (MCC) to the MONICA Data Centre (MDC), including the procedures for shipping all data. For data transferred on magnetic tapes, diskettes or by e-mail, it also covers the preparation of the tapes, diskettes and e-mail messages, and how problems concerning the data are communicated.

1. Data transfer deadlines

The following list gives the annual deadlines for the transfer of data to the MDC. It is important that the schedule is followed so that the MDC can routinely process the data and produce reports and analyses with little delay.

If for some reason you are not able to submit data by the deadline, please notify the MDC of the delay by the deadline, using the Problem Communication Form (Form P, attached to this section). State on the form both the reason for the delay and its expected length.

The MCCs are encouraged to submit data to the MDC as soon as they have been checked locally and are ready to be submitted, even if the deadline is still a long time away. The MDC checks the data, sends its comments to the MCC and includes the data in the database soon after they are received.

By 31 March:

These data are to be reported on magnetic tape, on diskette or by e-mail. They should include the records that were registered (as indicated by item DREG of the data transfer format) by the end of the calendar year that ended 15 months earlier, but that have not yet been submitted to the MDC. After this data transfer the MDC assumes it has received practically all events with onset in the calendar year that ended 27 months earlier. For example, by 31 March 1988 the MDC expects the records of all events registered by the end of 1986, and after this transfer it assumes it has data for practically all events with onset during 1985. If the MCC still registers events with onset in 1985 during 1987, these records should also be submitted to the MDC.
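
The year arithmetic behind the 31 March deadline can be checked with a short sketch. It is written in Python for illustration only; the function names are hypothetical and it is not MDC software.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def march_deadline_years(deadline_year: int) -> dict:
    """Years implied by the 31 March deadline of `deadline_year` (hypothetical helper)."""
    deadline = date(deadline_year, 3, 31)
    # Records registered (item DREG) by the end of the calendar year that
    # ended 15 months before the deadline are due.
    registered_by = date(deadline_year - 2, 12, 31)
    # Events with onset in the calendar year that ended 27 months before the
    # deadline are then regarded as practically complete.
    onset_year_complete = deadline_year - 3
    # Sanity checks against the 15- and 27-month rules stated in the text.
    assert months_between(registered_by, deadline) == 15
    assert months_between(date(onset_year_complete, 12, 31), deadline) == 27
    return {"registered_by": registered_by, "onset_year_complete": onset_year_complete}


print(march_deadline_years(1988))
```

For 1988 this gives registration up to 31 December 1986 and onset year 1985, which matches the example in the text.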

The Serial Number Inventory Data should be submitted for all serial numbers that were issued during the calendar year that ended 15 months earlier. They should also be submitted for serial numbers issued earlier for which either no Serial Number Inventory Data have yet been submitted, or the data submitted then had STATUS 2: core data not yet ready to be submitted. The event registration data transfer schedule is illustrated in Figure 1.
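
The selection rule for the Serial Number Inventory Data can be sketched as follows. This is an illustration under assumptions: the `SerialNumber` record and its field names are hypothetical, and only the STATUS value 2 is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SerialNumber:
    serial: str
    issue_year: int
    # STATUS of the last inventory record sent to the MDC, or None if no
    # inventory record has been submitted for this serial number yet.
    last_status: Optional[int] = None


def inventory_due(serials: list[SerialNumber], deadline_year: int) -> list[str]:
    """Serial numbers whose inventory data belong in the 31 March transfer."""
    target_year = deadline_year - 2  # calendar year that ended 15 months earlier
    due = []
    for s in serials:
        if s.issue_year == target_year:
            due.append(s.serial)  # every serial number issued in that year
        elif s.issue_year < target_year and s.last_status in (None, 2):
            # Issued earlier, but never reported, or last reported with
            # STATUS 2 (core data not yet ready to be submitted).
            due.append(s.serial)
    return due
```

For the 1988 deadline this selects every serial number issued in 1986, plus earlier ones that were never reported or were last reported with STATUS 2.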

By 31 October:

These data should be reported on paper forms. The forms should contain the data for the preceding year. For example, the data for 1988 should be reported by 31 October 1989.

By 31 December:

These data should be reported on paper forms. The forms should contain the data for the calendar year two years before the year of reporting the data. For example, the demographic and mortality data for the year 1993 should be submitted to the MDC by 31 December 1995.

Deadline for Population Survey Data

These data are to be submitted to the MDC on magnetic tape or diskette. All these data must be sent within twelve months of the end of the survey examinations.

Deadline for Medical Care Assessment Data

These data are to be submitted to the MDC on a magnetic tape, diskette or by e-mail. The acute coronary care data for each data collection period should be submitted to the MDC at the same time as the corresponding event registration data.

Figure 1. The event registration data transfer schedule

2. Material shipment from the MCCs to the MDC

These instructions apply to any shipment of MONICA data to the MDC, including magnetic tapes, MONICA paper forms and error corrections. They do not apply to shipments to the MONICA Management Centre or to the MONICA Quality Control Centres, although many of the points covered are worth considering for those shipments as well.

Preparation of material shipments should include the following stages:

  1. Collect together all the material to be shipped. Each MCC is recommended to have one person through whom all the shipments should pass.
  2. Complete a Packing List. The Packing List (Form N) and the instructions for completing it are attached to this section. The Packing List is used in the MDC in two stages:

    The Packing List asks for a Shipment Number. The shipment numbers should form a complete sequence within an MCC: the first shipment should be numbered 0001, the second 0002, the third 0003 and so on. The MDC uses the completeness of the numbering to check that no shipments have been lost.

  3. Make sure that you have filed a copy of everything included in the shipment at the MCC. The copies are needed if the package is lost or damaged during the transfer or if, possibly years later, it is suspected that the data in the MDC are not consistent with the data submitted by the MCC.
  4. Log the shipment in the MCC. Every MCC should keep a log of its shipments. Such a log helps to keep the sequence of shipment numbers complete, to make sure that an acknowledgement is received from the MDC for each shipment, and to keep track of what has actually been sent to the MDC. An example of a shipment log is given in Figure 2.
  5. Pack the material properly and mail it to:

    MONICA Data Centre
    National Public Health Institute
    Mannerheimintie 166
    SF-00300 Helsinki
    Finland

  6. Make sure that you receive an acknowledgement from the MDC, and record its receipt in the Shipment Log. If you do not receive an acknowledgement within a month of sending the material, check that the material was actually sent and then contact the MDC.
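
Because shipment numbers must run 0001, 0002, 0003 and so on, a gap or a repeated number in the log points to a lost or misnumbered shipment. A minimal Python sketch of that check, with a hypothetical function name:

```python
from collections import Counter


def shipment_number_problems(numbers: list[str]) -> tuple[list[str], list[str]]:
    """Return (missing, duplicated) shipment numbers for one MCC.

    Shipment numbers are expected to run 0001, 0002, 0003, ... without gaps.
    """
    counts = Counter(int(n) for n in numbers)
    highest = max(counts, default=0)
    missing = [f"{n:04d}" for n in range(1, highest + 1) if n not in counts]
    duplicated = [f"{n:04d}" for n, c in sorted(counts.items()) if c > 1]
    return missing, duplicated


print(shipment_number_problems(["0001", "0002", "0004", "0004"]))
```

Note that this only finds gaps below the highest number seen; if the most recent shipment is lost, only the missing acknowledgement will reveal it.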

Figure 2. An example of a shipment log for data transfer from an MCC to the MDC


*) It is useful to record a summary of the contents of each shipment in the log book. This gives an overview of the material sent and makes it easier to find out when particular data were sent. The contents of the shipments can also be found in the backup copies of the Packing Lists.
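
A shipment log can be kept in any form. Purely as an illustration, the sketch below keeps hypothetical log entries (the fields are assumptions, not the columns of Figure 2) and lists the shipments for which no acknowledgement has arrived within about a month, as step 6 above requires.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ShipmentLogEntry:
    shipment_number: str
    date_sent: date
    contents: str                        # summary of the contents (see the footnote)
    acknowledged: Optional[date] = None  # date the MDC acknowledgement arrived


def overdue(log: list[ShipmentLogEntry], today: date, limit_days: int = 31) -> list[str]:
    """Shipments still unacknowledged more than `limit_days` after sending.

    "Within a month" is approximated as 31 days (an assumption).
    """
    return [
        e.shipment_number
        for e in log
        if e.acknowledged is None and (today - e.date_sent).days > limit_days
    ]
```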

3. Data transfer on magnetic tapes, diskettes or by electronic mail to the MDC

3.1 Local procedures

3.1.1 Extraction of the core data

Most MCCs collect more data than are needed for the MONICA core study, so the core data have to be extracted from the more extensive local data set for transmission to the MDC. The extraction of the core data, usually done by a computer program, is the most critical stage of the data transfer: any errors introduced by an erroneous program are systematic. Special attention should therefore be paid to detecting all possible errors in the extraction program. This checking should include selecting a sample of records for which a qualified person completes the Data Transfer Format manually from the original data collection forms (including error corrections). The manually completed forms should then be compared with the core data extracted by computer. The sample should contain at least 20 records if the extraction consists only of selecting items from the local data set. If, however, the value of a core data item is derived from the values of one or more local data items, the correctness of such items should be checked more carefully, and a larger sample, at least for these items, should be considered.
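
The sampling and comparison steps might look like the following Python sketch. It assumes that both the hand-completed forms and the extracted core data have been keyed into dictionaries of item values per serial number; the function names and the item names in the usage example (`SEX`, `AGE`) are hypothetical.

```python
import random


def choose_check_sample(serials: list[str], size: int = 20, seed=None) -> list[str]:
    """Randomly pick the records to be transcribed by hand.

    20 is the stated minimum when extraction only selects items; pass a larger
    `size` when core items are derived from several local items.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(serials), min(size, len(serials)))


def compare_extraction(manual: dict, extracted: dict) -> list[tuple]:
    """Compare hand-completed Data Transfer Formats with extracted core data.

    Both arguments map serial number -> {item name: value}. Returns
    (serial, item, manual value, extracted value) for every disagreement;
    a record missing from the extraction shows up with extracted value None.
    """
    discrepancies = []
    for serial, manual_items in sorted(manual.items()):
        computed = extracted.get(serial, {})
        for item, value in manual_items.items():
            if computed.get(item) != value:
                discrepancies.append((serial, item, value, computed.get(item)))
    return discrepancies
```

Any non-empty result from `compare_extraction` points either to a transcription slip or, more seriously, to a systematic error in the extraction program.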

3.1.2 Checking the correctness of the data

Before data are submitted to the MDC, they should be checked by computer for

  1. completeness: the data should contain all the records they are expected to contain. The MCC should compare the data with the log-book recording the history of every serial number issued.
  2. correctness and consistency: the data should be checked to ensure that they contain no illegal values.
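
A minimal Python sketch of both checks, assuming the records have been read into a dictionary keyed by serial number. The table of legal values is a hypothetical stand-in for the MDC edit specifications, and consistency checks between items are not shown.

```python
def check_before_submission(records: dict, expected_serials: list[str], legal: dict):
    """Completeness and legal-value checks before data leave the MCC.

    records: serial number -> {item: value}
    expected_serials: serial numbers that, according to the log-book, should
        have a record in this transfer
    legal: item -> set of legal values (a hypothetical stand-in for the
        MDC edit specifications)
    """
    missing = sorted(set(expected_serials) - set(records))
    unexpected = sorted(set(records) - set(expected_serials))
    illegal = [
        (serial, item, value)
        for serial, items in sorted(records.items())
        for item, value in items.items()
        if item in legal and value not in legal[item]
    ]
    return missing, unexpected, illegal
```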

After the MDC has received the data, a check for correctness and consistency will be made. The details of the procedure for this checking and error correction are explained in Section 3.3 below. The edit specifications used by the MDC have been distributed to the MCCs.

3.1.3 Preparation of Magnetic Tapes

If you prefer to send the data on diskettes, please refer to Section 3.1.4: Preparation of Diskettes. If you prefer to send data by e-mail, please refer to Section 3.1.5: Data Transfer via Electronic Mail.

The magnetic tapes on which data are submitted to the MDC are processed in the MDC by a routine procedure, so the tape format used must match this procedure exactly. Please follow the instructions below carefully. If you are not able to follow them, please contact the MDC.

3.1.3.1 Tape Number

The MCC must give a sequential TAPE NUMBER to every magnetic tape on which data are submitted to the MDC. The number given to the first tape should be 0001, to the second 0002 and so on. The TAPE NUMBER is recorded in the Tape Header (also see Section 3.1.3.3), in the Tape Label (also see Section 3.1.3.4) and in the Packing List.

3.1.3.2 Characteristics of the tape
Tape density: The accepted tape densities are 800, 1600 and 6250 bytes per inch (bpi).
Tracks: A 9-track tape must be used.
Character code: The accepted character codes are ASCII and EBCDIC. Please use either of these.
Label: The tapes must be UNLABELLED. No standard IBM, VAX or other labels are accepted.
Blocks: The tapes can be blocked or unblocked. "Unblocked tape" means that the block size is the same as the record length, i.e. one block is one record.
Records: The records should be long enough to hold all the data from any form whose data are submitted on the tape. The record length must be constant throughout the tape, including the headers and trailers.

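In most cases a short 400-foot or 600-foot tape is sufficient for MONICA data transfer. Such tapes are easier to send and handle than longer ones. The length of the data set in feet can be calculated with this formula:

(1/12) * #Recs * (Reclen/Blsize) * (Blsize/Density + 0.6),

where #Recs is the number of records, Reclen the record length in characters, Blsize the block size in characters, Density the tape density in bpi, and 0.6 the length in inches of the gap between consecutive blocks. The factor 1/12 converts inches to feet.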

As an example, the following table gives the number of 100-character records that fit on a 400-foot tape with a density of 1600 bpi.

Blsize (characters)    100     500    2000
#Recs                 7000   26000   51000

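
The tape-length formula can be evaluated directly. The Python sketch below implements it as written, with the 0.6-inch gap as a named constant, and inverts it to find how many records fit on a tape of a given length; the function names are hypothetical.

```python
import math

GAP_INCHES = 0.6  # the 0.6 term of the formula: gap between blocks, in inches


def tape_length_feet(n_recs: int, reclen: int, blsize: int, density: int) -> float:
    """(1/12) * #Recs * (Reclen/Blsize) * (Blsize/Density + 0.6)."""
    blocks = n_recs * reclen / blsize          # number of blocks written
    inches = blocks * (blsize / density + GAP_INCHES)
    return inches / 12                         # inches -> feet


def max_records(tape_feet: float, reclen: int, blsize: int, density: int) -> int:
    """Largest #Recs for which the formula stays within `tape_feet`."""
    inches_per_record = (reclen / blsize) * (blsize / density + GAP_INCHES)
    return math.floor(tape_feet * 12 / inches_per_record)


for blsize in (100, 500, 2000):
    print(blsize, max_records(400, 100, blsize, 1600))
```

For a 400-foot tape at 1600 bpi with 100-character records this gives 7245, 26301 and 51891 records for block sizes 100, 500 and 2000; the table above appears to round these down to leave a margin.
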
3.1.3.3 Internal layout of the tape

The magnetic tape must contain at least three files: A HEADER FILE, one or several DATA FILES and a TRAILER FILE. The files must be included in the tape in the above order. Different files can be written on the tape in the same physical file, or they can be separated from each other by a single TAPE MARK. (Note that the TAPE MARK was called END-OF-FILE-MARK in the early versions of these instructions.) To improve the reliability of the data transfer, an MCC can include two copies of the data (including the headers and trailers) on the same physical tape. A double TAPE MARK should be included at the end of the tape.

The contents of the files should be as follows:

Header file: The HEADER FILE should contain the data of exactly one form: the Tape Header (Form HDMONICA). The form is attached to this section. The Tape Header contains information about the files included in the tape.
Data files: The data of the actual data forms (i.e. forms with form identifications 01, 02 etc., e.g. Core Data Transfer Format-Survey Data) are included in the DATA FILES. Please include different data forms and forms from different Reporting Units in different DATA FILES. Every actual data form of any particular DATA FILE must have the same Form Identification, the same Form Version, and have data from the same Reporting Unit.

The first form of every DATA FILE should be the Form Header (Form HDRFORMS, attached to this section). The Form Header should be followed by the actual data forms. The last form of every DATA FILE should be the Form Trailer (Form TRAFORMS, attached to this section). The Form Header and Form Trailer contain information about the actual data forms included in the file.

Trailer file: The TRAILER FILE should contain the data of exactly one form: the Tape Trailer (Form TRMONICA). The form is attached to this section. The main purpose of the Tape Trailer is to indicate to the processing routine of the MDC that all DATA FILES have been read.
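
The structural rules for a DATA FILE can be checked mechanically. The Python sketch below assumes the records have already been parsed into a form type plus Form Identification, Form Version and Reporting Unit; the actual record layouts are defined on the forms themselves and are not reproduced here, and the `Form` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Form:
    kind: str          # "HDRFORMS" (Form Header), "TRAFORMS" (Form Trailer) or "DATA"
    form_id: str = ""  # Form Identification, e.g. "01"
    version: str = ""  # Form Version
    unit: str = ""     # Reporting Unit


def data_file_problems(forms: list[Form]) -> list[str]:
    """Check the structure rules for one DATA FILE."""
    problems = []
    if not forms or forms[0].kind != "HDRFORMS":
        problems.append("first form is not the Form Header (HDRFORMS)")
    if not forms or forms[-1].kind != "TRAFORMS":
        problems.append("last form is not the Form Trailer (TRAFORMS)")
    if any(f.kind != "DATA" for f in forms[1:-1]):
        problems.append("header or trailer found among the data forms")
    keys = {(f.form_id, f.version, f.unit) for f in forms if f.kind == "DATA"}
    if len(keys) > 1:
        problems.append("data forms differ in identification, version or unit")
    return problems
```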

The two possibilities for the internal layout of the tape are illustrated in Figure 3.

Figure 3. The two possible internal layouts of the tape
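
The order in which files and TAPE MARKs are written can be sketched in Python, with strings standing in for records and a placeholder for the TAPE MARK. Writing to an actual tape drive is outside this sketch, and how two copies are kept apart when all files share one physical file is an assumption, since the text does not say.

```python
TAPE_MARK = "<TAPE MARK>"


def tape_layout(header: str, data_files: list[list[str]], trailer: str,
                separate_files: bool = True, copies: int = 1) -> list[str]:
    """Order of records and tape marks for one transfer tape.

    Each data file is a list of records that already begins with its Form
    Header and ends with its Form Trailer.
    """
    files = [[header], *data_files, [trailer]]
    sequence: list[str] = []
    for copy in range(copies):
        for records in files:
            sequence.extend(records)
            if separate_files:
                sequence.append(TAPE_MARK)  # single TAPE MARK between files
        if not separate_files and copy < copies - 1:
            sequence.append(TAPE_MARK)      # assumption: copies kept apart
    while sequence[-2:] != [TAPE_MARK, TAPE_MARK]:
        sequence.append(TAPE_MARK)          # double TAPE MARK at the end
    return sequence
```

With `separate_files=False` the header, data and trailer files run on in one physical file and only the closing double TAPE MARK is added.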

3.1.3.4 The label to be fixed on the tape reel

Every tape on which data are submitted from the MCC to the MDC must have a Tape Label fixed on the reel. The format of the label and instructions for completing it are given in the form Tape Label (Form T) attached to this section.

3.1.4 Preparation of diskettes

All data that these instructions ask to be transferred to the MDC on magnetic tape can also be transferred on a floppy disk (diskette).

The diskettes on which data are submitted to the MDC are processed in the MDC by a routine procedure, so the diskette format used must match this procedure exactly. Please follow the instructions below carefully. If you are not able to follow them, please contact the MDC.

3.1.4.1 Tape number

The MCC must give a sequential TAPE NUMBER to every diskette on which data are submitted to the MDC. The number given to the first tape or diskette should be 0001, to the second 0002 and so on. The TAPE NUMBER is recorded in the Tape Header (also see Section 3.1.4.3), in the Tape Label (also see Section 3.1.4.4) and in the Packing List.

3.1.4.2 Characteristics of the diskette
Diskette specification: 3.5 inch
Disk format: MS-DOS, at most 1.44 MB (3.5 inch)
Character code: ASCII
Records: The record length can be variable or fixed.

If not all the data being transferred fit on one diskette, prepare another diskette for the rest of the data, with a new TAPE NUMBER and new headers and trailers as explained in the next section.

3.1.4.3 Internal layout of the diskette

Each diskette must contain at least two files: A HEADER FILE and one or several DATA FILES.

The contents of the files should be as follows:

Header file: The HEADER FILE should contain the data of exactly one form: the Tape Header (Form HDMONICA, attached to this section). The Tape Header contains information about the files included in the diskette.

Name the HEADER FILE "HEADmm.nnn", where mm is the MCC code and nnn are the last three digits of the tape number. For example, the HEADER FILE of the diskette with tape number 0007 from Sino-MONICA-Beijing (MCC 17) should be named HEAD17.007.

Data files: The data of the actual data forms (i.e. forms with form identifications 01, 02 etc., e.g. Core Data Transfer Format-Survey Data) are included in the DATA FILES. Please include different data forms and forms from different Reporting Units in different DATA FILES. Every actual data form of any particular DATA FILE must have the same Form Identification, the same Form Version, and have data from the same Reporting Unit.

The first form of every DATA FILE should be the Form Header (Form HDRFORMS, attached to this section). The Form Header should be followed by the actual data forms. The last form of every DATA FILE should be the Form Trailer (Form TRAFORMS, attached to this section). The Form Header and Form Trailer contain information about the actual data forms included in the file.

The DATA FILES should be named "DATAmmll.nnn", where mm is the MCC code, ll is any number identifying the DATA FILE on the diskette, and nnn are the last three digits of the tape number. For example, the second DATA FILE of the diskette with tape number 0007 from Sino-MONICA-Beijing could be named DATA1702.007.

Trailer file: No trailer file is needed on a diskette used for data transfer to MDC.
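
The naming rules for the HEADER FILE and DATA FILES can be sketched in Python as below. The function names are hypothetical, and the check that mm and ll fit in two digits is an assumption read from the two-letter placeholders.

```python
def _two_digits(value: int, name: str) -> str:
    """Format a code as two digits, rejecting values that do not fit."""
    if not 0 <= value <= 99:
        raise ValueError(f"{name} must fit in two digits")
    return f"{value:02d}"


def header_file_name(mcc: int, tape_number: int) -> str:
    """HEADmm.nnn: mm = MCC code, nnn = last three digits of the TAPE NUMBER."""
    return f"HEAD{_two_digits(mcc, 'MCC code')}.{tape_number % 1000:03d}"


def data_file_name(mcc: int, file_no: int, tape_number: int) -> str:
    """DATAmmll.nnn: ll = any number identifying the DATA FILE on the diskette."""
    mm = _two_digits(mcc, "MCC code")
    ll = _two_digits(file_no, "file number")
    return f"DATA{mm}{ll}.{tape_number % 1000:03d}"
```

Both patterns stay within the MS-DOS 8.3 file name limit: at most eight characters before the dot and three after it.
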

3.1.4.4 The label to be fixed on the diskette

Every diskette on which data are submitted from the MCC to the MDC must have a Tape Label fixed on it. The format of the label and instructions for completing it are given in the form Tape Label (Form T) attached to this section.

3.1.5 Data transfer via electronic mail

MCCs with an Internet connection can send their data to the MDC by E-mail instead of on magnetic tapes or diskettes. The files transferred by E-mail to the MDC are processed in the MDC by a routine procedure, so the format used must match this procedure exactly. Please follow the instructions below carefully. If you are not able to follow them, please contact the MDC.

3.1.5.1 Tape number

The MCC must give a sequential TAPE NUMBER to every data transfer by E-mail to the MDC. The number given to the first tape, diskette or E-mail transfer should be 0001, to the second 0002 and so on. The TAPE NUMBER is recorded in the Tape Header (also see Section 3.1.5.2) and in the Packing List (also see Section 3.1.5.4).

3.1.5.2 Layout of the E-mail data transfer

Each transfer should contain at least two files: a HEADER FILE and one or several DATA FILES. It is also recommended to send an additional informal file announcing that a data transfer is being made and stating the number of files being sent.

The contents of the files should be as follows:

Header file: The HEADER FILE should contain the data of exactly one form: the Tape Header (Form HDMONICA, attached to this section). The Tape Header contains information about the files included in the E-mail transfer.
Data files: The data of the actual data forms (i.e. forms with form identifications 01, 02, etc., e.g. Core Data Transfer Format-Survey Data) are included in the DATA FILES. Please send different data forms and forms from different Reporting Units separately. Every actual data form of any particular DATA FILE must have the same Form Identification, the same Form Version, and have data from the same Reporting Unit.

The first form of every DATA FILE should be the Form Header (Form HDRFORMS, attached to this section). The Form Header should be followed by the actual data forms. The last form of every DATA FILE should be the Form Trailer (Form TRAFORMS, attached to this section). The Form Header and Form Trailer contain information about the actual data forms included in the file.

Trailer file: A file corresponding to the Tape Trailer in data transfer on magnetic tape is not needed in E-mail data transfer.

3.1.5.3 Sending the data

The data should be sent to the E-mail address provided by the MDC. The files should be sent as E-mail attachments.

The DATA FILES should be named "DATAmmll.nnn", where mm is the MCC code, ll is any number identifying the DATA FILE in the transfer, and nnn are the last three digits of the tape number. For example, the second transferred file with tape number 0010 from Augsburg (MCC 26) should be named DATA2602.010. The HEADER FILE should then be named "HEADmm.nnn", where mm is the MCC code and nnn are the last three digits of the tape number (e.g. HEAD26.010).

If your mailing system asks for subject of the E-mail message of the data transfer, the subject should be "DATA mm/nnnn" where mm is the MCC code and nnnn is the tape number. For example, the subject field of the transfer of data with tape number 10 from Augsburg should be "DATA 26/0010".
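
The subject line and attachment names of one E-mail transfer follow from the MCC code and the TAPE NUMBER. A Python sketch with a hypothetical function name; numbering the DATA FILES from 01 is one possible choice, since any number is allowed, and actually sending the message is not shown.

```python
def email_transfer(mcc: int, tape_number: int, n_data_files: int) -> tuple[str, list[str]]:
    """Subject line and attachment names for one E-mail data transfer."""
    nnn = f"{tape_number % 1000:03d}"  # last three digits of the TAPE NUMBER
    subject = f"DATA {mcc:02d}/{tape_number:04d}"
    attachments = [f"HEAD{mcc:02d}.{nnn}"]
    # Numbering the DATA FILES 01, 02, ... is one choice; any number is allowed.
    attachments += [f"DATA{mcc:02d}{i:02d}.{nnn}" for i in range(1, n_data_files + 1)]
    return subject, attachments


print(email_transfer(26, 10, 2))
```

For Augsburg's transfer with tape number 10 and two DATA FILES this produces the subject "DATA 26/0010" and the attachments HEAD26.010, DATA2601.010 and DATA2602.010.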

3.1.5.4 Packing List

A completed Packing List - MCC to MDC (Form N) should be sent to MDC by telefax. The following coding should be used for item 5 of the Packing List:

5. How many magnetic tapes or diskettes: 1 (e-mail transfer)

3.2 Procedures for transfer to MDC

For shipment to the MDC, the magnetic tapes or diskettes must be packed properly to avoid damage to the tape or diskette or its attachments. The package sent to the MDC should contain

  1. A completed Packing List - MCC to MDC (Form N).
  2. The magnetic tape or diskette containing the data, with the Tape Label fixed on the reel.
  3. A listing of all headers and of the first five data records of every data file on the tape or diskette.
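
The printed listing in item 3 can be produced with a few lines of Python. This sketch assumes each DATA FILE is available as a list of records that starts with its Form Header and ends with its Form Trailer; the function name and the labels in the output are illustrative only.

```python
def shipment_listing(tape_header: str, data_files: dict[str, list[str]], n_first: int = 5) -> str:
    """Printable listing of the Tape Header, each Form Header and the first
    `n_first` data records of every DATA FILE.

    Each DATA FILE's records are assumed to start with its Form Header and
    end with its Form Trailer.
    """
    lines = ["Tape Header:", tape_header]
    for name, records in data_files.items():
        form_header, data_records = records[0], records[1:-1]
        lines += [f"{name}:", form_header, *data_records[:n_first]]
    return "\n".join(lines)
```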

The MCC should keep a copy of every item contained in the package, including a copy of the magnetic tape or floppy disk. The copies will be needed if the package is lost during transfer or if, possibly years later, it is suspected that the data in the MDC is not consistent with the data submitted by the MCC.

One way of storing the copy of data transferred on a magnetic tape is the following: The MCC keeps a long backup tape for the MONICA data transfers. Every time data on a magnetic tape is transferred to the MDC, a copy of the transfer tape is appended to the long backup tape. Most computers have standard software for doing this. Note that the Tape Headers and Tape Trailers contain the necessary information about the tape and data transferred and about the transfer date. It is strongly recommended that the MCC keeps two copies of the long backup tape.

The package should be sent directly to the MONICA Data Centre, Helsinki. If there are problems in exporting data on magnetic tapes which cannot be overcome locally, please contact the MDC. It may be useful to indicate on the package that the MONICA Data Centre is a WHO Collaborating Centre.

The MCC should number and log every shipment. (For numbering, see the instructions for the Packing List attached to this section). The MCC should also make sure to receive and log an acknowledgement of every shipment from the MDC.

3.3 Procedures at MDC

3.3.1 Receipt of the data in the MDC

When a data package is received it is opened, checked against the Packing List and recorded in a log book. If the Packing List is not consistent with the contents of the package, the list is marked, copied and returned with an inquiry to the MCC. Otherwise an acknowledgement is sent to the MCC, the information on the Tape Label is keyed into the computer, and the tape or diskette is sent to the operator.

3.3.2 Data processing in the MDC

The tape reading process relies on the information keyed in earlier from the Tape Label. The tape or diskette is copied to disk and to a magnetic tape, which is then stored. The consistency of the headers and trailers with the data is checked, and if there are discrepancies the MCC is contacted.

The data are appended to the database (VAX/Rdb). The TAPE NUMBER and the date of processing are included in the data records. All earlier records will remain in the database, with one exception: a new record will always replace an earlier record which has the same SERIAL NUMBER. In such a case, a report is generated about the replacement and the MCC is informed.
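
The append-with-replacement rule can be illustrated with an in-memory dictionary. This is only a sketch: the MDC database itself is VAX/Rdb, the key name `SERIAL` is hypothetical, and serial numbers are assumed to be unique within the dictionary (in practice the key may also involve the MCC and Reporting Unit).

```python
from datetime import date


def append_records(db: dict, records: list[dict], tape_number: int, processed: date) -> list[tuple]:
    """Append one transfer to an in-memory stand-in for the database.

    db maps SERIAL NUMBER -> stored record. Every stored record gets the
    TAPE NUMBER and the processing date; a record whose SERIAL NUMBER is
    already present replaces the earlier one, and the replacement is reported
    as (serial number, TAPE NUMBER of the replaced record).
    """
    replaced = []
    for rec in records:
        serial = rec["SERIAL"]  # hypothetical key name
        if serial in db:
            replaced.append((serial, db[serial]["TAPE"]))
        db[serial] = {**rec, "TAPE": tape_number, "PROCESSED": processed}
    return replaced
```

The returned list corresponds to the replacement report that is passed on to the MCC.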

At the time of appending data to the database, they are checked for multiple records with the same serial numbers and for illegal or unusual values according to the constraints defined in the MDC edit specifications. The constraint violations are printed on a Computer Generated Error Correction Form (Form H, an example attached to this section). The rates of constraint violations are recorded and reported for quality control purposes. If the rates are unacceptable, the Error Correction Form is sent to the MCC for information and the MCC is asked to correct the data and submit the full data again to the MDC. Otherwise the error correction form is sent to the MCC, and the MCC is asked to complete the form. The records with constraint violations are marked and they will remain in the database.
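
The screening step can be sketched as follows. The constraints are passed in as predicates standing in for the MDC edit specifications, the violation rate is taken here as the share of records with at least one violation (an assumption), and `max_rate` is a hypothetical acceptability limit, since the manual gives no figure.

```python
from collections import Counter
from typing import Callable


def screen_transfer(records: list[dict], constraints: dict[str, Callable[[dict], bool]],
                    max_rate: float):
    """Flag repeated serial numbers and constraint violations in one transfer.

    constraints: name -> predicate that returns True when a record passes.
    max_rate: hypothetical limit on the share of records with violations;
    the manual does not give a figure.
    """
    counts = Counter(r["SERIAL"] for r in records)
    repeated = sorted(s for s, c in counts.items() if c > 1)
    violations = []
    bad_records = 0
    for r in records:
        failed = [name for name, passes in constraints.items() if not passes(r)]
        violations += [(r["SERIAL"], name) for name in failed]
        bad_records += bool(failed)
    rate = bad_records / len(records) if records else 0.0
    action = ("ask MCC to correct and resubmit the full data" if rate > max_rate
              else "send Error Correction Form for completion")
    return repeated, violations, rate, action
```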

3.3.3 Completing the Computer Generated Error Correction Form in the MCC

When the MCC receives an Error Correction Form generated by the MDC, every data value queried on the form should be checked against the original documents on which the data were collected. Each queried value should be either confirmed as correct or corrected according to the instructions for the Error Correction Form; suspect values that cannot be shown to be erroneous should be confirmed as correct. The completed Error Correction Form should be returned to the MDC, accompanied by a Packing List. This shipment should also be numbered and logged in the MCC.

3.3.4 Correcting errors which have not been marked on the Computer Generated Error Correction Form by the MDC

If the MCC finds an error in the data that has not been queried by the MDC, the MCC must report it to the MDC on the Manual Error Correction Form (Form G, attached to this section). The completed form, accompanied by a Packing List, must be sent to the MDC. If the error concerns the key fields of a record (i.e. the first 14 characters of a record), or if it concerns an entire DATA FILE or the whole tape or diskette, the MCC must report it to the MDC on the Problem Communication Form (Form P, attached to this section).
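
The choice between Form G and Form P reduces to a small decision rule, sketched here in Python with hypothetical parameter names: the character position where the erroneous field starts, and the scope of the error.

```python
from typing import Optional

KEY_FIELD_LENGTH = 14  # the key fields are the first 14 characters of a record


def form_for_unqueried_error(first_position: Optional[int] = None, scope: str = "item") -> str:
    """Which form reports an error the MDC has not queried.

    first_position: 1-based character position where the erroneous field starts
    scope: "item" for a single value, or "file" / "tape" / "diskette" when a
        whole DATA FILE, tape or diskette is affected
    """
    if scope in ("file", "tape", "diskette"):
        return "Form P"  # Problem Communication Form
    if first_position is not None and first_position <= KEY_FIELD_LENGTH:
        return "Form P"  # error in the key fields
    return "Form G"      # Manual Error Correction Form
```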

3.3.5 Error correction in the MDC

In the MDC the master data file is updated with the confirmed and corrected values, and the completed Error Correction Form is archived. The corrected data are then checked again. If unconfirmed unusual values or unexplained illegal values are found, a new Error Correction Form is sent to the MCC, and the cycle is repeated. The corrected values and status indicators are added to the database by the MDC. At this stage the variable indicating the date of processing is changed to the date on which the record was modified. Changes to the database are also recorded in the journal file and reported to the MCC.

3.3.6 Additional quality control measures

When the MDC receives data of a Serial Number Inventory Form (Form 05, Form 06 or Form 07), the data will be used to check the completeness of the corresponding core data (Forms 01-04) included in the MDC database. For every missing or duplicate record an inquiry is sent to the MCC.
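
The completeness check against the Serial Number Inventory data amounts to comparing two lists of serial numbers. A Python sketch, assuming the inventory list has already been reduced to the serial numbers for which a core record is expected (the selection by STATUS is not shown):

```python
from collections import Counter


def inventory_vs_core(inventory_serials: list[str], core_serials: list[str]):
    """Compare serial numbers listed on a Serial Number Inventory Form with the
    serial numbers of core records in the database.

    inventory_serials is assumed to contain only serial numbers for which a
    core record is expected (the selection by STATUS is not shown here).
    """
    core_counts = Counter(core_serials)
    missing = sorted(s for s in set(inventory_serials) if core_counts[s] == 0)
    duplicates = sorted(s for s, c in core_counts.items() if c > 1)
    return missing, duplicates
```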

The MDC periodically checks the coronary and stroke event registration data (Form 01 and Form 03) for possible duplicate registrations. All pairs of events with the same sex, the same date of birth and dates of onset within the same 28-day period are printed on a Duplicate Observations Checking List (Form HB, an example attached to this section), which is sent to the MCC. The MCC should complete the form according to the instructions given on it and return it to the MDC, accompanied by a Packing List. The shipment should be numbered and logged in the MCC.
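
The duplicate search can be sketched in Python by grouping events on sex and date of birth and then comparing onset dates within each group. Reading "within the same 28-day period" as onset dates fewer than 28 days apart is an interpretation, and the event tuple layout is hypothetical.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def duplicate_candidates(events: list[tuple[str, int, date, date]],
                         window_days: int = 28) -> list[tuple[str, str]]:
    """Pairs of events that may be the same registration.

    events: (serial number, sex, date of birth, date of onset).
    "Within the same 28-day period" is read here as onset dates fewer than
    28 days apart (an interpretation, not a definition from the manual).
    """
    by_person = defaultdict(list)
    for serial, sex, birth, onset in events:
        by_person[(sex, birth)].append((serial, onset))
    pairs = []
    for group in by_person.values():
        for (s1, o1), (s2, o2) in combinations(group, 2):
            if abs((o1 - o2).days) < window_days:
                pairs.append(tuple(sorted((s1, s2))))
    return sorted(pairs)
```

Grouping first means only events that already agree on sex and date of birth are compared pairwise, instead of every pair in the register.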

3.3.7 Monitoring of data transfer

Once a year the Data Centre must give a report of data transfer to the Steering Committee and send copies of it to the Collaborating Centres. The report contains for each MCC:

  1. the frequencies of data shipments and the numbers of observations received by the MDC since the beginning of 1984 and within the last 12 months,
  2. the numbers of illegal values and the error rates in the data received by the MDC within the last 12 months,
  3. numbers of duplicate and missing observations within the last 12 months and
  4. inconsistencies in shipping lists within the last 12 months.