Analysis and Data Improvement Tools


NAACCR Committees and members have worked collaboratively to develop tools and resources for use by central cancer registry analysts and researchers. Select one of the options below to learn more.

This tool describes and provides macro-driven formulae, in a Microsoft Excel workbook, for calculating completeness of case ascertainment from observed cancer incidence and death rates compared against standard rates of incidence and mortality in the United States.
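The core arithmetic behind an incidence-to-mortality comparison can be sketched in Python. This is a minimal sketch that assumes the common approach: scale a registry's deaths by a standard population's incidence-to-mortality ratio to get expected cases, then divide observed cases by expected cases. The workbook's actual formulae may differ, for example in age adjustment or stratification by site, sex, and race. The `Stratum` class and all numbers are hypothetical.

```python
"""Simplified completeness estimate based on incidence-to-mortality (I/M) ratios.

Assumption: expected cases = registry deaths x (standard incidence / standard
mortality); completeness = observed cases / expected cases. The NAACCR
workbook may stratify and adjust differently. All numbers are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Stratum:
    observed_cases: int         # registry incident cases in this stratum
    observed_deaths: int        # registry cancer deaths in this stratum
    standard_incidence: float   # standard-population incidence rate (per 100,000)
    standard_mortality: float   # standard-population mortality rate (per 100,000)


def expected_cases(s: Stratum) -> float:
    """Deaths scaled by the standard population's incidence-to-mortality ratio."""
    return s.observed_deaths * (s.standard_incidence / s.standard_mortality)


def completeness(strata: list[Stratum]) -> float:
    """Observed cases as a fraction of expected cases, summed over all strata."""
    observed = sum(s.observed_cases for s in strata)
    expected = sum(expected_cases(s) for s in strata)
    return observed / expected


if __name__ == "__main__":
    # Hypothetical strata (e.g. site x sex combinations); not real data.
    strata = [
        Stratum(observed_cases=900, observed_deaths=400,
                standard_incidence=60.0, standard_mortality=25.0),
        Stratum(observed_cases=450, observed_deaths=200,
                standard_incidence=120.0, standard_mortality=50.0),
    ]
    print(f"Estimated completeness: {completeness(strata):.1%}")
```

With these hypothetical figures, both strata imply 2.4 cases per death, so 600 deaths give 1,440 expected cases, and 1,350 observed cases correspond to roughly 93.8% completeness.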

This program takes as input a file in the NAACCR standard data exchange format that contains confidential information, including a census tract identifier. It links each census tract identifier to the percentage of tract residents living below the poverty level, based on data from the 2000 U.S. Census and the American Community Survey; the program uses the census data most closely aligned with the diagnosis year. It attaches two variables to every input registry record: the tract's poverty percentage (xx.x%), and a second variable that groups that percentage into four categories: less than 5% poverty, 5%-9.9% poverty, 10%-19.9% poverty, and 20% or higher poverty.
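The tract lookup and four-way grouping described above can be illustrated with a short Python sketch. It is not the program itself. The field names `census_tract`, `poverty_pct`, and `poverty_category` are hypothetical, and the lookup table is assumed to have been prepared already for the census vintage that matches the diagnosis year.

```python
from typing import Mapping, Optional

# Upper bounds (exclusive) and labels for the first three categories;
# anything at or above 20.0 falls into the last one.
CATEGORIES = (
    (5.0, "less than 5%"),
    (10.0, "5%-9.9%"),
    (20.0, "10%-19.9%"),
)


def poverty_category(pct: float) -> str:
    """Group a tract poverty percentage (one decimal place) into four categories."""
    for upper, label in CATEGORIES:
        if pct < upper:
            return label
    return "20% or higher"


def attach_poverty(record: dict, tract_poverty: Mapping[str, float]) -> dict:
    """Return a copy of the record with the two poverty variables added.

    Hypothetical field names; records whose tract is missing from the lookup
    get None for both variables instead of raising an error.
    """
    raw: Optional[float] = tract_poverty.get(record.get("census_tract", ""))
    if raw is None:
        return {**record, "poverty_pct": None, "poverty_category": None}
    pct = round(raw, 1)  # report as xx.x%, then categorize the rounded value
    return {**record, "poverty_pct": pct, "poverty_category": poverty_category(pct)}
```

Categorizing the rounded value keeps the two output variables consistent with each other: a tract reported as 10.0% is never labeled "5%-9.9%".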

The Record Uniqueness Program was developed by Howe, Lake, and Shen to assess electronic data files for risk of confidentiality breach based on unique combinations of key variables.
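The basic idea of flagging unique combinations of key variables can be sketched in Python. This is a generic frequency-count illustration, not the Howe, Lake, and Shen program, whose measures may be more detailed. The key variable names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Hashable, Mapping, Sequence


def key_of(record: Mapping[str, Hashable], keys: Sequence[str]) -> tuple:
    """The combination of key-variable values that places a record in a 'cell'."""
    return tuple(record[k] for k in keys)


def at_risk_records(records: Sequence[Mapping[str, Hashable]],
                    keys: Sequence[str],
                    threshold: int = 1) -> list:
    """Return records whose key combination occurs at most `threshold` times.

    threshold=1 returns the records that are unique on the key variables,
    which are the ones most exposed to re-identification.
    """
    counts = Counter(key_of(r, keys) for r in records)
    return [r for r in records if counts[key_of(r, keys)] <= threshold]


def uniqueness_rate(records: Sequence[Mapping[str, Hashable]],
                    keys: Sequence[str]) -> float:
    """Share of records that are unique on the key variables."""
    return len(at_risk_records(records, keys)) / len(records) if records else 0.0
```

Adding more key variables, or using finer categories such as single-year age instead of five-year groups, generally increases the share of unique records.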

This is a software utility, developed in MS Access, that identifies miscoded sex codes based on first name. Given an input data file in NAACCR v14 or v15 format, the utility queries the records against a list of known name/sex pairs and produces a list of cases with potential sex errors for manual review. The utility is based on an algorithm originally created by the New York Cancer Registry in August 2011.

An evaluation of this tool found that 19% to 75% of flagged cases were errors, and that in more recent years of data a greater percentage of flagged cases were confirmed as errors after investigation. When a flagged sex code turned out to be correct, the name was often misspelled. For male breast cancer, nearly all flagged cases were errors, a consequence of the highly skewed sex distribution of this cancer site. A published study on this tool is available.
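The name-based check can be illustrated with a Python sketch. It is not the MS Access utility or the New York algorithm. It assumes NAACCR Sex codes "1" = male and "2" = female, uses hypothetical field names (`first_name`, `sex`), and uses a tiny hypothetical name list. A real list would leave out names that are commonly given to both sexes.

```python
from typing import Iterable, Mapping

# Assumed NAACCR Sex codes: "1" = male, "2" = female. Other codes are not checked.
CHECKED_CODES = {"1", "2"}


def flag_sex_mismatches(records: Iterable[Mapping[str, str]],
                        name_sex: Mapping[str, str]) -> list:
    """Return records whose recorded sex disagrees with the sex expected from
    the first name. Names missing from `name_sex` are never flagged."""
    flagged = []
    for r in records:
        expected = name_sex.get(r.get("first_name", "").strip().upper())
        sex = r.get("sex", "")
        if expected is not None and sex in CHECKED_CODES and sex != expected:
            flagged.append(r)  # candidate for manual review, not an automatic fix
    return flagged


if __name__ == "__main__":
    names = {"JOHN": "1", "MARY": "2"}  # hypothetical name/sex pairs
    cases = [
        {"first_name": "John", "sex": "2"},
        {"first_name": "Mary", "sex": "2"},
        {"first_name": "Pat", "sex": "1"},
    ]
    for case in flag_sex_mismatches(cases, names):
        print("Review:", case)
```

The output is only a review list. As the evaluation above notes, a flagged case may be correctly coded and instead have a misspelled name.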

A list of tools that can import and export data in NAACCR Volume II format.


v16 SAS Translation Tool

This code template can be used by proficient SAS programmers to access data in the new V16 format efficiently and accurately. Code to both read and write ASCII V16 format is provided. Various sections and options are included; users simply comment out sections that do not apply to their specific needs. The code supports the three most often used record types (Incidence, Confidential, and Text). Beginning with V14, the code also handles data elements that are part of the CDC's Comparative Effectiveness Research (CER) and Patient-Centered Outcomes Research (PCOR) projects. We welcome any feedback or comments you have as you use the tool. Contact with your thoughts.
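Reading and writing a fixed-width ASCII record, which is the basic job of the SAS template, can be illustrated in Python. The three-field layout and 10-character record length below are hypothetical and exist only for illustration. Real column positions and lengths for each record type come from the NAACCR Volume II data dictionary for the version in use.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    start: int   # 1-based start column, as record layouts are usually documented
    length: int


# HYPOTHETICAL layout for illustration only; not the real NAACCR V16 positions.
LAYOUT = [
    Field("record_type", 1, 1),
    Field("patient_id", 2, 8),
    Field("sex", 10, 1),
]
RECORD_LENGTH = 10


def read_record(line: str) -> dict:
    """Slice each field out of a fixed-width line and strip padding spaces."""
    return {f.name: line[f.start - 1:f.start - 1 + f.length].strip() for f in LAYOUT}


def write_record(values: dict) -> str:
    """Place each value in its columns, padding with spaces; missing fields stay blank."""
    buf = [" "] * RECORD_LENGTH
    for f in LAYOUT:
        text = values.get(f.name, "")
        if len(text) > f.length:
            raise ValueError(f"{f.name!r} exceeds {f.length} characters: {text!r}")
        buf[f.start - 1:f.start - 1 + f.length] = text.ljust(f.length)
    return "".join(buf)
```

Writing a record and reading it back should reproduce the original values, which is a quick sanity check on any layout definition.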

This algorithm starts from Asian race Not Otherwise Specified (NOS), as reported to the cancer registry from the medical record, and uses gender, birthplace, first name, and surname (including maiden name, when available) to reassign Asian NOS cases to a more specific Asian race group.

Note: NAPIIA is now run as part of the NHAPIIA algorithm.
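A heavily simplified Python sketch of the kind of decision cascade described above follows. It checks birthplace first, then surname, and for women it checks maiden name before current surname. It is not the NAPIIA algorithm. The lookup tables, field values, and the sex labels "male"/"female" are hypothetical. The real algorithm uses its own curated name lists and also uses first name, which this sketch omits.

```python
from typing import Optional

# HYPOTHETICAL lookup tables for illustration; not NAPIIA's actual lists.
BIRTHPLACE_TO_GROUP = {"CHINA": "Chinese", "JAPAN": "Japanese", "VIETNAM": "Vietnamese"}
SURNAME_TO_GROUP = {"NGUYEN": "Vietnamese", "TANAKA": "Japanese"}


def assign_asian_group(race: str, birthplace: Optional[str], surname: Optional[str],
                       maiden_name: Optional[str], sex: str) -> str:
    """Reassign 'Asian NOS' to a specific group when evidence is found.

    Any other race value is returned unchanged, and Asian NOS is kept when
    neither birthplace nor name gives a match.
    """
    if race != "Asian NOS":
        return race
    # 1. Birthplace is checked first.
    if birthplace:
        group = BIRTHPLACE_TO_GROUP.get(birthplace.strip().upper())
        if group:
            return group
    # 2. Names: for women, the maiden name is checked before the current surname,
    #    because a married surname may not reflect the patient's own ancestry.
    names = [maiden_name, surname] if sex == "female" else [surname]
    for name in names:
        if name:
            group = SURNAME_TO_GROUP.get(name.strip().upper())
            if group:
                return group
    return race
```

Ordering the checks this way means stronger evidence, such as a specific birthplace, takes precedence over name-based inference.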

This algorithm uses information on ethnicity from the medical record, information reported to the cancer registry, and an evaluation of how strongly birthplace, race, and surname (including maiden name, when available) are associated with Hispanic ethnicity to assign Hispanic ethnicity status.

Note: NHIA is now run as part of the NHAPIIA algorithm.
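The general pattern of direct plus indirect identification can be shown in a simplified Python sketch. Reported Hispanic ethnicity is accepted directly, and otherwise birthplace and surname are checked, with maiden name first for women. It is not the NHIA algorithm. The surname and birthplace sets and the input labels are hypothetical, and the sketch leaves out NHIA's use of race and its distinction between reported non-Hispanic and unknown ethnicity.

```python
from typing import Optional

# HYPOTHETICAL reference sets for illustration; not NHIA's actual surname list.
HEAVILY_HISPANIC_SURNAMES = {"GARCIA", "HERNANDEZ"}
HISPANIC_BIRTHPLACES = {"MEXICO", "PUERTO RICO", "CUBA"}


def classify_hispanic(reported: str, birthplace: Optional[str], surname: Optional[str],
                      maiden_name: Optional[str], sex: str) -> bool:
    """Return True if the case is classified as Hispanic under this simplified scheme."""
    # Direct identification: ethnicity reported as Hispanic is accepted as-is.
    if reported == "hispanic":
        return True
    # Indirect identification from birthplace.
    if birthplace and birthplace.strip().upper() in HISPANIC_BIRTHPLACES:
        return True
    # Indirect identification from surname; maiden name is checked first for women.
    names = [maiden_name, surname] if sex == "female" else [surname]
    return any(n is not None and n.strip().upper() in HEAVILY_HISPANIC_SURNAMES
               for n in names)
```

Restricting the surname check to a "heavily Hispanic" list is one way to limit false positives from names that are common among both Hispanic and non-Hispanic people.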

Copyright © 2016 NAACCR, Inc. All Rights Reserved