When a physician makes a cancer diagnosis, there are several ways the information is entered into the California Cancer Registry’s data base. In the mid-1980s, when the registry started, abstractors visited hospitals to look at medical records; they were trained to turn handwritten notes into hundreds of numerical codes for computerization. The codes describe the type of cancer tumor, its location, size, shape, cellular structure, degree of invasion of surrounding tissues and level of aggressiveness.

Coding is a difficult and subjective task, and the codes are numerous and complex. Abstractors often mislabel the extent of metastatic spread because they cannot interpret handwritten notes or keep up with the periodically changing codes. And there are other challenges to creating accurate records. Patients trek between doctors’ offices, clinics, labs and hospitals, seeking second opinions and treatment; at every juncture, multiple records are generated, duplicating enormous amounts of data. Information drawn from scattered medical records trickles into the central registry over years—if it gets there at all.

Looking to improve the efficiency of its data collection, the registry encourages medical providers to switch to electronic record-keeping and upload data directly to the registry. Though many hospitals have done this, private practices and laboratories lag behind.

And there are new challenges. The different record-keeping software systems in use across the state’s registry network do not easily interface with one another. Registry data is not easily corrected when biopsies or treatments reveal that an initial diagnosis was incorrect. The registry does not track most recurrences of a cancer.

Doctors can also have different diagnostic takes on the same tumor, confusing the system. The coding manuals teach abstractors to categorize records containing language such as “apparently,” “probable” and “suspicious” as “confirmed” cancer diagnoses. A diagnosis is still coded as confirmed if a doctor treats a patient for cancer in spite of a negative biopsy. Subjectivity abounds.


Auditing the data

California’s cancer registry data base contains about four million patients, five million tumors and six million case abstracts.

Regular data-quality audits evaluating the timely submission of diagnoses, the completeness of follow-up on case histories and coding accuracy are a requirement of the National Cancer Institute and the Centers for Disease Control and Prevention. To test data quality, auditors working for the California Cancer Registry periodically select a sample of case abstracts meant to represent larger datasets. They compare these abstracts to the medical records from which they were generated, looking for errors.

The federal government requires most error rates to be less than 3 percent. Ideally, error rates would be zero, since even a few mistakes can play havoc with statistical analysis.

Over the decades, California’s data audits have revealed troubling error patterns and information gaps. But the audits are also designed to disguise failure. How so?

Let’s say an auditor selects 10 medical records at a doctor’s office. She compares each record to its coded abstract in the registry data base. Each abstract has more than 100 fields to be coded, such as date and place of diagnosis, biopsy results, stage and grade of cancer, extent of metastases, type of treatments, and follow-up tests and recurrences.

If there are 100 opportunities for error in each abstract, there are 1,000 opportunities for error in the set of 10 abstracts.

The medical coding industry recognizes two methods for calculating error rates. The first, known as “code-over-code,” minimizes error rates; the other, called “record-over-record,” maximizes them.

If the auditor finds three errors in each of the 10 cases and uses the code-over-code method, the error rate for each case is 3 percent, since each case has 100 fields. That 3 percent error rate would meet the funding agencies’ goal of 97 percent accuracy.

Thirty errors out of 1,000 opportunities for error may not sound like a lot. But what if one or more of those errors is the labeling of an invasive cancer as a non-invasive cancer? What if a diagnosis is mislabeled as confirmed, or a biopsy result is incorrectly coded?

On the other hand, the record-over-record method highlights data-quality problems. If all 10 abstracts each contain one error, the error rate is 100 percent. If only three cases contain an error, the error rate is 30 percent. If one case has 30 errors and the rest are mistake-free, the error rate is 10 percent.
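The two calculations can be sketched in a few lines of Python. This is an illustration only, built on the simplifying assumption used in the example above that every abstract has exactly 100 codable fields (real abstracts have more than 100); the function names are hypothetical and do not come from any registry’s auditing software.

```python
def code_over_code_rate(errors_per_abstract, fields_per_abstract=100):
    """Errors divided by every coded field in the sample (tends to minimize the rate).

    Assumes, for illustration, a fixed number of fields per abstract.
    """
    total_fields = fields_per_abstract * len(errors_per_abstract)
    return sum(errors_per_abstract) / total_fields


def record_over_record_rate(errors_per_abstract):
    """Share of abstracts with at least one error (tends to maximize the rate)."""
    flawed = sum(1 for errors in errors_per_abstract if errors > 0)
    return flawed / len(errors_per_abstract)


# Each list holds the number of errors found in each of 10 audited abstracts.
scenarios = {
    "three errors in each of 10 abstracts": [3] * 10,
    "one error in each of 10 abstracts": [1] * 10,
    "one error in three of 10 abstracts": [1, 1, 1] + [0] * 7,
    "30 errors in one of 10 abstracts": [30] + [0] * 9,
}

for label, errors in scenarios.items():
    cc = code_over_code_rate(errors)
    rr = record_over_record_rate(errors)
    # 97 percent accuracy means an error rate of at most 3 percent.
    print(f"{label}: code-over-code {cc:.1%}, record-over-record {rr:.0%}, "
          f"meets 97% goal by code-over-code: {cc <= 0.03}")
```

The first and last scenarios produce the same 3 percent code-over-code rate, yet their record-over-record rates are 100 percent and 10 percent: code-over-code counts errors against fields, while record-over-record counts flawed cases.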

Recoding audits can be prepared using either or both methods, but the code-over-code method has become the auditing standard for cancer registries. It has the advantage of pinpointing certain types of errors, but loses sight of the number of cases that have been incorrectly coded.

The code-over-code method allows registries to approach or exceed federal standards that would not be met by the record-over-record approach. Consequently, federal agencies award many registries a “gold” rating for data quality, despite both independent and internal audits that question the usefulness of a registry’s data for research purposes.

The National Cancer Institute and the C.D.C. put a high premium on “case completeness,” or how much of each year’s crop of cases is collected within a set time. California law mandates that all new diagnoses be reported to the cancer registry within six months; federal standards aim at 97 percent case completeness. California’s registry rarely meets the annual completeness goal by even 50 percent. Cases straggle in over many years, preventing their use in timely research that could improve patient care, and biasing incidence rates.


How the data stacks up

According to the California Cancer Registry, “Audits are necessary to maintain the integrity and confidence level of the data we collect. Without these types of evaluations, the data from every state registry [runs] the risk of becoming inaccurate, inconsistent, and unusable, in effect making the collection of cancer data obsolete.”

So how is the registry doing? Based on a series of internal audits and progress reports obtained by the Light, its chronic data-quality problems are worsening.

In a statement to the Light, Kenneth Kizer, who runs the registry’s central database at the University of California, Davis, said, “These internal audits reflect very specific and typically small datasets that alone cannot be used to reliably draw any conclusions about the quality of the millions of records in the California Cancer Registry database.”

The whole point of an audit, of course, is to draw exactly such conclusions.

The California Department of Public Health told the Light that the data-quality problems are caused by the “complexity” of recording, but emphasized that overall, the registry still meets federal standards. Here are some of the findings of registry audits in recent years.


* The most recent C.D.C. audit of the California Cancer Registry examined 200 case abstracts for coding errors. Applying the record-over-record method, auditors found that 29 percent of the cases sampled for 2010 had one or more errors. Remarkably, 37 percent of the 81 breast cancer cases sampled had one or more errors, including for “date of diagnosis” (an essential data point for tracking incidence rates), and for diagnostic and surgical codes that did not match source records. Using the code-over-code method, the registry claimed a 98 percent accuracy rate. If the audit sample reflects the actual state of the registry data base, there are hundreds of thousands of breast cancer records containing uncorrected errors.


* The C.D.C.’s previous audit of the registry, released in 2005, randomly selected 297 cancer cases from thousands of hospital records and compared them to registry abstracts. The auditors reported troubling results.

Seven percent of the hospital cases could not be located in the registry data base; 35 percent of those missing cases were breast cancers. In addition, 68 cases contained mistakes—an error rate of 23 percent using the record-over-record method. But, using the code-over-code method, the registry could claim a 98 percent accuracy rate.

Of the audited sample, 120 were female breast cancer cases, and they contained the most errors. Six percent of the breast cancer abstracts carried the wrong codes for biopsy findings, and 10 percent were incorrectly coded for tumor stage or grade, which are crucial diagnostic factors. Auditors found that four cases understated the aggressiveness of the disease and two listed non-invasive cancers as invasive.

Let’s translate these findings. In 2002, the California Cancer Registry recorded 25,503 new female breast cancer diagnoses. The error rate in biopsy coding means that the records of 1,530 patients may misrepresent the pathology of their disease. And abstracts for almost 10 percent of cases, or 2,500 women, incorrectly coded tumor stage or grade.

Remarkably, in 2002 alone, the registry may have wrongly labeled 408 women as suffering from invasive breast cancer.


* A 2010 progress report to the National Cancer Institute noted that two studies testing the accuracy of the Cancer Prevention Institute of California’s abstractors revealed an error rate of 19 percent. The report also noted that due to a computer snafu caused by a change in coding standards, the registry had indefinitely suspended record gathering: “Reporting facilities have been instructed to abstract 2010 cases but not to transmit them to their regional registry.” When reporting resumed more than a year later, the backlog choked the system, causing the registry to miss some data-quality goals. This problem highlights a structural issue that has plagued the registry since its inception: the network of sometimes incompatible software systems throughout the regional system tends to melt down when medical codes are periodically changed.
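The arithmetic behind the 2005 audit figures above, and their projection onto the 2002 caseload, can be reproduced in a few lines of Python. This sketch assumes, as the projection in the text does, that the audited sample is representative of every female breast cancer case recorded that year; the function names are hypothetical.

```python
def share_with_errors(cases_with_errors, cases_sampled):
    """Record-over-record rate: fraction of sampled cases with one or more errors."""
    return cases_with_errors / cases_sampled


def projected_cases(sample_rate, annual_cases):
    """Scale a sample error rate up to a full year's caseload.

    Assumes the audit sample is representative of every case that year.
    """
    return round(sample_rate * annual_cases)


# 2005 C.D.C. audit: 68 of 297 sampled cases contained mistakes.
audit_rate = share_with_errors(68, 297)
print(f"Record-over-record error rate: {audit_rate:.0%}")

# 2002: 25,503 new female breast cancer diagnoses recorded by the registry;
# 6 percent of audited breast cancer abstracts carried wrong biopsy codes.
breast_cases_2002 = 25_503
print("Records with possible biopsy coding errors:",
      projected_cases(0.06, breast_cases_2002))
```

Run as written, this prints an error rate of 23% and a projection of 1530 records, matching the figures reported for the 2005 audit.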


The California Cancer Registry also self-audits from time to time.


* A June 2014 audit of 363 endometrial cancer cases found that 38 percent were incorrectly coded for the stage or spread of tumors. Auditors observed that “some abstractors are not following coding instructions … partially due … to the manual’s lengthy and sometimes confusing organization [and] conflicting guidelines.”


* A May 2013 audit found that 33 percent of a 348-case sample of new lung cancers contained wrong information about the invasive spread of tumors. Of the 68 cases with errors, 22 percent were “originally coded to ‘no distant metastasis’ when in fact the patient did have metastatic disease” and 12 percent were “originally coded as having metastatic disease [even though the abstractors] did not have documentation that supported distant disease.”


* A June 2012 audit reported that abstractors incorrectly coded the stages of prostate cancers in 17 percent of sampled cases. Auditors reported that 70 percent “of the [186] discrepancies identified on this audit were the result of not following coding directions which are clearly stated in the [coding manual].”


* A May 2012 audit of the original records of a large oncology group reported that the registry had failed to record whether or not the group’s patients had received radiation treatment in 83 percent of sampled cases. Nearly 8 percent of the group’s cases had not been reported to the registry at all, with breast and prostate cancer cases most often missed.


* An April 2012 audit comparing registry records to hospital medical records revealed that 14 out of 15 sampled breast cancer cases contained diagnostic errors. Using the record-over-record method, that’s a 93 percent error rate. Auditors observed: “The cancer staging and treatment information [was] available in the medical record but was not captured in the coding.”


* A July 2011 statewide audit found that 34 percent of gastrointestinal lesions coded as malignant cancers were not really cancers. The Cancer Prevention Institute of California had the largest error rate. The auditors observed that the federal rules for coding this type of cancer are “the exact opposite” of the state registry’s “current clinical practices.”


* A December 2011 audit reported “a significant amount of miscommunication” between cancer registry employees and employees at a rural hospital. Important medical records were “never located” and hospital electronic record systems were “not user friendly.” The rural hospital and an urban hospital that was concurrently audited used different software; significant numbers of errors were found in registry abstracts of pathology and treatment histories for patients at both hospitals. “The data quality results of this audit are concerning,” the auditors wrote.


* Another December 2011 audit of 360 “borderline” ovarian cancer records found errors in about a quarter of the cases. Most notably, four non-malignant cases were mistakenly coded as malignant, and two non-cancers were reported as cancers. “Overall very poor data quality,” the auditors concluded.


* A 2010 audit comparing 50 registry abstracts to the original electronic hospital records determined that 42 percent of abstracts contained errors. Using the code-over-code method, the registry claimed a 98.5 percent accuracy rate.


* A 2009 statewide audit of lymphoma cases reported that none of the registry regions reached the federal accuracy standard using the code-over-code method.


* An October 2008 statewide audit of lung and colon case records found that they failed to meet federal standards, especially in the coding of cancer stages. (A 2005 study in Population Health Metrics points out, “routine data from cancer registries often lack information on stage of cancer, limiting their use.”)


* A June 2006 comparison of registry breast cancer records to original medical records revealed that the records overseen by the Cancer Prevention Institute of California had the most errors. The auditors were surprised that the institute’s most experienced coders were making substantial numbers of errors.


As the scientific experts quoted in this investigation have pointed out: It is past time to stop making the same old mistakes. The bureaucrats and institutions operating the California Cancer Registry have had nearly three decades to get it right. They have failed to perform as promised, while charging the public $1 billion to create an error-plagued database that is retarding our understanding of the many diseases of cancer.

There is an obvious solution. A centrally planned—yes, socialized—single-payer health care system would reduce the level of profit-driven pseudoscience revealed by this series, and vastly reduce the cost of providing decent health care to all people, regardless of the color of their skin.


Next week, Busted! concludes with a bonus feature: the story of a Marin woman whose breast cancer experience sums it all up.



Read Peter Byrne’s investigative series on a nationwide breast cancer scare that never should have happened: