The University of Iowa

2130 Medical Laboratories
Iowa City IA 52242
(319) 335-9917; fax (319) 335-8034

Release: Nov. 2, 1999

UI study shows classification system for evaluating childhood hip disease may be unreliable

IOWA CITY, Iowa — A recent University of Iowa Health Care study is calling into question the reliability of a classification system that orthopaedic surgeons use to help them evaluate treatment outcomes in children with a hip condition known as Legg-Calve-Perthes Disease.

The system, developed in 1981, is called the Stulberg Classification System. The classification scheme is used to predict how a patient with the hip disease (often referred to simply as Perthes disease) will do over the long term.

"Physicians make decisions based on the system, but nobody had ever proven its reliability," said Stuart Weinstein, M.D., UI professor of orthopaedic surgery and the study’s corresponding author. "Our findings indicate that the system has poor reliability. Consequently, the validity of any treatment decisions, outcome evaluations or epidemiological studies based on this system must be called into question."

Perthes disease is a condition that affects young children, most commonly between the ages of 4 and 10. It involves the loss of blood supply to the upper end of the femur, which eventually leads to deformation of the femoral head. If untreated, the disease may lead to degenerative arthritis. The cause of the disease is unknown. Roughly one in 12,000 children has the disease. The UI Hospitals and Clinics treat between 25 and 30 children with Perthes disease each year.

The Stulberg system establishes five patient outcome categories based on radiographs. Each category describes a different shape or relationship between the femoral head and the hip socket. Each category is associated with the potential for, and the onset of, degenerative joint disease. For example, children with category I hips are at low risk for early-onset degenerative disease, whereas children with category IV hips frequently develop arthritic changes in early adulthood. The system has also been used to evaluate the effectiveness of various treatments for Perthes disease.

For years, Weinstein noticed inconsistencies in outcome evaluations based on the system.

"What one surgeon might classify as being in the Stulberg III class, a colleague might decide falls into the Stulberg IV class," Weinstein explained. "This stems from the lack of objective, standardized language for evaluating the structures we’re looking at on x-ray. This leads to problems in the clinic and in evaluating the literature."

Based on Weinstein’s observation and the discovery that there was little published work concerning the reliability of the Stulberg system, a UI investigative team set out to measure how well surgeons agree with themselves and how well they agree with each other when using the system. The UI investigators also wanted to identify and quantify any sources of nonrandom error.

The UI study involved nine individuals representing three levels of experience: three staff pediatric orthopaedic surgeons, three senior residents and three junior residents. The nine raters independently used the Stulberg system to evaluate radiographs of 23 skeletally mature patients who had been managed for Perthes disease. None of the study raters were involved in the patients’ care. Realizing that low reliability might be due to differing definitions, the investigators designed the study to include an intervention in which the team, following the initial evaluations, discussed and standardized the conditions of each category. After seven to 10 days, the raters re-evaluated the radiographs, along with those of 22 additional patients. Researchers then compared the reliability of the ratings before and after the team discussion.

The UI study showed that, regardless of the raters’ experience levels, each rater’s individual reliability coefficient ranged from 71 to 92 before the team meeting and from 57 to 87 afterward. The reliability coefficient among raters ranged from 60 to 73 before the consensus-building session and from 65 to 74 after the session.
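The release does not say which reliability coefficient the study used. As an illustration only, the sketch below computes Cohen's kappa, one widely used measure of agreement between two raters that corrects for agreement expected by chance. The function name and the sample ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of radiographs given the same class.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, from each rater's marginal class frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical class throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical Stulberg classes (1-5) for eight radiographs; not study data.
surgeon_x = [1, 2, 2, 3, 3, 4, 4, 5]
surgeon_y = [1, 2, 3, 3, 4, 4, 4, 5]
kappa = cohen_kappa(surgeon_x, surgeon_y)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two raters agree on six of eight radiographs, and the kappa works out to 0.68 once chance agreement is removed. Applying the same function to one rater's first and second readings of the same radiographs would give a within-rater figure.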

"Most interesting was the fact that the category distributions were significantly different before and after the team meeting," Weinstein said.

This indicates that the definitions used have a strong impact on the resulting classifications. Outcome evaluations made with this system are therefore unlikely to be comparable across physicians and studies.

"The study results suggest that the Stulberg system is not a highly reliable tool for evaluating the radiographs of patients with Perthes disease at or after skeletal maturity," Weinstein said. "There were marked inter-rater variances among the nine physicians regardless of their experience level."

Weinstein said that the study results may overestimate true reliability and that reliability in actual clinical practice may be even lower.

"The participants knew that their ratings would be evaluated and compared with those assigned by their peers," Weinstein said. "In addition, because the participants were asked to review and memorize the Stulberg system and because they were guided by rating sheets, they probably were more knowledgeable about the system than the average orthopaedic surgeon is."

Although Weinstein questions the reliability of the current Stulberg system, he noted that it should not be entirely discounted.

"You don’t want to just throw it out," he said. "Intuitively, it has some validity, but I think the whole classification scheme needs to be looked at. Perthes disease is a three-dimensional deformity, and these classifications are based on two-dimensional radiographs. In the future, we need to have other, more sophisticated methodologies to determine the anatomic outcome of the hip and relate it to the clinical outcome long term. More generally, the lesson of this work is not necessarily that the classification is unreliable, but that years of assuming otherwise may have misled investigators when reporting or making treatment decisions."

The UI findings appear in the September issue of the Journal of Bone and Joint Surgery.

University of Iowa Health Care describes the partnership between the UI College of Medicine and the UI Hospitals and Clinics and the patient care, medical education and research programs and services they provide.