Background: Metal-on-metal hip resurfacing arthroplasty was introduced to preserve patients’ bone and facilitate revision surgery. This prospective cohort study aims to develop a prognostic model (OsBHR) that identifies which patient- and surgeon-related factors predict better long-term implant survival for the Birmingham Hip Resurfacing (BHR).
Methods: Between 1997 and 2002, details of 4490 patients (4945 hips) treated by hip resurfacing arthroplasty using the BHR prosthesis were entered in a prospective international registry and available for analysis. Implant survival was determined using the Kaplan-Meier estimator and a shared frailty proportional hazard model was used to determine predictors of implant revision.
Results: The prognostic equation suggested that a minimum diameter of 50 mm should be used for male and 52 mm for female patients to ensure a minimum 10-year survival rate of 95% in procedures performed by an average surgeon.
Conclusion: Implant survival at ten years and beyond strongly depends on component size and gender, varies between surgeons but is little affected by patient age. The OsBHR predictor allows a personalized estimated survival at ten years based on pre-operative variables.
Metal-on-polyethylene hip resurfacing was introduced around 40 years ago but had high failure rates due to polyethylene wear, in one design as high as 72% by 10 years. Metal-on-metal resurfacing was developed to address this problem by improving the bearing. McMinn, Treacy and other, non-inventor surgeons have reported 10-year survival rates of 89% to 97% [2-5]. Two systematic reviews of studies comparing metal-on-metal resurfacing with total hip arthroplasty found better functional outcome and activity levels but revision rates around twice as high with metal-on-metal resurfacing [6,7]. It remains to be seen whether the sequence of hip resurfacing followed by primary hip replacement gives a better lifetime solution than proceeding immediately to a total hip replacement.
A review of UK registry data also found higher revision rates for metal-on-metal resurfacing, even after adjusting for confounding factors. Failure was due in particular to femoral neck fracture and component loosening [6,7,9], and rates of aseptic lymphocyte-dominated vasculitis-associated lesions (ALVAL) were higher among females and patients with smaller-diameter femoral components [9-11]. Studies based on the Australian and combined English-Welsh registries have further emphasized the important role of gender and head size [8,12]. Analyses of data from these registries also demonstrate that revision rates differ significantly between the various resurfacing implant designs [8,10,13-15] and are lower for consultants or centres that perform a large volume of resurfacing cases [8,12]. These studies strongly suggest that patient selection is important for metal-on-metal resurfacing [14,16-18].
Since 1997, we have prospectively collected data from an international group of surgeons who used the earliest introduced design of metal-on-metal hip resurfacing arthroplasty. This puts us in a unique position to analyze long-term implant survival of metal-on-metal hip resurfacing and elucidate the influence of gender, head diameter and surgical experience. The aim of this study is to use these data to devise a prognostic equation to select patients for metal-on-metal hip resurfacing arthroplasty who will achieve 95% implant survival at 10 years.
Between July 1997 and November 2002, 4535 patients (5000 hips) treated by hip resurfacing arthroplasty using the Birmingham Hip Resurfacing (BHR; Smith & Nephew, Warwick, UK) were entered in a prospective multicentre registry managed by the Oswestry Outcome Centre. These were all cases treated using this implant in that period by an international group of 141 surgeons, who volunteered to enter their patients onto the registry. All patients in the study consented to be entered, for data analysis and publication. Forty-five patients (55 hips) opted not to be part of the registry soon after surgery, leaving 4490 patients (4945 hips).
The endpoint was defined as revision of part or whole of the prosthesis. Patients who could not be contacted were censored at the time of last contact, and patients who died were censored at the time of death. Age at operation, gender, preoperative diagnosis, component size and treating surgeon were considered as independent predictors of survival.
Kaplan-Meier estimates of cumulative survivorship were determined for subgroups and compared using a log-rank test to determine univariable predictors of implant survival. To find multivariable predictors of survival, the data were fitted to a shared frailty model, which is a Cox proportional hazards model with an added random term to model the effect of treating surgeon on implant survival. The proportional hazards assumption was tested using the Grambsch-Therneau test.
The model was stratified for predictors that failed this test, and the assumption that the remaining predictors acted similarly on the baseline hazard function in each stratum was tested using a global likelihood ratio test and a Wald test. A stratified Cox model does not allow the influence of the stratifying predictor on survival to be analyzed directly. We therefore used the method proposed by Powers to analyze the influence of stratifying predictors on survival. The standard test for investigating differences in survival, namely the log-rank test, has optimal power to detect differences if hazards are proportional. Because any stratifying variable is non-proportional, we used the more general Fleming-Harrington test with a weight function $G^{\gamma,\rho}$ to investigate whether the stratifying variable affected survival. This weight function can represent the log-rank test ($\gamma,\rho = 0,0$), emphasize differences at early time points ($\gamma,\rho = 1,0$) or emphasize differences at medium and late time points ($\gamma,\rho = 0,1$).
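As an illustration of the Kaplan-Meier estimate that underlies the survival comparisons described above, the following minimal Python sketch computes the product-limit survival for a handful of hips. It is not the code used in the study (the analyses were run in R). The function name `kaplan_meier` and the toy follow-up data are hypothetical and serve only to show how censored hips (death or loss of contact) leave the risk set without counting as revisions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct revision time.

    times  : follow-up in years (to revision, death, or last contact)
    events : True if the implant was revised, False if censored
    """
    revisions = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every hip leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = revisions.get(t, 0)
        if d:
            # Product-limit step: fraction of hips at risk that survive time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # revised and censored hips both leave the risk set
    return curve


# Hypothetical toy data, not taken from the registry.
toy_times = [0.3, 2.8, 4.0, 5.8, 7.1, 10.0]
toy_events = [True, True, False, True, False, False]
km_curve = kaplan_meier(toy_times, toy_events)

for t, s in km_curve:
    print(f"{t:5.1f} years: estimated survival {s:.3f}")
```

The hip censored at 4.0 years lowers the number at risk for the revision at 5.8 years but does not itself reduce the survival estimate, which is the behaviour the censoring rules in the Methods rely on.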
To prevent overfitting, we used bootstrapping to determine optimism-corrected values for the performance of the final model. This performance was assessed using the concordance c, the proportion of all pairs of implants that can be ordered such that the implant with the predicted longer survival also has the actual longer survival. The best-fit model was used to determine a prognostic equation and a table to predict 10-year survival of the implant as a function of independent predictors.
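The concordance c described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the routine used in the study: the function name `concordance`, the handling of ties (tied follow-up times are skipped, tied risk scores count as half) and the toy data are all hypothetical. A pair of implants is treated as orderable only when the implant with the shorter follow-up was actually revised.

```python
from itertools import combinations


def concordance(times, events, risk_scores):
    """Share of orderable pairs in which the higher-risk implant failed first.

    times       : follow-up in years
    events      : True if revised, False if censored
    risk_scores : model output; higher means shorter predicted survival
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier implant was censored, so the order is unknown
        usable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


# Hypothetical toy data, not taken from the registry.
toy_c = concordance(
    times=[1.0, 2.0, 3.0, 4.0],
    events=[True, True, False, True],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
print(f"concordance c = {toy_c:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the model's concordance is reported.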
All statistical analyses were performed with R version 2.8.1 using the packages “survival”, “coxme”, “multcomp”, “rms” and “boot”. A p-value of 0.05 or less was assumed to denote statistical significance.
A complete dataset of predictors was available for 4214 patients (2824 men and 1390 women; 4644 hips; Table 1), corresponding to 94% of patients and hips. Head size was unavailable for 115 patients (130 hips), primary diagnosis for 26 patients (27 hips), and 135 patients (144 hips) were lost to follow-up. The average age at operation was 52.9 years (Table 1). The commonest primary diagnoses were osteoarthritis (OA; 84.9%), developmental dysplasia of the hip (DDH; 8.6%), avascular necrosis of the neck (AVN; 4.7%) and rheumatoid arthritis (RA; 1.7%; Table 1). The median femoral component diameter was 50 mm (46 mm for female and 50 mm for male patients; Table 1).
Most cases (95.3%) were treated with a standard cup, 4.2% of cases with a dysplasia cup and 0.3% of cases with a bridging cup (Table 1). The dysplasia cup was used in 29.2% of cases with DDH, which accounted for 60.6% of its usage. The remaining 39.4% of dysplasia cups were almost all used in OA cases. Cases with a dysplasia cup were mainly female and had a smaller median diameter than standard cups (46 vs. 50 mm). Four cases (four patients) had a primary diagnosis of trauma, none of which failed; these were excluded from analysis.
|Characteristic||N, mean (SD) or median (IQR)|
|Mean age at surgery||52.9 (9.8)|
|Developmental Dysplasia||401 (8.6%)|
|Avascular necrosis||216 (4.7%)|
|Rheumatoid arthritis||79 (1.7%)|
|Median femoral component size (mm)||50 (46-54)|
|Male patients||50 (50-54)|
|Female patients||46 (42-46)|
|Median acetabular component size (mm)||56 (52-60)|
|Median follow-up time (years)||7.13 (7.05-10.05)|
Table 1: Overview of patient and hip characteristics.
The 134 surgeons who performed the 4644 cases with a complete dataset treated a median of 5 cases each (IQR 2-14, range 1-1628). The two designer-surgeons treated just over half the cases (2500 hips or 54%) and 23 surgeons treated 40 cases or more. In the group with complete data, 182 implants failed. The commonest failure reasons were femoral neck fracture (26%), aseptic loosening (23%) and avascular necrosis (17%; Table 2). The reasons differed significantly between male and female patients, with femoral neck fracture commonest among men and aseptic loosening commonest among women (Table 2).
|Failure reason||Total||Male||Female||Time to failure (years; median and IQR)|
|Fractured femoral neck||47 (26%)||31 (36%)||16 (16%)||0.3 (0.1-1.2)|
|Aseptic loosening||41 (23%)||16 (19%)||25 (26%)||2.8 (1.9-4.4)|
|Avascular necrosis||31 (17%)||20 (24%)||11 (11%)||4.3 (1.4-7.1)|
|Infection||18 (10%)||6 (7%)||12 (12%)||2.8 (2.0-5.4)|
|Adverse reaction to metal debris||13 (7%)||3 (4%)||10 (10%)||5.8 (4.5-8.2)|
|Dislocation/subluxation||11 (6%)||4 (5%)||7 (7%)||2.9 (1.5-6.7)|
|Others||6 (3%)||2 (2%)||4 (4%)||2.9 (1.6-5.6)|
|Unknown||15 (8%)||3 (4%)||12 (12%)||6.6 (3.6-8.5)|
Table 2: Reasons for the 182 implant failures.
The mean time to failure differed significantly between failure reasons, with femoral neck fractures occurring earlier and avascular necrosis, adverse soft tissue reaction and “unknown” later than average (Table 2). The overall 10-year survival rate was 95.4% (Table 3). In the univariable analysis, implant survival was significantly lower for female patients, patients under the age of 40, patients diagnosed with DDH or AVN, patients with smaller head sizes, and patients operated on by a non-designer surgeon (Table 3).
|Predictor||N||Survival at 10 years (%; 95% CI)||p-value|
|All cases||4640||95.4 (94.7-96.1)||-|
|Female||1532||92.4 (90.7-94.1)|
|Femoral head size||<0.001|
|Surgeon resurfacing volume||0.77|
|Low volume (below 40)||623||93.9 (91.5-96.4)|
|High volume (40 or more)||1520||93.4 (91.8-95.1)|
Table 3: Univariable analysis of predictors for implant survival.
Excluding the designer-surgeons, 10-year implant survival was equivalent between cases performed by surgeons who contributed 40 or more cases and those who contributed fewer than 40 (Table 3 and Figure 1).
To determine the cumulative effect of all variables, we performed a Cox regression analysis with a shared frailty model. Because the effect of gender did not meet the proportional hazards assumption, the analysis was stratified by gender. Femoral head size and operating surgeon were significantly associated with survival, but age and diagnosis were not (Table 4 and Figure 2).
Figure 2: Kaplan-Meier survival curves for cases treated by the two designer-surgeons (black lines), cases treated by “high-volume” surgeons (at least 40 cases; red line for the first 40 cases, green line for later cases) and cases treated by “low-volume” surgeons (fewer than 40 cases; blue line). The number of cases at risk at each time point is shown at the bottom of the graph; the shaded bands represent 95% confidence intervals.
|Femoral head size||0.89 (0.84-0.93)||<0.0001|
|Surgeon||SD 0.22, range 0.41-2.3||<0.0001|
|Early time points||0.32|
|Medium/late time points||<0.0001|
Table 4: Results of Cox regression analysis.
Each additional 2 mm of head size decreased the failure risk (hazard) by 23%, and the surgeon-associated hazard ratio ranged from 0.41 to 2.3 (Table 4). For the final model we omitted age and diagnosis, which did not affect the coefficient values of the remaining two variables. The final model had an uncorrected concordance c of 0.72, which would require an optimism correction of 0.002 to characterize the performance of the model for new patients not in the original dataset.
The final model was used to predict survival curves for male and female patients, assuming both had a 50 mm implant (the median size) and were operated on by an “average” surgeon (Figure 3). Predicted implant survival for male patients was lower up to eight years postoperatively and higher afterwards (Figure 3). No significant difference in predicted implant survival between genders was found using a log-rank test or when emphasizing early time points (Table 4).
Figure 3: Three predicted implant survival curves and their confidence intervals for male and female patients, assuming an average surgeon. For female patients (thin green and blue lines), curves for two different head sizes are shown, namely the median head size of all patients (50 mm, top blue curve) and the median head size of female patients (46 mm, bottom green curve). The median head size of male patients (50 mm) was identical to the median head size of all patients and hence a single curve (thick red curve) is shown for them. Implant survival at later time points for 50 mm heads differs significantly between male and female patients (Fleming-Harrington test, p<0.0001).
However, adjusted implant survival was significantly lower for female patients when emphasizing medium or late time points (Table 4). The difference in predicted implant survival between male and female patients was larger when the gender-specific median component sizes of 50 mm for males and 46 mm for females were assumed (Figure 3).
Based on the coefficients in the model, we determined an equation to predict implant survival 10 years postoperatively. The predicted hazard ratio HR was:
$$\mathrm{HR} = 0.885^{(\text{Head Size} - 50)} \times \mathrm{HR}_{\text{Surgeon}}$$

where $\mathrm{HR}_{\text{Surgeon}}$ is the hazard ratio associated with a surgeon. It equals 1 for a surgeon with an average implant revision risk, 0.41 for the surgeon with the lowest revision risk in our dataset and 2.3 for the surgeon with the highest risk. In the absence of further data, surgeons could assume they contribute an average risk. The predicted 10-year implant survival $S_{10\,\text{year}}$ for the two strata defines the Oswestry Birmingham Hip Resurfacing (OsBHR) predictor (Figure 4):

$$\text{Male: } S_{10\,\text{year}} = 0.954^{\mathrm{HR}}$$

$$\text{Female: } S_{10\,\text{year}} = 0.945^{\mathrm{HR}}$$

Figure 4: Predicted ten-year implant survival for male and female patients as a function of femoral head size and surgeon-specific risk factor. For male patients, a curve representing patients operated on by an average surgeon (thick line) and two representing patients operated on by a surgeon with a high risk (95% centile) or a low risk (5% centile; thin lines) are shown. For female patients, only a curve representing patients operated on by an average surgeon is shown (dotted line).
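The OsBHR predictor can be written out as a short Python sketch. The coefficients (0.885 per mm, reference size 50 mm, baseline survival 0.954 for male and 0.945 for female patients) are taken from the equations above; the function and constant names are hypothetical, and the sketch returns point estimates only, without the bootstrapped confidence limits.

```python
HR_PER_MM = 0.885        # hazard ratio per extra mm of femoral head size
REFERENCE_HEAD_MM = 50   # head size at which the size term equals 1
BASELINE_S10 = {"male": 0.954, "female": 0.945}  # 10-year survival at HR = 1


def osbhr_hazard_ratio(head_size_mm, hr_surgeon=1.0):
    """Hazard ratio relative to a 50 mm head and an average surgeon."""
    return HR_PER_MM ** (head_size_mm - REFERENCE_HEAD_MM) * hr_surgeon


def osbhr_survival_10y(sex, head_size_mm, hr_surgeon=1.0):
    """Predicted 10-year implant survival for the given stratum."""
    hr = osbhr_hazard_ratio(head_size_mm, hr_surgeon)
    return BASELINE_S10[sex] ** hr


# Point estimates for a range of head sizes, assuming an average surgeon.
for size in (38, 42, 46, 50, 54, 58):
    male = osbhr_survival_10y("male", size)
    female = osbhr_survival_10y("female", size)
    print(f"{size} mm: male {male:.2f}, female {female:.2f}")
```

For a 46 mm head and an average surgeon, for example, the sketch gives roughly 0.93 for male and 0.91 for female patients. Passing `hr_surgeon=2.3` or `hr_surgeon=0.41` shows the spread between the highest- and lowest-risk surgeons in the dataset.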
Using the above equations and bootstrapping to determine confidence limits, 10-year survival rates were predicted for a range of head sizes (Table 5). The 23 surgeons who performed 40 or more cases treated 4155 cases, with a median of 87 cases (IQR 51 to 108). Of these cases, 3860 had a complete set of predictors. These 23 surgeons had an average hazard ratio of 0.99 (SD 0.42, range 0.41 to 2.3). We determined a new Cox regression with shared frailty model based on this group, with the sequential case number as an extra variable. We found no effect of case number on implant survival (HR per 100 extra cases 0.96, 95% CI 0.91-1.01, p=0.14).
|Head size (mm)||10-year survival, male patients (95% CI)||10-year survival, female patients (95% CI)|
|38||0.81 (0.71 to 0.95)||0.78 (0.70 to 0.87)|
|42||0.88 (0.83 to 0.94)||0.86 (0.82 to 0.89)|
|46||0.93 (0.90 to 0.95)||0.91 (0.88 to 0.93)|
|50||0.95 (0.94 to 0.96)||0.94 (0.92 to 0.96)|
|54||0.97 (0.96 to 0.98)||0.97 (0.94 to 0.98)|
|58||0.98 (0.97 to 0.99)||0.98 (0.96 to 0.99)|
Table 5: Predicted 10-year survival rates for standard head sizes, assuming all cases performed by an average surgeon.
According to the OsBHR predictor, this particular design of resurfacing arthroplasty could meet the criterion of 95% implant survival at 10 years set by the National Institute for Health and Clinical Excellence (NICE, UK) when the procedure is performed by an average surgeon, provided a minimum femoral component size of 50 mm for male and 52 mm for female patients is used.
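The size thresholds for an average surgeon can be reproduced from the published coefficients with a small search over candidate head sizes. The sketch below is illustrative only: it assumes components are available in 2 mm steps from 38 mm upward (an assumption, not stated in the text), uses hypothetical function names, and works with point estimates rather than confidence limits.

```python
HR_PER_MM = 0.885        # hazard ratio per extra mm of femoral head size
REFERENCE_HEAD_MM = 50   # head size at which the size term equals 1
BASELINE_S10 = {"male": 0.954, "female": 0.945}  # 10-year survival at HR = 1


def predicted_s10(sex, head_size_mm, hr_surgeon=1.0):
    """Predicted 10-year implant survival from the OsBHR equations."""
    hr = HR_PER_MM ** (head_size_mm - REFERENCE_HEAD_MM) * hr_surgeon
    return BASELINE_S10[sex] ** hr


def smallest_size_meeting(sex, target=0.95, hr_surgeon=1.0,
                          sizes_mm=range(38, 64, 2)):
    """Smallest candidate size whose predicted survival reaches the target.

    sizes_mm is an assumed list of available sizes (2 mm steps).
    Returns None if no candidate size reaches the target.
    """
    for size in sizes_mm:
        if predicted_s10(sex, size, hr_surgeon) >= target:
            return size
    return None


min_male = smallest_size_meeting("male")
min_female = smallest_size_meeting("female")
print(f"average surgeon: male {min_male} mm, female {min_female} mm")
```

With an average surgeon (`hr_surgeon=1.0`) and a 95% target, the search returns 50 mm for male and 52 mm for female patients.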
Implementing this criterion would exclude almost all female patients in our study (98.6%) from the operation, but only a small proportion of the male patients (10.2%), underlining the recommendation that resurfacing is unsuited for female patients [3,27]. If the use of the BHR were limited to male patients with femoral component sizes of 50 mm or greater, the predicted 10-year implant survivorship in our cohort would rise from the current 93.5% to 96.7%, assuming the operation is performed by an average surgeon.
A more stringent criterion was applied based on the assumption that 10-year survival should be at least 95% even if the operation is performed by a below-average surgeon. A minimum size of 52 mm in male and 54 mm in female patients was found to ensure that 95% of surgeons would achieve the target survival rate in their patients, in which case 98.8% of the female and 56.9% of the male patients in our study would have been excluded from the operation.
This study provides the largest independently assessed analysis of long-term implant survival for one particular design of resurfacing arthroplasty, the Birmingham Hip Resurfacing (BHR). The overall ten-year survival rate of 95% in this study compares favorably to that of conventional arthroplasty for patients in a similar age group, also using the combination of cemented femoral component and cementless cup (85% for patients under 50 and 88% for patients between 50 and 59). Three studies found that gender did not influence survival once femoral head size was taken into account [8,13], but one did find that female patients had a higher revision rate independent of head size [8,10,15,28]. Our study analyzed the influence of gender in detail and found that the risk of particular failure mechanisms in resurfacing implants is time- and gender-specific: femoral neck fracture was the earliest occurring mechanism and predominant in male patients, whereas adverse reaction to metal debris (ARMD) and “unknown” were the latest occurring mechanisms and predominant in female patients [11,29-33]. The same difference in failure patterns was observed in a study of 173 retrieved specimens. We also found that female patients had a significantly lower implant survival rate, even when corrected for head size and operating surgeon. Late occurring failure mechanisms, especially adverse reactions to metal debris, have been linked with a delayed-type hypersensitivity response. Although only a few failures were attributed to adverse reactions, many of the failures occurred before knowledge of this diagnosis was first disseminated. Hence, many of the failures recorded as “unknown” were probably due to adverse reactions [29-33].
Another important factor predicting ten-year survival was the operating surgeon, with the surgeon-associated hazard ratio ranging from 0.41 for the surgeon showing the lowest revision risk to 2.3 for the one showing the highest risk. By itself, this finding should not surprise; it would be far more surprising if all surgeons were equal. The inevitable finding of variation between units, be they surgeons or institutions, is well known from other studies [36,37].
The performance of individual surgeons and surgical units has been studied extensively in cardiology, where it came to the fore after the Bristol Royal Infirmary Inquiry into paediatric cardiac deaths. One main lesson from that inquiry was that, when reviewing individual surgeons, issues of process and organization rather than technical ability were usually the underlying problem. In the absence of a set of known risk factors, the Society of Cardiothoracic Surgeons (UK) decided that any surgeon whose patient mortality rate was outside 99.99% (4 SD) over a three-year period would not have met the defined standards. If we apply the same 4 SD standard to individual surgeons’ hazard ratios, none of those entered in our cohort would fall outside the standard.
Our study has limitations. Although we considered several preoperative factors, we omitted others such as obesity, the presence of comorbidities and bone quality, as well as factors specific to hip resurfacing such as femoral head cysts and head-neck junction abnormalities. These predictors were simply not considered when initiating this study in 1998. Inclusion of post-operative variables such as cup orientation and post-operative activity level may improve the accuracy of survival prediction; however, these factors were not known when the initial database was set up in 1997 and are thus unlikely to influence the decision. A final limitation of our study is the focus on implant survival. Although 95% implant survival at 10 years is a common outcome and the benchmark used by NICE, it is not necessarily the most important criterion for the patient.
Survival of BHR metal-on-metal hip resurfacing arthroplasty at ten years and beyond strongly depends on component size and gender, and varies between surgeons. The OsBHR formula can use these preoperative factors to predict an individual survivorship and so personalize the recommended choice of implant by a particular surgeon. These longer-term risks should be kept in mind when considering this type of implant for female patients.