My Research
Reliability and Accuracy in the
Interpretation of Consistency of Surface Lumps
(Thyroid Nodules)
Abstract:
Background: Variability in the interpretation of consistency of surface lumps is
being observed during medical conferences. Objectives: To assess the
inter-observer agreement in the interpretation of consistency of surface lumps
using thyroid nodules as an example and to explore ways to reduce the
variability. Methods: Prospective study with group interpretation involving 43
patients with thyroid nodules as subjects per session. Interpreters consisted of
10 residents, 4 interns, and 6 medical clerks. Session 1: interpreters were free
to use any adjective to describe consistency. Session 2: interpreters instructed
to use four adjectives – cystic, soft, firm, and hard. Session 3: interpreters
instructed to use two adjectives – hard and non-hard. Results: In Sessions 1 and
2, concurrence was seen in only 2 and 5 patients, respectively. Session 3 showed 79
per cent concurrence among the residents and 84 per cent among the medical
students. The difference between the interpretation among residents and medical
students was not significant using chi-square test (p = 0.15). However, the
difference of the concurrence rate between sessions 1, 2 and 3 was significant
(p < 0.001). Conclusion: In the absence of standardization, and when more than
two adjectives were used, there was great variability in the interpretation of
consistency of surface lumps among physicians. The study showed that the use of the hard
and non-hard categorization reduced variability and promoted effective and
efficient use of consistency as a diagnostic aid.
Keywords: Consistency, surface lumps, thyroid nodules
Introduction
In the practice of teaching medicine, clinicians and medical students emphasize
the need to describe the consistency of whatever surface lump is felt on a
patient’s body (1-3). The consistency constitutes a datum that can help in the
diagnosis of the said lump.
A common problem experienced by the authors is the variability of interpretation
of consistency of surface lumps by certified physicians, both consultants and
residents, as well as medical students. This variability has been causing
confusion during medical conferences and problems in decision-making when
consistency is used as a diagnostic aid.
The objectives of this paper were 1) to assess the inter-observer agreement in
the interpretation of consistency of surface lumps using thyroid nodules as an
example and 2) to explore ways to reduce the variability.
Methods:
Patients with a solitary thyroid nodule were chosen as subjects because they were
readily available in the hospital for group interpretation sessions. Each thyroid
nodule was at least 2 cm in its greatest diameter.
Residents, interns, and medical clerks in the Department of Surgery served as
interpreters.
Group sessions were conducted from January to June 2006. In each session, there
were 43 patients with a solitary thyroid nodule and 10 residents, 4 interns, and 6
clinical clerks as interpreters.
Each session differed in the instructions of the interpretation of consistency.
In the first session, the interpreters were free to use any term they were
familiar with in the interpretation of consistency.
In the second session, they were instructed to use four words (cystic, soft,
firm, and hard) and they were given the operational definition of each of the
four adjectives. “Cystic” was to be used when the examiner felt that the lump
contained fluid, that is, when it was indentable and compressible; “soft,” when
the lump had the softness of the lower lip; “firm,” when it had the firmness of
the tip of the nose; and “hard,” when it had the hardness of bone.
In the third session, they were instructed to use two words (hard, non-hard) and
they were given the operational definition of these two adjectives. “Hard” was
to be used when the examiner felt the hardness of bone, and “non-hard” when the
examiner did not feel the hardness of bone.
The answers in each session were compared and analyzed as to concurrence among
the interpreters, between residents and medical students (interns and medical
clerks), and between sessions. Statistical hypothesis testing was done using the
chi-square test and Fisher's exact test, since the concurrence counts in sessions
1 and 2 were small (less than 5).
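The chi-square comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say which software was used or whether a continuity correction was applied, so the sketch assumes an uncorrected Pearson chi-square test. The function names are hypothetical, and the p value is computed in closed form only for 1 or 2 degrees of freedom, which covers the comparisons in this paper.

```python
from math import erfc, exp, sqrt


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    Assumption: uncorrected Pearson chi-square; all row and column
    totals must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail p value, closed form for df = 1 or df = 2 only."""
    if df == 1:
        return erfc(sqrt(stat / 2.0))
    if df == 2:
        return exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


# Combined counts for sessions 1 and 2 (rows: concurrence, no concurrence),
# taken from Table 9.
stat, df = pearson_chi_square([[2, 5], [84, 81]])
print(round(stat, 2), df, round(chi_square_p_value(stat, df), 3))
```

Under these assumptions the sketch gives a p value of about 0.247 for sessions 1 versus 2, in line with the value reported in Table 9. Because expected counts below 5 make the chi-square approximation less reliable, the exact test mentioned above is the safer choice for the session 1 and 2 comparisons.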
Results:
Tables 1 and 2 showed the adjectives used to describe consistency by the
residents and medical students, respectively, when the interpreters were free to
use any adjective they knew. The adjectives used were doughy, cystic, firm, soft,
and hard. There was concurrence in only 2 patients. The concurrence rate was 4 per
cent among the residents and zero among the medical students.
Tables 3 and 4 showed results in describing consistency by the residents and
medical students respectively after being instructed to use the following four
adjectives: cystic, soft, firm, and hard. There was concurrence in only 5
patients. The concurrence rate was 9 per cent among the residents and 2 per cent
among the medical students.
Tables 5 and 6 showed the results in describing consistency by the residents and
medical students, respectively, after being instructed to use the two adjectives
hard and non-hard. There was concurrence in 70 of the 86 patient assessments (34
of 43 among the residents and 36 of 43 among the medical students). The
concurrence rate was 79 per cent among the residents and 84 per cent among the
medical students.
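The concurrence rates can be checked with simple arithmetic. The short sketch below assumes, as Tables 10 and 11 suggest, that each group assessed 43 patients per session, so the combined denominator is 86. The counts are those listed in Table 8; the function and variable names are hypothetical.

```python
PATIENTS_PER_GROUP = 43  # assumption: 43 patients per group per session

# Number of patients with concurrence, by group and session (Table 8).
concurrence_counts = {
    "residents": {"session 1": 2, "session 2": 4, "session 3": 34},
    "medical students": {"session 1": 0, "session 2": 1, "session 3": 36},
}


def concurrence_rate(count, total):
    """Concurrence rate in per cent."""
    return 100.0 * count / total


for group, by_session in concurrence_counts.items():
    for session, count in by_session.items():
        rate = concurrence_rate(count, PATIENTS_PER_GROUP)
        print(f"{group}, {session}: {count}/{PATIENTS_PER_GROUP} = {rate:.1f}%")

# Combined rate for session 3 over both groups (denominator 86).
combined = sum(g["session 3"] for g in concurrence_counts.values())
print(f"combined, session 3: {concurrence_rate(combined, 2 * PATIENTS_PER_GROUP):.1f}%")
```

Rounded to whole numbers, these rates match the reported figures, except that 2 of 43 is about 4.7 per cent, which the text reports as 4 per cent.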
There was no significant difference between the patients examined by the
residents and the patients examined by the medical students. (Table 7)
The difference between the interpretation among residents and medical students
was not significant using chi-square test (p=0.15). (Table 8)
The difference in the concurrence rate between sessions was significant using the
chi-square test (p < 0.001). There was no significant difference between sessions
1 and 2 (p = 0.247), but the difference in concurrence rates was significant when
session 3 was compared with either session 1 or session 2 (p < 0.001 for each).
(Table 9)
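For small counts such as those in sessions 1 and 2, Fisher's exact test named in the Methods can be computed directly from the hypergeometric distribution. The following is a minimal sketch, not the authors' procedure: it assumes the common two-sided definition (summing the probabilities of all tables that are no more likely than the observed one), and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: the p value is the sum of the probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def probability(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows are sessions, columns are concurrence / no concurrence (Table 9).
print(fisher_exact_two_sided(2, 84, 5, 81))   # session 1 versus session 2
print(fisher_exact_two_sided(2, 84, 70, 16))  # session 1 versus session 3
```

The exact p values need not equal the chi-square values reported in the tables, but the qualitative conclusion should be the same: no significant difference between sessions 1 and 2, and a highly significant difference when session 3 is involved.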
Discussion:
If the consistency of surface lumps is to remain a diagnostic aid, measures are
needed to make its use more effective and efficient. In the authors' experience,
two things hamper its use: the absence of an objective gold standard for the
consistency of lumps, and the variability in its interpretation even among
certified physicians. The absence of a gold standard is the root cause of the
problem. Does this mean consistency should be totally disregarded as a diagnostic
aid? The authors believe that, despite this limitation, interpretation of the
consistency of surface lumps should be maintained because it has some diagnostic
usefulness and is widely used by physicians worldwide. Improvement measures,
however, should be adopted.
The results of the three interpretation sessions in this paper illustrated the
problem, its causes, and the improvement measures. Session 1 documented the
problem as it exists at present: medical students and graduates have been taught
many adjectives and, possibly, variable ways of interpreting the consistency of
lumps. Sessions 2 and 3 limited the adjectives to four and two, respectively, with
an operational definition for each adjective. The discrepancy in Session 2
remained high, while that in Session 3 dropped significantly.
With this experience, the study showed that the use of the hard and non-hard
categorization increased the chances of congruency among different interpreters.
It could also contribute to the efficient use of consistency as it is presently
used, namely to discriminate between a malignant and a non-malignant tumor. A
hard consistency is used as a basis to support malignancy (4-9). A non-hard
consistency could be either firm or soft. Since the diagnostic implication of firm
and soft consistency is the same, they can be lumped together under the non-hard
consistency.
A similar group session had been tried by the senior author before 1992 in another
institution, involving consultants and residents (10). The results were similar:
variability was seen even among consultants, and concurrence was 80 per cent when
the hard and non-hard classification was used. With this experience, the authors
plan to disseminate this information for use in medical education and schools.
References:
1. Love RR, Clark CC, Douglas JA. Clinical Exams for Essential Cancer Medicine
Skills. Board of Regents of the University of Wisconsin System. 2001. p15-21.
2. Baines CJ. Physical examination of the breasts in screening for breast
cancer. Journal of Gerontology. 1992; 47: 63-67.
3. Pennypacker HS, Pilgrim CA. Achieving competence in clinical breast
examination. Nurse Practitioner Forum. 1993; 4(2):85-90.
4. Bagasao M, Mijares PC, dela Pena AS, Liquete M, Laudico AV: A practical
approach to the diagnosis of thyroid cancer. Philipp J Surg Spec 1986;
41(2):56-60.
5. Joson RO: Gross pathological diagnosis of the thyroid gland. Philipp J Surg
Spec 1986; 45(4):138-141.
6. Joson RO, Manalang LR, Ramirez CG, Ick J, Avila JC, Abelardo AD: Thyroid
nodule aspiration: diagnostic usefulness and limitation. Philipp J Surg Spec
1989; 44(2): 45-5.
7. Laudico AV, Liquete MJ, Bagasao M, Mijares PC, dela Pena AS. Diagnosis of
thyroid malignancy. The role of clinical evaluation. Asian J Surg
1988;11(7):135-138.
8. Laudico AV, Eufemio GG, Liquete MJ. Clinical Manifestation of thyroid
carcinoma among Filipinos. Philipp J Surg Spec 1979; 34(3):159-167.
9. Philippine College of Surgeons Scientific Publication, Cancer Treatment
Guidelines, 2nd ed. 1994(6).
10. Layug RT, Joson RO, dela Pena AS, Cabaluna ND: Reliability and accuracy in
the interpretation of consistency of surface lumps (thyroid nodules).1992;
unpublished observations.
Table 8. Comparison of Concurrence of Interpretation Among Residents and Medical Students.
| | Session 1 | Session 2 | Session 3 |
| Residents | 2 (4%) | 4 (9%) | 34 (79%) |
| Medical Students | 0 | 1 (2%) | 36 (84%) |
| Total Combined | 2 (2%) | 5 (6%) | 70 (81%) |
p = 0.154
Table 9. Comparison of Concurrence Rate Between Sessions of Interpretation.
| | Session 1 | Session 2 | Session 3 |
| Concurrence Combined Residents and Medical Students | 2 (2%) | 5 (6%) | 70 (81%) |
| No Concurrence Combined Residents and Medical Students | 84 (98%) | 81 (94%) | 16 (19%) |
p = 0.247 (between session 1 and session 2)
p < 0.001 (between session 1 and session 3)
p < 0.001 (between session 2 and session 3)
Table 10. Comparison of Concurrence Rate Among Residents Between Sessions of Interpretation.
| | Session 1 | Session 2 | Session 3 |
| Concurrence of Residents | 2 (4%) | 4 (9%) | 34 (79%) |
| No Concurrence of Residents | 41 (96%) | 39 (91%) | 9 (21%) |
p = 0.397 (between session 1 and session 2)
p < 0.001 (between session 1 and session 3)
p < 0.001 (between session 2 and session 3)
Table 11. Comparison of Concurrence Rate Among Medical Students Between Sessions of Interpretation.
| | Session 1 | Session 2 | Session 3 |
| Concurrence of Medical Students | 0 (0%) | 1 (2%) | 36 (84%) |
| No Concurrence of Medical Students | 43 (100%) | 42 (98%) | 7 (16%) |
p = 0.314 (between session 1 and session 2)
p < 0.001 (between session 1 and session 3)
p < 0.001 (between session 2 and session 3)