
 

Reliability and Accuracy in the
Interpretation of Consistency of Surface Lumps
(Thyroid Nodules)
 

Abstract:
Background: Variability in the interpretation of the consistency of surface lumps has been observed during medical conferences. Objectives: To assess inter-observer agreement in the interpretation of the consistency of surface lumps, using thyroid nodules as an example, and to explore ways to reduce the variability. Methods: Prospective study with group interpretation sessions, each involving 43 patients with thyroid nodules as subjects. The interpreters were 10 residents, 4 interns, and 6 medical clerks. Session 1: interpreters were free to use any adjective to describe consistency. Session 2: interpreters were instructed to use four adjectives: cystic, soft, firm, and hard. Session 3: interpreters were instructed to use two adjectives: hard and non-hard. Results: In sessions 1 and 2, concurrence was seen in only 2 and 5 patients, respectively. Session 3 showed 79 per cent concurrence among the residents and 84 per cent among the medical students. The difference in interpretation between residents and medical students was not significant using the chi-square test (p = 0.15). However, the difference in concurrence rates among sessions 1, 2, and 3 was significant (p < 0.001). Conclusion: In the absence of standardization, and when more than two adjectives were used, there was great variability among physicians in the interpretation of the consistency of surface lumps. The use of the hard and non-hard categorization reduced variability and promoted effective and efficient use of consistency as a diagnostic aid.

Keywords: Consistency, surface lumps, thyroid nodules

Introduction
In the practice of teaching medicine, clinicians and medical students emphasize the need to describe the consistency of whatever surface lump is felt on a patient’s body (1-3). The consistency constitutes a datum that can help in the diagnosis of the said lump.

A common problem experienced by the authors is the variability in the interpretation of the consistency of surface lumps by certified physicians, both consultants and residents, as well as by medical students. This variability has caused confusion during medical conferences and problems in decision-making when consistency is used as a diagnostic aid.

The objectives of this paper were 1) to assess the inter-observer agreement in the interpretation of the consistency of surface lumps, using thyroid nodules as an example, and 2) to explore ways to reduce the variability.

Methods:

Patients with a solitary thyroid nodule were used because they were readily available in the hospital for group interpretation sessions. Each thyroid nodule was at least 2 cm in its greatest diameter.

Residents, interns, and medical clerks in the Department of Surgery served as interpreters.

Group sessions were conducted from January to June 2006. In each session, there were 43 patients with a solitary thyroid nodule, and 10 residents, 4 interns, and 6 medical clerks served as interpreters.

The sessions differed in the instructions given for interpreting consistency.

In the first session, the interpreters were free to use any term they were familiar with in the interpretation of consistency.

In the second session, they were instructed to use four words (cystic, soft, firm, and hard) and were given the operational definition of each of the four adjectives. “Cystic” was to be used when the examiner felt that the lump contained fluid, that is, when it was indentable and compressible. “Soft” was to be used when the lump felt as soft as the lower lip; “firm,” when it felt as firm as the tip of the nose; and “hard,” when it felt as hard as bone.

In the third session, they were instructed to use two words (hard and non-hard) and were given the operational definition of these two adjectives. “Hard” was to be used when the examiner felt the hardness of bone; “non-hard,” when the examiner did not feel the hardness of bone.

The answers in each session were compared and analyzed for concurrence among the interpreters, between residents and medical students (interns and medical clerks), and between sessions. Statistical hypothesis testing was done using the chi-square test and Fisher's exact test, because the concurrence counts in sessions 1 and 2 included values smaller than 5.
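As an illustration of the exact test named above, the following is a minimal sketch of a two-sided Fisher's exact test for a 2x2 table, written with the Python standard library only. The function name and the choice of the "sum of tables no more likely than the observed one" definition of two-sidedness are assumptions made for this sketch; the paper does not say which software or which variant was used, so the values it prints are not expected to match the reported p-values. The counts come from Tables 11 and 9.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance keeps exact ties from being lost to rounding.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Medical students, session 1 vs session 2: 0 of 43 vs 1 of 43 concurrent.
p_students = fisher_exact_two_sided(0, 43, 1, 42)

# Combined groups, session 1 vs session 2: 2 of 86 vs 5 of 86 concurrent.
p_combined = fisher_exact_two_sided(2, 84, 5, 81)

print(f"students: p = {p_students:.3f}, combined: p = {p_combined:.3f}")
```

With counts this small the exact test is generally the safer choice, since the chi-square approximation becomes rough when expected cell counts fall below 5.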

Results:

Tables 1 and 2 showed the varied adjectives used to describe consistency by the residents and medical students, respectively, when the interpreters were free to use any adjective they knew. The adjectives used were doughy, cystic, firm, soft, and hard. There was concurrence in only 2 patients. The concurrence rate was 4 per cent among the residents and zero among the medical students.

Tables 3 and 4 showed results in describing consistency by the residents and medical students respectively after being instructed to use the following four adjectives: cystic, soft, firm, and hard. There was concurrence in only 5 patients. The concurrence rate was 9 per cent among the residents and 2 per cent among the medical students.

Tables 5 and 6 showed the results of describing consistency by the residents and medical students, respectively, after being instructed to use the two adjectives: hard and non-hard. There was concurrence in 70 of the 86 combined evaluations (34 of 43 patients among the residents and 36 of 43 among the medical students). The concurrence rate was 79 per cent among the residents and 84 per cent among the medical students.
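The session 3 percentages follow directly from the counts reported in Table 8. The short sketch below shows the arithmetic; the function and variable names are illustrative only and do not come from the paper.

```python
PATIENTS_PER_SESSION = 43  # patients examined by each group in a session


def concurrence_rate(concurrent, total):
    """Percentage of patients on whom the interpreters concurred."""
    return 100.0 * concurrent / total


# Session 3 concurrence counts taken from Table 8.
session3 = {"residents": 34, "medical students": 36}

for group, count in session3.items():
    rate = concurrence_rate(count, PATIENTS_PER_SESSION)
    print(f"{group}: {count}/{PATIENTS_PER_SESSION} = {rate:.0f} per cent")

# Both groups examined 43 patients, so the combined denominator is 86.
combined = concurrence_rate(sum(session3.values()), 2 * PATIENTS_PER_SESSION)
print(f"combined: {combined:.0f} per cent")
```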

There was no significant difference between the patients examined by the residents and the patients examined by the medical students. (Table 7)

The difference in interpretation between residents and medical students was not significant using the chi-square test (p = 0.15). (Table 8)
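The paper does not state exactly which table was submitted to the chi-square test. As a hedged reconstruction, the sketch below runs a Pearson chi-square test on the 2x3 table of concurrence counts from Table 8 (residents and medical students by session). That this is the table the authors used is an assumption; it is offered only because the result is close to the reported p-value. The function names are illustrative. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
from math import exp


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-stat / 2)


# Concurrence counts from Table 8: rows are groups, columns are sessions 1 to 3.
table8 = [
    [2, 4, 34],  # residents
    [0, 1, 36],  # medical students
]

stat = chi_square_statistic(table8)
p = p_value_df2(stat)
print(f"chi-square = {stat:.2f}, df = 2, p = {p:.3f}")
```

Several expected counts in this table are below 5, so the chi-square approximation is rough here and the p-value should be read with that in mind.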

The difference in concurrence rates among the sessions was significant using the chi-square test (p < 0.001). There was no significant difference between sessions 1 and 2 (p = 0.247), but the difference in concurrence rates was significant when session 3 was compared with either session 1 or session 2 (p < 0.001 for each comparison). (Table 9)
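The pairwise comparisons can be sketched as Pearson chi-square tests on 2x2 tables of concurrence versus no concurrence. The version below applies no continuity correction; that choice is an inference from the reported numbers, not something the paper states. The function name is illustrative. With 1 degree of freedom, the tail probability equals erfc(sqrt(x/2)), which the standard library provides.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and its p-value (1 degree of freedom).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(stat / 2))
    return stat, p


# Each entry: (concurrent, not concurrent) in the earlier session, then the later one.
comparisons = {
    "combined, session 1 vs 2": (2, 84, 5, 81),    # Table 9
    "combined, session 1 vs 3": (2, 84, 70, 16),   # Table 9
    "combined, session 2 vs 3": (5, 81, 70, 16),   # Table 9
    "residents, session 1 vs 2": (2, 41, 4, 39),   # Table 10
    "students, session 1 vs 2": (0, 43, 1, 42),    # Table 11
}

for label, counts in comparisons.items():
    stat, p = chi_square_2x2(*counts)
    print(f"{label}: chi-square = {stat:.2f}, p = {p:.3f}")
```

Under these assumptions the session 1 versus session 2 comparisons come out close to the p-values listed in Tables 9 to 11, and both comparisons against session 3 fall well below 0.001.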

Discussion:
If the consistency of surface lumps is to remain an aid to diagnosis, measures must be taken to make its use more effective and efficient. In the authors' experience, the absence of an objective gold standard for the consistency of lumps and the variability in its interpretation, even among certified physicians, hamper its effective and efficient use. The absence of a gold standard is the root cause of the problem. Does this mean that consistency should be disregarded entirely as a diagnostic aid? The authors believe that, despite this limitation, the interpretation of the consistency of surface lumps should be maintained because it has diagnostic usefulness and is widely used by physicians worldwide. Improvement measures, however, should be adopted.

The results of the three interpretation sessions in this paper illustrated the problem, its causes, and the improvement measures. Session 1 documented the problem as it presently exists: medical students and graduates have been taught many adjectives and, possibly, varying ways of interpreting the consistency of lumps. Sessions 2 and 3 limited the adjectives to four and two, respectively, with an operational definition for each adjective. The discrepancy in Session 2 remained high, while that in Session 3 dropped significantly.

With this experience, the study showed that the use of hard and non-hard categorization increased the chances of congruency among different interpreters.

Using hard and non-hard could also contribute to the efficient use of the consistency of surface lumps as it is presently used, namely to discriminate between a malignant and a non-malignant tumor. A hard consistency is used as a basis to support malignancy (4-9). A non-hard consistency could be either firm or soft. Since the diagnostic implication of firm and soft consistency is the same, they can be lumped together under the non-hard consistency.

A similar group session was conducted by the senior author before 1992 in another institution, involving consultants and residents (10). The results were similar: variability was present even among consultants, and concurrence reached 80 per cent when the hard and non-hard classification was used. With this experience, the authors plan to disseminate this information for use in medical schools and medical education.




References:

1. Love RR, Clark CC, Douglas JA. Clinical Exams for Essential Cancer Medicine Skills. Board of Regents of the University of Wisconsin System. 2001. p15-21.
2. Baines CJ. Physical examination of the breasts in screening for breast cancer. Journal of Gerontology. 1992; 47: 63-67.
3. Pennypacker HS, Pilgrim CA. Achieving competence in clinical breast examination. Nurse Practitioner Forum. 1993; 4(2):85-90.
4. Bagasao M, Mijares PC, dela Pena AS, Liquete M, Laudico AV: A practical approach to the diagnosis of thyroid cancer. Philipp J Surg Spec 1986; 41(2):56-60.
5. Joson, RO: Gross pathological diagnosis of the thyroid gland. Philipp J Surg Spec 1986; 45(4):138-141.
6. Joson, RO, Manalang LR, Ramirez CG, Ick J, Avila JC, Abelardo AD: Thyroid nodule aspiration: diagnostic usefulness and limitation. Philipp J Surg Spec 1989; 44(2): 45-5.
7. Laudico AV, Liquete MJ, Bagasao M, Mijares PC, dela Pena AS. Diagnosis of thyroid malignancy. The role of clinical evaluation. Asian J Surg 1988;11(7):135-138.
8. Laudico AV, Eufemio GG, Liquete MJ. Clinical Manifestation of thyroid carcinoma among Filipinos. Philipp J Surg Spec 1979; 34(3):159-167.
9. Philippine College of Surgeons Scientific Publication, Cancer Treatment Guidelines, 2nd ed. 1994(6).
10. Layug RT, Joson RO, dela Pena AS, Cabaluna ND: Reliability and accuracy in the interpretation of consistency of surface lumps (thyroid nodules).1992; unpublished observations.


 

Table 8. Comparison of Concurrence of Interpretation Among Residents and Medical Students.

                      Session 1     Session 2     Session 3
Residents             2 (4%)        4 (9%)        34 (79%)
Medical Students      0             1 (2%)        36 (84%)
Total Combined        2 (2%)        5 (6%)        70 (81%)

p = 0.154


 

Table 9. Comparison of Concurrence Rate Between Sessions of Interpretation.

                                                            Session 1     Session 2     Session 3
Concurrence (Combined Residents and Medical Students)       2 (2%)        5 (6%)        70 (81%)
No Concurrence (Combined Residents and Medical Students)    84 (98%)      81 (94%)      16 (19%)

p = 0.247 (between session 1 and session 2)
p < 0.001 (between session 1 and session 3)
p < 0.001 (between session 2 and session 3)


 

Table 10. Comparison of Concurrence Rate Among Residents Between Sessions of Interpretation.

                                Session 1     Session 2     Session 3
Concurrence of Residents        2 (4%)        4 (9%)        34 (79%)
No Concurrence of Residents     41 (96%)      39 (91%)      9 (21%)

p = 0.397 (between session 1 and session 2)
p < 0.001 (between session 1 and session 3)
p < 0.001 (between session 2 and session 3)


 

Table 11. Comparison of Concurrence Rate Among Medical Students Between Sessions of Interpretation.

                                       Session 1     Session 2     Session 3
Concurrence of Medical Students        0 (0%)        1 (2%)        36 (84%)
No Concurrence of Medical Students     43 (100%)     42 (98%)      7 (16%)

p = 0.314 (between session 1 and session 2)
p < 0.001 (between session 1 and session 3)
p < 0.001 (between session 2 and session 3)