AI-Driven k-Prototypes Clustering Reveals Thyroid Status, Prolactin, and Ovarian Morphology as Dominant Axes of PCOS Heterogeneity in IVF Patients

Authors

1 General Physician, Infectious Diseases Research Center, Birjand University of Medical Sciences, Birjand, Iran.

2 Department of Nuclear Medicine, Nuclear Medicine and Molecular Imaging Research Center, Namazi Teaching Hospital, Shiraz University of Medical Sciences, Shiraz, Iran.

3 Department of Industrial Engineering, Faculty of Engineering, College of Farabi, University of Tehran, Tehran, Iran.

4 Assistant Professor of Obstetrics & Gynecology, Fellowship of Infertility, Supporting the Family and the Youth of Population Research Core, Department of Obstetrics and Gynecology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.

DOI: 10.22038/ijogi.2026.27609

Abstract

Introduction: Polycystic ovary syndrome (PCOS) shows marked clinical and biological heterogeneity. Although the Rotterdam criteria organize patients into phenotypes A–D, many presentations fall between these labels. We explored this hidden diversity in women undergoing in vitro fertilization (IVF) by clustering them with a mixed-data algorithm that emphasized prolactin, thyroid function and detailed ovarian-ultrasound findings.
Methods: Women attending a single infertility center who met the Rotterdam definition of PCOS and were scheduled for IVF were included. Standard clinical data, fasting biochemical panels and transvaginal sonography were recorded. The feature set consisted of serum prolactin, a categorical diagnosis of hypo- or hyperthyroidism derived from serum thyroid-stimulating hormone (TSH), and left-ovary follicle number and volume, alongside age, gonadotrophins (follicle-stimulating hormone and luteinizing hormone), metabolic markers (fasting glucose, total cholesterol, triglycerides), liver enzymes (aspartate and alanine aminotransferases), serum 25-hydroxy-vitamin D, body-mass index, menstrual pattern, hirsutism score and bilateral ovarian morphology. A k-prototypes algorithm (five initializations, Cao seeding, γ = 2.178, 1000 iterations) was run with k = 5.
Results: The k-prototypes algorithm separated the 516 women into five clinically coherent subgroups. Internal validation supported the partition: the final k-prototypes cost was 191.02, and cohesion and separation were reflected in a silhouette score of 0.81, a Davies-Bouldin index of 0.47 and a Calinski-Harabasz index of 596.09. Cluster 0 (n = 346, 67 %) represented the “typical” picture: almost universal bilateral polycystic ovaries, moderate hirsutism and infrequent menses, together with an unexpectedly high burden of thyroid disease (99.7 % abnormal; mean TSH 2.14 µIU/mL); most matched Rotterdam phenotype A. Cluster 1 (n = 16, 3 %) was distinguished by overt hypothyroidism (mean TSH 6.3 µIU/mL), severe hypertriglyceridemia (273 mg/dL) and the highest hirsutism scores, but showed inconsistent ovarian morphology; patients mapped chiefly to phenotypes A and B. Cluster 2 (n = 65, 13 %) consisted of women with pronounced oligo-anovulation (72 %) and hirsutism (80 %) but no thyroid dysfunction (mean TSH 2.15 µIU/mL) and often normal or unilateral ovarian appearance; it aligned with phenotype B and had the lightest metabolic profile. Cluster 3 (n = 57, 11 %) combined bilateral polycystic ovaries and the greatest menstrual disturbance (77 %) with universal thyroid abnormality (mean TSH 3.26 µIU/mL) and mild transaminase elevation, hinting at early hepatic stress; two-thirds were phenotype A. Cluster 4 (n = 32, 6 %) centered on marked hyperprolactinemia (mean 430 ng/mL); most women also had polycystic ovaries and cycle irregularity, while 84 % showed normal thyroid tests, giving this group a distinct endocrine driver that cut across Rotterdam categories.
Conclusion: By concentrating on prolactin concentration, thyroid status and ovarian ultrasonography, k-prototypes clustering divided IVF-seeking women with PCOS into five reproducible clinical profiles that transcend the Rotterdam scheme. The data emphasize three dominant axes of heterogeneity—thyroid dysfunction, hyperprolactinemia and metabolic-hepatic stress—and suggest that routine evaluation of these features could guide more individualized lifestyle, endocrine and metabolic interventions.
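
For readers unfamiliar with mixed-data clustering, the sketch below illustrates the standard k-prototypes dissimilarity (squared Euclidean distance on numeric features plus γ times the number of categorical mismatches) and the nearest-prototype assignment step. It is a minimal illustration and not the authors' implementation: only γ = 2.178 is taken from the Methods, while the function names, feature values, category labels and the assumption that numeric features are standardized are all hypothetical.

```python
from typing import List, Sequence, Tuple

# A prototype is (numeric centre, categorical modes); layout is an assumption.
Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_x: Sequence[float],
    cat_x: Sequence[str],
    num_proto: Sequence[float],
    cat_proto: Sequence[str],
    gamma: float,
) -> float:
    """Squared Euclidean distance on numeric features plus
    gamma times the count of mismatching categorical features."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(num_x, num_proto))
    mismatches = sum(1 for a, b in zip(cat_x, cat_proto) if a != b)
    return numeric_part + gamma * mismatches


def assign_to_nearest(
    num_x: Sequence[float],
    cat_x: Sequence[str],
    prototypes: List[Prototype],
    gamma: float,
) -> int:
    """Return the index of the prototype with the lowest mixed dissimilarity."""
    costs = [
        mixed_dissimilarity(num_x, cat_x, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(costs)), key=costs.__getitem__)


GAMMA = 2.178  # weight reported in the Methods

# Hypothetical standardized values (e.g. prolactin and TSH z-scores)
# and hypothetical category labels, for illustration only.
patient_num = [0.5, -0.2]
patient_cat = ["bilateral_pco", "oligomenorrhea"]
prototypes: List[Prototype] = [
    ([0.0, 0.0], ["bilateral_pco", "oligomenorrhea"]),
    ([0.4, -0.1], ["unilateral_pco", "regular_cycles"]),
]

cluster = assign_to_nearest(patient_num, patient_cat, prototypes, GAMMA)
print("assigned prototype index:", cluster)
```

In this toy case the second prototype is numerically closer, yet two categorical mismatches weighted by γ outweigh that advantage. The example shows why the choice of γ governs how strongly categorical features such as thyroid status or ovarian morphology shape the clusters.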

Keywords