Evidence-based advanced prompt engineering in nursing research: quality analysis of ChatGPT-generated Boolean search query

Joanna Gotlib-Małkowska; Ilona Cieślak; Mariusz Jaworski; Mariusz Panczyk

doi:10.12923/pielxxiw-2025-0002

Authors

Joanna Gotlib-Małkowska, Department of Education and Research in Health Sciences, Faculty of Health Sciences, Medical University of Warsaw, Poland. ORCID: https://orcid.org/0000-0002-2717-7741
Ilona Cieślak, Department of Education and Research in Health Sciences, Faculty of Health Sciences, Medical University of Warsaw, Poland. ORCID: https://orcid.org/0000-0001-7752-6527
Mariusz Jaworski, Department of Education and Research in Health Sciences, Faculty of Health Sciences, Medical University of Warsaw, Poland. ORCID: https://orcid.org/0000-0002-5207-8323
Mariusz Panczyk, Department of Education and Research in Health Sciences, Faculty of Health Sciences, Medical University of Warsaw, Poland. ORCID: https://orcid.org/0000-0003-1830-2114

DOI:

https://doi.org/10.12923/pielxxiw-2025-0002

Keywords:

nursing research, artificial intelligence (AI), ChatGPT, Large Language Model (LLM), Boolean search query

Abstract

EVIDENCE-BASED ADVANCED PROMPT ENGINEERING IN NURSING RESEARCH: QUALITY ANALYSIS OF AN ADVANCED BOOLEAN SEARCH STRATEGY GENERATED BY ChatGPT

Aim. The article examines the potential of advanced prompt engineering in nursing research, with particular focus on Boolean search queries (BSQs) generated by ChatGPT.

Material and methods. The study compared the effectiveness of different ChatGPT models (ChatGPT-3.5, ChatGPT-4.0, and ChatGPT-4omni) in generating high-quality BSQs for the PubMed database. The prompting methods analysed were Zero-Shot, Automated Chain-Of-Thought, Emotional Stimuli, Role-play, and Mixed-Methods prompting.

Results. ChatGPT-4omni with Mixed-Methods prompting achieved the highest response quality, whereas ChatGPT-3.5 with zero-shot prompting was the least effective. Search results varied considerably across models and prompting methods. The authors recommend ChatGPT-4omni as the most effective model for generating BSQs.

Conclusions. The study highlights the lack of standardised prompt engineering methods in scientific research, which complicates the use of large language models such as ChatGPT, and it points to the potential of ChatGPT to automate the preparation of systematic reviews and the development of search strategies in nursing research. Although ChatGPT proved valuable for generating terms and synonyms, it often struggles to produce fully accurate BSQs. The authors argue for using the latest ChatGPT models, together with advanced prompt engineering techniques, for scientific tasks. Further research is also recommended to refine and standardise prompt engineering methods in nursing research.
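For readers unfamiliar with the term, a Boolean search query (BSQ) joins synonyms for one concept with OR and then joins the concepts with AND. The following is a minimal Python sketch of that structure only. The function name `build_boolean_query`, the example term lists, and the choice of the PubMed `[tiab]` (title/abstract) field tag are illustrative assumptions; this is not the authors' procedure and not a query taken from the study.

```python
def build_boolean_query(concepts, field="tiab"):
    """Join synonyms with OR inside each concept, then join concepts with AND.

    `concepts` is a list of synonym lists (hypothetical example data);
    `field` is a PubMed field tag appended to every term.
    """
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as exact phrases.
        terms = [
            f'"{term}"[{field}]' if " " in term else f"{term}[{field}]"
            for term in synonyms
        ]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# Hypothetical concept blocks, chosen only for illustration.
concepts = [
    ["nurses", "nursing staff"],
    ["ChatGPT", "large language model"],
]
query = build_boolean_query(concepts)
print(query)
```

A query built this way would still need review by a searcher, for example to add controlled vocabulary terms, before being used in a systematic review.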

Author biography

Joanna Gotlib-Małkowska - Department of Education and Research in Health Sciences, Faculty of Health Sciences, Medical University of Warsaw, Poland

Ilona Cieślak (B, D-E, K), Mariusz Jaworski (E, G), Mariusz Panczyk (A)

