REVIEW   Open Access

A Chinese Lunar New Year countdown to trustworthy medical research: Eight traditions, eight statistical checkpoints

  • Corresponding author: glxfgsh@163.com
  • Linking eight Chinese Lunar New Year customs to key research principles makes statistical rigor memorable.
  • Eight festive traditions, from clarifying the research question to interpreting reports, guide more trustworthy research.
  • The eight-day Spring Festival countdown offers a cultural roadmap for rigorous research.
  • Blending cultural narrative with statistical guidance helps researchers avoid pitfalls and boost credibility.

  • This article offers a practical guide to strengthening medical and public health research by pairing eight familiar rituals from the northern Chinese Lunar New Year countdown with eight research principles. Using an accessible cultural narrative, we emphasize sticking to a clearly stated research aim and estimand to avoid aim-drift and data dredging; auditing and cleaning data in a fully traceable, code-based workflow; handling variable derivation, coding, nonlinearity, and interactions in ways that preserve information and support interpretation; ensuring adequate and representative sampling aligned with the study purpose; addressing missing data and outliers with explicit assumptions, transparent primary analyses, and sensitivity analyses; choosing graphics that reveal distributions and uncertainty and using diagnostic plots to check modeling assumptions; validating prediction models through resampling and, where possible, external validation with attention to discrimination, calibration, and clinical utility; and communicating findings with appropriate caution by focusing on effect sizes, uncertainty, and the limits of observational inference. Together, these eight checkpoints translate best practices in design, analysis, and reporting into concrete steps intended to reduce avoidable errors and improve the trustworthiness of research based on real-world data.
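The final checkpoint, reporting effect sizes together with their uncertainty rather than a bare p-value, can be illustrated with a small resampling sketch. The Python below is not taken from the article: the function name `bootstrap_mean_difference`, its parameters, and the blood pressure values are hypothetical, and the percentile bootstrap is only one of several reasonable ways to obtain an interval.

```python
import random
from statistics import mean


def bootstrap_mean_difference(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Return (point_estimate, lower, upper) for mean(treated) - mean(control).

    Percentile bootstrap: each group is resampled with replacement, the
    difference in means is recomputed, and the interval is read from the
    empirical quantiles of the resampled differences.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    point = mean(treated) - mean(control)
    diffs = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))  # resample treated group
        c = rng.choices(control, k=len(control))  # resample control group
        diffs.append(mean(t) - mean(c))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return point, diffs[lo_idx], diffs[hi_idx]


# Hypothetical systolic blood pressure reductions (mmHg), invented for illustration.
treated = [12.1, 9.4, 15.0, 8.7, 11.3, 13.8, 10.2, 14.5]
control = [6.3, 7.9, 4.1, 8.8, 5.5, 6.7, 9.0, 3.9]
estimate, lower, upper = bootstrap_mean_difference(treated, control)
print(f"difference = {estimate:.2f} mmHg, 95% CI {lower:.2f} to {upper:.2f}")
```

Reporting the estimate with its interval keeps attention on the size and precision of the effect. With observational data, the interval reflects sampling variability only and does not account for confounding or selection bias.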

  • Cite this article:

    Feng G., Wang H., Zhang T., et al. (2026). A Chinese Lunar New Year countdown to trustworthy medical research: Eight traditions, eight statistical checkpoints. The Innovation Medicine 4:100197. https://doi.org/10.59717/j.xinn-med.2026.100197

Figures(0)     Tables(2)