I am an Assistant Professor of Statistics & Data Science at the National University of Singapore. I was a Postdoctoral Research Fellow in Biostatistics at Harvard University from 2022 to 2024, working with Prof. Tianxi Cai. I received my Ph.D. in Statistics in 2022 from the University of California, Davis (UC Davis), advised by Prof. Hao Chen. Before that, I received my B.S. in Statistics and B.E. in Computer Science (dual degree) from the University of Science and Technology of China (USTC) in 2019.
I am seeking self-motivated students to join my research group or collaborate with me. If you are interested in collaborating, please email me your CV along with a brief introduction about yourself.
I am interested in developing statistical methodology and theory for the analysis of electronic health records (EHR) data. I am also developing practical tools for analyzing high-dimensional and non-Euclidean data. My work mainly spans representation learning, federated learning, transfer learning, reinforcement learning, network analysis, graph neural networks, large language models, and high-dimensional statistics. Recently, I have been interested in:
Zhou, D.*, Li, M.*, Cai, T., Liu, M. Model-assisted and Knowledge-guided Transfer Regression for the Underrepresented Population. (2024+) [arXiv]
Liu, M.*, Zhou, D.*, Chen, H. Generalized Independence Test for Modern Data. (2024+) [arXiv]
Cai, T.#, Huang, F.#, Nakada, R.#, Zhang, L.#, Zhou, D.# Contrastive Learning on Multimodal Analysis of Electronic Health Records. (2024+) [arXiv]
Liang, J.*, Liu, Y.*, Zhou, D., Zhang, S., Lu, J. The Wreaths of Coherence: Uniform Graph Feature Selection with False Discovery Rate Control. (2024+) [arXiv]
Xu, Z., Gan, Z., Zhou, D., Shen, S., Lu, J., Cai, T. Inference of Dependency Knowledge Graph for Electronic Health Records. (2023+) [arXiv]
Cai, T.#, Xia, D.#, Zhang, L.#, Zhou, D.# Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model. (2023+) [arXiv]
Zhou, D., Chen, H. RING-CPD: Asymptotic Distribution-free Change-point Detection for Multivariate and Non-Euclidean Data. (2022+) [arXiv]
Zhou, D.*, Liu, M.*, Li, M., Cai, T. Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features. Journal of the American Statistical Association: Theory and Methods, 2024. [arXiv]
Zhou, D.*, Zhang, Y.*, Sonabend-W, A., Wang, Z., Lu, J., Cai, T. Federated Offline Reinforcement Learning. Journal of the American Statistical Association: Theory and Methods, 2024. [arXiv] [code]
Zhou, D., Cai, T., Lu, J. Multi-source Learning via Completion of Block-wise Overlapping Noisy Matrices. Journal of Machine Learning Research, 2023. [code] [package]
Zhou, D., Chen, H. A New Ranking Scheme for Modern Data and Its Application to Two-sample Hypothesis Testing. Conference on Learning Theory (COLT), 2023.
Liu, M.#, Zhang, Y.#, Zhou, D.# Double/Debiased Machine Learning for Logistic Partially Linear Model. The Econometrics Journal, 2021. [code]
Wen, J., et al. DOME: Directional Medical Embedding Vectors from Electronic Health Records. (2024+) [code]
Gan, Z.*, Zhou, D.*, et al. ARCH: Large-scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis. (2023+) [medRxiv] [code] [ARCH APP]
Yang, D., Zhou, D., Cai, S., Gan, Z., Pencina, M., Avillach, P., Cai, T., Hong, C. SONAR: Enabling Robust Automated Harmonization of Heterogeneous Data through Ensemble Machine Learning. (Under revision, 2023+) [preprint]
Xiong, X., et al. Knowledge-Driven Online Multimodal Automated Phenotyping System. (2023+) [medRxiv] [KOMAP] [ONCE]
Lou, Y., Chen, Y., Huang, Y., Zhou, D., Cao, Y., Wang, H. Two-stream Feature Extraction for Self-supervised Image Quality Assessment. IEEE International Conference on Data Mining (ICDM), 2023.
Cai, B., Zeng, S., Lin, Y., Yuan, Z., Zhou, D., Tian, L. Hierarchical Pretraining for Biomedical Term Embeddings. Proceedings of the 18th Conference on Computational Intelligence Methods for Bioinformatics & Biostatistics (CIBB 2023).
Wen, J., et al. Multimodal Representation Learning for Predicting Molecule-Disease Relations. Bioinformatics, 2023.
Zhou, D., et al. Multiview Incomplete Knowledge Graph Integration with Application to Cross-institutional EHR Data Harmonization. Journal of Biomedical Informatics, 2022. [MIKGI APP]
Ahuja, Y., Liang, L., Zhou, D., Huang, S., Cai, T. Semisupervised Calibration of Risk with Noisy Event Times (SCORNET) using electronic health record data. Biostatistics, 2022. [code]
Hong, C., et al. Clinical Knowledge Extraction via Sparse Embedding Regression (KESER) with Multi-Center Large Scale Electronic Health Record Data. npj Digital Medicine, 2021. [KESER Network]
Ahuja, Y., Zhou, D., He, Z., Sun, J., Castro, V., Gainer, V., Murphy, S., Hong, C., Cai, T. sureLDA: A Multidisease Automated Phenotyping Method for the Electronic Health Record. Journal of the American Medical Informatics Association, 2020. [code]