Research
Large models and enormous datasets are essential driving forces behind the unprecedented successes of modern algorithms, especially in scientific computing and machine learning. Nevertheless, growing dimensionality and model complexity, along with the substantial workload of data pre-processing, make these successes formidably expensive in both computation and data aggregation. As the slowdown of Moore's Law curbs hardware-level reductions in the cost of computation, fast heuristics for expensive classical routines and efficient algorithms for exploiting limited data are becoming increasingly indispensable for pushing the limits of what algorithms can achieve.
My research focuses on such efficient algorithms for fast execution and effective data utilization.
From the computational efficiency perspective, I design and analyze randomized low-rank decomposition algorithms that run fast on large matrices.
From the sample efficiency perspective, I study and improve the generalization and distributional robustness of learning algorithms in data-limited settings.
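As a small illustration of the first thread, below is a minimal sketch of a randomized low-rank approximation built on the classical randomized range finder (in the spirit of Halko, Martinsson, and Tropp); the function name and parameter choices are illustrative assumptions, not code from any paper listed here.

    import numpy as np

    def randomized_low_rank(A, rank, oversample=10, seed=0):
        # Sample the range of A with a Gaussian test matrix of rank + oversample columns.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, rank + oversample))
        Y = A @ Omega                    # one pass over A
        Q, _ = np.linalg.qr(Y)          # orthonormal basis for the sampled range
        # Project A onto the subspace and factor the small matrix exactly.
        B = Q.T @ A
        U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
        return Q @ U_small[:, :rank], S[:rank], Vt[:rank, :]

The dominant cost is two multiplications of A with thin matrices plus dense factorizations of thin matrices, in place of a full SVD, which is what makes such decompositions fast on large matrices.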
Preprints
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li, Yijun Dong, Qi Lei, 2024.
Robust Blockwise Random Pivoting: Fast and Accurate Adaptive Interpolative Decomposition
Yijun Dong, Chao Chen, Per-Gunnar Martinsson, Katherine Pearce, 2023. [GitHub]
Adaptive Parallelizable Algorithms for Interpolative Decompositions via Partially Pivoted LU
Katherine J. Pearce, Chao Chen, Yijun Dong, Per-Gunnar Martinsson, 2023. [GitHub]
Publications
Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
Yijun Dong*, Hoang Phan*, Xiang Pan*, Qi Lei
Neural Information Processing Systems (NeurIPS), 2024.
[GitHub]
Efficient Bounds and Estimates for Canonical Angles in Randomized Subspace Approximations
Yijun Dong, Per-Gunnar Martinsson, Yuji Nakatsukasa
SIAM Journal on Matrix Analysis and Applications, 2024.
[GitHub]
Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Yijun Dong*, Kevin Miller*, Qi Lei, Rachel Ward
Neural Information Processing Systems (NeurIPS), 2023.
[GitHub]
Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization under Concept Shift
Yijun Dong*, Yuege Xie*, Rachel Ward
International Conference on Machine Learning (ICML), 2023.
[GitHub, poster]
Sample Efficiency of Data Augmentation Consistency Regularization
Shuo Yang*, Yijun Dong*, Rachel Ward, Inderjit Dhillon, Sujay Sanghavi, Qi Lei
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
Simpler is better: A comparative study of randomized algorithms for computing the CUR decomposition
Yijun Dong, Per-Gunnar Martinsson
Advances in Computational Mathematics, 2023.
[GitHub]
Quantifying Biofilm Formation of Sinorhizobium meliloti Bacterial Strains in Microfluidic Platforms by Measuring the Diffusion Coefficient of Polystyrene Beads
Chen Cheng*, Yijun Dong*, Matthew Dorian*, Farhan Kamili*, Effrosyni Seitaridou
Open Journal of Biophysics, 2017.
(* denotes equal contribution or alphabetical order)