👋 Hello there!
I am an aspiring data scientist and Master’s student in the Department of Statistical Science at Duke University, where I also earned my Bachelor’s in Statistical Science (Data Science Concentration) with a minor in Computer Science.
- My academic journey has been marked by a deep commitment to research in statistical analysis, machine learning, and data science, with a particular focus on natural language processing, Bayesian statistics, and creative data visualization. Take a look at my previous research and internship projects in R and Python, and let me know if you are interested!
- Beyond academia, I contribute as a Project Manager & Data Analyst @ Duke Impact Investing Group and as the Chief Technology Officer @ Duke Statistical Science Majors Union. I have also served as a teaching assistant for 3+ years. Feel free to reach out for project advice and business case studies.
- In my free time, I enjoy 🥊 / 🚴‍♀️ / 🎹 / 🧁
🏫 Education
| Institution | Degree | Field of Study | Date |
|---|---|---|---|
| Duke University | M.S. Student | Statistics | May 2025 |
| Duke University | B.S. | Statistical Science (Data Science Concentration); Minor in Computer Science | May 2023 |
| University of California, Santa Barbara | (Transferred Out) | Statistics and Data Science | June 2021 |
⚙️ Skillset
© Visualization created by scraping my resume with the R wordcloud2 package.
👩‍💻 Highlights & Updates
Invitee | R Dev Day @ Hutch @ (Aug 2024)
Opportunity Scholar | posit::conf(2024) @ (Aug 2024)
Incoming Master’s Statistician Intern @ (May 2024 – Aug 2024)
Student Research Affiliate @ Duke AI Health (May 2022 – Dec 2022)
Lab Test Harmonization: Bio-BERT-Based Deduplication of Test Labels
- Selected as the sole undergraduate among a competitive pool of professional candidates for the Duke AI Health 2022 cohort, and presented research findings at the Duke AI Health Poster Showcase 2022
- Optimized deduplication of lab test grouper labels by adopting and fine-tuning Bio-BERT, a language model pre-trained on biomedical corpora; created a new cross-comparison similarity evaluation method based on ground-truth text embeddings, and achieved a 95% performance boost when applied to the Duke lab analyte database
Data Science Intern @ Hiya (May 2022 – Aug 2022)
Hiya Shield Project: Robocall Identification & Screening
- Spearheaded a robocall screening process that uses NLP text embeddings to determine whether an audio sample (or its transcript) comes from a known robocall database
- Quantified the relationship between audio duration and robocall classification performance; identified the preferred audio truncation length and optimal similarity threshold, and achieved a 67% acceleration in user experience by introducing a customizable screening-accuracy feature for the Hiya mobile app
Lead Author & Research Assistant @ Tsinghua University (Jun 2020 – Mar 2021)
Cross-Media Retrieval Based on Big Data Technology
- Improved the performance of permutation invariant training with mean squared error loss using BLSTM/LSTM and CNN models in a key media separation technique; demonstrated the improvement in two separation methods (the FIX strategy and the masking-based data augmentation strategy), and subsequently developed an independent research project
- Publication: Audio-Visual Single-Channel Signal Separation based on Big Data Augmentation, published by IEEE at the International Conference on Computer Networks and Electronic Communications (ICCNEC 2020)