👋 Hello there!

I am an aspiring data scientist and Master’s student in the Department of Statistical Science at Duke University, where I also obtained my Bachelor’s in Statistical Science (Data Science Concentration) and a Minor in Computer Science.

  • My academic journey has been marked by a deep commitment to research in statistical analysis, machine learning, and data science, with a special focus on natural language processing, Bayesian statistics, and creative data visualization. Take a look at my previous research and internship projects in R and Python, and let me know if you are interested!
  • Beyond academia, I actively contribute as a Project Manager & Data Analyst @ Duke Impact Investing Group and as the Chief Technology Officer @ Duke Statistical Science Majors Union. I also have 3+ years of experience as a teaching assistant. Feel free to reach out for project advice and business case studies.
  • In my free time, I do 🥊 / 🚴‍♀️ / 🎹 / 🧁
👉 View My Resume

🏫 Education

| Institution | Degree | Field of Study | Dates |
| --- | --- | --- | --- |
| Duke University | M.S. Student | Statistics | May 2025 |
| Duke University | B.S. | Statistical Science (Data Science Concentration), Minor in Computer Science | May 2023 |
| University of California, Santa Barbara | (Transfer Out) | Statistics and Data Science | June 2021 |


⚙️ Skillset

[Image: skillset word cloud]

© This visualization was created by scraping my resume with the R wordcloud2 package.


👩‍💻 Highlights & Updates

Invitee | R Dev Day @ Hutch (Aug 2024)

Opportunity Scholar | posit::conf(2024) (Aug 2024)

Incoming Masters Statistician Intern @ (May 2024 – Aug 2024)

Student Research Affiliate @ Duke AI Health (May 2022 – Dec 2022)
Lab Test Harmonization: Bio-BERT Based Deduplication of Test Labels
  • Selected as the sole undergraduate from a competitive pool of professional candidates for the Duke AI Health 2022 cohort, and presented research findings at the Duke AI Health Poster Showcase 2022
  • Optimized lab test deduplication of grouper labels by fine-tuning Bio-BERT, an NLP model pre-trained on biomedical corpora; created a new cross-comparison method for similarity evaluation based on ground-truth text embeddings, yielding a 95% performance boost when applied to the Duke lab analyte database
Data Science Intern @ Hiya (May 2022 – Aug 2022)
Hiya Shield Project: Robocall Identification & Screening
  • Spearheaded a robocall screening process that uses NLP text embeddings to determine whether an audio sample (or its transcript) matches a known robocall database
  • Quantified the relationship between audio duration and robocall classification performance; identified the preferred audio truncation length and optimal similarity threshold, and achieved a 67% speedup in the user experience by introducing a customizable screening accuracy feature for the Hiya mobile app
Lead Author & Research Assistant @ Tsinghua University (Jun 2020 – Mar 2021)
Cross-Media Retrieval Based on Big Data Technology
  • Improved the performance of permutation invariant training with mean squared error loss using BLSTM/LSTM and CNN models in a key media separation technique; demonstrated the improvement for two separation methods, the FIX strategy and the masking-based data augmentation strategy, and subsequently developed an independent research project
  • Paper publication: "Audio-Visual Single-Channel Signal Separation based on Big Data Augmentation", published by IEEE at the International Conference on Computer Networks and Electronic Communications (ICCNEC 2020)