Hi! I'm Nitika Jain.

Turning curiosity into insights and data into impact, I am passionate about providing data-driven solutions. With a keen eye for detail and a knack for tackling complex challenges, I transform raw information into actionable intelligence.

About

Hello! I'm Nitika Jain, a Data Science graduate student at Northeastern University. As a natural problem solver, I thrive on transforming raw data into actionable insights that drive real-world impact. My passion lies in unraveling complex challenges using cutting-edge techniques in Natural Language Processing (NLP), Generative AI, Machine Learning, and statistical analysis.

When I'm not immersed in the world of data, you'll find me on the badminton court. I'm drawn to this sport because it mirrors my approach to data science: it's all about finding those small openings and seizing opportunities. Just as a well-placed drop shot can change the course of a rally, a keen insight derived from data can transform a business strategy.

In my downtime, I enjoy going for walks to clear my mind, indulging in the captivating narratives of K-dramas, and experimenting with new recipes in the kitchen. These activities not only provide a refreshing break but also fuel my creativity and problem-solving skills in unexpected ways. I'm always eager to take on new challenges and collaborate on projects that harness the power of data to make a meaningful difference. Let's connect and explore how we can turn data into impact together!

Experience

AI Analyst
Data Engineer
  • Reduced clinician documentation time by 85% by deploying a summarization system powered by AssemblyAI and GPT-4o mini that automated consult note generation
  • Improved summary quality by evaluating prompt variations through structured output analysis and iterative refinement
  • Defined clinician and patient engagement KPIs (listening %, concerns addressed, handling time) to measure care quality and workflow efficiency
  • Automated KPI computation and lifecycle tracking using PostgreSQL triggers, CTEs, and indexing strategies, reducing manual reporting effort and improving data reliability
  • Built ETL pipelines in PostgreSQL to ingest clinical interaction data from multiple systems, transforming and validating datasets to produce analytics-ready KPI tables, reducing manual reporting effort by 50%
  • Performed data exploration and validation to define clinician and patient engagement KPIs (listening %, concerns addressed, handling time) to measure care quality
Data Engineer
  • Engineered an LLM-based clinical summarization system converting transcripts into HIPAA-compliant, editable summaries, reducing documentation time by 85%
  • Improved summary accuracy through hallucination guardrails, PHI-safe prompts, and low-latency inference, cutting clinician review effort by 70%
  • Architected a 3NF clinic database in Supabase with RLS and strict constraints, reducing data inconsistencies by 60%
  • Automated clinician metrics and lifecycle updates using Postgres triggers, CTEs, and validation logic, reducing manual fixes by 50%
Oct 2025 – Present | Remote, USA

Playful Mind

Data Analyst
Research Analyst
  • Consolidated and cleaned multimodal interaction data from controller inputs and eye-tracking streams in the online game Rocket League, creating structured datasets to analyze user behavioral patterns
  • Examined behavioral and gaze metrics across cohorts using descriptive statistics and feature aggregation, identifying 32% higher target-tracking accuracy among expert players
  • Validated statistically significant gaze strategy differences using ANOVA and chi-square tests, confirming experts rely on target-focused attention while novices favor center-looking patterns
  • Developed a spatiotemporal gaze prediction model (CNN–BiLSTM–Attention) to forecast player attention patterns, improving behavioral prediction accuracy by 25%
  • Engineered analysis-ready datasets by integrating multimodal data (eye tracker, game controllers), applying preprocessing (merging, normalization, missing value handling), and conducting EDA to surface behavioral patterns across player cohorts
  • Mined behavioral and gaze metrics across player cohorts using descriptive statistics and feature aggregation, identifying 32% higher target-tracking accuracy among expert players
  • Built robust data pipelines to collect, clean, and integrate multi-source interaction data (eye-tracking, controller inputs, in-game events), reducing preprocessing time by 70% and enabling scalable analysis
  • Analyzed interaction data using exploratory data analysis and statistical testing (ANOVA, chi-square, correlation), revealing 32% higher target-tracking accuracy among expert players
  • Implemented a computer vision pipeline with YOLOv7 for frame-level object detection, achieving 93% mAP and enabling downstream behavioral modeling
  • Developed a spatiotemporal gaze prediction model (CNN–BiLSTM–Attention), trained with KL divergence loss and validated via spatial accuracy and correlation metrics, outperforming baselines by 25%
  • Translated modeling results into clear visualizations and summaries, enabling non-technical stakeholders to make informed game design decisions
Jan 2024 – Dec 2025 | Boston, MA
Customer Success Data Analyst
  • Automated Salesforce data cleansing and deduplication using Python, implementing validation checks and reconciliation logic that reduced manual effort by 90% and improved reporting accuracy
  • Segmented registered IDC.com users using engagement metrics to define active and super user personas, improving upsell and expansion
  • Evaluated full customer journey from site visits to conversion, identifying drop-off points and driving retention improvements that lifted renewals by 30%
  • Leveraged Salesforce Opportunity insights to refine survey targeting and presented recommendations to the VP and senior leadership, increasing the survey response rate by 5% the following quarter
  • Built recurring performance reports and Power BI dashboards to keep C-suite executives informed on renewals, churn risk, and customer health metrics
  • Designed a renewal risk prediction model using Random Forest with hyperparameter tuning, evaluated using ROC-AUC and recall, improving early identification of at-risk accounts by 25%
  • Modeled customer lifetime value using regression on engagement and revenue signals, identifying high-value accounts and improving retention and upsell planning effectiveness by 17%
  • Examined onboarding-to-renewal journeys using EDA, cohort, funnel, and retention analysis, uncovering engagement drop-offs and driving a 30% increase in customer engagement
  • Segmented users by behavioral and engagement patterns using clustering and descriptive analysis, increasing marketing campaign relevance by 10%
  • Delivered executive-ready Power BI dashboards tracking CSAT, NPS, CLTV, and churn risk, translating analytical insights into data-driven leadership decisions
Jun 2024 – Sep 2024 | Boston, MA
Business Data Analyst
  • Partnered with product, operations, and risk stakeholders to gather business requirements and define payment performance KPIs and reconciliation metrics
  • Drove transaction reconciliation analysis using T-SQL (CTEs, window functions, joins, aggregations), improving reliability by 25%
  • Developed Tableau dashboards tracking transaction volume, latency, and failure rates, enabling leadership to identify risk trends and reduce incident detection time by 35%
  • Investigated transaction failures through root cause and trend analysis across channels and time windows, reducing failures by 60% and preventing $100K+ in quarterly losses
  • Enabled time-sensitive operational decisions through ad-hoc and recurring analysis using Excel (Pivot Tables, VLOOKUP/XLOOKUP)
Feb 2021 – Jul 2023 | Bangalore, India

Projects

Sentiment Stock-Price-Prediction

Based on news articles and previous stock prices

Accomplishments
  • Combined news articles and historical stock prices to forecast future stock prices
  • Predicted Apple and Google stock prices by analyzing news sentiment with VADER, Financial_Bert, and Flan-T5, using summarization to produce condensed input for Financial_Bert
  • Applied LSTM networks, the gradient boosting models XGBoost and CatBoost, and SARIMAX for time-series prediction, accounting for volatility and seasonal changes to improve robustness
MediMind

An LLM-based automated radiology report generator

Accomplishments
  • Built an automated radiology report generation system using multimodal RAG for chest X-rays, with Qdrant for efficient text and image embedding storage and LangChain for RAG pipeline integration
  • Evaluated report quality using an LLM-as-judge framework: RAG-generated reports scored 3/5 vs. 1/5 for non-RAG responses across conciseness, relevance, factual accuracy, and completeness
AudioVibe

AudioVibe is a music platform providing key functionalities such as user profile management, playlist creation, and seamless music streaming.

Accomplishments
  • Designed and built a full-stack music streaming platform with user authentication, profile management, and playlist creation
  • Implemented seamless music streaming with persistent playback state and search/filter functionality
Article Recommender

Hybrid recommendation engine using collaborative, content-based, and ensemble filtering.

Accomplishments
  • Combined content-based filtering (analyzing article keywords with TF-IDF), collaborative filtering (leveraging user behavior and preferences), and a hybrid model that blends both approaches for better accuracy
  • Incorporated K-Means clustering, SVD, and kernel PCA to improve recommendations and visualize patterns
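The content-based and hybrid filtering described above can be illustrated with a minimal, dependency-free Python sketch. It is not the project's code: the toy article texts, the whitespace tokenizer, the collaborative scores, and the blending weight `alpha` are all hypothetical. The IDF uses the smoothed form `log((1 + n) / (1 + df)) + 1`, which is the same default as scikit-learn's `TfidfVectorizer`.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors: term frequency x smoothed inverse document frequency."""
    tokenized = [doc.lower().split()  # hypothetical: naive whitespace tokenizer
                 for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_scores(content: list[float], collaborative: list[float],
                  alpha: float = 0.5) -> list[float]:
    """Weighted blend of content-based and collaborative scores (alpha is a hypothetical weight)."""
    return [alpha * c + (1 - alpha) * cf for c, cf in zip(content, collaborative)]


articles = [
    "machine learning models for stock prediction",
    "deep learning models for image recognition",
    "baking bread at home with simple recipes",
]
vectors = tfidf_vectors(articles)
# Content-based scores: similarity of every article to article 0 (the one a user just read).
content_scores = [cosine(vectors[0], v) for v in vectors]
# Hypothetical collaborative scores, e.g. predicted from a user-item matrix via SVD.
collab_scores = [0.0, 0.9, 0.1]
print(hybrid_scores(content_scores, collab_scores, alpha=0.6))
```

Blending with a single weight is the simplest hybrid strategy; in practice `alpha` would be tuned on held-out user interactions.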
A/B Testing

Statistical A/B testing to evaluate marketing campaign effectiveness across control and variant groups.

Accomplishments
  • Analyzed ad strategy effectiveness, optimal placement timing, and the relationship between ad exposure and conversion rates
  • Applied chi-square and z-tests to validate statistical significance of findings, informing future campaign budget allocation
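The significance testing described above can be sketched as a two-proportion z-test on conversion counts. This is a minimal illustration assuming a simple control-vs-variant split; the function name `two_proportion_z_test` and the counts are hypothetical, not taken from the project.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z statistic, p-value), using the pooled proportion under H0: p_a == p_b.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control (A) vs. variant (B) ad exposure.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For a 2×2 table of converted vs. not converted by group, Pearson's chi-square statistic without Yates' continuity correction equals z², so the chi-square and z-tests lead to the same conclusion in this setting.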
Employee Onboarding Chatbot

LLM-powered chatbot to streamline employee onboarding with intelligent Q&A and document retrieval.

Accomplishments
  • Built a conversational AI assistant to guide new hires through onboarding policies, benefits, and procedures using RAG over internal documents
  • Enabled accurate, context-aware responses by grounding the chatbot in company knowledge bases, reducing repetitive HR queries

Skills

Languages and Databases

Python
R
MongoDB
MySQL
PostgreSQL

Libraries

NumPy
Pandas
scikit-learn
matplotlib
Hugging Face
NLTK

Frameworks

Apache Spark
LangChain
LlamaIndex
Keras
TensorFlow
PyTorch

Other

Git
AWS
GCP
Docker

Education

Khoury College of Computer Sciences, Northeastern University

Boston, MA, USA

Degree: Master of Science (MS) in Data Science

    Research Assistant:

    • Worked as a research assistant under Professor Leanne Chukoskie, studying gaze strategy differences between expert and beginner players of the online game Rocket League

    Teaching Assistant:

    • Worked as a teaching assistant for DS2000 - Introduction to Programming with Data Science
    • Worked as a teaching assistant for GSND6330 - Statistics for Player Experience

    Relevant Coursework:

    • Introduction to Data Management and Processing
    • Supervised and Unsupervised Machine Learning
    • Natural Language Processing
    • Large Language Models

    Others:

    • Participated in MIT Reality Hack 2024
    • Member of the Data Science Hub at NEU

Bangalore Institute of Technology, VTU

Bangalore, Karnataka, India

Degree: Bachelor of Engineering (BE) in Industrial Engineering and Management

    Relevant Coursework:

    • Statistics and Probability
    • Artificial Intelligence
    • Supply Chain Management
    • Operations Management

Contact