Hi! I'm Nitika Jain.
Turning curiosity into insights and data into impact, I am passionate about providing data-driven solutions. With a keen eye for detail and a knack for tackling complex challenges, I transform raw information into actionable intelligence.
About
Bonjour! I'm Nitika Jain, a Data Science graduate student at Northeastern University. As a natural problem solver, I thrive on transforming raw data into actionable insights that drive real-world impact. My passion lies in unraveling complex challenges using cutting-edge techniques in Natural Language Processing (NLP), Generative AI, Machine Learning, and statistical analysis.
When I'm not immersed in the world of data, you'll find me on the badminton court. I'm drawn to this sport because it mirrors my approach to data science: it's all about finding those small openings and seizing opportunities. Just as a well-placed drop shot can change the course of a rally, a keen insight derived from data can transform a business strategy.
In my downtime, I enjoy going for walks to clear my mind, indulging in the captivating narratives of K-dramas, and experimenting with new recipes in the kitchen. These activities not only provide a refreshing break but also fuel my creativity and problem-solving skills in unexpected ways. I'm always eager to take on new challenges and collaborate on projects that harness the power of data to make a meaningful difference. Let's connect and explore how we can turn data into impact together!
Experience
- Reduced clinician documentation time by 85% by deploying an AssemblyAI and GPT-4o mini powered summarization system that automated consult note generation
- Improved summary quality by evaluating prompt variations through structured output analysis and iterative refinement
- Defined clinician and patient engagement KPIs (listening %, concerns addressed, handling time) to measure care quality and workflow efficiency
- Automated KPI computation and lifecycle tracking using PostgreSQL triggers, CTEs, and indexing strategies, reducing manual reporting effort and improving data reliability
- Built ETL pipelines in PostgreSQL to ingest clinical interaction data from multiple systems, transforming and validating datasets to produce analytics-ready KPI tables, reducing manual reporting effort by 50%
- Performed data exploration and validation to define clinician and patient engagement KPIs (listening %, concerns addressed, handling time) to measure care quality
- Engineered an LLM-based clinical summarization system converting transcripts into HIPAA-compliant, editable summaries, reducing documentation time by 85%
- Improved summary accuracy through hallucination guardrails, PHI-safe prompts, and low-latency inference, cutting clinician review effort by 70%
- Architected a 3NF clinic database in Supabase with RLS and strict constraints, reducing data inconsistencies by 60%
- Automated clinician metrics and lifecycle updates using Postgres triggers, CTEs, and validation logic, reducing manual fixes by 50%
Playful Mind
- Consolidated and cleaned multimodal interaction data from controller inputs and eye-tracking streams for the online game Rocket League, creating structured datasets to analyze user behavioral patterns
- Examined behavioral and gaze metrics across cohorts using descriptive statistics and feature aggregation, identifying 32% higher target-tracking accuracy among expert players
- Validated statistically significant gaze strategy differences using ANOVA and chi-square tests, confirming experts rely on target-focused attention while novices favor a center-looking pattern
- Developed a spatiotemporal gaze prediction model (CNN–BiLSTM–Attention) to forecast player attention patterns, improving behavioral prediction accuracy by 25%
- Engineered analysis-ready datasets by integrating multimodal data (eye tracker, game controllers), applying preprocessing (merging, normalization, missing value handling), and conducting EDA to surface behavioral patterns across player cohorts
- Mined behavioral and gaze metrics across player cohorts using descriptive statistics and feature aggregation, identifying 32% higher target-tracking accuracy among expert players
- Built robust data pipelines to collect, clean, and integrate multi-source interaction data (eye-tracking, controller inputs, in-game events), reducing preprocessing time by 70% and enabling scalable analysis
- Analyzed interaction data using exploratory data analysis and statistical testing (ANOVA, chi-square, correlation), revealing 32% higher target-tracking accuracy among expert players
- Implemented a computer vision pipeline with YOLOv7 for frame-level object detection, achieving 93% mAP and enabling downstream behavioral modeling
- Developed a spatiotemporal gaze prediction model (CNN–BiLSTM–Attention), trained with KL divergence loss and validated via spatial accuracy and correlation metrics, outperforming baselines by 25%
- Translated modeling results into clear visualizations and summaries, enabling non-technical stakeholders to inform game design decisions
- Automated Salesforce data cleansing and deduplication using Python, implementing validation checks and reconciliation logic that reduced manual effort by 90% and improved reporting accuracy
- Segmented registered IDC.com users using engagement metrics to define active and super user personas, improving upsell and expansion
- Evaluated full customer journey from site visits to conversion, identifying drop-off points and driving retention improvements that lifted renewals by 30%
- Leveraged Salesforce Opportunity insights to refine survey targeting and presented recommendations to VP and senior leadership that increased response rate by 5% in the next quarter
- Built recurring performance reports and Power BI dashboards to keep C-suite executives informed on renewals, churn risk, and customer health metrics
- Designed a renewal risk prediction model using Random Forest with hyperparameter tuning, evaluated using ROC-AUC and recall, improving early identification of at-risk accounts by 25%
- Modeled customer lifetime value using regression on engagement and revenue signals, identifying high-value accounts and improving retention and upsell planning effectiveness by 17%
- Examined onboarding-to-renewal journeys using EDA, cohort, funnel, and retention analysis, uncovering engagement drop-offs and driving a 30% increase in customer engagement
- Segmented users by behavioral and engagement patterns using clustering and descriptive analysis, increasing marketing campaign relevance by 10%
- Delivered executive-ready Power BI dashboards tracking CSAT, NPS, CLTV, and churn risk, translating analytical insights into data-driven leadership decisions
- Partnered with product, operations, and risk stakeholders to gather business requirements and define payment performance KPIs and reconciliation metrics
- Drove transaction reconciliation analysis using T-SQL (CTEs, window functions, joins, aggregations), improving reliability by 25%
- Developed Tableau dashboards tracking transaction volume, latency, and failure rates enabling leadership to identify risk trends and reduce incident detection time by 35%
- Investigated transaction failures through root cause and trend analysis across channels and time windows, reducing failures by 60% and preventing $100K+ in quarterly losses
- Enabled time-sensitive operational decisions through ad-hoc and recurring analysis using Excel (Pivot Tables, VLOOKUP/XLOOKUP)
Projects
Based on news articles and previous stock prices
- I have used both news articles and stock price history to forecast the stock price
- Predicted Apple and Google stock prices by analyzing news sentiment with VADER, Financial_Bert, and Flan T5, summarizing articles first to give Financial_Bert a more focused input.
- Leveraged deep learning models such as LSTM, boosting models (XGBoost and CatBoost), and SARIMAX models for time-series prediction, ensuring robustness by addressing volatility and seasonal changes.
An LLM-based automated radiology report generator
- Built an automated radiology report generation system using multimodal RAG for chest X-rays, with Qdrant for efficient text and image embedding storage and LangChain for RAG pipeline integration
- Evaluated report quality using an LLM-as-judge framework — RAG-generated reports scored 3/5 vs. 1/5 for non-RAG responses across conciseness, relevance, factual accuracy, and completeness
AudioVibe is a music platform providing key functionalities such as user profile management, playlist creation, and seamless music streaming.
Hybrid recommendation engine using collaborative, content-based, and ensemble filtering.
- It uses a combination of techniques like content-based filtering (analyzing article keywords with TF-IDF), collaborative filtering (leveraging user behavior and preferences), and a hybrid model that blends both approaches for better accuracy.
- The project also incorporates advanced tools like K-Means clustering, SVD, and kernel PCA for improving recommendations and visualizing patterns.
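The blend of content-based and collaborative signals described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (`tfidf_vectors`, `cosine`, `hybrid_scores`), the sample articles, the collaborative scores, and the blending weight `alpha` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (TF times a smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Weighted blend: alpha on content similarity, (1 - alpha) on collaborative."""
    return [alpha * c + (1 - alpha) * u
            for c, u in zip(content_scores, collab_scores)]


# Hypothetical articles; the first one is the article the user just read.
articles = [
    "stock market rally lifts tech shares",
    "tech shares slide as market cools",
    "new pasta recipe with fresh basil",
]
vectors = tfidf_vectors(articles)
content_scores = [cosine(vectors[0], v) for v in vectors]

# Hypothetical collaborative scores, e.g. derived from similar users' clicks.
collab_scores = [1.0, 0.2, 0.6]

scores = hybrid_scores(content_scores, collab_scores, alpha=0.7)
print(scores)
```

A single weight is the simplest way to combine the two signals; in practice the weight would be tuned on held-out interaction data.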
Statistical A/B testing to evaluate marketing campaign effectiveness across control and variant groups.
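One common way to compare conversion rates between a control and a variant group is a two-proportion z-test. The sketch below is an assumption about how such a test might look, not a description of the project's method; the function name `two_proportion_ztest` and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control converts 200 of 5000, variant 260 of 5000.
z, p_value = two_proportion_ztest(200, 5000, 260, 5000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be chance alone, though sample size and test duration should be fixed before the data is examined.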
LLM-powered chatbot to streamline employee onboarding with intelligent Q&A and document retrieval.
Skills
Languages and Databases
Python
R
MongoDB
MySQL
PostgreSQL
Libraries
NumPy
Pandas
scikit-learn
matplotlib
Hugging Face
NLTK
Frameworks
Apache Spark
LangChain
LlamaIndex
Keras
TensorFlow
PyTorch
Other
Git
AWS
GCP
Docker
Education
Khoury College of Computer Sciences, Northeastern University
Boston, MA, USA
Degree: Master of Science (MS) in Data Science
Research Assistant:
- Worked as a research assistant under Professor Leanne Chukoskie on the online game Rocket League to understand gaze strategy differences between experts and beginners
Teaching Assistant:
- Worked as a teaching assistant for DS2000 - Introduction to Programming with Data Science
- Worked as a teaching assistant for GSND6330 - Statistics for Player Experience
Relevant Coursework:
- Introduction to Data Management and Processing
- Supervised and Unsupervised Machine Learning
- Natural Language Processing
- Large Language Models
Others:
- Participated in MIT Reality Hack 2024
- Member of the Data Science Hub at NEU
Bangalore Institute of Technology, VTU
Bangalore, Karnataka, India
Degree: Bachelor of Engineering (BE) in Industrial Engineering and Management
Relevant Coursework:
- Statistics and Probability
- Artificial Intelligence
- Supply Chain Management
- Operations Management

