Hi! I'm Nitika Jain.
Turning curiosity into insights and data into impact, I am passionate about providing data-driven solutions. With a keen eye for detail and a knack for tackling complex challenges, I transform raw information into actionable intelligence.
About
Bonjour! I'm Nitika Jain, a Data Science graduate student at Northeastern University. As a natural problem solver, I thrive on transforming raw data into actionable insights that drive real-world impact. My passion lies in unraveling complex challenges using cutting-edge techniques in Natural Language Processing (NLP), Generative AI, Machine Learning, and statistical analysis.
When I'm not immersed in the world of data, you'll find me on the badminton court. I'm drawn to this sport because it mirrors my approach to data science: it's all about finding those small openings and seizing opportunities. Just as a well-placed drop shot can change the course of a rally, a keen insight derived from data can transform a business strategy.
In my downtime, I enjoy going for walks to clear my mind, indulging in the captivating narratives of K-dramas, and experimenting with new recipes in the kitchen. These activities not only provide a refreshing break but also fuel my creativity and problem-solving skills in unexpected ways. I'm always eager to take on new challenges and collaborate on projects that harness the power of data to make a meaningful difference. Let's connect and explore how we can turn data into impact together!
Experience
- Built and maintained ETL pipelines with PySpark on Databricks, processing 1M+ records 45% faster and ensuring clean, reliable data for downstream analysis and dashboards
- Analyzed customer engagement and subscription data using cohort analysis and clustering to uncover behavioral trends, informing retention strategies that increased engagement by 30%
- Partnered with Marketing and Product teams to translate website journey and campaign performance data into actionable insights, improving targeting efficiency and renewal strategies by 27%
- Led churn prediction development, achieving 94% recall and 0.92 AUC-ROC to forecast at-risk accounts and enable early interventions
- Built a regression-based CLTV model to predict customer value, improving retention and upsell planning effectiveness by 35%
- Designed and maintained interactive BI dashboards and reports to track KPIs such as CSAT, NPS, and churn risk, improving reporting speed by 40% and enhancing leadership visibility
- Tools: Python, PowerBI, SQL
- Analyzed business requirements and designed pipeline integration for downstream applications, ensuring data integrity and reducing operational costs by 10%, contributing to the deployment of an end-to-end real-time payments system.
- Engineered a data center monitoring system by collecting server logs with Splunk and optimizing data storage and retrieval in an Oracle database, leading to a 15% improvement in proactive technology issue prediction.
- Optimized SQL queries and integrated transaction data from multiple systems, improving real-time payment processing efficiency by 15% through data validation and seamless system integration.
- Designed and implemented robust data cleaning and preprocessing pipelines using Python (Pandas) and SQL, reducing data inconsistencies by 20%.
- Tools: Python, Oracle DB, Tableau, BMC Remedy
- Utilized SQL queries to extract and analyze client financial data to assess default risk. Enhanced data quality and performed exploratory analysis using Python libraries like NumPy, Pandas, Seaborn, and Matplotlib.
- Attained 93% prediction accuracy with Logistic Regression, Random Forest, and XGBoost models after carefully handling missing values with several imputation techniques (Iterative Imputer, MICEForest, etc.).
- Developed a Tableau dashboard displaying average ratings of key performance indicators such as service quality, value for money, and sentiment across multiple countries.
- Tools: Python, SQL, Tableau, PowerBI
Projects
Based on news articles and previous stock prices
- Used both news articles and stock price history to forecast stock prices.
- Predicted Apple and Google stock prices by analyzing news sentiment with VADER, Financial_Bert, and Flan-T5, summarizing articles first to give Financial_Bert cleaner input.
- Combined a deep learning model (LSTM), boosting models (XGBoost and CatBoost), and SARIMAX models for time-series prediction, addressing volatility and seasonal changes to keep forecasts robust.
An LLM-based automated radiology report generator
- Developed MediMind, an automated radiology report generation system using multimodal RAG for chest X-rays, implementing Qdrant for efficient text and image embedding storage and retrieval. Utilized the LangChain framework to integrate RAG components.
- Utilized LLMs for report generation and evaluation, with an LLM serving as an impartial judge. The system achieved a 3/5 score across metrics (conciseness, relevance, factual accuracy, and completeness) for RAG-generated reports, compared to 1/5 for non-RAG responses.
Using collaborative, content-based, and hybrid filtering.
- It uses a combination of techniques like content-based filtering (analyzing article keywords with TF-IDF), collaborative filtering (leveraging user behavior and preferences), and a hybrid model that blends both approaches for better accuracy.
- The project also incorporates advanced tools like K-Means clustering, SVD, and kernel PCA for improving recommendations and visualizing patterns.
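The blend of content-based and collaborative filtering described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the sample articles, the `interactions` table, the `alpha` blending weight, and all function names are assumptions made up for the example, and the TF-IDF weighting is one common smoothed variant.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using a smoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Blend two {item: score} dicts; alpha weights the content-based side."""
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }


# Hypothetical sample data: article text and article -> {user: implicit rating}.
articles = {
    "a1": "stock market rally lifts tech shares",
    "a2": "tech shares slide as market cools",
    "a3": "new pasta recipes for busy weeknights",
}
interactions = {
    "a1": {"u1": 1.0, "u2": 1.0},
    "a2": {"u1": 1.0},
    "a3": {"u2": 1.0, "u3": 1.0},
}


def recommend(seed, alpha=0.5):
    """Rank the other articles by blended similarity to the seed article."""
    ids = list(articles)
    vecs = dict(zip(ids, tfidf_vectors([articles[i] for i in ids])))
    content = {i: cosine(vecs[seed], vecs[i]) for i in ids if i != seed}
    collab = {
        i: cosine(interactions[seed], interactions[i]) for i in ids if i != seed
    }
    blended = hybrid_scores(content, collab, alpha)
    return sorted(blended, key=blended.get, reverse=True)
```

Calling `recommend("a1")` ranks `a2` ahead of `a3`, since `a2` shares both keywords and a reader with `a1`. Setting `alpha` closer to 1 leans on article text, which helps for new articles with no interaction history; setting it closer to 0 leans on user behavior.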
A/B Testing conducted to evaluate the effectiveness of marketing campaigns by comparing a control group with a variant group.
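One common way to compare a control group with a variant group is a two-proportion z-test on conversion rates. The sketch below assumes that setup; the source does not say which statistical test the project used, and the function name and sample counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and sample size of the control group.
    conv_b / n_b: conversions and sample size of the variant group.
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100/1000 conversions in control, 150/1000 in variant.
z_stat, p_val = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```

A small p-value suggests the difference between the campaigns is unlikely to be chance alone; the significance threshold and required sample size should be fixed before the test is run.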
Skills
Languages and Databases
Python
R
MongoDB
MySQL
PostgreSQL
Libraries
NumPy
Pandas
scikit-learn
matplotlib
Hugging Face
NLTK
Frameworks
Apache Spark
LangChain
LlamaIndex
Keras
TensorFlow
PyTorch
Other
Git
AWS
GCP
Docker
Education
Khoury College of Computer Sciences, Northeastern University
Boston, MA, USA
Degree: Master of Science (MS) in Data Science
Research Assistant:
- Worked as a research assistant under Professor Leanne Chukoskie, studying the online game Rocket League to understand gaze strategy differences between experts and beginners
Teaching Assistant:
- Worked as a teaching assistant for DS2000 - Introduction to Programming with Data Science
- Worked as a teaching assistant for GSND6330 - Statistics for Player Experience
Relevant Coursework:
- Introduction to Data Management and Processing
- Supervised and Unsupervised Machine Learning
- Natural Language Processing
- Large Language Models
Others:
- Participated in MIT Reality Hack 2024
- Member of the Data Science Hub at NEU
Bangalore Institute of Technology, VTU
Bangalore, Karnataka, India
Degree: Bachelor of Engineering (BE) in Industrial Engineering and Management
Relevant Coursework:
- Statistics and Probability
- Artificial Intelligence
- Supply Chain Management
- Operations Management

