Hi! I'm Nitika Jain.

Turning curiosity into insights and data into impact, I am passionate about providing data-driven solutions. With a keen eye for detail and a knack for tackling complex challenges, I transform raw information into actionable intelligence.

About

Bonjour! I'm Nitika Jain, a Data Science graduate student at Northeastern University. As a natural problem solver, I thrive on transforming raw data into actionable insights that drive real-world impact. My passion lies in unraveling complex challenges using cutting-edge techniques in Natural Language Processing (NLP), Generative AI, Machine Learning, and statistical analysis.

When I'm not immersed in the world of data, you'll find me on the badminton court. I'm drawn to this sport because it mirrors my approach to data science – it's all about finding those small openings and seizing opportunities. Just as a well-placed drop shot can change the course of a rally, a keen insight derived from data can transform a business strategy.

In my downtime, I enjoy going for walks to clear my mind, indulging in the captivating narratives of K-dramas, and experimenting with new recipes in the kitchen. These activities not only provide a refreshing break but also fuel my creativity and problem-solving skills in unexpected ways. I'm always eager to take on new challenges and collaborate on projects that harness the power of data to make a meaningful difference. Let's connect and explore how we can turn data into impact together!

Experience

Data Analyst
  • Built and maintained ETL pipelines with PySpark on Databricks, processing 1M+ records 45% faster and ensuring clean, reliable data for downstream analysis and dashboards
  • Analyzed customer engagement and subscription data using cohort analysis and clustering to uncover behavioral trends, informing retention strategies that increased engagement by 30%
  • Partnered with Marketing and Product teams to translate website journey and campaign performance data into actionable insights, improving target efficiency and renewal strategies by 27%
  • Led churn prediction development, achieving 94% recall and 0.92 AUC-ROC to forecast at-risk accounts and enable early interventions
  • Built a regression-based CLTV model to predict customer value, improving retention and upsell planning effectiveness by 35%
  • Designed and maintained interactive BI dashboards and reports to track KPIs such as CSAT, NPS, and churn risk, improving reporting speed by 40% and enhancing leadership visibility
  • Tools: Python, Power BI, SQL
July 2024 - Sep 2024 | Needham, US
Data Analyst
  • Analyzed business requirements and designed pipeline integration for downstream applications, ensuring data integrity and reducing operational costs by 10%, contributing to the deployment of an end-to-end real-time payments system.
  • Engineered a data center monitoring system by collecting server logs with Splunk and optimizing data storage and retrieval in an Oracle database, leading to a 15% improvement in proactive technology issue prediction.
  • Optimized SQL queries and integrated transaction data from multiple systems, improving real-time payment processing efficiency by 15% through query optimization, data validation, and system integration.
  • Designed and implemented robust data cleaning and preprocessing pipelines using Python (Pandas) and SQL, reducing data inconsistencies by 20%.
  • Tools: Python, Oracle DB, Tableau, BMC Remedy
Feb 2021 - July 2023 | Bangalore, India
Data Scientist
  • Utilized SQL queries to extract and analyze client financial data to assess default risk. Enhanced data quality and performed exploratory analysis using Python libraries like NumPy, Pandas, Seaborn, and Matplotlib.
  • Attained 93% prediction accuracy with Logistic Regression, Random Forest, and XGBoost models, after careful handling of missing values using several imputation techniques (Iterative Imputer, MICEForest, etc.).
  • Developed a Tableau dashboard displaying average ratings of Key Performance Indicators such as service quality, value for money, and sentiment across multiple countries.
  • Tools: Python, SQL, Tableau, Power BI
July 2019 - Sep 2019 | Bangalore, India

Projects

Sentiment Stock-Price-Prediction

Based on news articles and previous stock prices

Accomplishments
  • Used both news articles and stock price history to forecast stock prices.
  • Predicted Apple and Google stock prices by analyzing news sentiment with VADER, Financial_Bert, and Flan-T5, incorporating summarization to improve the Financial_Bert input.
  • Leveraged deep learning models such as LSTM, boosting models (XGBoost and CatBoost), and SARIMAX models for time-series prediction, ensuring robustness by addressing volatility and seasonal changes.
MediMind

An LLM-based automated radiology report generator

Accomplishments
  • Developed MediMind, an automated radiology report generation system using multimodal RAG for chest X-rays, implementing Qdrant for efficient text and image embedding storage and retrieval. Used the LangChain framework to integrate the RAG components.
  • Utilized LLMs for report generation and evaluation, with an LLM serving as an impartial judge. The system achieved a 3/5 score across metrics (conciseness, relevance, factual accuracy, and completeness) for RAG-generated reports, compared to 1/5 for non-RAG responses.
AudioVibe

AudioVibe is a music platform providing key functionalities such as user profile management, playlist creation, and seamless music streaming.

Accomplishments
  • Key functionalities: user management, playlist management, and a seamless streaming experience.
Article Recommender

Using collaborative, content-based, and hybrid filtering.

Accomplishments
  • It uses a combination of techniques like content-based filtering (analyzing article keywords with TF-IDF), collaborative filtering (leveraging user behavior and preferences), and a hybrid model that blends both approaches for better accuracy.
  • The project also incorporates advanced tools like K-Means clustering, SVD, and kernel PCA for improving recommendations and visualizing patterns.
A/B Testing

An A/B test conducted to evaluate the effectiveness of marketing campaigns by comparing a control group with a variant group.

Accomplishments
  • The effectiveness of different ad strategies.
  • Optimal times and days for ad placement.
  • The relationship between ad exposure and conversion rates. These findings can be used to optimize future marketing campaigns and improve overall conversion rates.

Skills

Languages and Databases

Python
R
MongoDB
MySQL
PostgreSQL

Libraries

NumPy
Pandas
scikit-learn
matplotlib
Hugging Face
NLTK

Frameworks

Apache Spark
LangChain
LlamaIndex
Keras
TensorFlow
PyTorch

Other

Git
AWS
GCP
Docker

Education

Khoury College of Computer Sciences, Northeastern University

Boston, MA, USA

Degree: Master of Science (MS) in Data Science

    Research Assistant:

    • Worked as a research assistant under Professor Leanne Chukoskie, studying the online game Rocket League to understand gaze strategy differences between experts and beginners

    Teaching Assistant:

    • Worked as a teaching assistant for DS2000 - Introduction to Programming with Data Science
    • Worked as a teaching assistant for GSND6330 - Statistics for Player Experience

    Relevant Coursework:

    • Introduction to Data Management and Processing
    • Supervised and Unsupervised Machine Learning
    • Natural Language Processing
    • Large Language Models

    Others:

    • Participated in MIT Reality Hack 2024
    • Member of the Data Science Hub at NEU

Bangalore Institute of Technology, VTU

Bangalore, Karnataka, India

Degree: Bachelor of Engineering (BE) in Industrial Engineering and Management

    Relevant Coursework:

    • Statistics and Probability
    • Artificial Intelligence
    • Supply Chain Management
    • Operations Management

Contact