Hi! I'm Nitika Jain.
Turning curiosity into insights and data into impact, I am passionate about providing data-driven solutions. With a keen eye for detail and a knack for tackling complex challenges, I transform raw information into actionable intelligence.
About
Bonjour! I'm Nitika Jain, a Data Science graduate student at Northeastern University. As a natural problem solver, I thrive on transforming raw data into actionable insights that drive real-world impact. My passion lies in unraveling complex challenges using cutting-edge techniques in Natural Language Processing (NLP), Generative AI, Machine Learning, and statistical analysis.
When I'm not immersed in the world of data, you'll find me on the badminton court. I'm drawn to this sport because it mirrors my approach to data science: it's all about finding those small openings and seizing opportunities. Just as a well-placed drop shot can change the course of a rally, a keen insight derived from data can transform a business strategy.
In my downtime, I enjoy going for walks to clear my mind, indulging in the captivating narratives of K-dramas, and experimenting with new recipes in the kitchen. These activities not only provide a refreshing break but also fuel my creativity and problem-solving skills in unexpected ways. I'm always eager to take on new challenges and collaborate on projects that harness the power of data to make a meaningful difference. Let's connect and explore how we can turn data into impact together!
Experience
- Developed automated ETL pipelines in Python to resolve data integrity issues through deduplication, relationship mapping, and data validation with custom business logic, reducing processing time by 45% for datasets with 100K+ rows previously handled in Excel.
- Increased customer engagement by 30% by developing customer segmentation models based on revenue tiers and analyzing engagement patterns and KPIs within each tier, which enabled more targeted and effective engagement strategies.
- Increased survey response rates by 27% by targeting high-value users identified through Salesforce Opportunities Reports and engagement metrics.
- Created a logo renewal dashboard to monitor key client metrics, including CSAT and NPS scores, account activity, and product usage, enabling the customer success team to improve client retention forecasting and data-driven decision-making.
- Implemented predictive models such as logistic regression, SVM, and XGBoost to forecast customer churn using health scores and unsubscribed product data. Evaluated model performance using accuracy (93%), recall (95%), and AUC-ROC (0.96), ensuring alignment with business objectives and minimizing false negatives.
- Tools: Python, PowerBI, SQL
- Analyzed business requirements and designed pipeline integration for downstream applications, ensuring data integrity and reducing operational costs by 10%, contributing to the deployment of an end-to-end real-time payments system.
- Engineered a data center monitoring system by collecting server logs with Splunk and optimizing data storage and retrieval in an Oracle database, leading to a 15% improvement in proactive technology issue prediction.
- Optimized SQL queries and integrated transaction data from multiple systems, improving real-time payment processing efficiency by 15% through data validation and seamless system integration.
- Designed and implemented robust data cleaning and preprocessing pipelines using Python (Pandas) and SQL, reducing data inconsistencies by 20%.
- Tools: Python, Oracle DB, Tableau, BMC Remedy
- Utilized SQL queries to extract and analyze client financial data to assess default risk. Enhanced data quality and performed exploratory analysis using Python libraries like NumPy, Pandas, Seaborn, and Matplotlib.
- Attained 93% prediction accuracy with Logistic Regression, Random Forest, and XGBoost models, after carefully handling missing values with several imputation techniques (Iterative Imputer, MICEForest, etc.).
- Developed a Tableau dashboard displaying average ratings of key performance indicators such as service quality, value for money, and sentiment across multiple countries.
- Tools: Python, SQL, Tableau, PowerBI
Projects

Based on news articles and previous stock prices
- Used both news articles and stock price history to forecast stock prices.
- Predicted Apple and Google stock prices by analyzing news sentiment with VADER, Financial_Bert, and Flan-T5, summarizing articles first to improve the Financial_Bert input.
- Leveraged deep learning models such as LSTM, boosting models (XGBoost and CatBoost), and SARIMAX models for time-series prediction, addressing volatility and seasonal changes to keep forecasts robust.

An LLM-based automated radiology report generator
- Developed MediMind, an automated radiology report generation system using multimodal RAG for chest X-rays, implementing Qdrant for efficient text and image embedding storage and retrieval. Used the LangChain framework to integrate the RAG components.
- Used LLMs for report generation and evaluation, with an LLM serving as an impartial judge. The system achieved a 3/5 score across metrics (conciseness, relevance, factual accuracy, and completeness) for RAG-generated reports, compared to 1/5 for non-RAG responses.

Using collaborative, content-based, and hybrid filtering.
- It uses a combination of techniques like content-based filtering (analyzing article keywords with TF-IDF), collaborative filtering (leveraging user behavior and preferences), and a hybrid model that blends both approaches for better accuracy.
- The project also incorporates advanced tools like K-Means clustering, SVD, and kernel PCA for improving recommendations and visualizing patterns.

A/B Testing conducted to evaluate the effectiveness of marketing campaigns by comparing a control group with a variant group.
Skills
Languages and Databases




Libraries






Frameworks






Other




Education
Khoury College of Computer Sciences, Northeastern University
Boston, MA, USA
Degree: Master of Science (MS) in Data Science
Research Assistant:
- Worked under Professor Leanne Chukoskie on the online game Rocket League to understand gaze strategy differences between experts and beginners
Teaching Assistant:
- DS2000 - Introduction to Programming with Data Science
- GSND6330 - Statistics for Player Experience
Relevant Coursework:
- Introduction to Data Management and Processing
- Supervised and Unsupervised Machine Learning
- Natural Language Processing
- Large Language Models
Others:
- Participated in MIT Reality Hack 2024
- Member of the Data Science Hub at NEU
Bangalore Institute of Technology, VTU
Bangalore, Karnataka, India
Degree: Bachelor of Engineering (BE) in Industrial Engineering and Management
Relevant Coursework:
- Statistics and Probability
- Artificial Intelligence
- Supply Chain Management
- Operations Management