Portfolio Details

Each project in this portfolio reflects my hands-on experience in solving real-world problems using data science, machine learning, and full-stack development. From predictive models to intelligent web applications, these solutions are built with a focus on scalability, functionality, and impact.

Project Information

  • Project Name: Duplicate Question Detector
  • Category: Natural Language Processing
  • Model Type: LSTM (Long Short-Term Memory) Neural Network
  • Github Repo: Link
  • Live link : Link

Technologies Used

The project is built using Python, Streamlit, and TensorFlow for the backend. It employs advanced NLP techniques including tokenization, sequence padding, and heuristic feature extraction using libraries like FuzzyWuzzy and NLTK. The frontend is developed with Streamlit for an interactive and responsive UI, enhanced with custom CSS for styling. Pandas is used for data manipulation, and Joblib for model serialization.

Details of Project

Duplicate Question Detector is a web-based application designed to identify whether two questions are semantically similar or duplicates. Users input two questions, and the system processes them through a trained LSTM neural network to predict their similarity with high accuracy.

The project demonstrates advanced NLP techniques for question pair similarity detection, leveraging both deep learning and heuristic feature engineering. It supports real-time analysis and provides insights into question similarity, making it valuable for platforms like Quora or Stack Overflow to optimize content management.

Project Features

  • Accepts two questions through an intuitive Streamlit-based text input interface.
  • Predicts whether questions are duplicates with a similarity score using an LSTM model.
  • Incorporates heuristic features like common word count, fuzzy matching, and token-based features.
  • Includes preprocessing steps: lowercasing, special character replacement, and contraction expansion.
  • Responsive UI with real-time analysis and detailed comparison metrics.