Auchinto Chatterjee


Data Scientist

Looking forward to solving interesting problems...

Portfolio

Professional Work

End-to-end Job Scraper Pipelines

Data Extraction ♦ Backend Service ♦ Dashboard ♦ Data Pipelining
★ ★ ★

Delivering a hosted backend service, plugin, or dashboard that captures job posting details from one or more job portals across the globe. These services are triggered at regular intervals and fetch the latest job postings into the database. Sources include international and regional job portals such as LinkedIn and naukri.com, among others.

★ ★ ★

Technologies

Python • Beautiful Soup • Selenium • Flask • Chart.js • SQL
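
The "fetch the latest job postings into the database" step could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the SQL database, and the table name, column names, and the store_postings helper are hypothetical, not taken from the delivered services. The idea is that each scheduled run can re-submit everything it scraped, and only postings with a URL not seen before are stored.

```python
import sqlite3


def store_postings(conn, postings):
    """Store scraped postings, skipping URLs that are already in the table.

    `postings` is a list of dicts with the (hypothetical) keys
    url, title, company and source. Returns the number of new rows.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_postings ("
        "url TEXT PRIMARY KEY, title TEXT, company TEXT, source TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM job_postings").fetchone()[0]
    # The primary key on url makes repeated runs idempotent:
    # INSERT OR IGNORE silently skips postings stored by an earlier run.
    conn.executemany(
        "INSERT OR IGNORE INTO job_postings (url, title, company, source) "
        "VALUES (:url, :title, :company, :source)",
        postings,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM job_postings").fetchone()[0]
    return after - before
```

A scheduler (cron, for example) would call the scraper and then this function at each interval; the return value gives a quick count of how many postings were new in that run.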

Custom Site Scraper Pipelines

Data Extraction ♦ Backend Service ♦ Web Automation ♦ Data Pipelining
★ ★ ★

Delivering a hosted backend service, plugin, or dashboard built around custom scrapers for the specific sites requested by clients. These services also handle data preprocessing and storage as per client requirements. They are triggered at regular intervals and fetch the latest data into the database. The sites covered domains such as real estate, auctions, scam alerts, and others.

★ ★ ★

Technologies

Python • Beautiful Soup • Selenium • Flask • Pandas • SQL

Lead Generation Pipelines

Data Extraction ♦ Data Engineering
★ ★ ★

Gathering leads from multiple sources, such as social media platforms, public APIs, and the Google Custom Search API, then consolidating them and delivering them to clients for marketing and sales purposes.

★ ★ ★

Technologies

Python • Flask • Beautiful Soup • Selenium • Pandas
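
As a rough illustration of the consolidation step, the sketch below merges lead records from several sources into one record per contact. It is a simplified, standard-library-only example rather than the delivered implementation: the consolidate_leads helper, the field names, and the choice of a lower-cased email address as the matching key are all assumptions made for the example.

```python
def consolidate_leads(*sources):
    """Merge lead records from several sources into one record per email.

    Each source is a list of dicts. Records are matched on a trimmed,
    lower-cased email; for every other field the first non-empty value wins.
    """
    merged = {}
    for records in sources:
        for record in records:
            email = (record.get("email") or "").strip().lower()
            if not email:
                continue  # no usable key, so the record cannot be matched
            lead = merged.setdefault(email, {"email": email})
            for field, value in record.items():
                if field != "email" and value and not lead.get(field):
                    lead[field] = value
    return list(merged.values())
```

Calling consolidate_leads(social_leads, search_leads) with two lists that mention the same address in different letter case returns a single combined record for that contact.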

Social Media Posts Capture and Topic Modelling

Data Extraction ♦ Natural Language Processing ♦ Data Visualization
★ ★ ★

Capturing posts and metadata from target Facebook groups, and performing topic modelling on the post bodies to derive topic tags for the groups.

★ ★ ★

Technologies

Python • Flask • NLTK • Sklearn • Wordcloud • Chart.js

*The above-mentioned projects are generalised product categories compiled from multiple client requests.