SHASHANK S

A Data Scientist who has crafted countless data-driven insights and solutions.

May 31, 2020

3 min read

Estimating Your Data Science Salary: A Data-Driven Approach

Ever wondered what to say when an interviewer asks you “How much are you expecting?”

Well, I was browsing YouTube and found an excellent video series on this topic by Ken Jee. I followed along and implemented the project myself, which might help me, and others like me, in a future interview.

This project lets you estimate salaries across the data science field in the United States, from junior analyst to manager. The estimator works from data scraped from Glassdoor.com for around 1,000 companies. I would love to estimate salaries in India as well, which is why I took on this project. I scraped the data with Selenium, configured by following this article, but the scraper doesn’t work on glassdoor.co.in. I hope to get that working by the end of this month and will keep you updated.

Check out this playlist for a step-by-step walkthrough of the project. Here is a brief overview of the steps involved.

  1. Scraping the data from glassdoor.com using Selenium and Python.

  2. Data cleaning, in which new features are derived from existing columns that contain irrelevant information mixed in with the useful values.

  3. Exploratory data analysis, which lets you identify outliers, see why certain companies pay lower average salaries despite being based in major cities, and find out which companies hire for which roles the most.

  4. Model building using regression.

  5. Creating a Flask API endpoint that runs on your local server. The API takes in a list of values from a job listing and returns the estimated salary.
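As an illustration of the data cleaning step above, here is a minimal sketch of deriving numeric features from a raw salary string. The input format (`"$53K-$91K (Glassdoor est.)"`) and the function name `parse_salary_estimate` are assumptions made for this example; the scraped data and the actual cleaning code in the repo may look different.

```python
import re


def parse_salary_estimate(raw):
    """Turn a salary range string into (min, max, avg) in thousands of dollars.

    Assumes a format like "$53K-$91K (Glassdoor est.)" (hypothetical example).
    Returns None when the string does not contain exactly two amounts.
    """
    # Drop any parenthesised note such as "(Glassdoor est.)".
    text = re.sub(r"\(.*?\)", "", raw)
    # Collect every amount written as $<digits>K.
    amounts = [int(n) for n in re.findall(r"\$(\d+)K", text, flags=re.IGNORECASE)]
    if len(amounts) != 2:
        return None
    low, high = amounts
    return low, high, (low + high) / 2


print(parse_salary_estimate("$53K-$91K (Glassdoor est.)"))  # (53, 91, 72.0)
```

Returning `None` for anything that does not match keeps missing or malformed listings easy to filter out before model building.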

The best part is that you implement this step by step, which makes you think about what went wrong and where. You could also put the model into production on Heroku or another platform; I haven’t done that yet, but I’ll look into it. The final model has a mean absolute error of around $11K, which isn’t great but is still helpful. If you have suggestions on the implementation, which is on my GitHub here, I’d love to hear them, or you can reach out to me on LinkedIn. That would help me figure out what else could be done to get a better result. If you are working on something or have any ideas, let me know. I’d love to join you.
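For readers new to the metric, mean absolute error is the average size of the gap between predicted and actual salaries, so an MAE of $11K means the estimates are off by about $11K on average. A minimal sketch of the calculation follows; the salary figures are invented for illustration and are not taken from the project's data.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up salaries in thousands of dollars (illustrative only).
actual = [100, 120, 90]
predicted = [110, 105, 98]

# Absolute errors are 10, 15 and 8, so the mean is 33 / 3.
print(mean_absolute_error(actual, predicted))  # 11.0
```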

All credits and the sources of the code snippets I used are listed on my GitHub repo.

LET'S WORK
TOGETHER

Copyright © Shashank | Powered by Framer