Successful Data Science Projects: Habits to adopt and questions to ask
Data scientists know that a successful project needs visible, real-time, business-level impact and a proper plan, which is why project management becomes vital. Industries that adopt a sound project management system tend to deliver stronger results, and our beloved data science world is no exception.
However, given the unique nature of data science projects, it is not always easy to sort things out at the inception of a project. Interdependencies and deliverables are at stake, and they often surface later as subtasks that take the team by surprise. So the obvious question is: what can be done to make our project successful, with business-level impact?
Well, here is a short list of habits or behaviors we need to nurture to make our project successful.
- Communication is the key to success. Talk to the team about the effort you are putting into the project, and be transparent about dependencies and blockers. Never forget to be honest when you need help from others.
- Cross-team collaboration is an inevitable part of any data science project. While working with engineers, you learn what multithreading is and they learn what an XGBT model is. Learning to speak each other’s language can greatly improve your efficiency and helps in the long run.
- Documentation is the key to efficient problem solving and maintenance. It speaks for your efforts in the project, and it comes in handy when other teams need to refer to your design, implementation, deployment and maintenance decisions.
- Knowledge sharing plays a key role in team growth. Every time you learn something new, never hide it. Always share it with others so that the team can learn and grow along with you.
Every project is a team effort, so keep everyone on the same page to move forward as a team.
What key questions should you ask business stakeholders for a successful data science project?
The job of a data scientist is to translate business needs into technical model requirements. But how do we do this translation, and what do we need from business stakeholders to formalize our solution requirements? Well, here is what we can ask:
1. What is the Business Goal of the project?
Get a high-level understanding of what the goal of the project is, what the output looks like, who the end users are and what metrics will be used to measure success. This frames the work and directs everything else towards the best solution.
2. Can this problem be solved or enhanced with Data Science?
Many aspects of business operations can run or be enhanced without having a data science team involved. For efficiency’s sake, always make sure data science isn’t forced onto a project without valid justification.
Based on the goal of the project, data scientists should work with the stakeholders to answer this question.
3. What does the Minimum Viable Product (MVP) look like?
An MVP is a version of a product or solution with just enough features to satisfy the high-level goal of the project. As its name suggests, it is not the final solution, but a sample of the end solution that leaves room for scaling and enhancements.
The MVP gives all teams involved an entry point for the project, and scaling up from there becomes a much more manageable task for stakeholders, without a heavy upfront investment of time and money.
4. Are there any considerations around ethics or compliance?
Ethics and compliance considerations apply across the board, and they are especially relevant to the data industry. As data scientists, we always need to be aware of the permissions required to access data sources, and we also need to consider bias, because a machine learning model’s performance is not reliable when the data contains hidden bias.
Unreliable or non-compliant results are something we of course want to avoid, so raising this question with the business as early as possible is vital.
5. How will the product or solution change over time?
Will the solution perform as well in six months or a year as it does now? How are we going to measure this? An important factor to consider is setting a threshold below which the machine learning model, tracked in production via a dashboard, is considered to no longer meet the overall business goal.
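To make the threshold idea concrete, here is a minimal sketch in Python. The metric (precision), the monthly values and the 0.85 threshold are all hypothetical placeholders, and the sketch assumes a metric where higher is better; the real metric and threshold should come out of the conversation with the stakeholders.

```python
from dataclasses import dataclass


@dataclass
class MetricCheck:
    """Result of comparing one tracked metric value against the agreed threshold."""
    metric_name: str
    value: float
    threshold: float
    meets_goal: bool


def check_metric(metric_name: str, value: float, threshold: float) -> MetricCheck:
    """Flag whether a metric still meets the business goal.

    Assumes a "higher is better" metric such as precision or recall.
    """
    return MetricCheck(metric_name, value, threshold, value >= threshold)


# Hypothetical monthly values, as they might be read from a monitoring dashboard.
monthly_precision = {"2024-01": 0.91, "2024-02": 0.88, "2024-03": 0.79}
AGREED_THRESHOLD = 0.85  # assumed value; agree on the real one with the business

checks = {
    month: check_metric("precision", value, AGREED_THRESHOLD)
    for month, value in monthly_precision.items()
}
months_below = [month for month, check in checks.items() if not check.meets_goal]
print("Months below threshold:", months_below)
```

A list like `months_below` could then feed an alert or start a retraining discussion; what happens at that point is a business decision rather than a purely technical one.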
Another important point to discuss with the stakeholders is whether any data drift is expected in the near future, or any change in regulation that we would need to be updated on.
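One possible way to put a number on data drift is the Population Stability Index (PSI), which compares how a feature was distributed in the training data with how it is distributed now. The sketch below is a plain-Python illustration under stated assumptions: the function name, the equal-width binning, the small `eps` floor for empty bins and the sample data are my own choices, not a method prescribed by this article.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """Compute PSI between a reference sample and a current sample.

    Bins are equal-width and derived from the reference sample; current
    values outside that range are counted in the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        eps = 1e-6  # assumed floor so empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical feature values: training-time sample vs. a shifted production sample.
training_sample = [float(i) for i in range(100)]
production_sample = [v + 30.0 for v in training_sample]

print(population_stability_index(training_sample, training_sample))
print(population_stability_index(training_sample, production_sample))
```

A PSI of zero means the two binned distributions match. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the cut-off that matters for a given project is something to agree on with the stakeholders.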
Last but not least, discuss with the business how often the model should be retrained and which pipelines can be automated, as both can improve efficiency considerably.
I hope you enjoyed this article. Let me know your thoughts and suggestions for improvement in the comments section. I look forward to sharing the next topic in the near future.