Talking about data science and the innovative insights it can bring is easy. Delivering on them is much harder. Data science projects, like AI and machine learning projects, differ from standard IT projects. They carry more unknowns, because much of the work involves making predictions or uncovering insights that are not known at project initiation.
I’ve found that Microsoft’s Team Data Science Process (TDSP) is a good starting framework for new data science projects. It also incorporates CRISP-DM (CRoss Industry Standard Process for Data Mining), the standard that is foundational to most data science development. In general, data science projects have the following stages:
- Business Understanding
- Data Acquisition and Understanding
- Modeling
- Deployment
- Customer Acceptance
Disclaimer: Do not interpret the stages above as advocating a waterfall approach to data science projects. My intent is quite the opposite. Agile practices such as sprint planning, feature releases, and user story completion should absolutely be applied within each stage. The stages are merely a guideline to help stakeholders and project team members understand the overall delivery approach of the data science effort.
When planning a data science project, it’s also important to plan for scope creep or drift, not so much in terms of time as in terms of new discoveries. Most data science projects start with a question or hypothesis that you expect the data to answer. The actual results, though, could take the project in a completely different direction. It’s therefore important to keep revisiting and refining the original question to account for ancillary discoveries in the data. These new discoveries can either be placed in a “parking lot” or pursued further.
Another area unique to data science projects is where and how to store shared data (part of the Data Acquisition and Understanding stage), especially data used for training sets and modeling. How data is managed, and where it is located or centralized, is critical to ensuring that teams use the right data for the right purpose. Knowledge and data management practices should be part of any project plan before modeling begins.
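As one way to picture the "right data for the right purpose" idea, here is a minimal Python sketch of a shared dataset catalog. It is not part of TDSP or CRISP-DM. The class names, fields, dataset name, and file location are all hypothetical, and a real team would likely back this with shared storage rather than an in-memory dictionary.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """One approved dataset version (all field names are illustrative)."""
    name: str       # hypothetical dataset identifier
    location: str   # where the shared copy lives (path or URI)
    purpose: str    # e.g. "training", "validation", "exploration"
    sha256: str     # fingerprint of the approved version


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


class DatasetRegistry:
    """In-memory catalog of approved datasets, for illustration only."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, name: str, location: str, purpose: str, data: bytes) -> DatasetEntry:
        """Record the approved version of a dataset and its intended purpose."""
        entry = DatasetEntry(name, location, purpose, fingerprint(data))
        self._entries[name] = entry
        return entry

    def verify(self, name: str, purpose: str, data: bytes) -> bool:
        """True only if the bytes match the registered version and purpose."""
        entry = self._entries.get(name)
        if entry is None:
            return False
        return entry.purpose == purpose and entry.sha256 == fingerprint(data)


# Usage: register a training set once, then check a copy before modeling.
registry = DatasetRegistry()
raw = b"id,label\n1,0\n2,1\n"  # stand-in for the real file contents
registry.register("churn_train", "shared/churn/train.csv", "training", raw)
print(registry.verify("churn_train", "training", raw))
```

The design choice here is simply to store a purpose and a content fingerprint next to each dataset's location, so that a modified copy, or a dataset used for something other than its registered purpose, fails the check before modeling starts.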
Below is a roundup of additional articles and frameworks that offer insight into managing data science projects:
- Microsoft’s Team Data Science Process: Link
- A Beginner’s Guide to Industry Standard Process of Data Mining – CRISP-DM: Link
- CRISP-DM 1.0 Step-by-Step Data Mining Guide: Link
  - Very detailed, in-depth guide to the CRISP-DM methodology
- Data Science and the Art of Persuasion – Harvard Business Review: Link
- Project Management in Data Science: Link
- Data Science Project Management Methodologies: Link
- Role of Project Manager in Data Science: Link
- 10 Reasons Why Data Science Projects Fail: Link
- Data Science Roadmap Worksheet – Oracle: Link
- An Evaluation of Big Data Analytics Projects and the Project Predictive Analytics Approach: Link
- How To Decide Which Data Science Projects To Pursue – Harvard Business Review: Link
- Guiding Principles for Data Science Project Management: Link
- Agile Framework for Analytics: Link
- From Storming To Norming: How to Build High-Impact Data Teams: Link
- Do Your Data Scientists Know the “Why” Behind Their Work? – Harvard Business Review: Link
- Why Managing Data Scientists is Different – MIT: Link
Published on August 30, 2020