Everybody loves Artificial Intelligence and Data Science (DS), and that is likely to continue for the next ten years. Yet most people don’t know much about data science or the capabilities of machine learning and AI algorithms.
This is a very common phenomenon in every field of expertise. Ask yourself: do you know what DevOps, Support, or NOC (Network Operation Center) really do? Yes, technology professionals can explain it better than those who aren’t in the industry. However, it is often difficult to understand what others do if you haven’t done it yourself.
In most cases this is fine. Although it is nice to have knowledge of other areas of expertise, you can manage well enough without all the details, and the extra knowledge often turns out not to be useful. However, Data Science is different from the examples I have just given, because data is everywhere.
Data science and data-driven decision-making are easier than ever, and every department of an organization can use them. DevOps can use Machine Learning (ML) to analyze their pipelines and identify anomalies. Support can use clustering algorithms to group similar customer requests and reduce their workload. The Network Operation Center can use anomaly detection algorithms to detect problems in networks. We believe everyone can benefit from DS, so we created a way for every employee to gain data science skills, both to empower them and to share our love of data science.
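As a rough illustration of the Support use case, here is a minimal sketch of grouping similar customer requests. The requests are invented for this example, and the choice of TF-IDF features with K-Means is an assumption for illustration, not a description of any model used in production.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical support requests, invented for illustration only.
requests = [
    "cannot reset my password",
    "password reset link expired",
    "forgot password, reset email never arrived",
    "invoice shows wrong billing amount",
    "billing was charged twice on the invoice",
    "need a copy of the latest invoice for billing records",
]

# Turn each request into a TF-IDF vector, ignoring common English words.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(requests)

# Group the requests into two clusters (the number of clusters is assumed here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

for label, text in zip(labels, requests):
    print(label, text)
```

In practice the number of clusters is not known in advance, which is one reason a density-based algorithm such as DBSCAN can be a useful alternative.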
Data Science Workshop Goals
We created a workshop to help anyone who has basic Python skills quickly grasp DS and understand how it works.
We set the following goals:
- Provide “Data Science Ambassadors” outside our team with tools, a basic understanding of Data Science, and the ability to use them.
- Keep the training short and effective, so that the “ambassadors” can stay “on the job”.
- Spread awareness of data-driven decision-making and explain its benefits.
To achieve these goals, we set the following agenda:
- Explain ML and the basics of algorithms
- Develop the ability to spot machine-learnable problems
- Practice hands-on ML using Python and Sklearn
We first considered using an existing online DS course. We found many online courses, but none that suited us, for the reasons listed below.
- We wanted to cover as much information as possible in a short time, and very few courses do that.
- Many courses are not meant to be taught in a class. They are designed for individuals to learn by themselves.
- Math and statistics weren’t prerequisites for the workshop, so we needed theoretical explanations of ML models that everyone could easily understand.
So we decided to create the workshop ourselves.
Next, I will explain the reasoning behind our workshop agenda and why it was so effective at teaching Data Science to newcomers in such a short amount of time.
Explain ML and the basics of algorithms
As Imperva’s largest DS team, we often work with people from different departments, including Product, Dev, and Support, to complete our projects. We noticed that many people had difficulty understanding how the ML solution fit into the project and what it was capable of. We initially thought they struggled because the specific ML solution was new to them.
However, we were surprised to find that these problems didn’t occur in subsequent projects. It became much easier to explain how our ML solutions fit into the project. It was clear that people did not struggle with our ML solution. They struggled with ML in general. Once they understood the basics, things became much easier.
Develop the ability to spot machine-learnable problems
No matter what job title we have, manual tasks are an integral part of our work environment. These tasks are not automated and require human interaction. Some tasks may seem impossible to automate, but a good Data Scientist will likely be able to create an ML model capable of performing the task.
Our DS team is limited in the number of projects it can take on, and because of prioritization most of these projects are related to core products. When we do get the chance to work on peripheral projects, we don’t want to spend that time sorting through the many manual tasks going on within the company to figure out which can be automated with ML.
Our solution was to train employees in other departments to become “Data Science Ambassadors”. These Ambassadors will be able to spot problems that could be solved with ML. Then, depending on the complexity of the problem, they can either create a model themselves, work with our mentors, or simply send the problem to us for our backlog.
Practice hands-on ML using Python and Sklearn
We wanted people to practice ML, not just hear a high-level explanation, because we believe that practice teaches better than theory alone. Beyond learning ML concepts and identifying ML-related problems, participants could start thinking about solving these problems themselves. They would also experience DS for the first time and could decide whether it interests them.
How we did it
To make it accessible for everyone, we divided the workshop into four days of three hours each. This allowed us to keep the workshop practical while still allowing people to do their jobs. Each day covered a different topic.
- Pre-workshop – Installation of basic tools: Python, Jupyter Notebooks, and basic math and visualization Python packages
- Day 1 – An overview of Machine Learning and basic Python packages (NumPy, Pandas, Seaborn)
- Day 2 – Supervised learning – Linear & logistic regression, Decision Trees, and Random Forest
- Day 3 – Unsupervised learning – DBSCAN & K-Means
- Day 4 – EDA (Exploratory Data Analysis), Feature Engineering, and evaluation metrics. Final project.
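To give a flavor of the supervised learning and evaluation topics in this agenda, here is a minimal sketch that trains a logistic regression and a random forest and compares them with two evaluation metrics. The dataset (scikit-learn's built-in breast cancer data), the split, and the parameters are assumptions chosen for illustration; this is not the workshop's actual notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the rows so the models are scored on data they have not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Logistic regression benefits from scaled features, hence the pipeline.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "f1": f1_score(y_test, predictions),
    }
    print(name, scores[name])
```

Scoring on a held-out test set rather than on the training data is the point to notice: it is what makes the metrics a fair estimate of how the model behaves on new data.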
We used the following three-part formula to ensure the best delivery.
- Slides
- Code examples
- Exercises
We created our slides with an online presentation tool so that we could share them and make changes as we went along. The slides were used to communicate ideas, concepts, and algorithms without displaying any code.
Live code examples were shown using Jupyter Notebooks. Participants could copy the notebooks from our Git repository and run the code themselves. This helped them understand the various commands, and it was something they could keep after the workshop. Additionally, since notebooks are common practice among Data Scientists, using a notebook is training in itself. It also gave them a starting point if they wanted to explore DS further.
The internet is full of examples of datasets that can be used to illustrate different concepts, methods, and algorithms.
We also used Jupyter Notebooks for the exercises. We were able to provide participants with some basic commands for each exercise, such as loading data, so they could focus on the actual exercise. It was also very simple to copy-paste the required commands, because the code examples were located in the same notebook as the exercise.
Great content is a good start, but a workshop needs more than good content to be great!
We sent out a survey each day to get feedback on that day’s workshop and to help improve the next one. Participants rated the following on a 1-to-5 scale.
- Logistics – Schedule, refreshments, and breaks
- Quality of content – How good were the slides and exercises?
- Relevance of content – How relevant was the content to participants’ daily work?