When a company’s employees start using machine learning, some of them expect to build models that deliver the desired results right away. Although parts of the process can look like engineering magic, there are essential non-technical issues that need to be addressed first. If you identify these issues correctly and prepare your data to a proper standard, the probability of your project succeeding will increase greatly. The following tips should save time for any manager or engineer leaping into machine learning.
First, ask the right questions; this is the hardest part of the ML world. Every project sets out to answer a question such as: what is this, who wrote it, what patterns are there, how much will it cost?
Stating the specific question you need answered will help you to determine the objective function. Without it, your team can waste many hours collecting data, cleaning it and modelling, only to end up with a useless product. Such failures can reduce the perceived value of machine learning in your company.
To avoid this, before launching the project, state clearly the question you want answered. Then determine the objective function (for example, maximise accuracy) that you will use to measure your progress. Even if your first question is not quite the right one, you will at least be making progress towards a specific goal.
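As an illustration of measuring progress against an objective function, here is a minimal sketch that uses accuracy as the measure. The `accuracy` helper, the labels and both sets of predictions are invented for this example; a real project would choose a metric that matches its own question.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty label set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels and predictions, invented for illustration only.
labels = [1, 0, 1, 1, 0, 1, 0, 0]
baseline_predictions = [1, 1, 1, 1, 1, 1, 1, 1]   # always predicts class 1
candidate_predictions = [1, 0, 1, 0, 0, 1, 0, 1]  # output of a trial model

baseline_score = accuracy(labels, baseline_predictions)
candidate_score = accuracy(labels, candidate_predictions)

print(f"baseline:  {baseline_score:.2f}")
print(f"candidate: {candidate_score:.2f}")
```

Scoring a trivial baseline next to each candidate gives the team one number to compare, which makes it easier to tell whether a change is real progress.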
As a rule of thumb, about 90% of the effort goes into collecting and pre-processing data, and only about 10% into testing, debugging, and operating the resulting model. With shallow learning, you must study the data and its technical features and then translate it into a suitable format. Even with deep learning, the data needs to be studied and transformed to get acceptable performance. Although libraries let you perform many of these operations in a few lines of code, studying the data and checking its suitability is still time-consuming.
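To give a feel for what studying the data and checking its suitability involves, the sketch below profiles one column of a small record set for missing values and mixed types. The `profile_column` helper and the sample records are hypothetical; real projects usually rely on a data-analysis library for this step.

```python
from collections import Counter


def profile_column(rows, column):
    """Count missing entries and value types for one column of dict records."""
    values = [row.get(column) for row in rows]
    # Treat both None (or an absent key) and the empty string as missing.
    present = [v for v in values if v is not None and v != ""]
    types = Counter(type(v).__name__ for v in present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "types": dict(types),
    }


# Hypothetical records, invented for illustration only.
records = [
    {"age": 34, "city": "Kyiv"},
    {"age": None, "city": "Lviv"},
    {"age": "41", "city": ""},   # age stored as text, city left blank
    {"age": 29},                 # city key absent altogether
]

age_profile = profile_column(records, "age")
city_profile = profile_column(records, "city")

print(age_profile)
print(city_profile)
```

Even on four records the profile shows a missing age and an age stored as text, the kind of issue that has to be resolved before any model sees the data.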
Feature engineering requires a thorough understanding of the business specifics. Well-chosen features can significantly improve the result, so you cannot simply hand a problem to a data scientist and expect a solution without domain knowledge. People who start using ML in your company must have enough time to prepare and study the data. Keep in mind the effort required to complete this preparatory phase.
Most likely, the production environment will not give you all the data you are hoping for. It is therefore better to expect from the outset that further improvements may be required. Plan the data pre-processing phase with a margin of time, and manage expectations accordingly.
Machine learning and deep learning produce results through mathematical transformations fitted to training data. These algorithms can find patterns in data that people cannot detect. However, the model cannot think or make decisions. It only adjusts values in order to maximise or minimise its objective function. Try the model on a different data set and you will realise how specific its ‘knowledge’ is.
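The point about how specific a model’s ‘knowledge’ is can be shown with a deliberately tiny sketch: a single-threshold rule fitted to maximise accuracy on one data set, then scored on a second set whose values follow the same pattern on a shifted scale. The helper names and all the numbers are invented for illustration.

```python
def score(cut, xs, ys):
    """Accuracy of the rule 'predict 1 when x >= cut'."""
    hits = sum((x >= cut) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


def fit_threshold(xs, ys):
    """Pick the cut-off among observed values that maximises training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(xs)):
        acc = score(cut, xs, ys)
        if acc > best_acc:  # keep the first cut that reaches the best score
            best_cut, best_acc = cut, acc
    return best_cut


# Hypothetical training data: small values are class 0, large ones class 1.
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]

# Hypothetical second data set: same pattern, but every value shifted by 10.
other_x = [11.0, 12.0, 13.0, 16.0, 17.0, 18.0]
other_y = [0, 0, 0, 1, 1, 1]

cut = fit_threshold(train_x, train_y)
train_acc = score(cut, train_x, train_y)
other_acc = score(cut, other_x, other_y)

print(f"cut-off: {cut}, training accuracy: {train_acc:.2f}, "
      f"accuracy on the other set: {other_acc:.2f}")
```

The fitted cut-off separates the training data perfectly, yet on the shifted set it labels every example as class 1 and is right only half the time. The rule encodes nothing beyond the values it was fitted on.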
You need to think about why the model reaches a given conclusion, whether that conclusion agrees with real data, and whether using it will have unexpected consequences. These questions deserve attention because it is often easier to take the model’s conclusion (i.e. the forecast) and use it without accounting for possible bias. Since the model does not ‘think,’ it will not adapt to your ethics unless you build those requirements into the objective function and the training data.
Although machine learning tools and techniques are developing rapidly, there are a number of additional considerations that must be taken into account. Focusing on the right goal, preparing the data to a proper standard, and evaluating the output are steps you must take every time you implement a machine learning project. As the technical capabilities of machines grow exponentially, we all need to develop these supporting practices just as quickly and diligently.