Chapter 12

Common Problems With Machine Learning

Machine Learning can provide a great many advantages to any marketer, as long as the technology is used effectively. Knowing the issues and problems other companies have faced can help you avoid the same mistakes and make better use of Machine Learning.

Common Practical Mistakes

Focusing Too Much on Algorithms and Theories

Leave advanced mathematics to the experts. As you embark on your Machine Learning journey, you will be drawn to the concepts that form the scientific foundation of the field, yet you could study all of that theory and still fall short of the results you want. Fortunately, the experts have already taken care of the most complicated algorithmic and theoretical challenges. With that help available, mastering all the foundational theory and statistics behind a Machine Learning project isn't necessary.

Mastering ALL of Machine Learning

Mastering everything about Machine Learning is the work of a data scientist. If you are not a data scientist, you don't need to master the entire field. All you have to do is identify the problems you need to solve and find the models and resources best suited to solving them.

For example, if you are dealing with basic predictive modeling, you don't need the expertise of a natural language processing specialist. Why spend years becoming an expert in the whole field when you can master just the niches of Machine Learning that solve your specific problems?

Using Changing or Premade Tools

Previously, we discussed tools such as R and Python, which data scientists use to build customized solutions for their projects. For non-experts, tools such as KNIME and Amazon S3 can suffice. With these simple but handy tools, we are able to get busy, get working, and get answers quickly. All that is left to do is focus on the analysis.

When you have found the tool that helps you solve your problem, don't switch. Many developers jump to new tools as soon as they appear on the market. Trying out alternatives can be useful while you are still searching for the right fit, but once you have found it, stick with it. Chasing other tools can make you lose track of the problem you set out to solve.

Having Algorithms Become Obsolete as Soon as Data Grows

Machine Learning algorithms require large amounts of training data. Often, an algorithm is trained on a particular data set and then used to predict future data, and how that future data will change is hard to anticipate. A model that was "accurate" on its training set may no longer be accurate once the underlying data changes. For a system that changes slowly, accuracy may hold up; for a system that changes rapidly, accuracy will drop because the past data no longer applies.
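This drift in accuracy is easy to demonstrate. The sketch below is a toy illustration (not from the text, and the "model" is deliberately trivial): a threshold classifier is trained on one distribution, then scored again after the whole distribution shifts upward.

```python
import random

random.seed(0)

def train_threshold(xs):
    # "Train" a trivial classifier: predict class 1 when x exceeds the training mean.
    return sum(xs) / len(xs)

def accuracy(threshold, xs, ys):
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)

# Training data: class 0 centered at 0.0, class 1 centered at 2.0
old_x = [random.gauss(0, 0.5) for _ in range(500)] + \
        [random.gauss(2, 0.5) for _ in range(500)]
old_y = [0] * 500 + [1] * 500
threshold = train_threshold(old_x)

# Later, the whole distribution drifts upward; the true labels are unchanged
new_x = [x + 1.5 for x in old_x]

acc_before = accuracy(threshold, old_x, old_y)
acc_after = accuracy(threshold, new_x, old_y)
print(acc_before)  # high on the data the model was trained on
print(acc_after)   # noticeably lower once the data has drifted
```

The model itself never changed; only the data did, which is why slowly changing systems tolerate stale models better than rapidly changing ones.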

Getting Bad Predictions to Come Together With Biases

Machine Learning algorithms can pick up the specific biases that cause problems for a business. One example occurs when a car insurance company tries to predict which clients have a high rate of car accidents and strips out gender, given that the law does not allow such discrimination. Even without gender in the data set, the algorithm can still infer gender through correlated variables and end up using it as a predictor anyway.

This example illustrates two principles. First, you need to impose additional constraints on an algorithm beyond accuracy alone. Second, the smarter the algorithm becomes, the harder it is to control.

Developers routinely use Machine Learning to build predictors, for tasks such as improving search results and product selections and anticipating customer behavior. One common reason for inaccurate predictions is overfitting, which occurs when a Machine Learning algorithm adapts to the noise in its data instead of uncovering the underlying signal.

You can always fit a complex model to a small amount of data. The complex model will then hit every data point, including the random fluctuations, rather than capturing the true pattern.
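A toy sketch makes this concrete (the example and its numbers are illustrative, not from the text): a degree-4 polynomial is forced through five noisy observations of a roughly constant signal. It reproduces every training point perfectly, noise included, and then predicts wildly outside them.

```python
def lagrange_fit(xs, ys):
    """Return a function that interpolates every (x, y) pair exactly
    with a degree n-1 polynomial -- noise and all."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Five noisy observations of a signal whose true value is about 1.0
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 0.9, 1.2, 0.8, 1.0]

overfit = lagrange_fit(xs, ys)
# The complex model hits every training point exactly...
print([round(overfit(x), 2) for x in xs])
# ...but swings far from the true ~1.0 just outside the training range
print(round(overfit(5.0), 2))
```

A simpler model (say, the mean of the five observations) would miss each point slightly but predict future values far better, which is exactly the trade-off overfitting ignores.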

Marketers should keep these items in mind when dealing with data sets. Make sure your data is as free of inherent bias as possible, and watch for overfitting caused by noise in the data set. You can catch this problem during the evaluation stage of a Machine Learning project, when you examine how performance differs between training and test data.

Making the Wrong Assumptions

Most Machine Learning models cannot deal with data sets containing missing values. Therefore, features with a large portion of missing data may need to be deleted. Conversely, if a feature has only a few missing values, we can fill those empty cells instead of deleting it. For features containing continuous variables, one popular approach is to use the mean value as a replacement for each missing value.

If a variable is discrete, we can use the mode to replace its missing values. Whether they are used in automated systems or not, Machine Learning algorithms assume that the data is random and representative. However, company data is rarely truly random or representative.
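The mean-for-continuous, mode-for-discrete rule can be sketched in a few lines. This is a minimal illustration (the `impute` helper and the sample columns are invented for the example), using only Python's standard library:

```python
from statistics import mean, mode

def impute(column, kind):
    """Fill None entries: mean for continuous features, mode for discrete ones."""
    present = [v for v in column if v is not None]
    fill = mean(present) if kind == "continuous" else mode(present)
    return [fill if v is None else v for v in column]

ages = [23, None, 31, 27, None, 42]           # continuous feature
colors = ["red", "blue", None, "red", "red"]  # discrete feature

print(impute(ages, "continuous"))  # Nones replaced by the mean, 30.75
print(impute(colors, "discrete"))  # None replaced by the mode, "red"
```

Real pipelines usually compute the fill value on the training split only and reuse it on test data, so the imputation itself doesn't leak information.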

The best way to deal with this issue is to make sure your data does not contain gaping holes and that the assumptions behind it hold up.

Receiving Bad Recommendations

Recommendation engines are common today. While some are reliable, others are far less accurate. What these recommendation engines produce is determined by what their Machine Learning algorithms have learned. One example arises when a customer's taste changes: the old recommendations become useless. Experts call this the "exploration versus exploitation" trade-off. If the algorithm only exploits what it has already learned without exploring, it reinforces its existing data, never takes in new information, and eventually becomes useless.
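One standard way to balance this trade-off is an epsilon-greedy policy: recommend the best-known item most of the time, but occasionally try something else so the system can notice when tastes change. The sketch below is a simplified illustration (the appeal estimates and function names are invented for the example):

```python
import random

random.seed(1)

def epsilon_greedy(estimates, epsilon=0.1):
    """Mostly exploit the best-known item, but explore occasionally
    so the recommender can detect a change in taste."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))  # explore: pick at random
    return max(range(len(estimates)), key=estimates.__getitem__)  # exploit

# Estimated appeal of three product categories for one customer
estimates = [0.8, 0.3, 0.5]
picks = [epsilon_greedy(estimates) for _ in range(1000)]

print(picks.count(0) / 1000)  # mostly exploits the best category...
print(sorted(set(picks)))     # ...but every category still gets sampled
```

In a full system, the feedback from each exploratory pick would update the estimates, which is precisely what a pure-exploitation algorithm never gets to do.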

Having Bad Data Convert to Bad Results

Not all data is relevant and valuable. If data is not well understood, Machine Learning results can disappoint. Initial testing may say you are right about everything, but once launched, your model can prove disastrous. When creating products, data scientists should test against unforeseen variables, including smart attackers, so they understand the possible outcomes. Start and end your Machine Learning project with high-quality data: garbage fed into the system comes out as garbage at the other end.

Machine Learning Goes Wrong

Despite the many ML success stories, there are also failures. While machines are constantly evolving, these events show that ML is not yet reliably capable of intelligence that far exceeds that of humans.

Below are a few examples of when ML goes wrong.

Microsoft And The Chatbot Tay

Microsoft set up the chatbot Tay on Twitter to simulate the persona of a teenage girl, show the world its most advanced technology, and connect with modern users. The company included what it assumed to be an impenetrable layer of Machine Learning and then released the program to learn from the responses of its audience.

The developers gave Tay an adolescent personality along with some common one-liners before presenting the program to the online world. Unfortunately, the program did not fare well with the internet crowd, which bombarded it with racist comments, anti-Semitic ideas, and obscenities. In the end, Microsoft shut down the experiment and apologized for the offensive and hurtful tweets.

This example suggests that Machine Learning-powered programs are still not as advanced and intelligent as we expect them to be. Reports showed that as few as two tweets were enough to bring Tay down and brand it as anti-Semitic. In Tay's defense, however, the words she used were only those taught to her through conversations on the internet; what her system lacked was a filter for appropriateness.


Uber And Surge Pricing

Uber dealt with a similar problem when Machine Learning did not work out well for it. The ride-sharing app uses an algorithm that automatically responds to increased demand by raising fare rates. During the Martin Place siege in Sydney, prices quadrupled, drawing criticism from many customers. The algorithm detected a sudden spike in demand and automatically raised prices to draw more drivers to the area with high demand; it understood the demand, but it could not interpret why that particular spike was happening. From this example, we can conclude that algorithms need inputs that connect them to real-world context.
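A naive version of such a pricing rule, and the kind of real-world safeguard it was missing, can be sketched in a few lines. This is a hypothetical illustration, not Uber's actual algorithm; the function and its cap are invented for the example:

```python
def surge_multiplier(demand, supply, cap=2.0):
    """Naive surge pricing: scale fares with the demand/supply ratio,
    but cap the multiplier as a real-world safeguard."""
    ratio = demand / max(supply, 1)
    return min(max(1.0, ratio), cap)

print(surge_multiplier(demand=50, supply=50))   # normal conditions: no surge
print(surge_multiplier(demand=400, supply=50))  # demand spike: capped at 2.0
```

The cap is exactly the kind of external constraint the passage argues for: the algorithm still responds to demand, but a human-chosen limit keeps it from quadrupling prices during an emergency.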

These examples should not discourage marketers from using Machine Learning tools to lighten their workloads. They are cautionary tales about common mistakes to keep in mind when developing Machine Learning algorithms and projects. Machine Learning models are constantly improving, and their current shortcomings can be overcome as real-world data and computation power continue to grow.