Introduction to machine learning

I once told a boss that I was using machine learning to discover more about our data. His response was, "What do you think you can learn that I don't already know?" If you haven't encountered a boss like that in your career, congratulations, and let me know if you have any openings! More than likely, though, you have, or you will. Here's how I handled it. And no, I didn't quit!

Me: "The goal is to learn more information and details about the funds that we have and how they may relate to what the user actually means."

Him: "But I already know all that. And machine learning is just a buzzword; it's all data in the end, and we're all just data stewards. The rest is all buzzwords. Why should we be doing this, and how is it going to help me in the end?"

Me: "Well, let me ask you this. What do you think happens when you type a search for something in Google?"

Him: Deer-in-the-headlights look, with a slight hint of anger.

Him: "What do you mean? Google obviously compares my search against other searches that have historically looked for the same thing."

Me: "OK, and how does that get done?"

Him: A slightly bigger hint of anger and frustration.

Him: "Obviously it's computers searching the web and matching up my search criteria against others."

Me: "But did you ever think about how that search gets matched up amongst the billions of other searches going on, and how all the data behind those searches keeps getting updated? People obviously cannot be involved, or it wouldn't scale."

Him: "Of course, algorithms are finely tuned and give the results we are looking for, or at least, recommendations."

Me: "Right, and it is machine learning that does just that." (not always but close enough!)

Him: "OK, well I don't see what more I can learn from the data so let's see how it goes."

So, let's be honest, folks. Sometimes no amount of logic will overcome blinders or resistance to change, but the story has a much different and more important meaning behind it than a boss who defies everything we learned in biology. In the world of machine learning, it's a lot harder to prove or show what's going on, whether things are working, how they are working, why they are (or are not) working, and so on, to someone who isn't in the day-to-day trenches of development like you are. And even you may find it very difficult to understand what the algorithm is doing.

Here are just some of the questions you should be asking yourself when it comes to deciding whether or not machine learning is right for you:

  • Are you just trying to be buzzword compliant (which might be what's really being asked for) or is there a true need for this type of solution?
  • Do you have the data you need?
  • Is the data clean enough for usage (more on that later)?
  • Do you know where, and whether, you can get data that you might be missing? More importantly, how do you know that data is in fact missing?
  • Do you have a lot of data or just a small amount?
  • Is there another known and proven solution that already exists that we could use instead?
  • Do you know what you are trying to accomplish?
  • Do you know how you are going to accomplish it?
  • How will you explain it to others?
  • How will you be able to prove what's going on under the hood when asked?

These are just some of the many questions we will tackle together as we embark on our machine learning journey. It's all about developing what I call the machine learning mindset.

Nowadays, it seems that if someone does a SQL query that returns more than one row, they call themselves a data scientist. Fair enough for the resume; everyone needs a pat on the back occasionally, even if it's self-provided. But are we really operating as data scientists, and what exactly does data scientist mean? Are we really doing machine learning, and what exactly does that mean? Well, by the end of this book, we'll hopefully have found the answers to all of that, or at the very least, created an environment where you can find the answers on your own!

Not all of us have the luxury of working in the research or academic world. Many of us have daily fires to fight, and the right solution just might be a tactical one that has to be in place in 2 hours. That's what we, as C# developers, do. We sit behind our desks all day, headphones on if we're lucky, and type away. But do we ever really get the time we want or need to develop a project the way we'd like? If we did, would our projects carry as much technical debt as they do? (You do track your technical debt, right?)

We need to be smart about how we get ahead of the curve, and sometimes we do that by thinking more than we code, especially upfront. The academic side of things is invaluable; there's simply no replacement for knowledge. But most production code in corporate America isn't written in academic languages such as Python, R, MATLAB, and Octave. Even though all that academic wealth is out there, it isn't available in the form that best suits us for doing our jobs.

In the meantime, let's stop and praise those who contribute to the open source community. It is because of them that we have some excellent third-party open source solutions we can leverage to get the job done. It's a privilege that the open source community lets us use what they have developed, and the objective of this book is to expose you to some of those tools and show how you can use them. Along the way, we'll try to give you at least some of the basic behind-the-scenes knowledge you should have, so that everything is at least a black box rather than a black hole!

You've heard the buzzwords everywhere. I used to have a 2-4 hour commute to and from work each day, and I've lost count of the billboards I saw with the words machine learning or AI on them. They are everywhere, but what exactly does it all mean? AI, machine learning, data science, Natural Language Processing (NLP), data mining, neurons, phew! It seems that as soon as corporate America got involved, what was once a finely tuned art became a messy, micro-managed free-for-all with completely unrealistic expectations. I've even heard a prospective client say, "I'm not sure what it means, but I just don't want to be left behind!"

The first thing we must do is to learn the proper way to approach a machine learning project. Let's start with some definitions:

Tom Mitchell has defined machine learning as:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."

Our definition is going to be just a bit different. It will hopefully be something that you can use when asked to defend your chosen path:

"Machine learning is a collection of techniques which can be used to deal with large amounts of data in the most efficient and effective manner possible, which will derive actionable results and insight for us from that data."
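Mitchell's three ingredients, task T, performance measure P, and experience E, are easier to defend when you can point at them. The sketch below is a deliberately tiny, hypothetical example, not code from any library or later chapter: the task is deciding whether an integer from 0 to 99 is "big," the performance measure is accuracy over all 100 integers, and the experience is a list of labelled Yes/No examples. The hidden cutoff of 37, the midpoint-threshold learner, and every name in the code are assumptions made purely for illustration.

```python
"""Toy illustration of Tom Mitchell's definition of learning.

Task T:        decide whether an integer in 0..99 is "big".
Performance P: accuracy over every integer in 0..99.
Experience E:  labelled (value, is_big) examples.

HIDDEN_CUTOFF is a hypothetical ground truth; the learner never reads it.
"""

HIDDEN_CUTOFF = 37       # hypothetical rule: "big" means value >= 37
DOMAIN = range(100)      # every input the task T covers


def truth(x: int) -> bool:
    """The real answer, used only to score the learner."""
    return x >= HIDDEN_CUTOFF


def train(examples: list[tuple[int, bool]]) -> float:
    """Learn a threshold halfway between the largest 'No' and smallest 'Yes'.

    Assumes the examples contain at least one 'No' and one 'Yes'.
    """
    largest_no = max(x for x, label in examples if not label)
    smallest_yes = min(x for x, label in examples if label)
    return (largest_no + smallest_yes) / 2


def accuracy(threshold: float) -> float:
    """Performance measure P: share of the domain classified correctly."""
    correct = sum((x >= threshold) == truth(x) for x in DOMAIN)
    return correct / len(DOMAIN)


# Experience E: labelled examples, revealed to the learner in this order.
experience = [(0, False), (99, True), (50, True), (25, False),
              (40, True), (35, False), (37, True), (36, False)]

# Retrain after every second example and record performance at task T.
history = []
for seen in range(2, len(experience) + 1, 2):
    threshold = train(experience[:seen])
    history.append((seen, accuracy(threshold)))
    print(f"{seen} examples -> threshold {threshold:.1f}, "
          f"accuracy {history[-1][1]:.2f}")
```

Running it prints thresholds of 49.5, 37.5, 37.5, and 36.5 with accuracies of 0.87, 0.99, 0.99, and 1.00: performance at T, as measured by P, improves with experience E. Improvement is a trend, not a per-step guarantee; with this midpoint rule, scoring after the fifth example instead of the fourth would show accuracy briefly dropping to 0.96.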

Now, what about those things we call techniques? Make no mistake: techniques such as probability and statistics are all there, just hidden under the covers. And the tools we're going to use for our examples will hide the details, just as Python, R, and the rest of them do! That being said, it would be a complete disservice to you if we didn't at least make you aware of some of the basics, which we'll cover in a moment. I don't mean to diminish the importance of any of them, as they are all important, but our goal here is to get C# developers up and running as quickly as possible. We're going to give you enough information to make you buzzword compliant, and then you'll know more than just the black box API calls! I encourage each of you to pursue as much academic knowledge as possible in this field. Machine learning and artificial intelligence seem to be changing daily, so always keep up with the latest developments. The more you know, the better you will be at gaining acceptance for your project.

Since we brought up buzzword compliance, let's clear up a few terms right from the start: data mining, machine learning, artificial intelligence; the list goes on and on. I'll only cover a few terms for now, but here's an easy way to think about them.

You're on a road trip with your family. Let's assume you have children, and let's put aside the "are we there yet?" conversations! You are driving down the highway and one of your kids (a very young toddler) yells "TRUCK!" and points out the window at a truck. This child is very young, so how did they know that particular vehicle was a truck (let's assume it really was!)? They know it's a truck because every previous time they did the same thing, you said "Yes" or "No." That's machine learning. Then, when you told them "Yes" or "No," that's reinforcement learning. If you said, "Yes, that's a big truck," that's adding context to the reinforcement, and that moves us down the road into deep learning. See what you've been teaching your children without even knowing it?

Hope that helped.
