for inference. Bias is a major concern in statistics because it can skew results and their interpretation, and for the same reason it is a major concern in machine learning.
Pre-made datasets exist, but they are often unique to a particular problem or have no real-world application. For example, if someone shared a dataset to classify motion gestures, the motions would depend on the type of sensor used and its placement. Data collected from a movement performed with a glove sensor would look different from a similar motion recorded with the sensor placed at the end of a wand.
However, a few pre-made datasets can help get you started. I regularly use the Google Speech Commands Dataset as the foundation for various keyword spotting projects. This dataset consists of several dozen spoken words, each with over 1,000 audio samples taken from different speakers. I collect additional samples for my target keyword or phrase, such as “trick or treat,” and use the pre-made dataset to fill out samples for the “unknown” label.
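Filling out the “unknown” label this way can be sketched in a few lines of Python. This is a minimal sketch, not the author's actual pipeline: it assumes you already have a list of file paths for your own keyword recordings and a dictionary of file paths for the pre-made words, grouped by word (Speech Commands stores each word in its own folder). The function name `build_keyword_dataset`, the `unknown_ratio` parameter, and all file names are hypothetical. It only assembles labeled (path, label) pairs; loading the audio and extracting features are left out.

```python
import random
from collections import Counter


def build_keyword_dataset(target_files, other_words, keyword_label="trick_or_treat",
                          unknown_ratio=1.0, seed=0):
    """Hypothetical helper: pair your own keyword clips with 'unknown' clips.

    target_files:  paths to recordings of your target keyword or phrase
    other_words:   dict mapping each pre-made word to a list of clip paths
    unknown_ratio: 'unknown' clips to draw per keyword clip (an assumption)
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    # Pool every pre-made clip, whatever its word, into one "unknown" pool
    pool = [path for paths in other_words.values() for path in paths]
    # Never ask for more unknown clips than the pool actually holds
    n_unknown = min(len(pool), int(len(target_files) * unknown_ratio))
    unknown = rng.sample(pool, n_unknown)

    dataset = [(path, keyword_label) for path in target_files]
    dataset += [(path, "unknown") for path in unknown]
    rng.shuffle(dataset)  # mix the two labels together
    return dataset


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    targets = [f"my_clips/trick_or_treat_{i}.wav" for i in range(4)]
    others = {
        "yes": [f"speech_commands/yes/{i}.wav" for i in range(5)],
        "no": [f"speech_commands/no/{i}.wav" for i in range(5)],
    }
    data = build_keyword_dataset(targets, others)
    # 4 keyword clips and 4 unknown clips
    print(Counter(label for _, label in data))
```

Because the sketch samples from the pooled clips, words with more recordings show up more often under “unknown.” If that imbalance matters for your project, drawing a fixed quota from each word is a simple alternative.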
The MNIST dataset contains 70,000 samples of the handwritten digits 0–9 (Figure C).
This dataset has been used in machine learning
research for decades and can be a great starting
point for optical character recognition (OCR)
systems. Converting handwritten addresses to
computer text, for example, helps postal services
“With traditional programming, you explicitly tell a computer what it needs to do using code, but with machine learning the computer finds its own solution to a problem based on examples you show it.” (Helen Leigh)
makezine.com
Shawn Hymel, Josef Steppan CC BY-SA 4.0