My Month with Artificial Intelligence
How AI Works: Chapter 1
I started reading Ronald T. Kneusel’s How AI Works.
- Deep learning is a subset of machine learning. Machine learning is a subset of artificial intelligence.
- There are AI technologies that fall outside of machine learning. This book will not focus on them.
- Machine learning builds models from data.
- Deep learning uses models that were once too large to be practical. Deep learning involves neural networks.
- Models are blank slates conditioned by data. If the data is bad, the model is bad.
- A machine learning model is a black box that accepts an input, usually a collection of numbers, and produces an output, typically a label like “dog” or “cat,” or a continuous value like the probability of being a “dog”.
- A model has “parameters”, which control the model’s output. Conditioning a model (“training”) sets the model’s parameters to produce accurate output for a given input.
- Training uses the collection of known inputs and outputs to adjust the model’s parameters.
- Training is not programming. We don’t know what the algorithm should be. We only believe a relationship exists between the input and the desired output. We hope a model can approximate that relationship well enough to be useful.
- We use known, “labeled data” to train the model. This approach is called “supervised learning” — we supervise the model while it learns to produce correct output.
- Models often want numeric class labels. Models don’t know what their inputs and outputs mean; they only make associations between sets of inputs and outputs.
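Since models want numeric class labels, string labels like “dog” and “cat” are usually mapped to integers before training. A minimal sketch using scikit-learn’s `LabelEncoder` (the pet labels here are just an illustration):

```python
from sklearn.preprocessing import LabelEncoder

# String class labels from a hypothetical pet dataset
labels = ["dog", "cat", "dog", "dog", "cat"]

# Map each distinct label to an integer (sorted alphabetically: cat=0, dog=1)
encoder = LabelEncoder()
numeric = encoder.fit_transform(labels)

print(list(numeric))                             # [1, 0, 1, 1, 0]
print(list(encoder.inverse_transform(numeric)))  # back to the strings
```

The model only ever sees the integers; the mapping back to meaningful names lives entirely outside the model.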
It’s worth remembering the sage words of British statistician George Box, who said that all models are wrong, but some are useful. At the time, he was referring to other kinds of mathematical models, but the wisdom applies to machine learning.
- AI often uses vectors and matrices.
- A vector is a string of numbers treated as a single entity. For example, we can describe an iris flower as a string of four numbers (4.5, 2.3, 1.3, 0.3) — sepal length of 4.5 cm, sepal width of 2.3 cm, petal length of 1.3 cm, and petal width of 0.3 cm.
- By grouping these measurements together as a vector, we can refer to them as a single entity.
- The number of elements in a vector determines its dimensionality. The iris vector has four dimensions.
- Two-dimensional arrays of numbers are “matrices”. Machine learning often uses matrices to represent data sets, where each row is a vector.
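The iris vector above, and a small matrix built from several such vectors, can be sketched in NumPy (the extra rows are made-up measurements for illustration):

```python
import numpy as np

# One iris flower as a four-dimensional feature vector:
# sepal length, sepal width, petal length, petal width (cm)
flower = np.array([4.5, 2.3, 1.3, 0.3])
print(flower.shape)  # (4,) -- four dimensions

# A tiny dataset as a matrix: each row is one flower's feature vector
dataset = np.array([
    [4.5, 2.3, 1.3, 0.3],
    [5.1, 3.5, 1.4, 0.2],
    [6.7, 3.0, 5.2, 2.3],
])
print(dataset.shape)  # (3, 4) -- three samples, four features each
```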
- Inputs to models are its “features”. For the iris dataset, two features are petal length and petal width, which can be grouped into “feature vectors” (a.k.a., “samples”).
- The reliability of a model should not be judged on whether it can classify its training data correctly. Reserve some labeled data for testing after training.
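Reserving labeled data for testing is one line in scikit-learn; a sketch using the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features each

# Hold out 25% of the labeled data; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))  # 112 38
```

Judging the model on `X_test` rather than `X_train` is what tells us whether it learned something general or merely memorized its training data.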
- We can plot training data on an x,y grid. For vectors with more than two elements, there are techniques to flatten them onto a 2-D visual representation.
- A plot of the features visualizes the “feature space”.
- Our training data may clump together in distinct, well-separated visual/spatial groups on the plot. This is true for our iris data.
- If the training data is well separated into distinct groups, our model could be used as (and called) a “classifier”.
We have many model types to choose from for our classifier, including decision trees, which generate a series of yes/no questions related to the features used to decide the class label to output for a given input. When the questions are laid out visually, they form a structure reminiscent of an upside-down tree. Think of a decision tree as a computer-generated version of the game 20 Questions.
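A decision-tree classifier on the iris data is a few lines in scikit-learn; a minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The tree learns a series of yes/no questions about the features,
# e.g. "is petal length <= 2.45 cm?", ending in a class label
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))  # accuracy on held-out test data
```

Because the iris classes form well-separated groups in feature space, even this simple tree scores very high on the held-out data.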
- The same approach can be used if the data is not as distinctly grouped, but the accuracy of the classifier will drop.
- Decision trees are one of the few types of models that are fairly easily explainable. Neural networks are less transparent than decision trees.
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning.
https://en.wikipedia.org/wiki/MNIST_database
https://www.nist.gov/itl/products-and-services/emnist-dataset
- Training data must include representative samples of the data likely to be input during real operation in order for a model to perform well.
- We train a model on some of the MNIST digits — 0, 1, 3, and 9 — with each 28×28 image unwound into a flat vector of 784 features.
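The “unwinding” step can be sketched with NumPy (using a blank array as a stand-in for a real MNIST digit):

```python
import numpy as np

# A stand-in for one 28x28 grayscale MNIST digit
image = np.zeros((28, 28), dtype=np.uint8)

# "Unwind" the rows into a single flat feature vector
features = image.reshape(784)  # equivalently: image.flatten()
print(features.shape)          # (784,)
```

Each of the 784 pixel values becomes one feature; the model never knows the features were ever arranged as a square image.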
- A “confusion matrix” (a two-dimensional table) shows how the model behaves for test data:
|   | 0   | 1     | 3   | 9   |
|---|-----|-------|-----|-----|
| 0 | 978 | 0     | 1   | 1   |
| 1 | 2   | 1,128 | 3   | 2   |
| 3 | 5   | 0     | 997 | 8   |
| 9 | 5   | 1     | 8   | 995 |
- The confusion matrix rows show the true labels for the test data. The columns are the model’s output. Most counts fall on the diagonal, so our model is fairly accurate.
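scikit-learn builds this table directly from true and predicted labels; a sketch with a made-up toy test set (not the real MNIST counts above):

```python
from sklearn.metrics import confusion_matrix

# True labels (rows) vs. model predictions (columns) for a toy test set
y_true = [0, 0, 1, 1, 3, 3, 9, 9]
y_pred = [0, 0, 1, 3, 3, 3, 9, 1]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 3, 9])
print(cm)
# [[2 0 0 0]
#  [0 1 1 0]
#  [0 0 2 0]
#  [0 1 0 1]]
```

Off-diagonal entries pinpoint *which* classes get confused with which — here one true 1 was predicted as 3, and one true 9 as 1.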
- What if we input sevens, a digit the model wasn’t trained on? Our model has no way to say “I don’t know”.
- Consider the MNIST digits in terms of interpolation and extrapolation.
- Interpolation approximates within the range of known data.
- Extrapolation goes beyond known data.
- In terms of the MNIST digits, interpolation might include recognizing input of a digit that is tilted/rotated when none of the training data was tilted.
- Extrapolation might include recognizing a zero with a slash through it.
Another example for interpolation and extrapolation: model the world population between 1950 and 2020.
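The population example can be sketched with a straight-line fit in NumPy (the population figures are rough approximations, in billions):

```python
import numpy as np

# Approximate world population (billions), roughly linear over 1950-2020
years = np.array([1950, 1970, 1990, 2010, 2020])
pop   = np.array([2.5, 3.7, 5.3, 7.0, 7.8])

# Fit a straight line to the known range
slope, intercept = np.polyfit(years, pop, 1)
model = lambda yr: slope * yr + intercept

print(model(2000))  # interpolation: a year inside the known range
print(model(2100))  # extrapolation: far outside it -- much less trustworthy
```

The estimate for 2000 lands between neighboring data points, but the 2100 estimate assumes the same trend holds for another 80 years, which nothing in the data justifies.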
scikit-learn
https://scikit-learn.org/
Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.
Links