Stage 1: How Computers See Pictures
Keep your Colab notebook tab open for the whole session. Open links in a new tab; don't use the buttons on this page to leave the course.
In this stage you will build a notebook section that loads CIFAR-10, inspects image shapes, and displays labeled examples.
You will learn that images are grids of RGB numbers, not magic objects.
Your evidence will be a labeled image gallery plus a written explanation of `(50000, 32, 32, 3)`.
Before students type, draw one large pixel on the board and label it (R, G, B). Then run the shape cell live and point to each number in (50000, 32, 32, 3).
The big idea
A computer does not see a cat. It sees a grid of numbers.
Each picture is made of pixels. Each pixel stores three color numbers: red, green, and blue, each from 0 (none of that color) to 255 (the most possible). CIFAR-10 gives us 60,000 tiny color images (50,000 for training and 10,000 for testing), each already labeled as one of 10 classes. Before we train anything, we need to prove what the data looks like.
A human may say "cat" because the picture has ears, fur, and a face. A computer starts lower than that. It receives rows and columns of RGB numbers. Machine learning begins when we show many of those number grids with their labels, then let a model look for patterns that connect the grids to the labels.
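To make "a grid of numbers" concrete, here is a minimal sketch of a made-up 2-by-2 image written as plain Python lists. The pixel values and the name `tiny_image` are invented for illustration; they are not CIFAR-10 data.

```python
# A made-up 2x2 color image: 2 rows, 2 columns, 3 numbers (R, G, B) per pixel.
# These values are invented for illustration, not taken from CIFAR-10.
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],      # row 0: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # row 1: a blue pixel, a white pixel
]

height = len(tiny_image)          # number of rows
width = len(tiny_image[0])        # number of pixels in each row
channels = len(tiny_image[0][0])  # numbers stored per pixel

print("Shape:", (height, width, channels))
print("Top-left pixel:", tiny_image[0][0])
```

A CIFAR-10 image follows the same layout, only with 32 rows and 32 columns instead of 2 and 2.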
- 1. Photos / CIFAR-10: labeled image examples (Stage 1)
- 2. Notebook variables: x_train, y_train, class_names (Setup-2)
- 3. Prepared data: normalized pixels and fair piles (Stage 3)
- 4. Keras model: CNN layers and summary (Stage 4)
- 5. Training history: epochs, loss, accuracy (Stage 5)
- 6. Test evidence: sealed score and mistakes (Stages 6-7)
- 7. Improved model: augmentation comparison (Stage 8)
- 8. Inference: uploaded image to top-3 guesses (Stage 9)
- 9. Demo evidence: table, confidence, limitation (Stage 10)
Stage 1 starts the left side of the workflow. CIFAR-10 photos become notebook arrays and labels, which every later stage will reuse.
- pixel: one tiny square of color in an image
- RGB: the red, green, and blue numbers that make a pixel color
- dataset: a collection of examples and answers
- label: the correct answer for one example
- shape: the size and structure of a tensor or array
- TensorFlow: a Python machine-learning toolbox we will use to load data, build models, and train them
- Keras: the friendlier model-building API inside TensorFlow
Open your saved Colab notebook from Setup. Work in the Stage 1 section.
Build it
Step 1 — Translate one pixel first
Before using any library, write this in a text cell:
If one pixel is [200, 40, 20]:
- the red amount is _____
- the green amount is _____
- the blue amount is _____
- I predict the color would look mostly __________ because __________.
This is the computer's starting point: color as numbers.
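After you have written your prediction, one way to check it is a few lines of plain Python. This is only a sketch; the names `pixel` and `channel_names` are made up for this example.

```python
# Check the Step 1 prediction for the pixel [200, 40, 20].
pixel = [200, 40, 20]
channel_names = ["red", "green", "blue"]  # the order RGB values are stored in

for name, amount in zip(channel_names, pixel):
    print(f"{name}: {amount}")

# The channel with the largest number contributes the most to the color.
strongest = channel_names[pixel.index(max(pixel))]
print("Strongest channel:", strongest)
```

Compare the printed result with what you wrote in the text cell before moving on.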
Step 2 — Predict the data shape
Before loading anything, write a text cell:
My prediction:
CIFAR-10 training images will have shape: __________
I think each image needs ___ color numbers per pixel because __________.
This makes the first code cell a test of your thinking, not just a command to run.
Step 3 — Meet the tools before importing them
The next cell imports tools. Each one has a job:
- `tensorflow` is the machine-learning toolbox. Later it will build and train the model.
- `tensorflow.keras.datasets.cifar10` is a dataset helper inside TensorFlow. It gives us practice images and labels without making us download files by hand.
- `matplotlib.pyplot` is a drawing tool. It lets humans see the tiny images that the computer stores as numbers.
Nothing here is the AI yet. These tools only load and display data.
Step 4 — Import the tools and load CIFAR-10
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
The first run may download the dataset. Nothing has trained yet; you only loaded examples and labels.
Step 5 — Inspect the shapes
print("Training images:", x_train.shape)
print("Training labels:", y_train.shape)
print("Test images:", x_test.shape)
print("Test labels:", y_test.shape)
You should see x_train as (50000, 32, 32, 3). Read it as: 50,000 images, each 32 pixels tall, 32 pixels wide, with 3 color channels.
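One way to practice reading a shape is to build a small stand-in array and index into it. The sketch below assumes NumPy is available (CIFAR-10 images arrive as NumPy arrays) and uses a hypothetical array named `fake_images`, filled with zeros and holding only 4 images instead of 50,000.

```python
import numpy as np

# A stand-in for x_train: 4 blank images instead of 50,000 real ones.
fake_images = np.zeros((4, 32, 32, 3), dtype=np.uint8)

count, height, width, channels = fake_images.shape
print("Images:", count, "| Height:", height, "| Width:", width, "| Channels:", channels)

# Each index you add removes the leftmost number from the shape.
print(fake_images[0].shape)        # one image: (32, 32, 3)
print(fake_images[0][0].shape)     # one row of pixels in that image: (32, 3)
print(fake_images[0][0][0].shape)  # one pixel: (3,)
```

The same indexing works on the real `x_train`; only the first number in the shape is different.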
Step 6 — Inspect one pixel
print("First image, top-left pixel:", x_train[0][0][0])
That one pixel should print three numbers like [59 62 63]. Those are red, green, and blue values.
Step 7 — Show labeled examples
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for i in range(10):
    row = i // 5
    col = i % 5
    axes[row, col].imshow(x_train[i])
    axes[row, col].set_title(class_names[y_train[i][0]])
    axes[row, col].axis('off')
plt.show()
Now you can see the tiny images and their decoded labels.
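The loop in Step 7 turns one counter, `i`, into a position in a grid with 2 rows and 5 columns. This small sketch prints that mapping on its own, with no images involved; `columns` and `positions` are names made up for the example.

```python
# Map image numbers 0-9 onto a grid with 2 rows and 5 columns.
columns = 5
positions = []
for i in range(10):
    row = i // columns   # whole-number division: which row
    col = i % columns    # remainder: which column inside that row
    positions.append((row, col))
    print(f"image {i} -> row {row}, col {col}")
```

Images 0 through 4 land in the top row and images 5 through 9 land in the bottom row.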
Understand it
TensorFlow did not magically see pictures for us. It gave Python a ready dataset. Keras gave us a clean way to access that dataset. Matplotlib helped us display examples so our human eyes could check what the numbers represent.
The label is stored as a number because a model computes with numbers, not words. The class_names list is a decoder for humans: if the label number is 3, the word at position 3 (counting from 0) is cat.
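Here is a minimal sketch of that decoding step. The labels in `fake_labels` are invented stand-ins, written in the same nested layout as `y_train`, where each label sits inside its own one-item list; that layout is why the notebook code ends with `[0]`.

```python
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Invented stand-in labels, nested like y_train: one single-item list per image.
fake_labels = [[3], [0], [9]]

decoded = []
for i in range(len(fake_labels)):
    number = fake_labels[i][0]   # [0] pulls the number out of its one-item list
    word = class_names[number]   # the number is a position in class_names
    decoded.append(word)
    print(f"label {number} -> {word}")
```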
The test set matters from the start. We will train on x_train, but hide x_test until we need an honest grade. If the model gets to practice on the test pictures, the final score stops meaning anything.
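A toy sketch can show why a score on practiced data is untrustworthy. The "model" below is not a neural network and is not how later stages work; it is a hypothetical memorizer that stores every training example it sees. All data here is invented.

```python
# Invented toy data: each example is (pixel, label).
train_examples = [((200, 40, 20), "red"), ((30, 210, 25), "green"), ((15, 35, 220), "blue")]
test_examples = [((190, 60, 50), "red"), ((40, 180, 60), "green"), ((20, 50, 200), "blue")]

# "Training" here means memorizing the exact examples.
memory = {pixel: label for pixel, label in train_examples}

def predict(pixel):
    # Return the memorized answer, or a fixed guess for anything unseen.
    return memory.get(pixel, "red")

def accuracy(examples):
    # Fraction of examples where the prediction matches the label.
    correct = sum(1 for pixel, label in examples if predict(pixel) == label)
    return correct / len(examples)

train_score = accuracy(train_examples)
test_score = accuracy(test_examples)
print("Score on practiced examples:", train_score)
print("Score on unseen examples:", test_score)
```

The memorizer looks perfect on the examples it practiced and much worse on new ones. Only the score on unseen examples says anything about real skill, which is why x_test stays hidden.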
Try this
Three short experiments. Predict before you run, then test your guess.
Pick image index 17. Predict its label before you display it. Then show it with plt.imshow(x_train[17]) and decode y_train[17][0].
Compare x_train[0].shape and x_test[0].shape. Are individual images shaped differently, or are only the piles different sizes?
A black-and-white image needs one number per pixel instead of three. What would the last number in the shape probably be?
Test your stage
- Your notebook shows the training image shape and test image shape.
- You printed one pixel and can explain why it has three numbers.
- You displayed 10 labeled CIFAR-10 images.
- Workflow check. Point to this stage on the workflow map and explain what the image arrays create for later stages.
- Tool check. In a text cell, explain what TensorFlow, Keras, CIFAR-10, and Matplotlib each did in this stage.
- Design check. In a text cell, explain `(50000, 32, 32, 3)` and why the final number is `3`.
If it breaks
- NameError: name 'x_train' is not defined. Run the load cell again.
- Images are blurry. That's normal. CIFAR-10 images are only 32 by 32 pixels.
- Labels show numbers instead of words. Use `class_names[y_train[i][0]]`.
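If you are unsure whether the load cell has run, a quick check like this can tell you before other cells fail. It is a sketch: `is_loaded` is a hypothetical helper written for this example, not part of TensorFlow or Colab.

```python
def is_loaded(name, namespace):
    """Return True if a variable with this name exists in the given namespace."""
    return name in namespace

# In a notebook, globals() holds every variable created by cells you have run.
for name in ["x_train", "y_train", "x_test", "y_test"]:
    if is_loaded(name, globals()):
        print(name, "is ready")
    else:
        print(name, "is missing: run the load cell from Step 4 again")
```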
Do not rush the shape. The course depends on students believing that images are numbers before they believe a model can learn from them.