
Stage 1: How Computers See Pictures

Course progress: Stage 1 of 10
~75 min
Your workspace

Keep your Colab notebook tab open all session. Open it in a new tab, and don't use the buttons on this page to leave the course.

Build

a notebook section that loads CIFAR-10, inspects image shapes, and displays labeled examples

Learn

that images are grids of RGB numbers, not magic objects

Ship

a labeled image gallery plus a written explanation of `(50000, 32, 32, 3)`

Teacher demo

Before students type, draw one large pixel on the board and label it (R, G, B). Then run the shape cell live and point to each number in (50000, 32, 32, 3).

The big idea

A computer does not see a cat. It sees a grid of numbers.

Each picture is made of pixels. Each pixel stores three color numbers: red, green, and blue. CIFAR-10 gives us 60,000 tiny color images, already labeled as one of 10 classes. Before we train anything, we need to prove what the data looks like.

A human may say "cat" because the picture has ears, fur, and a face. A computer starts lower than that. It receives rows and columns of RGB numbers. Machine learning begins when we show many of those number grids with their labels, then let a model look for patterns that connect the grids to the labels.
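Here is a minimal sketch in plain Python (no libraries) that makes "a grid of numbers" concrete. The 2×2 image `tiny_image` is made up for illustration and does not come from CIFAR-10. It stores rows first, then the pixels in each row, then the three RGB numbers in each pixel. Counting each layer gives a shape in the same order the course uses later.

```python
# A made-up 2x2 image: 2 rows, each with 2 pixels, each pixel = [R, G, B].
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],      # top row: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # bottom row: a blue pixel, a white pixel
]

# Count rows, columns, and color numbers per pixel.
height = len(tiny_image)           # number of rows
width = len(tiny_image[0])         # pixels in the first row
channels = len(tiny_image[0][0])   # numbers stored in the first pixel

shape = (height, width, channels)
print("Shape:", shape)  # Shape: (2, 2, 3)
```

A CIFAR-10 image has the same structure, only with 32 rows and 32 columns.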

How the Python ML workflow connects
  1. Photos / CIFAR-10: labeled image examples (Stage 1)
  2. Notebook variables: x_train, y_train, class_names (Setup-2)
  3. Prepared data: normalized pixels and fair piles (Stage 3)
  4. Keras model: CNN layers and summary (Stage 4)
  5. Training history: epochs, loss, accuracy (Stage 5)
  6. Test evidence: sealed score and mistakes (Stages 6-7)
  7. Improved model: augmentation comparison (Stage 8)
  8. Inference: uploaded image to top-3 guesses (Stage 9)
  9. Demo evidence: table, confidence, limitation (Stage 10)

Stage 1 starts the left side of the workflow. CIFAR-10 photos become notebook arrays and labels, which every later stage will reuse.

New words
pixel
one tiny square of color in an image
RGB
the red, green, and blue numbers that make a pixel color
dataset
a collection of examples and answers
label
the correct answer for one example
shape
the size and structure of a tensor or array
TensorFlow
a Python machine-learning toolbox we will use to load data, build models, and train them
Keras
the friendlier model-building API inside TensorFlow
Before you start

Open your saved Colab notebook from Setup. Work in the Stage 1 section.

Build it

Step 1 — Translate one pixel first

Before using any library, write this in a text cell:

If one pixel is [200, 40, 20]:
- the red amount is _____
- the green amount is _____
- the blue amount is _____
- I predict the color would look mostly __________ because __________.

This is the computer's starting point: color as numbers.
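After you have written your text cell, a few lines of plain Python can check which channel is largest. This is a minimal sketch. The "biggest number wins" rule is a rough simplification for illustration, not a full model of how colors mix.

```python
pixel = [200, 40, 20]  # the pixel from Step 1
channel_names = ["red", "green", "blue"]

# Pair each channel name with its amount and print them.
for name, amount in zip(channel_names, pixel):
    print(f"{name}: {amount}")

# Rough rule of thumb: the largest number hints at the strongest color.
strongest = channel_names[pixel.index(max(pixel))]
print("Mostly:", strongest)
```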

Step 2 — Predict the data shape

Before loading anything, write a text cell:

My prediction:
CIFAR-10 training images will have shape: __________
I think each image needs ___ color numbers per pixel because __________.

This makes the first code cell a test of your thinking, not just a command to run.

Step 3 — Meet the tools before importing them

The next cell imports tools. Each one has a job:

  • tensorflow is the machine-learning toolbox. Later it will build and train the model.
  • tensorflow.keras.datasets.cifar10 is a dataset helper inside TensorFlow. It gives us practice images and labels without making us download files by hand.
  • matplotlib.pyplot is a drawing tool. It lets humans see the tiny images that the computer stores as numbers.

Nothing here is the AI yet. These tools only load and display data.

Step 4 — Import the tools and load CIFAR-10

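import tensorflow as tf                         # machine-learning toolbox (used in later stages)
from tensorflow.keras.datasets import cifar10   # CIFAR-10 dataset helper
import matplotlib.pyplot as plt                 # drawing tool for viewing images

# Load the data (downloads on the first run) as a training pile and a test pile.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()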

The first run may download the dataset. Nothing has trained yet; you only loaded examples and labels.
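The load line unpacks two pairs at once: a training pair and a test pair. The sketch below copies that structure with a hypothetical `fake_load_data()` function. It returns tiny placeholder lists instead of real images, so you can see how the names get matched up. It does not call TensorFlow, and it is not how `cifar10.load_data()` works inside.

```python
def fake_load_data():
    """Hypothetical stand-in: ((train_images, train_labels), (test_images, test_labels))."""
    train_images = ["img_a", "img_b", "img_c"]  # placeholders for real image grids
    train_labels = [[2], [0], [7]]              # made-up labels, each wrapped in a list
    test_images = ["img_x"]
    test_labels = [[3]]
    return (train_images, train_labels), (test_images, test_labels)

# Same nested unpacking pattern as the real load cell.
(train_imgs, train_lbls), (test_imgs, test_lbls) = fake_load_data()

print(len(train_imgs), "training examples,", len(test_imgs), "test example")
print("First training label:", train_lbls[0][0])
```

The variable names are different on purpose. If you run this in your notebook, it will not overwrite the real `x_train` and `y_train`.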

Step 5 — Inspect the shapes

print("Training images:", x_train.shape)
print("Training labels:", y_train.shape)
print("Test images:", x_test.shape)
print("Test labels:", y_test.shape)

You should see x_train as (50000, 32, 32, 3). Read it as: 50,000 images, each 32 pixels tall, 32 pixels wide, with 3 color channels.
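Multiplying the shape out shows how much data that is. This sketch uses only the shape tuple from the text. The byte estimate assumes one byte per number, which matches the whole numbers from 0 to 255 that CIFAR-10 stores (NumPy type `uint8`).

```python
import math

train_shape = (50000, 32, 32, 3)  # the shape printed for x_train

numbers_per_pixel = train_shape[3]               # 3 (R, G, B)
numbers_per_image = math.prod(train_shape[1:])   # 32 * 32 * 3
total_numbers = math.prod(train_shape)           # every number in x_train

print("Numbers per pixel:", numbers_per_pixel)   # 3
print("Numbers per image:", numbers_per_image)   # 3072
print("Numbers in x_train:", total_numbers)      # 153600000
# At one byte per number, that is about 153.6 million bytes (roughly 154 MB).
```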

Step 6 — Inspect one pixel

# Index order: image 0, then row 0, then column 0 = top-left pixel of the first image.
print("First image, top-left pixel:", x_train[0][0][0])

That one pixel should print three numbers like [59 62 63]. Those are red, green, and blue values.
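Each `[0]` peels off one layer: which image, which row, which column. The made-up `tiny_dataset` below holds two 2×2 images (not real CIFAR-10 data) in the same order, so the same indexing works on it. One more index picks out a single channel.

```python
# Made-up dataset: 2 images, each 2 rows x 2 columns, each pixel = [R, G, B].
tiny_dataset = [
    [  # image 0
        [[12, 34, 56], [10, 20, 30]],    # row 0
        [[0, 0, 0], [255, 255, 255]],    # row 1
    ],
    [  # image 1
        [[200, 40, 20], [5, 5, 5]],      # row 0
        [[90, 180, 40], [1, 2, 3]],      # row 1
    ],
]

pixel = tiny_dataset[0][0][0]   # image 0, row 0, column 0 (top-left)
red = pixel[0]                  # one more index picks the red channel

print("Top-left pixel of image 0:", pixel)                  # [12, 34, 56]
print("Its red value:", red)                                # 12
print("Image 1, row 1, column 0:", tiny_dataset[1][1][0])   # [90, 180, 40]
```

With NumPy arrays like `x_train`, `x_train[0, 0, 0]` is an equivalent shorthand for `x_train[0][0][0]`. In the real data every one of these numbers is a whole number from 0 to 255, and `print(x_train.dtype)` should show `uint8`.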

Step 7 — Show labeled examples

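class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# A grid of small plots: 2 rows by 5 columns.
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for i in range(10):
    row = i // 5   # images 0-4 go on row 0, images 5-9 on row 1
    col = i % 5    # column 0-4 within that row
    axes[row, col].imshow(x_train[i])
    # y_train[i] holds a single label, so [0] pulls out the label number.
    axes[row, col].set_title(class_names[y_train[i][0]])
    axes[row, col].axis('off')  # hide tick marks around each tiny image
plt.show()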

Now you can see the tiny images and their decoded labels.
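The loop puts image `i` at row `i // 5` and column `i % 5`. This plain-Python sketch prints that mapping without drawing anything. You can use it to check that all 10 images land in different cells of the 2×5 grid.

```python
n_cols = 5  # matches plt.subplots(2, 5)

positions = []
for i in range(10):
    row = i // n_cols   # integer division: 0 for i = 0..4, 1 for i = 5..9
    col = i % n_cols    # remainder: cycles 0, 1, 2, 3, 4
    positions.append((row, col))
    print(f"image {i} -> row {row}, column {col}")
```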

Understand it

TensorFlow did not magically see pictures for us. It gave Python a ready dataset. Keras gave us a clean way to access that dataset. Matplotlib helped us display examples so our human eyes could check what the numbers represent.

The label is stored as a number because computers compare numbers more easily than words. The class_names list is a decoder for humans: if the label number is 3, the word at index 3 is cat (Python counts positions from 0, so index 0 is airplane).
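Decoding a label is just list indexing. The sketch below uses the same `class_names` list as Step 7. The `labels` values are made up. Each one is wrapped in its own list to copy the `(50000, 1)` shape of `y_train`, which is why the code reads `[0]` before decoding.

```python
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

labels = [[3], [0], [9]]  # made-up labels shaped like rows of y_train

decoded = []
for label_row in labels:
    number = label_row[0]        # unwrap the single label number
    word = class_names[number]   # look up the word at that index
    decoded.append(word)
    print(number, "->", word)
```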

The test set matters from the start. We will train on x_train, but hide x_test until we need an honest grade. If the model gets to practice on the test pictures, the final score stops meaning anything.

Try this

Three short experiments. Predict before you run, then test your guess.

Predict first

Pick image index 17. Predict its label before you display it. Then show it with plt.imshow(x_train[17]) and decode y_train[17][0].

Compare

Compare x_train[0].shape and x_test[0].shape. Are individual images shaped differently, or are only the piles different sizes?

Connect

A black-and-white image needs one number per pixel instead of three. What would the last number in the shape probably be?

Test your stage

  • Your notebook shows the training image shape and test image shape.
  • You printed one pixel and can explain why it has three numbers.
  • You displayed 10 labeled CIFAR-10 images.
  • Workflow check. Point to this stage on the workflow map and explain what the image and label arrays provide for later stages.
  • Tool check. In a text cell, explain what TensorFlow, Keras, CIFAR-10, and Matplotlib each did in this stage.
  • Design check. In a text cell, explain (50000, 32, 32, 3) and why the final number is 3.

If it breaks

  • NameError: name 'x_train' is not defined. Run the import-and-load cell again. Colab forgets variables whenever the runtime restarts.
  • Images are blurry. That's normal. CIFAR-10 images are only 32 by 32 pixels.
  • Labels show numbers instead of words. Use class_names[y_train[i][0]].
Coach notes

Do not rush the shape. The course depends on students believing that images are numbers before they believe a model can learn from them.