Stage 4: Build a CNN One Block at a Time

Course progress: Stage 4 of 10
~90 min
Your workspace

Keep your Colab notebook tab open for the whole session. Open links in a new tab; don't use the buttons on this page to leave the course.

Build

a CNN assembled in visible blocks with summary checks

Learn

how convolution, pooling, flattening, and dense layers pass information forward

Ship

a compiled model plus a layer-summary explanation

Teacher demo

Slide a 3x3 paper window over a drawn image grid. Then show how pooling keeps the strongest value from a small area.

The big idea

A CNN is built for images because it looks for small visual patterns first, then combines them into bigger ideas. Instead of pasting one giant model, you will build it in blocks and inspect the summary as it grows.

The Keras pieces have different jobs: Conv2D finds local patterns, MaxPooling2D shrinks the pattern maps, Flatten turns the maps into one list, and Dense uses that list to make class decisions. The model is not trained yet; Stage 4 is only designing the structure.

How the Python ML workflow connects
  1. Photos / CIFAR-10: labeled image examples (Stage 1)
  2. Notebook variables: x_train, y_train, class_names (Setup-2)
  3. Prepared data: normalized pixels and fair piles (Stage 3)
  4. Keras model: CNN layers and summary (Stage 4)
  5. Training history: epochs, loss, accuracy (Stage 5)
  6. Test evidence: sealed score and mistakes (Stages 6-7)
  7. Improved model: augmentation comparison (Stage 8)
  8. Inference: uploaded image to top-3 guesses (Stage 9)
  9. Demo evidence: table, confidence, limitation (Stage 10)

Stage 4 builds the Keras model that will learn from the prepared data. The summary is the bridge between array shapes and the training run.

New words
CNN: a neural network designed for images
convolution: a sliding pattern finder
filter: one learned pattern detector
pooling: zooming out while keeping strong signals
parameter: a number the model learns during training

Before you start

Stage 3 must be complete. This model expects normalized images and one-hot labels.

Build it

Step 1 — Import model pieces

Before importing, write which piece you think turns image-shaped data into decision-shaped data:

Layer prediction:
- The layer that finds picture patterns is __________.
- The layer that turns grids into one list is __________.
- The final layer needs _____ outputs because __________.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

These are the building blocks, not the trained model yet.

Step 2 — Build the first pattern block

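model = Sequential([
    # 32 filters, each sliding a 3x3 window over a 32x32 color (3-channel) image
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    # Keep the strongest value in each 2x2 area
    MaxPooling2D((2, 2)),
])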

model.summary()

Read the Output Shape column. Both layers shrink the image representation: the convolution trims the border, and pooling halves the height and width.
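You can check the shapes in the summary by hand. The sketch below is plain Python, not Keras. It assumes the Keras defaults that the code above does not override: no padding ('valid') and a stride of 1 for Conv2D, and a pooling stride equal to the pool size. The helper names are made up for this example.

```python
def conv_output_size(size, kernel, stride=1):
    """Side length after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool):
    """Side length after max pooling when the stride equals the pool size."""
    return size // pool


side = 32                                      # CIFAR-10 images are 32x32
after_conv = conv_output_size(side, 3)         # 3x3 window
after_pool = pool_output_size(after_conv, 2)   # 2x2 pooling

# The last number in each shape is the number of filters (32 in this block).
print("after Conv2D:      ", (after_conv, after_conv, 32))
print("after MaxPooling2D:", (after_pool, after_pool, 32))
```

Compare the two printed shapes with the Output Shape column. Keras adds a leading None for the batch size, which this sketch leaves out.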

Step 3 — Add a deeper pattern block

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))

model.summary()

The second block learns from patterns the first block found. Notice how the number of filters grows from 32 to 64.

Step 4 — Add decision layers

model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

model.summary()

Flatten turns the pattern grid into one list. The last layer has 10 outputs because CIFAR-10 has 10 classes.
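The Param # column can also be worked out with arithmetic. This is a rough sketch in plain Python, assuming the layers use their default settings (a bias for every filter and unit, no padding). The function names are invented for illustration and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has one weight per window cell per input channel, plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    # Each unit has one weight per input, plus one bias.
    return (inputs + 1) * units


# With no padding the side length goes 32 -> 30 -> 15 -> 13 -> 6,
# so Flatten receives a 6x6 grid with 64 channels.
flat_length = 6 * 6 * 64

counts = {
    "Conv2D, 32 filters": conv2d_params(3, 3, 3, 32),
    "Conv2D, 64 filters": conv2d_params(3, 3, 32, 64),
    "Dense, 64 units": dense_params(flat_length, 64),
    "Dense, 10 units": dense_params(64, 10),
}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count}")
print("total:", total)
```

MaxPooling2D and Flatten are missing from the table because they have nothing to learn, so their parameter count is 0. Most of the parameters sit in the first Dense layer, because every one of its 64 units connects to every value in the flattened list.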

Step 5 — Compile the model

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

Compiling chooses how the model will learn and which score we will watch.
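To get a feel for what categorical_crossentropy measures, here is a simplified sketch for a single image in plain Python. It is an illustration only: the Keras version also guards against log(0) and averages over a batch, and the example probabilities are made up.

```python
import math


def categorical_crossentropy(one_hot_label, predicted_probs):
    """Loss for one example: minus the log of the probability given to the true class."""
    return -sum(
        y * math.log(p)
        for y, p in zip(one_hot_label, predicted_probs)
        if y > 0
    )


label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # one-hot label: true class is index 3

confident_right = [0.01] * 10
confident_right[3] = 0.91                # most of the probability on the true class

unsure = [0.1] * 10                      # equal guess across all 10 classes

loss_right = categorical_crossentropy(label, confident_right)
loss_unsure = categorical_crossentropy(label, unsure)

print("confident and right:", round(loss_right, 4))
print("unsure:             ", round(loss_unsure, 4))
```

The loss is small when the model puts high probability on the true class and grows as that probability drops. This is why the labels need to be one-hot: the loss lines up each label slot with one of the 10 softmax outputs.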

Understand it

Convolution layers find local patterns that can appear anywhere. Pooling keeps the strongest nearby signal and makes the representation smaller. Dense layers make the final decision after the image has been turned into a list of learned features.
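The sliding window and the pooling step can be sketched in plain Python on a tiny one-channel image. This is a simplified illustration, not how Keras works internally: a real Conv2D layer learns its filter values during training, handles many channels, and adds a bias and an activation. The image, the hand-written filter, and the function names are all assumptions for this example.

```python
def slide_window(image, kernel):
    """Multiply and sum the kernel at every position where it fits inside the image."""
    k = len(kernel)
    out_size = len(image) - k + 1
    output = []
    for row in range(out_size):
        out_row = []
        for col in range(out_size):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


def max_pool(grid, pool=2):
    """Keep the largest value in each pool x pool area."""
    size = len(grid) // pool
    return [
        [
            max(grid[r * pool + i][c * pool + j]
                for i in range(pool) for j in range(pool))
            for c in range(size)
        ]
        for r in range(size)
    ]


# 6x6 image: dark (0) on the left, bright (1) in the two right-hand columns
image = [[0, 0, 0, 0, 1, 1] for _ in range(6)]

# Hand-written filter that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = slide_window(image, kernel)   # 4x4 grid of responses
pooled = max_pool(feature_map)              # 2x2 grid after pooling

for row in feature_map:
    print(row)
print("pooled:", pooled)
```

The feature map is 0 where the window sees only dark pixels and positive where it crosses the edge. After pooling, the grid is smaller but the strong signal on the right-hand side is still there.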

The summary is not decoration. It is the map of your model. You can see shapes shrink, filters increase, and parameters appear before training changes them.

Try this

Three short experiments. Predict before you run, then test your guess.

Predict first

Before adding the final layer, predict how many outputs it needs. Then tie your answer to the label shape from Stage 3.

Compare

Change the first layer from 32 filters to 16, rebuild, and compare parameter counts. Then put it back to 32.

Connect

Why would a model that looks at tiny windows be better for pictures than a model that flattens the image immediately?

Test your stage

  • You ran model.summary() after the first block, second block, and final model.
  • Your final layer has 10 outputs.
  • Your model is compiled with categorical_crossentropy.
  • Workflow check. Point to this stage on the workflow map and explain what the model needs before it can train.
  • Evidence check. Point to the summary line where image-shaped data becomes a flat list.
  • Design check. In a text cell, explain one layer's job in your own words.

If it breaks

  • Sequential is not defined. Run the import cell.
  • The final layer is not 10. Rebuild the model and set Dense(10, activation='softmax').
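  • A yellow input_shape warning appears. The model still works. Newer Keras versions prefer a separate Input layer at the start of a Sequential model, but passing input_shape to the first layer is still accepted.

Coach notes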

This stage should feel like opening the black box. Pause after each summary and ask what changed.