Stage 4: Build a CNN One Block at a Time
Keep your Colab notebook tab open for the whole session. Open links in a new tab; don't use the buttons on this page to leave the course.
a CNN assembled in visible blocks with summary checks
how convolution, pooling, flattening, and dense layers pass information forward
a compiled model plus a layer-summary explanation
Slide a 3x3 paper window over a drawn image grid. Then show how pooling keeps the strongest value from a small area.
The big idea
A CNN is built for images because it looks for small visual patterns first, then combines them into bigger ideas. Instead of pasting one giant model, you will build it in blocks and inspect the summary as it grows.
The Keras pieces have different jobs: Conv2D finds local patterns, MaxPooling2D shrinks the pattern maps, Flatten turns the maps into one list, and Dense uses that list to make class decisions. The model is not trained yet; Stage 4 is only designing the structure.
- 1. Photos / CIFAR-10: labeled image examples (Stage 1)
- 2. Notebook variables: x_train, y_train, class_names (Setup-2)
- 3. Prepared data: normalized pixels and fair piles (Stage 3)
- 4. Keras model: CNN layers and summary (Stage 4)
- 5. Training history: epochs, loss, accuracy (Stage 5)
- 6. Test evidence: sealed score and mistakes (Stages 6-7)
- 7. Improved model: augmentation comparison (Stage 8)
- 8. Inference: uploaded image to top-3 guesses (Stage 9)
- 9. Demo evidence: table, confidence, limitation (Stage 10)
Stage 4 builds the Keras model that will learn from the prepared data. The summary is the bridge between array shapes and the training run.
- CNN: a neural network designed for images
- convolution: a sliding pattern finder
- filter: one learned pattern detector
- pooling: zooming out while keeping strong signals
- parameter: a number the model learns during training
Stage 3 must be complete. This model expects normalized images and one-hot labels.
Build it
Step 1 — Import model pieces
Before importing, write which piece you think turns image-shaped data into decision-shaped data:
Layer prediction:
- The layer that finds picture patterns is __________.
- The layer that turns grids into one list is __________.
- The final layer needs _____ outputs because __________.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
These are the building blocks, not the trained model yet.
Step 2 — Build the first pattern block
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
])
model.summary()
Read the output shape. The image representation gets smaller after pooling.
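If you want to check the shapes in the summary by hand, here is a minimal sketch of the arithmetic. It assumes the Keras defaults used in Step 2 (no padding, a stride of 1 for Conv2D, and a pooling stride equal to the pool size). The helper names conv_output_size and pool_output_size are made up for this example; they are not Keras functions.

```python
def conv_output_size(size, kernel=3):
    # With no padding, a 3x3 window fits in (size - 3 + 1) positions per row.
    return size - kernel + 1


def pool_output_size(size, pool=2):
    # A 2x2 pool with stride 2 halves the side; any leftover edge is dropped.
    return size // pool


side = 32                                   # CIFAR-10 images are 32x32
after_conv = conv_output_size(side)         # side length after Conv2D
after_pool = pool_output_size(after_conv)   # side length after MaxPooling2D

# The last number in each shape is the filter count from Conv2D(32, ...).
print((after_conv, after_conv, 32))  # (30, 30, 32)
print((after_pool, after_pool, 32))  # (15, 15, 32)
```

Compare these two tuples with the Output Shape column in your summary. Keras adds a leading None for the batch size, which this sketch leaves out.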
Step 3 — Add a deeper pattern block
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.summary()
The second block learns from patterns the first block found. Notice how the number of filters grows from 32 to 64.
Step 4 — Add decision layers
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()
Flatten turns the pattern grid into one list. The last layer has 10 outputs because CIFAR-10 has 10 classes.
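To picture what Flatten does, the sketch below unrolls a tiny made-up grid into one list, then works out how long the list should be for this model. It assumes the two blocks from Steps 2 and 3 with Keras default settings; the variable names are only for illustration.

```python
# A tiny stand-in for a pattern grid: 2 rows x 2 columns x 3 filters.
grid = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

# Flattening walks row by row, cell by cell, and keeps every value.
flat = [value for row in grid for cell in row for value in cell]
print(flat)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(flat))  # 12, which is 2 * 2 * 3

# The same idea at full size: each block is a 3x3 convolution (side - 2)
# followed by 2x2 pooling (side // 2), so 32 -> 30 -> 15 -> 13 -> 6.
side = 32
for _ in range(2):
    side = (side - 3 + 1) // 2

flat_length = side * side * 64  # 64 filters in the second block
print(flat_length)  # 2304
```

Nothing is learned or thrown away in this step. Flatten only changes the shape so the Dense layers can read the values as one list.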
Step 5 — Compile the model
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
Compiling chooses how the model will learn and which score we will watch.
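To get a feel for the loss you just chose, here is a rough sketch of how categorical cross-entropy scores one prediction against a one-hot label. The probabilities are invented for illustration, and the function is a simplified hand-written version, not the Keras implementation.

```python
import math


def categorical_crossentropy(one_hot_label, predicted_probs):
    # Only the probability given to the true class counts toward the loss.
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_label, predicted_probs)
        if t > 0
    )


# One-hot label for class 3 out of 10 classes.
label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# Made-up model outputs: one confident guess, one that spreads evenly.
confident = [0.02, 0.02, 0.02, 0.82, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02]
unsure = [0.1] * 10

confident_loss = categorical_crossentropy(label, confident)
unsure_loss = categorical_crossentropy(label, unsure)

print(round(confident_loss, 3))
print(round(unsure_loss, 3))
print(confident_loss < unsure_loss)  # True
```

The loss is small when the model puts high probability on the correct class and larger when it does not. This is also why the labels need to be one-hot: the loss compares 10 outputs with 10 label slots.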
Understand it
Convolution layers find local patterns that can appear anywhere. Pooling keeps the strongest nearby signal and makes the representation smaller. Dense layers make the final decision after the image has been turned into a list of learned features.
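The paper-window activity can be sketched in plain Python. This is a simplified illustration, not how Keras computes it: the image is a made-up 6x6 grid, the filter is a hand-picked vertical-edge detector (a real Conv2D learns its filter values during training), and slide_filter and max_pool are hypothetical helper names.

```python
def slide_filter(image, kernel):
    # Slide a square window over a square image; at each stop, multiply
    # matching cells and add them up, then apply relu (negatives become 0).
    k = len(kernel)
    out_size = len(image) - k + 1
    feature_map = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(max(0, total))
        feature_map.append(row)
    return feature_map


def max_pool(feature_map, pool=2):
    # Keep only the strongest value in each pool x pool area.
    size = len(feature_map) // pool
    return [
        [
            max(
                feature_map[r * pool + i][c * pool + j]
                for i in range(pool)
                for j in range(pool)
            )
            for c in range(size)
        ]
        for r in range(size)
    ]


# A 6x6 image: dark (0) on the left, bright (1) in the last two columns.
image = [[0, 0, 0, 0, 1, 1] for _ in range(6)]

# A filter that responds where brightness increases from left to right.
edge_filter = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = slide_filter(image, edge_filter)
pooled = max_pool(feature_map)

print(feature_map)  # four rows of [0, 0, 3, 3]
print(pooled)       # [[0, 3], [0, 3]]
```

The feature map is strong only where the edge is, and the pooled map is smaller but still shows the edge on the right side.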
The summary is not decoration. It is the map of your model. You can see shapes shrink, filters increase, and parameters appear before training changes them.
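The Param # column can also be checked by hand. The sketch below assumes the exact model built in Steps 2 to 4 with Keras defaults, where every filter and every dense unit has one bias. The helper names conv_params and dense_params are made up for this example.

```python
def conv_params(kernel, in_channels, filters):
    # Each filter has kernel * kernel * in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    # Each unit has one weight per input plus one bias.
    return (inputs + 1) * units


conv1 = conv_params(3, 3, 32)         # first block: 3 color channels in
conv2 = conv_params(3, 32, 64)        # second block: 32 pattern maps in
dense1 = dense_params(6 * 6 * 64, 64) # Flatten gives 6 * 6 * 64 values
dense2 = dense_params(64, 10)         # final layer: 10 classes

print(conv1)   # 896
print(conv2)   # 18496
print(dense1)  # 147520
print(dense2)  # 650
print(conv1 + conv2 + dense1 + dense2)  # 167562
```

Pooling and Flatten have no parameters, so they show 0 in the summary. Most of the numbers to learn sit in the first Dense layer, not in the convolution layers.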
Try this
Three short experiments. Predict before you run, then test your guess.
Before adding the final layer, predict how many outputs it needs. Then tie your answer to the label shape from Stage 3.
Change the first layer from 32 filters to 16, rebuild, and compare parameter counts. Then put it back to 32.
Why would a model that looks at tiny windows be better for pictures than a model that flattens the image immediately?
Test your stage
- You ran model.summary() after the first block, second block, and final model.
- Your final layer has 10 outputs.
- Your model is compiled with categorical_crossentropy.
- Workflow check. Point to this stage on the workflow map and explain what the model needs before it can train.
- Evidence check. Point to the summary line where image-shaped data becomes a flat list.
- Design check. In a text cell, explain one layer's job in your own words.
If it breaks
- Sequential is not defined. Run the import cell.
- The final layer does not have 10 outputs. Rebuild the model and set Dense(10, activation='softmax').
- A yellow input_shape warning appears. The model still works.
This stage should feel like opening the black box. Pause after each summary and ask what changed.