Stage 7: Find What It Gets Wrong

Course progress: Stage 7 of 10

~90 min

Your workspace

Keep your Colab notebook tab open all session. Open links in a new tab; don't use the buttons on this page to leave the course.

Train your AI: Google Colab (open in a new tab)
No-code warm-up: Machine Learning for Kids (open in a new tab)

Build

prediction analysis cells that show wrong examples and class-level accuracy

Learn

how error analysis turns a score into an explanation

Ship

three wrong predictions plus a hardest-class finding

Teacher demo

Show one wrong prediction and ask: is the model being silly, or is the picture genuinely confusing?

The big idea

One score is not enough. A useful machine-learning engineer asks: Which examples failed? Which classes are hardest? Are the mistakes understandable?

Today you turn mistakes into evidence.

How the Python ML workflow connects

1. Photos / CIFAR-10: labeled image examples (Stage 1)
2. Notebook variables: x_train, y_train, class_names (Setup-2)
3. Prepared data: normalized pixels and fair piles (Stage 3)
4. Keras model: CNN layers and summary (Stage 4)
5. Training history: epochs, loss, accuracy (Stage 5)
6. Test evidence: sealed score and mistakes (Stages 6-7)
7. Improved model: augmentation comparison (Stage 8)
8. Inference: uploaded image to top-3 guesses (Stage 9)
9. Demo evidence: table, confidence, limitation (Stage 10)

Stage 7 stays in test evidence. Wrong predictions and per-class accuracy explain the score instead of hiding behind it.

New words

error analysis: studying wrong predictions to understand model behavior
prediction: the class the model chooses
true label: the correct answer from the dataset
per-class accuracy: accuracy measured separately for each category

Before you start

You need the trained baseline model from Stage 5 and test data from Stage 3.

Build it

Step 1 — Predict all test labels

model.predict asks the trained model for class scores without changing it. argmax picks the slot with the largest score, which becomes the predicted label number. NumPy helps us compare all of those predicted label numbers to the true label numbers.

Before running the cell, predict which class from Stage 2 will have the most mistakes.

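# Ask the trained model for class scores: one row per image, one column per class.
test_predictions = model.predict(x_test)
# The column with the largest score is the predicted label number.
predicted_labels = test_predictions.argmax(axis=1)
# y_test is one-hot (Stage 3), so argmax recovers the true label number.
true_labels = y_test.argmax(axis=1)

print("Predicted labels shape:", predicted_labels.shape)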

Each test image now has a predicted class number.
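
To see what argmax is doing, here is a tiny sketch with made-up scores. The names toy_class_names and toy_scores are invented for this example and are not real model output; the idea is only to show how one row of scores becomes one label number.

```python
import numpy as np

# Made-up class scores for three images and four classes (not real model output).
toy_class_names = ["airplane", "bird", "cat", "dog"]
toy_scores = np.array([
    [0.10, 0.20, 0.60, 0.10],  # largest score is in slot 2
    [0.70, 0.10, 0.10, 0.10],  # largest score is in slot 0
    [0.05, 0.15, 0.35, 0.45],  # largest score is in slot 3
])

# axis=1 means "look across each row", so we get one answer per image.
toy_predicted_labels = toy_scores.argmax(axis=1)

# Turn label numbers back into readable class names.
toy_predicted_names = [toy_class_names[i] for i in toy_predicted_labels]

print("Label numbers:", toy_predicted_labels)
print("Class names:", toy_predicted_names)
```

The real test_predictions array works the same way, just with many more rows and one column for each entry in class_names.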

Step 2 — Find wrong predictions

import numpy as np

# Positions in the test set where the prediction disagrees with the true label.
wrong_indexes = np.where(predicted_labels != true_labels)[0]
print("Wrong predictions:", len(wrong_indexes))
print("First three wrong indexes:", wrong_indexes[:3])

Wrong predictions are not failures to hide. They are the best learning material.
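
If np.where feels mysterious, this small sketch shows each step on six made-up labels. The arrays toy_predicted and toy_true are invented for illustration; your real arrays come from Step 1.

```python
import numpy as np

# Six made-up predictions and their true answers.
toy_predicted = np.array([3, 5, 5, 0, 2, 2])
toy_true = np.array([3, 3, 5, 0, 2, 4])

# Comparing two arrays gives True wherever they disagree.
mismatch = toy_predicted != toy_true

# np.where returns a tuple; [0] pulls out the array of positions.
toy_wrong_indexes = np.where(mismatch)[0]

# Accuracy is the share of predictions that were NOT wrong.
toy_accuracy = 1 - len(toy_wrong_indexes) / len(toy_true)

print("Mismatch flags:", mismatch)
print("Wrong indexes:", toy_wrong_indexes)
print(f"Accuracy: {toy_accuracy:.3f}")
```

The count of wrong indexes and the overall accuracy are two views of the same evidence, so they should always agree with each other.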

Step 3 — Display three wrong examples

import matplotlib.pyplot as plt

# One row of three plots, one per wrong prediction.
fig, axes = plt.subplots(1, 3, figsize=(10, 3))

for spot, image_index in enumerate(wrong_indexes[:3]):
    axes[spot].imshow(x_test[image_index])
    actual = class_names[true_labels[image_index]]
    predicted = class_names[predicted_labels[image_index]]
    # The score the model gave to the class it chose.
    confidence = test_predictions[image_index][predicted_labels[image_index]]
    axes[spot].set_title(f"Pred: {predicted}\nTrue: {actual}\n{confidence:.0%}")
    axes[spot].axis('off')

plt.tight_layout()
plt.show()

Look closely. Were the mistakes reasonable?

Step 4 — Calculate per-class accuracy

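for class_index, class_name in enumerate(class_names):
    # Keep only the test images whose true label is this class.
    class_mask = true_labels == class_index
    # Share of those images that the model labeled correctly.
    class_accuracy = (predicted_labels[class_mask] == true_labels[class_mask]).mean()
    print(f"{class_name}: {class_accuracy:.3f}")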

The hardest class is the one with the lowest accuracy.
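
Instead of reading the printed list by eye, you could let NumPy pick out the lowest number. This is a minimal sketch on made-up labels for three classes; toy_class_names, toy_true, and toy_predicted are invented for the example.

```python
import numpy as np

# Made-up labels: 4 cats, 4 dogs, 2 trucks.
toy_class_names = ["cat", "dog", "truck"]
toy_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
toy_predicted = np.array([0, 1, 1, 0, 1, 1, 1, 0, 2, 2])

toy_accuracies = []
for class_index in range(len(toy_class_names)):
    # Keep only the images whose true label is this class.
    class_mask = toy_true == class_index
    # Share of those images that were predicted correctly.
    toy_accuracies.append((toy_predicted[class_mask] == class_index).mean())

# argmin gives the position of the smallest accuracy.
hardest_index = int(np.argmin(toy_accuracies))

print("Accuracies:", toy_accuracies)
print("Hardest class:", toy_class_names[hardest_index])
```

In your notebook you would collect class_accuracy into a list inside the Step 4 loop and use the same argmin idea. If two classes tie, argmin reports only the first one, so still glance at the full list.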

Step 5 — Write an error report

Error report:
- Three wrong predictions I inspected: __________.
- The hardest class was __________.
- One reason this class may be hard is __________.
- This matches / does not match my Stage 2 prediction because __________.

Understand it

Accuracy tells you how often the model is right. Error analysis tells you what kind of model you built.

If cats and dogs are often confused with each other, the model may struggle with soft shapes and similar textures. If trucks and automobiles are often confused, the model may need more detail or larger images. Explaining failure is part of understanding the system.
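
One way to check which classes get mixed up is to count each (true class, predicted class) pair among the mistakes. The sketch below uses made-up labels and invented names (toy_class_names, toy_true, toy_predicted); it is one possible approach, not the only one.

```python
import numpy as np
from collections import Counter

# Made-up labels for four classes.
toy_class_names = ["cat", "dog", "truck", "automobile"]
toy_true = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 3])
toy_predicted = np.array([1, 1, 0, 0, 1, 3, 2, 3, 2, 3])

# Count every (true class, predicted class) pair where the model was wrong.
pair_counts = Counter()
for true_label, predicted_label in zip(toy_true, toy_predicted):
    if true_label != predicted_label:
        pair = (toy_class_names[true_label], toy_class_names[predicted_label])
        pair_counts[pair] += 1

# The most frequent pair is the most common mix-up.
most_common_pair, count = pair_counts.most_common(1)[0]

print("All mix-ups:", dict(pair_counts))
print("Most common mix-up:", most_common_pair, "x", count)
```

A pair that shows up in both directions, such as cat to dog and dog to cat, suggests the two classes look alike to the model rather than one class simply being rare.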

Try this

Three short experiments. Predict before you run, then test your guess.

Predict first

Before running per-class accuracy, predict the hardest class from your Stage 2 report. Was your prediction right?

Compare

Compare a confident wrong prediction with a low-confidence wrong prediction. Which is more dangerous in a real product?

Connect

How could data augmentation help with the mistakes you saw today?
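
For the Compare experiment, you need to find a confident wrong prediction and an unsure one. A rough sketch, using made-up scores (toy_scores and toy_true are invented here), sorts the wrong predictions by the score of the chosen class:

```python
import numpy as np

# Made-up scores for four images and three classes.
toy_scores = np.array([
    [0.90, 0.05, 0.05],  # image 0
    [0.40, 0.35, 0.25],  # image 1
    [0.10, 0.80, 0.10],  # image 2
    [0.20, 0.30, 0.50],  # image 3
])
toy_true = np.array([1, 2, 1, 0])

toy_predicted = toy_scores.argmax(axis=1)
toy_wrong = np.where(toy_predicted != toy_true)[0]

# Confidence is the largest score in each row.
toy_confidence = toy_scores.max(axis=1)

# argsort orders the wrong predictions from least to most confident.
order = np.argsort(toy_confidence[toy_wrong])
least_confident_wrong = toy_wrong[order[0]]
most_confident_wrong = toy_wrong[order[-1]]

print("Wrong indexes:", toy_wrong)
print("Least confident wrong image:", least_confident_wrong)
print("Most confident wrong image:", most_confident_wrong)
```

With your real arrays, swap in test_predictions and true_labels, then display the two images with the Step 3 plotting code.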

Test your stage

You displayed three wrong predictions.
You calculated per-class accuracy.
You named the hardest class.
Workflow check. Point to this stage on the workflow map and explain how mistakes improve the final demo.
Evidence check. Your report includes one correct or understandable prediction and one wrong prediction.
Design check. Your error report explains at least one mistake using visual evidence.

If it breaks

np is not defined. Run import numpy as np.
y_test.argmax fails, or every true label comes out as 0. Make sure labels are one-hot from Stage 3 (one column per class).
Titles overlap. Use fewer images or increase figsize.
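
If you are not sure whether your labels are one-hot, checking the shape usually settles it. The helper below, to_label_numbers, is a hypothetical function written for this example (it is not part of the course notebook or of NumPy); it assumes labels are either one-hot rows or plain class numbers.

```python
import numpy as np

def to_label_numbers(labels):
    """Hypothetical helper: return a 1-D array of class numbers."""
    labels = np.asarray(labels)
    if labels.ndim == 2 and labels.shape[1] > 1:
        # One-hot rows: the position of the 1 is the class number.
        return labels.argmax(axis=1)
    # Integer labels shaped (n,) or (n, 1): flatten them instead.
    return labels.reshape(-1).astype(int)

# Two made-up label arrays that describe the same two images.
one_hot_labels = np.array([[0, 0, 1], [1, 0, 0]])
integer_labels = np.array([[2], [0]])

print("One-hot shape:", one_hot_labels.shape)
print("Integer shape:", integer_labels.shape)
print(to_label_numbers(one_hot_labels))
print(to_label_numbers(integer_labels))
```

Printing y_test.shape in your notebook tells you which case you have: a second number equal to the number of classes means one-hot, while a second number of 1 (or no second number) means plain class numbers.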

Coach notes

This stage is a major course upgrade. It makes model quality concrete and gives students language for honest final demos.

Stage 7 complete

You finished Stage 7!

You studied wrong predictions and found the model's hardest classes. Next, you improve the experiment with augmented training data.

Stage 8: Improve the Experiment
