Stage 6: Grade It Honestly

Course progress: Stage 6 of 10

~75 min

Your workspace

Keep your Colab notebook tab open all session. Open these tools in a new tab; don't use the buttons on this page to leave the course.

- Train your AI: Google Colab
- No-code warm-up: Machine Learning for Kids

Build

a sealed test-set evaluation and a train-validation-test comparison

Learn

why the honest score is different from training accuracy

Ship

a baseline test score and an overfitting gap calculation

Teacher demo

Put training accuracy, validation accuracy, and test accuracy on the board as three different claims. Ask which one belongs on a poster and why.

The big idea

The model practiced on training data and checked validation data during training. The test set is different: it stayed sealed. Today you open it once for the honest baseline score.

If training accuracy is much higher than test accuracy, the model learned some real patterns but also memorized details that only appear in the training data. That problem is called overfitting, and the gap between the two scores is how you spot it.
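To see why memorizing is not the same as learning, here is a toy sketch in plain Python. It is not the course's Keras model: the `MemorizerModel` class, the `accuracy` helper, and the tiny data set are all made up for this illustration. The toy model stores every training example in a dictionary and falls back to one fixed guess for anything it has never seen.

```python
# Toy illustration only: a "model" that memorizes instead of learning.
# MemorizerModel, accuracy, and the data below are hypothetical.

class MemorizerModel:
    def __init__(self, fallback_label):
        self.memory = {}
        self.fallback_label = fallback_label

    def fit(self, examples, labels):
        # "Practice" by storing every example with its answer.
        for example, label in zip(examples, labels):
            self.memory[example] = label

    def predict(self, example):
        # Known example: recall the stored answer. New example: fixed guess.
        return self.memory.get(example, self.fallback_label)


def accuracy(model, examples, labels):
    correct = sum(model.predict(x) == y for x, y in zip(examples, labels))
    return correct / len(labels)


train_examples = ["photo_a", "photo_b", "photo_c", "photo_d"]
train_labels = ["cat", "dog", "cat", "ship"]
sealed_examples = ["photo_e", "photo_f", "photo_g", "photo_h"]
sealed_labels = ["dog", "cat", "ship", "cat"]

memorizer = MemorizerModel(fallback_label="cat")
memorizer.fit(train_examples, train_labels)

memorizer_train_accuracy = accuracy(memorizer, train_examples, train_labels)
memorizer_sealed_accuracy = accuracy(memorizer, sealed_examples, sealed_labels)

print(f"Training accuracy: {memorizer_train_accuracy:.2f}")
print(f"Sealed-set accuracy: {memorizer_sealed_accuracy:.2f}")
print(f"Gap: {memorizer_train_accuracy - memorizer_sealed_accuracy:.2f}")
```

The memorizer is perfect on the examples it practiced and only as good as its fixed guess on everything else. A real network sits somewhere between this extreme and a model that generalizes fully.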

How the Python ML workflow connects

1. Photos / CIFAR-10: labeled image examples (Stage 1)
2. Notebook variables: x_train, y_train, class_names (Setup-2)
3. Prepared data: normalized pixels and fair piles (Stage 3)
4. Keras model: CNN layers and summary (Stage 4)
5. Training history: epochs, loss, accuracy (Stage 5)
6. Test evidence: sealed score and mistakes (Stages 6-7)
7. Improved model: augmentation comparison (Stage 8)
8. Inference: uploaded image to top-3 guesses (Stage 9)
9. Demo evidence: table, confidence, limitation (Stage 10)

Stage 6 moves from training history to test evidence. The sealed test score is the honest grade you can report later.

New words

evaluate: grade a trained model without changing it
test accuracy: the model's score on the sealed test set
generalize: work on new examples, not only practiced ones
overfitting: memorizing training examples instead of learning the general idea

Before you start

You need the trained baseline model and baseline_history from Stage 5.
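If you are not sure whether everything from the earlier stages is still in memory, a quick check cell can help. This is a minimal sketch: `missing_names` and `REQUIRED` are hypothetical helpers written for this example, and the list assumes your notebook uses the same variable names as this stage (`model`, `baseline_history`, `x_test`, `y_test`).

```python
def missing_names(namespace, required_names):
    """Return the required names that are not defined in the namespace."""
    return [name for name in required_names if name not in namespace]


# Names this stage expects (assumed to match your notebook).
REQUIRED = ["model", "baseline_history", "x_test", "y_test"]

# globals() is the dictionary of everything defined in the notebook so far.
missing = missing_names(globals(), REQUIRED)

if missing:
    print("Missing:", ", ".join(missing))
    print("Re-run the earlier stages before continuing.")
else:
    print("Ready to evaluate.")
```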

Build it

Step 1 — Predict the honest score

Test-score prediction:
- My final validation accuracy was _____.
- I predict test accuracy will be _____ because __________.

Step 2 — Evaluate the sealed test set

model.evaluate grades the model on data without changing the model. That is why this cell is different from model.fit: fit practices, evaluate grades.

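# evaluate returns the loss plus each metric the model was compiled with.
# This line assumes accuracy is the only metric, so it gets two numbers back.
baseline_test_loss, baseline_test_accuracy = model.evaluate(x_test, y_test)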

print(f"Baseline test loss: {baseline_test_loss:.3f}")
print(f"Baseline test accuracy: {baseline_test_accuracy:.3f}")

No learning happens here. This is a grade, not practice.
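To make the accuracy number concrete: accuracy is the fraction of examples the model labeled correctly. The sketch below works that out by hand in plain Python on made-up labels. The `accuracy_by_hand` helper and both sample lists are hypothetical; `model.evaluate` does its own computation from the model's predictions, but the meaning of the number is the same.

```python
def accuracy_by_hand(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both lists must have the same length.")
    if not true_labels:
        raise ValueError("Need at least one example to grade.")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up class numbers for ten images (not real results).
sample_true = [3, 8, 8, 0, 6, 6, 1, 6, 3, 1]
sample_pred = [3, 8, 0, 0, 6, 3, 1, 6, 5, 1]

sample_accuracy = accuracy_by_hand(sample_true, sample_pred)
print(f"Sample accuracy: {sample_accuracy:.3f}")
```

Seven of the ten made-up guesses match, so the sample accuracy is 0.700. Nothing in this calculation changes the model, which is the point of grading.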

Step 3 — Compare all three scores

# [-1] picks the value from the last training epoch.
baseline_train_accuracy = baseline_history.history['accuracy'][-1]
baseline_val_accuracy = baseline_history.history['val_accuracy'][-1]

print(f"Training accuracy: {baseline_train_accuracy:.3f}")
print(f"Validation accuracy: {baseline_val_accuracy:.3f}")
print(f"Test accuracy: {baseline_test_accuracy:.3f}")
print(f"Train-test gap: {baseline_train_accuracy - baseline_test_accuracy:.3f}")

The gap is your overfitting signal.

Step 4 — Write the honest claim

Honest model claim:
My baseline model gets about _____% test accuracy.
It does _____ points better on training data than test data.
That means __________.
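If you want help filling in the blanks, the sketch below turns two accuracy values into the first two sentences of the claim. The `honest_claim` function is a hypothetical helper, and the numbers passed to it are placeholders, not real results. In your notebook you would pass `baseline_train_accuracy` and `baseline_test_accuracy` from Step 3 instead.

```python
def honest_claim(train_accuracy, test_accuracy):
    """Build the first two sentences of the honest model claim."""
    # Accuracies are fractions like 0.65, so multiply by 100 for percent.
    test_percent = test_accuracy * 100
    # The gap is measured in percentage points.
    gap_points = (train_accuracy - test_accuracy) * 100
    return (
        f"My baseline model gets about {test_percent:.0f}% test accuracy.\n"
        f"It does {gap_points:.0f} points better on training data than test data."
    )


# Placeholder numbers for illustration only.
example_claim = honest_claim(train_accuracy=0.80, test_accuracy=0.65)
print(example_claim)
```

The last sentence, "That means ...", is still yours to write. The code can do the arithmetic, but not the interpretation.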

Understand it

Training accuracy is not a lie, but it is not the final truth. It measures data the model practiced on. Test accuracy measures data that stayed untouched until grading.

Validation accuracy helps us while building. Test accuracy is the number we report when we want to be honest about performance.
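One way to see why the test set should be opened only once is a small simulation. This is a sketch under simple assumptions, not a model of your real network: every "try" below is a pretend model that guesses labels at random, and all names and sizes (`NUM_TRIES`, `sealed_labels`, and so on) are made up for the example.

```python
import random

NUM_CLASSES = 10     # same number of classes as CIFAR-10
NUM_EXAMPLES = 200   # size of the pretend sealed test set
NUM_TRIES = 50       # how many times we "peek" at the test score

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_labels(count):
    return [rng.randrange(NUM_CLASSES) for _ in range(count)]


def score(guesses, labels):
    return sum(g == l for g, l in zip(guesses, labels)) / len(labels)


sealed_labels = random_labels(NUM_EXAMPLES)

# Each try is a pretend model that only guesses, graded on the same test set.
try_scores = [
    score(random_labels(NUM_EXAMPLES), sealed_labels)
    for _ in range(NUM_TRIES)
]

average_score = sum(try_scores) / len(try_scores)
best_score = max(try_scores)

print(f"Chance level: {1 / NUM_CLASSES:.3f}")
print(f"Average score across tries: {average_score:.3f}")
print(f"Best score after {NUM_TRIES} peeks: {best_score:.3f}")
```

None of these pretend models learned anything, yet the best of many peeks can never score below the average and usually lands above it. Reporting only that best score would make guessing look like skill, which is why tuning belongs on validation data and the test set stays sealed.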

Try this

Three short experiments. Predict before you run, then test your guess.

Predict first

If a model had 99% training accuracy and 55% test accuracy, would you trust it more or less than yours? Explain before discussing.

Compare

Compare validation accuracy and test accuracy. Are they close? What would it mean if test was much lower?

Connect

Why is it cheating to keep changing your model until the test score goes up?

Test your stage

You printed baseline test accuracy.
You compared training, validation, and test accuracy.
You calculated the train-test gap.
Workflow check. Point to this stage on the workflow map and explain why test evidence comes after training.
Evidence check. Your honest claim states your test accuracy and the gap between training and test accuracy.
Design check. Explain from memory why changing the model after watching test accuracy would be cheating.

If it breaks

Test accuracy is near 0.10. That is what random guessing scores on 10 classes, so the model probably was not trained in this session. Re-run Stage 5 training.
baseline_history is missing. Re-run Stage 5 training.
The gap is tiny. That is good; it means the model generalized well.
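The 0.10 warning sign comes from simple arithmetic: with 10 classes, blind guessing is right about 1 time in 10. The check below is a minimal sketch of that idea. The `looks_untrained` helper and its `tolerance` of 0.03 are assumptions chosen for illustration, not values from the course.

```python
NUM_CLASSES = 10  # CIFAR-10 has ten classes
chance_accuracy = 1 / NUM_CLASSES


def looks_untrained(accuracy_value, tolerance=0.03):
    """Return True when an accuracy is about what random guessing would score."""
    # tolerance is a hypothetical cutoff; adjust it to taste.
    return abs(accuracy_value - chance_accuracy) <= tolerance


print(f"Chance accuracy: {chance_accuracy:.2f}")
print("0.10 looks untrained:", looks_untrained(0.10))
print("0.62 looks untrained:", looks_untrained(0.62))
```

In your notebook you could call `looks_untrained(baseline_test_accuracy)` right after Step 2.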

Coach notes

The code is short; the learning is the distinction between practiced, checked, and sealed data. Keep the conversation focused there.

Stage 6 complete

You finished Stage 6!

You graded the baseline model honestly and measured its overfitting gap. Next, you investigate the mistakes instead of hiding them.


Next: Stage 7: Find What It Gets Wrong
