Stage 8: Improve the Experiment

Course progress: Stage 8 of 10 (~90 min)
Your workspace

Keep your Colab notebook tab open all session. Open links in a new tab; don't use the buttons on this page to leave the course.

Build

an augmented-image training run and a baseline-vs-augmented comparison

Learn

how data augmentation can reduce memorizing and improve generalization

Ship

an improved model plus an evidence table comparing scores

Teacher demo

Show the same image shifted, rotated, and flipped. Ask whether it is still the same class and when a transformation becomes nonsense.

The big idea

If the model memorizes exact training images, give it harder practice. Data augmentation makes changed versions of training images: slightly rotated, shifted, or flipped. The label stays the same, but the model cannot rely on one exact image.

This is an experiment. You will compare baseline test accuracy to augmented test accuracy.

Only one thing should change in the experiment: the training images get realistic variations. The model structure, validation data, and test data stay comparable.
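
One way to check the one-change rule is to write both runs down as settings and list what differs. The sketch below is only an illustration: the dictionary keys, their placeholder values, and the `changed_settings` helper are hypothetical names, not part of the course notebook.

```python
# Hypothetical descriptions of the two runs; keys and values are placeholders.
baseline_run = {
    "architecture": "same CNN",
    "epochs": "same count",
    "training_images": "plain",
    "validation_images": "plain",
    "test_images": "plain",
}
# Copy the baseline and change exactly one setting.
augmented_run = dict(baseline_run, training_images="augmented")


def changed_settings(run_a, run_b):
    """Return the sorted names of settings whose values differ between two runs."""
    return sorted(
        key for key in run_a.keys() | run_b.keys()
        if run_a.get(key) != run_b.get(key)
    )


differences = changed_settings(baseline_run, augmented_run)
print(differences)  # ['training_images']
```

If the list ever holds more than one name, the comparison is mixing effects and the result is harder to explain.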

How the Python ML workflow connects
  1. Photos / CIFAR-10: labeled image examples (Stage 1)
  2. Notebook variables: x_train, y_train, class_names (Setup-2)
  3. Prepared data: normalized pixels and fair piles (Stage 3)
  4. Keras model: CNN layers and summary (Stage 4)
  5. Training history: epochs, loss, accuracy (Stage 5)
  6. Test evidence: sealed score and mistakes (Stages 6-7)
  7. Improved model: augmentation comparison (Stage 8)
  8. Inference: uploaded image to top-3 guesses (Stage 9)
  9. Demo evidence: table, confidence, limitation (Stage 10)

Stage 8 creates the improved-model comparison. Augmentation only matters if the evidence shows it changed the result.

New words
augmentation: creating varied training examples from existing ones
rotation: turning an image a little
shift: moving an image slightly left, right, up, or down
batch: a small group of images trained together

Before you start

You need the baseline test score from Stage 6. If the runtime restarted, run all earlier stages first.
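
A small guard can turn a missing score into a clear message instead of a confusing error later. This is a sketch, assuming the Stage 6 score lives in a notebook variable named `baseline_test_accuracy` (the name used in Step 5); the helper `require_baseline` is a hypothetical name.

```python
def require_baseline(notebook_variables):
    """Return the Stage 6 score, or raise a clear error if it is missing.

    `notebook_variables` is a dict of names to values.
    In a notebook, pass globals().
    """
    if "baseline_test_accuracy" not in notebook_variables:
        raise NameError(
            "baseline_test_accuracy is not defined. "
            "Run all earlier stages, including Stage 6, then try again."
        )
    return notebook_variables["baseline_test_accuracy"]
```

In the notebook, calling `require_baseline(globals())` at the top of this stage would fail early if the runtime restarted.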

Build it

Step 1 — Create the augmentation tool

ImageDataGenerator is a Keras image helper that creates varied training batches. rotation_range=15 turns images a little, the shift settings slide them slightly, and horizontal_flip=True mirrors them left-to-right. Before running it, write which class you think these changes might help most and why.

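from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each setting is applied at random, per image, every time a batch is drawn.
datagen = ImageDataGenerator(
    rotation_range=15,       # rotate by up to 15 degrees either way
    width_shift_range=0.1,   # slide left/right by up to 10% of the width
    height_shift_range=0.1,  # slide up/down by up to 10% of the height
    horizontal_flip=True,    # mirror left-to-right at random
)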

These settings create believable image variations without turning the data into nonsense.
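
To see what a flip and a shift do to the pixels, here is a minimal NumPy sketch on a tiny made-up image. It is not how ImageDataGenerator works internally: the helper names are hypothetical, and the shift fills the empty column with zeros for simplicity, while Keras by default fills with the nearest edge pixels.

```python
import numpy as np

# A tiny 3x4 "image" so every pixel can be checked by eye.
image = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])


def flip_left_right(img):
    """Mirror the image left-to-right by reversing the column order."""
    return img[:, ::-1]


def shift_right(img, pixels):
    """Slide the image right by `pixels` columns, filling the gap with zeros."""
    shifted = np.zeros_like(img)
    shifted[:, pixels:] = img[:, :img.shape[1] - pixels]
    return shifted


flipped = flip_left_right(image)
shifted = shift_right(image, 1)
print(flipped[0])  # [4 3 2 1]
print(shifted[0])  # [0 1 2 3]

# A shift range of 0.1 is a fraction of the image size.
# On a 32-pixel-wide image that is a slide of up to about 3 pixels.
max_shift_pixels = 0.1 * 32
```

The pixels move, but nothing about the label changes. That is the whole trick.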

Step 2 — Inspect augmented samples before training

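import matplotlib.pyplot as plt

# Draw one augmented batch of 5 images with their one-hot labels.
preview_flow = datagen.flow(x_train, y_train, batch_size=5)
preview_images, preview_labels = next(preview_flow)

fig, axes = plt.subplots(1, 5, figsize=(10, 3))
for i in range(5):
    axes[i].imshow(preview_images[i])
    # argmax turns the one-hot label back into a class index.
    axes[i].set_title(class_names[preview_labels[i].argmax()])
    axes[i].axis('off')
plt.show()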

If the images look impossible, your augmentation settings are too strong.
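
Looking at the pictures is the main check, but a quick numeric check can catch problems your eyes miss. The sketch below assumes pixels were normalized to the 0-1 range in Stage 3 and that labels are one-hot with 10 classes. `check_preview_batch` is a hypothetical helper, and the data here is random stand-in data, not CIFAR-10.

```python
import numpy as np


def check_preview_batch(images, labels, num_classes=10):
    """Return a list of problems found in a batch; an empty list means none."""
    problems = []
    if images.shape[1:] != (32, 32, 3):
        problems.append(f"unexpected image shape {images.shape[1:]}")
    if images.min() < 0.0 or images.max() > 1.0:
        problems.append("pixel values fall outside the 0-1 range")
    if labels.shape != (images.shape[0], num_classes):
        problems.append(f"unexpected label shape {labels.shape}")
    return problems


# Stand-in data shaped like a 5-image batch with one-hot labels.
rng = np.random.default_rng(0)
fake_images = rng.random((5, 32, 32, 3))
fake_labels = np.eye(10)[rng.integers(0, 10, size=5)]
print(check_preview_batch(fake_images, fake_labels))  # []
```

In the notebook you could pass `preview_images` and `preview_labels` instead of the stand-in arrays.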

Step 3 — Build a fresh comparable model

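from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Same layers as the baseline model, starting from fresh weights.
augmented_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),  # one output per CIFAR-10 class
])

augmented_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)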

A fresh model makes the comparison cleaner: same architecture, different training data.

Step 4 — Train with augmented images

augmented_history = augmented_model.fit(
    datagen.flow(x_train, y_train, batch_size=32),
    epochs=10,
    validation_data=(x_val, y_val)
)

The model trains on varied images while validation stays plain and fair.
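
The batch size decides how many steps one epoch takes. A minimal sketch of the arithmetic, assuming one epoch means one pass over every training image; the 40,000 figure is an assumed example size, so use `len(x_train)` in your own notebook.

```python
import math


def batches_per_epoch(num_training_images, batch_size):
    """Number of batches needed to pass over every training image once."""
    return math.ceil(num_training_images / batch_size)


# 40,000 is an assumed training-set size, used only for illustration.
print(batches_per_epoch(40_000, 32))  # 1250
# When the size does not divide evenly, the last batch is smaller.
print(batches_per_epoch(50_000, 32))  # 1563
```

Because the transformations are random, each of those batches holds freshly changed images, so the model rarely sees exactly the same pixels twice.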

Step 5 — Compare scores

augmented_test_loss, augmented_test_accuracy = augmented_model.evaluate(x_test, y_test)

print(f"Baseline test accuracy: {baseline_test_accuracy:.3f}")
print(f"Augmented test accuracy: {augmented_test_accuracy:.3f}")
print(f"Change: {augmented_test_accuracy - baseline_test_accuracy:.3f}")

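The improvement may be small. The important question is whether the model memorizes less: look for a smaller gap between training accuracy and validation or test accuracy, not just a higher test score.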
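
For the evidence table you ship, the three printed lines can be gathered into one plain-text table. This is a sketch with a hypothetical helper named `evidence_table`; the numbers passed in are placeholders, not real results, so swap in `baseline_test_accuracy` and `augmented_test_accuracy`.

```python
def evidence_table(baseline_accuracy, augmented_accuracy):
    """Format a plain-text comparison table for two test accuracies."""
    change = augmented_accuracy - baseline_accuracy
    rows = [
        ("Run", "Test accuracy"),
        ("Baseline", f"{baseline_accuracy:.3f}"),
        ("Augmented", f"{augmented_accuracy:.3f}"),
        ("Change", f"{change:+.3f}"),  # the + shows the sign either way
    ]
    return "\n".join(f"{name:<10} {value:>13}" for name, value in rows)


# Placeholder numbers only, not real results.
table = evidence_table(0.650, 0.675)
print(table)
```

The signed change makes a drop as visible as a gain, which keeps the record honest if augmentation did not help.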

Understand it

Augmentation teaches the model that a cat is still a cat if it shifts or flips. That helps it learn stable patterns instead of exact pixels.

Too much augmentation can hurt. A 90-degree rotated truck may no longer look like the kind of truck the test set contains. Good ML work includes choosing realistic transformations.
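
A tiny made-up picture shows why a flip is usually safe and a 90-degree turn often is not. In this sketch the bottom row of 1s stands for the road under a truck; the array is invented for illustration and is not CIFAR-10 data.

```python
import numpy as np

# 0 = sky, 1 = road. The road sits along the bottom row.
scene = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
])

flipped = scene[:, ::-1]   # mirror left-to-right
rotated = np.rot90(scene)  # turn 90 degrees counterclockwise

print(flipped[-1])     # [1 1 1]  the road is still at the bottom
print(rotated[-1])     # [0 0 1]  the bottom row is now mostly sky
print(rotated[:, -1])  # [1 1 1]  the road now runs up the right edge
```

After the flip the scene is still one the test set could contain. After the rotation the road is vertical, which is the kind of picture the model will never be asked about.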

Try this

Three short experiments. Predict before you run, then test your guess.

Predict first

Before evaluating, predict whether augmentation will improve test accuracy by a lot, a little, or not at all.

Compare

Compare the augmented preview images to the original class examples. Do they still look believable?

Connect

Which mistakes from Stage 7 might augmentation help? Which mistakes probably need a different fix?

Test your stage

  • You inspected augmented images before training.
  • You trained a fresh augmented model.
  • You printed baseline and augmented test accuracy.
  • Workflow check. Point to this stage on the workflow map and explain what changed and what stayed comparable.
  • Evidence check. Your comparison states what changed and what stayed the same in the experiment.
  • Design check. Your comparison explains whether augmentation helped and cites evidence.

If it breaks

  • baseline_test_accuracy is undefined. Re-run Stage 6.
  • Augmented accuracy is lower. That can happen. Record it honestly and consider whether the transformations were too strong.
  • Training is slow. Confirm the GPU is on, or reduce epochs for class timing.

Coach notes

The clean experimental design matters. Continuing the same model would mix two effects: more epochs and augmentation. A fresh comparable model is clearer.