Dissertation Project

Notebook 01

01 Scope And Research Questions

Frames the dissertation around tabular baseline evaluation, image-model development, and synthetic multimodal experimentation under explicit non-clinical constraints.

Purpose

Define the study scope, central research questions, and the boundaries around exploratory multimodal claims.

The Wisconsin branch is used as the tabular baseline for comparison.

The BreaKHis build is the main image contribution.

Synthetic fusion is positioned as exploratory methodology, not clinical evidence.

This notebook defines the study scope, the main research questions, and the project claim boundaries, so the key output is the notebook narrative rather than standalone figures.

Notebook 02

02 BreaKHis Dataset Exploration

Audits class balance, magnification coverage, image size variation, and sample appearance before training the image branch.

Purpose

Establish an empirical understanding of the BreaKHis binary dataset and surface practical preprocessing constraints.

The dataset spans multiple magnifications and heterogeneous staining appearances.

Image dimensions and colour distributions vary enough to justify careful normalization.

Exploration outputs anchor later preprocessing and augmentation decisions.

Class and magnification distribution across the BreaKHis workflow.

Representative sample mosaic used to ground the visual variability of the image branch.
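
The class and magnification audit can be sketched from filenames alone. This is a minimal sketch, assuming the files follow the naming pattern of the public BreaKHis release (for example `SOB_B_A-14-22549AB-40-001.png`); the helper name `parse_breakhis_name`, the choice of `year-slide` as patient identifier, and the listed filenames are illustrative assumptions, not taken from the notebook.

```python
from collections import Counter

# Assumed mapping from the class letter in the filename to the binary label.
CLASS_NAMES = {"B": "benign", "M": "malignant"}


def parse_breakhis_name(filename):
    """Split a name like 'SOB_B_A-14-22549AB-40-001.png' into its fields."""
    stem = filename.rsplit(".", 1)[0]
    prefix, year, slide_id, magnification, sequence = stem.split("-")
    _, tumour_class, subtype = prefix.split("_")
    return {
        "label": CLASS_NAMES[tumour_class],
        "subtype": subtype,
        # Assumption: year plus slide ID identifies one patient.
        "patient_id": f"{year}-{slide_id}",
        "magnification": int(magnification),
        "sequence": sequence,
    }


# Illustrative filenames only; a real audit would list the dataset directory.
filenames = [
    "SOB_B_A-14-22549AB-40-001.png",
    "SOB_B_A-14-22549AB-40-002.png",
    "SOB_B_A-14-22549AB-100-001.png",
    "SOB_M_DC-14-2523-200-001.png",
    "SOB_M_DC-14-2523-400-001.png",
]

parsed = [parse_breakhis_name(name) for name in filenames]

# Count images per (label, magnification) cell and per patient.
cell_counts = Counter((p["label"], p["magnification"]) for p in parsed)
patient_counts = Counter(p["patient_id"] for p in parsed)

for (label, magnification), n in sorted(cell_counts.items()):
    print(f"{label:9s} {magnification:>3d}x  {n}")
```

Keeping the patient identifier in the parsed record means the same table can feed the split audit in the next notebook.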

Notebook 03

03 Split Audit And Patient Leakage

Demonstrates leakage in the naive image-level split and motivates the patient-level evaluation protocol.

Purpose

Show why patient-level separation is necessary before any image-model result is reported.

Naive image-level splitting leaks patient information across train and test.

The patient-level split materially changes the credibility of downstream metrics.

Leakage auditing becomes a first-class part of the dissertation narrative.

Leakage evidence that motivates the patient-level protocol.
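
The contrast between the two split strategies can be sketched as follows. This is a minimal illustration, assuming each image record carries a patient identifier; the record layout, function names, and toy data are hypothetical and not the notebook's own code.

```python
import random
from collections import defaultdict


def image_level_split(records, test_fraction=0.2, seed=0):
    """Naive split: shuffle images and cut, ignoring patient identity."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Leakage-safe split: assign whole patients to train or test."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)
    patients = sorted(by_patient)
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, int(round(len(patients) * test_fraction)))
    test_patients = set(patients[:n_test])
    train = [r for p in patients if p not in test_patients for r in by_patient[p]]
    test = [r for p in patients if p in test_patients for r in by_patient[p]]
    return train, test


def leaked_patients(train, test):
    """Return patient IDs that appear on both sides of a split."""
    return {r["patient_id"] for r in train} & {r["patient_id"] for r in test}


# Toy data: 10 hypothetical patients with 8 images each.
records = [
    {"patient_id": f"P{p:02d}", "image": f"P{p:02d}_img{i}.png"}
    for p in range(10)
    for i in range(8)
]

naive_train, naive_test = image_level_split(records)
safe_train, safe_test = patient_level_split(records)

print("naive split, leaked patients:", len(leaked_patients(naive_train, naive_test)))
print("patient split, leaked patients:", len(leaked_patients(safe_train, safe_test)))
```

The audit itself is just the set intersection in `leaked_patients`; an empty result is the condition a split should meet before any metric is reported.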

Notebook 04

04 Preprocessing And Dataloaders

Builds reproducible transforms, loaders, and normalization choices around the patient-level split.

Purpose

Translate the audited dataset into a stable image-processing pipeline ready for model development.

BreaKHis-specific normalization is tracked explicitly rather than assumed.

Augmentation is controlled and lightweight instead of visually extreme.

The preprocessing pipeline is structured for reuse in later inference.

Examples from the final augmentation and preprocessing pipeline.

Comparison view for the normalization choices considered in the workflow.
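
One way to track dataset-specific normalization explicitly is to compute per-channel statistics from the training split only and store them alongside the pipeline. The sketch below assumes per-channel mean and standard deviation normalization on RGB values in [0, 1]; whether the notebook uses exactly this scheme is an assumption, and the function names and toy pixels are hypothetical.

```python
import math


def channel_stats(images):
    """Per-channel mean and std over all pixels of the given images.

    Each image is a list of (r, g, b) pixels with values in [0, 1].
    """
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for pixel in image:
            for c in range(3):
                sums[c] += pixel[c]
                sq_sums[c] += pixel[c] ** 2
            count += 1
    means = [s / count for s in sums]
    # Population variance via E[x^2] - E[x]^2, clamped at zero for float safety.
    stds = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds


def normalize(image, means, stds, eps=1e-8):
    """Standardize every pixel with precomputed training-split statistics."""
    return [
        tuple((pixel[c] - means[c]) / (stds[c] + eps) for c in range(3))
        for pixel in image
    ]


# Toy "training images": two images of two pixels each.
train_images = [
    [(0.8, 0.4, 0.6), (0.6, 0.2, 0.5)],
    [(0.7, 0.3, 0.7), (0.9, 0.5, 0.6)],
]

means, stds = channel_stats(train_images)
normalized = [normalize(img, means, stds) for img in train_images]

print("means:", [round(m, 4) for m in means])
print("stds: ", [round(s, 4) for s in stds])
```

Computing the statistics on the training split alone keeps validation and test images out of the preprocessing fit, and the stored values can be reused unchanged at inference time.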

Notebook 05

05 Model Development

Compares image-branch development runs and saves the patient-level ResNet18 checkpoint used by the app.

Purpose

Identify the most credible image model configuration for transfer-ready inference.

The final patient-level model is selected from the development sequence.

Best-checkpoint selection is based on validation behaviour under the patient-level split.

The saved clean checkpoint becomes the app-facing image artifact.

Comparison of the image-model development runs.

Training history for one development run in the final pipeline.

Training history for the alternate development run.
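
Best-checkpoint selection on validation behaviour can be sketched as a lookup over the per-epoch history. The metric names (`val_auc`, `val_loss`), the function name, and the numbers below are placeholders for illustration; they are not results from the dissertation, and the notebook's actual selection criterion may differ.

```python
def select_best_epoch(history, metric="val_auc", higher_is_better=True):
    """Return the history row with the best validation metric.

    Ties resolve to the earliest epoch because max() keeps the first maximum.
    """
    sign = 1.0 if higher_is_better else -1.0
    return max(history, key=lambda row: sign * row[metric])


# Placeholder training history measured on the patient-level validation split.
history = [
    {"epoch": 1, "val_loss": 0.62, "val_auc": 0.81},
    {"epoch": 2, "val_loss": 0.44, "val_auc": 0.88},
    {"epoch": 3, "val_loss": 0.45, "val_auc": 0.91},
    {"epoch": 4, "val_loss": 0.51, "val_auc": 0.90},
]

best_by_auc = select_best_epoch(history, metric="val_auc")
best_by_loss = select_best_epoch(history, metric="val_loss", higher_is_better=False)

print("best epoch by validation AUC: ", best_by_auc["epoch"])
print("best epoch by validation loss:", best_by_loss["epoch"])
```

The two criteria can disagree, as they do in this toy history, so the chosen metric is worth recording next to the saved checkpoint.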

Notebook 06

06 Evaluation And Error Analysis

Evaluates the patient-level image model with ROC, calibration, confusion, magnification, and failure analysis outputs.

Purpose

Produce the test-set evidence used throughout the written dissertation and web application.

Patient-level performance remains credible under the leakage-safe split.

Calibration and error analysis are surfaced alongside accuracy and ROC rather than hidden.

Magnification-specific behaviour is examined instead of assuming uniform performance.

Patient-level ROC curve for the image branch.

Calibration plot for the patient-level model.

Patient-level confusion matrix for the holdout evaluation.

Selected failure cases used in the error analysis discussion.

Magnification-specific ROC/AUC view for the image branch.
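
Confusion counts and a calibration summary can be computed from the same held-out predictions. The sketch below uses expected calibration error (ECE) with equal-width bins as one common calibration measure; the notebook may use a different one, and the function names, the 0.5 threshold, and the toy labels and probabilities are illustrative assumptions.

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Count TP, FP, TN, FN for binary labels and positive-class probabilities."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted mean gap between predicted probability and observed positive rate."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    total = len(labels)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for _, p in bucket) / len(bucket)
        positive_rate = sum(y for y, _ in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(positive_rate - mean_prob)
    return ece


# Toy held-out predictions (1 = malignant); not dissertation results.
labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.2, 0.1, 0.4, 0.6]

counts = confusion_counts(labels, probs)
ece = expected_calibration_error(labels, probs)

print("confusion counts:", counts)
print("ECE:", round(ece, 4))
```

Running the same two functions on the subset of images at each magnification gives a per-magnification view without changing the code.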

Notebook 07

07 Wisconsin Review And Integration

Reviews the Wisconsin branch, documents its transfer contract, and aligns it with the image-side app integration.

Purpose

Prepare the tabular baseline for safe reuse without altering the original notebook or artifacts.

The Wisconsin branch's input contract is made explicit for downstream app integration.

The tabular branch is treated as a baseline and comparison anchor.

This notebook documents how the tabular workflow is reused in the application, so the main value is the integration contract and raw notebook rather than figure output.

Notebook 08

08 Synthetic Pairing Design

Constructs the synthetic pairing logic used to test fusion strategies across independent unimodal datasets.

Purpose

Define how same-label and random pairing experiments are constructed while preserving the project’s non-clinical framing.

Pairings are explicitly synthetic and should never be interpreted as patient-level multimodal truth.

Same-label and random strategies are both retained for comparison.

The pairing process is auditable and reproducible across seeds.

Visual summary of the synthetic pairing design used for the fusion experiments.
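
The two pairing strategies can be sketched with a seeded random generator so that every pairing table is reproducible. This is a minimal sketch under assumed record layouts (`id` and `label` keys); the function names are hypothetical, and recording both labels for random pairs is an illustrative choice rather than the notebook's documented behaviour.

```python
import random
from collections import defaultdict


def same_label_pairs(tabular_rows, image_rows, seed):
    """Pair each image with a randomly drawn tabular row that shares its label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in tabular_rows:
        by_label[row["label"]].append(row)
    pairs = []
    for image in image_rows:
        partner = rng.choice(by_label[image["label"]])
        pairs.append({
            "image_id": image["id"],
            "tabular_id": partner["id"],
            "image_label": image["label"],
            "tabular_label": partner["label"],
            "strategy": "same_label",
            "seed": seed,
        })
    return pairs


def random_pairs(tabular_rows, image_rows, seed):
    """Pair each image with any tabular row, ignoring labels."""
    rng = random.Random(seed)
    pairs = []
    for image in image_rows:
        partner = rng.choice(tabular_rows)
        pairs.append({
            "image_id": image["id"],
            "tabular_id": partner["id"],
            "image_label": image["label"],
            "tabular_label": partner["label"],
            "strategy": "random",
            "seed": seed,
        })
    return pairs


# Toy unimodal datasets from independent sources (0 = benign, 1 = malignant).
tabular_rows = [{"id": f"T{i}", "label": i % 2} for i in range(6)]
image_rows = [{"id": f"I{i}", "label": (i // 2) % 2} for i in range(8)]

matched = same_label_pairs(tabular_rows, image_rows, seed=7)
unmatched = random_pairs(tabular_rows, image_rows, seed=7)

print(matched[0])
print(unmatched[0])
```

Storing the strategy and seed in every row keeps each synthetic pairing traceable, and nothing in the table claims that the two halves come from the same patient.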

Notebook 09

09 Fusion Experiments

Runs exploratory early- and late-fusion experiments on synthetic pairings and benchmarks them against unimodal baselines.

Purpose

Evaluate whether synthetic pairings can still support useful fusion-method comparisons under data scarcity.

Fusion outputs are reported as exploratory only.

Repeated-seed evaluation surfaces stability rather than relying on a single run.

The experiments are retained for comparison while keeping the claim boundary explicit.

Comparison of synthetic fusion experiment families across pairing strategies.
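
The difference between the two fusion families, and the repeated-seed summary, can be sketched as below. The equal weighting in late fusion, the plain concatenation in early fusion, the function names, and the per-seed scores are all illustrative assumptions; the scores are placeholders, not experimental results.

```python
import statistics


def late_fusion(p_tabular, p_image, weight=0.5):
    """Late fusion: weighted average of the two unimodal malignancy probabilities."""
    return weight * p_tabular + (1.0 - weight) * p_image


def early_fusion(tabular_features, image_embedding):
    """Early fusion: concatenate both feature vectors into one classifier input."""
    return list(tabular_features) + list(image_embedding)


def summarize(scores):
    """Mean and sample standard deviation of one metric across seeds."""
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"mean": statistics.mean(scores), "std": spread, "n_seeds": len(scores)}


# One synthetic pair: probabilities from the tabular and image models.
fused_probability = late_fusion(p_tabular=0.8, p_image=0.4)

# The same pair as a joint feature vector for an early-fusion classifier.
joint_features = early_fusion([14.2, 0.09], [0.1, -0.3, 0.7])

# Placeholder accuracy per pairing seed, summarized instead of cherry-picked.
per_seed_accuracy = {0: 0.80, 1: 0.75, 2: 0.85}
summary = summarize(list(per_seed_accuracy.values()))

print("late-fusion probability:", round(fused_probability, 3))
print("early-fusion input length:", len(joint_features))
print("accuracy over seeds:", {k: round(v, 3) for k, v in summary.items()})
```

Reporting the mean together with the spread across seeds is what lets a fusion comparison be read as stable or unstable rather than as a single favourable run.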

Notebook 10

10 Model Comparison And Joint Analysis

Brings the tabular, image, and synthetic-fusion branches into one comparison space for the final dissertation analysis.

Purpose

Create the cross-model evidence tables and figures used to explain the strengths and limits of each branch.

The tabular branch remains the strongest benchmark numerically.

The image branch provides the main new contribution.

Synthetic fusion comparisons are kept visible but carefully caveated.

High-level accuracy comparison between the model families.

ROC/AUC comparison across unimodal and exploratory synthetic-fusion branches.

Notebook 11

11 Results Synthesis And Defense Pack

Packages the core dissertation claims, defense figures, and final synthesis outputs for written delivery and presentation.

Purpose

Provide the final narrative layer that turns the research workflow into a defendable dissertation artifact set.

The final pack distills the project into a concise claim set.

Figures and tables are curated for communication rather than exploration alone.

The web experience can reuse this notebook as its top-level narrative anchor.

Patient-level training history included in the defense-facing synthesis outputs.

Notebook 12

12 Demo Preset Generation

Generates the traceable preset manifest used by the web app for tabular, image, and synthetic-fusion demo cases.

Purpose

Create auditable demo inputs without hard-coding preset data directly inside the application.

Tabular presets reuse the existing BreaScope AI profile values and validate them against Wisconsin feature ranges.

Image presets cover each binary BreaKHis label and magnification using real held-out examples.

Fusion presets are explicitly synthetic story cases with recorded probability construction.

This notebook writes JSON and CSV preset artifacts for the app, so its main outputs are reports rather than figures.
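
Validating tabular presets against observed feature ranges before writing the manifest can be sketched as follows. The feature names, preset values, manifest layout, and function names are hypothetical placeholders; the real notebook would derive the ranges from the Wisconsin data and write the result to disk rather than to a string.

```python
import json


def feature_ranges(rows):
    """Minimum and maximum of every feature across the reference rows."""
    names = list(rows[0].keys())
    return {name: (min(r[name] for r in rows), max(r[name] for r in rows)) for name in names}


def validate_preset(preset, ranges):
    """Return a list of problems; an empty list means the preset is in range."""
    problems = []
    for name, (low, high) in ranges.items():
        if name not in preset:
            problems.append(f"{name}: missing")
        elif not low <= preset[name] <= high:
            problems.append(f"{name}: {preset[name]} outside [{low}, {high}]")
    return problems


# Toy reference rows standing in for the tabular dataset.
reference_rows = [
    {"radius_mean": 9.5, "texture_mean": 12.0},
    {"radius_mean": 14.1, "texture_mean": 19.3},
    {"radius_mean": 22.0, "texture_mean": 30.5},
]

# Hypothetical demo presets; the second one is deliberately out of range.
presets = {
    "typical_case": {"radius_mean": 13.0, "texture_mean": 18.0},
    "broken_case": {"radius_mean": 40.0, "texture_mean": 18.0},
}

ranges = feature_ranges(reference_rows)
report = {name: validate_preset(values, ranges) for name, values in presets.items()}

# Only presets with no problems go into the manifest the app would load.
manifest = {
    "tabular_presets": {name: presets[name] for name, issues in report.items() if not issues}
}
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)

print(report)
print(manifest_json)
```

Because the manifest is generated from a validation report, a preset that drifts out of range fails visibly at generation time instead of reaching the app.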