{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "df2cc7b1",
   "metadata": {},
   "source": [
    "# Wisconsin Branch Review and Integration\n",
    "\n",
    "## Objective\n",
    "This notebook documents the Wisconsin model I built earlier and am now reusing unchanged. I am not retraining it here. The aim is simply to make its dataset, artifacts, assumptions, and expected inputs explicit before I bring it into the synthetic pairing workflow.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "95338756",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## Notebook Purpose\n",
    "\n",
    "This notebook reviews the Wisconsin tabular branch and documents how it is reused. It does not retrain the model or alter the original notebook; it only extracts the integration contract needed for comparison and web inference.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f10b156",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## Why This Matters\n",
    "\n",
    "The Wisconsin branch is a strong comparator, but it is not the new contribution of this dissertation. Documenting it carefully prevents accidental claims that the branch was rebuilt or optimised in the current workflow.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76de3b6c",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## Load Wisconsin Data\n",
    "\n",
    "This setup cell resolves the project paths, reads the original Wisconsin CSV (`brca.csv`), and maps the `B`/`M` codes in column `y` to the benign/malignant labels used elsewhere in the project. The original branch remains untouched.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9babea40",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-19T20:17:09.137656Z",
     "iopub.status.busy": "2026-04-19T20:17:09.137413Z",
     "iopub.status.idle": "2026-04-19T20:17:09.765809Z",
     "shell.execute_reply": "2026-04-19T20:17:09.765334Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Project root: /Users/sergeysotskiy/Documents/UNI/year 3/Dissertation/dissertation_project\n",
      "Outputs: /Users/sergeysotskiy/Documents/UNI/year 3/Dissertation/dissertation_project/outputs\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x.radius_mean</th>\n",
       "      <th>x.texture_mean</th>\n",
       "      <th>x.perimeter_mean</th>\n",
       "      <th>x.area_mean</th>\n",
       "      <th>x.smoothness_mean</th>\n",
       "      <th>x.compactness_mean</th>\n",
       "      <th>x.concavity_mean</th>\n",
       "      <th>x.concave_pts_mean</th>\n",
       "      <th>x.symmetry_mean</th>\n",
       "      <th>x.fractal_dim_mean</th>\n",
       "      <th>...</th>\n",
       "      <th>x.perimeter_worst</th>\n",
       "      <th>x.area_worst</th>\n",
       "      <th>x.smoothness_worst</th>\n",
       "      <th>x.compactness_worst</th>\n",
       "      <th>x.concavity_worst</th>\n",
       "      <th>x.concave_pts_worst</th>\n",
       "      <th>x.symmetry_worst</th>\n",
       "      <th>x.fractal_dim_worst</th>\n",
       "      <th>y</th>\n",
       "      <th>label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>13.540</td>\n",
       "      <td>14.36</td>\n",
       "      <td>87.46</td>\n",
       "      <td>566.3</td>\n",
       "      <td>0.09779</td>\n",
       "      <td>0.08129</td>\n",
       "      <td>0.06664</td>\n",
       "      <td>0.047810</td>\n",
       "      <td>0.1885</td>\n",
       "      <td>0.05766</td>\n",
       "      <td>...</td>\n",
       "      <td>99.70</td>\n",
       "      <td>711.2</td>\n",
       "      <td>0.14400</td>\n",
       "      <td>0.17730</td>\n",
       "      <td>0.23900</td>\n",
       "      <td>0.12880</td>\n",
       "      <td>0.2977</td>\n",
       "      <td>0.07259</td>\n",
       "      <td>B</td>\n",
       "      <td>benign</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>13.080</td>\n",
       "      <td>15.71</td>\n",
       "      <td>85.63</td>\n",
       "      <td>520.0</td>\n",
       "      <td>0.10750</td>\n",
       "      <td>0.12700</td>\n",
       "      <td>0.04568</td>\n",
       "      <td>0.031100</td>\n",
       "      <td>0.1967</td>\n",
       "      <td>0.06811</td>\n",
       "      <td>...</td>\n",
       "      <td>96.09</td>\n",
       "      <td>630.5</td>\n",
       "      <td>0.13120</td>\n",
       "      <td>0.27760</td>\n",
       "      <td>0.18900</td>\n",
       "      <td>0.07283</td>\n",
       "      <td>0.3184</td>\n",
       "      <td>0.08183</td>\n",
       "      <td>B</td>\n",
       "      <td>benign</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>9.504</td>\n",
       "      <td>12.44</td>\n",
       "      <td>60.34</td>\n",
       "      <td>273.9</td>\n",
       "      <td>0.10240</td>\n",
       "      <td>0.06492</td>\n",
       "      <td>0.02956</td>\n",
       "      <td>0.020760</td>\n",
       "      <td>0.1815</td>\n",
       "      <td>0.06905</td>\n",
       "      <td>...</td>\n",
       "      <td>65.13</td>\n",
       "      <td>314.9</td>\n",
       "      <td>0.13240</td>\n",
       "      <td>0.11480</td>\n",
       "      <td>0.08867</td>\n",
       "      <td>0.06227</td>\n",
       "      <td>0.2450</td>\n",
       "      <td>0.07773</td>\n",
       "      <td>B</td>\n",
       "      <td>benign</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>13.030</td>\n",
       "      <td>18.42</td>\n",
       "      <td>82.61</td>\n",
       "      <td>523.8</td>\n",
       "      <td>0.08983</td>\n",
       "      <td>0.03766</td>\n",
       "      <td>0.02562</td>\n",
       "      <td>0.029230</td>\n",
       "      <td>0.1467</td>\n",
       "      <td>0.05863</td>\n",
       "      <td>...</td>\n",
       "      <td>84.46</td>\n",
       "      <td>545.9</td>\n",
       "      <td>0.09701</td>\n",
       "      <td>0.04619</td>\n",
       "      <td>0.04833</td>\n",
       "      <td>0.05013</td>\n",
       "      <td>0.1987</td>\n",
       "      <td>0.06169</td>\n",
       "      <td>B</td>\n",
       "      <td>benign</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.196</td>\n",
       "      <td>16.84</td>\n",
       "      <td>51.71</td>\n",
       "      <td>201.9</td>\n",
       "      <td>0.08600</td>\n",
       "      <td>0.05943</td>\n",
       "      <td>0.01588</td>\n",
       "      <td>0.005917</td>\n",
       "      <td>0.1769</td>\n",
       "      <td>0.06503</td>\n",
       "      <td>...</td>\n",
       "      <td>57.26</td>\n",
       "      <td>242.2</td>\n",
       "      <td>0.12970</td>\n",
       "      <td>0.13570</td>\n",
       "      <td>0.06880</td>\n",
       "      <td>0.02564</td>\n",
       "      <td>0.3105</td>\n",
       "      <td>0.07409</td>\n",
       "      <td>B</td>\n",
       "      <td>benign</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows \u00d7 32 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   x.radius_mean  x.texture_mean  x.perimeter_mean  x.area_mean  \\\n",
       "0         13.540           14.36             87.46        566.3   \n",
       "1         13.080           15.71             85.63        520.0   \n",
       "2          9.504           12.44             60.34        273.9   \n",
       "3         13.030           18.42             82.61        523.8   \n",
       "4          8.196           16.84             51.71        201.9   \n",
       "\n",
       "   x.smoothness_mean  x.compactness_mean  x.concavity_mean  \\\n",
       "0            0.09779             0.08129           0.06664   \n",
       "1            0.10750             0.12700           0.04568   \n",
       "2            0.10240             0.06492           0.02956   \n",
       "3            0.08983             0.03766           0.02562   \n",
       "4            0.08600             0.05943           0.01588   \n",
       "\n",
       "   x.concave_pts_mean  x.symmetry_mean  x.fractal_dim_mean  ...  \\\n",
       "0            0.047810           0.1885             0.05766  ...   \n",
       "1            0.031100           0.1967             0.06811  ...   \n",
       "2            0.020760           0.1815             0.06905  ...   \n",
       "3            0.029230           0.1467             0.05863  ...   \n",
       "4            0.005917           0.1769             0.06503  ...   \n",
       "\n",
       "   x.perimeter_worst  x.area_worst  x.smoothness_worst  x.compactness_worst  \\\n",
       "0              99.70         711.2             0.14400              0.17730   \n",
       "1              96.09         630.5             0.13120              0.27760   \n",
       "2              65.13         314.9             0.13240              0.11480   \n",
       "3              84.46         545.9             0.09701              0.04619   \n",
       "4              57.26         242.2             0.12970              0.13570   \n",
       "\n",
       "   x.concavity_worst  x.concave_pts_worst  x.symmetry_worst  \\\n",
       "0            0.23900              0.12880            0.2977   \n",
       "1            0.18900              0.07283            0.3184   \n",
       "2            0.08867              0.06227            0.2450   \n",
       "3            0.04833              0.05013            0.1987   \n",
       "4            0.06880              0.02564            0.3105   \n",
       "\n",
       "   x.fractal_dim_worst  y   label  \n",
       "0              0.07259  B  benign  \n",
       "1              0.08183  B  benign  \n",
       "2              0.07773  B  benign  \n",
       "3              0.06169  B  benign  \n",
       "4              0.07409  B  benign  \n",
       "\n",
       "[5 rows x 32 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "import json\n",
    "import random\n",
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "from IPython.display import display\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Fix the random seeds so any stochastic step in this notebook is repeatable.\n",
    "SEED = 42\n",
    "random.seed(SEED)\n",
    "np.random.seed(SEED)\n",
    "plt.style.use('seaborn-v0_8-whitegrid')\n",
    "\n",
    "# Resolve the project root: the first of the working directory, its parent,\n",
    "# or its grandparent that contains both 'src' and 'data'.\n",
    "CWD = Path.cwd().resolve()\n",
    "for candidate in (CWD, CWD.parent, CWD.parent.parent):\n",
    "    if (candidate / 'src').exists() and (candidate / 'data').exists():\n",
    "        PROJECT_ROOT = candidate\n",
    "        break\n",
    "else:\n",
    "    raise RuntimeError(f'Could not resolve dissertation_project root from {CWD}')\n",
    "\n",
    "REPO_ROOT = PROJECT_ROOT.parent\n",
    "OUTPUTS = PROJECT_ROOT / 'outputs'\n",
    "FIGURES = OUTPUTS / 'figures'\n",
    "METRICS = OUTPUTS / 'metrics'\n",
    "REPORTS = OUTPUTS / 'reports'\n",
    "MODELS = PROJECT_ROOT / 'models'\n",
    "DATA_ROOT = PROJECT_ROOT / 'data' / 'dataset_cancer_v1' / 'dataset_cancer_v1'\n",
    "WISCONSIN_ROOT = PROJECT_ROOT / 'notebook_Wisconsin'\n",
    "\n",
    "# Create the output folders if they do not exist yet.\n",
    "for path in [FIGURES, METRICS, REPORTS]:\n",
    "    path.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# Add the project root to the import path so project modules can be imported.\n",
    "if str(PROJECT_ROOT) not in sys.path:\n",
    "    sys.path.append(str(PROJECT_ROOT))\n",
    "\n",
    "print('Project root:', PROJECT_ROOT)\n",
    "print('Outputs:', OUTPUTS)\n",
    "\n",
    "# Load the original CSV, dropping the unnamed index column from the CSV export if present.\n",
    "wisconsin_df = pd.read_csv(WISCONSIN_ROOT / 'brca.csv').drop(columns=['Unnamed: 0'], errors='ignore')\n",
    "# Map the diagnosis codes in 'y' to the label names used in the rest of the project.\n",
    "wisconsin_df['label'] = wisconsin_df['y'].map({'B': 'benign', 'M': 'malignant'})\n",
    "wisconsin_df.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70035f29",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## Summarize Labels And Features\n",
    "\n",
    "This cell saves the label counts and a summary table (mean, standard deviation, minimum, maximum) for the first ten of the 30 features. These outputs make the tabular dataset visible in the final report without modifying the source notebook.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3778d2cc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-19T20:17:09.767236Z",
     "iopub.status.busy": "2026-04-19T20:17:09.767136Z",
     "iopub.status.idle": "2026-04-19T20:17:09.789742Z",
     "shell.execute_reply": "2026-04-19T20:17:09.789213Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>benign</td>\n",
       "      <td>357</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>malignant</td>\n",
       "      <td>212</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       label  count\n",
       "0     benign    357\n",
       "1  malignant    212"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>x.radius_mean</th>\n",
       "      <td>14.127292</td>\n",
       "      <td>3.524049</td>\n",
       "      <td>6.98100</td>\n",
       "      <td>28.11000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.texture_mean</th>\n",
       "      <td>19.289649</td>\n",
       "      <td>4.301036</td>\n",
       "      <td>9.71000</td>\n",
       "      <td>39.28000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.perimeter_mean</th>\n",
       "      <td>91.969033</td>\n",
       "      <td>24.298981</td>\n",
       "      <td>43.79000</td>\n",
       "      <td>188.50000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.area_mean</th>\n",
       "      <td>654.889104</td>\n",
       "      <td>351.914129</td>\n",
       "      <td>143.50000</td>\n",
       "      <td>2501.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.smoothness_mean</th>\n",
       "      <td>0.096360</td>\n",
       "      <td>0.014064</td>\n",
       "      <td>0.05263</td>\n",
       "      <td>0.16340</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.compactness_mean</th>\n",
       "      <td>0.104341</td>\n",
       "      <td>0.052813</td>\n",
       "      <td>0.01938</td>\n",
       "      <td>0.34540</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.concavity_mean</th>\n",
       "      <td>0.088799</td>\n",
       "      <td>0.079720</td>\n",
       "      <td>0.00000</td>\n",
       "      <td>0.42680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.concave_pts_mean</th>\n",
       "      <td>0.048919</td>\n",
       "      <td>0.038803</td>\n",
       "      <td>0.00000</td>\n",
       "      <td>0.20120</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.symmetry_mean</th>\n",
       "      <td>0.181162</td>\n",
       "      <td>0.027414</td>\n",
       "      <td>0.10600</td>\n",
       "      <td>0.30400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>x.fractal_dim_mean</th>\n",
       "      <td>0.062798</td>\n",
       "      <td>0.007060</td>\n",
       "      <td>0.04996</td>\n",
       "      <td>0.09744</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                          mean         std        min         max\n",
       "x.radius_mean        14.127292    3.524049    6.98100    28.11000\n",
       "x.texture_mean       19.289649    4.301036    9.71000    39.28000\n",
       "x.perimeter_mean     91.969033   24.298981   43.79000   188.50000\n",
       "x.area_mean         654.889104  351.914129  143.50000  2501.00000\n",
       "x.smoothness_mean     0.096360    0.014064    0.05263     0.16340\n",
       "x.compactness_mean    0.104341    0.052813    0.01938     0.34540\n",
       "x.concavity_mean      0.088799    0.079720    0.00000     0.42680\n",
       "x.concave_pts_mean    0.048919    0.038803    0.00000     0.20120\n",
       "x.symmetry_mean       0.181162    0.027414    0.10600     0.30400\n",
       "x.fractal_dim_mean    0.062798    0.007060    0.04996     0.09744"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
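   ],
   "source": [
    "# Count rows per mapped label (benign / malignant).\n",
    "label_counts = wisconsin_df['label'].value_counts().rename_axis('label').reset_index(name='count')\n",
    "# Summarise the numeric features, excluding 'y' and 'label', and keep only the first ten rows.\n",
    "feature_summary = wisconsin_df.drop(columns=['y', 'label']).describe().T[['mean', 'std', 'min', 'max']].head(10)\n",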
    "label_counts.to_csv(REPORTS / 'wisconsin_label_counts.csv', index=False)\n",
    "feature_summary.to_csv(REPORTS / 'wisconsin_feature_summary_head.csv')\n",
    "display(label_counts)\n",
    "display(feature_summary)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31826a12",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## Inspect Source Notebook Structure\n",
    "\n",
    "This cell reads the Markdown headings from the published BreaScope AI notebook and displays the first 20. The listing gives a quick audit trail of what the original branch contains and supports later integration notes.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0e57cefc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-19T20:17:09.790982Z",
     "iopub.status.busy": "2026-04-19T20:17:09.790902Z",
     "iopub.status.idle": "2026-04-19T20:17:09.798710Z",
     "shell.execute_reply": "2026-04-19T20:17:09.798281Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>heading</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td># BreaScope AI: Bayesian Deep Learning for Bre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>## 1. Import Required Libraries</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>## 2. Load Dataset</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>### Class Label Encoding</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>### Preparing Features and Target Variables</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>## 3. Exploratory Data Analysis (EDA)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>### Summary Statistics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>## 3.1 Class Distribution</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>## 3.2 Feature Correlation Heatmap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>### 3.3 Feature Distribution (Histograms &amp; KDE)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>### Reflection</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>### 3.4 Box-Plots (Spread &amp; Outliers)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>### Reflection on Box-Plots (Outliers &amp; Spread)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>### 3.5 Skewness &amp; Kurtosis</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                              heading\n",
       "0   # BreaScope AI: Bayesian Deep Learning for Bre...\n",
       "1                     ## 1. Import Required Libraries\n",
       "2                                      ### Reflection\n",
       "3                                  ## 2. Load Dataset\n",
       "4                                      ### Reflection\n",
       "5                            ### Class Label Encoding\n",
       "6                                      ### Reflection\n",
       "7         ### Preparing Features and Target Variables\n",
       "8                                      ### Reflection\n",
       "9               ## 3. Exploratory Data Analysis (EDA)\n",
       "10                                     ### Reflection\n",
       "11                             ### Summary Statistics\n",
       "12                          ## 3.1 Class Distribution\n",
       "13                 ## 3.2 Feature Correlation Heatmap\n",
       "14                                     ### Reflection\n",
       "15    ### 3.3 Feature Distribution (Histograms & KDE)\n",
       "16                                     ### Reflection\n",
       "17              ### 3.4 Box-Plots (Spread & Outliers)\n",
       "18    ### Reflection on Box-Plots (Outliers & Spread)\n",
       "19                        ### 3.5 Skewness & Kurtosis"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
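   ],
   "source": [
    "import json\n",
    "\n",
    "# Parse the published notebook as plain JSON; it is only read, never executed or modified.\n",
    "with open(WISCONSIN_ROOT / 'BreaScope AI.ipynb', 'r', encoding='utf-8') as f:\n",
    "    wisconsin_nb = json.load(f)\n",
    "\n",
    "# Collect every line in a Markdown cell that starts with '#'. This also picks up\n",
    "# '#' comment lines inside fenced code in Markdown cells, if there are any.\n",
    "headings = []\n",
    "for cell in wisconsin_nb['cells']:\n",
    "    if cell.get('cell_type') == 'markdown':\n",
    "        for line in ''.join(cell.get('source', [])).splitlines():\n",
    "            if line.startswith('#'):\n",
    "                headings.append(line.strip())\n",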
    "headings_df = pd.DataFrame({'heading': headings})\n",
    "headings_df.head(20)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf100506",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## Record Published Metrics\n",
    "\n",
    "This cell stores the two metrics reported by the published Wisconsin branch: accuracy (0.96) and ROC AUC (0.997). The values are entered as literals rather than recomputed, so they are treated as published comparator values, not as results produced by this workflow.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "cc8d2d1c",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-19T20:17:09.799875Z",
     "iopub.status.busy": "2026-04-19T20:17:09.799797Z",
     "iopub.status.idle": "2026-04-19T20:17:09.803843Z",
     "shell.execute_reply": "2026-04-19T20:17:09.803454Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>source</th>\n",
       "      <th>metric</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Published Wisconsin notebook</td>\n",
       "      <td>accuracy</td>\n",
       "      <td>0.960</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Published Wisconsin notebook</td>\n",
       "      <td>roc_auc</td>\n",
       "      <td>0.997</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         source    metric  value\n",
       "0  Published Wisconsin notebook  accuracy  0.960\n",
       "1  Published Wisconsin notebook   roc_auc  0.997"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "published_metrics = pd.DataFrame(\n",
    "    [\n",
    "        {'source': 'Published Wisconsin notebook', 'metric': 'accuracy', 'value': 0.96},\n",
    "        {'source': 'Published Wisconsin notebook', 'metric': 'roc_auc', 'value': 0.997},\n",
    "    ]\n",
    ")\n",
    "published_metrics.to_csv(REPORTS / 'wisconsin_published_metrics.csv', index=False)\n",
    "published_metrics\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c82cf804",
   "metadata": {
    "codex_research_commentary": true
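   },
   "source": [
    "## Define Integration Contract\n",
    "\n",
    "This cell records what the web app and later notebooks need from the Wisconsin branch: the feature schema, the label mapping, the paths to the saved model (`model.pt`) and scaler (`scaler.joblib`), and the uncertainty method. The two paths are written as absolute paths on the machine that runs the notebook, so they need updating if the project is moved.\n"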
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d60d5797",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-19T20:17:09.805638Z",
     "iopub.status.busy": "2026-04-19T20:17:09.805485Z",
     "iopub.status.idle": "2026-04-19T20:17:09.810099Z",
     "shell.execute_reply": "2026-04-19T20:17:09.809722Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>item</th>\n",
       "      <th>detail</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>feature_schema</td>\n",
       "      <td>30 numeric diagnostic features after dropping ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>label_mapping</td>\n",
       "      <td>B -&gt; benign, M -&gt; malignant</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>saved_model</td>\n",
       "      <td>/Users/sergeysotskiy/Documents/UNI/year 3/Diss...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>saved_scaler</td>\n",
       "      <td>/Users/sergeysotskiy/Documents/UNI/year 3/Diss...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>uncertainty_method</td>\n",
       "      <td>Monte-Carlo Dropout reported in the published not...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 item                                             detail\n",
       "0      feature_schema  30 numeric diagnostic features after dropping ...\n",
       "1       label_mapping                        B -> benign, M -> malignant\n",
       "2         saved_model  /Users/sergeysotskiy/Documents/UNI/year 3/Diss...\n",
       "3        saved_scaler  /Users/sergeysotskiy/Documents/UNI/year 3/Diss...\n",
       "4  uncertainty_method  Monte-Carlo Dropout reported in the published not..."
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "integration_contract = pd.DataFrame(\n",
    "    [\n",
    "        {'item': 'feature_schema', 'detail': '30 numeric diagnostic features after dropping the CSV export index column.'},\n",
    "        {'item': 'label_mapping', 'detail': 'B -> benign, M -> malignant'},\n",
    "        {'item': 'saved_model', 'detail': str(WISCONSIN_ROOT / 'model.pt')},\n",
    "        {'item': 'saved_scaler', 'detail': str(WISCONSIN_ROOT / 'scaler.joblib')},\n",
    "        {'item': 'uncertainty_method', 'detail': 'Monte-Carlo Dropout reported in the published notebook'},\n",
    "    ]\n",
    ")\n",
    "integration_contract.to_csv(REPORTS / 'wisconsin_integration_contract.csv', index=False)\n",
    "integration_contract\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "187437d7",
   "metadata": {},
   "source": [
    "## Interpretation\n",
    "\n",
    "I am leaving the Wisconsin branch exactly as it is. At this stage I do not need another round of optimisation there; I need a clear record of what it expects and what it produces. That gives me a fixed tabular comparator, a stable input schema, and a clean handoff into the pairing notebooks without reopening a piece of work that was already finished.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ef24a96",
   "metadata": {
    "codex_research_commentary": true
   },
   "source": [
    "## How This Notebook Supports The Dissertation\n",
    "\n",
    "This notebook makes the tabular branch reusable without blurring ownership. It provides enough structure for comparison and deployment while preserving the rule that the original Wisconsin branch remains unchanged.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (dissertation_dl)",
   "language": "python",
   "name": "dissertation_dl"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
