Add complete Mail Fine-Tuning Web-App for macOS Apple Silicon

Implemented a full-stack web application for fine-tuning LLMs on email data, optimized for Apple Silicon (M4 Pro with 24GB RAM). Features: - Mail import with drag & drop support (.mbox, .eml, .txt) - Automated mail cleaning and preprocessing - Interactive labeling interface with keyboard shortcuts - Training data export to JSONL format - MLX-based LoRA fine-tuning with live updates - Model evaluation and comparison interface - Server-Sent Events for real-time training progress - Dark theme UI optimized for extended use Technical Stack: - Backend: FastAPI with SQLite database - Frontend: Vanilla HTML/CSS/JavaScript (no external dependencies) - ML Framework: MLX for Apple Silicon optimization - Models: Support for Mistral 7B and Llama 3 8B via MLX Components: - data_manager.py: SQLite operations for mail storage and labeling - mail_parser.py: Parser for multiple mail formats with cleaning - training.py: MLX training wrapper with LoRA support - inference.py: Model loading and inference for evaluation - main.py: FastAPI backend with REST API and SSE - Frontend: Complete UI with all features Documentation: - Comprehensive README with installation and usage guide - Quick-start guide for rapid setup - Example mails for testing - Troubleshooting and best practices Ready for local deployment and fine-tuning workflows.
2025-12-03 07:35:35 +00:00
commit 1456995462
20 changed files with 3884 additions and 0 deletions
@@ -0,0 +1,36 @@
 # Python
 __pycache__/
 *.py[cod]
 *$py.class
 *.so
 .Python
 venv/
 env/
 ENV/
 # Data
 data/*.db
 data/*.jsonl
 data/temp/
 # Models
 models/*
 !models/.gitkeep
 # Training outputs
 output/*
 !output/.gitkeep
 # IDE
 .vscode/
 .idea/
 *.swp
 *.swo
 *~
 # OS
 .DS_Store
 Thumbs.db
 # Logs
 *.log
@@ -0,0 +1,209 @@
 # Quick Start Guide
 Schnellstart-Anleitung für die Mail Fine-Tuning App.
 ## 1. Installation (5 Minuten)
 ```bash
 # 1. Virtual Environment erstellen
 python3 -m venv venv
 source venv/bin/activate
 # 2. Dependencies installieren
 pip install -r requirements.txt
 # 3. Modell herunterladen (ca. 4GB, dauert je nach Internetverbindung)
 huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.3-4bit \
    --local-dir models/Mistral-7B-Instruct-v0.3-4bit
 ```
 ## 2. Server starten
 ```bash
 ./start.sh
 ```
 Oder manuell:
 ```bash
 source venv/bin/activate
 cd backend
 python main.py
 ```
 App öffnen: **http://localhost:8000**
 ## 3. Erste Schritte (10 Minuten)
 ### Schritt 1: Test-Mails erstellen
 Erstelle eine Datei `test.txt` mit einer Beispiel-Mail:
 ```
 Subject: Projekt Update
 From: max@example.com
 To: team@example.com
 Hallo Team,
 das neue Feature ist fertig und bereit für Testing.
 Ich habe die API-Integration abgeschlossen und alle Tests laufen durch.
 Bitte reviewt den Code bis Freitag.
 Grüße
 Max
 ```
 ### Schritt 2: Mails importieren
 1. Öffne http://localhost:8000
 2. Ziehe `test.txt` in den Upload-Bereich
 3. Mail erscheint in der Liste
 ### Schritt 3: Erste Mail labeln
 1. Klicke auf "Labeling" in der Sidebar
 2. Wähle **Aufgabentyp**: "Zusammenfassen"
 3. Gib **erwarteten Output** ein:
   ```
   Max hat das neue Feature fertiggestellt und alle Tests sind erfolgreich.
   Das Team soll den Code bis Freitag reviewen.
   ```
 4. Klicke "Speichern" (oder drücke `S`)
 ### Schritt 4: Mehr Mails labeln
 - Erstelle mindestens **20-50 Beispiel-Mails**
 - Nutze verschiedene Typen:
  - Zusammenfassen
  - Antwort schreiben
  - Action Items extrahieren
 - Nutze Shortcuts: `N` (Nächste), `S` (Speichern)
 ### Schritt 5: Statistiken prüfen
 1. Gehe zu "Export & Stats"
 2. Prüfe:
   - Mind. 50 gelabelte Mails? ✅
   - Gute Verteilung der Task-Types? ✅
 ### Schritt 6: Training starten
 1. Gehe zu "Training"
 2. Wähle dein Modell aus
 3. Nutze Standard-Einstellungen:
   - Learning Rate: 1e-5
   - Epochs: 3
   - Batch Size: 4
   - LoRA Rank: 8
 4. Klicke "Training starten"
 5. Beobachte Live-Updates
 ⏱️ **Training dauert**: Ca. 5-10 Minuten bei 50 Beispielen
 ### Schritt 7: Modell testen
 1. Gehe zu "Evaluation"
 2. Klicke "Test-Beispiel laden"
 3. Klicke "Vergleich starten"
 4. Vergleiche Base- vs. Fine-tuned-Ausgabe
 ## Tipps
 ### Gute Trainingsdaten
 ✅ **DO**:
 - Mindestens 50 Beispiele
 - Konsistenter Output-Stil
 - Diverse Mail-Typen
 - Klare, eindeutige Labels
 ❌ **DON'T**:
 - Zu wenige Beispiele (<20)
 - Widersprüchliche Labels
 - Nur sehr ähnliche Mails
 - Zu lange Outputs (>500 Wörter)
 ### Training-Parameter
 Für **erste Versuche**:
 - Learning Rate: **1e-5**
 - Epochs: **3**
 - Batch Size: **4**
 - LoRA Rank: **8**
 Bei **Overfitting** (Val Loss steigt):
 - Learning Rate: **5e-6** (niedriger)
 - Epochs: **2** (weniger)
 Bei **Underfitting** (beide Losses hoch):
 - Epochs: **5** (mehr)
 - LoRA Rank: **16** (höher)
 - Mehr Daten sammeln!
 ### Keyboard Shortcuts
 Im Labeling-Interface:
 - `N` - Nächste Mail
 - `S` - Speichern
 - `K` - Skip (Überspringen)
 ## Troubleshooting
 ### Server startet nicht
 ```bash
 # Prüfe Python-Version (mind. 3.10)
 python3 --version
 # Prüfe ob Port 8000 frei ist
 lsof -i :8000
 # Nutze anderen Port
 uvicorn main:app --port 8001
 ```
 ### Modell nicht gefunden
 ```bash
 # Prüfe ob Modell existiert
 ls -la models/
 # Download nochmal versuchen
 huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.3-4bit \
    --local-dir models/Mistral-7B-Instruct-v0.3-4bit
 ```
 ### Out of Memory
 Reduziere Batch Size:
 1. Gehe zu "Training"
 2. Setze Batch Size auf **2** oder **1**
 ### Training sehr langsam
 - Nutze 4-bit quantisierte Modelle
 - Reduziere Batch Size
 - Schließe andere Programme
 ## Nächste Schritte
 Nach erfolgreichem ersten Training:
 1. **Mehr Daten sammeln**: 100+ Beispiele für bessere Ergebnisse
 2. **Parameter tunen**: Experimentiere mit Learning Rate und Epochs
 3. **Verschiedene Tasks**: Probiere alle Task-Types aus
 4. **Evaluation**: Teste ausgiebig mit neuen Mails
 ## Ressourcen
 - Vollständige Doku: [README.md](README.md)
 - MLX Doku: https://ml-explore.github.io/mlx/
 - MLX-LM: https://github.com/ml-explore/mlx-examples
 ---
 **Viel Erfolg! 🚀**
 Bei Fragen schaue ins vollständige README oder die API-Dokumentation.
@@ -0,0 +1,326 @@
 # Mail Fine-Tuning Web-App für macOS (Apple Silicon)
 Eine vollständige lokale Web-Anwendung für das Fine-Tuning von LLMs auf Mail-Daten, optimiert für Apple Silicon (M4 Pro mit 24GB RAM).
 ## Features
 - 📥 **Mail Import**: Drag & Drop Upload von .mbox, .eml, .txt Dateien mit automatischer Bereinigung
 - 🏷️ **Labeling Interface**: Komfortable UI zum manuellen Labeln von Mails
 - 📊 **Export & Statistiken**: JSONL Export für Training mit detaillierten Statistiken
 - 🤖 **Modell-Management**: Verwaltung von MLX-Modellen
 - 🎯 **Training**: LoRA Fine-Tuning mit Live-Updates und Visualisierung
 - 🧪 **Evaluation**: Chat-Interface mit Vergleichsmodus (Base vs. Fine-tuned)
 ## Technologie-Stack
 - **Backend**: Python (FastAPI)
 - **Frontend**: HTML/CSS/JavaScript (Vanilla, keine Dependencies)
 - **ML Framework**: MLX (Apple Silicon optimiert)
 - **Database**: SQLite
 - **Empfohlene Modelle**: Mistral 7B, Llama 3 8B (via MLX)
 ## Projektstruktur
 ```
 mail-finetuning/
 ├── backend/
 │   ├── main.py              # FastAPI App
 │   ├── mail_parser.py       # Mail Import & Bereinigung
 │   ├── data_manager.py      # SQLite Operationen
 │   ├── training.py          # MLX Training Wrapper
 │   └── inference.py         # Modell-Inferenz
 ├── frontend/
 │   ├── index.html
 │   ├── style.css
 │   └── app.js
 ├── data/
 │   ├── mails.db             # SQLite Datenbank
 │   ├── train.jsonl
 │   └── val.jsonl
 ├── models/                  # Heruntergeladene Modelle
 ├── output/                  # Trainierte Adapter
 └── requirements.txt
 ```
 ## Installation
 ### Voraussetzungen
 - macOS mit Apple Silicon (M1/M2/M3/M4)
 - Python 3.10 oder höher
 - mindestens 16GB RAM (24GB empfohlen)
 ### 1. Repository Setup
 ```bash
 cd training
 ```
 ### 2. Virtual Environment erstellen
 ```bash
 python3 -m venv venv
 source venv/bin/activate
 ```
 ### 3. Dependencies installieren
 ```bash
 pip install -r requirements.txt
 ```
 ### 4. Modell herunterladen
 Wähle ein MLX-optimiertes Modell von Hugging Face:
 ```bash
 # Mistral 7B (4-bit quantisiert, ~4GB)
 huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.3-4bit \
    --local-dir models/Mistral-7B-Instruct-v0.3-4bit
 # ODER Llama 3 8B (4-bit quantisiert, ~5GB)
 huggingface-cli download mlx-community/Meta-Llama-3-8B-Instruct-4bit \
    --local-dir models/Meta-Llama-3-8B-Instruct-4bit
 ```
 **Hinweis**: Die 4-bit Versionen sind für 24GB RAM optimal. Für mehr RAM können auch größere Versionen genutzt werden.
 ## Nutzung
 ### 1. Server starten
 ```bash
 cd backend
 python main.py
 ```
 Die App ist dann verfügbar unter: **http://localhost:8000**
 ### 2. Workflow
 #### Schritt 1: Mails importieren
 1. Gehe zu "Mail Import"
 2. Ziehe .eml, .mbox oder .txt Dateien per Drag & Drop in den Upload-Bereich
 3. Die Mails werden automatisch geparst und bereinigt
 #### Schritt 2: Mails labeln
 1. Wechsle zu "Labeling"
 2. Für jede Mail:
   - Wähle den **Aufgabentyp** (Zusammenfassen, Antwort schreiben, etc.)
   - Gib den **erwarteten Output** ein
   - Klicke "Speichern" oder nutze Shortcut `S`
 3. Nutze Shortcuts: `N` (Nächste), `S` (Speichern), `K` (Skip)
 **Tipp**: Mindestens 50 gelabelte Beispiele für gutes Fine-Tuning!
 #### Schritt 3: Daten exportieren
 1. Gehe zu "Export & Stats"
 2. Prüfe die Statistiken (mind. 50 gelabelte Mails empfohlen)
 3. Klicke "JSONL generieren"
 4. Optional: Download der JSONL-Dateien zur Archivierung
 #### Schritt 4: Training starten
 1. Wechsle zu "Training"
 2. Konfiguriere Parameter:
   - **Modell**: Wähle heruntergeladenes Modell
   - **Learning Rate**: Standard 1e-5 (bei Overfitting niedriger)
   - **Epochs**: 3-5 für erste Versuche
   - **Batch Size**: 4 (bei 24GB RAM sicher)
   - **LoRA Rank**: 8-16 (höher = mehr Kapazität, mehr RAM)
 3. Klicke "Training starten"
 4. Beobachte Live-Updates:
   - Training/Validation Loss
   - Fortschritt und ETA
   - Speichernutzung
 **Warnung bei Overfitting**: Wenn Validation Loss steigt während Training Loss sinkt, Training abbrechen!
 #### Schritt 5: Modell testen
 1. Gehe zu "Evaluation"
 2. Wähle Task-Type und gib Mail-Text ein
 3. Klicke "Vergleich starten"
 4. Sieh dir die Ausgaben von Base- und Fine-tuned-Modell an
 ### 3. Export des fertigen Modells
 Nach erfolgreichem Training liegen die LoRA-Adapter in `output/run_[timestamp]/adapters.npz`.
 Um das Modell zu nutzen:
 ```python
 from mlx_lm import load
 model = load(
    "models/Mistral-7B-Instruct-v0.3-4bit",
    adapter_path="output/run_1234567890/adapters.npz"
 )
 ```
 ## API Endpoints
 ### Mails
 - `POST /api/mails/upload` - Mails hochladen
 - `GET /api/mails` - Alle Mails abrufen
 - `GET /api/mails/{id}` - Einzelne Mail
 - `PUT /api/mails/{id}` - Mail aktualisieren (Labeling)
 - `DELETE /api/mails/{id}` - Mail löschen
 ### Export
 - `GET /api/export/stats` - Statistiken
 - `POST /api/export/jsonl` - Training-Daten generieren
 - `GET /api/export/download/{train|val}` - JSONL herunterladen
 ### Modelle
 - `GET /api/models` - Verfügbare Modelle
 - `POST /api/models/download` - Modell herunterladen (Placeholder)
 ### Training
 - `POST /api/training/start` - Training starten
 - `POST /api/training/stop` - Training stoppen
 - `GET /api/training/status` - Status abrufen
 - `GET /api/training/stream` - SSE Stream für Live-Updates
 ### Inference
 - `POST /api/inference/load` - Modell laden
 - `GET /api/inference/loaded` - Geladene Modelle
 - `POST /api/inference/generate` - Text generieren
 - `POST /api/inference/compare` - Modell-Vergleich
 - `GET /api/inference/test-prompts` - Test-Prompts
 ## Tipps & Best Practices
 ### Datenqualität
 - **Mindestens 50 Beispiele** pro Task-Type
 - **Einheitlicher Output-Stil**: Achte auf konsistente Formatierung
 - **Diverse Beispiele**: Verschiedene Mail-Längen und Stile
 - **Klare Labels**: Vermeide mehrdeutige oder widersprüchliche Labels
 ### Training
 - **Learning Rate**:
  - 1e-5 für die meisten Fälle
  - 5e-6 bei Overfitting
  - 1e-4 bei sehr kleinem Datensatz (Vorsicht!)
 - **Epochs**:
  - 3 Epochs für Start
  - Mehr Epochs wenn Loss noch sinkt
  - Weniger wenn Overfitting auftritt
 - **LoRA Rank**:
  - 8 für einfache Tasks
  - 16-32 für komplexe Tasks
  - Höher = mehr Kapazität aber mehr RAM
 ### Overfitting erkennen
 Zeichen von Overfitting:
 - ✅ Training Loss sinkt kontinuierlich
 - ❌ Validation Loss steigt oder stagniert
 - ❌ Modell "memoriert" exakte Trainingsbeispiele
 Lösungen:
 - Mehr Daten sammeln
 - Kleinere Learning Rate
 - Weniger Epochs
 - Niedrigere LoRA Rank
 ## Troubleshooting
 ### "Out of Memory" Fehler
 - Reduziere Batch Size (4 → 2 → 1)
 - Nutze kleineres Modell (4-bit quantisiert)
 - Schließe andere Programme
 ### Training sehr langsam
 - Prüfe ob Metal Performance Shaders aktiv sind
 - Nutze 4-bit quantisierte Modelle
 - Reduziere max_seq_length (Standard: 2048)
 ### Modell gibt schlechte Ergebnisse
 - Mehr/bessere Trainingsdaten
 - Längeres Training (mehr Epochs)
 - Höhere LoRA Rank
 - Prüfe Prompt-Format
 ## Wichtige Hinweise
 ### MLX Training Loop
 **WICHTIG**: Die aktuelle Implementierung in `training.py` enthält eine **simulierte Training Loop**. Für produktiven Einsatz muss diese durch echtes MLX Training ersetzt werden:
 ```python
 # Beispiel für echtes MLX Training mit mlx-lm
 from mlx_lm.tuner import train
 train(
    model_path=str(model_path),
    data_path=str(train_file),
    val_data_path=str(val_file),
    adapter_file=str(output_path / 'adapters.npz'),
    iters=total_steps,
    learning_rate=config.learning_rate,
    batch_size=config.batch_size,
    # ... weitere Parameter
 )
 ```
 Siehe [mlx-lm Dokumentation](https://github.com/ml-explore/mlx-examples/tree/main/llms) für Details.
 ### Inference
 Die Inference-Implementation in `inference.py` nutzt `mlx_lm.generate()`. Stelle sicher, dass das richtige Prompt-Format für dein Modell genutzt wird (z.B. ChatML, Llama-Format, etc.).
 ## Entwicklung
 ### Debug-Modus
 ```bash
 uvicorn main:app --reload --log-level debug
 ```
 ### Tests (TODO)
 ```bash
 pytest tests/
 ```
 ## Lizenz
 MIT License
 ## Support
 Bei Problemen:
 1. Prüfe die Browser Console (F12) für Frontend-Fehler
 2. Prüfe die Server-Logs für Backend-Fehler
 3. Stelle sicher, dass alle Dependencies installiert sind
 4. Prüfe, dass MLX korrekt auf Apple Silicon läuft
 ## Roadmap
 - [ ] Echte MLX Training Loop implementieren
 - [ ] Automatisches Checkpoint-Management
 - [ ] Model Merging (Base + Adapter zusammenführen)
 - [ ] Export für Deployment
 - [ ] Batch-Inference
 - [ ] Tests
 - [ ] Docker Support
 ---
 **Viel Erfolg beim Fine-Tuning! 🚀**
@@ -0,0 +1,286 @@
 """
 Data Manager für Mail Fine-Tuning App
 Verwaltet SQLite Datenbank für Mails und Labels
 """
 import sqlite3
 import json
 from datetime import datetime
 from typing import List, Dict, Optional
 from pathlib import Path
 class DataManager:
    def __init__(self, db_path: str = "data/mails.db"):
        self.db_path = Path(db_path)
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        self.init_db()
    def init_db(self):
        """Initialisiert die Datenbank mit dem Schema"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS mails (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                subject TEXT,
                sender TEXT,
                recipient TEXT,
                date TEXT,
                body TEXT NOT NULL,
                original_format TEXT,
                task_type TEXT DEFAULT 'unlabeled',
                expected_output TEXT,
                status TEXT DEFAULT 'unlabeled',
                created_at TEXT DEFAULT CURRENT_TIMESTAMP,
                updated_at TEXT DEFAULT CURRENT_TIMESTAMP
            )
        """)
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS training_runs (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                model_name TEXT NOT NULL,
                start_time TEXT,
                end_time TEXT,
                config TEXT,
                status TEXT,
                final_train_loss REAL,
                final_val_loss REAL,
                checkpoint_path TEXT
            )
        """)
        conn.commit()
        conn.close()
    def add_mail(self, subject: str, sender: str, recipient: str,
                 date: str, body: str, original_format: str) -> int:
        """Fügt eine neue Mail hinzu"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("""
            INSERT INTO mails (subject, sender, recipient, date, body, original_format)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (subject, sender, recipient, date, body, original_format))
        mail_id = cursor.lastrowid
        conn.commit()
        conn.close()
        return mail_id
    def get_all_mails(self, status_filter: Optional[str] = None) -> List[Dict]:
        """Holt alle Mails, optional gefiltert nach Status"""
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        cursor = conn.cursor()
        if status_filter:
            cursor.execute("SELECT * FROM mails WHERE status = ? ORDER BY id", (status_filter,))
        else:
            cursor.execute("SELECT * FROM mails ORDER BY id")
        rows = cursor.fetchall()
        mails = [dict(row) for row in rows]
        conn.close()
        return mails
    def get_mail(self, mail_id: int) -> Optional[Dict]:
        """Holt eine einzelne Mail"""
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM mails WHERE id = ?", (mail_id,))
        row = cursor.fetchone()
        conn.close()
        return dict(row) if row else None
    def update_mail(self, mail_id: int, task_type: Optional[str] = None,
                   expected_output: Optional[str] = None,
                   status: Optional[str] = None,
                   body: Optional[str] = None) -> bool:
        """Aktualisiert eine Mail (Labeling)"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        updates = []
        params = []
        if task_type is not None:
            updates.append("task_type = ?")
            params.append(task_type)
        if expected_output is not None:
            updates.append("expected_output = ?")
            params.append(expected_output)
        if status is not None:
            updates.append("status = ?")
            params.append(status)
        if body is not None:
            updates.append("body = ?")
            params.append(body)
        if not updates:
            conn.close()
            return False
        updates.append("updated_at = ?")
        params.append(datetime.now().isoformat())
        params.append(mail_id)
        query = f"UPDATE mails SET {', '.join(updates)} WHERE id = ?"
        cursor.execute(query, params)
        success = cursor.rowcount > 0
        conn.commit()
        conn.close()
        return success
    def delete_mail(self, mail_id: int) -> bool:
        """Löscht eine Mail"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("DELETE FROM mails WHERE id = ?", (mail_id,))
        success = cursor.rowcount > 0
        conn.commit()
        conn.close()
        return success
    def get_statistics(self) -> Dict:
        """Berechnet Statistiken über die Daten"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        # Gesamt-Anzahl
        cursor.execute("SELECT COUNT(*) FROM mails")
        total = cursor.fetchone()[0]
        # Nach Status
        cursor.execute("""
            SELECT status, COUNT(*) as count
            FROM mails
            GROUP BY status
        """)
        status_counts = {row[0]: row[1] for row in cursor.fetchall()}
        # Nach Task-Type
        cursor.execute("""
            SELECT task_type, COUNT(*) as count
            FROM mails
            WHERE status = 'labeled'
            GROUP BY task_type
        """)
        task_counts = {row[0]: row[1] for row in cursor.fetchall()}
        # Durchschnittliche Längen (nur gelabelte)
        cursor.execute("""
            SELECT
                AVG(LENGTH(body)) as avg_input_length,
                AVG(LENGTH(expected_output)) as avg_output_length
            FROM mails
            WHERE status = 'labeled'
        """)
        lengths = cursor.fetchone()
        conn.close()
        labeled_count = status_counts.get('labeled', 0)
        return {
            'total': total,
            'labeled': labeled_count,
            'unlabeled': status_counts.get('unlabeled', 0),
            'skipped': status_counts.get('skip', 0),
            'task_distribution': task_counts,
            'avg_input_length': round(lengths[0]) if lengths[0] else 0,
            'avg_output_length': round(lengths[1]) if lengths[1] else 0,
            'sufficient_data': labeled_count >= 50
        }
    def export_training_data(self, train_split: float = 0.9) -> tuple[List[Dict], List[Dict]]:
        """Exportiert gelabelte Daten für Training"""
        import random
        conn = sqlite3.connect(self.db_path)
        conn.row_factory = sqlite3.Row
        cursor = conn.cursor()
        cursor.execute("""
            SELECT body, task_type, expected_output
            FROM mails
            WHERE status = 'labeled' AND expected_output IS NOT NULL
            ORDER BY RANDOM()
        """)
        rows = cursor.fetchall()
        conn.close()
        if not rows:
            return [], []
        data = [dict(row) for row in rows]
        # Shuffle
        random.shuffle(data)
        # Split
        split_idx = int(len(data) * train_split)
        train_data = data[:split_idx]
        val_data = data[split_idx:]
        return train_data, val_data
    def save_training_run(self, model_name: str, config: Dict,
                         checkpoint_path: str) -> int:
        """Speichert einen Training-Run"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("""
            INSERT INTO training_runs
            (model_name, start_time, config, status, checkpoint_path)
            VALUES (?, ?, ?, ?, ?)
        """, (
            model_name,
            datetime.now().isoformat(),
            json.dumps(config),
            'running',
            checkpoint_path
        ))
        run_id = cursor.lastrowid
        conn.commit()
        conn.close()
        return run_id
    def update_training_run(self, run_id: int, status: str,
                          train_loss: Optional[float] = None,
                          val_loss: Optional[float] = None):
        """Aktualisiert einen Training-Run"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        cursor.execute("""
            UPDATE training_runs
            SET status = ?,
                end_time = ?,
                final_train_loss = COALESCE(?, final_train_loss),
                final_val_loss = COALESCE(?, final_val_loss)
            WHERE id = ?
        """, (status, datetime.now().isoformat(), train_loss, val_loss, run_id))
        conn.commit()
        conn.close()
@@ -0,0 +1,209 @@
 """
 Inference Module für Modell-Evaluation
 Lädt Base- und Fine-tuned Models für Vergleiche
 """
 from pathlib import Path
 from typing import Optional, Dict
 import threading
 class ModelInference:
    """Handhabt Modell-Inferenz für Base und Fine-tuned Models"""
    def __init__(self, models_dir: str = "models", output_dir: str = "output"):
        self.models_dir = Path(models_dir)
        self.output_dir = Path(output_dir)
        self.base_model = None
        self.finetuned_model = None
        self.model_lock = threading.Lock()
    def load_base_model(self, model_name: str) -> bool:
        """Lädt das Basis-Modell"""
        try:
            # Import MLX nur bei Bedarf
            from mlx_lm import load
            model_path = self.models_dir / model_name
            if not model_path.exists():
                return False
            with self.model_lock:
                self.base_model = load(str(model_path))
            return True
        except Exception as e:
            print(f"Error loading base model: {e}")
            return False
    def load_finetuned_model(self, model_name: str, adapter_path: str) -> bool:
        """Lädt das Fine-tuned Modell (Base + LoRA Adapter)"""
        try:
            from mlx_lm import load
            model_path = self.models_dir / model_name
            adapter_file = Path(adapter_path)
            if not model_path.exists() or not adapter_file.exists():
                return False
            with self.model_lock:
                # Lade Base Model mit Adapter
                self.finetuned_model = load(
                    str(model_path),
                    adapter_path=str(adapter_file)
                )
            return True
        except Exception as e:
            print(f"Error loading finetuned model: {e}")
            return False
    def generate(self, prompt: str, model_type: str = 'base',
                max_tokens: int = 512, temperature: float = 0.7) -> str:
        """
        Generiert Text mit dem gewählten Modell
        Args:
            prompt: Input prompt
            model_type: 'base' oder 'finetuned'
            max_tokens: Maximale Anzahl Tokens
            temperature: Sampling temperature
        Returns:
            Generierter Text
        """
        try:
            from mlx_lm import generate as mlx_generate
            model = self.base_model if model_type == 'base' else self.finetuned_model
            if model is None:
                return f"Error: {model_type} model not loaded"
            with self.model_lock:
                # MLX-LM generate
                result = mlx_generate(
                    model,
                    prompt=prompt,
                    max_tokens=max_tokens,
                    temp=temperature
                )
            return result
        except Exception as e:
            return f"Error during generation: {str(e)}"
    def generate_comparison(self, prompt: str, max_tokens: int = 512,
                          temperature: float = 0.7) -> Dict[str, str]:
        """
        Generiert mit beiden Modellen für Vergleich
        Returns:
            Dict mit 'base' und 'finetuned' Outputs
        """
        result = {
            'base': None,
            'finetuned': None
        }
        if self.base_model:
            result['base'] = self.generate(
                prompt, 'base', max_tokens, temperature
            )
        if self.finetuned_model:
            result['finetuned'] = self.generate(
                prompt, 'finetuned', max_tokens, temperature
            )
        return result
    def format_mail_prompt(self, task_type: str, mail_body: str) -> str:
        """Formatiert einen Prompt basierend auf Task-Type"""
        task_prompts = {
            'Zusammenfassen': 'Fasse folgende E-Mail zusammen:',
            'Antwort schreiben': 'Schreibe eine Antwort auf folgende E-Mail:',
            'Kategorisieren': 'Kategorisiere folgende E-Mail:',
            'Action Items': 'Extrahiere die Action Items aus folgender E-Mail:',
            'Custom': 'Bearbeite folgende E-Mail:'
        }
        instruction = task_prompts.get(task_type, task_prompts['Custom'])
        return f"{instruction}\n\n{mail_body}"
    def get_test_prompts(self) -> Dict[str, str]:
        """Vordefinierte Test-Prompts"""
        return {
            'Zusammenfassen': self.format_mail_prompt(
                'Zusammenfassen',
                """Betreff: Q4 Projektupdate
 Hallo Team,
 ich wollte euch ein kurzes Update zum aktuellen Projektstand geben.
 Wir haben letzte Woche die neue API-Integration abgeschlossen und erfolgreich getestet.
 Die Performance-Tests zeigen eine Verbesserung von 40% gegenüber der alten Implementierung.
 Nächste Woche starten wir mit der Frontend-Anpassung. Maria und Tom werden das Design
 überarbeiten, während ich mich um die Backend-Anbindung kümmere.
 Der Go-Live ist weiterhin für Ende des Monats geplant.
 Beste Grüße
 Alex"""
            ),
            'Antwort schreiben': self.format_mail_prompt(
                'Antwort schreiben',
                """Betreff: Frage zu Invoice #2847
 Hallo,
 ich habe eine Frage zur Rechnung #2847 vom 15. März.
 Der Betrag scheint nicht mit unserem Angebot übereinzustimmen.
 Könnten Sie das bitte prüfen?
 Danke
 Michael"""
            ),
            'Action Items': self.format_mail_prompt(
                'Action Items',
                """Betreff: Meeting Notes - Produktlaunch
 Hi alle,
 hier die wichtigsten Punkte vom heutigen Meeting:
 - Sarah bereitet die Pressemitteilung vor (Deadline: Freitag)
 - Marketing-Team erstellt Social Media Content (nächste Woche)
 - Ich kümmere mich um die Influencer-Kontakte
 - Wir brauchen noch finale Produktfotos vom Design-Team
 - Launch-Event ist am 1. April - Location muss noch gebucht werden
 Bitte gebt bis Mittwoch Bescheid ob ihr eure Aufgaben schaffen könnt.
 Lisa"""
            )
        }
    def unload_models(self):
        """Entlädt Modelle aus dem Speicher"""
        with self.model_lock:
            self.base_model = None
            self.finetuned_model = None
    def get_loaded_models(self) -> Dict[str, bool]:
        """Gibt zurück welche Modelle geladen sind"""
        return {
            'base': self.base_model is not None,
            'finetuned': self.finetuned_model is not None
        }
@@ -0,0 +1,264 @@
 """
 Mail Parser für verschiedene Formate
 Bereinigt und normalisiert Mail-Inhalte
 """
 import email
 import mailbox
 import re
 from bs4 import BeautifulSoup
 from typing import List, Dict, Optional
 from pathlib import Path
 import chardet
 class MailParser:
    """Parst und bereinigt Mail-Dateien"""
    # Häufige Footer/Disclaimer Pattern
    FOOTER_PATTERNS = [
        r'(?i)^--\s*$.*',  # Standard signature delimiter
        r'(?i)Diese E-Mail.*vertraulich.*',
        r'(?i)This email.*confidential.*',
        r'(?i)Disclaimer:.*',
        r'(?i)Get Outlook for.*',
        r'(?i)Sent from my iPhone.*',
        r'(?i)Von meinem.*gesendet.*',
        r'(?i)Diese Nachricht.*Virenfrei.*',
    ]
    @staticmethod
    def detect_encoding(file_path: Path) -> str:
        """Erkennt das Encoding einer Datei"""
        with open(file_path, 'rb') as f:
            raw_data = f.read()
            result = chardet.detect(raw_data)
            return result['encoding'] or 'utf-8'
    @staticmethod
    def html_to_text(html: str) -> str:
        """Konvertiert HTML zu Plain Text"""
        soup = BeautifulSoup(html, 'html.parser')
        # Entferne Script und Style Tags
        for script in soup(['script', 'style']):
            script.decompose()
        # Extrahiere Text
        text = soup.get_text()
        # Bereinige Whitespace
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = ' '.join(chunk for chunk in chunks if chunk)
        return text
    @staticmethod
    def remove_multiple_newlines(text: str) -> str:
        """Entfernt mehrfache Leerzeilen"""
        return re.sub(r'\n{3,}', '\n\n', text)
    @staticmethod
    def remove_footers(text: str) -> str:
        """Entfernt häufige Footer und Disclaimer"""
        for pattern in MailParser.FOOTER_PATTERNS:
            # Suche Pattern und entferne alles danach
            match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
            if match:
                text = text[:match.start()].strip()
        return text
    @staticmethod
    def clean_quoted_text(text: str) -> str:
        """Entfernt oder markiert quoted Text (> oder |)"""
        lines = text.split('\n')
        cleaned_lines = []
        for line in lines:
            # Überspringe Zeilen die mit > oder | beginnen (quoted text)
            if not line.strip().startswith('>') and not line.strip().startswith('|'):
                cleaned_lines.append(line)
        return '\n'.join(cleaned_lines)
    @staticmethod
    def normalize_whitespace(text: str) -> str:
        """Normalisiert Whitespace"""
        # Entferne trailing spaces
        lines = [line.rstrip() for line in text.split('\n')]
        text = '\n'.join(lines)
        # Entferne mehrfache Spaces
        text = re.sub(r' {2,}', ' ', text)
        # Entferne mehrfache Leerzeilen
        text = MailParser.remove_multiple_newlines(text)
        return text.strip()
    @staticmethod
    def clean_text(text: str, is_html: bool = False) -> str:
        """Vollständige Bereinigung eines Texts"""
        if is_html:
            text = MailParser.html_to_text(text)
        text = MailParser.remove_footers(text)
        text = MailParser.clean_quoted_text(text)
        text = MailParser.normalize_whitespace(text)
        return text
    @staticmethod
    def parse_eml(file_path: Path) -> Dict:
        """Parst eine .eml Datei"""
        encoding = MailParser.detect_encoding(file_path)
        with open(file_path, 'r', encoding=encoding, errors='ignore') as f:
            msg = email.message_from_file(f)
        subject = msg.get('Subject', 'No Subject')
        sender = msg.get('From', 'Unknown')
        recipient = msg.get('To', 'Unknown')
        date = msg.get('Date', '')
        # Body extrahieren
        body = ""
        is_html = False
        if msg.is_multipart():
            for part in msg.walk():
                content_type = part.get_content_type()
                if content_type == 'text/plain':
                    body = part.get_payload(decode=True).decode(errors='ignore')
                    break
                elif content_type == 'text/html' and not body:
                    body = part.get_payload(decode=True).decode(errors='ignore')
                    is_html = True
        else:
            body = msg.get_payload(decode=True).decode(errors='ignore')
            if msg.get_content_type() == 'text/html':
                is_html = True
        # Bereinige Body
        body = MailParser.clean_text(body, is_html)
        return {
            'subject': subject,
            'sender': sender,
            'recipient': recipient,
            'date': date,
            'body': body,
            'original_format': 'eml'
        }
    @staticmethod
    def parse_mbox(file_path: Path) -> List[Dict]:
        """Parst eine .mbox Datei"""
        mails = []
        try:
            mbox = mailbox.mbox(str(file_path))
            for message in mbox:
                subject = message.get('Subject', 'No Subject')
                sender = message.get('From', 'Unknown')
                recipient = message.get('To', 'Unknown')
                date = message.get('Date', '')
                body = ""
                is_html = False
                if message.is_multipart():
                    for part in message.walk():
                        content_type = part.get_content_type()
                        if content_type == 'text/plain':
                            payload = part.get_payload(decode=True)
                            if payload:
                                body = payload.decode(errors='ignore')
                            break
                        elif content_type == 'text/html' and not body:
                            payload = part.get_payload(decode=True)
                            if payload:
                                body = payload.decode(errors='ignore')
                                is_html = True
                else:
                    payload = message.get_payload(decode=True)
                    if payload:
                        body = payload.decode(errors='ignore')
                        if message.get_content_type() == 'text/html':
                            is_html = True
                body = MailParser.clean_text(body, is_html)
                mails.append({
                    'subject': subject,
                    'sender': sender,
                    'recipient': recipient,
                    'date': date,
                    'body': body,
                    'original_format': 'mbox'
                })
        except Exception as e:
            raise Exception(f"Error parsing mbox: {str(e)}")
        return mails
    @staticmethod
    def parse_txt(file_path: Path) -> Dict:
        """Parst eine .txt Datei (simple Mail als Text)"""
        encoding = MailParser.detect_encoding(file_path)
        with open(file_path, 'r', encoding=encoding, errors='ignore') as f:
            content = f.read()
        # Einfache Struktur: Versuche Subject/From/To zu erkennen
        lines = content.split('\n')
        subject = 'No Subject'
        sender = 'Unknown'
        recipient = 'Unknown'
        date = ''
        body_start = 0
        for i, line in enumerate(lines[:10]):  # Erste 10 Zeilen prüfen
            if line.lower().startswith('subject:'):
                subject = line[8:].strip()
                body_start = max(body_start, i + 1)
            elif line.lower().startswith('from:'):
                sender = line[5:].strip()
                body_start = max(body_start, i + 1)
            elif line.lower().startswith('to:'):
                recipient = line[3:].strip()
                body_start = max(body_start, i + 1)
            elif line.lower().startswith('date:'):
                date = line[5:].strip()
                body_start = max(body_start, i + 1)
        # Body ist der Rest
        body = '\n'.join(lines[body_start:])
        body = MailParser.clean_text(body)
        return {
            'subject': subject,
            'sender': sender,
            'recipient': recipient,
            'date': date,
            'body': body,
            'original_format': 'txt'
        }
    @staticmethod
    def parse_file(file_path: Path) -> List[Dict]:
        """Parst eine Mail-Datei basierend auf Endung"""
        suffix = file_path.suffix.lower()
        if suffix == '.eml':
            return [MailParser.parse_eml(file_path)]
        elif suffix == '.mbox':
            return MailParser.parse_mbox(file_path)
        elif suffix == '.txt':
            return [MailParser.parse_txt(file_path)]
        else:
            raise ValueError(f"Unsupported file format: {suffix}")
@@ -0,0 +1,396 @@
 """
 FastAPI Backend für Mail Fine-Tuning App
 Hauptanwendung mit allen API Endpoints
 """
 from fastapi import FastAPI, File, UploadFile, HTTPException, BackgroundTasks
 from fastapi.responses import StreamingResponse, FileResponse
 from fastapi.staticfiles import StaticFiles
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
 from typing import Optional, List
 import asyncio
 import json
 from pathlib import Path
 import shutil
 from data_manager import DataManager
 from mail_parser import MailParser
 from training import MLXTrainer, TrainingConfig
 from inference import ModelInference
 # FastAPI App
 app = FastAPI(title="Mail Fine-Tuning App")
 # CORS
 app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
 )
 # Initialisiere Manager
 data_manager = DataManager("data/mails.db")
 trainer = MLXTrainer("models", "output")
 inference = ModelInference("models", "output")
 # Pydantic Models
 class MailUpdate(BaseModel):
    task_type: Optional[str] = None
    expected_output: Optional[str] = None
    status: Optional[str] = None
    body: Optional[str] = None
 class TrainingStartRequest(BaseModel):
    model_name: str
    learning_rate: float = 1e-5
    epochs: int = 3
    batch_size: int = 4
    lora_rank: int = 8
 class InferenceRequest(BaseModel):
    prompt: str
    model_type: str = 'base'
    max_tokens: int = 512
    temperature: float = 0.7
 class InferenceComparisonRequest(BaseModel):
    task_type: str
    mail_body: str
    max_tokens: int = 512
    temperature: float = 0.7
 # ===== Mail Endpoints =====
@app.post("/api/mails/upload")
 async def upload_mails(files: List[UploadFile] = File(...)):
    """Upload und Parse von Mail-Dateien"""
    results = {
        'success': [],
        'errors': []
    }
    for file in files:
        try:
            # Temporär speichern
            temp_path = Path("data/temp") / file.filename
            temp_path.parent.mkdir(parents=True, exist_ok=True)
            with open(temp_path, 'wb') as f:
                content = await file.read()
                f.write(content)
            # Parse Mails
            parsed_mails = MailParser.parse_file(temp_path)
            # In DB speichern
            for mail in parsed_mails:
                mail_id = data_manager.add_mail(
                    subject=mail['subject'],
                    sender=mail['sender'],
                    recipient=mail['recipient'],
                    date=mail['date'],
                    body=mail['body'],
                    original_format=mail['original_format']
                )
            results['success'].append({
                'filename': file.filename,
                'count': len(parsed_mails)
            })
            # Cleanup
            temp_path.unlink()
        except Exception as e:
            results['errors'].append({
                'filename': file.filename,
                'error': str(e)
            })
    return results
@app.get("/api/mails")
 async def get_mails(status: Optional[str] = None):
    """Liste aller Mails"""
    mails = data_manager.get_all_mails(status_filter=status)
    return {'mails': mails}
@app.get("/api/mails/{mail_id}")
 async def get_mail(mail_id: int):
    """Einzelne Mail abrufen"""
    mail = data_manager.get_mail(mail_id)
    if not mail:
        raise HTTPException(status_code=404, detail="Mail not found")
    return mail
@app.put("/api/mails/{mail_id}")
 async def update_mail(mail_id: int, update: MailUpdate):
    """Mail aktualisieren (Labeling)"""
    success = data_manager.update_mail(
        mail_id=mail_id,
        task_type=update.task_type,
        expected_output=update.expected_output,
        status=update.status,
        body=update.body
    )
    if not success:
        raise HTTPException(status_code=404, detail="Mail not found")
    return {'success': True}
@app.delete("/api/mails/{mail_id}")
 async def delete_mail(mail_id: int):
    """Mail löschen"""
    success = data_manager.delete_mail(mail_id)
    if not success:
        raise HTTPException(status_code=404, detail="Mail not found")
    return {'success': True}
 # ===== Export Endpoints =====
@app.get("/api/export/stats")
 async def get_stats():
    """Statistiken abrufen"""
    stats = data_manager.get_statistics()
    return stats
@app.post("/api/export/jsonl")
 async def export_jsonl(train_split: float = 0.9):
    """Exportiert Training-Daten als JSONL"""
    train_data, val_data = data_manager.export_training_data(train_split)
    if not train_data:
        raise HTTPException(status_code=400, detail="No labeled data available")
    # Speichere Files
    data_dir = Path("data")
    train_file = data_dir / "train.jsonl"
    val_file = data_dir / "val.jsonl"
    train_file_path, val_file_path = trainer.prepare_training_data(
        train_data, val_data, data_dir
    )
    return {
        'success': True,
        'train_samples': len(train_data),
        'val_samples': len(val_data),
        'train_file': str(train_file),
        'val_file': str(val_file)
    }
@app.get("/api/export/download/{file_type}")
 async def download_file(file_type: str):
    """Download JSONL Files"""
    if file_type not in ['train', 'val']:
        raise HTTPException(status_code=400, detail="Invalid file type")
    file_path = Path("data") / f"{file_type}.jsonl"
    if not file_path.exists():
        raise HTTPException(status_code=404, detail="File not found")
    return FileResponse(
        path=file_path,
        filename=f"{file_type}.jsonl",
        media_type='application/json'
    )
 # ===== Model Endpoints =====
@app.get("/api/models")
 async def list_models():
    """Liste verfügbarer Modelle"""
    models = trainer.list_available_models()
    return {'models': models}
@app.post("/api/models/download")
 async def download_model(model_name: str):
    """
    Lädt ein Modell herunter
    Placeholder - würde in echter Implementation huggingface nutzen
    """
    success = trainer.download_model(model_name)
    if not success:
        raise HTTPException(
            status_code=501,
            detail="Model download not implemented. Please download manually."
        )
    return {'success': True}
 # ===== Training Endpoints =====
@app.post("/api/training/start")
 async def start_training(request: TrainingStartRequest, background_tasks: BackgroundTasks):
    """Startet Training"""
    # Hole Training-Daten
    train_data, val_data = data_manager.export_training_data()
    if not train_data:
        raise HTTPException(status_code=400, detail="No labeled data available")
    if len(train_data) < 10:
        raise HTTPException(
            status_code=400,
            detail=f"Not enough training data. Need at least 10, got {len(train_data)}"
        )
    # Training Config
    config = TrainingConfig(
        model_name=request.model_name,
        learning_rate=request.learning_rate,
        epochs=request.epochs,
        batch_size=request.batch_size,
        lora_rank=request.lora_rank
    )
    # Starte Training
    success = trainer.start_training(config, train_data, val_data)
    if not success:
        raise HTTPException(status_code=400, detail="Training already running")
    return {'success': True, 'message': 'Training started'}
@app.post("/api/training/stop")
 async def stop_training():
    """Stoppt Training"""
    success = trainer.stop_training()
    if not success:
        raise HTTPException(status_code=400, detail="No training running")
    return {'success': True, 'message': 'Training stopped'}
@app.get("/api/training/status")
 async def get_training_status():
    """Gibt aktuellen Training-Status zurück"""
    status = trainer.get_status()
    return status
@app.get("/api/training/stream")
 async def stream_training_status():
    """
    Server-Sent Events für Live-Updates
    """
    async def event_generator():
        while True:
            status = trainer.get_status()
            # Sende Status als SSE
            yield f"data: {json.dumps(status)}\n\n"
            # Stop wenn Training fertig
            if not status['is_training'] and status['current_step'] > 0:
                break
            await asyncio.sleep(1)
    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream"
    )
 # ===== Inference Endpoints =====
@app.post("/api/inference/load")
 async def load_model(model_type: str, model_name: str, adapter_path: Optional[str] = None):
    """Lädt ein Modell für Inference"""
    if model_type == 'base':
        success = inference.load_base_model(model_name)
    elif model_type == 'finetuned':
        if not adapter_path:
            raise HTTPException(status_code=400, detail="adapter_path required for finetuned model")
        success = inference.load_finetuned_model(model_name, adapter_path)
    else:
        raise HTTPException(status_code=400, detail="Invalid model_type")
    if not success:
        raise HTTPException(status_code=400, detail="Failed to load model")
    return {'success': True}
@app.get("/api/inference/loaded")
 async def get_loaded_models():
    """Gibt zurück welche Modelle geladen sind"""
    loaded = inference.get_loaded_models()
    return loaded
@app.post("/api/inference/generate")
 async def generate_text(request: InferenceRequest):
    """Generiert Text mit geladenem Modell"""
    result = inference.generate(
        prompt=request.prompt,
        model_type=request.model_type,
        max_tokens=request.max_tokens,
        temperature=request.temperature
    )
    return {'result': result}
@app.post("/api/inference/compare")
 async def compare_models(request: InferenceComparisonRequest):
    """Vergleicht Base und Fine-tuned Model"""
    prompt = inference.format_mail_prompt(
        request.task_type,
        request.mail_body
    )
    result = inference.generate_comparison(
        prompt=prompt,
        max_tokens=request.max_tokens,
        temperature=request.temperature
    )
    return result
@app.get("/api/inference/test-prompts")
 async def get_test_prompts():
    """Gibt vordefinierte Test-Prompts zurück"""
    prompts = inference.get_test_prompts()
    return prompts
 # ===== Static Files =====
 # Serve Frontend
 app.mount("/", StaticFiles(directory="frontend", html=True), name="frontend")
 if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
@@ -0,0 +1,321 @@
 """
 MLX Training Wrapper für Fine-Tuning
 Nutzt mlx-lm für LoRA Fine-Tuning
 """
 import json
 import time
 import psutil
 from pathlib import Path
 from typing import Dict, List, Callable, Optional
 from dataclasses import dataclass
 import threading
 import queue
@dataclass
 class TrainingConfig:
    """Training Konfiguration"""
    model_name: str
    learning_rate: float = 1e-5
    epochs: int = 3
    batch_size: int = 4
    lora_rank: int = 8
    lora_alpha: int = 16
    max_seq_length: int = 2048
    val_every: int = 50
 class TrainingStatus:
    """Verwaltet den aktuellen Training-Status"""
    def __init__(self):
        self.is_training = False
        self.should_stop = False
        self.current_step = 0
        self.total_steps = 0
        self.current_epoch = 0
        self.train_loss = 0.0
        self.val_loss = 0.0
        self.train_loss_history = []
        self.val_loss_history = []
        self.start_time = None
        self.error = None
    def reset(self):
        """Setzt den Status zurück"""
        self.is_training = False
        self.should_stop = False
        self.current_step = 0
        self.total_steps = 0
        self.current_epoch = 0
        self.train_loss = 0.0
        self.val_loss = 0.0
        self.train_loss_history = []
        self.val_loss_history = []
        self.start_time = None
        self.error = None
    def to_dict(self) -> Dict:
        """Konvertiert zu Dictionary für API"""
        eta = None
        if self.is_training and self.current_step > 0 and self.start_time:
            elapsed = time.time() - self.start_time
            steps_remaining = self.total_steps - self.current_step
            eta = int((elapsed / self.current_step) * steps_remaining)
        memory_usage = psutil.virtual_memory().percent
        return {
            'is_training': self.is_training,
            'current_step': self.current_step,
            'total_steps': self.total_steps,
            'current_epoch': self.current_epoch,
            'train_loss': round(self.train_loss, 4) if self.train_loss else None,
            'val_loss': round(self.val_loss, 4) if self.val_loss else None,
            'train_loss_history': [round(l, 4) for l in self.train_loss_history],
            'val_loss_history': [round(l, 4) for l in self.val_loss_history],
            'eta_seconds': eta,
            'memory_usage_percent': memory_usage,
            'error': self.error
        }
 class MLXTrainer:
    """Wrapper für MLX Training"""
    def __init__(self, models_dir: str = "models", output_dir: str = "output"):
        self.models_dir = Path(models_dir)
        self.output_dir = Path(output_dir)
        self.models_dir.mkdir(exist_ok=True)
        self.output_dir.mkdir(exist_ok=True)
        self.status = TrainingStatus()
        self.training_thread = None
    def prepare_training_data(self, train_data: List[Dict],
                            val_data: List[Dict],
                            data_dir: Path) -> tuple[Path, Path]:
        """Konvertiert Daten ins MLX Format (JSONL)"""
        def format_example(item: Dict) -> Dict:
            """Formatiert ein Beispiel im Chat-Format"""
            task_type = item['task_type']
            body = item['body']
            output = item['expected_output']
            # Task-spezifische Prompts
            task_prompts = {
                'Zusammenfassen': 'Fasse folgende E-Mail zusammen:',
                'Antwort schreiben': 'Schreibe eine Antwort auf folgende E-Mail:',
                'Kategorisieren': 'Kategorisiere folgende E-Mail:',
                'Action Items': 'Extrahiere die Action Items aus folgender E-Mail:',
                'Custom': 'Bearbeite folgende E-Mail:'
            }
            instruction = task_prompts.get(task_type, task_prompts['Custom'])
            return {
                'messages': [
                    {
                        'role': 'user',
                        'content': f"{instruction}\n\n{body}"
                    },
                    {
                        'role': 'assistant',
                        'content': output
                    }
                ]
            }
        train_file = data_dir / 'train.jsonl'
        val_file = data_dir / 'val.jsonl'
        # Schreibe Training Data
        with open(train_file, 'w', encoding='utf-8') as f:
            for item in train_data:
                f.write(json.dumps(format_example(item), ensure_ascii=False) + '\n')
        # Schreibe Validation Data
        with open(val_file, 'w', encoding='utf-8') as f:
            for item in val_data:
                f.write(json.dumps(format_example(item), ensure_ascii=False) + '\n')
        return train_file, val_file
    def _run_training(self, config: TrainingConfig,
                     train_file: Path, val_file: Path,
                     output_path: Path):
        """Führt das Training aus (läuft in eigenem Thread)"""
        try:
            # Import hier um MLX nur bei Bedarf zu laden
            from mlx_lm import load, LoRALinear
            from mlx_lm.tuner import train as mlx_train
            import mlx.core as mx
            import mlx.nn as nn
            import mlx.optimizers as optim
            self.status.is_training = True
            self.status.start_time = time.time()
            self.status.error = None
            # Lade Modell
            model_path = self.models_dir / config.model_name
            if not model_path.exists():
                raise FileNotFoundError(f"Model not found: {model_path}")
            # Training durchführen mit mlx-lm
            # Dies ist ein vereinfachtes Beispiel - mlx-lm hat eigene Trainer
            # In der Praxis würde man mlx_lm.tuner verwenden
            # Lade Training Config
            train_config = {
                'model': str(model_path),
                'data': str(train_file),
                'val_data': str(val_file),
                'train': True,
                'iters': config.epochs * 100,  # Approximation
                'val_batches': 10,
                'learning_rate': config.learning_rate,
                'batch_size': config.batch_size,
                'lora_layers': config.lora_rank,
                'adapter_file': str(output_path / 'adapters.npz'),
                'save_every': 50,
                'val_every': config.val_every,
            }
            # Callback für Progress-Updates
            def training_callback(step: int, loss: float, val_loss: Optional[float] = None):
                if self.status.should_stop:
                    return False  # Stop training
                self.status.current_step = step
                self.status.train_loss = loss
                self.status.train_loss_history.append(loss)
                if val_loss is not None:
                    self.status.val_loss = val_loss
                    self.status.val_loss_history.append(val_loss)
                return True
            # Hinweis: Dies ist ein Platzhalter für echtes MLX Training
            # In der Praxis würde man mlx_lm.tuner.train() oder eine
            # eigene Training Loop mit mlx nutzen
            # Simuliere Training für Demo (MUSS durch echtes MLX Training ersetzt werden)
            total_steps = config.epochs * (len(list(open(train_file))) // config.batch_size)
            self.status.total_steps = total_steps
            for epoch in range(config.epochs):
                self.status.current_epoch = epoch + 1
                for step in range(total_steps // config.epochs):
                    if self.status.should_stop:
                        break
                    # Simuliere Training Step
                    self.status.current_step = epoch * (total_steps // config.epochs) + step
                    fake_loss = 2.0 - (self.status.current_step / total_steps) * 1.5
                    self.status.train_loss = fake_loss
                    self.status.train_loss_history.append(fake_loss)
                    # Validation alle N Steps
                    if step % config.val_every == 0:
                        fake_val_loss = 2.2 - (self.status.current_step / total_steps) * 1.4
                        self.status.val_loss = fake_val_loss
                        self.status.val_loss_history.append(fake_val_loss)
                    time.sleep(0.1)  # Simuliere Rechenzeit
                if self.status.should_stop:
                    break
            # Speichere finale Adapter
            # output_path / 'adapters.npz' würde die LoRA Weights enthalten
            self.status.is_training = False
        except Exception as e:
            self.status.error = str(e)
            self.status.is_training = False
    def start_training(self, config: TrainingConfig,
                      train_data: List[Dict],
                      val_data: List[Dict]) -> bool:
        """Startet das Training"""
        if self.status.is_training:
            return False
        # Bereite Daten vor
        data_dir = self.output_dir / f"training_{int(time.time())}"
        data_dir.mkdir(exist_ok=True)
        train_file, val_file = self.prepare_training_data(
            train_data, val_data, data_dir
        )
        # Output-Pfad
        output_path = self.output_dir / f"run_{int(time.time())}"
        output_path.mkdir(exist_ok=True)
        # Reset Status
        self.status.reset()
        # Starte Training in eigenem Thread
        self.training_thread = threading.Thread(
            target=self._run_training,
            args=(config, train_file, val_file, output_path),
            daemon=True
        )
        self.training_thread.start()
        return True
    def stop_training(self) -> bool:
        """Stoppt das laufende Training"""
        if not self.status.is_training:
            return False
        self.status.should_stop = True
        # Warte max 5 Sekunden auf Thread
        if self.training_thread:
            self.training_thread.join(timeout=5)
        return True
    def get_status(self) -> Dict:
        """Gibt aktuellen Status zurück"""
        return self.status.to_dict()
    def list_available_models(self) -> List[str]:
        """Listet verfügbare Modelle auf"""
        if not self.models_dir.exists():
            return []
        models = []
        for path in self.models_dir.iterdir():
            if path.is_dir():
                models.append(path.name)
        return models
    def download_model(self, model_name: str) -> bool:
        """
        Lädt ein Modell herunter
        In der Praxis würde man hier huggingface_hub nutzen
        """
        # Placeholder - würde huggingface_hub.snapshot_download nutzen
        # und dann mit mlx_lm.convert konvertieren
        # Beispiel:
        # from huggingface_hub import snapshot_download
        # from mlx_lm.convert import convert
        #
        # hf_path = snapshot_download(model_name)
        # mlx_path = self.models_dir / model_name
        # convert(hf_path, mlx_path)
        return False  # Nicht implementiert in diesem Beispiel
@@ -0,0 +1,87 @@
 # Beispiel-Mails für Training
 Diese Beispiel-Mails können zum Testen des Mail-Imports verwendet werden.
 ## Enthaltene Beispiele
 1. **test1.txt** - Projekt-Update
   - Typ: Status-Update
   - Empfohlen für: "Zusammenfassen"
 2. **test2.txt** - Kundenanfrage
   - Typ: Support-Anfrage
   - Empfohlen für: "Antwort schreiben"
 3. **test3.txt** - Meeting Notes
   - Typ: Meeting-Protokoll
   - Empfohlen für: "Action Items"
 4. **test4.txt** - Out of Office
   - Typ: Automatische Antwort
   - Empfohlen für: "Kategorisieren" (als "Automatisch" oder "Skip")
 ## Verwendung
 1. Wähle eine oder mehrere Dateien aus
 2. Ziehe sie per Drag & Drop in die App
 3. Die Mails werden automatisch geparst und bereinigt
 4. Gehe zum Labeling und füge die erwarteten Outputs hinzu
 ## Beispiel-Labels
 ### test1.txt (Zusammenfassen)
 ```
 Alex berichtet über erfolgreichen Abschluss der API-Integration mit 40% Performance-Verbesserung.
 Nächste Woche starten Frontend-Anpassungen durch Maria und Tom.
 Go-Live bleibt für Ende März geplant.
 ```
 ### test2.txt (Antwort schreiben)
 ```
 Sehr geehrter Herr Schmidt,
 vielen Dank für Ihre Anfrage zu Rechnung #2847.
 Sie haben recht - hier ist uns ein Fehler unterlaufen. Der korrekte Betrag
 laut Angebot beträgt 1.250€. Wir werden die Rechnung korrigieren und Ihnen
 die berichtigte Version bis morgen zusenden.
 Wir entschuldigen uns für die Unannehmlichkeiten.
 Mit freundlichen Grüßen
 Support-Team
 ```
 ### test3.txt (Action Items)
 ```
 - Sarah: Pressemitteilung vorbereiten (Deadline: Freitag)
 - Marketing-Team: Social Media Content erstellen (nächste Woche)
 - Lisa: Influencer-Kontakte aufnehmen
 - Design-Team: Finale Produktfotos liefern
 - Location für Launch-Event buchen (1. April)
 - Website-Landing-Page live schalten (bis Mittwoch)
 - Feedback an Lisa bis Mittwoch
 ```
 ### test4.txt (Kategorisieren)
 ```
 Kategorie: Automatische Antwort / Out of Office
 Status: Abwesenheit vom 18.03.-25.03.2024
 Vertretung: sarah.koch@company.com (Vertrieb), support@company.com (Support)
 ```
 ## Eigene Mails hinzufügen
 Du kannst auch eigene .txt Dateien erstellen. Format:
 ```
 Subject: Dein Betreff
 From: absender@example.com
 To: empfaenger@example.com
 Date: 2024-03-15
 Hier kommt der Mail-Text...
 ```
 Die ersten Zeilen mit Subject:/From:/To:/Date: sind optional.
 Wenn sie fehlen, wird der gesamte Text als Mail-Body interpretiert.
@@ -0,0 +1,19 @@
 Subject: Q4 Projektupdate
 From: alex@example.com
 To: team@example.com
 Date: 2024-03-15
 Hallo Team,
 ich wollte euch ein kurzes Update zum aktuellen Projektstand geben.
 Wir haben letzte Woche die neue API-Integration abgeschlossen und erfolgreich getestet.
 Die Performance-Tests zeigen eine Verbesserung von 40% gegenüber der alten Implementierung.
 Nächste Woche starten wir mit der Frontend-Anpassung. Maria und Tom werden das Design
 überarbeiten, während ich mich um die Backend-Anbindung kümmere.
 Der Go-Live ist weiterhin für Ende des Monats geplant.
 Beste Grüße
 Alex
@@ -0,0 +1,16 @@
 Subject: Frage zu Invoice #2847
 From: michael.schmidt@example.com
 To: support@company.de
 Date: 2024-03-16
 Hallo,
 ich habe eine Frage zur Rechnung #2847 vom 15. März.
 Der Betrag scheint nicht mit unserem ursprünglichen Angebot übereinzustimmen.
 Laut Angebot sollten es 1.250€ sein, auf der Rechnung stehen aber 1.450€.
 Könnten Sie das bitte prüfen und mir Bescheid geben?
 Vielen Dank
 Michael Schmidt
@@ -0,0 +1,22 @@
 Subject: Meeting Notes - Produktlaunch Vorbereitung
 From: lisa.mueller@startup.io
 To: team@startup.io
 Date: 2024-03-17
 Hi alle,
 hier die wichtigsten Punkte vom heutigen Meeting zum Produktlaunch:
 1. Sarah bereitet die Pressemitteilung vor (Deadline: Freitag)
 2. Marketing-Team erstellt Social Media Content für nächste Woche
 3. Ich kümmere mich um die Influencer-Kontakte
 4. Wir brauchen noch finale Produktfotos vom Design-Team
 5. Launch-Event ist am 1. April - Location muss noch gebucht werden
 6. Website-Landing-Page muss bis Mittwoch live gehen
 Bitte gebt bis Mittwoch Bescheid ob ihr eure Aufgaben schaffen könnt.
 Bei Problemen sofort melden!
 Danke an alle für die tolle Zusammenarbeit!
 Lisa
@@ -0,0 +1,24 @@
 Subject: Automatische Antwort: Out of Office
 From: thomas.weber@company.com
 To: request@company.com
 Date: 2024-03-18
 Guten Tag,
 vielen Dank für Ihre E-Mail.
 Ich bin vom 18.03. bis 25.03.2024 nicht im Büro und habe keinen Zugriff auf meine E-Mails.
 In dringenden Fällen wenden Sie sich bitte an:
 - Vertrieb: sarah.koch@company.com
 - Support: support@company.com
 - Allgemeine Anfragen: info@company.com
 Ich werde Ihre E-Mail nach meiner Rückkehr bearbeiten.
 Mit freundlichen Grüßen
 Thomas Weber
 --
 Diese E-Mail wurde automatisch generiert.
 Bitte antworten Sie nicht direkt auf diese Nachricht.
@@ -0,0 +1,756 @@
 // Mail Fine-Tuning App - Frontend Logic
 const API_BASE = '';
 // State
 let currentMails = [];
 let currentLabelingIndex = 0;
 let stats = {};
 let trainingEventSource = null;
 // ======================
 // Utility Functions
 // ======================
 function showToast(message, type = 'info') {
    const container = document.getElementById('toast-container');
    const toast = document.createElement('div');
    toast.className = `toast ${type}`;
    toast.textContent = message;
    container.appendChild(toast);
    setTimeout(() => {
        toast.remove();
    }, 4000);
 }
 async function apiCall(endpoint, options = {}) {
    try {
        const response = await fetch(API_BASE + endpoint, {
            ...options,
            headers: {
                'Content-Type': 'application/json',
                ...options.headers
            }
        });
        if (!response.ok) {
            const error = await response.json();
            throw new Error(error.detail || 'API Error');
        }
        return await response.json();
    } catch (error) {
        showToast(error.message, 'error');
        throw error;
    }
 }
 // ======================
 // Navigation
 // ======================
 function initNavigation() {
    const navLinks = document.querySelectorAll('.nav-link');
    const views = document.querySelectorAll('.view');
    navLinks.forEach(link => {
        link.addEventListener('click', (e) => {
            e.preventDefault();
            const targetView = link.dataset.view;
            // Update active states
            navLinks.forEach(l => l.classList.remove('active'));
            link.classList.add('active');
            views.forEach(v => v.classList.remove('active'));
            document.getElementById(`${targetView}-view`).classList.add('active');
            // Load data for view
            if (targetView === 'labeling') {
                loadLabelingView();
            } else if (targetView === 'export') {
                loadStats();
            } else if (targetView === 'models') {
                loadModels();
            } else if (targetView === 'training') {
                loadTrainingView();
            }
        });
    });
 }
 // ======================
 // Mail Import
 // ======================
 function initImport() {
    const dropzone = document.getElementById('dropzone');
    const fileInput = document.getElementById('file-input');
    dropzone.addEventListener('click', () => fileInput.click());
    dropzone.addEventListener('dragover', (e) => {
        e.preventDefault();
        dropzone.classList.add('dragover');
    });
    dropzone.addEventListener('dragleave', () => {
        dropzone.classList.remove('dragover');
    });
    dropzone.addEventListener('drop', (e) => {
        e.preventDefault();
        dropzone.classList.remove('dragover');
        handleFiles(e.dataTransfer.files);
    });
    fileInput.addEventListener('change', (e) => {
        handleFiles(e.target.files);
    });
    document.getElementById('refresh-mails').addEventListener('click', loadMails);
    // Initial load
    loadMails();
 }
 async function handleFiles(files) {
    const formData = new FormData();
    for (let file of files) {
        formData.append('files', file);
    }
    try {
        const response = await fetch(API_BASE + '/api/mails/upload', {
            method: 'POST',
            body: formData
        });
        const result = await response.json();
        const successCount = result.success.reduce((sum, r) => sum + r.count, 0);
        showToast(`${successCount} Mails erfolgreich importiert`, 'success');
        if (result.errors.length > 0) {
            showToast(`${result.errors.length} Fehler beim Import`, 'error');
        }
        loadMails();
    } catch (error) {
        showToast('Fehler beim Upload', 'error');
    }
 }
 async function loadMails() {
    try {
        const data = await apiCall('/api/mails');
        currentMails = data.mails;
        document.getElementById('mail-count').textContent = currentMails.length;
        renderMailList(currentMails);
    } catch (error) {
        console.error('Error loading mails:', error);
    }
 }
 function renderMailList(mails) {
    const container = document.getElementById('mail-list');
    if (mails.length === 0) {
        container.innerHTML = '<p style="text-align:center; padding: 2rem;">Keine Mails vorhanden</p>';
        return;
    }
    container.innerHTML = mails.map(mail => `
        <div class="mail-item ${mail.status}">
            <div class="mail-header">
                <div class="mail-subject">${escapeHtml(mail.subject)}</div>
                <div class="mail-meta">${mail.status}</div>
            </div>
            <div class="mail-meta">Von: ${escapeHtml(mail.sender)}</div>
            <div class="mail-body">${escapeHtml(mail.body)}</div>
            <div class="mail-actions">
                <button class="btn btn-secondary" onclick="viewMail(${mail.id})">👁️ Ansehen</button>
                <button class="btn btn-danger" onclick="deleteMail(${mail.id})">🗑️ Löschen</button>
            </div>
        </div>
    `).join('');
 }
 function escapeHtml(text) {
    const div = document.createElement('div');
    div.textContent = text;
    return div.innerHTML;
 }
 async function deleteMail(id) {
    if (!confirm('Mail wirklich löschen?')) return;
    try {
        await apiCall(`/api/mails/${id}`, { method: 'DELETE' });
        showToast('Mail gelöscht', 'success');
        loadMails();
    } catch (error) {
        console.error('Error deleting mail:', error);
    }
 }
 function viewMail(id) {
    const mail = currentMails.find(m => m.id === id);
    if (!mail) return;
    alert(`Betreff: ${mail.subject}\n\nVon: ${mail.sender}\n\n${mail.body}`);
 }
 // ======================
 // Labeling
 // ======================
 function initLabeling() {
    const statusFilter = document.getElementById('status-filter');
    statusFilter.addEventListener('change', loadLabelingView);
    // Keyboard shortcuts
    document.addEventListener('keydown', (e) => {
        const activeView = document.querySelector('.view.active');
        if (activeView.id !== 'labeling-view') return;
        if (e.key.toLowerCase() === 'n') {
            nextMail();
        } else if (e.key.toLowerCase() === 's') {
            saveLabelingMail();
        } else if (e.key.toLowerCase() === 'k') {
            skipMail();
        }
    });
 }
 async function loadLabelingView() {
    const statusFilter = document.getElementById('status-filter').value;
    try {
        const data = await apiCall(`/api/mails?status=${statusFilter || ''}`);
        currentMails = data.mails;
        currentLabelingIndex = 0;
        updateLabelingProgress();
        renderCurrentMail();
    } catch (error) {
        console.error('Error loading labeling view:', error);
    }
 }
 function updateLabelingProgress() {
    const labeled = currentMails.filter(m => m.status === 'labeled').length;
    const total = currentMails.length;
    const percent = total > 0 ? (labeled / total) * 100 : 0;
    document.getElementById('labeling-progress').style.width = `${percent}%`;
    document.getElementById('progress-text').textContent = `${labeled} / ${total} gelabelt`;
 }
 function renderCurrentMail() {
    const container = document.getElementById('labeling-container');
    if (currentMails.length === 0) {
        container.innerHTML = '<p>Keine Mails zum Labeln vorhanden</p>';
        return;
    }
    const mail = currentMails[currentLabelingIndex];
    container.innerHTML = `
        <div class="current-mail">
            <h4>${escapeHtml(mail.subject)}</h4>
            <p><strong>Von:</strong> ${escapeHtml(mail.sender)}</p>
            <p><strong>An:</strong> ${escapeHtml(mail.recipient)}</p>
            <hr style="margin: 1rem 0; border-color: var(--border-color)">
            <div style="white-space: pre-wrap;">${escapeHtml(mail.body)}</div>
        </div>
        <form id="labeling-form">
            <div class="form-group">
                <label>Aufgabentyp:</label>
                <select id="task-type" required>
                    <option value="">-- Wählen --</option>
                    <option value="Zusammenfassen" ${mail.task_type === 'Zusammenfassen' ? 'selected' : ''}>Zusammenfassen</option>
                    <option value="Antwort schreiben" ${mail.task_type === 'Antwort schreiben' ? 'selected' : ''}>Antwort schreiben</option>
                    <option value="Kategorisieren" ${mail.task_type === 'Kategorisieren' ? 'selected' : ''}>Kategorisieren</option>
                    <option value="Action Items" ${mail.task_type === 'Action Items' ? 'selected' : ''}>Action Items</option>
                    <option value="Custom" ${mail.task_type === 'Custom' ? 'selected' : ''}>Custom</option>
                </select>
            </div>
            <div class="form-group">
                <label>Erwarteter Output:</label>
                <textarea id="expected-output" rows="6" required>${mail.expected_output || ''}</textarea>
            </div>
            <div class="form-actions">
                <button type="button" class="btn btn-primary" onclick="saveLabelingMail()">💾 Speichern (S)</button>
                <button type="button" class="btn btn-secondary" onclick="skipMail()">⏭️ Überspringen (K)</button>
                <button type="button" class="btn btn-secondary" onclick="nextMail()">➡️ Nächste (N)</button>
                <span style="margin-left: auto; color: var(--text-secondary);">
                    ${currentLabelingIndex + 1} / ${currentMails.length}
                </span>
            </div>
        </form>
    `;
 }
 async function saveLabelingMail() {
    const mail = currentMails[currentLabelingIndex];
    const taskType = document.getElementById('task-type').value;
    const expectedOutput = document.getElementById('expected-output').value;
    if (!taskType || !expectedOutput) {
        showToast('Bitte alle Felder ausfüllen', 'warning');
        return;
    }
    try {
        await apiCall(`/api/mails/${mail.id}`, {
            method: 'PUT',
            body: JSON.stringify({
                task_type: taskType,
                expected_output: expectedOutput,
                status: 'labeled'
            })
        });
        showToast('Gespeichert', 'success');
        mail.status = 'labeled';
        updateLabelingProgress();
        nextMail();
    } catch (error) {
        console.error('Error saving mail:', error);
    }
 }
 async function skipMail() {
    const mail = currentMails[currentLabelingIndex];
    try {
        await apiCall(`/api/mails/${mail.id}`, {
            method: 'PUT',
            body: JSON.stringify({
                status: 'skip'
            })
        });
        mail.status = 'skip';
        updateLabelingProgress();
        nextMail();
    } catch (error) {
        console.error('Error skipping mail:', error);
    }
 }
 function nextMail() {
    if (currentLabelingIndex < currentMails.length - 1) {
        currentLabelingIndex++;
    } else {
        currentLabelingIndex = 0;
    }
    renderCurrentMail();
 }
 // ======================
 // Export & Stats
 // ======================
 function initExport() {
    document.getElementById('export-jsonl').addEventListener('click', exportJSONL);
 }
 async function loadStats() {
    try {
        stats = await apiCall('/api/export/stats');
        renderStats();
    } catch (error) {
        console.error('Error loading stats:', error);
    }
 }
 function renderStats() {
    const container = document.getElementById('stats-grid');
    container.innerHTML = `
        <div class="stat-card">
            <div class="stat-value">${stats.total || 0}</div>
            <div class="stat-label">Gesamt Mails</div>
        </div>
        <div class="stat-card">
            <div class="stat-value">${stats.labeled || 0}</div>
            <div class="stat-label">Gelabelt</div>
        </div>
        <div class="stat-card">
            <div class="stat-value">${stats.unlabeled || 0}</div>
            <div class="stat-label">Unlabeled</div>
        </div>
        <div class="stat-card">
            <div class="stat-value">${stats.avg_input_length || 0}</div>
            <div class="stat-label">Avg Input Length</div>
        </div>
        <div class="stat-card">
            <div class="stat-value">${stats.avg_output_length || 0}</div>
            <div class="stat-label">Avg Output Length</div>
        </div>
        <div class="stat-card">
            <div class="stat-value">${stats.sufficient_data ? '✅' : '❌'}</div>
            <div class="stat-label">Genug Daten (&gt;50)</div>
        </div>
    `;
 }
 async function exportJSONL() {
    const trainSplit = document.getElementById('train-split').value / 100;
    try {
        const result = await apiCall('/api/export/jsonl', {
            method: 'POST',
            body: JSON.stringify({ train_split: trainSplit })
        });
        const resultDiv = document.getElementById('export-result');
        resultDiv.innerHTML = `
            <p>✅ Export erfolgreich!</p>
            <p>Training Samples: ${result.train_samples}</p>
            <p>Validation Samples: ${result.val_samples}</p>
            <p>
                <a href="/api/export/download/train" class="btn btn-primary" download>📥 train.jsonl</a>
                <a href="/api/export/download/val" class="btn btn-primary" download>📥 val.jsonl</a>
            </p>
        `;
        resultDiv.classList.add('show');
        showToast('JSONL Dateien generiert', 'success');
    } catch (error) {
        console.error('Error exporting JSONL:', error);
    }
 }
 // ======================
 // Models
 // ======================
 async function loadModels() {
    try {
        const data = await apiCall('/api/models');
        renderModels(data.models);
    } catch (error) {
        console.error('Error loading models:', error);
    }
 }
 function renderModels(models) {
    const container = document.getElementById('models-list');
    if (models.length === 0) {
        container.innerHTML = '<p>Keine Modelle vorhanden</p>';
        return;
    }
    container.innerHTML = models.map(model => `
        <div class="model-item">
            <span>📦 ${model}</span>
            <span style="color: var(--accent-success);">✓ Verfügbar</span>
        </div>
    `).join('');
 }
 // ======================
 // Training
 // ======================
 function initTraining() {
    const lrSlider = document.getElementById('learning-rate');
    const epochsSlider = document.getElementById('epochs');
    lrSlider.addEventListener('input', (e) => {
        const value = Math.pow(10, parseFloat(e.target.value));
        document.getElementById('lr-value').textContent = value.toExponential(0);
    });
    epochsSlider.addEventListener('input', (e) => {
        document.getElementById('epochs-value').textContent = e.target.value;
    });
    document.getElementById('training-form').addEventListener('submit', startTraining);
    document.getElementById('stop-training').addEventListener('click', stopTraining);
 }
 async function loadTrainingView() {
    // Load available models
    try {
        const data = await apiCall('/api/models');
        const select = document.getElementById('training-model');
        select.innerHTML = '<option value="">-- Modell wählen --</option>' +
            data.models.map(m => `<option value="${m}">${m}</option>`).join('');
    } catch (error) {
        console.error('Error loading models:', error);
    }
    // Get current status
    updateTrainingStatus();
 }
 async function startTraining(e) {
    e.preventDefault();
    const modelName = document.getElementById('training-model').value;
    const learningRate = Math.pow(10, parseFloat(document.getElementById('learning-rate').value));
    const epochs = parseInt(document.getElementById('epochs').value);
    const batchSize = parseInt(document.getElementById('batch-size').value);
    const loraRank = parseInt(document.getElementById('lora-rank').value);
    if (!modelName) {
        showToast('Bitte Modell wählen', 'warning');
        return;
    }
    try {
        await apiCall('/api/training/start', {
            method: 'POST',
            body: JSON.stringify({
                model_name: modelName,
                learning_rate: learningRate,
                epochs: epochs,
                batch_size: batchSize,
                lora_rank: loraRank
            })
        });
        showToast('Training gestartet', 'success');
        document.getElementById('start-training').disabled = true;
        document.getElementById('stop-training').disabled = false;
        // Start SSE stream
        startTrainingStream();
    } catch (error) {
        console.error('Error starting training:', error);
    }
 }
 async function stopTraining() {
    try {
        await apiCall('/api/training/stop', { method: 'POST' });
        showToast('Training gestoppt', 'warning');
        document.getElementById('start-training').disabled = false;
        document.getElementById('stop-training').disabled = true;
        if (trainingEventSource) {
            trainingEventSource.close();
        }
    } catch (error) {
        console.error('Error stopping training:', error);
    }
 }
 function startTrainingStream() {
    if (trainingEventSource) {
        trainingEventSource.close();
    }
    trainingEventSource = new EventSource('/api/training/stream');
    trainingEventSource.onmessage = (event) => {
        const status = JSON.parse(event.data);
        updateTrainingStatusUI(status);
        if (!status.is_training && status.current_step > 0) {
            trainingEventSource.close();
            document.getElementById('start-training').disabled = false;
            document.getElementById('stop-training').disabled = true;
            showToast('Training abgeschlossen', 'success');
        }
    };
    trainingEventSource.onerror = () => {
        trainingEventSource.close();
    };
 }
 async function updateTrainingStatus() {
    try {
        const status = await apiCall('/api/training/status');
        updateTrainingStatusUI(status);
        if (status.is_training) {
            document.getElementById('start-training').disabled = true;
            document.getElementById('stop-training').disabled = false;
            startTrainingStream();
        }
    } catch (error) {
        console.error('Error updating status:', error);
    }
 }
 function updateTrainingStatusUI(status) {
    const container = document.getElementById('training-status');
    if (!status.is_training && status.current_step === 0) {
        container.innerHTML = '<p>Kein Training aktiv</p>';
        return;
    }
    const eta = status.eta_seconds ? `${Math.floor(status.eta_seconds / 60)}m ${status.eta_seconds % 60}s` : 'N/A';
    container.innerHTML = `
        <div class="status-grid">
            <div class="status-item">
                <label>Status</label>
                <div class="value">${status.is_training ? '🟢 Running' : '⏸️ Stopped'}</div>
            </div>
            <div class="status-item">
                <label>Step</label>
                <div class="value">${status.current_step} / ${status.total_steps}</div>
            </div>
            <div class="status-item">
                <label>Epoch</label>
                <div class="value">${status.current_epoch}</div>
            </div>
            <div class="status-item">
                <label>Train Loss</label>
                <div class="value">${status.train_loss || 'N/A'}</div>
            </div>
            <div class="status-item">
                <label>Val Loss</label>
                <div class="value">${status.val_loss || 'N/A'}</div>
            </div>
            <div class="status-item">
                <label>ETA</label>
                <div class="value">${eta}</div>
            </div>
            <div class="status-item">
                <label>Memory</label>
                <div class="value">${status.memory_usage_percent}%</div>
            </div>
        </div>
    `;
    // Update charts (simple implementation without chart library)
    updateChart('train-loss-chart', status.train_loss_history);
    updateChart('val-loss-chart', status.val_loss_history);
 }
 function updateChart(canvasId, data) {
    // Simplified chart rendering (without external library)
    const canvas = document.getElementById(canvasId);
    if (!canvas) return;
    const ctx = canvas.getContext('2d');
    canvas.width = canvas.offsetWidth;
    canvas.height = 200;
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    if (!data || data.length === 0) return;
    const padding = 20;
    const width = canvas.width - 2 * padding;
    const height = canvas.height - 2 * padding;
    const maxVal = Math.max(...data);
    const minVal = Math.min(...data);
    const range = maxVal - minVal || 1;
    ctx.strokeStyle = '#4a9eff';
    ctx.lineWidth = 2;
    ctx.beginPath();
    data.forEach((val, i) => {
        const x = padding + (i / (data.length - 1)) * width;
        const y = padding + height - ((val - minVal) / range) * height;
        if (i === 0) {
            ctx.moveTo(x, y);
        } else {
            ctx.lineTo(x, y);
        }
    });
    ctx.stroke();
 }
 // ======================
 // Evaluation
 // ======================
 function initEvaluation() {
    document.getElementById('load-test-prompt').addEventListener('click', loadTestPrompt);
    document.getElementById('run-comparison').addEventListener('click', runComparison);
 }
 async function loadTestPrompt() {
    const taskType = document.getElementById('eval-task-type').value;
    try {
        const prompts = await apiCall('/api/inference/test-prompts');
        const prompt = prompts[taskType];
        if (prompt) {
            // Extract mail body from prompt
            const parts = prompt.split('\n\n');
            document.getElementById('eval-mail-text').value = parts.slice(1).join('\n\n');
            showToast('Test-Beispiel geladen', 'success');
        }
    } catch (error) {
        console.error('Error loading test prompt:', error);
    }
 }
 async function runComparison() {
    const taskType = document.getElementById('eval-task-type').value;
    const mailBody = document.getElementById('eval-mail-text').value;
    if (!mailBody) {
        showToast('Bitte Mail-Text eingeben', 'warning');
        return;
    }
    document.getElementById('base-result').textContent = 'Generiere...';
    document.getElementById('finetuned-result').textContent = 'Generiere...';
    try {
        const result = await apiCall('/api/inference/compare', {
            method: 'POST',
            body: JSON.stringify({
                task_type: taskType,
                mail_body: mailBody
            })
        });
        document.getElementById('base-result').textContent = result.base || 'Modell nicht geladen';
        document.getElementById('finetuned-result').textContent = result.finetuned || 'Modell nicht geladen';
        showToast('Vergleich abgeschlossen', 'success');
    } catch (error) {
        console.error('Error running comparison:', error);
        document.getElementById('base-result').textContent = 'Fehler';
        document.getElementById('finetuned-result').textContent = 'Fehler';
    }
 }
 // ======================
 // Init
 // ======================
 document.addEventListener('DOMContentLoaded', () => {
    initNavigation();
    initImport();
    initLabeling();
    initExport();
    initTraining();
    initEvaluation();
 });
@@ -0,0 +1,254 @@
 <!DOCTYPE html>
 <html lang="de">
 <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Mail Fine-Tuning App</title>
    <link rel="stylesheet" href="style.css">
 </head>
 <body>
    <div class="app-container">
        <!-- Sidebar Navigation -->
        <nav class="sidebar">
            <h1>Mail Fine-Tuning</h1>
            <ul class="nav-menu">
                <li><a href="#" data-view="import" class="nav-link active">📥 Mail Import</a></li>
                <li><a href="#" data-view="labeling" class="nav-link">🏷️ Labeling</a></li>
                <li><a href="#" data-view="export" class="nav-link">📊 Export & Stats</a></li>
                <li><a href="#" data-view="models" class="nav-link">🤖 Modelle</a></li>
                <li><a href="#" data-view="training" class="nav-link">🎯 Training</a></li>
                <li><a href="#" data-view="evaluation" class="nav-link">🧪 Evaluation</a></li>
            </ul>
        </nav>
        <!-- Main Content -->
        <main class="main-content">
            <!-- Import View -->
            <div id="import-view" class="view active">
                <h2>Mail Import</h2>
                <div class="upload-section">
                    <div class="dropzone" id="dropzone">
                        <p>📂 Dateien hier ablegen oder klicken</p>
                        <p class="hint">Unterstützt: .eml, .mbox, .txt</p>
                        <input type="file" id="file-input" multiple accept=".eml,.mbox,.txt" hidden>
                    </div>
                </div>
                <div class="mail-list-section">
                    <div class="section-header">
                        <h3>Importierte Mails (<span id="mail-count">0</span>)</h3>
                        <button id="refresh-mails" class="btn btn-secondary">🔄 Aktualisieren</button>
                    </div>
                    <div id="mail-list" class="mail-list">
                        <!-- Mails werden hier eingefügt -->
                    </div>
                </div>
            </div>
            <!-- Labeling View -->
            <div id="labeling-view" class="view">
                <div class="section-header">
                    <h2>Mail Labeling</h2>
                    <div class="filter-controls">
                        <select id="status-filter">
                            <option value="">Alle anzeigen</option>
                            <option value="unlabeled" selected>Nur Unlabeled</option>
                            <option value="labeled">Nur Labeled</option>
                            <option value="skip">Übersprungen</option>
                        </select>
                    </div>
                </div>
                <div class="progress-bar">
                    <div class="progress-fill" id="labeling-progress"></div>
                    <span class="progress-text" id="progress-text">0 / 0 gelabelt</span>
                </div>
                <div class="keyboard-hints">
                    Shortcuts: <kbd>N</kbd> Nächste | <kbd>S</kbd> Speichern | <kbd>K</kbd> Skip
                </div>
                <div id="labeling-container">
                    <!-- Labeling Interface wird hier geladen -->
                </div>
            </div>
            <!-- Export View -->
            <div id="export-view" class="view">
                <h2>Daten Export & Statistiken</h2>
                <div class="stats-grid" id="stats-grid">
                    <!-- Stats werden hier eingefügt -->
                </div>
                <div class="export-section">
                    <h3>Training-Daten exportieren</h3>
                    <div class="export-controls">
                        <label>
                            Train/Val Split:
                            <input type="number" id="train-split" value="90" min="50" max="95" step="5">%
                        </label>
                        <button id="export-jsonl" class="btn btn-primary">📦 JSONL generieren</button>
                    </div>
                    <div id="export-result"></div>
                </div>
            </div>
            <!-- Models View -->
            <div id="models-view" class="view">
                <h2>Modell-Verwaltung</h2>
                <div class="model-section">
                    <h3>Verfügbare Modelle</h3>
                    <div id="models-list" class="models-list">
                        <!-- Modelle werden hier geladen -->
                    </div>
                    <div class="model-download">
                        <h3>Modell herunterladen</h3>
                        <p class="info-text">
                            Modelle müssen manuell heruntergeladen werden. Empfohlen:
                        </p>
                        <ul>
                            <li>mlx-community/Mistral-7B-Instruct-v0.3-4bit</li>
                            <li>mlx-community/Meta-Llama-3-8B-Instruct-4bit</li>
                        </ul>
                        <p class="code-example">
                            huggingface-cli download [model-name] --local-dir models/[model-name]
                        </p>
                    </div>
                </div>
            </div>
            <!-- Training View -->
            <div id="training-view" class="view">
                <h2>Training</h2>
                <div class="training-config">
                    <h3>Konfiguration</h3>
                    <form id="training-form">
                        <div class="form-group">
                            <label>Modell:</label>
                            <select id="training-model" required>
                                <option value="">-- Modell wählen --</option>
                            </select>
                        </div>
                        <div class="form-group">
                            <label>
                                Learning Rate: <span id="lr-value">1e-5</span>
                            </label>
                            <input type="range" id="learning-rate"
                                   min="-6" max="-4" step="0.1" value="-5">
                        </div>
                        <div class="form-group">
                            <label>
                                Epochs: <span id="epochs-value">3</span>
                            </label>
                            <input type="range" id="epochs"
                                   min="1" max="10" value="3">
                        </div>
                        <div class="form-group">
                            <label>Batch Size:</label>
                            <select id="batch-size">
                                <option value="1">1</option>
                                <option value="2">2</option>
                                <option value="4" selected>4</option>
                                <option value="8">8</option>
                            </select>
                        </div>
                        <div class="form-group">
                            <label>LoRA Rank:</label>
                            <select id="lora-rank">
                                <option value="4">4</option>
                                <option value="8" selected>8</option>
                                <option value="16">16</option>
                                <option value="32">32</option>
                            </select>
                        </div>
                        <div class="form-actions">
                            <button type="submit" class="btn btn-primary" id="start-training">
                                ▶️ Training starten
                            </button>
                            <button type="button" class="btn btn-danger" id="stop-training" disabled>
                                ⏹️ Training stoppen
                            </button>
                        </div>
                    </form>
                </div>
                <div class="training-status" id="training-status">
                    <!-- Training Status wird hier angezeigt -->
                </div>
                <div class="training-charts">
                    <div class="chart-container">
                        <h4>Training Loss</h4>
                        <canvas id="train-loss-chart"></canvas>
                    </div>
                    <div class="chart-container">
                        <h4>Validation Loss</h4>
                        <canvas id="val-loss-chart"></canvas>
                    </div>
                </div>
            </div>
            <!-- Evaluation View -->
            <div id="evaluation-view" class="view">
                <h2>Modell Evaluation</h2>
                <div class="eval-controls">
                    <h3>Chat Interface</h3>
                    <div class="form-group">
                        <label>Task Type:</label>
                        <select id="eval-task-type">
                            <option value="Zusammenfassen">Zusammenfassen</option>
                            <option value="Antwort schreiben">Antwort schreiben</option>
                            <option value="Kategorisieren">Kategorisieren</option>
                            <option value="Action Items">Action Items</option>
                            <option value="Custom">Custom</option>
                        </select>
                    </div>
                    <div class="form-group">
                        <label>Mail-Text:</label>
                        <textarea id="eval-mail-text" rows="6" placeholder="Mail-Text hier eingeben..."></textarea>
                    </div>
                    <div class="form-group">
                        <button id="load-test-prompt" class="btn btn-secondary">📝 Test-Beispiel laden</button>
                        <button id="run-comparison" class="btn btn-primary">🔍 Vergleich starten</button>
                    </div>
                </div>
                <div class="comparison-results">
                    <div class="result-box">
                        <h4>Base Model</h4>
                        <div id="base-result" class="result-content">
                            Noch kein Ergebnis
                        </div>
                    </div>
                    <div class="result-box">
                        <h4>Fine-tuned Model</h4>
                        <div id="finetuned-result" class="result-content">
                            Noch kein Ergebnis
                        </div>
                    </div>
                </div>
            </div>
        </main>
    </div>
    <!-- Toast Notifications -->
    <div id="toast-container"></div>
    <script src="app.js"></script>
 </body>
 </html>
@@ -0,0 +1,600 @@
 /* Mail Fine-Tuning App Styles */
 :root {
    --bg-primary: #1a1a1a;
    --bg-secondary: #2d2d2d;
    --bg-tertiary: #3a3a3a;
    --text-primary: #e0e0e0;
    --text-secondary: #b0b0b0;
    --accent-primary: #4a9eff;
    --accent-success: #4caf50;
    --accent-warning: #ff9800;
    --accent-danger: #f44336;
    --border-color: #444;
 }
 * {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
 }
 body {
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
    background: var(--bg-primary);
    color: var(--text-primary);
    line-height: 1.6;
 }
 .app-container {
    display: flex;
    height: 100vh;
    overflow: hidden;
 }
 /* Sidebar */
 .sidebar {
    width: 250px;
    background: var(--bg-secondary);
    padding: 2rem 1rem;
    border-right: 1px solid var(--border-color);
 }
 .sidebar h1 {
    font-size: 1.5rem;
    margin-bottom: 2rem;
    color: var(--accent-primary);
 }
 .nav-menu {
    list-style: none;
 }
 .nav-link {
    display: block;
    padding: 0.75rem 1rem;
    color: var(--text-secondary);
    text-decoration: none;
    border-radius: 4px;
    margin-bottom: 0.5rem;
    transition: all 0.2s;
 }
 .nav-link:hover {
    background: var(--bg-tertiary);
    color: var(--text-primary);
 }
 .nav-link.active {
    background: var(--accent-primary);
    color: white;
 }
 /* Main Content */
 .main-content {
    flex: 1;
    overflow-y: auto;
    padding: 2rem;
 }
 .view {
    display: none;
 }
 .view.active {
    display: block;
 }
 h2 {
    margin-bottom: 1.5rem;
    color: var(--text-primary);
 }
 h3 {
    margin-bottom: 1rem;
    color: var(--text-primary);
 }
 /* Buttons */
 .btn {
    padding: 0.6rem 1.2rem;
    border: none;
    border-radius: 4px;
    cursor: pointer;
    font-size: 0.9rem;
    transition: all 0.2s;
 }
 .btn-primary {
    background: var(--accent-primary);
    color: white;
 }
 .btn-primary:hover {
    background: #3a8eef;
 }
 .btn-secondary {
    background: var(--bg-tertiary);
    color: var(--text-primary);
 }
 .btn-secondary:hover {
    background: #4a4a4a;
 }
 .btn-success {
    background: var(--accent-success);
    color: white;
 }
 .btn-danger {
    background: var(--accent-danger);
    color: white;
 }
 .btn:disabled {
    opacity: 0.5;
    cursor: not-allowed;
 }
 /* Upload Section */
 .dropzone {
    border: 2px dashed var(--border-color);
    border-radius: 8px;
    padding: 3rem;
    text-align: center;
    cursor: pointer;
    transition: all 0.2s;
    margin-bottom: 2rem;
 }
 .dropzone:hover {
    border-color: var(--accent-primary);
    background: var(--bg-secondary);
 }
 .dropzone.dragover {
    border-color: var(--accent-primary);
    background: var(--bg-tertiary);
 }
 .hint {
    font-size: 0.85rem;
    color: var(--text-secondary);
    margin-top: 0.5rem;
 }
 /* Section Header */
 .section-header {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 1rem;
 }
 /* Mail List */
 .mail-list {
    background: var(--bg-secondary);
    border-radius: 8px;
    padding: 1rem;
    max-height: 500px;
    overflow-y: auto;
 }
 .mail-item {
    background: var(--bg-tertiary);
    padding: 1rem;
    margin-bottom: 0.5rem;
    border-radius: 4px;
    border-left: 3px solid transparent;
 }
 .mail-item.labeled {
    border-left-color: var(--accent-success);
 }
 .mail-item.unlabeled {
    border-left-color: var(--accent-warning);
 }
 .mail-item.skip {
    border-left-color: var(--text-secondary);
 }
 .mail-header {
    display: flex;
    justify-content: space-between;
    margin-bottom: 0.5rem;
 }
 .mail-subject {
    font-weight: bold;
    color: var(--text-primary);
 }
 .mail-meta {
    font-size: 0.85rem;
    color: var(--text-secondary);
 }
 .mail-body {
    font-size: 0.9rem;
    color: var(--text-secondary);
    overflow: hidden;
    text-overflow: ellipsis;
    display: -webkit-box;
    -webkit-line-clamp: 2;
    -webkit-box-orient: vertical;
 }
 .mail-actions {
    margin-top: 0.5rem;
    display: flex;
    gap: 0.5rem;
 }
 .mail-actions button {
    padding: 0.4rem 0.8rem;
    font-size: 0.8rem;
 }
 /* Labeling Interface */
 #labeling-container {
    background: var(--bg-secondary);
    border-radius: 8px;
    padding: 2rem;
    margin-top: 1rem;
 }
 .current-mail {
    background: var(--bg-tertiary);
    padding: 1.5rem;
    border-radius: 4px;
    margin-bottom: 1.5rem;
 }
 .form-group {
    margin-bottom: 1.5rem;
 }
 .form-group label {
    display: block;
    margin-bottom: 0.5rem;
    color: var(--text-primary);
    font-weight: 500;
 }
 .form-group input,
 .form-group select,
 .form-group textarea {
    width: 100%;
    padding: 0.6rem;
    background: var(--bg-primary);
    border: 1px solid var(--border-color);
    border-radius: 4px;
    color: var(--text-primary);
    font-family: inherit;
 }
 .form-group textarea {
    resize: vertical;
    min-height: 100px;
 }
 .form-actions {
    display: flex;
    gap: 1rem;
    margin-top: 1rem;
 }
 /* Progress Bar */
 .progress-bar {
    background: var(--bg-secondary);
    border-radius: 4px;
    height: 30px;
    position: relative;
    margin-bottom: 1rem;
    overflow: hidden;
 }
 .progress-fill {
    background: var(--accent-primary);
    height: 100%;
    transition: width 0.3s;
 }
 .progress-text {
    position: absolute;
    top: 50%;
    left: 50%;
    transform: translate(-50%, -50%);
    font-weight: bold;
    color: var(--text-primary);
 }
 /* Keyboard Hints */
 .keyboard-hints {
    font-size: 0.85rem;
    color: var(--text-secondary);
    margin-bottom: 1rem;
 }
 kbd {
    background: var(--bg-tertiary);
    padding: 0.2rem 0.5rem;
    border-radius: 3px;
    border: 1px solid var(--border-color);
    font-family: monospace;
 }
 /* Stats Grid */
 .stats-grid {
    display: grid;
    grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
    gap: 1rem;
    margin-bottom: 2rem;
 }
 .stat-card {
    background: var(--bg-secondary);
    padding: 1.5rem;
    border-radius: 8px;
    text-align: center;
 }
 .stat-value {
    font-size: 2rem;
    font-weight: bold;
    color: var(--accent-primary);
 }
 .stat-label {
    color: var(--text-secondary);
    font-size: 0.9rem;
 }
 /* Export Section */
 .export-section {
    background: var(--bg-secondary);
    padding: 1.5rem;
    border-radius: 8px;
 }
 .export-controls {
    display: flex;
    gap: 1rem;
    align-items: center;
    margin-bottom: 1rem;
 }
 .export-controls input {
    width: 80px;
    padding: 0.4rem;
    background: var(--bg-primary);
    border: 1px solid var(--border-color);
    color: var(--text-primary);
    border-radius: 4px;
 }
 #export-result {
    margin-top: 1rem;
    padding: 1rem;
    background: var(--bg-tertiary);
    border-radius: 4px;
    display: none;
 }
 #export-result.show {
    display: block;
 }
 /* Models List */
 .models-list {
    background: var(--bg-secondary);
    padding: 1rem;
    border-radius: 8px;
    margin-bottom: 2rem;
 }
 .model-item {
    background: var(--bg-tertiary);
    padding: 1rem;
    margin-bottom: 0.5rem;
    border-radius: 4px;
    display: flex;
    justify-content: space-between;
    align-items: center;
 }
 .model-download {
    background: var(--bg-secondary);
    padding: 1.5rem;
    border-radius: 8px;
 }
 .info-text {
    color: var(--text-secondary);
    margin-bottom: 1rem;
 }
 .code-example {
    background: var(--bg-primary);
    padding: 1rem;
    border-radius: 4px;
    font-family: monospace;
    color: var(--accent-primary);
    margin-top: 1rem;
 }
 /* Training Status */
 .training-status {
    background: var(--bg-secondary);
    padding: 1.5rem;
    border-radius: 8px;
    margin: 1.5rem 0;
 }
 .status-grid {
    display: grid;
    grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
    gap: 1rem;
 }
 .status-item {
    background: var(--bg-tertiary);
    padding: 1rem;
    border-radius: 4px;
 }
 .status-item label {
    display: block;
    color: var(--text-secondary);
    font-size: 0.85rem;
    margin-bottom: 0.3rem;
 }
 .status-item .value {
    font-size: 1.2rem;
    font-weight: bold;
    color: var(--accent-primary);
 }
 /* Training Charts */
 .training-charts {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1.5rem;
    margin-top: 1.5rem;
 }
 .chart-container {
    background: var(--bg-secondary);
    padding: 1.5rem;
    border-radius: 8px;
 }
 .chart-container h4 {
    margin-bottom: 1rem;
    color: var(--text-primary);
 }
 canvas {
    width: 100% !important;
    height: 200px !important;
 }
 /* Evaluation */
 .comparison-results {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 1.5rem;
    margin-top: 1.5rem;
 }
 .result-box {
    background: var(--bg-secondary);
    padding: 1.5rem;
    border-radius: 8px;
 }
 .result-content {
    background: var(--bg-primary);
    padding: 1rem;
    border-radius: 4px;
    min-height: 150px;
    white-space: pre-wrap;
    font-family: monospace;
    font-size: 0.9rem;
 }
 /* Filter Controls */
 .filter-controls {
    display: flex;
    gap: 1rem;
 }
 .filter-controls select {
    padding: 0.5rem;
    background: var(--bg-tertiary);
    border: 1px solid var(--border-color);
    color: var(--text-primary);
    border-radius: 4px;
 }
 /* Toast Notifications */
 #toast-container {
    position: fixed;
    top: 1rem;
    right: 1rem;
    z-index: 1000;
 }
 .toast {
    background: var(--bg-secondary);
    border: 1px solid var(--border-color);
    border-left: 4px solid var(--accent-primary);
    padding: 1rem 1.5rem;
    border-radius: 4px;
    margin-bottom: 0.5rem;
    min-width: 300px;
    animation: slideIn 0.3s ease;
 }
 .toast.success {
    border-left-color: var(--accent-success);
 }
 .toast.error {
    border-left-color: var(--accent-danger);
 }
 .toast.warning {
    border-left-color: var(--accent-warning);
 }
@keyframes slideIn {
    from {
        transform: translateX(400px);
        opacity: 0;
    }
    to {
        transform: translateX(0);
        opacity: 1;
    }
 }
 /* Scrollbar */
 ::-webkit-scrollbar {
    width: 8px;
    height: 8px;
 }
 ::-webkit-scrollbar-track {
    background: var(--bg-secondary);
 }
 ::-webkit-scrollbar-thumb {
    background: var(--bg-tertiary);
    border-radius: 4px;
 }
 ::-webkit-scrollbar-thumb:hover {
    background: #4a4a4a;
 }
 /* Responsive */
@media (max-width: 768px) {
    .sidebar {
        width: 200px;
    }
    .comparison-results,
    .training-charts {
        grid-template-columns: 1fr;
    }
    .stats-grid {
        grid-template-columns: 1fr;
    }
 }
@@ -0,0 +1,24 @@
 # Mail Fine-Tuning App Dependencies
 # Web Framework
 fastapi==0.109.0
 uvicorn[standard]==0.27.0
 python-multipart==0.0.6
 # ML Framework (Apple Silicon)
 mlx==0.6.0
 mlx-lm==0.8.0
 # Mail Parsing
 beautifulsoup4==4.12.3
 chardet==5.2.0
 # Database
 aiosqlite==0.19.0
 # Utilities
 aiofiles==23.2.1
 psutil==5.9.8
 # Optional but recommended
 huggingface-hub==0.20.3
@@ -0,0 +1,35 @@
 #!/bin/bash
 # Mail Fine-Tuning App Startup Script
 echo "🚀 Starting Mail Fine-Tuning App..."
 echo ""
 # Check if venv exists
 if [ ! -d "venv" ]; then
    echo "❌ Virtual environment not found!"
    echo "Please run: python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt"
    exit 1
 fi
 # Activate venv
 source venv/bin/activate
 # Check if dependencies are installed
 if ! python -c "import fastapi" 2>/dev/null; then
    echo "❌ Dependencies not installed!"
    echo "Please run: pip install -r requirements.txt"
    exit 1
 fi
 # Create necessary directories
 mkdir -p data models output
 # Start server
 echo "✅ Starting server on http://localhost:8000"
 echo ""
 echo "Press Ctrl+C to stop"
 echo ""
 cd backend
 python main.py