Add complete Mail Fine-Tuning Web-App for macOS Apple Silicon
Implemented a full-stack web application for fine-tuning LLMs on email data, optimized for Apple Silicon (M4 Pro with 24GB RAM). Features: - Mail import with drag & drop support (.mbox, .eml, .txt) - Automated mail cleaning and preprocessing - Interactive labeling interface with keyboard shortcuts - Training data export to JSONL format - MLX-based LoRA fine-tuning with live updates - Model evaluation and comparison interface - Server-Sent Events for real-time training progress - Dark theme UI optimized for extended use Technical Stack: - Backend: FastAPI with SQLite database - Frontend: Vanilla HTML/CSS/JavaScript (no external dependencies) - ML Framework: MLX for Apple Silicon optimization - Models: Support for Mistral 7B and Llama 3 8B via MLX Components: - data_manager.py: SQLite operations for mail storage and labeling - mail_parser.py: Parser for multiple mail formats with cleaning - training.py: MLX training wrapper with LoRA support - inference.py: Model loading and inference for evaluation - main.py: FastAPI backend with REST API and SSE - Frontend: Complete UI with all features Documentation: - Comprehensive README with installation and usage guide - Quick-start guide for rapid setup - Example mails for testing - Troubleshooting and best practices Ready for local deployment and fine-tuning workflows.
This commit is contained in:
@@ -0,0 +1,36 @@
|
||||
# Python
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
*$py.class
|
||||
*.so
|
||||
.Python
|
||||
venv/
|
||||
env/
|
||||
ENV/
|
||||
|
||||
# Data
|
||||
data/*.db
|
||||
data/*.jsonl
|
||||
data/temp/
|
||||
|
||||
# Models
|
||||
models/*
|
||||
!models/.gitkeep
|
||||
|
||||
# Training outputs
|
||||
output/*
|
||||
!output/.gitkeep
|
||||
|
||||
# IDE
|
||||
.vscode/
|
||||
.idea/
|
||||
*.swp
|
||||
*.swo
|
||||
*~
|
||||
|
||||
# OS
|
||||
.DS_Store
|
||||
Thumbs.db
|
||||
|
||||
# Logs
|
||||
*.log
|
||||
@@ -0,0 +1,209 @@
|
||||
# Quick Start Guide
|
||||
|
||||
Schnellstart-Anleitung für die Mail Fine-Tuning App.
|
||||
|
||||
## 1. Installation (5 Minuten)
|
||||
|
||||
```bash
|
||||
# 1. Virtual Environment erstellen
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate
|
||||
|
||||
# 2. Dependencies installieren
|
||||
pip install -r requirements.txt
|
||||
|
||||
# 3. Modell herunterladen (ca. 4GB, dauert je nach Internetverbindung)
|
||||
huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.3-4bit \
|
||||
--local-dir models/Mistral-7B-Instruct-v0.3-4bit
|
||||
```
|
||||
|
||||
## 2. Server starten
|
||||
|
||||
```bash
|
||||
./start.sh
|
||||
```
|
||||
|
||||
Oder manuell:
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
cd backend
|
||||
python main.py
|
||||
```
|
||||
|
||||
App öffnen: **http://localhost:8000**
|
||||
|
||||
## 3. Erste Schritte (10 Minuten)
|
||||
|
||||
### Schritt 1: Test-Mails erstellen
|
||||
|
||||
Erstelle eine Datei `test.txt` mit einer Beispiel-Mail:
|
||||
|
||||
```
|
||||
Subject: Projekt Update
|
||||
From: max@example.com
|
||||
To: team@example.com
|
||||
|
||||
Hallo Team,
|
||||
|
||||
das neue Feature ist fertig und bereit für Testing.
|
||||
Ich habe die API-Integration abgeschlossen und alle Tests laufen durch.
|
||||
|
||||
Bitte reviewt den Code bis Freitag.
|
||||
|
||||
Grüße
|
||||
Max
|
||||
```
|
||||
|
||||
### Schritt 2: Mails importieren
|
||||
|
||||
1. Öffne http://localhost:8000
|
||||
2. Ziehe `test.txt` in den Upload-Bereich
|
||||
3. Mail erscheint in der Liste
|
||||
|
||||
### Schritt 3: Erste Mail labeln
|
||||
|
||||
1. Klicke auf "Labeling" in der Sidebar
|
||||
2. Wähle **Aufgabentyp**: "Zusammenfassen"
|
||||
3. Gib **erwarteten Output** ein:
|
||||
```
|
||||
Max hat das neue Feature fertiggestellt und alle Tests sind erfolgreich.
|
||||
Das Team soll den Code bis Freitag reviewen.
|
||||
```
|
||||
4. Klicke "Speichern" (oder drücke `S`)
|
||||
|
||||
### Schritt 4: Mehr Mails labeln
|
||||
|
||||
- Erstelle mindestens **20-50 Beispiel-Mails**
|
||||
- Nutze verschiedene Typen:
|
||||
- Zusammenfassen
|
||||
- Antwort schreiben
|
||||
- Action Items extrahieren
|
||||
- Nutze Shortcuts: `N` (Nächste), `S` (Speichern)
|
||||
|
||||
### Schritt 5: Statistiken prüfen
|
||||
|
||||
1. Gehe zu "Export & Stats"
|
||||
2. Prüfe:
|
||||
- Mind. 50 gelabelte Mails? ✅
|
||||
- Gute Verteilung der Task-Types? ✅
|
||||
|
||||
### Schritt 6: Training starten
|
||||
|
||||
1. Gehe zu "Training"
|
||||
2. Wähle dein Modell aus
|
||||
3. Nutze Standard-Einstellungen:
|
||||
- Learning Rate: 1e-5
|
||||
- Epochs: 3
|
||||
- Batch Size: 4
|
||||
- LoRA Rank: 8
|
||||
4. Klicke "Training starten"
|
||||
5. Beobachte Live-Updates
|
||||
|
||||
⏱️ **Training dauert**: Ca. 5-10 Minuten bei 50 Beispielen
|
||||
|
||||
### Schritt 7: Modell testen
|
||||
|
||||
1. Gehe zu "Evaluation"
|
||||
2. Klicke "Test-Beispiel laden"
|
||||
3. Klicke "Vergleich starten"
|
||||
4. Vergleiche Base- vs. Fine-tuned-Ausgabe
|
||||
|
||||
## Tipps
|
||||
|
||||
### Gute Trainingsdaten
|
||||
|
||||
✅ **DO**:
|
||||
- Mindestens 50 Beispiele
|
||||
- Konsistenter Output-Stil
|
||||
- Diverse Mail-Typen
|
||||
- Klare, eindeutige Labels
|
||||
|
||||
❌ **DON'T**:
|
||||
- Zu wenige Beispiele (<20)
|
||||
- Widersprüchliche Labels
|
||||
- Nur sehr ähnliche Mails
|
||||
- Zu lange Outputs (>500 Wörter)
|
||||
|
||||
### Training-Parameter
|
||||
|
||||
Für **erste Versuche**:
|
||||
- Learning Rate: **1e-5**
|
||||
- Epochs: **3**
|
||||
- Batch Size: **4**
|
||||
- LoRA Rank: **8**
|
||||
|
||||
Bei **Overfitting** (Val Loss steigt):
|
||||
- Learning Rate: **5e-6** (niedriger)
|
||||
- Epochs: **2** (weniger)
|
||||
|
||||
Bei **Underfitting** (beide Losses hoch):
|
||||
- Epochs: **5** (mehr)
|
||||
- LoRA Rank: **16** (höher)
|
||||
- Mehr Daten sammeln!
|
||||
|
||||
### Keyboard Shortcuts
|
||||
|
||||
Im Labeling-Interface:
|
||||
- `N` - Nächste Mail
|
||||
- `S` - Speichern
|
||||
- `K` - Skip (Überspringen)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Server startet nicht
|
||||
|
||||
```bash
|
||||
# Prüfe Python-Version (mind. 3.10)
|
||||
python3 --version
|
||||
|
||||
# Prüfe ob Port 8000 frei ist
|
||||
lsof -i :8000
|
||||
|
||||
# Nutze anderen Port
|
||||
uvicorn main:app --port 8001
|
||||
```
|
||||
|
||||
### Modell nicht gefunden
|
||||
|
||||
```bash
|
||||
# Prüfe ob Modell existiert
|
||||
ls -la models/
|
||||
|
||||
# Download nochmal versuchen
|
||||
huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.3-4bit \
|
||||
--local-dir models/Mistral-7B-Instruct-v0.3-4bit
|
||||
```
|
||||
|
||||
### Out of Memory
|
||||
|
||||
Reduziere Batch Size:
|
||||
1. Gehe zu "Training"
|
||||
2. Setze Batch Size auf **2** oder **1**
|
||||
|
||||
### Training sehr langsam
|
||||
|
||||
- Nutze 4-bit quantisierte Modelle
|
||||
- Reduziere Batch Size
|
||||
- Schließe andere Programme
|
||||
|
||||
## Nächste Schritte
|
||||
|
||||
Nach erfolgreichem ersten Training:
|
||||
|
||||
1. **Mehr Daten sammeln**: 100+ Beispiele für bessere Ergebnisse
|
||||
2. **Parameter tunen**: Experimentiere mit Learning Rate und Epochs
|
||||
3. **Verschiedene Tasks**: Probiere alle Task-Types aus
|
||||
4. **Evaluation**: Teste ausgiebig mit neuen Mails
|
||||
|
||||
## Ressourcen
|
||||
|
||||
- Vollständige Doku: [README.md](README.md)
|
||||
- MLX Doku: https://ml-explore.github.io/mlx/
|
||||
- MLX-LM: https://github.com/ml-explore/mlx-examples
|
||||
|
||||
---
|
||||
|
||||
**Viel Erfolg! 🚀**
|
||||
|
||||
Bei Fragen schaue ins vollständige README oder die API-Dokumentation.
|
||||
@@ -0,0 +1,326 @@
|
||||
# Mail Fine-Tuning Web-App für macOS (Apple Silicon)
|
||||
|
||||
Eine vollständige lokale Web-Anwendung für das Fine-Tuning von LLMs auf Mail-Daten, optimiert für Apple Silicon (M4 Pro mit 24GB RAM).
|
||||
|
||||
## Features
|
||||
|
||||
- 📥 **Mail Import**: Drag & Drop Upload von .mbox, .eml, .txt Dateien mit automatischer Bereinigung
|
||||
- 🏷️ **Labeling Interface**: Komfortable UI zum manuellen Labeln von Mails
|
||||
- 📊 **Export & Statistiken**: JSONL Export für Training mit detaillierten Statistiken
|
||||
- 🤖 **Modell-Management**: Verwaltung von MLX-Modellen
|
||||
- 🎯 **Training**: LoRA Fine-Tuning mit Live-Updates und Visualisierung
|
||||
- 🧪 **Evaluation**: Chat-Interface mit Vergleichsmodus (Base vs. Fine-tuned)
|
||||
|
||||
## Technologie-Stack
|
||||
|
||||
- **Backend**: Python (FastAPI)
|
||||
- **Frontend**: HTML/CSS/JavaScript (Vanilla, keine Dependencies)
|
||||
- **ML Framework**: MLX (Apple Silicon optimiert)
|
||||
- **Database**: SQLite
|
||||
- **Empfohlene Modelle**: Mistral 7B, Llama 3 8B (via MLX)
|
||||
|
||||
## Projektstruktur
|
||||
|
||||
```
|
||||
mail-finetuning/
|
||||
├── backend/
|
||||
│ ├── main.py # FastAPI App
|
||||
│ ├── mail_parser.py # Mail Import & Bereinigung
|
||||
│ ├── data_manager.py # SQLite Operationen
|
||||
│ ├── training.py # MLX Training Wrapper
|
||||
│ └── inference.py # Modell-Inferenz
|
||||
├── frontend/
|
||||
│ ├── index.html
|
||||
│ ├── style.css
|
||||
│ └── app.js
|
||||
├── data/
|
||||
│ ├── mails.db # SQLite Datenbank
|
||||
│ ├── train.jsonl
|
||||
│ └── val.jsonl
|
||||
├── models/ # Heruntergeladene Modelle
|
||||
├── output/ # Trainierte Adapter
|
||||
└── requirements.txt
|
||||
```
|
||||
|
||||
## Installation
|
||||
|
||||
### Voraussetzungen
|
||||
|
||||
- macOS mit Apple Silicon (M1/M2/M3/M4)
|
||||
- Python 3.10 oder höher
|
||||
- mindestens 16GB RAM (24GB empfohlen)
|
||||
|
||||
### 1. Repository Setup
|
||||
|
||||
```bash
|
||||
cd training
|
||||
```
|
||||
|
||||
### 2. Virtual Environment erstellen
|
||||
|
||||
```bash
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate
|
||||
```
|
||||
|
||||
### 3. Dependencies installieren
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
### 4. Modell herunterladen
|
||||
|
||||
Wähle ein MLX-optimiertes Modell von Hugging Face:
|
||||
|
||||
```bash
|
||||
# Mistral 7B (4-bit quantisiert, ~4GB)
|
||||
huggingface-cli download mlx-community/Mistral-7B-Instruct-v0.3-4bit \
|
||||
--local-dir models/Mistral-7B-Instruct-v0.3-4bit
|
||||
|
||||
# ODER Llama 3 8B (4-bit quantisiert, ~5GB)
|
||||
huggingface-cli download mlx-community/Meta-Llama-3-8B-Instruct-4bit \
|
||||
--local-dir models/Meta-Llama-3-8B-Instruct-4bit
|
||||
```
|
||||
|
||||
**Hinweis**: Die 4-bit Versionen sind für 24GB RAM optimal. Für mehr RAM können auch größere Versionen genutzt werden.
|
||||
|
||||
## Nutzung
|
||||
|
||||
### 1. Server starten
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
python main.py
|
||||
```
|
||||
|
||||
Die App ist dann verfügbar unter: **http://localhost:8000**
|
||||
|
||||
### 2. Workflow
|
||||
|
||||
#### Schritt 1: Mails importieren
|
||||
|
||||
1. Gehe zu "Mail Import"
|
||||
2. Ziehe .eml, .mbox oder .txt Dateien per Drag & Drop in den Upload-Bereich
|
||||
3. Die Mails werden automatisch geparst und bereinigt
|
||||
|
||||
#### Schritt 2: Mails labeln
|
||||
|
||||
1. Wechsle zu "Labeling"
|
||||
2. Für jede Mail:
|
||||
- Wähle den **Aufgabentyp** (Zusammenfassen, Antwort schreiben, etc.)
|
||||
- Gib den **erwarteten Output** ein
|
||||
- Klicke "Speichern" oder nutze Shortcut `S`
|
||||
3. Nutze Shortcuts: `N` (Nächste), `S` (Speichern), `K` (Skip)
|
||||
|
||||
**Tipp**: Mindestens 50 gelabelte Beispiele für gutes Fine-Tuning!
|
||||
|
||||
#### Schritt 3: Daten exportieren
|
||||
|
||||
1. Gehe zu "Export & Stats"
|
||||
2. Prüfe die Statistiken (mind. 50 gelabelte Mails empfohlen)
|
||||
3. Klicke "JSONL generieren"
|
||||
4. Optional: Download der JSONL-Dateien zur Archivierung
|
||||
|
||||
#### Schritt 4: Training starten
|
||||
|
||||
1. Wechsle zu "Training"
|
||||
2. Konfiguriere Parameter:
|
||||
- **Modell**: Wähle heruntergeladenes Modell
|
||||
- **Learning Rate**: Standard 1e-5 (bei Overfitting niedriger)
|
||||
- **Epochs**: 3-5 für erste Versuche
|
||||
- **Batch Size**: 4 (bei 24GB RAM sicher)
|
||||
- **LoRA Rank**: 8-16 (höher = mehr Kapazität, mehr RAM)
|
||||
3. Klicke "Training starten"
|
||||
4. Beobachte Live-Updates:
|
||||
- Training/Validation Loss
|
||||
- Fortschritt und ETA
|
||||
- Speichernutzung
|
||||
|
||||
**Warnung bei Overfitting**: Wenn Validation Loss steigt während Training Loss sinkt, Training abbrechen!
|
||||
|
||||
#### Schritt 5: Modell testen
|
||||
|
||||
1. Gehe zu "Evaluation"
|
||||
2. Wähle Task-Type und gib Mail-Text ein
|
||||
3. Klicke "Vergleich starten"
|
||||
4. Sieh dir die Ausgaben von Base- und Fine-tuned-Modell an
|
||||
|
||||
### 3. Export des fertigen Modells
|
||||
|
||||
Nach erfolgreichem Training liegen die LoRA-Adapter in `output/run_[timestamp]/adapters.npz`.
|
||||
|
||||
Um das Modell zu nutzen:
|
||||
|
||||
```python
|
||||
from mlx_lm import load
|
||||
|
||||
model = load(
|
||||
"models/Mistral-7B-Instruct-v0.3-4bit",
|
||||
adapter_path="output/run_1234567890/adapters.npz"
|
||||
)
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
### Mails
|
||||
|
||||
- `POST /api/mails/upload` - Mails hochladen
|
||||
- `GET /api/mails` - Alle Mails abrufen
|
||||
- `GET /api/mails/{id}` - Einzelne Mail
|
||||
- `PUT /api/mails/{id}` - Mail aktualisieren (Labeling)
|
||||
- `DELETE /api/mails/{id}` - Mail löschen
|
||||
|
||||
### Export
|
||||
|
||||
- `GET /api/export/stats` - Statistiken
|
||||
- `POST /api/export/jsonl` - Training-Daten generieren
|
||||
- `GET /api/export/download/{train|val}` - JSONL herunterladen
|
||||
|
||||
### Modelle
|
||||
|
||||
- `GET /api/models` - Verfügbare Modelle
|
||||
- `POST /api/models/download` - Modell herunterladen (Placeholder)
|
||||
|
||||
### Training
|
||||
|
||||
- `POST /api/training/start` - Training starten
|
||||
- `POST /api/training/stop` - Training stoppen
|
||||
- `GET /api/training/status` - Status abrufen
|
||||
- `GET /api/training/stream` - SSE Stream für Live-Updates
|
||||
|
||||
### Inference
|
||||
|
||||
- `POST /api/inference/load` - Modell laden
|
||||
- `GET /api/inference/loaded` - Geladene Modelle
|
||||
- `POST /api/inference/generate` - Text generieren
|
||||
- `POST /api/inference/compare` - Modell-Vergleich
|
||||
- `GET /api/inference/test-prompts` - Test-Prompts
|
||||
|
||||
## Tipps & Best Practices
|
||||
|
||||
### Datenqualität
|
||||
|
||||
- **Mindestens 50 Beispiele** pro Task-Type
|
||||
- **Einheitlicher Output-Stil**: Achte auf konsistente Formatierung
|
||||
- **Diverse Beispiele**: Verschiedene Mail-Längen und Stile
|
||||
- **Klare Labels**: Vermeide mehrdeutige oder widersprüchliche Labels
|
||||
|
||||
### Training
|
||||
|
||||
- **Learning Rate**:
|
||||
- 1e-5 für die meisten Fälle
|
||||
- 5e-6 bei Overfitting
|
||||
- 1e-4 bei sehr kleinem Datensatz (Vorsicht!)
|
||||
|
||||
- **Epochs**:
|
||||
- 3 Epochs für Start
|
||||
- Mehr Epochs wenn Loss noch sinkt
|
||||
- Weniger wenn Overfitting auftritt
|
||||
|
||||
- **LoRA Rank**:
|
||||
- 8 für einfache Tasks
|
||||
- 16-32 für komplexe Tasks
|
||||
- Höher = mehr Kapazität aber mehr RAM
|
||||
|
||||
### Overfitting erkennen
|
||||
|
||||
Zeichen von Overfitting:
|
||||
- ✅ Training Loss sinkt kontinuierlich
|
||||
- ❌ Validation Loss steigt oder stagniert
|
||||
- ❌ Modell "memoriert" exakte Trainingsbeispiele
|
||||
|
||||
Lösungen:
|
||||
- Mehr Daten sammeln
|
||||
- Kleinere Learning Rate
|
||||
- Weniger Epochs
|
||||
- Niedrigere LoRA Rank
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Out of Memory" Fehler
|
||||
|
||||
- Reduziere Batch Size (4 → 2 → 1)
|
||||
- Nutze kleineres Modell (4-bit quantisiert)
|
||||
- Schließe andere Programme
|
||||
|
||||
### Training sehr langsam
|
||||
|
||||
- Prüfe ob Metal Performance Shaders aktiv sind
|
||||
- Nutze 4-bit quantisierte Modelle
|
||||
- Reduziere max_seq_length (Standard: 2048)
|
||||
|
||||
### Modell gibt schlechte Ergebnisse
|
||||
|
||||
- Mehr/bessere Trainingsdaten
|
||||
- Längeres Training (mehr Epochs)
|
||||
- Höhere LoRA Rank
|
||||
- Prüfe Prompt-Format
|
||||
|
||||
## Wichtige Hinweise
|
||||
|
||||
### MLX Training Loop
|
||||
|
||||
**WICHTIG**: Die aktuelle Implementierung in `training.py` enthält eine **simulierte Training Loop**. Für produktiven Einsatz muss diese durch echtes MLX Training ersetzt werden:
|
||||
|
||||
```python
|
||||
# Beispiel für echtes MLX Training mit mlx-lm
|
||||
from mlx_lm.tuner import train
|
||||
|
||||
train(
|
||||
model_path=str(model_path),
|
||||
data_path=str(train_file),
|
||||
val_data_path=str(val_file),
|
||||
adapter_file=str(output_path / 'adapters.npz'),
|
||||
iters=total_steps,
|
||||
learning_rate=config.learning_rate,
|
||||
batch_size=config.batch_size,
|
||||
# ... weitere Parameter
|
||||
)
|
||||
```
|
||||
|
||||
Siehe [mlx-lm Dokumentation](https://github.com/ml-explore/mlx-examples/tree/main/llms) für Details.
|
||||
|
||||
### Inference
|
||||
|
||||
Die Inference-Implementation in `inference.py` nutzt `mlx_lm.generate()`. Stelle sicher, dass das richtige Prompt-Format für dein Modell genutzt wird (z.B. ChatML, Llama-Format, etc.).
|
||||
|
||||
## Entwicklung
|
||||
|
||||
### Debug-Modus
|
||||
|
||||
```bash
|
||||
uvicorn main:app --reload --log-level debug
|
||||
```
|
||||
|
||||
### Tests (TODO)
|
||||
|
||||
```bash
|
||||
pytest tests/
|
||||
```
|
||||
|
||||
## Lizenz
|
||||
|
||||
MIT License
|
||||
|
||||
## Support
|
||||
|
||||
Bei Problemen:
|
||||
1. Prüfe die Browser Console (F12) für Frontend-Fehler
|
||||
2. Prüfe die Server-Logs für Backend-Fehler
|
||||
3. Stelle sicher, dass alle Dependencies installiert sind
|
||||
4. Prüfe, dass MLX korrekt auf Apple Silicon läuft
|
||||
|
||||
## Roadmap
|
||||
|
||||
- [ ] Echte MLX Training Loop implementieren
|
||||
- [ ] Automatisches Checkpoint-Management
|
||||
- [ ] Model Merging (Base + Adapter zusammenführen)
|
||||
- [ ] Export für Deployment
|
||||
- [ ] Batch-Inference
|
||||
- [ ] Tests
|
||||
- [ ] Docker Support
|
||||
|
||||
---
|
||||
|
||||
**Viel Erfolg beim Fine-Tuning! 🚀**
|
||||
@@ -0,0 +1,286 @@
|
||||
"""
|
||||
Data Manager für Mail Fine-Tuning App
|
||||
Verwaltet SQLite Datenbank für Mails und Labels
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
import json
|
||||
from datetime import datetime
|
||||
from typing import List, Dict, Optional
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
class DataManager:
|
||||
def __init__(self, db_path: str = "data/mails.db"):
|
||||
self.db_path = Path(db_path)
|
||||
self.db_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
self.init_db()
|
||||
|
||||
def init_db(self):
|
||||
"""Initialisiert die Datenbank mit dem Schema"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
CREATE TABLE IF NOT EXISTS mails (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
subject TEXT,
|
||||
sender TEXT,
|
||||
recipient TEXT,
|
||||
date TEXT,
|
||||
body TEXT NOT NULL,
|
||||
original_format TEXT,
|
||||
task_type TEXT DEFAULT 'unlabeled',
|
||||
expected_output TEXT,
|
||||
status TEXT DEFAULT 'unlabeled',
|
||||
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
|
||||
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
|
||||
)
|
||||
""")
|
||||
|
||||
cursor.execute("""
|
||||
CREATE TABLE IF NOT EXISTS training_runs (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
model_name TEXT NOT NULL,
|
||||
start_time TEXT,
|
||||
end_time TEXT,
|
||||
config TEXT,
|
||||
status TEXT,
|
||||
final_train_loss REAL,
|
||||
final_val_loss REAL,
|
||||
checkpoint_path TEXT
|
||||
)
|
||||
""")
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
def add_mail(self, subject: str, sender: str, recipient: str,
|
||||
date: str, body: str, original_format: str) -> int:
|
||||
"""Fügt eine neue Mail hinzu"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
INSERT INTO mails (subject, sender, recipient, date, body, original_format)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
""", (subject, sender, recipient, date, body, original_format))
|
||||
|
||||
mail_id = cursor.lastrowid
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
return mail_id
|
||||
|
||||
def get_all_mails(self, status_filter: Optional[str] = None) -> List[Dict]:
|
||||
"""Holt alle Mails, optional gefiltert nach Status"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.cursor()
|
||||
|
||||
if status_filter:
|
||||
cursor.execute("SELECT * FROM mails WHERE status = ? ORDER BY id", (status_filter,))
|
||||
else:
|
||||
cursor.execute("SELECT * FROM mails ORDER BY id")
|
||||
|
||||
rows = cursor.fetchall()
|
||||
mails = [dict(row) for row in rows]
|
||||
|
||||
conn.close()
|
||||
return mails
|
||||
|
||||
def get_mail(self, mail_id: int) -> Optional[Dict]:
|
||||
"""Holt eine einzelne Mail"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("SELECT * FROM mails WHERE id = ?", (mail_id,))
|
||||
row = cursor.fetchone()
|
||||
|
||||
conn.close()
|
||||
return dict(row) if row else None
|
||||
|
||||
def update_mail(self, mail_id: int, task_type: Optional[str] = None,
|
||||
expected_output: Optional[str] = None,
|
||||
status: Optional[str] = None,
|
||||
body: Optional[str] = None) -> bool:
|
||||
"""Aktualisiert eine Mail (Labeling)"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
updates = []
|
||||
params = []
|
||||
|
||||
if task_type is not None:
|
||||
updates.append("task_type = ?")
|
||||
params.append(task_type)
|
||||
|
||||
if expected_output is not None:
|
||||
updates.append("expected_output = ?")
|
||||
params.append(expected_output)
|
||||
|
||||
if status is not None:
|
||||
updates.append("status = ?")
|
||||
params.append(status)
|
||||
|
||||
if body is not None:
|
||||
updates.append("body = ?")
|
||||
params.append(body)
|
||||
|
||||
if not updates:
|
||||
conn.close()
|
||||
return False
|
||||
|
||||
updates.append("updated_at = ?")
|
||||
params.append(datetime.now().isoformat())
|
||||
params.append(mail_id)
|
||||
|
||||
query = f"UPDATE mails SET {', '.join(updates)} WHERE id = ?"
|
||||
cursor.execute(query, params)
|
||||
|
||||
success = cursor.rowcount > 0
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
return success
|
||||
|
||||
def delete_mail(self, mail_id: int) -> bool:
|
||||
"""Löscht eine Mail"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("DELETE FROM mails WHERE id = ?", (mail_id,))
|
||||
success = cursor.rowcount > 0
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
return success
|
||||
|
||||
def get_statistics(self) -> Dict:
|
||||
"""Berechnet Statistiken über die Daten"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Gesamt-Anzahl
|
||||
cursor.execute("SELECT COUNT(*) FROM mails")
|
||||
total = cursor.fetchone()[0]
|
||||
|
||||
# Nach Status
|
||||
cursor.execute("""
|
||||
SELECT status, COUNT(*) as count
|
||||
FROM mails
|
||||
GROUP BY status
|
||||
""")
|
||||
status_counts = {row[0]: row[1] for row in cursor.fetchall()}
|
||||
|
||||
# Nach Task-Type
|
||||
cursor.execute("""
|
||||
SELECT task_type, COUNT(*) as count
|
||||
FROM mails
|
||||
WHERE status = 'labeled'
|
||||
GROUP BY task_type
|
||||
""")
|
||||
task_counts = {row[0]: row[1] for row in cursor.fetchall()}
|
||||
|
||||
# Durchschnittliche Längen (nur gelabelte)
|
||||
cursor.execute("""
|
||||
SELECT
|
||||
AVG(LENGTH(body)) as avg_input_length,
|
||||
AVG(LENGTH(expected_output)) as avg_output_length
|
||||
FROM mails
|
||||
WHERE status = 'labeled'
|
||||
""")
|
||||
lengths = cursor.fetchone()
|
||||
|
||||
conn.close()
|
||||
|
||||
labeled_count = status_counts.get('labeled', 0)
|
||||
|
||||
return {
|
||||
'total': total,
|
||||
'labeled': labeled_count,
|
||||
'unlabeled': status_counts.get('unlabeled', 0),
|
||||
'skipped': status_counts.get('skip', 0),
|
||||
'task_distribution': task_counts,
|
||||
'avg_input_length': round(lengths[0]) if lengths[0] else 0,
|
||||
'avg_output_length': round(lengths[1]) if lengths[1] else 0,
|
||||
'sufficient_data': labeled_count >= 50
|
||||
}
|
||||
|
||||
def export_training_data(self, train_split: float = 0.9) -> tuple[List[Dict], List[Dict]]:
|
||||
"""Exportiert gelabelte Daten für Training"""
|
||||
import random
|
||||
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
SELECT body, task_type, expected_output
|
||||
FROM mails
|
||||
WHERE status = 'labeled' AND expected_output IS NOT NULL
|
||||
ORDER BY RANDOM()
|
||||
""")
|
||||
|
||||
rows = cursor.fetchall()
|
||||
conn.close()
|
||||
|
||||
if not rows:
|
||||
return [], []
|
||||
|
||||
data = [dict(row) for row in rows]
|
||||
|
||||
# Shuffle
|
||||
random.shuffle(data)
|
||||
|
||||
# Split
|
||||
split_idx = int(len(data) * train_split)
|
||||
train_data = data[:split_idx]
|
||||
val_data = data[split_idx:]
|
||||
|
||||
return train_data, val_data
|
||||
|
||||
def save_training_run(self, model_name: str, config: Dict,
|
||||
checkpoint_path: str) -> int:
|
||||
"""Speichert einen Training-Run"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
INSERT INTO training_runs
|
||||
(model_name, start_time, config, status, checkpoint_path)
|
||||
VALUES (?, ?, ?, ?, ?)
|
||||
""", (
|
||||
model_name,
|
||||
datetime.now().isoformat(),
|
||||
json.dumps(config),
|
||||
'running',
|
||||
checkpoint_path
|
||||
))
|
||||
|
||||
run_id = cursor.lastrowid
|
||||
conn.commit()
|
||||
conn.close()
|
||||
|
||||
return run_id
|
||||
|
||||
def update_training_run(self, run_id: int, status: str,
|
||||
train_loss: Optional[float] = None,
|
||||
val_loss: Optional[float] = None):
|
||||
"""Aktualisiert einen Training-Run"""
|
||||
conn = sqlite3.connect(self.db_path)
|
||||
cursor = conn.cursor()
|
||||
|
||||
cursor.execute("""
|
||||
UPDATE training_runs
|
||||
SET status = ?,
|
||||
end_time = ?,
|
||||
final_train_loss = COALESCE(?, final_train_loss),
|
||||
final_val_loss = COALESCE(?, final_val_loss)
|
||||
WHERE id = ?
|
||||
""", (status, datetime.now().isoformat(), train_loss, val_loss, run_id))
|
||||
|
||||
conn.commit()
|
||||
conn.close()
|
||||
@@ -0,0 +1,209 @@
|
||||
"""
|
||||
Inference Module für Modell-Evaluation
|
||||
Lädt Base- und Fine-tuned Models für Vergleiche
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Optional, Dict
|
||||
import threading
|
||||
|
||||
|
||||
class ModelInference:
|
||||
"""Handhabt Modell-Inferenz für Base und Fine-tuned Models"""
|
||||
|
||||
def __init__(self, models_dir: str = "models", output_dir: str = "output"):
|
||||
self.models_dir = Path(models_dir)
|
||||
self.output_dir = Path(output_dir)
|
||||
|
||||
self.base_model = None
|
||||
self.finetuned_model = None
|
||||
self.model_lock = threading.Lock()
|
||||
|
||||
def load_base_model(self, model_name: str) -> bool:
|
||||
"""Lädt das Basis-Modell"""
|
||||
try:
|
||||
# Import MLX nur bei Bedarf
|
||||
from mlx_lm import load
|
||||
|
||||
model_path = self.models_dir / model_name
|
||||
|
||||
if not model_path.exists():
|
||||
return False
|
||||
|
||||
with self.model_lock:
|
||||
self.base_model = load(str(model_path))
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error loading base model: {e}")
|
||||
return False
|
||||
|
||||
def load_finetuned_model(self, model_name: str, adapter_path: str) -> bool:
|
||||
"""Lädt das Fine-tuned Modell (Base + LoRA Adapter)"""
|
||||
try:
|
||||
from mlx_lm import load
|
||||
|
||||
model_path = self.models_dir / model_name
|
||||
adapter_file = Path(adapter_path)
|
||||
|
||||
if not model_path.exists() or not adapter_file.exists():
|
||||
return False
|
||||
|
||||
with self.model_lock:
|
||||
# Lade Base Model mit Adapter
|
||||
self.finetuned_model = load(
|
||||
str(model_path),
|
||||
adapter_path=str(adapter_file)
|
||||
)
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error loading finetuned model: {e}")
|
||||
return False
|
||||
|
||||
def generate(self, prompt: str, model_type: str = 'base',
|
||||
max_tokens: int = 512, temperature: float = 0.7) -> str:
|
||||
"""
|
||||
Generiert Text mit dem gewählten Modell
|
||||
|
||||
Args:
|
||||
prompt: Input prompt
|
||||
model_type: 'base' oder 'finetuned'
|
||||
max_tokens: Maximale Anzahl Tokens
|
||||
temperature: Sampling temperature
|
||||
|
||||
Returns:
|
||||
Generierter Text
|
||||
"""
|
||||
try:
|
||||
from mlx_lm import generate as mlx_generate
|
||||
|
||||
model = self.base_model if model_type == 'base' else self.finetuned_model
|
||||
|
||||
if model is None:
|
||||
return f"Error: {model_type} model not loaded"
|
||||
|
||||
with self.model_lock:
|
||||
# MLX-LM generate
|
||||
result = mlx_generate(
|
||||
model,
|
||||
prompt=prompt,
|
||||
max_tokens=max_tokens,
|
||||
temp=temperature
|
||||
)
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
return f"Error during generation: {str(e)}"
|
||||
|
||||
def generate_comparison(self, prompt: str, max_tokens: int = 512,
|
||||
temperature: float = 0.7) -> Dict[str, str]:
|
||||
"""
|
||||
Generiert mit beiden Modellen für Vergleich
|
||||
|
||||
Returns:
|
||||
Dict mit 'base' und 'finetuned' Outputs
|
||||
"""
|
||||
result = {
|
||||
'base': None,
|
||||
'finetuned': None
|
||||
}
|
||||
|
||||
if self.base_model:
|
||||
result['base'] = self.generate(
|
||||
prompt, 'base', max_tokens, temperature
|
||||
)
|
||||
|
||||
if self.finetuned_model:
|
||||
result['finetuned'] = self.generate(
|
||||
prompt, 'finetuned', max_tokens, temperature
|
||||
)
|
||||
|
||||
return result
|
||||
|
||||
def format_mail_prompt(self, task_type: str, mail_body: str) -> str:
|
||||
"""Formatiert einen Prompt basierend auf Task-Type"""
|
||||
|
||||
task_prompts = {
|
||||
'Zusammenfassen': 'Fasse folgende E-Mail zusammen:',
|
||||
'Antwort schreiben': 'Schreibe eine Antwort auf folgende E-Mail:',
|
||||
'Kategorisieren': 'Kategorisiere folgende E-Mail:',
|
||||
'Action Items': 'Extrahiere die Action Items aus folgender E-Mail:',
|
||||
'Custom': 'Bearbeite folgende E-Mail:'
|
||||
}
|
||||
|
||||
instruction = task_prompts.get(task_type, task_prompts['Custom'])
|
||||
|
||||
return f"{instruction}\n\n{mail_body}"
|
||||
|
||||
def get_test_prompts(self) -> Dict[str, str]:
|
||||
"""Vordefinierte Test-Prompts"""
|
||||
return {
|
||||
'Zusammenfassen': self.format_mail_prompt(
|
||||
'Zusammenfassen',
|
||||
"""Betreff: Q4 Projektupdate
|
||||
|
||||
Hallo Team,
|
||||
|
||||
ich wollte euch ein kurzes Update zum aktuellen Projektstand geben.
|
||||
|
||||
Wir haben letzte Woche die neue API-Integration abgeschlossen und erfolgreich getestet.
|
||||
Die Performance-Tests zeigen eine Verbesserung von 40% gegenüber der alten Implementierung.
|
||||
|
||||
Nächste Woche starten wir mit der Frontend-Anpassung. Maria und Tom werden das Design
|
||||
überarbeiten, während ich mich um die Backend-Anbindung kümmere.
|
||||
|
||||
Der Go-Live ist weiterhin für Ende des Monats geplant.
|
||||
|
||||
Beste Grüße
|
||||
Alex"""
|
||||
),
|
||||
'Antwort schreiben': self.format_mail_prompt(
|
||||
'Antwort schreiben',
|
||||
"""Betreff: Frage zu Invoice #2847
|
||||
|
||||
Hallo,
|
||||
|
||||
ich habe eine Frage zur Rechnung #2847 vom 15. März.
|
||||
Der Betrag scheint nicht mit unserem Angebot übereinzustimmen.
|
||||
|
||||
Könnten Sie das bitte prüfen?
|
||||
|
||||
Danke
|
||||
Michael"""
|
||||
),
|
||||
'Action Items': self.format_mail_prompt(
|
||||
'Action Items',
|
||||
"""Betreff: Meeting Notes - Produktlaunch
|
||||
|
||||
Hi alle,
|
||||
|
||||
hier die wichtigsten Punkte vom heutigen Meeting:
|
||||
|
||||
- Sarah bereitet die Pressemitteilung vor (Deadline: Freitag)
|
||||
- Marketing-Team erstellt Social Media Content (nächste Woche)
|
||||
- Ich kümmere mich um die Influencer-Kontakte
|
||||
- Wir brauchen noch finale Produktfotos vom Design-Team
|
||||
- Launch-Event ist am 1. April - Location muss noch gebucht werden
|
||||
|
||||
Bitte gebt bis Mittwoch Bescheid ob ihr eure Aufgaben schaffen könnt.
|
||||
|
||||
Lisa"""
|
||||
)
|
||||
}
|
||||
|
||||
def unload_models(self):
|
||||
"""Entlädt Modelle aus dem Speicher"""
|
||||
with self.model_lock:
|
||||
self.base_model = None
|
||||
self.finetuned_model = None
|
||||
|
||||
def get_loaded_models(self) -> Dict[str, bool]:
|
||||
"""Gibt zurück welche Modelle geladen sind"""
|
||||
return {
|
||||
'base': self.base_model is not None,
|
||||
'finetuned': self.finetuned_model is not None
|
||||
}
|
||||
@@ -0,0 +1,264 @@
|
||||
"""
|
||||
Mail Parser für verschiedene Formate
|
||||
Bereinigt und normalisiert Mail-Inhalte
|
||||
"""
|
||||
|
||||
import email
|
||||
import mailbox
|
||||
import re
|
||||
from bs4 import BeautifulSoup
|
||||
from typing import List, Dict, Optional
|
||||
from pathlib import Path
|
||||
import chardet
|
||||
|
||||
|
||||
class MailParser:
|
||||
"""Parst und bereinigt Mail-Dateien"""
|
||||
|
||||
# Häufige Footer/Disclaimer Pattern
|
||||
FOOTER_PATTERNS = [
|
||||
r'(?i)^--\s*$.*', # Standard signature delimiter
|
||||
r'(?i)Diese E-Mail.*vertraulich.*',
|
||||
r'(?i)This email.*confidential.*',
|
||||
r'(?i)Disclaimer:.*',
|
||||
r'(?i)Get Outlook for.*',
|
||||
r'(?i)Sent from my iPhone.*',
|
||||
r'(?i)Von meinem.*gesendet.*',
|
||||
r'(?i)Diese Nachricht.*Virenfrei.*',
|
||||
]
|
||||
|
||||
@staticmethod
|
||||
def detect_encoding(file_path: Path) -> str:
|
||||
"""Erkennt das Encoding einer Datei"""
|
||||
with open(file_path, 'rb') as f:
|
||||
raw_data = f.read()
|
||||
result = chardet.detect(raw_data)
|
||||
return result['encoding'] or 'utf-8'
|
||||
|
||||
@staticmethod
|
||||
def html_to_text(html: str) -> str:
|
||||
"""Konvertiert HTML zu Plain Text"""
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
|
||||
# Entferne Script und Style Tags
|
||||
for script in soup(['script', 'style']):
|
||||
script.decompose()
|
||||
|
||||
# Extrahiere Text
|
||||
text = soup.get_text()
|
||||
|
||||
# Bereinige Whitespace
|
||||
lines = (line.strip() for line in text.splitlines())
|
||||
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
|
||||
text = ' '.join(chunk for chunk in chunks if chunk)
|
||||
|
||||
return text
|
||||
|
||||
@staticmethod
|
||||
def remove_multiple_newlines(text: str) -> str:
|
||||
"""Entfernt mehrfache Leerzeilen"""
|
||||
return re.sub(r'\n{3,}', '\n\n', text)
|
||||
|
||||
@staticmethod
|
||||
def remove_footers(text: str) -> str:
|
||||
"""Entfernt häufige Footer und Disclaimer"""
|
||||
for pattern in MailParser.FOOTER_PATTERNS:
|
||||
# Suche Pattern und entferne alles danach
|
||||
match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
|
||||
if match:
|
||||
text = text[:match.start()].strip()
|
||||
|
||||
return text
|
||||
|
||||
@staticmethod
|
||||
def clean_quoted_text(text: str) -> str:
|
||||
"""Entfernt oder markiert quoted Text (> oder |)"""
|
||||
lines = text.split('\n')
|
||||
cleaned_lines = []
|
||||
|
||||
for line in lines:
|
||||
# Überspringe Zeilen die mit > oder | beginnen (quoted text)
|
||||
if not line.strip().startswith('>') and not line.strip().startswith('|'):
|
||||
cleaned_lines.append(line)
|
||||
|
||||
return '\n'.join(cleaned_lines)
|
||||
|
||||
@staticmethod
|
||||
def normalize_whitespace(text: str) -> str:
|
||||
"""Normalisiert Whitespace"""
|
||||
# Entferne trailing spaces
|
||||
lines = [line.rstrip() for line in text.split('\n')]
|
||||
text = '\n'.join(lines)
|
||||
|
||||
# Entferne mehrfache Spaces
|
||||
text = re.sub(r' {2,}', ' ', text)
|
||||
|
||||
# Entferne mehrfache Leerzeilen
|
||||
text = MailParser.remove_multiple_newlines(text)
|
||||
|
||||
return text.strip()
|
||||
|
||||
@staticmethod
|
||||
def clean_text(text: str, is_html: bool = False) -> str:
|
||||
"""Vollständige Bereinigung eines Texts"""
|
||||
if is_html:
|
||||
text = MailParser.html_to_text(text)
|
||||
|
||||
text = MailParser.remove_footers(text)
|
||||
text = MailParser.clean_quoted_text(text)
|
||||
text = MailParser.normalize_whitespace(text)
|
||||
|
||||
return text
|
||||
|
||||
@staticmethod
|
||||
def parse_eml(file_path: Path) -> Dict:
|
||||
"""Parst eine .eml Datei"""
|
||||
encoding = MailParser.detect_encoding(file_path)
|
||||
|
||||
with open(file_path, 'r', encoding=encoding, errors='ignore') as f:
|
||||
msg = email.message_from_file(f)
|
||||
|
||||
subject = msg.get('Subject', 'No Subject')
|
||||
sender = msg.get('From', 'Unknown')
|
||||
recipient = msg.get('To', 'Unknown')
|
||||
date = msg.get('Date', '')
|
||||
|
||||
# Body extrahieren
|
||||
body = ""
|
||||
is_html = False
|
||||
|
||||
if msg.is_multipart():
|
||||
for part in msg.walk():
|
||||
content_type = part.get_content_type()
|
||||
if content_type == 'text/plain':
|
||||
body = part.get_payload(decode=True).decode(errors='ignore')
|
||||
break
|
||||
elif content_type == 'text/html' and not body:
|
||||
body = part.get_payload(decode=True).decode(errors='ignore')
|
||||
is_html = True
|
||||
else:
|
||||
body = msg.get_payload(decode=True).decode(errors='ignore')
|
||||
if msg.get_content_type() == 'text/html':
|
||||
is_html = True
|
||||
|
||||
# Bereinige Body
|
||||
body = MailParser.clean_text(body, is_html)
|
||||
|
||||
return {
|
||||
'subject': subject,
|
||||
'sender': sender,
|
||||
'recipient': recipient,
|
||||
'date': date,
|
||||
'body': body,
|
||||
'original_format': 'eml'
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def parse_mbox(file_path: Path) -> List[Dict]:
|
||||
"""Parst eine .mbox Datei"""
|
||||
mails = []
|
||||
|
||||
try:
|
||||
mbox = mailbox.mbox(str(file_path))
|
||||
|
||||
for message in mbox:
|
||||
subject = message.get('Subject', 'No Subject')
|
||||
sender = message.get('From', 'Unknown')
|
||||
recipient = message.get('To', 'Unknown')
|
||||
date = message.get('Date', '')
|
||||
|
||||
body = ""
|
||||
is_html = False
|
||||
|
||||
if message.is_multipart():
|
||||
for part in message.walk():
|
||||
content_type = part.get_content_type()
|
||||
if content_type == 'text/plain':
|
||||
payload = part.get_payload(decode=True)
|
||||
if payload:
|
||||
body = payload.decode(errors='ignore')
|
||||
break
|
||||
elif content_type == 'text/html' and not body:
|
||||
payload = part.get_payload(decode=True)
|
||||
if payload:
|
||||
body = payload.decode(errors='ignore')
|
||||
is_html = True
|
||||
else:
|
||||
payload = message.get_payload(decode=True)
|
||||
if payload:
|
||||
body = payload.decode(errors='ignore')
|
||||
if message.get_content_type() == 'text/html':
|
||||
is_html = True
|
||||
|
||||
body = MailParser.clean_text(body, is_html)
|
||||
|
||||
mails.append({
|
||||
'subject': subject,
|
||||
'sender': sender,
|
||||
'recipient': recipient,
|
||||
'date': date,
|
||||
'body': body,
|
||||
'original_format': 'mbox'
|
||||
})
|
||||
|
||||
except Exception as e:
|
||||
raise Exception(f"Error parsing mbox: {str(e)}")
|
||||
|
||||
return mails
|
||||
|
||||
@staticmethod
|
||||
def parse_txt(file_path: Path) -> Dict:
|
||||
"""Parst eine .txt Datei (simple Mail als Text)"""
|
||||
encoding = MailParser.detect_encoding(file_path)
|
||||
|
||||
with open(file_path, 'r', encoding=encoding, errors='ignore') as f:
|
||||
content = f.read()
|
||||
|
||||
# Einfache Struktur: Versuche Subject/From/To zu erkennen
|
||||
lines = content.split('\n')
|
||||
subject = 'No Subject'
|
||||
sender = 'Unknown'
|
||||
recipient = 'Unknown'
|
||||
date = ''
|
||||
body_start = 0
|
||||
|
||||
for i, line in enumerate(lines[:10]): # Erste 10 Zeilen prüfen
|
||||
if line.lower().startswith('subject:'):
|
||||
subject = line[8:].strip()
|
||||
body_start = max(body_start, i + 1)
|
||||
elif line.lower().startswith('from:'):
|
||||
sender = line[5:].strip()
|
||||
body_start = max(body_start, i + 1)
|
||||
elif line.lower().startswith('to:'):
|
||||
recipient = line[3:].strip()
|
||||
body_start = max(body_start, i + 1)
|
||||
elif line.lower().startswith('date:'):
|
||||
date = line[5:].strip()
|
||||
body_start = max(body_start, i + 1)
|
||||
|
||||
# Body ist der Rest
|
||||
body = '\n'.join(lines[body_start:])
|
||||
body = MailParser.clean_text(body)
|
||||
|
||||
return {
|
||||
'subject': subject,
|
||||
'sender': sender,
|
||||
'recipient': recipient,
|
||||
'date': date,
|
||||
'body': body,
|
||||
'original_format': 'txt'
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def parse_file(file_path: Path) -> List[Dict]:
|
||||
"""Parst eine Mail-Datei basierend auf Endung"""
|
||||
suffix = file_path.suffix.lower()
|
||||
|
||||
if suffix == '.eml':
|
||||
return [MailParser.parse_eml(file_path)]
|
||||
elif suffix == '.mbox':
|
||||
return MailParser.parse_mbox(file_path)
|
||||
elif suffix == '.txt':
|
||||
return [MailParser.parse_txt(file_path)]
|
||||
else:
|
||||
raise ValueError(f"Unsupported file format: {suffix}")
|
||||
@@ -0,0 +1,396 @@
|
||||
"""
|
||||
FastAPI Backend für Mail Fine-Tuning App
|
||||
Hauptanwendung mit allen API Endpoints
|
||||
"""
|
||||
|
||||
from fastapi import FastAPI, File, UploadFile, HTTPException, BackgroundTasks
|
||||
from fastapi.responses import StreamingResponse, FileResponse
|
||||
from fastapi.staticfiles import StaticFiles
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional, List
|
||||
import asyncio
|
||||
import json
|
||||
from pathlib import Path
|
||||
import shutil
|
||||
|
||||
from data_manager import DataManager
|
||||
from mail_parser import MailParser
|
||||
from training import MLXTrainer, TrainingConfig
|
||||
from inference import ModelInference
|
||||
|
||||
# FastAPI App
|
||||
app = FastAPI(title="Mail Fine-Tuning App")
|
||||
|
||||
# CORS
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
# Initialisiere Manager
|
||||
data_manager = DataManager("data/mails.db")
|
||||
trainer = MLXTrainer("models", "output")
|
||||
inference = ModelInference("models", "output")
|
||||
|
||||
|
||||
# Pydantic Models
|
||||
class MailUpdate(BaseModel):
|
||||
task_type: Optional[str] = None
|
||||
expected_output: Optional[str] = None
|
||||
status: Optional[str] = None
|
||||
body: Optional[str] = None
|
||||
|
||||
|
||||
class TrainingStartRequest(BaseModel):
|
||||
model_name: str
|
||||
learning_rate: float = 1e-5
|
||||
epochs: int = 3
|
||||
batch_size: int = 4
|
||||
lora_rank: int = 8
|
||||
|
||||
|
||||
class InferenceRequest(BaseModel):
|
||||
prompt: str
|
||||
model_type: str = 'base'
|
||||
max_tokens: int = 512
|
||||
temperature: float = 0.7
|
||||
|
||||
|
||||
class InferenceComparisonRequest(BaseModel):
|
||||
task_type: str
|
||||
mail_body: str
|
||||
max_tokens: int = 512
|
||||
temperature: float = 0.7
|
||||
|
||||
|
||||
# ===== Mail Endpoints =====
|
||||
|
||||
@app.post("/api/mails/upload")
|
||||
async def upload_mails(files: List[UploadFile] = File(...)):
|
||||
"""Upload und Parse von Mail-Dateien"""
|
||||
results = {
|
||||
'success': [],
|
||||
'errors': []
|
||||
}
|
||||
|
||||
for file in files:
|
||||
try:
|
||||
# Temporär speichern
|
||||
temp_path = Path("data/temp") / file.filename
|
||||
temp_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
with open(temp_path, 'wb') as f:
|
||||
content = await file.read()
|
||||
f.write(content)
|
||||
|
||||
# Parse Mails
|
||||
parsed_mails = MailParser.parse_file(temp_path)
|
||||
|
||||
# In DB speichern
|
||||
for mail in parsed_mails:
|
||||
mail_id = data_manager.add_mail(
|
||||
subject=mail['subject'],
|
||||
sender=mail['sender'],
|
||||
recipient=mail['recipient'],
|
||||
date=mail['date'],
|
||||
body=mail['body'],
|
||||
original_format=mail['original_format']
|
||||
)
|
||||
|
||||
results['success'].append({
|
||||
'filename': file.filename,
|
||||
'count': len(parsed_mails)
|
||||
})
|
||||
|
||||
# Cleanup
|
||||
temp_path.unlink()
|
||||
|
||||
except Exception as e:
|
||||
results['errors'].append({
|
||||
'filename': file.filename,
|
||||
'error': str(e)
|
||||
})
|
||||
|
||||
return results
|
||||
|
||||
|
||||
@app.get("/api/mails")
|
||||
async def get_mails(status: Optional[str] = None):
|
||||
"""Liste aller Mails"""
|
||||
mails = data_manager.get_all_mails(status_filter=status)
|
||||
return {'mails': mails}
|
||||
|
||||
|
||||
@app.get("/api/mails/{mail_id}")
|
||||
async def get_mail(mail_id: int):
|
||||
"""Einzelne Mail abrufen"""
|
||||
mail = data_manager.get_mail(mail_id)
|
||||
if not mail:
|
||||
raise HTTPException(status_code=404, detail="Mail not found")
|
||||
return mail
|
||||
|
||||
|
||||
@app.put("/api/mails/{mail_id}")
|
||||
async def update_mail(mail_id: int, update: MailUpdate):
|
||||
"""Mail aktualisieren (Labeling)"""
|
||||
success = data_manager.update_mail(
|
||||
mail_id=mail_id,
|
||||
task_type=update.task_type,
|
||||
expected_output=update.expected_output,
|
||||
status=update.status,
|
||||
body=update.body
|
||||
)
|
||||
|
||||
if not success:
|
||||
raise HTTPException(status_code=404, detail="Mail not found")
|
||||
|
||||
return {'success': True}
|
||||
|
||||
|
||||
@app.delete("/api/mails/{mail_id}")
|
||||
async def delete_mail(mail_id: int):
|
||||
"""Mail löschen"""
|
||||
success = data_manager.delete_mail(mail_id)
|
||||
|
||||
if not success:
|
||||
raise HTTPException(status_code=404, detail="Mail not found")
|
||||
|
||||
return {'success': True}
|
||||
|
||||
|
||||
# ===== Export Endpoints =====
|
||||
|
||||
@app.get("/api/export/stats")
|
||||
async def get_stats():
|
||||
"""Statistiken abrufen"""
|
||||
stats = data_manager.get_statistics()
|
||||
return stats
|
||||
|
||||
|
||||
@app.post("/api/export/jsonl")
|
||||
async def export_jsonl(train_split: float = 0.9):
|
||||
"""Exportiert Training-Daten als JSONL"""
|
||||
train_data, val_data = data_manager.export_training_data(train_split)
|
||||
|
||||
if not train_data:
|
||||
raise HTTPException(status_code=400, detail="No labeled data available")
|
||||
|
||||
# Speichere Files
|
||||
data_dir = Path("data")
|
||||
train_file = data_dir / "train.jsonl"
|
||||
val_file = data_dir / "val.jsonl"
|
||||
|
||||
train_file_path, val_file_path = trainer.prepare_training_data(
|
||||
train_data, val_data, data_dir
|
||||
)
|
||||
|
||||
return {
|
||||
'success': True,
|
||||
'train_samples': len(train_data),
|
||||
'val_samples': len(val_data),
|
||||
'train_file': str(train_file),
|
||||
'val_file': str(val_file)
|
||||
}
|
||||
|
||||
|
||||
@app.get("/api/export/download/{file_type}")
|
||||
async def download_file(file_type: str):
|
||||
"""Download JSONL Files"""
|
||||
if file_type not in ['train', 'val']:
|
||||
raise HTTPException(status_code=400, detail="Invalid file type")
|
||||
|
||||
file_path = Path("data") / f"{file_type}.jsonl"
|
||||
|
||||
if not file_path.exists():
|
||||
raise HTTPException(status_code=404, detail="File not found")
|
||||
|
||||
return FileResponse(
|
||||
path=file_path,
|
||||
filename=f"{file_type}.jsonl",
|
||||
media_type='application/json'
|
||||
)
|
||||
|
||||
|
||||
# ===== Model Endpoints =====
|
||||
|
||||
@app.get("/api/models")
|
||||
async def list_models():
|
||||
"""Liste verfügbarer Modelle"""
|
||||
models = trainer.list_available_models()
|
||||
return {'models': models}
|
||||
|
||||
|
||||
@app.post("/api/models/download")
|
||||
async def download_model(model_name: str):
|
||||
"""
|
||||
Lädt ein Modell herunter
|
||||
Placeholder - würde in echter Implementation huggingface nutzen
|
||||
"""
|
||||
success = trainer.download_model(model_name)
|
||||
|
||||
if not success:
|
||||
raise HTTPException(
|
||||
status_code=501,
|
||||
detail="Model download not implemented. Please download manually."
|
||||
)
|
||||
|
||||
return {'success': True}
|
||||
|
||||
|
||||
# ===== Training Endpoints =====
|
||||
|
||||
@app.post("/api/training/start")
|
||||
async def start_training(request: TrainingStartRequest, background_tasks: BackgroundTasks):
|
||||
"""Startet Training"""
|
||||
|
||||
# Hole Training-Daten
|
||||
train_data, val_data = data_manager.export_training_data()
|
||||
|
||||
if not train_data:
|
||||
raise HTTPException(status_code=400, detail="No labeled data available")
|
||||
|
||||
if len(train_data) < 10:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Not enough training data. Need at least 10, got {len(train_data)}"
|
||||
)
|
||||
|
||||
# Training Config
|
||||
config = TrainingConfig(
|
||||
model_name=request.model_name,
|
||||
learning_rate=request.learning_rate,
|
||||
epochs=request.epochs,
|
||||
batch_size=request.batch_size,
|
||||
lora_rank=request.lora_rank
|
||||
)
|
||||
|
||||
# Starte Training
|
||||
success = trainer.start_training(config, train_data, val_data)
|
||||
|
||||
if not success:
|
||||
raise HTTPException(status_code=400, detail="Training already running")
|
||||
|
||||
return {'success': True, 'message': 'Training started'}
|
||||
|
||||
|
||||
@app.post("/api/training/stop")
|
||||
async def stop_training():
|
||||
"""Stoppt Training"""
|
||||
success = trainer.stop_training()
|
||||
|
||||
if not success:
|
||||
raise HTTPException(status_code=400, detail="No training running")
|
||||
|
||||
return {'success': True, 'message': 'Training stopped'}
|
||||
|
||||
|
||||
@app.get("/api/training/status")
|
||||
async def get_training_status():
|
||||
"""Gibt aktuellen Training-Status zurück"""
|
||||
status = trainer.get_status()
|
||||
return status
|
||||
|
||||
|
||||
@app.get("/api/training/stream")
|
||||
async def stream_training_status():
|
||||
"""
|
||||
Server-Sent Events für Live-Updates
|
||||
"""
|
||||
async def event_generator():
|
||||
while True:
|
||||
status = trainer.get_status()
|
||||
|
||||
# Sende Status als SSE
|
||||
yield f"data: {json.dumps(status)}\n\n"
|
||||
|
||||
# Stop wenn Training fertig
|
||||
if not status['is_training'] and status['current_step'] > 0:
|
||||
break
|
||||
|
||||
await asyncio.sleep(1)
|
||||
|
||||
return StreamingResponse(
|
||||
event_generator(),
|
||||
media_type="text/event-stream"
|
||||
)
|
||||
|
||||
|
||||
# ===== Inference Endpoints =====
|
||||
|
||||
@app.post("/api/inference/load")
|
||||
async def load_model(model_type: str, model_name: str, adapter_path: Optional[str] = None):
|
||||
"""Lädt ein Modell für Inference"""
|
||||
|
||||
if model_type == 'base':
|
||||
success = inference.load_base_model(model_name)
|
||||
elif model_type == 'finetuned':
|
||||
if not adapter_path:
|
||||
raise HTTPException(status_code=400, detail="adapter_path required for finetuned model")
|
||||
success = inference.load_finetuned_model(model_name, adapter_path)
|
||||
else:
|
||||
raise HTTPException(status_code=400, detail="Invalid model_type")
|
||||
|
||||
if not success:
|
||||
raise HTTPException(status_code=400, detail="Failed to load model")
|
||||
|
||||
return {'success': True}
|
||||
|
||||
|
||||
@app.get("/api/inference/loaded")
|
||||
async def get_loaded_models():
|
||||
"""Gibt zurück welche Modelle geladen sind"""
|
||||
loaded = inference.get_loaded_models()
|
||||
return loaded
|
||||
|
||||
|
||||
@app.post("/api/inference/generate")
|
||||
async def generate_text(request: InferenceRequest):
|
||||
"""Generiert Text mit geladenem Modell"""
|
||||
result = inference.generate(
|
||||
prompt=request.prompt,
|
||||
model_type=request.model_type,
|
||||
max_tokens=request.max_tokens,
|
||||
temperature=request.temperature
|
||||
)
|
||||
|
||||
return {'result': result}
|
||||
|
||||
|
||||
@app.post("/api/inference/compare")
|
||||
async def compare_models(request: InferenceComparisonRequest):
|
||||
"""Vergleicht Base und Fine-tuned Model"""
|
||||
|
||||
prompt = inference.format_mail_prompt(
|
||||
request.task_type,
|
||||
request.mail_body
|
||||
)
|
||||
|
||||
result = inference.generate_comparison(
|
||||
prompt=prompt,
|
||||
max_tokens=request.max_tokens,
|
||||
temperature=request.temperature
|
||||
)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
@app.get("/api/inference/test-prompts")
|
||||
async def get_test_prompts():
|
||||
"""Gibt vordefinierte Test-Prompts zurück"""
|
||||
prompts = inference.get_test_prompts()
|
||||
return prompts
|
||||
|
||||
|
||||
# ===== Static Files =====
|
||||
|
||||
# Serve Frontend
|
||||
app.mount("/", StaticFiles(directory="frontend", html=True), name="frontend")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
uvicorn.run(app, host="0.0.0.0", port=8000)
|
||||
@@ -0,0 +1,321 @@
|
||||
"""
|
||||
MLX Training Wrapper für Fine-Tuning
|
||||
Nutzt mlx-lm für LoRA Fine-Tuning
|
||||
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
import psutil
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Callable, Optional
|
||||
from dataclasses import dataclass
|
||||
import threading
|
||||
import queue
|
||||
|
||||
|
||||
@dataclass
|
||||
class TrainingConfig:
|
||||
"""Training Konfiguration"""
|
||||
model_name: str
|
||||
learning_rate: float = 1e-5
|
||||
epochs: int = 3
|
||||
batch_size: int = 4
|
||||
lora_rank: int = 8
|
||||
lora_alpha: int = 16
|
||||
max_seq_length: int = 2048
|
||||
val_every: int = 50
|
||||
|
||||
|
||||
class TrainingStatus:
|
||||
"""Verwaltet den aktuellen Training-Status"""
|
||||
|
||||
def __init__(self):
|
||||
self.is_training = False
|
||||
self.should_stop = False
|
||||
self.current_step = 0
|
||||
self.total_steps = 0
|
||||
self.current_epoch = 0
|
||||
self.train_loss = 0.0
|
||||
self.val_loss = 0.0
|
||||
self.train_loss_history = []
|
||||
self.val_loss_history = []
|
||||
self.start_time = None
|
||||
self.error = None
|
||||
|
||||
def reset(self):
|
||||
"""Setzt den Status zurück"""
|
||||
self.is_training = False
|
||||
self.should_stop = False
|
||||
self.current_step = 0
|
||||
self.total_steps = 0
|
||||
self.current_epoch = 0
|
||||
self.train_loss = 0.0
|
||||
self.val_loss = 0.0
|
||||
self.train_loss_history = []
|
||||
self.val_loss_history = []
|
||||
self.start_time = None
|
||||
self.error = None
|
||||
|
||||
def to_dict(self) -> Dict:
|
||||
"""Konvertiert zu Dictionary für API"""
|
||||
eta = None
|
||||
if self.is_training and self.current_step > 0 and self.start_time:
|
||||
elapsed = time.time() - self.start_time
|
||||
steps_remaining = self.total_steps - self.current_step
|
||||
eta = int((elapsed / self.current_step) * steps_remaining)
|
||||
|
||||
memory_usage = psutil.virtual_memory().percent
|
||||
|
||||
return {
|
||||
'is_training': self.is_training,
|
||||
'current_step': self.current_step,
|
||||
'total_steps': self.total_steps,
|
||||
'current_epoch': self.current_epoch,
|
||||
'train_loss': round(self.train_loss, 4) if self.train_loss else None,
|
||||
'val_loss': round(self.val_loss, 4) if self.val_loss else None,
|
||||
'train_loss_history': [round(l, 4) for l in self.train_loss_history],
|
||||
'val_loss_history': [round(l, 4) for l in self.val_loss_history],
|
||||
'eta_seconds': eta,
|
||||
'memory_usage_percent': memory_usage,
|
||||
'error': self.error
|
||||
}
|
||||
|
||||
|
||||
class MLXTrainer:
|
||||
"""Wrapper für MLX Training"""
|
||||
|
||||
def __init__(self, models_dir: str = "models", output_dir: str = "output"):
|
||||
self.models_dir = Path(models_dir)
|
||||
self.output_dir = Path(output_dir)
|
||||
self.models_dir.mkdir(exist_ok=True)
|
||||
self.output_dir.mkdir(exist_ok=True)
|
||||
|
||||
self.status = TrainingStatus()
|
||||
self.training_thread = None
|
||||
|
||||
def prepare_training_data(self, train_data: List[Dict],
|
||||
val_data: List[Dict],
|
||||
data_dir: Path) -> tuple[Path, Path]:
|
||||
"""Konvertiert Daten ins MLX Format (JSONL)"""
|
||||
|
||||
def format_example(item: Dict) -> Dict:
|
||||
"""Formatiert ein Beispiel im Chat-Format"""
|
||||
task_type = item['task_type']
|
||||
body = item['body']
|
||||
output = item['expected_output']
|
||||
|
||||
# Task-spezifische Prompts
|
||||
task_prompts = {
|
||||
'Zusammenfassen': 'Fasse folgende E-Mail zusammen:',
|
||||
'Antwort schreiben': 'Schreibe eine Antwort auf folgende E-Mail:',
|
||||
'Kategorisieren': 'Kategorisiere folgende E-Mail:',
|
||||
'Action Items': 'Extrahiere die Action Items aus folgender E-Mail:',
|
||||
'Custom': 'Bearbeite folgende E-Mail:'
|
||||
}
|
||||
|
||||
instruction = task_prompts.get(task_type, task_prompts['Custom'])
|
||||
|
||||
return {
|
||||
'messages': [
|
||||
{
|
||||
'role': 'user',
|
||||
'content': f"{instruction}\n\n{body}"
|
||||
},
|
||||
{
|
||||
'role': 'assistant',
|
||||
'content': output
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
train_file = data_dir / 'train.jsonl'
|
||||
val_file = data_dir / 'val.jsonl'
|
||||
|
||||
# Schreibe Training Data
|
||||
with open(train_file, 'w', encoding='utf-8') as f:
|
||||
for item in train_data:
|
||||
f.write(json.dumps(format_example(item), ensure_ascii=False) + '\n')
|
||||
|
||||
# Schreibe Validation Data
|
||||
with open(val_file, 'w', encoding='utf-8') as f:
|
||||
for item in val_data:
|
||||
f.write(json.dumps(format_example(item), ensure_ascii=False) + '\n')
|
||||
|
||||
return train_file, val_file
|
||||
|
||||
def _run_training(self, config: TrainingConfig,
|
||||
train_file: Path, val_file: Path,
|
||||
output_path: Path):
|
||||
"""Führt das Training aus (läuft in eigenem Thread)"""
|
||||
try:
|
||||
# Import hier um MLX nur bei Bedarf zu laden
|
||||
from mlx_lm import load, LoRALinear
|
||||
from mlx_lm.tuner import train as mlx_train
|
||||
import mlx.core as mx
|
||||
import mlx.nn as nn
|
||||
import mlx.optimizers as optim
|
||||
|
||||
self.status.is_training = True
|
||||
self.status.start_time = time.time()
|
||||
self.status.error = None
|
||||
|
||||
# Lade Modell
|
||||
model_path = self.models_dir / config.model_name
|
||||
if not model_path.exists():
|
||||
raise FileNotFoundError(f"Model not found: {model_path}")
|
||||
|
||||
# Training durchführen mit mlx-lm
|
||||
# Dies ist ein vereinfachtes Beispiel - mlx-lm hat eigene Trainer
|
||||
# In der Praxis würde man mlx_lm.tuner verwenden
|
||||
|
||||
# Lade Training Config
|
||||
train_config = {
|
||||
'model': str(model_path),
|
||||
'data': str(train_file),
|
||||
'val_data': str(val_file),
|
||||
'train': True,
|
||||
'iters': config.epochs * 100, # Approximation
|
||||
'val_batches': 10,
|
||||
'learning_rate': config.learning_rate,
|
||||
'batch_size': config.batch_size,
|
||||
'lora_layers': config.lora_rank,
|
||||
'adapter_file': str(output_path / 'adapters.npz'),
|
||||
'save_every': 50,
|
||||
'val_every': config.val_every,
|
||||
}
|
||||
|
||||
# Callback für Progress-Updates
|
||||
def training_callback(step: int, loss: float, val_loss: Optional[float] = None):
|
||||
if self.status.should_stop:
|
||||
return False # Stop training
|
||||
|
||||
self.status.current_step = step
|
||||
self.status.train_loss = loss
|
||||
self.status.train_loss_history.append(loss)
|
||||
|
||||
if val_loss is not None:
|
||||
self.status.val_loss = val_loss
|
||||
self.status.val_loss_history.append(val_loss)
|
||||
|
||||
return True
|
||||
|
||||
# Hinweis: Dies ist ein Platzhalter für echtes MLX Training
|
||||
# In der Praxis würde man mlx_lm.tuner.train() oder eine
|
||||
# eigene Training Loop mit mlx nutzen
|
||||
|
||||
# Simuliere Training für Demo (MUSS durch echtes MLX Training ersetzt werden)
|
||||
total_steps = config.epochs * (len(list(open(train_file))) // config.batch_size)
|
||||
self.status.total_steps = total_steps
|
||||
|
||||
for epoch in range(config.epochs):
|
||||
self.status.current_epoch = epoch + 1
|
||||
|
||||
for step in range(total_steps // config.epochs):
|
||||
if self.status.should_stop:
|
||||
break
|
||||
|
||||
# Simuliere Training Step
|
||||
self.status.current_step = epoch * (total_steps // config.epochs) + step
|
||||
fake_loss = 2.0 - (self.status.current_step / total_steps) * 1.5
|
||||
self.status.train_loss = fake_loss
|
||||
self.status.train_loss_history.append(fake_loss)
|
||||
|
||||
# Validation alle N Steps
|
||||
if step % config.val_every == 0:
|
||||
fake_val_loss = 2.2 - (self.status.current_step / total_steps) * 1.4
|
||||
self.status.val_loss = fake_val_loss
|
||||
self.status.val_loss_history.append(fake_val_loss)
|
||||
|
||||
time.sleep(0.1) # Simuliere Rechenzeit
|
||||
|
||||
if self.status.should_stop:
|
||||
break
|
||||
|
||||
# Speichere finale Adapter
|
||||
# output_path / 'adapters.npz' würde die LoRA Weights enthalten
|
||||
|
||||
self.status.is_training = False
|
||||
|
||||
except Exception as e:
|
||||
self.status.error = str(e)
|
||||
self.status.is_training = False
|
||||
|
||||
def start_training(self, config: TrainingConfig,
|
||||
train_data: List[Dict],
|
||||
val_data: List[Dict]) -> bool:
|
||||
"""Startet das Training"""
|
||||
|
||||
if self.status.is_training:
|
||||
return False
|
||||
|
||||
# Bereite Daten vor
|
||||
data_dir = self.output_dir / f"training_{int(time.time())}"
|
||||
data_dir.mkdir(exist_ok=True)
|
||||
|
||||
train_file, val_file = self.prepare_training_data(
|
||||
train_data, val_data, data_dir
|
||||
)
|
||||
|
||||
# Output-Pfad
|
||||
output_path = self.output_dir / f"run_{int(time.time())}"
|
||||
output_path.mkdir(exist_ok=True)
|
||||
|
||||
# Reset Status
|
||||
self.status.reset()
|
||||
|
||||
# Starte Training in eigenem Thread
|
||||
self.training_thread = threading.Thread(
|
||||
target=self._run_training,
|
||||
args=(config, train_file, val_file, output_path),
|
||||
daemon=True
|
||||
)
|
||||
self.training_thread.start()
|
||||
|
||||
return True
|
||||
|
||||
def stop_training(self) -> bool:
|
||||
"""Stoppt das laufende Training"""
|
||||
if not self.status.is_training:
|
||||
return False
|
||||
|
||||
self.status.should_stop = True
|
||||
|
||||
# Warte max 5 Sekunden auf Thread
|
||||
if self.training_thread:
|
||||
self.training_thread.join(timeout=5)
|
||||
|
||||
return True
|
||||
|
||||
def get_status(self) -> Dict:
|
||||
"""Gibt aktuellen Status zurück"""
|
||||
return self.status.to_dict()
|
||||
|
||||
def list_available_models(self) -> List[str]:
|
||||
"""Listet verfügbare Modelle auf"""
|
||||
if not self.models_dir.exists():
|
||||
return []
|
||||
|
||||
models = []
|
||||
for path in self.models_dir.iterdir():
|
||||
if path.is_dir():
|
||||
models.append(path.name)
|
||||
|
||||
return models
|
||||
|
||||
def download_model(self, model_name: str) -> bool:
|
||||
"""
|
||||
Lädt ein Modell herunter
|
||||
In der Praxis würde man hier huggingface_hub nutzen
|
||||
"""
|
||||
# Placeholder - würde huggingface_hub.snapshot_download nutzen
|
||||
# und dann mit mlx_lm.convert konvertieren
|
||||
|
||||
# Beispiel:
|
||||
# from huggingface_hub import snapshot_download
|
||||
# from mlx_lm.convert import convert
|
||||
#
|
||||
# hf_path = snapshot_download(model_name)
|
||||
# mlx_path = self.models_dir / model_name
|
||||
# convert(hf_path, mlx_path)
|
||||
|
||||
return False # Nicht implementiert in diesem Beispiel
|
||||
@@ -0,0 +1,87 @@
|
||||
# Beispiel-Mails für Training
|
||||
|
||||
Diese Beispiel-Mails können zum Testen des Mail-Imports verwendet werden.
|
||||
|
||||
## Enthaltene Beispiele
|
||||
|
||||
1. **test1.txt** - Projekt-Update
|
||||
- Typ: Status-Update
|
||||
- Empfohlen für: "Zusammenfassen"
|
||||
|
||||
2. **test2.txt** - Kundenanfrage
|
||||
- Typ: Support-Anfrage
|
||||
- Empfohlen für: "Antwort schreiben"
|
||||
|
||||
3. **test3.txt** - Meeting Notes
|
||||
- Typ: Meeting-Protokoll
|
||||
- Empfohlen für: "Action Items"
|
||||
|
||||
4. **test4.txt** - Out of Office
|
||||
- Typ: Automatische Antwort
|
||||
- Empfohlen für: "Kategorisieren" (als "Automatisch" oder "Skip")
|
||||
|
||||
## Verwendung
|
||||
|
||||
1. Wähle eine oder mehrere Dateien aus
|
||||
2. Ziehe sie per Drag & Drop in die App
|
||||
3. Die Mails werden automatisch geparst und bereinigt
|
||||
4. Gehe zum Labeling und füge die erwarteten Outputs hinzu
|
||||
|
||||
## Beispiel-Labels
|
||||
|
||||
### test1.txt (Zusammenfassen)
|
||||
```
|
||||
Alex berichtet über erfolgreichen Abschluss der API-Integration mit 40% Performance-Verbesserung.
|
||||
Nächste Woche starten Frontend-Anpassungen durch Maria und Tom.
|
||||
Go-Live bleibt für Ende März geplant.
|
||||
```
|
||||
|
||||
### test2.txt (Antwort schreiben)
|
||||
```
|
||||
Sehr geehrter Herr Schmidt,
|
||||
|
||||
vielen Dank für Ihre Anfrage zu Rechnung #2847.
|
||||
|
||||
Sie haben recht - hier ist uns ein Fehler unterlaufen. Der korrekte Betrag
|
||||
laut Angebot beträgt 1.250€. Wir werden die Rechnung korrigieren und Ihnen
|
||||
die berichtigte Version bis morgen zusenden.
|
||||
|
||||
Wir entschuldigen uns für die Unannehmlichkeiten.
|
||||
|
||||
Mit freundlichen Grüßen
|
||||
Support-Team
|
||||
```
|
||||
|
||||
### test3.txt (Action Items)
|
||||
```
|
||||
- Sarah: Pressemitteilung vorbereiten (Deadline: Freitag)
|
||||
- Marketing-Team: Social Media Content erstellen (nächste Woche)
|
||||
- Lisa: Influencer-Kontakte aufnehmen
|
||||
- Design-Team: Finale Produktfotos liefern
|
||||
- Location für Launch-Event buchen (1. April)
|
||||
- Website-Landing-Page live schalten (bis Mittwoch)
|
||||
- Feedback an Lisa bis Mittwoch
|
||||
```
|
||||
|
||||
### test4.txt (Kategorisieren)
|
||||
```
|
||||
Kategorie: Automatische Antwort / Out of Office
|
||||
Status: Abwesenheit vom 18.03.-25.03.2024
|
||||
Vertretung: sarah.koch@company.com (Vertrieb), support@company.com (Support)
|
||||
```
|
||||
|
||||
## Eigene Mails hinzufügen
|
||||
|
||||
Du kannst auch eigene .txt Dateien erstellen. Format:
|
||||
|
||||
```
|
||||
Subject: Dein Betreff
|
||||
From: absender@example.com
|
||||
To: empfaenger@example.com
|
||||
Date: 2024-03-15
|
||||
|
||||
Hier kommt der Mail-Text...
|
||||
```
|
||||
|
||||
Die ersten Zeilen mit Subject:/From:/To:/Date: sind optional.
|
||||
Wenn sie fehlen, wird der gesamte Text als Mail-Body interpretiert.
|
||||
@@ -0,0 +1,19 @@
|
||||
Subject: Q4 Projektupdate
|
||||
From: alex@example.com
|
||||
To: team@example.com
|
||||
Date: 2024-03-15
|
||||
|
||||
Hallo Team,
|
||||
|
||||
ich wollte euch ein kurzes Update zum aktuellen Projektstand geben.
|
||||
|
||||
Wir haben letzte Woche die neue API-Integration abgeschlossen und erfolgreich getestet.
|
||||
Die Performance-Tests zeigen eine Verbesserung von 40% gegenüber der alten Implementierung.
|
||||
|
||||
Nächste Woche starten wir mit der Frontend-Anpassung. Maria und Tom werden das Design
|
||||
überarbeiten, während ich mich um die Backend-Anbindung kümmere.
|
||||
|
||||
Der Go-Live ist weiterhin für Ende des Monats geplant.
|
||||
|
||||
Beste Grüße
|
||||
Alex
|
||||
@@ -0,0 +1,16 @@
|
||||
Subject: Frage zu Invoice #2847
|
||||
From: michael.schmidt@example.com
|
||||
To: support@company.de
|
||||
Date: 2024-03-16
|
||||
|
||||
Hallo,
|
||||
|
||||
ich habe eine Frage zur Rechnung #2847 vom 15. März.
|
||||
Der Betrag scheint nicht mit unserem ursprünglichen Angebot übereinzustimmen.
|
||||
|
||||
Laut Angebot sollten es 1.250€ sein, auf der Rechnung stehen aber 1.450€.
|
||||
|
||||
Könnten Sie das bitte prüfen und mir Bescheid geben?
|
||||
|
||||
Vielen Dank
|
||||
Michael Schmidt
|
||||
@@ -0,0 +1,22 @@
|
||||
Subject: Meeting Notes - Produktlaunch Vorbereitung
|
||||
From: lisa.mueller@startup.io
|
||||
To: team@startup.io
|
||||
Date: 2024-03-17
|
||||
|
||||
Hi alle,
|
||||
|
||||
hier die wichtigsten Punkte vom heutigen Meeting zum Produktlaunch:
|
||||
|
||||
1. Sarah bereitet die Pressemitteilung vor (Deadline: Freitag)
|
||||
2. Marketing-Team erstellt Social Media Content für nächste Woche
|
||||
3. Ich kümmere mich um die Influencer-Kontakte
|
||||
4. Wir brauchen noch finale Produktfotos vom Design-Team
|
||||
5. Launch-Event ist am 1. April - Location muss noch gebucht werden
|
||||
6. Website-Landing-Page muss bis Mittwoch live gehen
|
||||
|
||||
Bitte gebt bis Mittwoch Bescheid ob ihr eure Aufgaben schaffen könnt.
|
||||
Bei Problemen sofort melden!
|
||||
|
||||
Danke an alle für die tolle Zusammenarbeit!
|
||||
|
||||
Lisa
|
||||
@@ -0,0 +1,24 @@
|
||||
Subject: Automatische Antwort: Out of Office
|
||||
From: thomas.weber@company.com
|
||||
To: request@company.com
|
||||
Date: 2024-03-18
|
||||
|
||||
Guten Tag,
|
||||
|
||||
vielen Dank für Ihre E-Mail.
|
||||
|
||||
Ich bin vom 18.03. bis 25.03.2024 nicht im Büro und habe keinen Zugriff auf meine E-Mails.
|
||||
|
||||
In dringenden Fällen wenden Sie sich bitte an:
|
||||
- Vertrieb: sarah.koch@company.com
|
||||
- Support: support@company.com
|
||||
- Allgemeine Anfragen: info@company.com
|
||||
|
||||
Ich werde Ihre E-Mail nach meiner Rückkehr bearbeiten.
|
||||
|
||||
Mit freundlichen Grüßen
|
||||
Thomas Weber
|
||||
|
||||
--
|
||||
Diese E-Mail wurde automatisch generiert.
|
||||
Bitte antworten Sie nicht direkt auf diese Nachricht.
|
||||
@@ -0,0 +1,756 @@
|
||||
// Mail Fine-Tuning App - Frontend Logic
|
||||
|
||||
const API_BASE = '';
|
||||
|
||||
// State
|
||||
let currentMails = [];
|
||||
let currentLabelingIndex = 0;
|
||||
let stats = {};
|
||||
let trainingEventSource = null;
|
||||
|
||||
// ======================
|
||||
// Utility Functions
|
||||
// ======================
|
||||
|
||||
function showToast(message, type = 'info') {
|
||||
const container = document.getElementById('toast-container');
|
||||
const toast = document.createElement('div');
|
||||
toast.className = `toast ${type}`;
|
||||
toast.textContent = message;
|
||||
container.appendChild(toast);
|
||||
|
||||
setTimeout(() => {
|
||||
toast.remove();
|
||||
}, 4000);
|
||||
}
|
||||
|
||||
async function apiCall(endpoint, options = {}) {
|
||||
try {
|
||||
const response = await fetch(API_BASE + endpoint, {
|
||||
...options,
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
...options.headers
|
||||
}
|
||||
});
|
||||
|
||||
if (!response.ok) {
|
||||
const error = await response.json();
|
||||
throw new Error(error.detail || 'API Error');
|
||||
}
|
||||
|
||||
return await response.json();
|
||||
} catch (error) {
|
||||
showToast(error.message, 'error');
|
||||
throw error;
|
||||
}
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Navigation
|
||||
// ======================
|
||||
|
||||
function initNavigation() {
|
||||
const navLinks = document.querySelectorAll('.nav-link');
|
||||
const views = document.querySelectorAll('.view');
|
||||
|
||||
navLinks.forEach(link => {
|
||||
link.addEventListener('click', (e) => {
|
||||
e.preventDefault();
|
||||
|
||||
const targetView = link.dataset.view;
|
||||
|
||||
// Update active states
|
||||
navLinks.forEach(l => l.classList.remove('active'));
|
||||
link.classList.add('active');
|
||||
|
||||
views.forEach(v => v.classList.remove('active'));
|
||||
document.getElementById(`${targetView}-view`).classList.add('active');
|
||||
|
||||
// Load data for view
|
||||
if (targetView === 'labeling') {
|
||||
loadLabelingView();
|
||||
} else if (targetView === 'export') {
|
||||
loadStats();
|
||||
} else if (targetView === 'models') {
|
||||
loadModels();
|
||||
} else if (targetView === 'training') {
|
||||
loadTrainingView();
|
||||
}
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Mail Import
|
||||
// ======================
|
||||
|
||||
function initImport() {
|
||||
const dropzone = document.getElementById('dropzone');
|
||||
const fileInput = document.getElementById('file-input');
|
||||
|
||||
dropzone.addEventListener('click', () => fileInput.click());
|
||||
|
||||
dropzone.addEventListener('dragover', (e) => {
|
||||
e.preventDefault();
|
||||
dropzone.classList.add('dragover');
|
||||
});
|
||||
|
||||
dropzone.addEventListener('dragleave', () => {
|
||||
dropzone.classList.remove('dragover');
|
||||
});
|
||||
|
||||
dropzone.addEventListener('drop', (e) => {
|
||||
e.preventDefault();
|
||||
dropzone.classList.remove('dragover');
|
||||
handleFiles(e.dataTransfer.files);
|
||||
});
|
||||
|
||||
fileInput.addEventListener('change', (e) => {
|
||||
handleFiles(e.target.files);
|
||||
});
|
||||
|
||||
document.getElementById('refresh-mails').addEventListener('click', loadMails);
|
||||
|
||||
// Initial load
|
||||
loadMails();
|
||||
}
|
||||
|
||||
async function handleFiles(files) {
|
||||
const formData = new FormData();
|
||||
|
||||
for (let file of files) {
|
||||
formData.append('files', file);
|
||||
}
|
||||
|
||||
try {
|
||||
const response = await fetch(API_BASE + '/api/mails/upload', {
|
||||
method: 'POST',
|
||||
body: formData
|
||||
});
|
||||
|
||||
const result = await response.json();
|
||||
|
||||
const successCount = result.success.reduce((sum, r) => sum + r.count, 0);
|
||||
showToast(`${successCount} Mails erfolgreich importiert`, 'success');
|
||||
|
||||
if (result.errors.length > 0) {
|
||||
showToast(`${result.errors.length} Fehler beim Import`, 'error');
|
||||
}
|
||||
|
||||
loadMails();
|
||||
|
||||
} catch (error) {
|
||||
showToast('Fehler beim Upload', 'error');
|
||||
}
|
||||
}
|
||||
|
||||
async function loadMails() {
|
||||
try {
|
||||
const data = await apiCall('/api/mails');
|
||||
currentMails = data.mails;
|
||||
|
||||
document.getElementById('mail-count').textContent = currentMails.length;
|
||||
|
||||
renderMailList(currentMails);
|
||||
} catch (error) {
|
||||
console.error('Error loading mails:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function renderMailList(mails) {
|
||||
const container = document.getElementById('mail-list');
|
||||
|
||||
if (mails.length === 0) {
|
||||
container.innerHTML = '<p style="text-align:center; padding: 2rem;">Keine Mails vorhanden</p>';
|
||||
return;
|
||||
}
|
||||
|
||||
container.innerHTML = mails.map(mail => `
|
||||
<div class="mail-item ${mail.status}">
|
||||
<div class="mail-header">
|
||||
<div class="mail-subject">${escapeHtml(mail.subject)}</div>
|
||||
<div class="mail-meta">${mail.status}</div>
|
||||
</div>
|
||||
<div class="mail-meta">Von: ${escapeHtml(mail.sender)}</div>
|
||||
<div class="mail-body">${escapeHtml(mail.body)}</div>
|
||||
<div class="mail-actions">
|
||||
<button class="btn btn-secondary" onclick="viewMail(${mail.id})">👁️ Ansehen</button>
|
||||
<button class="btn btn-danger" onclick="deleteMail(${mail.id})">🗑️ Löschen</button>
|
||||
</div>
|
||||
</div>
|
||||
`).join('');
|
||||
}
|
||||
|
||||
function escapeHtml(text) {
|
||||
const div = document.createElement('div');
|
||||
div.textContent = text;
|
||||
return div.innerHTML;
|
||||
}
|
||||
|
||||
async function deleteMail(id) {
|
||||
if (!confirm('Mail wirklich löschen?')) return;
|
||||
|
||||
try {
|
||||
await apiCall(`/api/mails/${id}`, { method: 'DELETE' });
|
||||
showToast('Mail gelöscht', 'success');
|
||||
loadMails();
|
||||
} catch (error) {
|
||||
console.error('Error deleting mail:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function viewMail(id) {
|
||||
const mail = currentMails.find(m => m.id === id);
|
||||
if (!mail) return;
|
||||
|
||||
alert(`Betreff: ${mail.subject}\n\nVon: ${mail.sender}\n\n${mail.body}`);
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Labeling
|
||||
// ======================
|
||||
|
||||
function initLabeling() {
|
||||
const statusFilter = document.getElementById('status-filter');
|
||||
statusFilter.addEventListener('change', loadLabelingView);
|
||||
|
||||
// Keyboard shortcuts
|
||||
document.addEventListener('keydown', (e) => {
|
||||
const activeView = document.querySelector('.view.active');
|
||||
if (activeView.id !== 'labeling-view') return;
|
||||
|
||||
if (e.key.toLowerCase() === 'n') {
|
||||
nextMail();
|
||||
} else if (e.key.toLowerCase() === 's') {
|
||||
saveLabelingMail();
|
||||
} else if (e.key.toLowerCase() === 'k') {
|
||||
skipMail();
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
async function loadLabelingView() {
|
||||
const statusFilter = document.getElementById('status-filter').value;
|
||||
|
||||
try {
|
||||
const data = await apiCall(`/api/mails?status=${statusFilter || ''}`);
|
||||
currentMails = data.mails;
|
||||
currentLabelingIndex = 0;
|
||||
|
||||
updateLabelingProgress();
|
||||
renderCurrentMail();
|
||||
} catch (error) {
|
||||
console.error('Error loading labeling view:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function updateLabelingProgress() {
|
||||
const labeled = currentMails.filter(m => m.status === 'labeled').length;
|
||||
const total = currentMails.length;
|
||||
|
||||
const percent = total > 0 ? (labeled / total) * 100 : 0;
|
||||
|
||||
document.getElementById('labeling-progress').style.width = `${percent}%`;
|
||||
document.getElementById('progress-text').textContent = `${labeled} / ${total} gelabelt`;
|
||||
}
|
||||
|
||||
function renderCurrentMail() {
|
||||
const container = document.getElementById('labeling-container');
|
||||
|
||||
if (currentMails.length === 0) {
|
||||
container.innerHTML = '<p>Keine Mails zum Labeln vorhanden</p>';
|
||||
return;
|
||||
}
|
||||
|
||||
const mail = currentMails[currentLabelingIndex];
|
||||
|
||||
container.innerHTML = `
|
||||
<div class="current-mail">
|
||||
<h4>${escapeHtml(mail.subject)}</h4>
|
||||
<p><strong>Von:</strong> ${escapeHtml(mail.sender)}</p>
|
||||
<p><strong>An:</strong> ${escapeHtml(mail.recipient)}</p>
|
||||
<hr style="margin: 1rem 0; border-color: var(--border-color)">
|
||||
<div style="white-space: pre-wrap;">${escapeHtml(mail.body)}</div>
|
||||
</div>
|
||||
|
||||
<form id="labeling-form">
|
||||
<div class="form-group">
|
||||
<label>Aufgabentyp:</label>
|
||||
<select id="task-type" required>
|
||||
<option value="">-- Wählen --</option>
|
||||
<option value="Zusammenfassen" ${mail.task_type === 'Zusammenfassen' ? 'selected' : ''}>Zusammenfassen</option>
|
||||
<option value="Antwort schreiben" ${mail.task_type === 'Antwort schreiben' ? 'selected' : ''}>Antwort schreiben</option>
|
||||
<option value="Kategorisieren" ${mail.task_type === 'Kategorisieren' ? 'selected' : ''}>Kategorisieren</option>
|
||||
<option value="Action Items" ${mail.task_type === 'Action Items' ? 'selected' : ''}>Action Items</option>
|
||||
<option value="Custom" ${mail.task_type === 'Custom' ? 'selected' : ''}>Custom</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label>Erwarteter Output:</label>
|
||||
<textarea id="expected-output" rows="6" required>${mail.expected_output || ''}</textarea>
|
||||
</div>
|
||||
|
||||
<div class="form-actions">
|
||||
<button type="button" class="btn btn-primary" onclick="saveLabelingMail()">💾 Speichern (S)</button>
|
||||
<button type="button" class="btn btn-secondary" onclick="skipMail()">⏭️ Überspringen (K)</button>
|
||||
<button type="button" class="btn btn-secondary" onclick="nextMail()">➡️ Nächste (N)</button>
|
||||
<span style="margin-left: auto; color: var(--text-secondary);">
|
||||
${currentLabelingIndex + 1} / ${currentMails.length}
|
||||
</span>
|
||||
</div>
|
||||
</form>
|
||||
`;
|
||||
}
|
||||
|
||||
async function saveLabelingMail() {
|
||||
const mail = currentMails[currentLabelingIndex];
|
||||
const taskType = document.getElementById('task-type').value;
|
||||
const expectedOutput = document.getElementById('expected-output').value;
|
||||
|
||||
if (!taskType || !expectedOutput) {
|
||||
showToast('Bitte alle Felder ausfüllen', 'warning');
|
||||
return;
|
||||
}
|
||||
|
||||
try {
|
||||
await apiCall(`/api/mails/${mail.id}`, {
|
||||
method: 'PUT',
|
||||
body: JSON.stringify({
|
||||
task_type: taskType,
|
||||
expected_output: expectedOutput,
|
||||
status: 'labeled'
|
||||
})
|
||||
});
|
||||
|
||||
showToast('Gespeichert', 'success');
|
||||
mail.status = 'labeled';
|
||||
updateLabelingProgress();
|
||||
nextMail();
|
||||
} catch (error) {
|
||||
console.error('Error saving mail:', error);
|
||||
}
|
||||
}
|
||||
|
||||
async function skipMail() {
|
||||
const mail = currentMails[currentLabelingIndex];
|
||||
|
||||
try {
|
||||
await apiCall(`/api/mails/${mail.id}`, {
|
||||
method: 'PUT',
|
||||
body: JSON.stringify({
|
||||
status: 'skip'
|
||||
})
|
||||
});
|
||||
|
||||
mail.status = 'skip';
|
||||
updateLabelingProgress();
|
||||
nextMail();
|
||||
} catch (error) {
|
||||
console.error('Error skipping mail:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function nextMail() {
|
||||
if (currentLabelingIndex < currentMails.length - 1) {
|
||||
currentLabelingIndex++;
|
||||
} else {
|
||||
currentLabelingIndex = 0;
|
||||
}
|
||||
renderCurrentMail();
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Export & Stats
|
||||
// ======================
|
||||
|
||||
function initExport() {
|
||||
document.getElementById('export-jsonl').addEventListener('click', exportJSONL);
|
||||
}
|
||||
|
||||
async function loadStats() {
|
||||
try {
|
||||
stats = await apiCall('/api/export/stats');
|
||||
renderStats();
|
||||
} catch (error) {
|
||||
console.error('Error loading stats:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function renderStats() {
|
||||
const container = document.getElementById('stats-grid');
|
||||
|
||||
container.innerHTML = `
|
||||
<div class="stat-card">
|
||||
<div class="stat-value">${stats.total || 0}</div>
|
||||
<div class="stat-label">Gesamt Mails</div>
|
||||
</div>
|
||||
<div class="stat-card">
|
||||
<div class="stat-value">${stats.labeled || 0}</div>
|
||||
<div class="stat-label">Gelabelt</div>
|
||||
</div>
|
||||
<div class="stat-card">
|
||||
<div class="stat-value">${stats.unlabeled || 0}</div>
|
||||
<div class="stat-label">Unlabeled</div>
|
||||
</div>
|
||||
<div class="stat-card">
|
||||
<div class="stat-value">${stats.avg_input_length || 0}</div>
|
||||
<div class="stat-label">Avg Input Length</div>
|
||||
</div>
|
||||
<div class="stat-card">
|
||||
<div class="stat-value">${stats.avg_output_length || 0}</div>
|
||||
<div class="stat-label">Avg Output Length</div>
|
||||
</div>
|
||||
<div class="stat-card">
|
||||
<div class="stat-value">${stats.sufficient_data ? '✅' : '❌'}</div>
|
||||
<div class="stat-label">Genug Daten (>50)</div>
|
||||
</div>
|
||||
`;
|
||||
}
|
||||
|
||||
async function exportJSONL() {
|
||||
const trainSplit = document.getElementById('train-split').value / 100;
|
||||
|
||||
try {
|
||||
const result = await apiCall('/api/export/jsonl', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({ train_split: trainSplit })
|
||||
});
|
||||
|
||||
const resultDiv = document.getElementById('export-result');
|
||||
resultDiv.innerHTML = `
|
||||
<p>✅ Export erfolgreich!</p>
|
||||
<p>Training Samples: ${result.train_samples}</p>
|
||||
<p>Validation Samples: ${result.val_samples}</p>
|
||||
<p>
|
||||
<a href="/api/export/download/train" class="btn btn-primary" download>📥 train.jsonl</a>
|
||||
<a href="/api/export/download/val" class="btn btn-primary" download>📥 val.jsonl</a>
|
||||
</p>
|
||||
`;
|
||||
resultDiv.classList.add('show');
|
||||
|
||||
showToast('JSONL Dateien generiert', 'success');
|
||||
} catch (error) {
|
||||
console.error('Error exporting JSONL:', error);
|
||||
}
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Models
|
||||
// ======================
|
||||
|
||||
async function loadModels() {
|
||||
try {
|
||||
const data = await apiCall('/api/models');
|
||||
renderModels(data.models);
|
||||
} catch (error) {
|
||||
console.error('Error loading models:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function renderModels(models) {
|
||||
const container = document.getElementById('models-list');
|
||||
|
||||
if (models.length === 0) {
|
||||
container.innerHTML = '<p>Keine Modelle vorhanden</p>';
|
||||
return;
|
||||
}
|
||||
|
||||
container.innerHTML = models.map(model => `
|
||||
<div class="model-item">
|
||||
<span>📦 ${model}</span>
|
||||
<span style="color: var(--accent-success);">✓ Verfügbar</span>
|
||||
</div>
|
||||
`).join('');
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Training
|
||||
// ======================
|
||||
|
||||
function initTraining() {
|
||||
const lrSlider = document.getElementById('learning-rate');
|
||||
const epochsSlider = document.getElementById('epochs');
|
||||
|
||||
lrSlider.addEventListener('input', (e) => {
|
||||
const value = Math.pow(10, parseFloat(e.target.value));
|
||||
document.getElementById('lr-value').textContent = value.toExponential(0);
|
||||
});
|
||||
|
||||
epochsSlider.addEventListener('input', (e) => {
|
||||
document.getElementById('epochs-value').textContent = e.target.value;
|
||||
});
|
||||
|
||||
document.getElementById('training-form').addEventListener('submit', startTraining);
|
||||
document.getElementById('stop-training').addEventListener('click', stopTraining);
|
||||
}
|
||||
|
||||
async function loadTrainingView() {
|
||||
// Load available models
|
||||
try {
|
||||
const data = await apiCall('/api/models');
|
||||
const select = document.getElementById('training-model');
|
||||
|
||||
select.innerHTML = '<option value="">-- Modell wählen --</option>' +
|
||||
data.models.map(m => `<option value="${m}">${m}</option>`).join('');
|
||||
} catch (error) {
|
||||
console.error('Error loading models:', error);
|
||||
}
|
||||
|
||||
// Get current status
|
||||
updateTrainingStatus();
|
||||
}
|
||||
|
||||
async function startTraining(e) {
|
||||
e.preventDefault();
|
||||
|
||||
const modelName = document.getElementById('training-model').value;
|
||||
const learningRate = Math.pow(10, parseFloat(document.getElementById('learning-rate').value));
|
||||
const epochs = parseInt(document.getElementById('epochs').value);
|
||||
const batchSize = parseInt(document.getElementById('batch-size').value);
|
||||
const loraRank = parseInt(document.getElementById('lora-rank').value);
|
||||
|
||||
if (!modelName) {
|
||||
showToast('Bitte Modell wählen', 'warning');
|
||||
return;
|
||||
}
|
||||
|
||||
try {
|
||||
await apiCall('/api/training/start', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
model_name: modelName,
|
||||
learning_rate: learningRate,
|
||||
epochs: epochs,
|
||||
batch_size: batchSize,
|
||||
lora_rank: loraRank
|
||||
})
|
||||
});
|
||||
|
||||
showToast('Training gestartet', 'success');
|
||||
|
||||
document.getElementById('start-training').disabled = true;
|
||||
document.getElementById('stop-training').disabled = false;
|
||||
|
||||
// Start SSE stream
|
||||
startTrainingStream();
|
||||
|
||||
} catch (error) {
|
||||
console.error('Error starting training:', error);
|
||||
}
|
||||
}
|
||||
|
||||
async function stopTraining() {
|
||||
try {
|
||||
await apiCall('/api/training/stop', { method: 'POST' });
|
||||
showToast('Training gestoppt', 'warning');
|
||||
|
||||
document.getElementById('start-training').disabled = false;
|
||||
document.getElementById('stop-training').disabled = true;
|
||||
|
||||
if (trainingEventSource) {
|
||||
trainingEventSource.close();
|
||||
}
|
||||
} catch (error) {
|
||||
console.error('Error stopping training:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function startTrainingStream() {
|
||||
if (trainingEventSource) {
|
||||
trainingEventSource.close();
|
||||
}
|
||||
|
||||
trainingEventSource = new EventSource('/api/training/stream');
|
||||
|
||||
trainingEventSource.onmessage = (event) => {
|
||||
const status = JSON.parse(event.data);
|
||||
updateTrainingStatusUI(status);
|
||||
|
||||
if (!status.is_training && status.current_step > 0) {
|
||||
trainingEventSource.close();
|
||||
document.getElementById('start-training').disabled = false;
|
||||
document.getElementById('stop-training').disabled = true;
|
||||
showToast('Training abgeschlossen', 'success');
|
||||
}
|
||||
};
|
||||
|
||||
trainingEventSource.onerror = () => {
|
||||
trainingEventSource.close();
|
||||
};
|
||||
}
|
||||
|
||||
async function updateTrainingStatus() {
|
||||
try {
|
||||
const status = await apiCall('/api/training/status');
|
||||
updateTrainingStatusUI(status);
|
||||
|
||||
if (status.is_training) {
|
||||
document.getElementById('start-training').disabled = true;
|
||||
document.getElementById('stop-training').disabled = false;
|
||||
startTrainingStream();
|
||||
}
|
||||
} catch (error) {
|
||||
console.error('Error updating status:', error);
|
||||
}
|
||||
}
|
||||
|
||||
function updateTrainingStatusUI(status) {
|
||||
const container = document.getElementById('training-status');
|
||||
|
||||
if (!status.is_training && status.current_step === 0) {
|
||||
container.innerHTML = '<p>Kein Training aktiv</p>';
|
||||
return;
|
||||
}
|
||||
|
||||
const eta = status.eta_seconds ? `${Math.floor(status.eta_seconds / 60)}m ${status.eta_seconds % 60}s` : 'N/A';
|
||||
|
||||
container.innerHTML = `
|
||||
<div class="status-grid">
|
||||
<div class="status-item">
|
||||
<label>Status</label>
|
||||
<div class="value">${status.is_training ? '🟢 Running' : '⏸️ Stopped'}</div>
|
||||
</div>
|
||||
<div class="status-item">
|
||||
<label>Step</label>
|
||||
<div class="value">${status.current_step} / ${status.total_steps}</div>
|
||||
</div>
|
||||
<div class="status-item">
|
||||
<label>Epoch</label>
|
||||
<div class="value">${status.current_epoch}</div>
|
||||
</div>
|
||||
<div class="status-item">
|
||||
<label>Train Loss</label>
|
||||
<div class="value">${status.train_loss || 'N/A'}</div>
|
||||
</div>
|
||||
<div class="status-item">
|
||||
<label>Val Loss</label>
|
||||
<div class="value">${status.val_loss || 'N/A'}</div>
|
||||
</div>
|
||||
<div class="status-item">
|
||||
<label>ETA</label>
|
||||
<div class="value">${eta}</div>
|
||||
</div>
|
||||
<div class="status-item">
|
||||
<label>Memory</label>
|
||||
<div class="value">${status.memory_usage_percent}%</div>
|
||||
</div>
|
||||
</div>
|
||||
`;
|
||||
|
||||
// Update charts (simple implementation without chart library)
|
||||
updateChart('train-loss-chart', status.train_loss_history);
|
||||
updateChart('val-loss-chart', status.val_loss_history);
|
||||
}
|
||||
|
||||
function updateChart(canvasId, data) {
|
||||
// Simplified chart rendering (without external library)
|
||||
const canvas = document.getElementById(canvasId);
|
||||
if (!canvas) return;
|
||||
|
||||
const ctx = canvas.getContext('2d');
|
||||
canvas.width = canvas.offsetWidth;
|
||||
canvas.height = 200;
|
||||
|
||||
ctx.clearRect(0, 0, canvas.width, canvas.height);
|
||||
|
||||
if (!data || data.length === 0) return;
|
||||
|
||||
const padding = 20;
|
||||
const width = canvas.width - 2 * padding;
|
||||
const height = canvas.height - 2 * padding;
|
||||
|
||||
const maxVal = Math.max(...data);
|
||||
const minVal = Math.min(...data);
|
||||
const range = maxVal - minVal || 1;
|
||||
|
||||
ctx.strokeStyle = '#4a9eff';
|
||||
ctx.lineWidth = 2;
|
||||
ctx.beginPath();
|
||||
|
||||
data.forEach((val, i) => {
|
||||
const x = padding + (i / (data.length - 1)) * width;
|
||||
const y = padding + height - ((val - minVal) / range) * height;
|
||||
|
||||
if (i === 0) {
|
||||
ctx.moveTo(x, y);
|
||||
} else {
|
||||
ctx.lineTo(x, y);
|
||||
}
|
||||
});
|
||||
|
||||
ctx.stroke();
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Evaluation
|
||||
// ======================
|
||||
|
||||
function initEvaluation() {
|
||||
document.getElementById('load-test-prompt').addEventListener('click', loadTestPrompt);
|
||||
document.getElementById('run-comparison').addEventListener('click', runComparison);
|
||||
}
|
||||
|
||||
async function loadTestPrompt() {
|
||||
const taskType = document.getElementById('eval-task-type').value;
|
||||
|
||||
try {
|
||||
const prompts = await apiCall('/api/inference/test-prompts');
|
||||
const prompt = prompts[taskType];
|
||||
|
||||
if (prompt) {
|
||||
// Extract mail body from prompt
|
||||
const parts = prompt.split('\n\n');
|
||||
document.getElementById('eval-mail-text').value = parts.slice(1).join('\n\n');
|
||||
showToast('Test-Beispiel geladen', 'success');
|
||||
}
|
||||
} catch (error) {
|
||||
console.error('Error loading test prompt:', error);
|
||||
}
|
||||
}
|
||||
|
||||
async function runComparison() {
|
||||
const taskType = document.getElementById('eval-task-type').value;
|
||||
const mailBody = document.getElementById('eval-mail-text').value;
|
||||
|
||||
if (!mailBody) {
|
||||
showToast('Bitte Mail-Text eingeben', 'warning');
|
||||
return;
|
||||
}
|
||||
|
||||
document.getElementById('base-result').textContent = 'Generiere...';
|
||||
document.getElementById('finetuned-result').textContent = 'Generiere...';
|
||||
|
||||
try {
|
||||
const result = await apiCall('/api/inference/compare', {
|
||||
method: 'POST',
|
||||
body: JSON.stringify({
|
||||
task_type: taskType,
|
||||
mail_body: mailBody
|
||||
})
|
||||
});
|
||||
|
||||
document.getElementById('base-result').textContent = result.base || 'Modell nicht geladen';
|
||||
document.getElementById('finetuned-result').textContent = result.finetuned || 'Modell nicht geladen';
|
||||
|
||||
showToast('Vergleich abgeschlossen', 'success');
|
||||
} catch (error) {
|
||||
console.error('Error running comparison:', error);
|
||||
document.getElementById('base-result').textContent = 'Fehler';
|
||||
document.getElementById('finetuned-result').textContent = 'Fehler';
|
||||
}
|
||||
}
|
||||
|
||||
// ======================
|
||||
// Init
|
||||
// ======================
|
||||
|
||||
document.addEventListener('DOMContentLoaded', () => {
|
||||
initNavigation();
|
||||
initImport();
|
||||
initLabeling();
|
||||
initExport();
|
||||
initTraining();
|
||||
initEvaluation();
|
||||
});
|
||||
@@ -0,0 +1,254 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="de">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Mail Fine-Tuning App</title>
|
||||
<link rel="stylesheet" href="style.css">
|
||||
</head>
|
||||
<body>
|
||||
<div class="app-container">
|
||||
<!-- Sidebar Navigation -->
|
||||
<nav class="sidebar">
|
||||
<h1>Mail Fine-Tuning</h1>
|
||||
<ul class="nav-menu">
|
||||
<li><a href="#" data-view="import" class="nav-link active">📥 Mail Import</a></li>
|
||||
<li><a href="#" data-view="labeling" class="nav-link">🏷️ Labeling</a></li>
|
||||
<li><a href="#" data-view="export" class="nav-link">📊 Export & Stats</a></li>
|
||||
<li><a href="#" data-view="models" class="nav-link">🤖 Modelle</a></li>
|
||||
<li><a href="#" data-view="training" class="nav-link">🎯 Training</a></li>
|
||||
<li><a href="#" data-view="evaluation" class="nav-link">🧪 Evaluation</a></li>
|
||||
</ul>
|
||||
</nav>
|
||||
|
||||
<!-- Main Content -->
|
||||
<main class="main-content">
|
||||
|
||||
<!-- Import View -->
|
||||
<div id="import-view" class="view active">
|
||||
<h2>Mail Import</h2>
|
||||
|
||||
<div class="upload-section">
|
||||
<div class="dropzone" id="dropzone">
|
||||
<p>📂 Dateien hier ablegen oder klicken</p>
|
||||
<p class="hint">Unterstützt: .eml, .mbox, .txt</p>
|
||||
<input type="file" id="file-input" multiple accept=".eml,.mbox,.txt" hidden>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="mail-list-section">
|
||||
<div class="section-header">
|
||||
<h3>Importierte Mails (<span id="mail-count">0</span>)</h3>
|
||||
<button id="refresh-mails" class="btn btn-secondary">🔄 Aktualisieren</button>
|
||||
</div>
|
||||
<div id="mail-list" class="mail-list">
|
||||
<!-- Mails werden hier eingefügt -->
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Labeling View -->
|
||||
<div id="labeling-view" class="view">
|
||||
<div class="section-header">
|
||||
<h2>Mail Labeling</h2>
|
||||
<div class="filter-controls">
|
||||
<select id="status-filter">
|
||||
<option value="">Alle anzeigen</option>
|
||||
<option value="unlabeled" selected>Nur Unlabeled</option>
|
||||
<option value="labeled">Nur Labeled</option>
|
||||
<option value="skip">Übersprungen</option>
|
||||
</select>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="progress-bar">
|
||||
<div class="progress-fill" id="labeling-progress"></div>
|
||||
<span class="progress-text" id="progress-text">0 / 0 gelabelt</span>
|
||||
</div>
|
||||
|
||||
<div class="keyboard-hints">
|
||||
Shortcuts: <kbd>N</kbd> Nächste | <kbd>S</kbd> Speichern | <kbd>K</kbd> Skip
|
||||
</div>
|
||||
|
||||
<div id="labeling-container">
|
||||
<!-- Labeling Interface wird hier geladen -->
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Export View -->
|
||||
<div id="export-view" class="view">
|
||||
<h2>Daten Export & Statistiken</h2>
|
||||
|
||||
<div class="stats-grid" id="stats-grid">
|
||||
<!-- Stats werden hier eingefügt -->
|
||||
</div>
|
||||
|
||||
<div class="export-section">
|
||||
<h3>Training-Daten exportieren</h3>
|
||||
<div class="export-controls">
|
||||
<label>
|
||||
Train/Val Split:
|
||||
<input type="number" id="train-split" value="90" min="50" max="95" step="5">%
|
||||
</label>
|
||||
<button id="export-jsonl" class="btn btn-primary">📦 JSONL generieren</button>
|
||||
</div>
|
||||
<div id="export-result"></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Models View -->
|
||||
<div id="models-view" class="view">
|
||||
<h2>Modell-Verwaltung</h2>
|
||||
|
||||
<div class="model-section">
|
||||
<h3>Verfügbare Modelle</h3>
|
||||
<div id="models-list" class="models-list">
|
||||
<!-- Modelle werden hier geladen -->
|
||||
</div>
|
||||
|
||||
<div class="model-download">
|
||||
<h3>Modell herunterladen</h3>
|
||||
<p class="info-text">
|
||||
Modelle müssen manuell heruntergeladen werden. Empfohlen:
|
||||
</p>
|
||||
<ul>
|
||||
<li>mlx-community/Mistral-7B-Instruct-v0.3-4bit</li>
|
||||
<li>mlx-community/Meta-Llama-3-8B-Instruct-4bit</li>
|
||||
</ul>
|
||||
<p class="code-example">
|
||||
huggingface-cli download [model-name] --local-dir models/[model-name]
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Training View -->
|
||||
<div id="training-view" class="view">
|
||||
<h2>Training</h2>
|
||||
|
||||
<div class="training-config">
|
||||
<h3>Konfiguration</h3>
|
||||
<form id="training-form">
|
||||
<div class="form-group">
|
||||
<label>Modell:</label>
|
||||
<select id="training-model" required>
|
||||
<option value="">-- Modell wählen --</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label>
|
||||
Learning Rate: <span id="lr-value">1e-5</span>
|
||||
</label>
|
||||
<input type="range" id="learning-rate"
|
||||
min="-6" max="-4" step="0.1" value="-5">
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label>
|
||||
Epochs: <span id="epochs-value">3</span>
|
||||
</label>
|
||||
<input type="range" id="epochs"
|
||||
min="1" max="10" value="3">
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label>Batch Size:</label>
|
||||
<select id="batch-size">
|
||||
<option value="1">1</option>
|
||||
<option value="2">2</option>
|
||||
<option value="4" selected>4</option>
|
||||
<option value="8">8</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label>LoRA Rank:</label>
|
||||
<select id="lora-rank">
|
||||
<option value="4">4</option>
|
||||
<option value="8" selected>8</option>
|
||||
<option value="16">16</option>
|
||||
<option value="32">32</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<div class="form-actions">
|
||||
<button type="submit" class="btn btn-primary" id="start-training">
|
||||
▶️ Training starten
|
||||
</button>
|
||||
<button type="button" class="btn btn-danger" id="stop-training" disabled>
|
||||
⏹️ Training stoppen
|
||||
</button>
|
||||
</div>
|
||||
</form>
|
||||
</div>
|
||||
|
||||
<div class="training-status" id="training-status">
|
||||
<!-- Training Status wird hier angezeigt -->
|
||||
</div>
|
||||
|
||||
<div class="training-charts">
|
||||
<div class="chart-container">
|
||||
<h4>Training Loss</h4>
|
||||
<canvas id="train-loss-chart"></canvas>
|
||||
</div>
|
||||
<div class="chart-container">
|
||||
<h4>Validation Loss</h4>
|
||||
<canvas id="val-loss-chart"></canvas>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Evaluation View -->
|
||||
<div id="evaluation-view" class="view">
|
||||
<h2>Modell Evaluation</h2>
|
||||
|
||||
<div class="eval-controls">
|
||||
<h3>Chat Interface</h3>
|
||||
<div class="form-group">
|
||||
<label>Task Type:</label>
|
||||
<select id="eval-task-type">
|
||||
<option value="Zusammenfassen">Zusammenfassen</option>
|
||||
<option value="Antwort schreiben">Antwort schreiben</option>
|
||||
<option value="Kategorisieren">Kategorisieren</option>
|
||||
<option value="Action Items">Action Items</option>
|
||||
<option value="Custom">Custom</option>
|
||||
</select>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<label>Mail-Text:</label>
|
||||
<textarea id="eval-mail-text" rows="6" placeholder="Mail-Text hier eingeben..."></textarea>
|
||||
</div>
|
||||
|
||||
<div class="form-group">
|
||||
<button id="load-test-prompt" class="btn btn-secondary">📝 Test-Beispiel laden</button>
|
||||
<button id="run-comparison" class="btn btn-primary">🔍 Vergleich starten</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="comparison-results">
|
||||
<div class="result-box">
|
||||
<h4>Base Model</h4>
|
||||
<div id="base-result" class="result-content">
|
||||
Noch kein Ergebnis
|
||||
</div>
|
||||
</div>
|
||||
<div class="result-box">
|
||||
<h4>Fine-tuned Model</h4>
|
||||
<div id="finetuned-result" class="result-content">
|
||||
Noch kein Ergebnis
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</main>
|
||||
</div>
|
||||
|
||||
<!-- Toast Notifications -->
|
||||
<div id="toast-container"></div>
|
||||
|
||||
<script src="app.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
@@ -0,0 +1,600 @@
|
||||
/* Mail Fine-Tuning App Styles */
|
||||
|
||||
:root {
|
||||
--bg-primary: #1a1a1a;
|
||||
--bg-secondary: #2d2d2d;
|
||||
--bg-tertiary: #3a3a3a;
|
||||
--text-primary: #e0e0e0;
|
||||
--text-secondary: #b0b0b0;
|
||||
--accent-primary: #4a9eff;
|
||||
--accent-success: #4caf50;
|
||||
--accent-warning: #ff9800;
|
||||
--accent-danger: #f44336;
|
||||
--border-color: #444;
|
||||
}
|
||||
|
||||
* {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
|
||||
background: var(--bg-primary);
|
||||
color: var(--text-primary);
|
||||
line-height: 1.6;
|
||||
}
|
||||
|
||||
.app-container {
|
||||
display: flex;
|
||||
height: 100vh;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
/* Sidebar */
|
||||
.sidebar {
|
||||
width: 250px;
|
||||
background: var(--bg-secondary);
|
||||
padding: 2rem 1rem;
|
||||
border-right: 1px solid var(--border-color);
|
||||
}
|
||||
|
||||
.sidebar h1 {
|
||||
font-size: 1.5rem;
|
||||
margin-bottom: 2rem;
|
||||
color: var(--accent-primary);
|
||||
}
|
||||
|
||||
.nav-menu {
|
||||
list-style: none;
|
||||
}
|
||||
|
||||
.nav-link {
|
||||
display: block;
|
||||
padding: 0.75rem 1rem;
|
||||
color: var(--text-secondary);
|
||||
text-decoration: none;
|
||||
border-radius: 4px;
|
||||
margin-bottom: 0.5rem;
|
||||
transition: all 0.2s;
|
||||
}
|
||||
|
||||
.nav-link:hover {
|
||||
background: var(--bg-tertiary);
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
.nav-link.active {
|
||||
background: var(--accent-primary);
|
||||
color: white;
|
||||
}
|
||||
|
||||
/* Main Content */
|
||||
.main-content {
|
||||
flex: 1;
|
||||
overflow-y: auto;
|
||||
padding: 2rem;
|
||||
}
|
||||
|
||||
.view {
|
||||
display: none;
|
||||
}
|
||||
|
||||
.view.active {
|
||||
display: block;
|
||||
}
|
||||
|
||||
h2 {
|
||||
margin-bottom: 1.5rem;
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
h3 {
|
||||
margin-bottom: 1rem;
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
/* Buttons */
|
||||
.btn {
|
||||
padding: 0.6rem 1.2rem;
|
||||
border: none;
|
||||
border-radius: 4px;
|
||||
cursor: pointer;
|
||||
font-size: 0.9rem;
|
||||
transition: all 0.2s;
|
||||
}
|
||||
|
||||
.btn-primary {
|
||||
background: var(--accent-primary);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-primary:hover {
|
||||
background: #3a8eef;
|
||||
}
|
||||
|
||||
.btn-secondary {
|
||||
background: var(--bg-tertiary);
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
.btn-secondary:hover {
|
||||
background: #4a4a4a;
|
||||
}
|
||||
|
||||
.btn-success {
|
||||
background: var(--accent-success);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-danger {
|
||||
background: var(--accent-danger);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn:disabled {
|
||||
opacity: 0.5;
|
||||
cursor: not-allowed;
|
||||
}
|
||||
|
||||
/* Upload Section */
|
||||
.dropzone {
|
||||
border: 2px dashed var(--border-color);
|
||||
border-radius: 8px;
|
||||
padding: 3rem;
|
||||
text-align: center;
|
||||
cursor: pointer;
|
||||
transition: all 0.2s;
|
||||
margin-bottom: 2rem;
|
||||
}
|
||||
|
||||
.dropzone:hover {
|
||||
border-color: var(--accent-primary);
|
||||
background: var(--bg-secondary);
|
||||
}
|
||||
|
||||
.dropzone.dragover {
|
||||
border-color: var(--accent-primary);
|
||||
background: var(--bg-tertiary);
|
||||
}
|
||||
|
||||
.hint {
|
||||
font-size: 0.85rem;
|
||||
color: var(--text-secondary);
|
||||
margin-top: 0.5rem;
|
||||
}
|
||||
|
||||
/* Section Header */
|
||||
.section-header {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
|
||||
/* Mail List */
|
||||
.mail-list {
|
||||
background: var(--bg-secondary);
|
||||
border-radius: 8px;
|
||||
padding: 1rem;
|
||||
max-height: 500px;
|
||||
overflow-y: auto;
|
||||
}
|
||||
|
||||
.mail-item {
|
||||
background: var(--bg-tertiary);
|
||||
padding: 1rem;
|
||||
margin-bottom: 0.5rem;
|
||||
border-radius: 4px;
|
||||
border-left: 3px solid transparent;
|
||||
}
|
||||
|
||||
.mail-item.labeled {
|
||||
border-left-color: var(--accent-success);
|
||||
}
|
||||
|
||||
.mail-item.unlabeled {
|
||||
border-left-color: var(--accent-warning);
|
||||
}
|
||||
|
||||
.mail-item.skip {
|
||||
border-left-color: var(--text-secondary);
|
||||
}
|
||||
|
||||
.mail-header {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
margin-bottom: 0.5rem;
|
||||
}
|
||||
|
||||
.mail-subject {
|
||||
font-weight: bold;
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
.mail-meta {
|
||||
font-size: 0.85rem;
|
||||
color: var(--text-secondary);
|
||||
}
|
||||
|
||||
.mail-body {
|
||||
font-size: 0.9rem;
|
||||
color: var(--text-secondary);
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
display: -webkit-box;
|
||||
-webkit-line-clamp: 2;
|
||||
-webkit-box-orient: vertical;
|
||||
}
|
||||
|
||||
.mail-actions {
|
||||
margin-top: 0.5rem;
|
||||
display: flex;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
|
||||
.mail-actions button {
|
||||
padding: 0.4rem 0.8rem;
|
||||
font-size: 0.8rem;
|
||||
}
|
||||
|
||||
/* Labeling Interface */
|
||||
#labeling-container {
|
||||
background: var(--bg-secondary);
|
||||
border-radius: 8px;
|
||||
padding: 2rem;
|
||||
margin-top: 1rem;
|
||||
}
|
||||
|
||||
.current-mail {
|
||||
background: var(--bg-tertiary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 4px;
|
||||
margin-bottom: 1.5rem;
|
||||
}
|
||||
|
||||
.form-group {
|
||||
margin-bottom: 1.5rem;
|
||||
}
|
||||
|
||||
.form-group label {
|
||||
display: block;
|
||||
margin-bottom: 0.5rem;
|
||||
color: var(--text-primary);
|
||||
font-weight: 500;
|
||||
}
|
||||
|
||||
.form-group input,
|
||||
.form-group select,
|
||||
.form-group textarea {
|
||||
width: 100%;
|
||||
padding: 0.6rem;
|
||||
background: var(--bg-primary);
|
||||
border: 1px solid var(--border-color);
|
||||
border-radius: 4px;
|
||||
color: var(--text-primary);
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
.form-group textarea {
|
||||
resize: vertical;
|
||||
min-height: 100px;
|
||||
}
|
||||
|
||||
.form-actions {
|
||||
display: flex;
|
||||
gap: 1rem;
|
||||
margin-top: 1rem;
|
||||
}
|
||||
|
||||
/* Progress Bar */
|
||||
.progress-bar {
|
||||
background: var(--bg-secondary);
|
||||
border-radius: 4px;
|
||||
height: 30px;
|
||||
position: relative;
|
||||
margin-bottom: 1rem;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.progress-fill {
|
||||
background: var(--accent-primary);
|
||||
height: 100%;
|
||||
transition: width 0.3s;
|
||||
}
|
||||
|
||||
.progress-text {
|
||||
position: absolute;
|
||||
top: 50%;
|
||||
left: 50%;
|
||||
transform: translate(-50%, -50%);
|
||||
font-weight: bold;
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
/* Keyboard Hints */
|
||||
.keyboard-hints {
|
||||
font-size: 0.85rem;
|
||||
color: var(--text-secondary);
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
|
||||
kbd {
|
||||
background: var(--bg-tertiary);
|
||||
padding: 0.2rem 0.5rem;
|
||||
border-radius: 3px;
|
||||
border: 1px solid var(--border-color);
|
||||
font-family: monospace;
|
||||
}
|
||||
|
||||
/* Stats Grid */
|
||||
.stats-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
||||
gap: 1rem;
|
||||
margin-bottom: 2rem;
|
||||
}
|
||||
|
||||
.stat-card {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 8px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.stat-value {
|
||||
font-size: 2rem;
|
||||
font-weight: bold;
|
||||
color: var(--accent-primary);
|
||||
}
|
||||
|
||||
.stat-label {
|
||||
color: var(--text-secondary);
|
||||
font-size: 0.9rem;
|
||||
}
|
||||
|
||||
/* Export Section */
|
||||
.export-section {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 8px;
|
||||
}
|
||||
|
||||
.export-controls {
|
||||
display: flex;
|
||||
gap: 1rem;
|
||||
align-items: center;
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
|
||||
.export-controls input {
|
||||
width: 80px;
|
||||
padding: 0.4rem;
|
||||
background: var(--bg-primary);
|
||||
border: 1px solid var(--border-color);
|
||||
color: var(--text-primary);
|
||||
border-radius: 4px;
|
||||
}
|
||||
|
||||
#export-result {
|
||||
margin-top: 1rem;
|
||||
padding: 1rem;
|
||||
background: var(--bg-tertiary);
|
||||
border-radius: 4px;
|
||||
display: none;
|
||||
}
|
||||
|
||||
#export-result.show {
|
||||
display: block;
|
||||
}
|
||||
|
||||
/* Models List */
|
||||
.models-list {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1rem;
|
||||
border-radius: 8px;
|
||||
margin-bottom: 2rem;
|
||||
}
|
||||
|
||||
.model-item {
|
||||
background: var(--bg-tertiary);
|
||||
padding: 1rem;
|
||||
margin-bottom: 0.5rem;
|
||||
border-radius: 4px;
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.model-download {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 8px;
|
||||
}
|
||||
|
||||
.info-text {
|
||||
color: var(--text-secondary);
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
|
||||
.code-example {
|
||||
background: var(--bg-primary);
|
||||
padding: 1rem;
|
||||
border-radius: 4px;
|
||||
font-family: monospace;
|
||||
color: var(--accent-primary);
|
||||
margin-top: 1rem;
|
||||
}
|
||||
|
||||
/* Training Status */
|
||||
.training-status {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 8px;
|
||||
margin: 1.5rem 0;
|
||||
}
|
||||
|
||||
.status-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
|
||||
gap: 1rem;
|
||||
}
|
||||
|
||||
.status-item {
|
||||
background: var(--bg-tertiary);
|
||||
padding: 1rem;
|
||||
border-radius: 4px;
|
||||
}
|
||||
|
||||
.status-item label {
|
||||
display: block;
|
||||
color: var(--text-secondary);
|
||||
font-size: 0.85rem;
|
||||
margin-bottom: 0.3rem;
|
||||
}
|
||||
|
||||
.status-item .value {
|
||||
font-size: 1.2rem;
|
||||
font-weight: bold;
|
||||
color: var(--accent-primary);
|
||||
}
|
||||
|
||||
/* Training Charts */
|
||||
.training-charts {
|
||||
display: grid;
|
||||
grid-template-columns: 1fr 1fr;
|
||||
gap: 1.5rem;
|
||||
margin-top: 1.5rem;
|
||||
}
|
||||
|
||||
.chart-container {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 8px;
|
||||
}
|
||||
|
||||
.chart-container h4 {
|
||||
margin-bottom: 1rem;
|
||||
color: var(--text-primary);
|
||||
}
|
||||
|
||||
canvas {
|
||||
width: 100% !important;
|
||||
height: 200px !important;
|
||||
}
|
||||
|
||||
/* Evaluation */
|
||||
.comparison-results {
|
||||
display: grid;
|
||||
grid-template-columns: 1fr 1fr;
|
||||
gap: 1.5rem;
|
||||
margin-top: 1.5rem;
|
||||
}
|
||||
|
||||
.result-box {
|
||||
background: var(--bg-secondary);
|
||||
padding: 1.5rem;
|
||||
border-radius: 8px;
|
||||
}
|
||||
|
||||
.result-content {
|
||||
background: var(--bg-primary);
|
||||
padding: 1rem;
|
||||
border-radius: 4px;
|
||||
min-height: 150px;
|
||||
white-space: pre-wrap;
|
||||
font-family: monospace;
|
||||
font-size: 0.9rem;
|
||||
}
|
||||
|
||||
/* Filter Controls */
|
||||
.filter-controls {
|
||||
display: flex;
|
||||
gap: 1rem;
|
||||
}
|
||||
|
||||
.filter-controls select {
|
||||
padding: 0.5rem;
|
||||
background: var(--bg-tertiary);
|
||||
border: 1px solid var(--border-color);
|
||||
color: var(--text-primary);
|
||||
border-radius: 4px;
|
||||
}
|
||||
|
||||
/* Toast Notifications */
|
||||
#toast-container {
|
||||
position: fixed;
|
||||
top: 1rem;
|
||||
right: 1rem;
|
||||
z-index: 1000;
|
||||
}
|
||||
|
||||
.toast {
|
||||
background: var(--bg-secondary);
|
||||
border: 1px solid var(--border-color);
|
||||
border-left: 4px solid var(--accent-primary);
|
||||
padding: 1rem 1.5rem;
|
||||
border-radius: 4px;
|
||||
margin-bottom: 0.5rem;
|
||||
min-width: 300px;
|
||||
animation: slideIn 0.3s ease;
|
||||
}
|
||||
|
||||
.toast.success {
|
||||
border-left-color: var(--accent-success);
|
||||
}
|
||||
|
||||
.toast.error {
|
||||
border-left-color: var(--accent-danger);
|
||||
}
|
||||
|
||||
.toast.warning {
|
||||
border-left-color: var(--accent-warning);
|
||||
}
|
||||
|
||||
@keyframes slideIn {
|
||||
from {
|
||||
transform: translateX(400px);
|
||||
opacity: 0;
|
||||
}
|
||||
to {
|
||||
transform: translateX(0);
|
||||
opacity: 1;
|
||||
}
|
||||
}
|
||||
|
||||
/* Scrollbar */
|
||||
::-webkit-scrollbar {
|
||||
width: 8px;
|
||||
height: 8px;
|
||||
}
|
||||
|
||||
::-webkit-scrollbar-track {
|
||||
background: var(--bg-secondary);
|
||||
}
|
||||
|
||||
::-webkit-scrollbar-thumb {
|
||||
background: var(--bg-tertiary);
|
||||
border-radius: 4px;
|
||||
}
|
||||
|
||||
::-webkit-scrollbar-thumb:hover {
|
||||
background: #4a4a4a;
|
||||
}
|
||||
|
||||
/* Responsive */
|
||||
@media (max-width: 768px) {
|
||||
.sidebar {
|
||||
width: 200px;
|
||||
}
|
||||
|
||||
.comparison-results,
|
||||
.training-charts {
|
||||
grid-template-columns: 1fr;
|
||||
}
|
||||
|
||||
.stats-grid {
|
||||
grid-template-columns: 1fr;
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,24 @@
|
||||
# Mail Fine-Tuning App Dependencies
|
||||
|
||||
# Web Framework
|
||||
fastapi==0.109.0
|
||||
uvicorn[standard]==0.27.0
|
||||
python-multipart==0.0.6
|
||||
|
||||
# ML Framework (Apple Silicon)
|
||||
mlx==0.6.0
|
||||
mlx-lm==0.8.0
|
||||
|
||||
# Mail Parsing
|
||||
beautifulsoup4==4.12.3
|
||||
chardet==5.2.0
|
||||
|
||||
# Database
|
||||
aiosqlite==0.19.0
|
||||
|
||||
# Utilities
|
||||
aiofiles==23.2.1
|
||||
psutil==5.9.8
|
||||
|
||||
# Optional but recommended
|
||||
huggingface-hub==0.20.3
|
||||
Executable
+35
@@ -0,0 +1,35 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Mail Fine-Tuning App Startup Script
|
||||
|
||||
echo "🚀 Starting Mail Fine-Tuning App..."
|
||||
echo ""
|
||||
|
||||
# Check if venv exists
|
||||
if [ ! -d "venv" ]; then
|
||||
echo "❌ Virtual environment not found!"
|
||||
echo "Please run: python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Activate venv
|
||||
source venv/bin/activate
|
||||
|
||||
# Check if dependencies are installed
|
||||
if ! python -c "import fastapi" 2>/dev/null; then
|
||||
echo "❌ Dependencies not installed!"
|
||||
echo "Please run: pip install -r requirements.txt"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Create necessary directories
|
||||
mkdir -p data models output
|
||||
|
||||
# Start server
|
||||
echo "✅ Starting server on http://localhost:8000"
|
||||
echo ""
|
||||
echo "Press Ctrl+C to stop"
|
||||
echo ""
|
||||
|
||||
cd backend
|
||||
python main.py
|
||||
Reference in New Issue
Block a user