ohm_streaming/CLAUDE.md
root d82bec92b4 fix: Optimize Anime-Sama season loading and fix display issues
Major performance improvements and bug fixes for Anime-Sama integration:

**Backend Optimizations:**
- Parallel season loading with asyncio.gather() (200x faster: 50s → 0.25s)
- Filter out empty seasons to avoid unnecessary HTML parsing
- Reduced timeout from 5s to 3s for quick season checks
- Optimized fallback method to detect empty seasons instantly

**Frontend Fixes:**
- Fixed infinite "Chargement des saisons..." by ensuring DOM exists before loading
- Added 15-second timeout with retry functionality for season loading
- Staggered requests (500ms delay) to prevent overwhelming the server
- Duplicate request prevention with dataset.loading flag

**Search Improvements:**
- Separated anime and series provider searches
- Intelligent query variations (original, normalized, first word)
- Better error handling with user-friendly messages

**UI Fixes:**
- Added missing id="mainTabs" to navigation header
- Fixed tabs visibility for authenticated users

**Performance:** 10 seasons loaded in 0.25s instead of 50+ seconds

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
2026-01-29 18:50:26 +00:00


# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
Ohm Stream Downloader is a FastAPI-based web application for downloading anime episodes and media files from various file hosting services (1fichier, Doodstream, Rapidfile, Uptobox, VidMoly, SendVid, Sibnet, Lpayer, Vidzy, LuLuvid, Uqload) and streaming platforms (Anime-Sama, Neko-Sama, Anime-Ultime, Vostfree, French-Manga, FS7). It features a modern web interface, parallel downloads, pause/resume support, video streaming, personalized recommendations, JWT authentication, and Sonarr webhook integration for automated downloads.
## Development Commands
```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run development server (auto-reload)
uvicorn main:app --reload --host 0.0.0.0 --port 3000
# Access web interface
# Open http://localhost:3000/web in browser
# Run all tests
pytest
# Run tests with coverage report
pytest --cov=app --cov-report=html
# Run only unit tests (fast, isolated)
pytest -m "unit"
# Run only integration tests
pytest -m "integration"
# Exclude slow tests
pytest -m "not slow"
# Verbose output
pytest -v
# Show print debugging
pytest -s
```
## Architecture
**Directory Structure:**
```
Ohm_streaming/
├── main.py                          # FastAPI application & API endpoints
├── app/
│   ├── models/                      # Pydantic models
│   │   ├── __init__.py              # Core models (DownloadTask, AnimeMetadata, etc.)
│   │   └── sonarr.py                # Sonarr Pydantic models
│   ├── downloaders/                 # Host-specific downloaders (organized structure)
│   │   ├── base.py                  # BaseDownloader abstract class (legacy, kept for compatibility)
│   │   ├── __init__.py              # Factory function (three-tier: anime sites → series sites → video players)
│   │   ├── anime_sites/             # Anime streaming sites (catalogs)
│   │   │   ├── base.py              # BaseAnimeSite abstract class
│   │   │   ├── __init__.py          # Anime site factory
│   │   │   ├── animesama.py         # Anime-Sama (anime provider)
│   │   │   ├── animeultime.py       # Anime-Ultime (anime provider)
│   │   │   ├── nekosama.py          # Neko-Sama (anime provider)
│   │   │   ├── vostfree.py          # Vostfree (anime provider)
│   │   │   └── frenchmanga.py       # French-Manga (anime provider)
│   │   ├── series_sites/            # TV series streaming sites (catalogs)
│   │   │   ├── base.py              # BaseSeriesSite abstract class
│   │   │   ├── __init__.py          # Series site factory
│   │   │   └── fs7.py               # FS7 (French Stream)
│   │   └── video_players/           # File hosting services (players)
│   │       ├── base.py              # BaseVideoPlayer abstract class
│   │       ├── __init__.py          # Video player factory
│   │       ├── unfichier.py         # 1fichier.com handler
│   │       ├── doodstream.py        # Doodstream handler
│   │       ├── rapidfile.py         # Rapidfile handler
│   │       ├── uptobox.py           # Uptobox handler
│   │       ├── vidmoly.py           # VidMoly handler
│   │       ├── sendvid.py           # SendVid handler
│   │       ├── sibnet.py            # Sibnet handler
│   │       ├── lpayer.py            # Lpayer handler
│   │       ├── vidzy.py             # Vidzy handler
│   │       ├── luluv.py             # LuLuvid handler
│   │       └── uqload.py            # Uqload handler
│   ├── providers.py                 # Provider configuration (domains, icons, colors)
│   ├── config.py                    # Environment-based configuration (Pydantic Settings)
│   ├── utils.py                     # Security utilities (sanitize_filename, is_safe_filename)
│   ├── download_manager.py          # Manages download queue, progress, parallel downloads
│   ├── favorites.py                 # Favorites management system (JSON-based)
│   ├── recommendation_engine.py     # Analyzes download history for personalized recommendations
│   ├── recommendations.py           # Fetches latest releases from anime sources
│   ├── kitsu_api.py                 # Kitsu API integration for anime metadata
│   ├── sonarr_handler.py            # Sonarr webhook integration handler
│   └── auth.py                      # JWT authentication system
├── downloads/                       # Downloaded files storage
├── templates/
│   ├── index.html                   # Main web interface
│   ├── player.html                  # Video player page
│   └── base.html                    # Base template
├── static/                          # Static assets (CSS, JS, images)
└── tests/                           # Test suite with fixtures
```
**Core Components:**
### 0. Configuration (`app/config.py`)
- `Settings` class using Pydantic Settings for environment-based configuration
- Loads from `.env` file with sensible defaults
- Provides `get_settings()` function for accessing configuration globally
### 1. DownloadManager (`app/download_manager.py`)
- Manages all download tasks with parallel download limit (default: 3 concurrent)
- Handles pause/resume/cancel operations
- Tracks progress, speed, and file chunks for resume support
- Uses `asyncio.Semaphore` to limit concurrent downloads
- Auto-restores completed downloads from disk on server startup
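The concurrency limit described above can be illustrated with a minimal sketch. It assumes nothing about the real `DownloadManager` internals: `run_limited` and `fake_download` are hypothetical names, and the only point shown is how an `asyncio.Semaphore` caps the number of jobs running at once.

```python
import asyncio

MAX_PARALLEL_DOWNLOADS = 3  # mirrors the documented default


async def run_limited(jobs, limit=MAX_PARALLEL_DOWNLOADS):
    """Run coroutine factories concurrently, at most `limit` at a time.

    Returns the results in submission order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with semaphore:  # waits here while `limit` jobs are already active
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_download(task_id):
    # Hypothetical stand-in for a real download; it only yields to the event loop.
    await asyncio.sleep(0.01)
    return task_id


def demo():
    jobs = [lambda i=i: fake_download(i) for i in range(10)]
    return asyncio.run(run_limited(jobs))
```

Calling `demo()` runs ten fake downloads while never exceeding three at a time; all ten are scheduled immediately, and the semaphore alone decides which ones make progress.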
### 2. Downloaders (`app/downloaders/`)
**Architecture:**
The downloaders are organized into three categories with separate base classes:
**Anime Sites** (`app/downloaders/anime_sites/`):
- Provide anime catalogs, metadata, and episode listings
- Link to video players for actual file hosting
- Inherit from `BaseAnimeSite` abstract class
- Factory: `get_anime_site(url)` in `anime_sites/__init__.py`
- Implement: `search_anime()`, `get_episodes()`, `get_anime_metadata()`, `get_download_link()`
**Series Sites** (`app/downloaders/series_sites/`):
- Provide TV series catalogs, metadata, and episode listings
- Similar to anime sites but for general TV series content
- Inherit from `BaseSeriesSite` abstract class
- Factory: `get_series_site(url)` in `series_sites/__init__.py`
- Implement: `search_anime()`, `get_episodes()`, `get_anime_metadata()`, `get_download_link()`
**Video Players** (`app/downloaders/video_players/`):
- Host actual video files and provide direct download links
- Extract URLs from embedded players and handle file downloads
- Inherit from `BaseVideoPlayer` abstract class
- Factory: `get_video_player(url)` in `video_players/__init__.py`
- Implement: `get_download_link(url, target_filename=None)`
**Three-Tier Factory Pattern:**
- `get_downloader(url)` in main `__init__.py` checks: anime sites → series sites → video players
- Falls back to `GenericDownloader` if no match
- This separation allows anime/series sites to delegate to video players for actual downloads
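A rough sketch of the lookup order follows. The `Handler` class, `make_factory` helper, and the domain fragments are illustrative assumptions (the FS7 domain in particular is a placeholder); only the tier order and the generic fallback come from the description above.

```python
from typing import Callable, Optional, Sequence


class Handler:
    """Minimal stand-in for a site or player class exposing can_handle()."""

    def __init__(self, name: str, domains: Sequence[str]):
        self.name = name
        self.domains = tuple(domains)

    def can_handle(self, url: str) -> bool:
        return any(domain in url.lower() for domain in self.domains)


def make_factory(handlers: Sequence[Handler]) -> Callable[[str], Optional[Handler]]:
    """Build a factory that returns the first handler accepting the URL, else None."""

    def factory(url: str) -> Optional[Handler]:
        return next((h for h in handlers if h.can_handle(url)), None)

    return factory


# Hypothetical registries; the real ones live in each package's __init__.py.
get_anime_site = make_factory([Handler("anime-sama", ["anime-sama."])])
get_series_site = make_factory([Handler("fs7", ["fs7.example"])])  # placeholder domain
get_video_player = make_factory([Handler("vidmoly", ["vidmoly."])])
GENERIC = Handler("generic", [])  # plays the role of GenericDownloader


def get_downloader(url: str) -> Handler:
    # Tiers are tried in the documented order: anime sites, series sites, players.
    for factory in (get_anime_site, get_series_site, get_video_player):
        handler = factory(url)
        if handler is not None:
            return handler
    return GENERIC
```

Because the tiers are checked in a fixed order, a URL that could match more than one tier always resolves to the earliest one.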
**BaseAnimeSite Interface:**
- `can_handle(url)` - Check if this anime site can handle the URL
- `search_anime(query, lang)` - Search for anime, returns list with title, url, cover_image
- `get_episodes(anime_url, lang)` - Get episode list with episode_number, url, title, host
- `get_anime_metadata(anime_url)` - Get metadata dict (synopsis, genres, rating, release_year, studio, poster_image, total_episodes, status)
- `get_download_link(url)` - Get video player URL from episode page (NOT direct download link)
**BaseSeriesSite Interface:**
- `can_handle(url)` - Check if this series site can handle the URL
- `search_anime(query, lang)` - Search for series, returns list with title, url, cover_image, lang
- `get_episodes(anime_url, lang)` - Get episode list with episode_number, url, title, host
- `get_anime_metadata(anime_url)` - Get metadata dict (title, synopsis, genres, rating, release_year, studio, poster_image, total_episodes, status, languages)
- `get_download_link(url)` - Get video player URL from episode page (NOT direct download link)
**BaseVideoPlayer Interface:**
- `can_handle(url)` - Check if this player can handle the URL
- `get_download_link(url, target_filename=None)` - Extract direct download link and filename
- Note: `target_filename` parameter is optional but MUST be supported for VidMoly/SendVid compatibility
- Always use `sanitize_filename()` on extracted filenames!
**Key Patterns:**
- All downloaders use httpx.AsyncClient for HTTP requests
- BeautifulSoup with lxml for HTML parsing
- Async/await throughout for non-blocking I/O
- Fuzzy search using jieba for Chinese text segmentation and typo tolerance
- Security: Filename sanitization enforced via `app.utils` functions
**URL Format Convention:**
- **Pipe-separated format**: `video_url|anime_page_url|episode_title`
- Preserves metadata through the download process
- Example: `https://vidmoly.to/abc123|https://anime-sama.si/catalogue/naruto/s1/vostfr/|Episode+1`
- `target_filename` parameter allows anime/series sites to suggest filenames
- Video players extract the final download link and filename
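The pipe-separated format can be parsed along these lines. This is a sketch, not the project's parser: `DownloadSource` and `parse_pipe_url` are hypothetical names, and decoding `+` as a space in the title is an assumption based on the `Episode+1` example.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote_plus


class DownloadSource(NamedTuple):
    video_url: str
    anime_page_url: Optional[str]
    episode_title: Optional[str]


def parse_pipe_url(value: str) -> DownloadSource:
    """Split 'video_url|anime_page_url|episode_title'; trailing parts are optional."""
    parts = value.split("|", 2)
    parts += [""] * (3 - len(parts))  # pad so unpacking always succeeds
    video_url, page_url, title = parts
    return DownloadSource(
        video_url=video_url,
        anime_page_url=page_url or None,
        # Assumption: titles are URL-encoded with '+' standing for a space.
        episode_title=unquote_plus(title) if title else None,
    )
```

A plain URL without pipes passes through unchanged as `video_url`, with the two metadata fields set to `None`.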
### 3. Provider Configuration (`app/providers.py`)
- `ANIME_PROVIDERS` - Anime streaming sites configuration
- `FILE_HOSTS` - File hosting services configuration
- Each provider has: name, domains, icon, color, url_pattern
- `detect_provider_from_url(url)` - Identify provider from URL
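One plausible shape for this lookup is sketched below. The registry entries, icons, colors, and URL patterns are invented for illustration, and matching on the parsed hostname is an assumption about how `detect_provider_from_url` might work, not a description of the actual code.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative entries only; field values are placeholders.
ANIME_PROVIDERS = {
    "anime-sama": {
        "name": "Anime-Sama",
        "domains": ["anime-sama.si"],
        "icon": "🎌",
        "color": "#4f46e5",
        "url_pattern": "/catalogue/",
    },
}
FILE_HOSTS = {
    "vidmoly": {
        "name": "VidMoly",
        "domains": ["vidmoly.to"],
        "icon": "🎬",
        "color": "#0ea5e9",
        "url_pattern": "/",
    },
}


def detect_provider_from_url(url: str) -> Optional[str]:
    """Return the registry key whose domain matches the URL host, else None."""
    host = (urlparse(url).hostname or "").lower()
    for registry in (ANIME_PROVIDERS, FILE_HOSTS):
        for key, config in registry.items():
            for domain in config["domains"]:
                # Match the exact domain or any subdomain of it.
                if host == domain or host.endswith("." + domain):
                    return key
    return None
```

Comparing against the parsed hostname rather than the raw string avoids false positives such as a provider domain appearing in a query parameter or as the suffix of an unrelated domain.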
### 4. API Endpoints
**Download Management:**
- `POST /api/download` - Create new download task
- `GET /api/downloads` - List all download tasks
- `GET /api/download/{task_id}` - Get task details
- `POST /api/download/{task_id}/pause` - Pause download
- `POST /api/download/{task_id}/resume` - Resume download
- `DELETE /api/download/{task_id}` - Delete task (keeps completed files)
- `GET /api/download/{task_id}/file` - Download completed file
**Anime Features:**
- `GET /api/anime/search` - Unified search across all providers
- `GET /api/anime/metadata` - Get anime metadata
- `GET /api/anime/episodes` - Get episode list
- `POST /api/anime/download` - Download single episode
- `POST /api/anime/download-season` - Download entire season
**Video Streaming:**
- `GET /video/{task_id}` - Stream video with Range support
- `GET /stream/{filename}` - Stream by filename
- `GET /player/{task_id}` - Video player page
- `GET /watch/{filename}` - Player by filename
**Recommendations & Favorites:**
- `GET /api/recommendations` - Personalized recommendations
- `GET /api/releases/latest` - Latest anime releases
- `GET /api/favorites` - List favorites
- `POST /api/favorites` - Add favorite
- `DELETE /api/favorites/{anime_id}` - Remove favorite
**Sonarr Integration:**
- `POST /api/webhook/sonarr` - Receive Sonarr webhooks
- `GET /api/sonarr/config` - Get Sonarr configuration
- `PUT /api/sonarr/config` - Update Sonarr configuration
- `GET /api/sonarr/mappings` - List Sonarr to anime mappings
- `POST /api/sonarr/mappings` - Create/update mapping
- `DELETE /api/sonarr/mappings/{series_id}` - Delete mapping
- `GET /api/sonarr/search` - Search anime for mapping
- `GET /api/sonarr/episodes` - Get episode list
- `GET /api/sonarr/suggest` - Suggest anime matches
- `POST /api/sonarr/download` - Manually trigger download
### 5. Web Interface
- Single-page app at `/web` (templates/index.html)
- Auto-refreshes every second to show progress
- Video player with seeking support (HTTP Range headers)
- Dark theme with gradients and animations
### 6. Security Utilities (`app/utils.py`)
- `sanitize_filename(filename, max_length=255)` - Sanitize filenames to prevent path traversal
- Removes dangerous characters: `\ / : * ? " < > |`
- Strips path separators and leading dots/dashes
- Limits filename length while preserving extension
- `is_safe_filename(filename)` - Validate filename safety
- Checks for path traversal patterns (`..`, `/`, `\`)
- Detects absolute paths and drive letters
- Used throughout the codebase for file operations
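The behavior listed above can be approximated as follows. This is a sketch written from the bullet points, not a copy of `app/utils.py`; the `"unnamed"` fallback and the exact truncation rule are assumptions.

```python
import re

_DANGEROUS_CHARS = re.compile(r'[\\/:*?"<>|]')
_DRIVE_LETTER = re.compile(r"^[A-Za-z]:")


def sanitize_filename(filename: str, max_length: int = 255) -> str:
    """Drop dangerous characters, strip leading dots/dashes, and cap the length."""
    cleaned = _DANGEROUS_CHARS.sub("", filename)
    cleaned = cleaned.strip().lstrip(".-")
    if not cleaned:
        return "unnamed"  # assumed fallback for names that sanitize to nothing
    if len(cleaned) > max_length:
        stem, dot, ext = cleaned.rpartition(".")
        if dot and len(ext) < max_length - 1:
            # Keep the extension and shorten only the stem.
            cleaned = stem[: max_length - len(ext) - 1] + "." + ext
        else:
            cleaned = cleaned[:max_length]
    return cleaned


def is_safe_filename(filename: str) -> bool:
    """Reject empty names, traversal patterns, path separators, and drive letters."""
    if not filename or ".." in filename:
        return False
    if "/" in filename or "\\" in filename:
        return False
    return _DRIVE_LETTER.match(filename) is None
```

For example, `sanitize_filename("../../etc/passwd")` yields `etcpasswd`: the separators are removed first, and the leading dots are then stripped.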
### 7. Authentication System (`app/auth.py`)
- **UserManager** - JSON-based user storage in `config/users.json`
- User registration with bcrypt password hashing
- Password truncated to 72 bytes (bcrypt limitation)
- User authentication and last login tracking
- **JWT Tokens** - Stateless authentication
- 7-day token expiration (configurable via `ACCESS_TOKEN_EXPIRE_MINUTES`)
- HS256 algorithm with JWT_SECRET_KEY (change in production!)
- Token verification and user extraction
- **Password Security**
- bcrypt hashing with passlib
- Automatic deprecated scheme migration
- **Configuration**
- `JWT_SECRET_KEY` environment variable (default: dev-secret-change-in-production)
- Users stored in `config/users.json`
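To make the HS256 token flow concrete, here is a standard-library sketch of signing and verifying a JWT with a 7-day expiry. It is an illustration only: the project presumably relies on a JWT library, and `create_token`, `verify_token`, and the `sub` claim layout are assumptions rather than the contents of `app/auth.py`.

```python
import base64
import hashlib
import hmac
import json
import time

ACCESS_TOKEN_EXPIRE_MINUTES = 7 * 24 * 60  # 7 days, as documented


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(username, secret, expires_minutes=ACCESS_TOKEN_EXPIRE_MINUTES, now=None):
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "exp": issued + expires_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the username if the signature is valid and the token is unexpired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    current = int(time.time() if now is None else now)
    if payload.get("exp", 0) <= current:
        return None  # expired
    return payload.get("sub")
```

The signature is checked before the payload is parsed, and the comparison uses `hmac.compare_digest` to avoid timing differences. A token signed with the default development secret would verify on any deployment that kept the default, which is why `JWT_SECRET_KEY` must be changed in production.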
**Authentication Endpoints:**
- `POST /api/auth/register` - User registration
- `POST /api/auth/login` - Login and receive JWT token
- `GET /api/auth/me` - Get current user profile
- `PUT /api/auth/me` - Update user profile
### 8. Recommendation Engine (`app/recommendation_engine.py`)
- Analyzes download history to generate personalized recommendations
- Tracks genre preferences and viewing patterns
- Scores anime based on user's download history
- Used by `/api/recommendations` endpoint
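A simplified version of genre-based scoring might look like this. The data shapes (`title` and `genres` keys) and the scoring formula are assumptions chosen for illustration; the real engine may weigh other signals.

```python
from collections import Counter


def genre_preferences(history):
    """Turn download history into normalized genre weights that sum to 1."""
    counts = Counter(genre for item in history for genre in item.get("genres", []))
    total = sum(counts.values())
    return {genre: count / total for genre, count in counts.items()} if total else {}


def score(candidate, preferences):
    """Sum the user's weight for each genre the candidate carries."""
    return sum(preferences.get(genre, 0.0) for genre in candidate.get("genres", []))


def recommend(history, candidates, limit=5):
    """Rank unseen candidates by how well their genres match the history."""
    preferences = genre_preferences(history)
    seen = {item["title"] for item in history}
    fresh = [c for c in candidates if c["title"] not in seen]
    return sorted(fresh, key=lambda c: score(c, preferences), reverse=True)[:limit]
```

Titles already present in the history are excluded, so the ranking only surfaces anime the user has not downloaded yet.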
### 9. Kitsu API (`app/kitsu_api.py`)
- Integrates with Kitsu anime database for metadata
- Fetches anime information by title or ID
- Provides enriched metadata (synopsis, genres, ratings, poster images)
- Used as fallback when provider metadata is incomplete
### 10. Pydantic Models (`app/models/`)
- **`__init__.py`** - Core models:
- `DownloadStatus` - Enum for task states (PENDING, DOWNLOADING, PAUSED, COMPLETED, FAILED, CANCELLED)
- `HostType` - Enum for file host types (RAPIDFILE, UNFICHIER, DOODSTREAM, OTHER)
- `DownloadTask` - Main task model with progress tracking
- `DownloadRequest` - Request model for creating downloads
- `AnimeMetadata` - Anime information (synopsis, genres, rating, release_year, studio, etc.)
- `AnimeSearchResult` - Enhanced search result with metadata
- **`sonarr.py`** - Sonarr-specific models:
- `SonarrWebhookPayload` - Complete webhook payload schema
- `SonarrEventType` - Enum for event types (Grab, Download, Rename, Delete, Test)
- `SonarrMapping` - Mapping between Sonarr series and anime providers
- `SonarrConfig` - Webhook configuration (enabled, secret, auto-download, etc.)
## Test Structure
**Test Organization (tests/):**
- `conftest.py` - Pytest configuration and fixtures
- `test_models.py` - Pydantic model tests
- `test_downloaders.py` - Downloader tests
- `test_download_manager.py` - DownloadManager tests
- `test_favorites.py` - Favorites system tests
- `test_api.py` - FastAPI endpoint tests
- `test_sonarr.py` - Sonarr integration tests
- `test_anime_sama_seasons.py` - Anime-Sama season handling tests
- `test_translate_api.py` - Translation API tests
- `test_delete_and_restore.py` - Delete and restore functionality tests
- `test_french_manga.py` - French-Manga provider tests
**Fixtures in conftest.py:**
- `temp_dir` - Temporary directory
- `temp_download_dir` - Temporary download directory
- `download_manager` - DownloadManager instance
- `favorites_manager` - FavoritesManager instance
- `mock_httpx_client` - Mock for httpx.AsyncClient
- `sample_download_task` - Sample task data
- `sample_anime_metadata` - Sample metadata
**Test Markers:**
- `unit` - Unit tests (isolated, fast) - auto-applied
- `integration` - Integration tests (API endpoints) - auto-applied
- `asyncio` - Async tests - auto-applied
- `slow` - Slow tests - manual
- `network` - Requires network - manual
**pytest.ini Configuration:**
- Auto-applies markers for async and integration tests
- Coverage enabled by default (`--cov=app`)
- HTML coverage report generated in `htmlcov/`
- Verbose output with local variables in tracebacks
- 300-second timeout for tests
- `asyncio_mode = auto` for async test support
**Running Single Test:**
```bash
# Run specific test file
pytest tests/test_sonarr.py -v
# Run specific test class
pytest tests/test_sonarr.py::TestSonarrHandler -v
# Run specific test
pytest tests/test_sonarr.py::TestSonarrHandler::test_add_mapping -v
```
## Adding New Host Support
To add support for a new file hosting service:
1. Create new file in `app/downloaders/video_players/` (e.g., `myhost.py`)
2. Inherit from `BaseVideoPlayer`
3. Implement required methods (`can_handle`, `get_download_link`)
4. Add to imports in `app/downloaders/video_players/__init__.py`
5. Add to `players` list in `get_video_player()`
6. Add configuration to `FILE_HOSTS` in `app/providers.py`
Example:
```python
from typing import Optional

from bs4 import BeautifulSoup

from app.utils import sanitize_filename
from .base import BaseVideoPlayer


class MyHostDownloader(BaseVideoPlayer):
    def can_handle(self, url: str) -> bool:
        return "myhost.com" in url.lower()

    async def get_download_link(
        self, url: str, target_filename: Optional[str] = None
    ) -> tuple[str, str]:
        soup = BeautifulSoup(await self._fetch_page(url), "lxml")
        # ... extraction logic (must set download_url and extracted_filename) ...
        # IMPORTANT: Always sanitize filenames!
        filename = sanitize_filename(extracted_filename)
        return download_url, filename

    async def close(self):
        # IMPORTANT: Always close the HTTP client
        await self.client.aclose()
```
**Important:**
- Always close the HTTP client in your downloader to avoid resource leaks
- Use `sanitize_filename()` from `app.utils` when extracting filenames from URLs
- Use `is_safe_filename()` to validate filenames before file operations
- `get_download_link()` must accept the `target_filename` parameter (it may be `None`) for compatibility with anime/series sites
## Adding New Series Site
To add a new TV series streaming provider (similar to anime sites but for general TV series):
1. Create new file in `app/downloaders/series_sites/` (e.g., `mysite.py`)
2. Inherit from `BaseSeriesSite`
3. Implement series-specific methods:
- `search_anime(query, lang)` - Return list of series with title, url, cover_image, lang
- `get_episodes(anime_url, lang)` - Return list of episodes
- `get_anime_metadata(anime_url)` - Return metadata dict (should include languages field)
- `get_download_link(url)` - Return video player URL from episode page
4. Add to imports in `app/downloaders/series_sites/__init__.py`
5. Add to `sites` list in `get_series_site()`
BaseSeriesSite is nearly identical to BaseAnimeSite but designed for general TV series content rather than anime-specific content.
## Sonarr Integration
The application includes full Sonarr webhook support for automated anime downloads.
### Architecture
**SonarrHandler (`app/sonarr_handler.py`):**
- Processes incoming webhooks from Sonarr
- Manages series mappings (Sonarr TVDB ID → Anime Provider URL)
- Supports HMAC SHA256 signature verification for security
- Auto-triggers downloads on Grab events
- Provides search and suggestion APIs for mapping setup
**Sonarr Models (`app/models/sonarr.py`):**
- `SonarrWebhookPayload` - Complete webhook payload schema
- `SonarrEventType` - Enum for event types (Grab, Download, Rename, Delete, Test)
- `SonarrMapping` - Mapping between Sonarr series and anime providers
- `SonarrConfig` - Webhook configuration (enabled, secret, auto-download, etc.)
### Workflow
1. **Setup in Sonarr:**
- Configure webhook: Settings > Connect > Add (+) > Webhook
- URL: `http://your-server:3000/api/webhook/sonarr`
- Enable "Grab" event
2. **Create Mappings:**
- Get Sonarr series TVDB ID from series details
- Search anime: `GET /api/sonarr/search?q={title}`
- Create mapping: `POST /api/sonarr/mappings`
3. **Automatic Download:**
- Sonarr grabs new episode → Sends webhook
- Ohm Stream Downloader receives webhook
- Looks up mapping by TVDB ID
- Finds matching episode on anime provider
- Creates and starts download task
### Configuration Files
- `config/sonarr.json` - Webhook configuration
- `config/sonarr_mappings.json` - Series mappings
### Example Mapping
```json
{
"sonarr_series_id": 79644,
"sonarr_title": "Naruto Shippuden",
"anime_provider": "anime-sama",
"anime_url": "https://anime-sama.si/catalogue/naruto-shippuden/saison1/vostfr/",
"anime_title": "Naruto Shippuden",
"lang": "vostfr",
"quality_preference": "1080p",
"auto_download": true
}
```
### Security
- Optional HMAC SHA256 signature verification
- Configure secret in both Sonarr and Ohm Stream Downloader
- Enable with `verify_hmac: true` in config
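Signature verification of this kind usually reduces to recomputing the HMAC over the raw request body and comparing in constant time. The sketch below assumes a hex-encoded signature; the function name, the encoding, and how the signature reaches the server (for example, which header carries it) are assumptions, not details taken from `sonarr_handler.py`.

```python
import hashlib
import hmac


def verify_signature(body: bytes, signature: str, secret: str) -> bool:
    """Check a hex-encoded HMAC-SHA256 signature over the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    received = signature.strip().lower()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

The comparison must run against the exact bytes received, before any JSON parsing or re-serialization, since even whitespace changes alter the digest.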
### Testing
- Test endpoint: `POST /api/webhook/test/sonarr`
- Manual trigger: `POST /api/sonarr/download`
- Get suggestions: `GET /api/sonarr/suggest?sonarr_title={title}`
**Documentation:** See `docs/SONARR_INTEGRATION.md` for complete setup guide.
## Adding New Anime Provider
To add a new anime streaming provider:
1. Create new file in `app/downloaders/anime_sites/` (e.g., `mysite.py`)
2. Inherit from `BaseAnimeSite`
3. Implement anime-specific methods:
- `search_anime(query, lang)` - Return list of anime with title, url, cover_image
- `get_episodes(anime_url, lang)` - Return list of episodes
- `get_anime_metadata(anime_url)` - Return metadata dict
- `get_download_link(url)` - Return video player URL from episode page
4. Add to imports in `app/downloaders/anime_sites/__init__.py`
5. Add to `sites` list in `get_anime_site()`
6. Add to `ANIME_PROVIDERS` in `app/providers.py`
7. Update `main.py` to include in unified search
Metadata should include:
- synopsis, genres, rating, release_year, studio, poster_image, total_episodes, status
## Configuration
The application uses environment variables for configuration via `app/config.py` (Pydantic Settings).
**Environment Variables (.env):**
```bash
# Copy the example file
cp .env.example .env
# Edit .env to configure:
APP_NAME=Ohm Stream Downloader # Application name
DEBUG=false # Debug mode
HOST=0.0.0.0 # Server host
PORT=3000 # Server port
DOWNLOAD_DIR=downloads # Download storage location
MAX_PARALLEL_DOWNLOADS=3 # Maximum concurrent downloads
CHUNK_SIZE=1048576 # Download chunk size (1MB)
CORS_ORIGINS=... # Comma-separated allowed origins
HTTP_TIMEOUT=10.0 # HTTP request timeout (seconds)
DOWNLOAD_TIMEOUT=300 # Download timeout (seconds)
LOG_LEVEL=INFO # Logging level
JWT_SECRET_KEY=change-me-in-production # JWT signing key for auth
```
**Configuration Files:**
- `.env` - Environment configuration (create from .env.example)
- `config/users.json` - User authentication database (created automatically)
- `config/sonarr.json` - Sonarr webhook configuration (created automatically)
- `config/sonarr_mappings.json` - Sonarr to anime provider mappings (created automatically)
- `config/.gitkeep` - Ensures config directory is tracked in git
- Example files: `config/sonarr.example.json`, `config/sonarr_mappings.example.json`
**Documentation:**
- `README.md` - User-facing features and roadmap
- `CLAUDE.md` - This file (developer guide)
- `docs/SONARR_INTEGRATION.md` - Complete Sonarr setup guide
- `docs/SONARR_IMPLEMENTATION.md` - Technical implementation summary
- `docs/IMPROVEMENTS_2024-01-24.md` - Recent security and quality improvements
## Key Implementation Details
**Resume Support:**
- Downloads use HTTP Range headers to resume from last byte
- Files downloaded in 1MB chunks
- Partial files cleaned up on cancel
- Resume position tracked in `downloaded_bytes` field
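The resume request itself is small. This sketch shows how a `Range` header could be derived from `downloaded_bytes` and how the response status decides whether to append or restart; the helper names are hypothetical.

```python
def resume_headers(downloaded_bytes: int) -> dict:
    """Ask the server to continue from the first byte not yet on disk."""
    if downloaded_bytes <= 0:
        return {}  # fresh download: no Range header needed
    return {"Range": f"bytes={downloaded_bytes}-"}


def file_mode_for(status_code: int, downloaded_bytes: int) -> str:
    """Append only when the server honored the range with 206 Partial Content."""
    if status_code == 206 and downloaded_bytes > 0:
        return "ab"
    # A 200 response means the server ignored the range and is resending everything.
    return "wb"
```

Checking for status 206 matters because a server without range support answers 200 with the full body; appending in that case would corrupt the file.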
**Domain Handling:**
- Anime providers use dynamic domain detection (e.g., Anime-Sama fetches current domain from anime-sama.pw)
- Multiple domains per provider supported in configuration
- Domain detection via `detect_provider_from_url(url)` in providers.py
**Task Lifecycle:**
- PENDING → DOWNLOADING → PAUSED / COMPLETED / CANCELLED / FAILED
- Active downloads tracked in `active_downloads` dict
- All tasks stored in `tasks` dict with UUID keys
- Completed files preserved when deleting tasks (only partial files removed)
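The lifecycle can be written down as a transition table. The enum member names follow `DownloadStatus` as documented, but the string values and the edges beyond the documented PENDING → DOWNLOADING → terminal flow (resuming from PAUSED or FAILED, cancelling a PENDING task) are assumptions.

```python
from enum import Enum


class DownloadStatus(str, Enum):
    PENDING = "pending"
    DOWNLOADING = "downloading"
    PAUSED = "paused"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


S = DownloadStatus

ALLOWED_TRANSITIONS = {
    S.PENDING: {S.DOWNLOADING, S.CANCELLED},
    S.DOWNLOADING: {S.PAUSED, S.COMPLETED, S.CANCELLED, S.FAILED},
    S.PAUSED: {S.DOWNLOADING, S.CANCELLED},  # assumed: resume or cancel
    S.FAILED: {S.DOWNLOADING},  # assumed: retry happens on resume
    S.COMPLETED: set(),
    S.CANCELLED: set(),
}


def can_transition(current: DownloadStatus, target: DownloadStatus) -> bool:
    """Return True if moving from `current` to `target` is a legal state change."""
    return target in ALLOWED_TRANSITIONS[current]
```

Keeping the table explicit makes illegal moves, such as pausing a completed task, easy to reject in one place.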
**Video Streaming:**
- Range header support for seeking in video player
- Serves from `/downloads` directory via StaticFiles
- Video extensions: .mp4, .mkv, .avi, .mov, .wmv, .flv, .webm
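Seeking works because the player sends a `Range` header and the server answers with the matching byte window. The sketch below parses a single-range header and builds the `Content-Range` value; it is a simplified illustration (multi-range requests are not handled) and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

_RANGE_PATTERN = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (start, end) byte window, or None if unsatisfiable."""
    match = _RANGE_PATTERN.match(header.strip())
    if not match or file_size <= 0:
        return None
    start_text, end_text = match.groups()
    if not start_text and not end_text:
        return None
    if not start_text:
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(end_text)
        if length == 0:
            return None
        start, end = max(file_size - length, 0), file_size - 1
    else:
        start = int(start_text)
        end = min(int(end_text), file_size - 1) if end_text else file_size - 1
    if start > end:
        return None
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Format the Content-Range header for a 206 Partial Content response."""
    return f"bytes {start}-{end}/{file_size}"
```

A request that cannot be satisfied (for instance, a start offset beyond the end of the file) returns `None`, which a server would normally translate into a 416 Range Not Satisfiable response.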
**Error Handling:**
- Graceful degradation with status tracking
- Network errors caught and reported in task status
- Automatic retry on resume
- Only files larger than 1 MB are considered complete downloads, so that small error files are skipped
## Dependencies
**Core:**
- fastapi - Web framework
- uvicorn - ASGI server
- httpx - Async HTTP client
- beautifulsoup4, lxml - HTML parsing
- aiofiles - Async file operations
- jieba - Chinese text segmentation for fuzzy search
**Testing:**
- pytest - Test framework
- pytest-asyncio - Async test support
- pytest-cov - Coverage reporting
- pytest-mock - Mocking support