Spaces: fokan/trabb


fokan committed on Sep 3, 2025

Commit

d47cb66

1 Parent(s): 3e2ca56

first push


Files changed (5)

FIXES_APPLIED.md +117 -0
app/main.py +27 -7
test_api.py +108 -0
translator.py +131 -38
اخطاء.txt +0 -0

FIXES_APPLIED.md ADDED

	@@ -0,0 +1,117 @@

+# Translation Issues Fixed
+## Problems Addressed
+### 1. Translation Not Working (Files Remained Untranslated)
+**Problem**: Files were being processed but returned in the original language with 0 paragraphs translated.
+**Root Causes**:
+- Silent fallback behavior in `translate_text()` method
+- No validation of translation results
+- Missing error handling for API failures
+**Fixes Applied**:
+- **Enhanced `translate_text()` method**:
+  - Added API key validation before making requests
+  - Improved translation prompts for better results with Google Gemini 2.5 Pro
+  - Removed silent fallback to original text - now raises exceptions on failure
+  - Added validation to ensure translation actually occurred
+  - Increased token limits for better translation quality
+- **Improved error handling**:
+  - Added comprehensive exception handling in translation workflows
+  - Better validation of translated content
+  - Detailed logging to track translation progress
+- **Enhanced validation**:
+  - Check for empty or unchanged translation results
+  - Verify API responses before processing
+  - Ensure at least some content gets translated
+### 2. Format Preservation Issue
+**Problem**: User wanted files to maintain original filename and format (PDF→Word→translate→PDF workflow)
+**Current Behavior**: Created separate "translated_" prefixed files
+**Desired Behavior**: Receive PDF, convert to Word, translate, convert back to PDF with same filename
+**Fixes Applied**:
+- **Modified `translate_document()` method**:
+  - Output file now uses original filename (no "translated_" prefix)
+  - For PDF input: PDF→DOCX→translate→PDF with original filename
+  - For DOCX input: DOCX→translate→DOCX with original filename
+- **Updated file handling in `main.py`**:
+  - Both original and translated files now use same filename
+  - Better file copying and naming logic
+  - Improved response structure
+## Technical Improvements
+### 1. Robust Translation Logic
+```python
+# Before: Silent fallback
+if translation_failed:
+    return original_text  # Silent failure
+# After: Proper error handling
+if not translated or translated == text:
+    raise Exception("Translation failed: received empty or unchanged text")
+```
+### 2. Enhanced Error Reporting
+- Added detailed logging throughout the translation pipeline
+- Better API error messages
+- Validation at each step of the process
+### 3. Format Preservation Workflow
+```
+PDF Input → LibreOffice Convert to DOCX → Translate DOCX → Convert back to PDF (same filename)
+DOCX Input → Translate DOCX → Save as same filename
+```
+## Testing
+### API Key Testing
+Created `test_api.py` script to verify:
+- OPENROUTER_API_KEY is set correctly
+- API connection is working
+- Basic translation functionality
+### Usage
+Run the test script to verify setup:
+```bash
+python test_api.py
+```
+## Expected Results
+After these fixes:
+1. **Translation will work**: Files will actually be translated, not returned unchanged
+2. **Format preserved**: PDF files will be returned as PDF with same filename
+3. **Better error messages**: Clear feedback when translation fails
+4. **Robust operation**: Proper error handling instead of silent failures
+## Key Files Modified
+1. **`translator.py`**:
+   - Enhanced `translate_text()` method with validation
+   - Improved `translate_document()` for format preservation
+   - Better error handling in `translate_docx()` and `translate_pdf_direct()`
+2. **`app/main.py`**:
+   - Updated translation endpoint with better validation
+   - Fixed file naming to preserve original names
+   - Enhanced error reporting
+3. **`test_api.py`** (new):
+   - API key and connection testing
+   - Basic translation functionality verification
+## Usage Instructions
+1. **Set API Key**: Ensure `OPENROUTER_API_KEY` environment variable is set
+2. **Test Setup**: Run `python test_api.py` to verify configuration
+3. **Upload Files**: PDF or DOCX files will now be properly translated
+4. **Download Results**: Translated files maintain original format and filename
+The system now provides reliable translation with proper format preservation as requested.
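
Review note on "Robust Translation Logic": the reply cleanup and validation that this commit adds to `translate_text()` can be read in isolation as a small pure function. The sketch below is only an illustration under assumptions: `TranslationError` and `clean_and_validate` are hypothetical names that do not appear in the commit, which raises a plain `Exception` inline.

```python
class TranslationError(Exception):
    """Hypothetical error type for a model reply that cannot be accepted."""


# Introductory phrases the commit strips from the start of a reply.
INTRO_PHRASES = ("Here is the translation:", "Translation:", "The translation is:")


def clean_and_validate(original: str, reply: str) -> str:
    """Strip prompt echoes from `reply` and reject empty or unchanged text."""
    translated = reply.strip()
    # If the model echoed the prompt's trailing marker, keep what follows it.
    if "Translated text:" in translated:
        translated = translated.split("Translated text:")[-1].strip()
    # Drop a leading introductory phrase, if any.
    for phrase in INTRO_PHRASES:
        if translated.startswith(phrase):
            translated = translated[len(phrase):].strip()
    # No silent fallback: an empty or identical result is treated as a failure.
    if not translated or translated == original:
        raise TranslationError("Translation failed: received empty or unchanged text")
    return translated
```

One consequence worth keeping in mind: text that legitimately stays the same after translation (numbers, product names, text already in the target language) is rejected by this check. In `translate_docx()` such a paragraph aborts the whole document, and the "Continue anyway" branch that compares `translated_text == original_text` appears to be unreachable, because `translate_text()` raises before returning identical text.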

app/main.py CHANGED

@@ -74,6 +74,7 @@ async def translate_document(
 ):
     """
     Translate a document (PDF or DOCX) using the specified model
     """
     if not file.filename:
         raise HTTPException(status_code=400, detail="No file provided")
@@ -87,6 +88,13 @@ async def translate_document(
             detail=f"Unsupported file type. Allowed: {', '.join(allowed_extensions)}"
         )
     # Create temporary directory for this translation
     with tempfile.TemporaryDirectory() as temp_dir:
         temp_path = Path(temp_dir)
@@ -99,6 +107,8 @@ async def translate_document(
         try:
             # Perform translation
             logger.info(f"Starting translation of {input_file} using model {model}")
             result = await translator.translate_document(
                 input_file=input_file,
                 model=model,
@@ -114,23 +124,28 @@ async def translate_document(
                 raise HTTPException(status_code=500, detail=error_details)
             if result.paragraphs_count == 0:
-                logger.warning("Translation completed but no paragraphs were translated")
-                # Still proceed but log the issue
             # Move files to uploads directory for serving
             timestamp = int(asyncio.get_event_loop().time())
             result_dir = UPLOAD_DIR / f"translation_{timestamp}"
             result_dir.mkdir(exist_ok=True)
-            # Copy result files
             final_files = {}
             if result.original_file.exists():
-                original_dest = result_dir / f"original_{result.original_file.name}"
                 shutil.copy2(result.original_file, original_dest)
                 final_files["original"] = str(original_dest.relative_to(UPLOAD_DIR))
             if result.translated_file.exists():
-                translated_dest = result_dir / f"translated_{result.translated_file.name}"
                 shutil.copy2(result.translated_file, translated_dest)
                 final_files["translated"] = str(translated_dest.relative_to(UPLOAD_DIR))
@@ -138,17 +153,22 @@ async def translate_document(
             report = {
                 "status": "success",
                 "original_filename": file.filename,
-                "translated_filename": result.translated_file.name,
                 "pages_translated": result.pages_count,
                 "paragraphs_translated": result.paragraphs_count,
                 "model_used": model,
                 "source_language": source_language,
                 "target_language": target_language,
-                "files": final_files
             }
             return JSONResponse(content=report)
         except Exception as e:
             logger.error(f"Translation error: {e}")
             raise HTTPException(status_code=500, detail=f"Translation failed: {str(e)}")

 ):
     """
     Translate a document (PDF or DOCX) using the specified model
+    Returns translated file with same name and format as original
     """
     if not file.filename:
         raise HTTPException(status_code=400, detail="No file provided")
             detail=f"Unsupported file type. Allowed: {', '.join(allowed_extensions)}"
         )
+    # Validate API key
+    if not translator.is_ready():
+        raise HTTPException(
+            status_code=500,
+            detail="Translation service not configured. Please check OPENROUTER_API_KEY."
+        )
     # Create temporary directory for this translation
     with tempfile.TemporaryDirectory() as temp_dir:
         temp_path = Path(temp_dir)
         try:
             # Perform translation
             logger.info(f"Starting translation of {input_file} using model {model}")
+            logger.info(f"Translation: {source_language} -> {target_language}")
             result = await translator.translate_document(
                 input_file=input_file,
                 model=model,
                 raise HTTPException(status_code=500, detail=error_details)
             if result.paragraphs_count == 0:
+                logger.error("Translation completed but no paragraphs were translated")
+                raise HTTPException(
+                    status_code=500,
+                    detail="Translation failed: No content was translated. Please check if the file contains readable text."
+                )
             # Move files to uploads directory for serving
             timestamp = int(asyncio.get_event_loop().time())
             result_dir = UPLOAD_DIR / f"translation_{timestamp}"
             result_dir.mkdir(exist_ok=True)
+            # Copy result files with original names (no prefix)
             final_files = {}
             if result.original_file.exists():
+                # Keep original filename
+                original_dest = result_dir / file.filename
                 shutil.copy2(result.original_file, original_dest)
                 final_files["original"] = str(original_dest.relative_to(UPLOAD_DIR))
             if result.translated_file.exists():
+                # Use original filename for translated file too
+                translated_dest = result_dir / file.filename
                 shutil.copy2(result.translated_file, translated_dest)
                 final_files["translated"] = str(translated_dest.relative_to(UPLOAD_DIR))
             report = {
                 "status": "success",
                 "original_filename": file.filename,
+                "translated_filename": file.filename,  # Same filename
                 "pages_translated": result.pages_count,
                 "paragraphs_translated": result.paragraphs_count,
                 "model_used": model,
                 "source_language": source_language,
                 "target_language": target_language,
+                "files": final_files,
+                "message": f"Successfully translated {result.paragraphs_count} paragraphs from {source_language} to {target_language}"
             }
+            logger.info(f"Translation completed successfully: {result.paragraphs_count} paragraphs translated")
             return JSONResponse(content=report)
+        except HTTPException:
+            # Re-raise HTTP exceptions
+            raise
         except Exception as e:
             logger.error(f"Translation error: {e}")
             raise HTTPException(status_code=500, detail=f"Translation failed: {str(e)}")
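
Review note on the file naming in this hunk: `original_dest` and `translated_dest` are now both `result_dir / file.filename`, so the second `shutil.copy2` overwrites the first and both entries in `final_files` point at the translated document. A minimal sketch of one way to keep the same filename for both copies without a collision is shown below. The `copy_results` helper and the `original`/`translated` subfolder names are assumptions made for illustration, not part of the commit.

```python
import shutil
from pathlib import Path


def copy_results(original: Path, translated: Path, result_dir: Path, filename: str) -> dict:
    """Copy both files under result_dir, each keeping `filename`.

    Each copy goes into its own subfolder (names chosen for this sketch),
    so the translated copy cannot overwrite the original one.
    """
    # Keep only the last path component of the client-supplied name.
    safe_name = Path(filename).name
    final_files = {}
    for role, source in (("original", original), ("translated", translated)):
        if source.exists():
            dest = result_dir / role / safe_name
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, dest)
            # Paths are reported relative to result_dir in this sketch;
            # the endpoint reports them relative to UPLOAD_DIR.
            final_files[role] = str(dest.relative_to(result_dir))
    return final_files
```

With this layout the response can still report the same `translated_filename` as `original_filename`, while the two download paths stay distinct.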

test_api.py ADDED

	@@ -0,0 +1,108 @@

+#!/usr/bin/env python3
+"""
+Test script to verify OpenRouter API key and translation functionality
+"""
+import os
+import asyncio
+import aiohttp
+from translator import DocumentTranslator
+async def test_api_key():
+    """Test if the API key is working"""
+    print("🔑 Testing OpenRouter API key...")
+    api_key = os.getenv("OPENROUTER_API_KEY")
+    if not api_key:
+        print("❌ OPENROUTER_API_KEY environment variable not set!")
+        print("Please set it with: set OPENROUTER_API_KEY=your_key_here")
+        return False
+    print(f"✅ API key found: {api_key[:10]}...")
+    # Test API connection
+    try:
+        headers = {
+            "Authorization": f"Bearer {api_key}",
+            "Content-Type": "application/json",
+            "HTTP-Referer": "https://huggingface.co",
+            "X-Title": "Document Translator"
+        }
+        async with aiohttp.ClientSession() as session:
+            async with session.get(
+                "https://openrouter.ai/api/v1/models",
+                headers=headers
+            ) as response:
+                if response.status == 200:
+                    print("✅ API connection successful!")
+                    return True
+                else:
+                    print(f"❌ API connection failed: {response.status}")
+                    error_text = await response.text()
+                    print(f"Error: {error_text}")
+                    return False
+    except Exception as e:
+        print(f"❌ API test failed: {e}")
+        return False
+async def test_translation():
+    """Test basic translation functionality"""
+    print("\n📝 Testing translation functionality...")
+    translator = DocumentTranslator()
+    if not translator.is_ready():
+        print("❌ Translator not ready - API key issue")
+        return False
+    try:
+        # Test simple translation
+        test_text = "Hello, this is a test document."
+        print(f"Original text: {test_text}")
+        translated = await translator.translate_text(
+            text=test_text,
+            model="google/gemini-2.5-pro-exp-03-25",
+            source_lang="en",
+            target_lang="ar"
+        )
+        print(f"Translated text: {translated}")
+        if translated != test_text:
+            print("✅ Translation working correctly!")
+            return True
+        else:
+            print("❌ Translation returned original text - may indicate an issue")
+            return False
+    except Exception as e:
+        print(f"❌ Translation test failed: {e}")
+        return False
+async def main():
+    """Run all tests"""
+    print("🧪 Testing Document Translator Setup\n")
+    # Test API key
+    api_ok = await test_api_key()
+    if api_ok:
+        # Test translation
+        translation_ok = await test_translation()
+        if translation_ok:
+            print("\n🎉 All tests passed! The translator should work correctly.")
+        else:
+            print("\n⚠️ Translation test failed. Check the logs for details.")
+    else:
+        print("\n❌ API key test failed. Please check your OPENROUTER_API_KEY.")
+    print("\n📋 Next steps:")
+    print("1. Make sure OPENROUTER_API_KEY is set correctly")
+    print("2. Upload a PDF or DOCX file to test the full workflow")
+    print("3. Check the translation.log file for detailed logs")
+if __name__ == "__main__":
+    asyncio.run(main())
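
Review note on the `translator.py` changes that follow: the DOCX to PDF conversion with rename is written out twice in `translate_document()`, once on the LibreOffice path and once on the direct-extraction fallback. The sketch below shows how the two copies could share one helper. `docx_to_named_pdf` and its `run` parameter are hypothetical names introduced here. The LibreOffice command line is the one used in the commit, and the return-code check is an addition, since the visible hunk does not inspect `result.returncode`.

```python
import subprocess
from pathlib import Path


def docx_to_named_pdf(docx_path: Path, output_dir: Path, target_stem: str,
                      run=subprocess.run, timeout: int = 60) -> Path:
    """Convert docx_path to PDF and name the result <target_stem>.pdf.

    `run` is injectable so the helper can be exercised without LibreOffice.
    """
    cmd = [
        "libreoffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(output_dir),
        str(docx_path),
    ]
    result = run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"LibreOffice conversion failed: {result.stderr.strip()}")
    # LibreOffice names its output after the DOCX stem.
    produced = output_dir / f"{docx_path.stem}.pdf"
    if not produced.exists():
        raise RuntimeError(f"Expected converted file not found: {produced}")
    final = output_dir / f"{target_stem}.pdf"
    if produced != final:
        # Path.replace overwrites an existing file at `final`.
        produced.replace(final)
    return final
```

A caution that applies to the commit's own `temp_pdf.rename(...)` as well: `translate_document()` defaults `output_dir` to `input_file.parent`, so if the uploaded PDF sits in that directory, renaming the result to `{original_filename}.pdf` replaces the input on POSIX systems and raises `FileExistsError` on Windows. Writing the output to a separate directory avoids both outcomes.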

translator.py CHANGED

@@ -61,10 +61,14 @@ class DocumentTranslator:
         ]
     async def translate_text(self, text: str, model: str, source_lang: str = "auto", target_lang: str = "en") -> str:
-        """Translate text using OpenRouter API with improved prompt"""
         if not text.strip():
             return text
         # Create a more specific translation prompt
         if source_lang == "auto":
             prompt = f"""You are a professional document translator. Translate the following text to {target_lang} (Arabic if 'ar', English if 'en', etc.).
@@ -74,6 +78,7 @@ IMPORTANT INSTRUCTIONS:
 2. Maintain the original formatting and structure
 3. Preserve technical terms appropriately
 4. Return ONLY the translated text
 Text to translate:
 {text}
@@ -87,6 +92,7 @@ IMPORTANT INSTRUCTIONS:
 2. Maintain the original formatting and structure
 3. Preserve technical terms appropriately
 4. Return ONLY the translated text
 Text to translate:
 {text}
@@ -98,11 +104,11 @@ Translated text:"""
                 payload = {
                     "model": model,
                     "messages": [
-                        {"role": "system", "content": "You are a professional document translator. Provide direct translations without any explanations or additional text."},
                         {"role": "user", "content": prompt}
                     ],
                     "temperature": 0.1,
-                    "max_tokens": len(text) * 3 + 200  # More generous token limit for Arabic
                 }
                 logger.info(f"Translating text: '{text[:50]}...' from {source_lang} to {target_lang}")
@@ -120,15 +126,26 @@ Translated text:"""
                         if "Translated text:" in translated:
                             translated = translated.split("Translated text:")[-1].strip()
                         logger.info(f"Translation successful: '{translated[:50]}...'")
                         return translated
                     else:
                         error_text = await response.text()
                         logger.error(f"Translation API error: {response.status} - {error_text}")
-                        return text  # Return original text if translation fails
         except Exception as e:
             logger.error(f"Translation error: {e}")
-            return text  # Return original text if translation fails
     def extract_text_from_pdf(self, pdf_path: Path) -> str:
         """Extract text directly from PDF as fallback method"""
@@ -175,20 +192,28 @@ Translated text:"""
                 if len(paragraph.strip()) > 10:  # Only translate substantial paragraphs
                     logger.info(f"Translating paragraph {i+1}/{len(paragraphs)}: '{paragraph[:50]}...'")
-                    translated_text = await self.translate_text(
-                        paragraph, model, source_lang, target_lang
-                    )
-                    # Add translated paragraph to document
-                    doc.add_paragraph(translated_text)
-                    paragraphs_translated += 1
                     # Add delay to avoid rate limiting
-                    await asyncio.sleep(0.2)
                 else:
                     # Add short text as-is
                     doc.add_paragraph(paragraph)
             # Save translated document
             translated_path = output_dir / f"translated_{pdf_path.stem}.docx"
             doc.save(translated_path)
@@ -284,7 +309,7 @@ Translated text:"""
             raise
     async def translate_docx(self, docx_path: Path, model: str, source_lang: str, target_lang: str, output_dir: Path) -> Tuple[Path, int]:
-        """Translate DOCX document paragraph by paragraph with enhanced debugging"""
         try:
             # Load the document
             logger.info(f"Loading DOCX document: {docx_path}")
@@ -298,6 +323,9 @@ Translated text:"""
             text_paragraphs = [p for p in doc.paragraphs if p.text.strip()]
             logger.info(f"Found {len(text_paragraphs)} paragraphs with text content")
             # Log first few paragraphs for debugging
             for i, paragraph in enumerate(text_paragraphs[:3]):
                 logger.info(f"Sample paragraph {i+1}: '{paragraph.text[:100]}...'")
@@ -308,21 +336,27 @@ Translated text:"""
                     original_text = paragraph.text.strip()
                     logger.info(f"Translating paragraph {paragraphs_count + 1}/{len(text_paragraphs)}: '{original_text[:50]}...'")
-                    translated_text = await self.translate_text(
-                        original_text, model, source_lang, target_lang
-                    )
-                    # Verify translation actually happened
-                    if translated_text != original_text:
-                        logger.info(f"Translation successful: '{translated_text[:50]}...'")
-                    else:
-                        logger.warning(f"Translation returned original text for: '{original_text[:50]}...'")
-                    paragraph.text = translated_text
-                    paragraphs_count += 1
                     # Add small delay to avoid rate limiting
-                    await asyncio.sleep(0.2)
             # Translate tables if any
             table_cells_translated = 0
@@ -332,16 +366,23 @@ Translated text:"""
                     for cell_idx, cell in enumerate(row.cells):
                         if cell.text.strip():
                             original_text = cell.text.strip()
-                            translated_text = await self.translate_text(
-                                original_text, model, source_lang, target_lang
-                            )
-                            cell.text = translated_text
-                            table_cells_translated += 1
                             await asyncio.sleep(0.1)
             logger.info(f"Translated {table_cells_translated} table cells")
             total_translated = paragraphs_count + table_cells_translated
             # Save translated document
             translated_path = output_dir / f"translated_{docx_path.name}"
             doc.save(translated_path)
@@ -352,6 +393,8 @@ Translated text:"""
             if translated_path.exists():
                 file_size = translated_path.stat().st_size
                 logger.info(f"Translated document saved (size: {file_size} bytes)")
             return translated_path, total_translated
@@ -369,12 +412,14 @@ Translated text:"""
     ) -> TranslationReport:
         """
         Main translation function that handles both PDF and DOCX files
         """
         if output_dir is None:
             output_dir = input_file.parent
         original_file = input_file
         file_extension = input_file.suffix.lower()
         try:
             if file_extension == ".pdf":
@@ -396,9 +441,27 @@ Translated text:"""
                         logger.warning("LibreOffice conversion produced no translatable content, trying direct extraction")
                         raise Exception("No content found in LibreOffice conversion")
-                    # Convert translated DOCX back to PDF
-                    logger.info(f"Converting translated DOCX back to PDF")
-                    translated_file = self.docx_to_pdf(translated_docx, output_dir)
                 except Exception as libreoffice_error:
                     logger.warning(f"LibreOffice method failed: {libreoffice_error}")
@@ -409,8 +472,25 @@ Translated text:"""
                         input_file, model, source_language, target_language, output_dir
                     )
-                    # Convert the translated DOCX to PDF
-                    translated_file = self.docx_to_pdf(translated_docx, output_dir)
                 # Estimate pages (rough estimate: 1 page = ~500 words)
                 doc = Document(translated_docx)
@@ -418,12 +498,21 @@ Translated text:"""
                 pages_count = max(1, total_words // 500)
             elif file_extension == ".docx":
-                # Translate DOCX directly
                 logger.info(f"Translating DOCX {input_file}")
                 translated_file, paragraphs_count = await self.translate_docx(
                     input_file, model, source_language, target_language, output_dir
                 )
                 # Estimate pages
                 doc = Document(translated_file)
                 total_words = sum(len(p.text.split()) for p in doc.paragraphs)
@@ -432,6 +521,10 @@ Translated text:"""
             else:
                 raise Exception(f"Unsupported file format: {file_extension}")
             return TranslationReport(
                 original_file=original_file,
                 translated_file=translated_file,

         ]
     async def translate_text(self, text: str, model: str, source_lang: str = "auto", target_lang: str = "en") -> str:
+        """Translate text using OpenRouter API with improved prompt and validation"""
         if not text.strip():
             return text
+        # Validate API key first
+        if not self.api_key:
+            raise Exception("OpenRouter API key not configured")
         # Create a more specific translation prompt
         if source_lang == "auto":
             prompt = f"""You are a professional document translator. Translate the following text to {target_lang} (Arabic if 'ar', English if 'en', etc.).
 2. Maintain the original formatting and structure
 3. Preserve technical terms appropriately
 4. Return ONLY the translated text
+5. If the text is already in the target language, still provide a proper translation/rewrite
 Text to translate:
 {text}
 2. Maintain the original formatting and structure
 3. Preserve technical terms appropriately
 4. Return ONLY the translated text
+5. If the text is already in the target language, still provide a proper translation/rewrite
 Text to translate:
 {text}
                 payload = {
                     "model": model,
                     "messages": [
+                        {"role": "system", "content": "You are a professional document translator. You MUST provide a translation. Never return the original text unchanged."},
                         {"role": "user", "content": prompt}
                     ],
                     "temperature": 0.1,
+                    "max_tokens": len(text) * 4 + 500  # More generous token limit
                 }
                 logger.info(f"Translating text: '{text[:50]}...' from {source_lang} to {target_lang}")
                         if "Translated text:" in translated:
                             translated = translated.split("Translated text:")[-1].strip()
+                        # Remove any introductory phrases
+                        for phrase in ["Here is the translation:", "Translation:", "The translation is:"]:
+                            if translated.startswith(phrase):
+                                translated = translated[len(phrase):].strip()
+                        # Validate that we got a meaningful translation
+                        if not translated or translated == text:
+                            logger.warning(f"Translation returned empty or unchanged text")
+                            # Don't fall back to original - raise error instead
+                            raise Exception("Translation failed: received empty or unchanged text")
                         logger.info(f"Translation successful: '{translated[:50]}...'")
                         return translated
                     else:
                         error_text = await response.text()
                         logger.error(f"Translation API error: {response.status} - {error_text}")
+                        raise Exception(f"Translation API error: {response.status} - {error_text}")
         except Exception as e:
             logger.error(f"Translation error: {e}")
+            raise Exception(f"Translation failed: {str(e)}")
     def extract_text_from_pdf(self, pdf_path: Path) -> str:
         """Extract text directly from PDF as fallback method"""
                 if len(paragraph.strip()) > 10:  # Only translate substantial paragraphs
                     logger.info(f"Translating paragraph {i+1}/{len(paragraphs)}: '{paragraph[:50]}...'")
+                    try:
+                        translated_text = await self.translate_text(
+                            paragraph, model, source_lang, target_lang
+                        )
+                        # Add translated paragraph to document
+                        doc.add_paragraph(translated_text)
+                        paragraphs_translated += 1
+                    except Exception as trans_error:
+                        logger.error(f"Failed to translate paragraph: {trans_error}")
+                        raise Exception(f"Translation failed for paragraph: {str(trans_error)}")
                     # Add delay to avoid rate limiting
+                    await asyncio.sleep(0.3)
                 else:
                     # Add short text as-is
                     doc.add_paragraph(paragraph)
+            if paragraphs_translated == 0:
+                raise Exception("No paragraphs were successfully translated")
             # Save translated document
             translated_path = output_dir / f"translated_{pdf_path.stem}.docx"
             doc.save(translated_path)
             raise
     async def translate_docx(self, docx_path: Path, model: str, source_lang: str, target_lang: str, output_dir: Path) -> Tuple[Path, int]:
+        """Translate DOCX document paragraph by paragraph with enhanced validation"""
         try:
             # Load the document
             logger.info(f"Loading DOCX document: {docx_path}")
             text_paragraphs = [p for p in doc.paragraphs if p.text.strip()]
             logger.info(f"Found {len(text_paragraphs)} paragraphs with text content")
+            if len(text_paragraphs) == 0:
+                raise Exception("No text content found in document")
             # Log first few paragraphs for debugging
             for i, paragraph in enumerate(text_paragraphs[:3]):
                 logger.info(f"Sample paragraph {i+1}: '{paragraph.text[:100]}...'")
                     original_text = paragraph.text.strip()
                     logger.info(f"Translating paragraph {paragraphs_count + 1}/{len(text_paragraphs)}: '{original_text[:50]}...'")
+                    try:
+                        translated_text = await self.translate_text(
+                            original_text, model, source_lang, target_lang
+                        )
+                        # Verify translation actually happened
+                        if translated_text == original_text:
+                            logger.warning(f"Translation returned identical text for: '{original_text[:50]}...'")
+                            # Continue anyway - maybe it was already in target language
+                        else:
+                            logger.info(f"Translation successful: '{translated_text[:50]}...'")
+                        paragraph.text = translated_text
+                        paragraphs_count += 1
+                    except Exception as trans_error:
+                        logger.error(f"Failed to translate paragraph: {trans_error}")
+                        raise Exception(f"Translation failed for paragraph: {str(trans_error)}")
                     # Add small delay to avoid rate limiting
+                    await asyncio.sleep(0.3)
             # Translate tables if any
             table_cells_translated = 0
                     for cell_idx, cell in enumerate(row.cells):
                         if cell.text.strip():
                             original_text = cell.text.strip()
+                            try:
+                                translated_text = await self.translate_text(
+                                    original_text, model, source_lang, target_lang
+                                )
+                                cell.text = translated_text
+                                table_cells_translated += 1
+                            except Exception as trans_error:
+                                logger.warning(f"Failed to translate table cell: {trans_error}")
+                                # Continue with other cells
                             await asyncio.sleep(0.1)
             logger.info(f"Translated {table_cells_translated} table cells")
             total_translated = paragraphs_count + table_cells_translated
+            if total_translated == 0:
+                raise Exception("No content was successfully translated")
             # Save translated document
             translated_path = output_dir / f"translated_{docx_path.name}"
             doc.save(translated_path)
             if translated_path.exists():
                 file_size = translated_path.stat().st_size
                 logger.info(f"Translated document saved (size: {file_size} bytes)")
+            else:
+                raise Exception("Failed to save translated document")
             return translated_path, total_translated
     ) -> TranslationReport:
         """
         Main translation function that handles both PDF and DOCX files
+        Maintains original filename and format (PDF input returns PDF output)
         """
         if output_dir is None:
             output_dir = input_file.parent
         original_file = input_file
         file_extension = input_file.suffix.lower()
+        original_filename = input_file.stem  # filename without extension
         try:
             if file_extension == ".pdf":
                         logger.warning("LibreOffice conversion produced no translatable content, trying direct extraction")
                         raise Exception("No content found in LibreOffice conversion")
+                    # Convert translated DOCX back to PDF with ORIGINAL filename
+                    logger.info("Converting translated DOCX back to PDF with original filename")
+                    final_translated_file = output_dir / f"{original_filename}.pdf"
+                    # Use LibreOffice to convert with specific output name
+                    cmd = [
+                        "libreoffice",
+                        "--headless",
+                        "--convert-to", "pdf",
+                        "--outdir", str(output_dir),
+                        str(translated_docx)
+                    ]
+                    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
+                    # LibreOffice creates file with docx stem name, rename to original
+                    temp_pdf = output_dir / f"{translated_docx.stem}.pdf"
+                    if temp_pdf.exists() and temp_pdf != final_translated_file:
+                        temp_pdf.rename(final_translated_file)
+                    translated_file = final_translated_file
                 except Exception as libreoffice_error:
                     logger.warning(f"LibreOffice method failed: {libreoffice_error}")
                         input_file, model, source_language, target_language, output_dir
                     )
+                    # Convert the translated DOCX to PDF with original filename
+                    final_translated_file = output_dir / f"{original_filename}.pdf"
+                    cmd = [
+                        "libreoffice",
+                        "--headless",
+                        "--convert-to", "pdf",
+                        "--outdir", str(output_dir),
+                        str(translated_docx)
+                    ]
+                    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
+                    # LibreOffice creates file with docx stem name, rename to original
+                    temp_pdf = output_dir / f"{translated_docx.stem}.pdf"
+                    if temp_pdf.exists() and temp_pdf != final_translated_file:
+                        temp_pdf.rename(final_translated_file)
+                    translated_file = final_translated_file
                 # Estimate pages (rough estimate: 1 page = ~500 words)
                 doc = Document(translated_docx)
                 pages_count = max(1, total_words // 500)
             elif file_extension == ".docx":
+                # Translate DOCX directly, keeping original filename
                 logger.info(f"Translating DOCX {input_file}")
+                # Create output file with original filename
+                final_translated_file = output_dir / f"{original_filename}.docx"
                 translated_file, paragraphs_count = await self.translate_docx(
                     input_file, model, source_language, target_language, output_dir
                 )
+                # Rename to original filename if different
+                if translated_file != final_translated_file:
+                    translated_file.rename(final_translated_file)
+                    translated_file = final_translated_file
                 # Estimate pages
                 doc = Document(translated_file)
                 total_words = sum(len(p.text.split()) for p in doc.paragraphs)
             else:
                 raise Exception(f"Unsupported file format: {file_extension}")
+            # Verify translation was successful
+            if paragraphs_count == 0:
+                raise Exception("Translation failed: No paragraphs were translated")
             return TranslationReport(
                 original_file=original_file,
                 translated_file=translated_file,

اخطاء.txt DELETED

The diff for this file is too large to render.
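
Note on the `translator.py` changes above: the DOCX-to-PDF conversion and rename step appears twice (once in the LibreOffice path, once in the fallback path), and in both places the result of `subprocess.run` is never inspected. The block below is a minimal sketch of how that step could be factored into one helper that also checks the exit code. The function name `convert_docx_to_pdf`, its parameters, and the injectable `runner` argument are assumptions made for illustration and are not part of this commit; the `libreoffice` command line is copied from the diff.

```python
import subprocess
from pathlib import Path


def convert_docx_to_pdf(
    translated_docx: Path,
    output_dir: Path,
    original_stem: str,
    runner=subprocess.run,  # hypothetical hook so the helper can be exercised without LibreOffice
    timeout: int = 60,
) -> Path:
    """Convert a DOCX to PDF and rename the result to <original_stem>.pdf.

    Hypothetical helper sketched from the duplicated blocks in the diff.
    """
    final_pdf = output_dir / f"{original_stem}.pdf"
    cmd = [
        "libreoffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(output_dir),
        str(translated_docx),
    ]
    result = runner(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"LibreOffice conversion failed: {result.stderr.strip()}")

    # LibreOffice names its output after the DOCX stem, so look for that file.
    temp_pdf = output_dir / f"{translated_docx.stem}.pdf"
    if not temp_pdf.exists():
        raise FileNotFoundError(f"Expected converted PDF not found: {temp_pdf}")

    if temp_pdf != final_pdf:
        # Path.replace overwrites an existing target on every platform,
        # whereas Path.rename raises on Windows if the target exists.
        temp_pdf.replace(final_pdf)
    return final_pdf
```

Both call sites could then reduce to `translated_file = convert_docx_to_pdf(translated_docx, output_dir, original_filename)`. One thing to keep in mind: since `output_dir` defaults to `input_file.parent`, the final path can coincide with the input PDF, in which case the rename overwrites the original file.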