CLI Redaction Test Suite
This test suite exercises the cli_redact.py script, covering every example shown in the CLI epilog.
Overview
The test suite includes tests for:
PDF Redaction Examples
- Default settings (local OCR)
- Text extraction only (no redaction)
- Text extraction with whole page redaction
- Redaction with allow lists
- Limited pages with custom fuzzy matching
- Custom deny/allow/whole page lists
- Image redaction
Tabular Anonymisation Examples
- CSV anonymisation with specific columns
- Different anonymisation strategies
- Word document anonymisation
AWS Services Examples
- Textract and Comprehend redaction
- Signature extraction
- Layout extraction
Duplicate Detection Examples
- Duplicate pages in OCR files
- Line-level duplicate detection
- Tabular duplicate detection
Textract Batch Operations
- Submit documents for analysis
- Retrieve results by job ID
- List recent jobs
Running the Tests
Method 1: Run the test suite directly
cd test
python test.py
Method 2: Use the convenience script
cd test
python run_tests.py
Method 3: Run with unittest
cd test
python -m unittest test.test.TestCLIRedactExamples -v
Test Behavior
- File Dependencies: Tests will be skipped if required example files are not found in the example_data/ directory
- AWS Tests: AWS-related tests may fail if credentials are not configured, but this is expected
- Temporary Output: All tests use temporary output directories that are cleaned up automatically
- Timeout: Each test has a 10-minute timeout to prevent hanging
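The skip and cleanup behaviour described above can be sketched with standard unittest tools. This is a minimal illustration, not the suite's actual code: the class name, the require_example_file helper, the example file name, and the way EXAMPLE_DATA_DIR is resolved are all assumptions made for the example.

```python
import os
import shutil
import tempfile
import unittest

# Assumption: example files live in ./example_data relative to the working
# directory. The real suite may resolve this path differently.
EXAMPLE_DATA_DIR = os.path.join(os.getcwd(), "example_data")


class SkipAndCleanupSketch(unittest.TestCase):
    """Hypothetical test case showing skip-on-missing-file and temp cleanup."""

    def setUp(self):
        # Fresh output directory per test, removed again even if the test fails.
        self.temp_output_dir = tempfile.mkdtemp(prefix="test_output_")
        self.addCleanup(shutil.rmtree, self.temp_output_dir, ignore_errors=True)

    def require_example_file(self, filename):
        # Skip (rather than fail) the current test when the input is absent.
        path = os.path.join(EXAMPLE_DATA_DIR, filename)
        if not os.path.isfile(path):
            self.skipTest(f"Example file not found: {path}")
        return path

    def test_pdf_example(self):
        # "hypothetical_example.pdf" is a placeholder name, not a real fixture.
        pdf_path = self.require_example_file("hypothetical_example.pdf")
        self.assertTrue(os.path.isfile(pdf_path))
        self.assertTrue(os.path.isdir(self.temp_output_dir))
```

Using addCleanup instead of tearDown means the directory is removed even when setUp itself raises after the directory has been created.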
Test Structure
The test suite uses Python's unittest framework with the following structure:
- TestCLIRedactExamples: Main test class containing all test methods
- run_cli_redact(): Helper function that executes the CLI script with specified parameters
- run_all_tests(): Main function that runs all tests and provides a summary
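A helper such as run_cli_redact() could plausibly be built on subprocess with a timeout. The sketch below is an assumption about its shape, not the suite's real signature: the parameters script_path, extra_args, and timeout are hypothetical, and no cli_redact.py flags are shown because the actual arguments come from the CLI epilog.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 600  # the 10-minute per-test limit


def run_cli_redact(script_path, extra_args, timeout=TIMEOUT_SECONDS):
    """Run the CLI script in a subprocess and return True on exit code 0.

    Hypothetical signature: the real helper may take different parameters.
    """
    cmd = [sys.executable, script_path] + list(extra_args)
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # text mode; works on Python 3.6
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        print(f"Command timed out after {timeout} seconds")
        return False

    if completed.returncode != 0:
        print(completed.stderr)
    return completed.returncode == 0
```

Returning a boolean keeps each test method short: it can assert on the result, or only report it where a failure is tolerated.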
Example Output
================================================================================
DOCUMENT REDACTION CLI TEST SUITE
================================================================================
This test suite runs through all the examples from the CLI epilog.
Tests will be skipped if required example files are not found.
AWS-related tests may fail if credentials are not configured.
================================================================================
Test setup complete. Script: /path/to/cli_redact.py
Example data directory: /path/to/example_data
Temp output directory: /tmp/test_output_xyz
=== Testing PDF redaction with default settings ===
✅ PDF redaction with default settings passed
=== Testing PDF text extraction only ===
✅ PDF text extraction only passed
...
================================================================================
TEST SUMMARY
================================================================================
Tests run: 20
Failures: 0
Errors: 0
Skipped: 2
Overall result: ✅ PASSED
================================================================================
Requirements
- Python 3.6+
- All dependencies for the main CLI script
- Example data files in the example_data/ directory (for full test coverage)
- AWS credentials (for AWS-related tests)
Notes
- Tests are designed to be robust and will skip gracefully if files are missing
- AWS tests are marked as completed even if they fail due to missing credentials
- The test suite provides detailed output for debugging
- All temporary files are cleaned up automatically
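One way an AWS test can be "marked as completed" despite a failure is to report the outcome without asserting on it. The sketch below illustrates that idea only. run_aws_example is a hypothetical stand-in that always reports failure, as if credentials were missing, and the real suite may handle this case differently.

```python
import unittest


def run_aws_example():
    """Hypothetical stand-in for an AWS-backed CLI call.

    Always returns False here, simulating missing credentials.
    """
    return False


class AwsToleranceSketch(unittest.TestCase):
    def test_textract_example(self):
        succeeded = run_aws_example()
        if succeeded:
            print("AWS example passed")
        else:
            # Deliberately no assertion: the failure is reported in the
            # output but does not count against the overall run.
            print("AWS example failed (possibly missing credentials); "
                  "test marked as completed")
```

The trade-off is that a genuine regression in the AWS code paths will not fail the run, so the printed output is worth checking when credentials are configured.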