Spaces:

seanpedrickcase
/

document_redaction

Running

document_redaction / requirements.txt

Improved paddle and hybrid OCR analysis across all options. Tried to revise requirements for spaces

2c00d05 3 days ago

1.27 kB

	# --- Core and data packages ---
	numpy==2.2.6
	pandas==2.3.3
	polars==1.33.1
	pyarrow==21.0.0
	openpyxl==3.1.5
	boto3==1.40.57
	python-dotenv==1.0.1
	defusedxml==0.7.1
	Faker==37.8.0
	python-levenshtein==0.27.1
	rapidfuzz==3.14.1

	# --- PDF / OCR / Redaction tools ---
	pdfminer.six==20250506
	pdf2image==1.17.0
	pymupdf==1.26.4
	pikepdf==9.11.0
	opencv-python==4.12.0.88
	presidio_analyzer==2.2.360
	presidio_anonymizer==2.2.360
	presidio-image-redactor==0.0.57

	# --- Document generation ---
	python-docx==1.2.0

	# --- Gradio and apps ---
	gradio==5.49.1
	https://github.com/seanpedrick-case/gradio_image_annotator/releases/download/v0.3.3/gradio_image_annotation-0.3.3-py3-none-any.whl # Custom annotator version with rotation, zoom, labels, and box IDs
	spaces==0.42.1

	# --- AWS Lambda runtime ---
	awslambdaric==3.1.1

	# --- Machine learning / NLP ---
	scikit-learn==1.7.2
	spacy==3.8.7
	spaczz==0.6.1
	en_core_web_lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0.tar.gz
	transformers==4.57.1
	accelerate==1.11.0

	# --- Testing ---
	pytest>=7.0.0
	pytest-cov>=4.0.0

	# --- PyTorch (CUDA 12.6) ---
	--extra-index-url https://download.pytorch.org/whl/cu126
	torch<=2.8.0
	torchvision>=0.20.1