Unstructured
Status: Active
Description
Unstructured is an open-source Python library for parsing and cleaning documents. It is commonly used in RAG (retrieval-augmented generation) ingestion and preprocessing pipelines.
Key Features
- Open-source document parsing for PDFs, HTML, Word documents, and other formats
- Modular partitioning functions for text extraction and structure detection
- Docker support with multi-platform images for x86_64 and Apple Silicon
- Structured output ready for RAG ingestion and preprocessing pipelines
- Supports images, tables, and complex document layouts
- Installable from PyPI, with support for a local development setup
Quick Start
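1. Install Unstructured using one of these options:
   - Docker: `docker pull downloads.unstructured.io/unstructured-io/unstructured:latest`
   - PyPI: `pip install unstructured`. The base package covers simple formats such as plain text and HTML; for PDFs, Word documents, and other formats, install the extras, for example `pip install "unstructured[all-docs]"`.
2. Partition your documents with the `partition` function from `unstructured.partition.auto`, which detects each file's type and routes it to the matching partitioner.
3. Use the resulting structured elements (titles, narrative text, list items, tables, and so on) in your RAG or LLM pipeline.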