AI-DMS: AI-Driven Intelligent Document Management System

By Deevia Software India Pvt Ltd AI

1. Inspiration

Organizations everywhere are still buried under mountains of PDFs, scanned files, manuals, and records with no smart way to search them.
Finding one small piece of information can take hours of manual digging.
Most document systems just store files; they don’t actually understand what’s inside them.
Teams constantly ask, “Where is that document?” or “Where was that clause mentioned?”
We want to turn document collections into something you can actually talk to—an intelligent, searchable knowledge base.
We want to show that GridDB isn’t just for IoT and sensor data; it can power smart document intelligence too.

2. What it does

Turns static documents into an intelligent, searchable knowledge base; it understands content, not just file names.
Enables AI-powered semantic search and chat over documents, including text extraction from scanned files using OCR as part of the AI pipeline.
Provides secure Role-Based Access Control (RBAC) across folders and documents.
Reduces LLM costs by reusing previous responses through semantic similarity matching on stored Q/A embeddings.
Maintains an immutable, filterable, and exportable audit trail of all user, document, chat, and system events.
Supports document versioning, archival, structured folder management, and metadata tagging.
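
The answer-reuse idea in the list above can be sketched as follows. This is an illustrative sketch, not the project's actual code: the `QACache` class, the 0.9 threshold, and the in-memory list standing in for stored Q/A embeddings are all assumptions, and the embeddings themselves would come from whatever embedding model the pipeline uses.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class QACache:
    """Hypothetical in-memory stand-in for the stored Q/A embeddings."""
    threshold: float = 0.9  # assumed cut-off; tune per embedding model
    entries: list[tuple[list[float], str]] = field(default_factory=list)

    def store(self, embedding: list[float], answer: str) -> None:
        self.entries.append((embedding, answer))

    def lookup(self, query_embedding: list[float]):
        """Return the best stored answer if it is similar enough, else None."""
        best_score, best_answer = 0.0, None
        for embedding, answer in self.entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None
```

A caller would try `lookup` first and fall back to the LLM only on `None`, storing the new answer afterwards so that later, similar questions can reuse it.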

3. How we built it

Backend: Built with Python FastAPI (async-enabled) and Socket.IO for real-time WebSocket communication.
Frontend: Developed using React and TypeScript for a responsive, modern UI.
Database: Powered by GridDB, accessed from Python through JPype (a Python-to-Java bridge), using a container-per-file architecture; each document has its own container, which allows parallel reads and writes without contention.
Lifecycle Management: Uses GridDB’s built-in partition expiry to automatically clean up chat history, Q/A data, and audit logs.
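
To make the container-per-file idea concrete, here is a minimal sketch of how a document ID might be routed to its own container. The naming scheme (`doc_<slug>_<hash>`), the `container_name` helper, and the `InMemoryStore` class are hypothetical and do not use the GridDB client; names are restricted to letters, digits, and underscores as a conservative assumption about what a container name may contain.

```python
import hashlib
import re


def container_name(doc_id: str, prefix: str = "doc") -> str:
    """Derive a stable, identifier-safe container name from a document ID."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", doc_id).strip("_")[:32]
    # A short hash keeps names distinct even when two slugs collide.
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{slug}_{digest}" if slug else f"{prefix}_{digest}"


class InMemoryStore:
    """Stand-in for the database: one independent row list per container."""

    def __init__(self) -> None:
        self.containers: dict[str, list[dict]] = {}

    def append(self, doc_id: str, row: dict) -> None:
        self.containers.setdefault(container_name(doc_id), []).append(row)

    def rows(self, doc_id: str) -> list[dict]:
        return list(self.containers.get(container_name(doc_id), []))
```

Because every read and write touches only the container derived from its own document ID, operations on different documents never share a table.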

4. Challenges we ran into

Access control: A complex ACL tree for file- and folder-level permissions became simpler under GridDB’s container model, because a user’s access maps directly to a set of allowed containers.
In-file search: Searching inside thousands of documents risked expensive full-table scans; with one container per file, a query reads only the containers it targets.
Version comparison: Comparing document versions was turning into heavy SQL-style operations; storing versions in GridDB’s time-ordered containers lets these run as ordinary time-series queries.
Lifecycle management: Chat and Q&A lifecycle (TTL, cache invalidation, cleanup), which traditionally calls for tools like Redis, is handled natively by GridDB’s partition expiry.
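
The mapping from permissions to allowed containers can be sketched like this. The `Folder` structure, the rule that a grant on a folder covers its whole subtree, and the `scoped_search` helper are assumptions made for illustration; the real ACL rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    name: str
    documents: list[str] = field(default_factory=list)      # container names
    children: list["Folder"] = field(default_factory=list)
    allowed_roles: set[str] = field(default_factory=set)    # roles granted here


def allowed_containers(folder: Folder, role: str, inherited: bool = False) -> set[str]:
    """Collect the containers a role may read.

    Assumption: a grant on a folder also covers everything beneath it.
    """
    granted = inherited or role in folder.allowed_roles
    result = set(folder.documents) if granted else set()
    for child in folder.children:
        result |= allowed_containers(child, role, granted)
    return result


def scoped_search(index: dict[str, str], allowed: set[str], term: str) -> list[str]:
    """Search only the allowed containers; `index` maps container name to text."""
    needle = term.lower()
    return sorted(name for name in allowed if needle in index.get(name, "").lower())
```

The ACL tree is resolved once into a flat set of container names, and the search never looks outside that set, so no per-row permission filtering is needed.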

5. Accomplishments that we’re proud of

Showed that GridDB is not limited to sensor/IoT time-series; it excels wherever data behaves like time-ordered events (documents, chats, questions, versions, audits).
Designed the entire system around GridDB’s container philosophy instead of forcing it into a relational database pattern.
Built cost-effective Q&A that avoids redundant LLM calls by reusing past answers through semantic similarity search.
Achieved RBAC-aware search that is naturally scoped: no full-table scans and no join-heavy ACL traversal, just direct container lookup.
Eliminated the need for external caching (e.g., Redis) and scheduled cleanup (cron jobs) by leveraging GridDB’s native partition expiry.
Delivered a complete end-to-end intelligent document platform—from OCR ingestion to semantic chat—as a working prototype.
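
For readers unfamiliar with partition-based expiry, the following is a simplified model of the behaviour relied on above, not GridDB code. It assumes that rows are grouped into fixed-width time partitions aligned to the Unix epoch and that a whole partition expires once its end time plus the retention period has passed; the exact rules, and when storage is actually reclaimed, should be confirmed against the GridDB documentation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_bounds(ts: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return the [start, end) bounds of the fixed-width partition containing ts."""
    index = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    start = EPOCH + index * interval
    return start, start + interval


def is_expired(ts: datetime, now: datetime,
               interval: timedelta, retention: timedelta) -> bool:
    """A row expires together with its partition: partition end + retention <= now."""
    _, end = partition_bounds(ts, interval)
    return end + retention <= now
```

Under this model a chat message written at 10:00 on a given day, with daily partitions and seven-day retention, stays readable until seven days after that day's partition closes, with no cron job or external cache involved.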

6. What we learned

Documents, chats, Q&A pairs, versions, and audit logs behave like event streams, which makes a time-series database a more natural fit than a traditional RDBMS.
A container-per-file architecture eliminates table contention and simplifies access control compared to row-level filtering with complex joins.
GridDB’s hybrid in-memory + disk model enables built-in hot/cold data separation without requiring a separate caching layer.
Working with JPype requires disciplined thread-safety management, but the architectural and performance benefits make it worthwhile.
Designing around the database’s strengths—instead of forcing relational patterns—results in a cleaner and simpler system architecture.
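
One common way to keep blocking, bridge-backed calls thread-safe inside an async backend is to funnel them all through a single worker thread. The sketch below shows that pattern only; `SerializedStore` and `fake_put` are hypothetical, `fake_put` stands in for a blocking JPype-backed write, and this is not necessarily how the project handles it.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedStore:
    """Runs every blocking store call on one dedicated worker thread."""

    def __init__(self) -> None:
        # max_workers=1 means calls execute one at a time, in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="griddb")

    async def call(self, fn, *args):
        loop = asyncio.get_running_loop()
        # The event loop stays free while the worker thread runs fn.
        return await loop.run_in_executor(self._executor, fn, *args)

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def fake_put(rows: list, row: int) -> int:
    """Hypothetical stand-in for a blocking database write."""
    rows.append((threading.current_thread().name, row))
    return len(rows)


async def demo() -> list:
    store = SerializedStore()
    rows: list = []
    try:
        # Five concurrent requests, all serialized onto the same thread.
        await asyncio.gather(*(store.call(fake_put, rows, i) for i in range(5)))
    finally:
        store.close()
    return rows
```

The trade-off is throughput: a single worker removes the need for locks around shared client objects, at the cost of running database calls one at a time.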

7. What’s next for AI-DMS

Enable parallel uploads, searches, and chats that scale naturally since each document operates within its own container, avoiding table contention.
Extend version comparison and document evolution into native time-series analytics capabilities.
Optimize hot/cold data movement so active data remains in memory while older data shifts to disk, scaling efficiently to millions of files without redesign.
Introduce cross-department knowledge graphs linking related documents across organizational boundaries.
Enhance multi-language OCR and translation pipelines for broader accessibility.
Scale the system into a production-grade deployment suitable for real public sector undertaking (PSU) and enterprise environments.
Integrate a structure-first RAG that navigates a document’s TOC tree instead of using embeddings, chunking, or vector search.
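
The structure-first retrieval idea in the last item can be illustrated with a small sketch that walks a table-of-contents tree toward the section most relevant to a question. The `TocNode` type and the word-overlap scoring are assumptions; in practice an LLM or a richer heuristic would likely choose the branch at each level.

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    title: str
    text: str = ""
    children: list["TocNode"] = field(default_factory=list)


def _score(query: str, node: TocNode) -> int:
    """Count distinct query words found in the node's title (a simple stand-in)."""
    title_words = set(node.title.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in title_words)


def navigate(root: TocNode, query: str) -> tuple[list[str], TocNode]:
    """Descend greedily to the best-matching child at each level.

    Stops at a leaf, or when no child title shares a word with the query.
    Returns the path of titles visited and the node reached.
    """
    path, node = [root.title], root
    while node.children:
        best = max(node.children, key=lambda child: _score(query, child))
        if _score(query, best) == 0:
            break
        node = best
        path.append(node.title)
    return path, node
```

Only the text of the node reached would be passed to the LLM as context, so no chunking or vector index is involved; the greedy descent can miss a relevant section whose parent heading shares no words with the question, which is one reason a stronger chooser may be needed.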

Team Members: Deevia Software India Pvt Ltd