
Cracking the Code: AI-native Intelligent Document Processing for Medical Records
20 February, 2025 | 8 Min | By Srivatsan Sridhar
Healthcare data is as vast as it is complex, encompassing millions of medical records, clinical notes, diagnostic reports, and administrative documents generated each day. AI-native Intelligent Document Processing is crucial in handling this data, given its massive volume and diverse formats—ranging from structured Electronic Health Records (EHRs) to unstructured handwritten notes or scanned documents.
The stakes in healthcare data management are exceptionally high. Errors in processing or misclassifications can lead to delayed care, incorrect billing, or compliance violations. Moreover, the sensitive nature of this data demands unwavering adherence to privacy regulations like HIPAA, adding another layer of complexity. Traditional methods of data indexing and retrieval, reliant on manual intervention or template-based systems, often fall short of meeting these challenges at scale.
At Dexit, we’ve harnessed AI-native Intelligent Document Processing to simplify and supercharge the indexing of medical records. By using cutting-edge machine learning for document classification and entity extraction, our platform doesn’t just automate—it redefines how medical records are processed. No barcodes. No rigid templates. Just high-speed, high-accuracy solutions built to handle complex, real-world healthcare documentation. Going beyond advanced LLMs, Dexit is trained on the specific demands and data of the healthcare industry, delivering precision and performance where it matters most.
This blog unpacks the tech behind Dexit’s Intelligent Document Processing: the experiments, breakthroughs, and engineering finesse that make it a game-changer for health tech pros.
Let’s get into the details—how it works, where it excels, and what’s next.
Contents
- Experiments, Insights, and Innovations Behind Dexit’s AI-native Intelligent Document Processing
- How Do We Achieve High Accuracy?
- Exploratory Data Analysis (EDA) in Dexit’s AI
- When the Model Makes Mistakes - Human Feedback for Model Retraining
- Commitment to Data Privacy and Security
- Dexit in the Real World & How We Are Getting Better
Experiments, Insights, and Innovations Behind Dexit’s AI-native Intelligent Document Processing
Dexit revolutionizes healthcare document management with Intelligent Document Processing to classify files, extract key entities, and enable seamless auto-indexing. This capability allows organizations to handle vast and complex document repositories efficiently, eliminating reliance on traditional identifiers like templates or barcodes. Instead, Dexit leverages visual and spatial context for unmatched accuracy in extracting details such as patient names and addresses.
However, achieving this level of precision was no straightforward feat. The Dexit team explored multiple avenues for document classification and entity extraction before arriving at the optimal solution through AI-native Intelligent Document Processing.
Document Classification: From Trial to Triumph
Early experiments focused on image similarity metrics, utilizing methods like Structural Similarity Index Measure (SSIM) and Mean Squared Error (MSE) to group visually similar pages. While this achieved modest success, its inability to incorporate textual content resulted in misclassifications when documents with similar layouts varied in type.
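To make that limitation concrete, here is a minimal sketch of the early image-similarity approach using OpenCV and scikit-image. The file names and the similarity threshold are illustrative, not values from Dexit’s experiments:

```python
# Early approach (illustrative): group pages by visual similarity alone.
import cv2
from skimage.metrics import structural_similarity, mean_squared_error

def page_similarity(path_a: str, path_b: str, size=(772, 1000)) -> tuple[float, float]:
    """Return (SSIM, MSE) for two scanned pages resized to a common shape."""
    a = cv2.resize(cv2.imread(path_a, cv2.IMREAD_GRAYSCALE), size)
    b = cv2.resize(cv2.imread(path_b, cv2.IMREAD_GRAYSCALE), size)
    return structural_similarity(a, b), mean_squared_error(a, b)

ssim_score, mse_score = page_similarity("page_1.png", "page_2.png")
if ssim_score > 0.85:  # illustrative threshold
    # High SSIM groups the pages even if one is a referral and the other an
    # invoice that happens to share the same layout: the misclassification
    # mode described above.
    print("pages grouped as visually similar")
```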
Next, the team developed a fusion strategy combining visual and textual analysis: convolutional networks such as ResNet for the page image alongside ClinicalBERT for text classification. Although accuracy improved, the inability to account for spatial relationships between text elements limited its effectiveness, particularly for complex layouts like forms and invoices.
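The post doesn’t publish this interim architecture, but a late-fusion model of the kind described might look like the sketch below: a ResNet embedding of the page image concatenated with ClinicalBERT’s [CLS] embedding of the OCR text. The dimensions and classifier head are assumptions. Note that nothing in this representation encodes where each word sits on the page, which is exactly the spatial blind spot mentioned above:

```python
# Illustrative late-fusion classifier; dimensions and head are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import AutoModel

class FusionClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.visual = resnet18(weights="IMAGENET1K_V1")
        self.visual.fc = nn.Identity()          # yields a 512-d image embedding
        self.textual = AutoModel.from_pretrained(
            "emilyalsentzer/Bio_ClinicalBERT")  # 768-d [CLS] text embedding
        self.head = nn.Linear(512 + 768, num_classes)

    def forward(self, pixel_values, input_ids, attention_mask):
        v = self.visual(pixel_values)
        t = self.textual(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        # Concatenation fuses *what* the page looks like with *what* it says,
        # but carries no word positions -- the limitation noted above.
        return self.head(torch.cat([v, t], dim=-1))
```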
The breakthrough came with the adoption of a State-of-the-Art (SOTA) image classification model. This approach integrates both visual and spatial information, enabling highly accurate classification even for structured documents. Trained on a dataset of 300 images spanning 100 document types, this solution achieved 97% accuracy overall, with some room for improvement in documents with high variation.
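Dexit doesn’t disclose the exact model, but publicly available layout-aware transformers such as LayoutLMv3 illustrate the principle: words, their 2-D positions, and the page image are encoded jointly, so two documents with similar wording but different layouts are no longer confused. A minimal classification sketch, with the checkpoint and file name used purely for illustration:

```python
# Illustrative only: a public layout-aware model, not Dexit's proprietary one.
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForSequenceClassification

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base")
model = LayoutLMv3ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=100)  # 100 document types, per the text

image = Image.open("scanned_page.png").convert("RGB")
inputs = processor(image, return_tensors="pt")  # runs OCR and builds word boxes
predicted_type = model(**inputs).logits.argmax(-1).item()
```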
Entity Extraction: Overcoming Challenges
The journey of entity extraction was equally rigorous. Initial methods included tools like DocQuery, which struggled due to a lack of support and adaptability, and InternVL 2.0, which excelled at question answering but faltered in token-level entity extraction. Similarly, BERT required extensive preprocessing and lacked awareness of document structure, making it inefficient for Dexit’s needs.
The final solution combines a fine-tuned model with an OCR and Large Language Model (LLM) workflow, tailored to specific document requirements:
- Fine-Tuned Model: Designed for auto-indexing, this model uses token-level bounding boxes and OCR to capture granular spatial relationships, achieving precise entity extraction even for multi-word entities.
- OCR + LLM Workflow: For documents requiring natural language queries, this approach integrates OCR with Llama 3 for entity extraction. By preserving text layout during OCR, the model maintains a high accuracy rate, avoiding the common pitfalls of layout-dependent errors (see the sketch below).
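Here is a simplified sketch of that layout-preserving OCR step: words from Tesseract are re-rendered at roughly their original horizontal offsets, so tables and label-value pairs survive as visual columns in the prompt. The column scaling and prompt wording are assumptions for illustration:

```python
# Layout-preserving OCR sketch: keep each word near its original x-position.
import pytesseract
from PIL import Image

def ocr_with_layout(path: str, px_per_char: int = 10) -> str:
    d = pytesseract.image_to_data(Image.open(path),
                                  output_type=pytesseract.Output.DICT)
    lines: dict[tuple, list] = {}
    for i, word in enumerate(d["text"]):
        if word.strip():
            key = (d["block_num"][i], d["par_num"][i], d["line_num"][i])
            lines.setdefault(key, []).append((d["left"][i], word))
    rendered = []
    for _, words in sorted(lines.items()):
        row = ""
        for left, word in sorted(words):
            row = row.ljust(left // px_per_char) + word + " "  # pad to x-offset
        rendered.append(row.rstrip())
    return "\n".join(rendered)

prompt = ("Extract the patient name and date of birth from this document:\n\n"
          + ocr_with_layout("lab_report.png"))
# `prompt` would then be sent to the LLM (Llama 3 in Dexit's workflow).
```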
Unmatched Efficiency and Accuracy
Dexit’s final design delivers a powerful blend of classification, extraction, and indexing capabilities, allowing users to search and retrieve documents based on content rather than tags. Whether handling invoices, medical reports, or forms, Dexit streamlines workflows, reduces errors, and saves time.
Here’s a quick look at the Dexit way of handling document classification and entity extraction:
[Figure: Dexit’s document classification and entity extraction workflow]
How Do We Achieve High Accuracy?
Dexit’s approach to achieving high accuracy in entity extraction extends beyond traditional text-based models by integrating textual, visual, and spatial information. This enables the model to make nuanced and accurate predictions, particularly for structured documents. Here’s a breakdown of the key strategies:
- Capturing Textual Features: Dexit analyzes keywords, phrases, and semantic patterns using advanced NLP to recognize document types and contextual relationships between terms.
- Incorporating Visual Cues: The model incorporates spatial information, such as text positioning, tables, and headers, to identify unique visual structures crucial for entity extraction.
- Joint Representation Learning: Textual and visual features are combined into a unified representation, enabling the model to learn complex relationships and improve accuracy for structured documents.
- Bounding Box Annotation: Using labeling tools, bounding boxes are manually drawn around entities, teaching the model to recognize spatial relationships critical for accurate entity labeling (illustrated after this list).
- OCR with Layout Preservation: The OCR engine extracts text and bounding boxes while preserving spatial relationships, maintaining the original layout for higher accuracy in downstream entity extraction.
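As an illustration of the bounding-box step, layout-aware models conventionally expect word boxes normalized to a fixed 0-1000 grid regardless of scan resolution. The helper below follows that common convention, since the post doesn’t specify Dexit’s internal format:

```python
def normalize_box(box: tuple[int, int, int, int], page_w: int, page_h: int):
    """Map a pixel-space (x0, y0, x1, y1) box onto a 1000x1000 grid."""
    x0, y0, x1, y1 = box
    return (int(1000 * x0 / page_w), int(1000 * y0 / page_h),
            int(1000 * x1 / page_w), int(1000 * y1 / page_h))

# e.g. a name field OCR'd at (120, 240, 310, 268) on a 2550x3300 scan
print(normalize_box((120, 240, 310, 268), 2550, 3300))  # -> (47, 72, 121, 81)
```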
Exploratory Data Analysis (EDA) in Dexit’s AI
For both document type classification and entity extraction, thorough Exploratory Data Analysis (EDA) is performed to ensure that the models are trained with clean, high-quality data. This process maximizes model performance by identifying key patterns and preparing data for optimal model input. Here's a breakdown of the steps involved:
- Data Collection: A diverse set of documents is gathered from the client, covering the different document types relevant to their operations. This includes forms, reports, and other domain-specific documents, ensuring broad representation across document categories.
- Data Preprocessing: Collected documents are preprocessed to standardize formats and improve the accuracy of subsequent analysis:
- Text Normalization: Converts text to lowercase, removes special characters, and standardizes variations of common words to reduce noise and ensure uniformity (sketched after this list).
- Noise Removal: Irrelevant elements such as watermarks, stamps, or handwritten annotations are eliminated using image-processing libraries like OpenCV and other state-of-the-art preprocessing methods.
- Data Analysis: Once preprocessing is complete, Dexit’s AI performs an in-depth analysis to gain insights and prepare the data for model training:
- Distribution Analysis: The frequency of each document type is analyzed to detect class imbalances. If certain document types are underrepresented or overrepresented, adjustments can be made to ensure balanced model training.
- Document Structure Examination: The layout and organizational patterns of documents are examined to identify structural elements like headers, footers, tables, and sections that help differentiate document types.
- Key Feature Identification: Visual and textual features specific to each document type are identified, including keywords, formatting styles, and logos. These features help in distinguishing between document types for more accurate classification.
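As a concrete example of the normalization and distribution-analysis steps above, here is a minimal sketch; the abbreviation map, labels, and imbalance threshold are illustrative:

```python
import re
from collections import Counter

ABBREVIATIONS = {"pt": "patient", "dob": "date of birth", "rx": "prescription"}

def normalize(text: str) -> str:
    """Lowercase, strip special characters, expand common word variations."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())

# Distribution analysis: flag under-represented document types before training.
labels = ["lab_report", "referral", "lab_report", "invoice", "lab_report"]
for doc_type, n in Counter(labels).items():
    if n / len(labels) < 0.10:  # illustrative imbalance threshold
        print(f"{doc_type}: only {n} samples; consider collecting more or oversampling")
```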
When the Model Makes Mistakes - Human Feedback for Model Retraining
Dexit’s AI integrates human feedback into its document processing models to continuously refine and improve accuracy. This feedback loop is managed through a robust MLOps pipeline, which automatically triggers model retraining when certain conditions are met. Here's how the system works:
Model Retraining Process
- Feedback Collection and Storage: When a user identifies an error in document classification or entity extraction, they can correct it within the Dexit platform. These corrections are tagged and stored separately from the original data for future retraining purposes.
- Automated Model Retraining: On a monthly basis, the system evaluates the current model’s performance against the accumulated human-corrected data. If the accuracy for any document type drops below a predetermined threshold, an automated retraining cycle is triggered (see the sketch after this list).
- Prioritizing Corrected Data: During retraining, human-corrected documents are prioritized, especially those from document types that have undergone substantial corrections. This ensures that the retrained model learns from real-world mistakes and improves its accuracy for those document types.
- Model Evaluation and Deployment: After retraining, the new model is rigorously evaluated using a suite of metrics. If it surpasses the accuracy of the currently deployed model, it is pushed to the model registry and deployed for use in the Dexit platform.
- Monitoring and Alerting: Continuous monitoring ensures that model performance remains high. Weekly reports track key metrics, including accuracy, confusion matrices, and potential data drift. If any performance degradation, retraining failures, or anomalies are detected, automated alerts are triggered via email or Slack.
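In code terms, the monthly gate in step 2 reduces to a per-type threshold check like the sketch below. The threshold value and function names are assumptions, since the post states only that a predetermined threshold triggers retraining:

```python
ACCURACY_THRESHOLD = 0.95  # illustrative, not Dexit's actual value

def monthly_retraining_check(per_type_accuracy: dict[str, float]) -> list[str]:
    """Return document types whose accuracy on human-corrected data
    has fallen below the retraining threshold."""
    return [t for t, acc in per_type_accuracy.items() if acc < ACCURACY_THRESHOLD]

stale = monthly_retraining_check({"lab_report": 0.97, "referral": 0.91})
if stale:
    # Corrected documents for these types are up-weighted in the new
    # training set, per the prioritization step above.
    print(f"triggering retraining for: {stale}")
```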
Dexit's feedback-driven approach ensures consistent accuracy, efficiency, and adaptability in its document processing models, making sure the system improves over time.
Technical Implementation of the MLOps Pipeline
The MLOps pipeline at Dexit relies on several essential technologies to streamline and enhance machine learning workflows. ClearML manages the entire process, including data versioning, task distribution across Kubernetes clusters, and experiment tracking. GitHub serves as the central repository for source code, training scripts, data preparation workflows, and configuration files, ensuring version control and collaboration. Hugging Face oversees model registry, version control, and deployment of retrained models, while SkyPilot automates the deployment and management of ClearML agents on cloud clusters, triggered by GitHub Actions. By integrating these tools, Dexit has built a scalable, automated system that learns from human feedback and continuously enhances its document processing capabilities.
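A hedged sketch of how two of those pieces connect in code: ClearML registers and dispatches the retraining task, and the winning model is pushed to a Hugging Face model repository. Project, queue, and repo names are placeholders, not Dexit’s actual configuration:

```python
from clearml import Task
from huggingface_hub import HfApi

# Register the retraining run with ClearML and hand it to a remote queue
# (e.g. one backed by a Kubernetes cluster); the local process then exits.
task = Task.init(project_name="doc-processing", task_name="monthly-retraining")
task.execute_remotely(queue_name="gpu-queue")

# ... training and the evaluation gate against the current model run here ...

new_model_wins = True  # placeholder for the evaluation result
if new_model_wins:
    # Push the retrained weights to the model registry (a Hugging Face repo).
    HfApi().upload_folder(folder_path="model_out", repo_id="org/doc-classifier")
```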
Commitment to Data Privacy and Security
At Dexit, our commitment to data privacy is a core principle that drives everything we do. Unlike many AI solutions that rely on third-party cloud infrastructures, we operate entirely on-premise, giving our clients complete control over their data. This on-premise approach ensures that sensitive information never leaves the client's environment, safeguarding against data breaches or misuse. Furthermore, we tailor our models specifically to each client's data, fine-tuning them to meet unique needs and challenges. By avoiding external vendors, we eliminate the risks associated with sharing data, allowing our clients to trust that their privacy is always a top priority while still benefiting from advanced AI-powered solutions.
Dexit in the Real World & How We Are Getting Better
Dexit is constantly enhancing its AI-native Intelligent Document Processing capabilities through a combination of advanced AI techniques, robust infrastructure, and a commitment to incorporating human feedback. Here are the key areas where Dexit is set to improve:
- Performance Monitoring and Alerting: Regular monitoring of every key metric, combined with an automated pipeline that triggers retraining whenever it is needed, will steadily improve Dexit’s performance over time.
- Addressing Edge Cases and Model Caveats: We have already identified a couple of edge cases and built solutions for them. As more edge cases surface from real-world scenarios, Dexit will handle them using SOTA techniques.
- Ongoing Research and Development: Dexit is actively exploring new techniques to further improve accuracy in both document type classification and entity extraction, particularly for handwritten text.
By leveraging AI-native Intelligent Document Processing, Dexit ensures that medical records are indexed, classified, and extracted with precision, setting new benchmarks for efficiency and accuracy in healthcare document management.