
Thanks for the analysis. I agree it is not the best product out there, but it gets the job done, especially since it handles multiple file formats without requiring a lot of external libraries.

Also, there is promising research that proposes replacing the entire OCR, layout extraction, and text embedding pipeline (for RAG use cases) with purely image-based information extraction and reasoning, using image encoders and vision LLMs, and it reports decent results. One example is the approach in "ColPali: Efficient Document Retrieval with Vision Language Models" (https://arxiv.org/abs/2407.01449). The major companies will quite likely offer APIs that make this mode of content extraction much easier.


Anurag Chatterjee

Written by Anurag Chatterjee

I am an experienced professional who likes to build solutions to real-world problems using innovative technologies and then share my learnings with everyone.
