Decode PDF, ODT, Word, DOC, DOCX, RTF
by herman lapre - 7 years ago (2016-02-20)
The PHP file reader must be able to read PDF, ODT, DOC, DOCX and RTF documents.
10. by Nitin Shukla - 6 years ago (2017-02-16)
I want to convert .doc, .rtf and .docx files into HTML pages without losing any content or style (bullets, tables, text formatting, etc.). Can anyone please point me to a library or script that can handle all of these requirements?
A script for an individual format would also work for me.
9. by Backiaraj - 6 years ago (2016-12-02)
2. by Christian Vigh - 7 years ago (2016-02-26)
Please clarify your request: which data do you expect from the PDF/ODT/DOC/DOCX and RTF document reader? Do you want to manipulate document elements after decoding? Do you want to be able to perform modifications after decoding? Or do you simply want to display the document contents on a web page?
3. by Manuel Lemos - 7 years ago (2016-02-27) in reply to comment 2 by Christian Vigh
According to the request tags he wants a file viewer for those formats. So I suppose something that converts those formats to images will be helpful.
It seems that OpenOffice/LibreOffice can be used for that purpose. The soffice program has options that make it open a given file, convert it to some other format, like Web pages with pictures, and then exit without opening the GUI.
So it can run from the console using the options --headless and --convert-to.
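The command-line conversion described above could be driven from PHP roughly as follows. This is only a sketch: the function names buildConvertCommand and convertDocument are made up for illustration, and it assumes that soffice is on the server's PATH, that PHP is allowed to call exec(), and that the --outdir option is used to choose where the converted file is written.

```php
<?php
// Hypothetical helper: builds the shell command for a headless conversion.
// Every argument is escaped so that file names with spaces are safe.
function buildConvertCommand(string $inputFile, string $format, string $outputDir): string
{
    return sprintf(
        'soffice --headless --convert-to %s --outdir %s %s',
        escapeshellarg($format),
        escapeshellarg($outputDir),
        escapeshellarg($inputFile)
    );
}

// Hypothetical wrapper: runs the conversion and returns the path of the
// converted file, or null if the command failed or produced no file.
// Assumes $format is a plain extension such as "pdf" or "html".
function convertDocument(string $inputFile, string $format, string $outputDir): ?string
{
    exec(buildConvertCommand($inputFile, $format, $outputDir) . ' 2>&1', $output, $exitCode);

    $expected = rtrim($outputDir, '/') . '/'
        . pathinfo($inputFile, PATHINFO_FILENAME) . '.' . $format;

    return ($exitCode === 0 && is_file($expected)) ? $expected : null;
}
```

With this in place, a call such as convertDocument('/uploads/report.docx', 'pdf', '/tmp/converted') would be expected to return the path of the PDF on success, and passing 'html' instead of 'pdf' should produce a Web page. The paths here are placeholders.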
4. by Christian Vigh - 7 years ago (2016-02-27) in reply to comment 3 by Manuel Lemos
I have had some experience with OpenOffice/LibreOffice for converting .DOC/.DOCX to .PDF documents. I have encountered some formatting issues, especially with tables, but in general it works well.
In addition, the unoconv script provides a command-line interface for doing the conversion.
However, as far as I can remember, it requires the OpenOffice daemon to be up and running.
I don't know if this could address Herman's needs.
5. by Manuel Lemos - 7 years ago (2016-02-27) in reply to comment 4 by Christian Vigh
You do not need to have the OpenOffice daemon running. You can just start OpenOffice on demand to make the format conversion using the soffice command with the options mentioned above. So you do not need the unoconv script either.
Starting OpenOffice as a daemon has the advantage of keeping OpenOffice running in memory, just in case you need to convert many documents without delay. In that case you would use a script like unoconv to communicate with the daemon.
6. by satya teja - 7 years ago (2016-07-11) in reply to comment 2 by Christian Vigh
Hi, I have the same question. I simply want to display the document's contents in their respective fields. For example, if I upload a resume, the data must be displayed in fields like first name, last name, etc.
7. by Muhammad Khalid Chaudhary - 6 years ago (2016-11-02) in reply to comment 4 by Christian Vigh
Can you explain how to convert .DOC/.DOCX to .PDF documents using PHP and OpenOffice/LibreOffice?
8. by Manuel Lemos - 6 years ago (2016-11-02) in reply to comment 7 by Muhammad Khalid Chaudhary
I do not remember exactly. You need to check the documentation, but I think it is fairly easy. What may be hard is having OpenOffice installed on the server. In any case, maybe somebody can publish a class that can do that for you.
1. by Manuel Lemos - 7 years ago (2016-02-26)
There are packages that can render some of those formats as images that you can display on a Web page.
There are no packages that cover all of those formats, but some of them could be added later using external programs to render the files as images.
That could be an innovative solution.
As Manuel said, there is currently no universal solution for that. The package referenced here is able to capture HTML content and generate either an image or a PDF document, using a third-party web service.
1. by Dave Smith - 7 years ago (2016-02-29)
While I support apiLayer, I do not think this is what the requester is looking for. They do NOT want to convert HTML; they want to view RTF, Office DOC and DOCX, and OpenOffice ODT files.
2. by Dave Smith - 7 years ago (2016-02-29) in reply to comment 1 by Dave Smith
Forgot to mention that they also want to read Adobe PDF files, not create them.
3. by herman lapre - 7 years ago (2016-03-04) in reply to comment 1 by Dave Smith
That is exactly what I need. I have researched TET, Tika, several PDF decoders, etc., but they all cover only part of the problem. Migrating to, e.g., an Elasticsearch solution is a bit overkill for me. I need a reader that is able to read the plain text from PDF, ODT, DOCX, DOC, RTF documents and the like.
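For two of those formats, plain text can be read with PHP alone, because DOCX and ODT files are ZIP archives that store the document body as XML. The following is a minimal sketch, not a complete reader: the function name extractPlainText is made up for illustration, it assumes the PHP zip extension is enabled, and it simply strips the XML markup, so tabs, footnotes and table layout are not preserved.

```php
<?php
// Sketch: pull plain text out of DOCX and ODT files, which are ZIP
// archives holding the document body as XML.
// Returns null for unsupported extensions or unreadable files.
function extractPlainText(string $path): ?string
{
    // File extension => archive entry that holds the body text.
    $entries = ['docx' => 'word/document.xml', 'odt' => 'content.xml'];

    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($entries[$ext])) {
        return null; // PDF, DOC and RTF need other tools
    }

    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }
    $xml = $zip->getFromName($entries[$ext]);
    $zip->close();
    if ($xml === false) {
        return null;
    }

    // Turn paragraph and heading ends into newlines, then drop the markup
    // and decode XML entities such as &amp;.
    $xml = str_replace(['</w:p>', '</text:p>', '</text:h>'], "\n", $xml);
    $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');

    return trim($text);
}
```

PDF, DOC and RTF are not handled by this sketch. One possible route for those would be to convert them to another format first with an external program, such as the soffice command mentioned earlier in this thread, and then read the result.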