Title: Convert a Word document into a PDF document in Python
Figuring out how to dissect a Word file and then build a PDF file from it, working directly with their native formats, sounds daunting. Fortunately, it's easy if you use a Word application server. That's a COM (Component Object Model) program that runs Word's backend. COM is part of Windows, but the Word server is only available if Microsoft Word is installed. Because COM and the comtypes library are Windows-specific, this technique won't carry over directly to other systems. Sorry, but I can't help you there.
To use this example, first install the comtypes library using the following command.
# pip install comtypes
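Before going further, it may help to confirm that the basic prerequisites are in place. The following is a minimal sketch, not part of the original example; the function name word_com_prerequisites_met is made up for illustration. It checks only the operating system and whether comtypes is installed. It cannot tell whether Word itself is installed.

```python
import importlib.util
import sys


def word_com_prerequisites_met():
    """Return True if this looks like a system where the example can run.

    Hypothetical helper: it checks the platform and the comtypes package,
    but it does not verify that Microsoft Word is installed.
    """
    # COM only exists on Windows, so anything else fails immediately.
    if sys.platform != 'win32':
        return False
    # find_spec returns None when the package is not installed.
    return importlib.util.find_spec('comtypes') is not None


print(word_com_prerequisites_met())
```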
The following method loads a Word file and saves it as a PDF file.
def convert_word_file(in_filename, out_filename):
    import os
    import comtypes.client
    # Convert relative paths to absolute paths. Otherwise Word
    # looks somewhere silly like C:\WINDOWS\system32.
    in_filename = os.path.abspath(in_filename)
    out_filename = os.path.abspath(out_filename)
    # Create a Word application COM server.
    word = comtypes.client.CreateObject('Word.Application')
    # Open the file.
    doc = word.Documents.Open(in_filename)
    # Save the file in pdf format.
    # For other file format values, see:
    # https://learn.microsoft.com/en-us/office/vba/api/word.wdsaveformat
    wdFormatPDF = 17
    doc.SaveAs(out_filename, FileFormat=wdFormatPDF)
    # Clean up.
    doc.Close()
    word.Quit()
This method takes the input and output file names as parameters. It first converts those names into absolute file names so the Word server doesn't look for them in its own working directory, which may be somewhere like C:\WINDOWS\system32.
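To see what the path conversion does, here is a small sketch that runs without Word. The important point is that os.path.abspath resolves a relative name against the Python program's current directory, which is usually where your files are, rather than against the Word server's directory.

```python
import os

relative_name = 'test_file.docx'
absolute_name = os.path.abspath(relative_name)

# abspath joins a relative name onto the current working directory
# of this Python process and normalizes the result.
print(os.getcwd())
print(absolute_name)

# The relative name is not absolute; the converted name is.
print(os.path.isabs(relative_name), os.path.isabs(absolute_name))
```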
The code then creates the Word application server. It uses the Open method of the server's Documents collection to open the Word file. Next, it uses the document's SaveAs method to save the file in PDF format, which it selects by passing the value 17 (wdFormatPDF) as the FileFormat argument.
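If you want to save in other formats, one possible approach is to pick the FileFormat value from the output file's extension. This is only a sketch: the numeric values come from Microsoft's WdSaveFormat enumeration, but the extension-to-format mapping and the name save_format_for are my own assumptions and are not part of the original example.

```python
import os

# Values from Microsoft's WdSaveFormat enumeration.
# The choice of extensions is an assumption made for this sketch.
WD_SAVE_FORMATS = {
    '.docx': 16,  # wdFormatDocumentDefault
    '.pdf': 17,   # wdFormatPDF
    '.xps': 18,   # wdFormatXPS
    '.rtf': 6,    # wdFormatRTF
    '.txt': 2,    # wdFormatText
}


def save_format_for(out_filename):
    """Return the WdSaveFormat value that matches the file's extension."""
    extension = os.path.splitext(out_filename)[1].lower()
    try:
        return WD_SAVE_FORMATS[extension]
    except KeyError:
        raise ValueError(f'No save format known for {extension!r}') from None


print(save_format_for('test_file.pdf'))
```

You could then pass the result to SaveAs in place of the hard-coded wdFormatPDF value.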
The method finishes by closing the document and ending the Word server. This is very important! If you don't properly close the server, it may stick around as a zombie wasting system resources. Even more importantly, it will probably keep the input file locked so you can't edit or delete it.
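One weakness of the straightforward version is that an exception in Open or SaveAs skips the cleanup code, which leaves exactly the kind of zombie described above. The following variation is a sketch that uses try/finally so the document is closed and the server quits even if something fails. The name convert_word_file_safely and the optional create_object parameter are my own additions; the parameter exists only so you can substitute a stand-in object when testing without Word.

```python
import os

WD_FORMAT_PDF = 17  # wdFormatPDF from the WdSaveFormat enumeration


def convert_word_file_safely(in_filename, out_filename, create_object=None):
    """Convert a Word file to PDF, always closing the document and server.

    create_object is a hypothetical hook for testing. By default the
    function uses comtypes.client.CreateObject, which requires Windows.
    """
    if create_object is None:
        import comtypes.client
        create_object = comtypes.client.CreateObject

    in_filename = os.path.abspath(in_filename)
    out_filename = os.path.abspath(out_filename)

    word = create_object('Word.Application')
    try:
        doc = word.Documents.Open(in_filename)
        try:
            doc.SaveAs(out_filename, FileFormat=WD_FORMAT_PDF)
        finally:
            # Runs whether or not SaveAs succeeded.
            doc.Close()
    finally:
        # Runs even if Open or SaveAs raised an exception.
        word.Quit()
```

Any exception still propagates to the caller, so you find out that the conversion failed, but the server no longer lingers afterward.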
If you fail to close the server, open Task Manager, track it down (it will be a background process named Microsoft Word), and kill it. You may want to close any other interactive Word processes first so you don't accidentally kill one of them and lose edits that you haven't saved. The following picture shows the Word server in Task Manager.
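If you would rather look for leftover Word processes from Python than from Task Manager, the following sketch may help. It assumes the process image is named WINWORD.EXE and uses the Windows tasklist command with CSV output. The function names are made up for this illustration, and the code only lists processes; it does not kill anything.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name='WINWORD.EXE'):
    """Return (pid, memory_usage) pairs for rows matching image_name."""
    matches = []
    for row in csv.reader(io.StringIO(output)):
        # Data rows have five columns: image name, PID, session name,
        # session number, and memory usage. Other lines, such as the
        # "INFO: No tasks..." message, are skipped.
        if len(row) >= 5 and row[0].lower() == image_name.lower():
            matches.append((int(row[1]), row[4]))
    return matches


def find_word_processes():
    """List running Word processes. Windows only."""
    result = subprocess.run(
        ['tasklist', '/FI', 'IMAGENAME eq WINWORD.EXE', '/FO', 'CSV', '/NH'],
        capture_output=True, text=True, check=True)
    return parse_tasklist_csv(result.stdout)
```

The list includes interactive Word windows as well as background servers, so the earlier warning still applies: save your work before ending any of them, for example with taskkill /PID followed by the process ID.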
The following code shows the example's main program.
# Main program.
convert_word_file('test_file.docx', 'test_file.pdf')
print('Done')
This code simply calls the method to convert the local file test_file.docx into the file test_file.pdf.
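If you don't want to type the output name every time, you could derive it from the input name. This is a small sketch with a made-up helper name, pdf_name_for, that swaps the file's extension for .pdf.

```python
from pathlib import Path


def pdf_name_for(word_filename):
    """Return the input file name with its extension replaced by .pdf."""
    return str(Path(word_filename).with_suffix('.pdf'))


print(pdf_name_for('test_file.docx'))  # test_file.pdf
```

With that helper, the main program could call convert_word_file(name, pdf_name_for(name)) for any input name.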
Download the example to experiment.