Title: Make, modify, and display a Word document in Python

[A Microsoft Word document created and modified in Python]

A Microsoft Word .docx file is basically a ZIP archive that contains XML files holding the document's text and formatting. If you want to look at the file's contents, follow these steps.

  1. Copy the document and name the copy so it has a .zip extension.
  2. Unzip the file or open the file in File Explorer.
  3. In the word subdirectory, look at the file document.xml.

You can try to unzip the file and edit document.xml yourself. The process is straightforward in principle, but the XML is rather complex. Happily, the python-docx library makes manipulating Microsoft Word documents pleasantly easy.
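
If you would rather peek inside the file from Python than from File Explorer, the standard library's zipfile module can perform the same unzip-and-look steps without renaming anything. The following is only a minimal sketch: the name read_document_xml is hypothetical, and it assumes the XML is UTF-8 encoded, which is what Word normally writes.

```python
import zipfile


def read_document_xml(docx_file):
    """Return the text of word/document.xml from a .docx path or file object."""
    # A .docx file is a ZIP archive, so ZipFile can open it directly.
    with zipfile.ZipFile(docx_file) as archive:
        # Assumption: the XML is UTF-8 encoded.
        return archive.read('word/document.xml').decode('utf-8')
```

For example, print(read_document_xml('initial_file.docx')[:500]) would show the first 500 characters of the raw XML for a file with that name.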

Install the library by using the following command.

pip install python-docx

Now you can use simple python-docx methods to create, modify, and read Word documents. The following method creates a simple Word document that has a title and paragraphs.

    def make_word_doc(filename, title, paragraphs):
        from docx import Document

        document = Document()

        # Add a level 0 title.
        document.add_heading(title, level=0)

        # Add some paragraphs.
        for paragraph in paragraphs:
            document.add_paragraph(paragraph)

        document.save(filename)

This method creates a Document object, adds a level 0 heading, and then adds a list of paragraphs. It finishes by saving the new document with the given file name. This will overwrite the file without warning if it already exists (although that will fail if the file is locked by another program such as Word).
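
If silently overwriting an existing file is a concern, one option is to check for the file before saving. The helper below is a sketch, not part of python-docx or the original example; the name ensure_new_file is hypothetical, and it uses only the standard library.

```python
from pathlib import Path


def ensure_new_file(filename):
    """Return filename unchanged, or raise FileExistsError if it already exists."""
    path = Path(filename)
    if path.exists():
        raise FileExistsError(f'{path} already exists; refusing to overwrite it')
    return filename
```

With this in place, the save call could be written as document.save(ensure_new_file(filename)). Note that another program could still create the file between the check and the save, so this guards against accidents rather than races.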

The following method makes string replacements in a Word file and saves the result into a new file.

    def replace_in_word_doc(in_filename, out_filename, replacements):
        from docx import Document

        document = Document(in_filename)
        for paragraph in document.paragraphs:
            for replacement in replacements:
                paragraph.text = paragraph.text.replace(
                    replacement[0], replacement[1])
        document.save(out_filename)

The method creates a Document object representing the input file and then loops through the file's paragraphs. For each paragraph, it loops through the replacement values, each of which is a pair of strings. It calls the string replace method on the paragraph's text to replace the first string with the second and assigns the result back to paragraph.text.

The method finishes by saving the result in the output document.
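
One thing to keep in mind: in python-docx, assigning to paragraph.text replaces the paragraph's contents with a single run, so character formatting such as bold or italics inside the paragraph is lost. If that matters, a possible alternative is to make the replacements one run at a time. The function below is a sketch under that assumption; the name replace_in_runs is hypothetical, and it relies only on the paragraph's runs list and each run's text property.

```python
def replace_in_runs(paragraph, replacements):
    """Apply (old, new) string pairs run by run so each run keeps its formatting."""
    for run in paragraph.runs:
        text = run.text
        for old, new in replacements:
            text = text.replace(old, new)
        # Only assign when something changed, to leave untouched runs alone.
        if text != run.text:
            run.text = text
```

The trade-off is that Word often splits a sentence into several runs, so a search string that straddles a run boundary will not be found by this approach.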

The following code shows the last method in this example.

    def show_word_doc(filename):
        from docx import Document

        document = Document(filename)
        for paragraph in document.paragraphs:
            print(paragraph.text)

This method creates a Document object representing the input file. It then loops through the file's paragraphs and displays them.
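
Printing only the text makes the title look like any other paragraph. If you want to tell them apart, each python-docx paragraph also has a style with a name. The following variation is a sketch; the names styled_paragraphs and show_styled_word_doc are hypothetical and not part of the original example.

```python
def styled_paragraphs(document):
    """Return a list of (style name, text) pairs for the document's paragraphs."""
    return [(paragraph.style.name, paragraph.text)
            for paragraph in document.paragraphs]


def show_styled_word_doc(filename):
    # Imported inside the function, as in the article's other methods.
    from docx import Document

    for style_name, text in styled_paragraphs(Document(filename)):
        print(f'[{style_name}] {text}')
```

For a document made with a level 0 heading, the first paragraph should be reported with the Title style, assuming the default python-docx template.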

The following code shows the example's main program.

    # Make a document.
    paragraphs = [
        'This is paragraph one in the initial document.',
        'This is paragraph two. It has two sentences!',
        'This is the third paragraph in the initial document.',
        'This is the fourth and last paragraph.',
    ]
    make_word_doc('initial_file.docx', 'Initial File', paragraphs)

    # Replace "initial" with "modified."
    replacements = (
        ('Initial', 'Modified'),
        ('initial', 'modified'),
    )
    replace_in_word_doc('initial_file.docx', 'modified_file.docx', replacements)

    # Display the resulting file.
    show_word_doc('modified_file.docx')

The code makes a list of strings and then calls make_word_doc to write those strings as paragraphs into a new file.

Next, it calls replace_in_word_doc to replace the string "Initial" with "Modified" and "initial" with "modified". The replace method is case-sensitive, so the program needs to make both replacements to be thorough.
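
Listing every capitalization by hand gets tedious for longer lists. One possible alternative is a case-insensitive replacement that capitalizes the new text whenever the matched text starts with a capital letter. This is a sketch using the standard re module, not something python-docx provides; the name replace_keep_case is hypothetical.

```python
import re


def replace_keep_case(text, old, new):
    """Replace old with new, ignoring case but keeping an initial capital."""
    def match_case(match):
        found = match.group(0)
        # If the matched text starts with a capital, capitalize the replacement.
        if found[:1].isupper():
            return new[:1].upper() + new[1:]
        return new

    # re.escape makes old match literally even if it contains regex characters.
    return re.sub(re.escape(old), match_case, text, flags=re.IGNORECASE)
```

With this helper, a single pair such as ('initial', 'modified') would cover both the title and the body paragraphs. An all-capitals match is treated the same as a capitalized one, which may or may not be what you want.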

The program finishes by calling show_word_doc to display the result, which looks like this:

    Modified File
    This is paragraph one in the modified document.
    This is paragraph two. It has two sentences!
    This is the third paragraph in the modified document.
    This is the fourth and last paragraph.

This is just the beginning of what the python-docx library can do. For more information, see the documentation at Read The Docs.

Download the example to experiment with it. (Or just copy and paste; all of the code is here.)

© 2024 Rocky Mountain Computer Consulting, Inc. All rights reserved.