Ryan McCormick

Dedicated Dad, Software Engineer and Lover of Coffee

MS Word Document text to String with VBA

November 23, 2015 by Ryan McCormick

I once had a project where I had to work with a large directory of unsorted Word documents. The main directory had a deep sub-directory tree that held multiple versions of the same document under the same file name. It was a mess.

My job was to find each unique document file name, keep only the most recently modified version, and parse each document for a unique key-id using regular expressions. I won’t get into all of the details, but I found that the easiest way to run regex searches was to extract each document's raw text and work from there.
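As a rough illustration of the regex step, here is a minimal sketch that pulls the first match of a pattern out of a plain string. It uses the late-bound VBScript.RegExp object, so no extra reference is needed. The function name findFirstKey and the pattern "ID-\d{4}" used in the note below are hypothetical; the real key-id format from the project is not shown in this post.

```vba
' Hypothetical helper: returns the first regex match found in sText,
' or an empty string when nothing matches.
Function findFirstKey(ByVal sText As String, ByVal sPattern As String) As String
    Dim oRegex As Object
    Dim oMatches As Object

    ' Late binding: no reference to the VBScript Regular Expressions library needed
    Set oRegex = CreateObject("VBScript.RegExp")
    oRegex.Pattern = sPattern
    oRegex.IgnoreCase = True
    oRegex.Global = False   ' stop after the first match

    Set oMatches = oRegex.Execute(sText)
    If oMatches.Count > 0 Then
        findFirstKey = oMatches(0).Value
    Else
        findFirstKey = ""
    End If

    Set oMatches = Nothing
    Set oRegex = Nothing
End Function
```

With a made-up pattern such as "ID-\d{4}", a call like findFirstKey(getWordDocText(somePath), "ID-\d{4}") would search the extracted document text, where somePath stands in for a real file path.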

For this post, I built both an early binding (requires a reference to the Word object library) and a late binding version of the function I used to extract text from each Word document for my project.

NOTE: If you don’t need all of the inner document content, I added a way to pull a fragment by start and stop index. To use it, comment out the line that begins “docContent = oWdoc.Content” and uncomment the line that begins “docContent = oWdoc.Range(0, 500)”. Adjust the 0 (start) and 500 (stop) to your needs.
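One thing to watch with a fixed stop index: Word may raise a "Value out of range" error when the stop position is past the end of a short document. The sketch below is one way to guard against that by clamping both positions first. The helper name getDocFragment is hypothetical and not part of the original project; it assumes you already have an open document object.

```vba
' Hypothetical helper: returns the text between startPos and endPos,
' clamping both positions to the bounds of the document's main story.
Function getDocFragment(oWdoc As Object, ByVal startPos As Long, ByVal endPos As Long) As String
    Dim docEnd As Long

    ' End position of the main document story
    docEnd = oWdoc.Content.End

    If startPos < 0 Then startPos = 0
    If endPos > docEnd Then endPos = docEnd
    If startPos > endPos Then startPos = endPos

    getDocFragment = oWdoc.Range(startPos, endPos).Text
End Function
```

Inside either version of the function, the fragment line could then read docContent = getDocFragment(oWdoc, 0, 500).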

Extract Text From MS Word Document with VBA – Early Binding Example

'----------------------------------------------------
' Get Text From MS Word Document (Early Binding)
'----------------------------------------------------
' NOTE: To use this code, you must reference
' The Microsoft Word 14.0 (or current version)
' Object Library by clicking menu Tools > References
' Check the box for:
' Microsoft Word 14.0 Object Library in Word 2010
' Microsoft Word 15.0 Object Library in Word 2013
' Click OK
'----------------------------------------------------
Function getWordDocText(iFile) As String
    Dim oWord As Word.Application
    Dim oWdoc As Word.Document
    Dim docHeader As String
    Dim docFooter As String
    Dim docContent As String
        
    ' Initialize Word Objects
    '---------------------------------
    Set oWord = New Word.Application
    ' Open read-only so the source file is never modified
    Set oWdoc = oWord.Documents.Open(iFile, ReadOnly:=True)
    
    ' Get Content From Document
    '---------------------------------
    ' Get primary header
    docHeader = oWdoc.Sections(1).Headers(1).Range.Text
    
    ' Get primary footer
    docFooter = oWdoc.Sections(1).Footers(1).Range.Text
    
    ' Get document content
    docContent = oWdoc.Content.Text
    '---------------------------------
    ' Limit to first 500 characters of
    ' main document content. Uncomment
    ' to use and adjust accordingly:
    '---------------------------------
    'docContent = oWdoc.Range(0, 500).Text
    '---------------------------------
        
    ' Return Document Content
    '---------------------------------
    getWordDocText = docHeader & vbNewLine & docContent & vbNewLine & docFooter
    
    ' Clear Memory
    '---------------------------------
    oWdoc.Close False   ' close without saving or prompting
    oWord.Quit
    Set oWdoc = Nothing
    Set oWord = Nothing
End Function

Extract Text From MS Word Document with VBA – Late Binding Example

'----------------------------------------------------
' Get Text From MS Word Document (Late Binding)
'----------------------------------------------------
' NOTE: This is the late binding version of the
' Get Text From MS Word Document code. No reference
' to Microsoft Word XX.0 Object Library is needed
'----------------------------------------------------
Function getWordDocText(iFile) As String
    Dim oWord As Object
    Dim oWdoc As Object
    Dim docHeader As String
    Dim docFooter As String
    Dim docContent As String
    
    ' Initialize Word Objects
    '---------------------------------
    Set oWord = CreateObject("Word.Application")
    ' Open read-only so the source file is never modified
    Set oWdoc = oWord.Documents.Open(iFile, ReadOnly:=True)
    
    ' Get Content From Document
    '---------------------------------
    ' Get primary header
    docHeader = oWdoc.Sections(1).Headers(1).Range.Text
    
    ' Get primary footer
    docFooter = oWdoc.Sections(1).Footers(1).Range.Text
    
    ' Get All Main Document Content
    docContent = oWdoc.Content.Text
    '---------------------------------
    ' Limit to first 500 characters of
    ' main document content. Uncomment
    ' to use and adjust accordingly:
    '---------------------------------
    'docContent = oWdoc.Range(0, 500).Text
    '---------------------------------
    
    ' Return Document Content
    '---------------------------------
    getWordDocText = docHeader & vbNewLine & docContent & vbNewLine & docFooter
    
    ' Clear Memory
    '---------------------------------
    oWdoc.Close False   ' close without saving or prompting
    oWord.Quit
    Set oWdoc = Nothing
    Set oWord = Nothing
End Function
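For the other half of the project described at the top (keeping only the most recently modified copy of each file name), something along these lines could work. This is only a sketch under assumptions: the helper name collectLatestDocs is hypothetical, it treats any extension starting with "doc" as a Word document, and it compares file names case-insensitively. It uses the late-bound Scripting.FileSystemObject and Scripting.Dictionary objects.

```vba
' Hypothetical helper: walks sFolder recursively and records, for each
' file name, the full path of the most recently modified copy.
' oLatest is a Scripting.Dictionary keyed by lower-cased file name.
Sub collectLatestDocs(ByVal sFolder As String, oLatest As Object)
    Dim oFso As Object
    Dim oFolder As Object
    Dim oFile As Object
    Dim oSub As Object
    Dim sKey As String
    Dim sKnownPath As String

    Set oFso = CreateObject("Scripting.FileSystemObject")
    Set oFolder = oFso.GetFolder(sFolder)

    For Each oFile In oFolder.Files
        ' Assumption: extensions starting with "doc" are Word documents.
        ' Skip the "~$" lock files Word creates next to open documents.
        If LCase$(oFso.GetExtensionName(oFile.Name)) Like "doc*" _
           And Left$(oFile.Name, 2) <> "~$" Then
            sKey = LCase$(oFile.Name)
            If Not oLatest.Exists(sKey) Then
                oLatest.Add sKey, oFile.Path
            Else
                sKnownPath = oLatest(sKey)
                ' Replace the stored path only when this copy is newer
                If oFile.DateLastModified > oFso.GetFile(sKnownPath).DateLastModified Then
                    oLatest(sKey) = oFile.Path
                End If
            End If
        End If
    Next oFile

    ' Recurse into sub-directories
    For Each oSub In oFolder.SubFolders
        collectLatestDocs oSub.Path, oLatest
    Next oSub
End Sub
```

To use it, create a dictionary with CreateObject("Scripting.Dictionary"), pass it to collectLatestDocs along with your root folder, and then loop over the stored paths, handing each one to getWordDocText. Opening a new Word instance per file is slow on a large directory, so expect a long run.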

As always, please comment with questions, issues, etc…

Filed Under: Microsoft Access, Microsoft Excel, Microsoft Word, VBA Tagged With: extract text, ms word document, vba