File Handling | Reading Data from a Word Document (.doc or .docx) in Java

Reading data from a Word document (.doc or .docx) in Java is similar to what we did for Excel.

See the article on reading data from an Excel file for that process.

Reading data from Word files


Steps to Read the Word File

Step #1 First, create two Word documents: tempdata.doc and tempdata_1.docx.
Step #2 Download the jar files and add them to your Java project, as described in the ‘Reference Jar Files’ section.
Step #3 Copy the code into your class file and run it to observe the output.
Step #4 To identify the extension of the Word file, we use the getExtension() method of the FilenameUtils class:
             String fileExtension = FilenameUtils.getExtension(filePath);
Step #5 Once we have the file extension, we call the read method that matches it.
Step #6 For a file with the “.docx” extension, we use the XWPFDocument and XWPFParagraph classes (here fis is a FileInputStream opened on the file):
              XWPFDocument doc = new XWPFDocument(fis);
              List<XWPFParagraph> getDocParagraphs = doc.getParagraphs();
Step #7 For a file with the “.doc” extension, we use the HWPFDocument and WordExtractor classes:
               HWPFDocument doc = new HWPFDocument(fis);
               WordExtractor extractor = new WordExtractor(doc);
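
Steps #4 and #5 can be tried out without any extra jar files. The sketch below is only an illustration: the class WordFormatChooser, its enum and its method names are hypothetical, and extensionOf() is a simplified stand-in for FilenameUtils.getExtension() that is not guaranteed to match it in every edge case.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Apache POI or Commons IO.
final class WordFormatChooser {

    enum WordFormat { DOC, DOCX, UNSUPPORTED }

    // Returns the text after the last dot of the file name, or "" if there is none.
    static String extensionOf(String filePath) {
        int lastSeparator = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
        int lastDot = filePath.lastIndexOf('.');
        if (lastDot <= lastSeparator) {
            return ""; // no dot at all, or the dot belongs to a directory name
        }
        return filePath.substring(lastDot + 1);
    }

    // Maps the extension (case-insensitively) to the reader that should handle the file.
    static WordFormat formatOf(String filePath) {
        switch (extensionOf(filePath).toLowerCase(Locale.ROOT)) {
            case "docx":
                return WordFormat.DOCX; // would be read with XWPFDocument
            case "doc":
                return WordFormat.DOC;  // would be read with HWPFDocument
            default:
                return WordFormat.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        System.out.println(formatOf("input_Word//tempdata_1.doc"));  // DOC
        System.out.println(formatOf("input_Word//tempdata_1.DOCX")); // DOCX
        System.out.println(formatOf("notes.txt"));                   // UNSUPPORTED
    }
}
```

Returning an explicit UNSUPPORTED value makes it easy to report an unexpected file type instead of silently doing nothing.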

Note: Please change the path of the Word document file accordingly.

Reference Jar Files

1. Navigate to the Apache POI download: poi-bin-3.16-20170419.tar.gz
2. Click on the first link, poi-bin-3.16-20170419.tar.gz.
3. The archive downloads automatically; extract it to get the jar files.
4. Add the jar files to your project using the ‘Configure Build Path’ option.

To use the FilenameUtils class, we first have to add the Apache Commons IO library to the project.

  1. Download the library files from the Commons IO library link.
  2. Add the library files to the project using Build Path, so that we can use the getExtension() method of the FilenameUtils class.
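
After adding the jar files, it can help to confirm that they are actually on the classpath before running the main example. The following is a minimal sketch under that assumption; the class ClasspathCheck and its method are hypothetical names, and only the three class names passed in main() come from the libraries used in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it only uses the standard library.
final class ClasspathCheck {

    // Returns the names of the given classes that cannot be located on the classpath.
    static List<String> missingClasses(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(
                "org.apache.poi.xwpf.usermodel.XWPFDocument", // used for .docx files
                "org.apache.poi.hwpf.HWPFDocument",           // used for .doc files
                "org.apache.commons.io.FilenameUtils");       // used for getExtension()
        if (missing.isEmpty()) {
            System.out.println("All required classes were found.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

If a class is reported as missing, the corresponding jar file has probably not been added to the build path yet.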

Code Example

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.io.FilenameUtils;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class WordHandling {

	public static void main(String[] args) throws IOException {
		String filePath = "input_Word//tempdata_1.doc";
		loadFile(filePath);
	}

	// Calls the read method that matches the file extension.
	public static void loadFile(String filePath) throws IOException {
		File file = new File(filePath);                               // Creating File object
		String fileExtension = FilenameUtils.getExtension(filePath);  // Getting extension of file
		if (fileExtension.equalsIgnoreCase("docx")) {
			readDocxFile(file);
		} else if (fileExtension.equalsIgnoreCase("doc")) {
			readDocFile(file);
		}
	}

	// Reading data from ".docx" file.
	public static void readDocxFile(File file) throws IOException {
		FileInputStream fis = new FileInputStream(file);
		XWPFDocument doc = new XWPFDocument(fis);
		List<XWPFParagraph> getDocParagraphs = doc.getParagraphs(); // Getting all the paragraphs of the document as a List
		int totalParagraphs = getDocParagraphs.size();              // Getting total number of paragraphs in the Word document
		System.out.println("Total number of paragraphs : " + totalParagraphs);
		for (XWPFParagraph currentParagraph : getDocParagraphs) {
			System.out.println(currentParagraph.getText());
		}

		// Similarly, we can use an Iterator to traverse the document
		// (this prints the same paragraphs a second time).
		Iterator<XWPFParagraph> para1 = doc.getParagraphsIterator();
		while (para1.hasNext()) {
			System.out.println(para1.next().getText());
		}

		doc.close();
		fis.close();
	}

	// Reading data from ".doc" file.
	public static void readDocFile(File file) throws IOException {
		FileInputStream fis = new FileInputStream(file);
		HWPFDocument doc = new HWPFDocument(fis);
		WordExtractor extractor = new WordExtractor(doc);
		String[] getDocParagraphs = extractor.getParagraphText(); // Getting all the paragraphs of the document as a String array
		int totalParagraphs = getDocParagraphs.length;            // Getting total number of paragraphs in the Word document
		System.out.println("Total count of paragraphs : " + totalParagraphs + "\n");
		for (String currentPara : getDocParagraphs) {
			System.out.print(currentPara);
		}
		extractor.close();
		fis.close();
	}
}
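
The loadFile() method trusts the file extension, so a renamed file would be sent to the wrong reader. As an optional extra check, the first bytes of the file can be inspected: a legacy .doc file is an OLE2 compound file, while a .docx file is a ZIP package. The sketch below uses only the standard library; the class WordSignatureCheck and its method names are hypothetical. Note that the signature only identifies the container, so other Office formats (for example .xls or .xlsx) share the same leading bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Apache POI.
final class WordSignatureCheck {

    // First 8 bytes of an OLE2 compound file (container of legacy .doc files).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // First 4 bytes of a ZIP archive (container of .docx files).
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    // Classifies the leading bytes of a file as "OLE2", "ZIP" or "UNKNOWN".
    static String containerOf(byte[] header) {
        if (startsWith(header, OLE2)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    static String containerOfFile(Path file) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file is shorter than 8 bytes
                }
                total += n;
            }
        }
        return containerOf(Arrays.copyOf(header, total));
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a Word document as the first argument.
        System.out.println(containerOfFile(Paths.get(args[0])));
    }
}
```

A file that ends in ".doc" but reports "ZIP" (or the reverse) was most likely saved in the other format and renamed.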


Ashok Kumar works in an IT company as a QA Consultant. He started his career as a Test Trainee in manual testing in August 2010 and moved to automation testing after 4 years. He learned Java and Selenium on his own to build his knowledge of automation.

While learning these tools and working on multiple projects, he found that people sometimes get stuck in live scenarios in their projects and have to do a lot of R&D to get out of them. So he decided to start blogging about such scenarios, where anyone facing a problem in their project can ask a question or offer a solution, or an alternate solution, to achieve the goal successfully.

Later on, he observed that some people want to learn Java but have a few questions in mind, such as how to start with Java and whether to go for an online or offline course. So he started writing tutorials on Java, Jira, Selenium, Excel, etc.