How to read images from an MS Word document and save them

Author - Sahil Bhalla
18.12.2018

How do you read .jpg and .png images from a Word document and save them in a different folder using Java?

Apache POI reads the body of a .docx document as a sequence of paragraphs. Each block of text that ends with a paragraph break is one paragraph, and an image placed on its own line is also read as a paragraph, with the picture embedded in one of its runs. Any text that sits between two images is read as a separate paragraph.
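Underneath that structure, a .docx file is an ordinary ZIP archive, and Word normally stores the embedded images as entries under word/media/. As a quick way to see which images a document contains, here is a minimal sketch that uses only java.util.zip. The class DocxMediaReader and its method readImages are hypothetical names made up for this illustration, and this approach only returns the raw image bytes: it does not tell you which paragraph an image belongs to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (not part of Apache POI): reads the images stored in a .docx package.
class DocxMediaReader {

    // Returns entry name -> image bytes for every .png/.jpg/.jpeg entry under word/media/.
    static Map<String, byte[]> readImages(InputStream docx) {
        Map<String, byte[]> images = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                String lower = name.toLowerCase(Locale.ROOT);
                boolean isImage = lower.endsWith(".png")
                        || lower.endsWith(".jpg")
                        || lower.endsWith(".jpeg");
                if (!entry.isDirectory() && lower.startsWith("word/media/") && isImage) {
                    images.put(name, readAll(zip));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return images;
    }

    // Copies the remaining bytes of the current ZIP entry into a byte array.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

Calling DocxMediaReader.readImages(new FileInputStream("some.docx")) would return one map entry per stored image. When the position of each image in the text matters, the paragraph-by-paragraph approach is the one to use.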

Code to read image data from a Word document and save it on the local system in Java, using the Apache POI XWPF classes:

XWPFParagraph represents a paragraph in the document.
XWPFRun represents a run, a stretch of content inside a paragraph that shares one set of formatting attributes.
XWPFPicture represents a picture embedded in a run and is used to get the image data.

if(elem instanceof XWPFParagraph) {
    for (XWPFRun run: para.get(countpara).getRuns()) {
        for (XWPFPicture pic: run.getEmbeddedPictures()) {
            String imageName = null;

            //get file name using getFileName() function.
            picdata = pic.getPictureData().getFileName();

            //condition to check if the image is .jpg or .png
            if (picdata.contains(".png") || picdata.contains(".jpg") || picdata.contains(".jpeg")) {
                if (currentSection != null) {
                    currentSection = currentSection.trim();
                    if (oldSection != null) {
                        //course name prefix: trimmed, lower case, spaces replaced by hyphens
                        String coursePrefix = getCourseName().trim().replace(' ', '-').toLowerCase();
                        //compare strings with equals(); == only compares object references
                        if (!currentSection.equals(oldSection)) {
                            //a new section starts, so restart the image numbering
                            imageCounter = 1;
                            oldSection = currentSection;
                        }
                        imageName = coursePrefix + "--" + currentSection + "--0" + imageCounter;
                        imageName = imageName.replace("/", "-");
                        imageCounter++;
                    }
                    //get the image data as a byte array
                    byte[] fileData = pic.getPictureData().getData();
                    if (imageName != null) {
                        //add the extension; .jpeg images are saved as .jpg
                        imageName = imageName + (picdata.contains(".png") ? ".png" : ".jpg");
                        varArray.add(new String[] {
                            st,
                            imageName
                        });
                        //create the folder in which the image is to be saved
                        File directory = new File("/home/sumit/sahil/images/");
                        directory.mkdirs();
                        //try-with-resources closes the stream even if write() fails
                        try (FileOutputStream out = new FileOutputStream(new File(directory, imageName))) {
                            out.write(fileData);
                        }
                    }
                }
            }
        }
    }
}
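
The snippet above is a fragment: variables such as currentSection, imageCounter and varArray are declared elsewhere in the original program. To make the naming and saving steps easier to try on their own, here is a minimal, self-contained sketch that needs only the JDK. The class ImageSaver and its methods buildName and save are hypothetical names introduced for this illustration, and the naming pattern (course name, section, "--0" plus a counter) is simply copied from the snippet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper that mirrors the naming and saving steps of the snippet above.
class ImageSaver {

    // Builds a name of the form <course>--<section>--0<counter><extension>.
    static String buildName(String courseName, String section, int counter, String sourceFileName) {
        String lower = sourceFileName.toLowerCase(Locale.ROOT);
        String extension;
        if (lower.endsWith(".png")) {
            extension = ".png";
        } else if (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) {
            extension = ".jpg"; // .jpeg is saved as .jpg, as in the snippet
        } else {
            throw new IllegalArgumentException("Unsupported image type: " + sourceFileName);
        }
        String course = courseName.trim().replace(' ', '-').toLowerCase(Locale.ROOT);
        String name = course + "--" + section.trim() + "--0" + counter;
        // "/" is not allowed inside a file name, so it is replaced before the extension is added
        return name.replace("/", "-") + extension;
    }

    // Creates the target directory if it is missing, writes the bytes and returns the file path.
    static Path save(Path directory, String imageName, byte[] data) {
        try {
            Files.createDirectories(directory);
            return Files.write(directory.resolve(imageName), data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, ImageSaver.buildName("Java Basics", " Intro/Setup ", 1, "image1.jpeg") gives java-basics--Intro-Setup--01.jpg, and ImageSaver.save(Paths.get("/home/sumit/sahil/images"), name, fileData) writes the bytes there. Checking the extension with endsWith instead of contains avoids matching ".png" in the middle of a file name.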