Read PDF content using Selenium

Author - Webner
31.01.2022

Selenium WebDriver automates the browser, but it has no built-in way to read the text inside a PDF document. To read a PDF file in a Selenium test, we can use Apache PDFBox, an open-source Java library for working with PDF files. We can use it to verify the text or images present in the file. To use it with Selenium, we either add the Maven dependency to the pom.xml file or add the PDFBox JAR to the project's build path as an external JAR.

Here we will use the external JAR method:

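  • Download the JAR file from the following page:
    https://pdfbox.apache.org/download.html
    I am using PDFBox version 1.8.16. (In PDFBox 2.x, PDFTextStripper moved to the org.apache.pdfbox.text package, so the import in the code below applies to 1.8.x.)
  • In Eclipse, right-click the project, select “Build Path” > “Configure Build Path”, and on the “Libraries” tab click “Add External JARs” to add the downloaded JAR.
  • After adding the JAR, click “Apply and Close”.
  • The code below also uses Selenium WebDriver and WebDriverManager (io.github.bonigarcia.wdm), so their JARs must be on the build path as well.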
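To confirm that the JAR is actually visible to your program after changing the build path, you can try to load the PDFBox classes by name. This is a minimal sketch using only the JDK; the class ClasspathCheck and its method isOnClasspath are hypothetical helpers, not part of PDFBox or Selenium.

```java
// Hypothetical helper: reports whether classes can be loaded from the current classpath.
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's class loader.
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that exists but whose superclass or dependencies are missing
            return false;
        }
    }

    public static void main(String[] args) {
        // PDFBox 1.8.x class names used by the pdfread example
        String[] required = {
            "org.apache.pdfbox.pdmodel.PDDocument",
            "org.apache.pdfbox.util.PDFTextStripper"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : MISSING"));
        }
    }
}
```

Run it as a Java application from the same project; a line marked MISSING means that class's JAR is not on the run-time classpath.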

Code to extract the content of the PDF:

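package Testing;

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.8.x package
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import io.github.bonigarcia.wdm.WebDriverManager;

public class pdfread {
    public static WebDriver driver;

    public void ReadPDF() throws Exception {
        // Download a matching ChromeDriver binary and start Chrome
        WebDriverManager.chromedriver().setup();
        driver = new ChromeDriver();
        driver.manage().window().maximize();

        // Open the PDF in the browser, then take the URL the browser ended up on
        driver.get("https://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf");
        String currentLink = driver.getCurrentUrl();
        URL pdfUrl = new URL(currentLink);

        // Chrome's built-in PDF viewer does not expose the text as regular WebElements,
        // so download the file itself. This is a separate HTTP request: it does not
        // carry the browser's cookies or session.
        try (InputStream input = pdfUrl.openStream();
             BufferedInputStream file = new BufferedInputStream(input)) {
            PDDocument document = PDDocument.load(file);
            try {
                // Extract the text of every page
                String pdfContent = new PDFTextStripper().getText(document);
                System.out.println(pdfContent);
            } finally {
                document.close(); // release PDFBox resources
            }
        }
    }

    public static void main(String[] args) throws Exception {
        pdfread read = new pdfread();
        try {
            read.ReadPDF();
        } finally {
            // Always close the browser, even if reading the PDF fails
            if (driver != null) {
                driver.quit();
            }
        }
    }
}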
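If the URL returns an HTML error or login page instead of a PDF, PDDocument.load fails with a parse error that can be confusing. Well-formed PDF files begin with the header "%PDF-" followed by a version number, so one option is to peek at the first five bytes before parsing. This is a minimal sketch assuming Java 11+ (for InputStream.readNBytes); PdfSignatureCheck, looksLikePdf and peekIsPdf are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: checks whether a stream starts with the PDF header.
public class PdfSignatureCheck {

    // Well-formed PDF files start with "%PDF-" followed by a version, e.g. "%PDF-1.4"
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the given leading bytes begin with the PDF header.
    public static boolean looksLikePdf(byte[] head) {
        return head.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Reads the first bytes of a buffered stream, then resets it, so the same
    // stream can still be handed to PDDocument.load from the first byte.
    public static boolean peekIsPdf(BufferedInputStream in) throws IOException {
        in.mark(PDF_MAGIC.length);
        byte[] head = in.readNBytes(PDF_MAGIC.length);
        in.reset();
        return looksLikePdf(head);
    }

    public static void main(String[] args) throws IOException {
        byte[] fakePdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] html = "<!DOCTYPE html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(peekIsPdf(new BufferedInputStream(new ByteArrayInputStream(fakePdf)))); // true
        System.out.println(peekIsPdf(new BufferedInputStream(new ByteArrayInputStream(html))));    // false
    }
}
```

In pdfread, calling PdfSignatureCheck.peekIsPdf(file) just before PDDocument.load(file) would let the test stop with a clear message when the download is not a PDF.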

Result: running pdfread prints the extracted text of the PDF to the console.
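Printing the text is useful for a first look, but a test usually needs to assert on it. PDFTextStripper writes line breaks where lines break in the PDF, so an expected phrase can be split across lines in the output. This is a minimal sketch with hypothetical helper names (PdfTextCheck, normalize, containsPhrase) that normalizes whitespace before comparing.

```java
import java.util.regex.Pattern;

// Hypothetical helper: whitespace-insensitive text checks on extracted PDF content.
public class PdfTextCheck {

    // Matches any run of spaces, tabs, or line breaks
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Collapses whitespace runs to a single space and trims both ends.
    public static String normalize(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    // True if the extracted text contains the expected phrase,
    // ignoring differences in line breaks and spacing.
    public static boolean containsPhrase(String pdfContent, String expected) {
        return normalize(pdfContent).contains(normalize(expected));
    }

    public static void main(String[] args) {
        // Made-up sample: a sentence split across two lines, as extracted PDF text might be
        String pdfContent = "This is a sample\r\nPDF   document.\n";
        System.out.println(containsPhrase(pdfContent, "sample PDF document")); // true
        System.out.println(containsPhrase(pdfContent, "missing text"));        // false
    }
}
```

In pdfread, the println could be replaced by a check such as throwing an AssertionError when containsPhrase(pdfContent, expectedText) is false, with expectedText taken from the document under test. Words hyphenated across lines are not handled by this sketch.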

Webner Solutions is a Software Development company focused on developing Insurance Agency Management Systems, Learning Management Systems and Salesforce apps. Contact us at dev@webners.com for your Insurance, eLearning and Salesforce applications.
