iText tutorial: Merge & Split PDF files using iText JAR

In the previous article about Generating PDF files using iText JAR, Kiran Hegde described a nice, basic way of generating PDF files in Java using the iText JAR. It is a great starter tutorial for those who want to start working with iText.
For one of my requirements, I had to merge two or more PDF files into a single PDF file. I thought of implementing the functionality from scratch in iText, but then decided to google it and see whether someone had already written the code I was looking for.

As expected, I found a nice implementation of Java code that merges two or more PDF files using the iText JAR. In this post I will dissect that code and give credit to the original author of the post.

Merge PDF files in Java using iText JAR

So here we go. First let us see the code.

package net.viralpatel.itext.pdf;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.lowagie.text.Document;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;

public class MergePDF {

	public static void main(String[] args) {
		try {
			List<InputStream> pdfs = new ArrayList<InputStream>();
			pdfs.add(new FileInputStream("c:\\1.pdf"));
			pdfs.add(new FileInputStream("c:\\2.pdf"));
			OutputStream output = new FileOutputStream("c:\\merge.pdf");
			MergePDF.concatPDFs(pdfs, output, true);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public static void concatPDFs(List<InputStream> streamOfPDFFiles,
			OutputStream outputStream, boolean paginate) {

		Document document = new Document();
		try {
			List<InputStream> pdfs = streamOfPDFFiles;
			List<PdfReader> readers = new ArrayList<PdfReader>();
			int totalPages = 0;
			Iterator<InputStream> iteratorPDFs = pdfs.iterator();

			// Create Readers for the pdfs.
			while (iteratorPDFs.hasNext()) {
				InputStream pdf = iteratorPDFs.next();
				PdfReader pdfReader = new PdfReader(pdf);
				readers.add(pdfReader);
				totalPages += pdfReader.getNumberOfPages();
			}
			// Create a writer for the outputstream
			PdfWriter writer = PdfWriter.getInstance(document, outputStream);

			document.open();
			BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA,
					BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
			PdfContentByte cb = writer.getDirectContent(); // Holds the PDF
			// data

			PdfImportedPage page;
			int currentPageNumber = 0;
			int pageOfCurrentReaderPDF = 0;
			Iterator<PdfReader> iteratorPDFReader = readers.iterator();

			// Loop through the PDF files and add to the output.
			while (iteratorPDFReader.hasNext()) {
				PdfReader pdfReader = iteratorPDFReader.next();

				// Create a new page in the target for each source page.
				while (pageOfCurrentReaderPDF < pdfReader.getNumberOfPages()) {
					document.newPage();
					pageOfCurrentReaderPDF++;
					currentPageNumber++;
					page = writer.getImportedPage(pdfReader,
							pageOfCurrentReaderPDF);
					cb.addTemplate(page, 0, 0);

					// Code for pagination.
					if (paginate) {
						cb.beginText();
						cb.setFontAndSize(bf, 9);
						cb.showTextAligned(PdfContentByte.ALIGN_CENTER, ""
								+ currentPageNumber + " of " + totalPages, 520,
								5, 0);
						cb.endText();
					}
				}
				pageOfCurrentReaderPDF = 0;
			}
			outputStream.flush();
			document.close();
			outputStream.close();
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			if (document.isOpen())
				document.close();
			try {
				if (outputStream != null)
					outputStream.close();
			} catch (IOException ioe) {
				ioe.printStackTrace();
			}
		}
	}
}

As you can see, what the code does is pretty simple.

  1. In the main() method, we create a List of InputStream objects that point to the input PDF files we need to merge.
  2. We call the static method MergePDF.concatPDFs(), passing as arguments the list of input PDFs, an OutputStream for the merged output PDF, and a boolean flag that says whether to print page numbers at the bottom of each page.
  3. In concatPDFs(), the first while loop converts the List of InputStream objects into a List of PdfReader objects and keeps a count of the total pages across all the input PDF files.
  4. Next we create the output objects for the merged PDF file, using a Document object and the PdfWriter.getInstance() method.
  5. Then we create a BaseFont object using the BaseFont.createFont() method. This is the font used for writing the page numbers.
  6. Finally we write all the input PDFs into the merged output PDF, iterating over each PDF and then over each of its pages in two nested while loops.
  7. And then we close all the streams and flush the buffers. Good boys do this ;-)
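
The page-counting part of these steps can be tried out without iText at all. The sketch below is only an illustration: the class PageLabels and its labels() method are hypothetical names that are not part of the original code, and the page counts are passed in as plain integers instead of being read from PdfReader objects. It mirrors the first loop (total page count) and the two nested loops (running page number) of concatPDFs().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: mirrors the page counting of concatPDFs() without iText. */
final class PageLabels {

    /**
     * @param pageCounts number of pages in each input PDF, in merge order
     * @return one label per output page, in the same "X of Y" form used above
     */
    static List<String> labels(int[] pageCounts) {
        // First pass: total pages across all inputs (the first while loop).
        int totalPages = 0;
        for (int count : pageCounts) {
            totalPages += count;
        }

        // Second pass: a running counter that keeps increasing from one
        // input to the next (the two nested while loops).
        List<String> labels = new ArrayList<String>();
        int currentPageNumber = 0;
        for (int count : pageCounts) {
            for (int pageOfCurrent = 1; pageOfCurrent <= count; pageOfCurrent++) {
                currentPageNumber++;
                labels.add(currentPageNumber + " of " + totalPages);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two inputs with 2 and 3 pages give labels "1 of 5" through "5 of 5".
        for (String label : labels(new int[] {2, 3})) {
            System.out.println(label);
        }
    }
}
```

Note that the per-document counter restarts for every input while the running counter does not, which is why concatPDFs() resets pageOfCurrentReaderPDF to 0 after each reader but never resets currentPageNumber.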

Now that we know how to merge PDF files into one, let us see how to split a PDF file, or extract a part of a PDF into another PDF.

Split PDF files in Java using iText JAR

Let us see the code.

/**
 * @author viralpatel.net
 *
 * @param inputStream Input PDF file
 * @param outputStream Output PDF file
 * @param fromPage start page from input PDF file
 * @param toPage end page from input PDF file
 */
public static void splitPDF(InputStream inputStream,
		OutputStream outputStream, int fromPage, int toPage) {
	Document document = new Document();
	try {
		PdfReader inputPDF = new PdfReader(inputStream);

		int totalPages = inputPDF.getNumberOfPages();

		// Clamp toPage to the pages that actually exist (1..totalPages)
		toPage = Math.max(1, Math.min(toPage, totalPages));
		// Clamp fromPage to 1..toPage, so it equals toPage if it was greater
		fromPage = Math.max(1, Math.min(fromPage, toPage));

		// Create a writer for the outputstream
		PdfWriter writer = PdfWriter.getInstance(document, outputStream);

		document.open();
		PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
		PdfImportedPage page;

		while(fromPage <= toPage) {
			document.newPage();
			page = writer.getImportedPage(inputPDF, fromPage);
			cb.addTemplate(page, 0, 0);
			fromPage++;
		}
		outputStream.flush();
		document.close();
		outputStream.close();
	} catch (Exception e) {
		e.printStackTrace();
	} finally {
		if (document.isOpen())
			document.close();
		try {
			if (outputStream != null)
				outputStream.close();
		} catch (IOException ioe) {
			ioe.printStackTrace();
		}
	}
}

In the above code, we have created a method splitPDF() that can be used to extract pages from a PDF and write them into another PDF. The code is pretty much self-explanatory and is similar to the code that merges PDF files.
Thus, if you need to split an input.pdf (having 20 pages) into output1.pdf (pages 1-12 of input.pdf) and output2.pdf (pages 13-20 of input.pdf), you can call the above method as follows:

public static void main(String[] args) {
	try {
		MergePDF.splitPDF(new FileInputStream("C:\\input.pdf"),
					new FileOutputStream("C:\\output1.pdf"), 1, 12);
		MergePDF.splitPDF(new FileInputStream("C:\\input.pdf"),
					new FileOutputStream("C:\\output2.pdf"), 13, 20);

	} catch (Exception e) {
		e.printStackTrace();
	}
}
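
If the split points are not known up front, the (fromPage, toPage) pairs can be computed from the total page count. The following is a minimal sketch that does not use iText; the class PageRanges and the method split() are hypothetical names introduced here for illustration, and the fixed "pages per part" rule is an assumption rather than something the original code does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes page ranges that could be passed to splitPDF(). */
final class PageRanges {

    /**
     * Returns inclusive, 1-based {fromPage, toPage} pairs covering pages
     * 1..totalPages, with at most pagesPerPart pages in each pair.
     */
    static List<int[]> split(int totalPages, int pagesPerPart) {
        if (totalPages < 1 || pagesPerPart < 1) {
            throw new IllegalArgumentException("both arguments must be at least 1");
        }
        List<int[]> ranges = new ArrayList<int[]>();
        int from = 1;
        while (from <= totalPages) {
            // The last part may be shorter than pagesPerPart.
            // The long cast avoids int overflow for very large arguments.
            int to = (int) Math.min((long) from + pagesPerPart - 1, totalPages);
            ranges.add(new int[] {from, to});
            if (to == totalPages) {
                break;
            }
            from = to + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // For a 20-page input and 12 pages per part this prints 1-12 and 13-20.
        for (int[] range : split(20, 12)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

Each pair would then be handed to splitPDF() together with a fresh FileInputStream for the input file, as in the main() method above, because an input stream can only be read once.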

Feel free to bookmark the code and share it if you feel it will be useful to you :)



25 Comments

  • prem wrote on 23 December, 2009, 10:39

    Hi ,
    I want to split the pdf if my pdf exceeds the given size.
    For Example,
    If my pdf size is 12MB,I want to split the pdf in to 5MB parts.
    So i need to split the pdf in to 3, 5MB,5MB,2MB respectively.
    Pls let me know whether this is possible.

    Regards,
    Prem

    • Viral Patel wrote on 26 December, 2009, 0:54

      @Prem, This example only splits the PDF on basis of page numbers. I will definitely try to write code for splitting pdf on basis of size and update you.

  • Goblin_Queen wrote on 8 January, 2010, 14:33

    I would like to paste my code here to split a PDF based on bookmarks using iText. All my bookmarks are on level1. I based my code sample on the example on this page, and that’s why I would like to place my code here. I couldn’t really find any example of splitting a PDF based on bookmarks with iText, so I thought it could help other people if I put my code on this page:

    public static void splitPDFByBookmarks(String pdf, String outputFolder){
            try
            {
                PdfReader reader = new PdfReader(pdf);
                //List of bookmarks: each bookmark is a map with values for title, page, etc
                List<HashMap> bookmarks = SimpleBookmark.getBookmark(reader);
                for(int i=0; i<bookmarks.size(); i++){
                    HashMap bm = bookmarks.get(i);
                    HashMap nextBM = i==bookmarks.size()-1 ? null : bookmarks.get(i+1);
                    //In my case I needed to split the title string
                    String title = ((String)bm.get("Title")).split(" ")[2];
    
                    log.debug("Titel: " + title);
                    String startPage = ((String)bm.get("Page")).split(" ")[0];
                    String startPageNextBM = nextBM==null ? "" + (reader.getNumberOfPages() + 1) : ((String)nextBM.get("Page")).split(" ")[0];
                    log.debug("Page: " + startPage);
                    log.debug("------------------");
                    extractBookmarkToPDF(reader, Integer.valueOf(startPage), Integer.valueOf(startPageNextBM), title + ".pdf",outputFolder);
                }
            }
            catch (IOException e)
            {
                log.error(e.getMessage());
            }
        }
    
        private static void extractBookmarkToPDF(PdfReader reader, int pageFrom, int pageTo, String outputName, String outputFolder){
            Document document = new Document();
            OutputStream os = null;
    
            try{
                os = new FileOutputStream(outputFolder + outputName);
    
                // Create a writer for the outputstream
                PdfWriter writer = PdfWriter.getInstance(document, os);
                document.open();
                PdfContentByte cb = writer.getDirectContent(); // Holds the PDF data
                PdfImportedPage page;
    
                while(pageFrom < pageTo) {
                    document.newPage();
                    page = writer.getImportedPage(reader, pageFrom);
                    cb.addTemplate(page, 0, 0);
                    pageFrom++;
                }
    
                os.flush();
                document.close();
                os.close();
            }catch(Exception ex){
                log.error(ex.getMessage());
            }finally {
                if (document.isOpen())
                    document.close();
                try {
                    if (os != null)
                        os.close();
                } catch (IOException ioe) {
                    log.error(ioe.getMessage());
                }
            }
        }
    
    • Viral Patel wrote on 8 January, 2010, 15:29

      Thanks Goblin for the code. I appreciate your effort of sharing this here.

  • dharmendra wrote on 16 March, 2010, 14:31

    i had downloaded iText jar 5.0.1 but it do not contain com.lowagie.text.table package what should i do?? i want to create table dynamically in pdf page..

  • Puru wrote on 24 May, 2010, 20:42

    Viral,
    I was searching pdf splittig on the basis of size(excedding a given size say > 10 MB). Prem was also looking for the same last year.Just checking if you got an opportunity to write on this.

    Thanks for your help
    Puru

  • Kunu wrote on 17 June, 2010, 19:35

    Viral

    Thanks for posting the code here.

  • Mohamed nazar wrote on 20 August, 2010, 9:40

    i was searching for the code to read a image and parse it can you post the code

  • Dimitri wrote on 6 September, 2010, 20:13

    Viral : If you already found the script for splitting pdfs by page size, it would be VERY handy
    Thx in advance

  • Vishnu wrote on 20 October, 2010, 18:05

    Hi Viral,

    By any chance is it possible to split the pages randomly? E.g. I have a PDF document of 20 pages and now I was a new PDF containing only 2nd, 5th, 11th and 16th page of the original document.

    Thanks in advance.
    Vishnu.

  • ManhHD wrote on 1 November, 2010, 15:50

    Thank for sharing

  • maddireddy wrote on 1 November, 2010, 17:00

    hi viral…can you provide sample code to generate thumbnails to a pdf page

  • Swati wrote on 27 December, 2010, 18:53

    nice coding…:)its wrks for me always in 1 shoot viral

  • Edwin Tan wrote on 16 February, 2011, 8:07

    Hi Viral Patel,

    Did you manage to develop sample code to split PDFs based on size? i.e. Splitting a 8MB file into 2MB PDFs.

    Regards,
    Edwin

  • Shivakumar wrote on 22 February, 2011, 8:20

    Hi Viral , I got the following errors:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.io.ByteArrayOutputStream.write(Unknown Source)
    at com.itextpdf.text.pdf.RandomAccessFileOrArray.InputStreamToArray(RandomAccessFileOrArray.java:182)
    at com.itextpdf.text.pdf.RandomAccessFileOrArray.<init>(RandomAccessFileOrArray.java:172)
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:237)
    at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:248)
    at MergePDF.splitPDF(split.java:38)
    at ShivaPdf.main(split.java:87)

    How to resolve them.

  • Luke Pacholski wrote on 5 March, 2011, 2:01

    Dimitri asked:

    “Viral : If you already found the script for splitting pdfs by page size, it would be VERY handy”

    I found the answer here:

    http://java-x.blogspot.com/2006/11/merge-pdf-files-with-itext.html#comment-3167256311373072651

    document.setPageSize(pdfReader.getPageSizeWithRotation(pageOfCurrentReaderPDF));
    document.newPage();

    I.e., immediately before creating a new page, set the page size to the page size (and orientation) of the page you are reading in. The only issue to be aware of is the PDF has to be (I believe) version 1.5 or higher. You’ll get errors trying to open PDFs saved as 1.4.

    pdfWriter.setPdfVersion(PdfWriter.PDF_VERSION_1_5);

  • natalie Afota wrote on 28 March, 2011, 10:53

    Hey

    I have used your code for merging few pdf files but I still have a problem to merge files that carry different sizes!
    I have try to had a command to set the page size

    document.newPage();
    pageOfCurrentReaderPDF++;
    currentPageNumber++;
    document.setPageSize(pdfReader.getPageSize(pageOfCurrentReaderPDF));

    Thanks, for the helpful code!

    Natalie

  • uday wrote on 31 March, 2011, 13:40

    Very good code for merging & splitting….
    good work itext…………exactly suits my requirement.

  • teakey wrote on 13 April, 2011, 14:23

    hi,thanks for your coding ,but I still have a problem to merge files that carry different sizes!
    how could i solve it? some page is large while other is small.I am tired with it.help!

  • fryguy wrote on 29 April, 2011, 0:36

    Does anybody have any examples for splitting a PDF at bookmarks that exist at a level 2? The code provided by Goblin_Queen works great for level 1 bookmarks, but I’m not quite sure how to expand that to level 2 bookmarks as the resulting hashmap for the “Kids” do not contain page information.

    HashMap bm = bookmarks.get(i);
    List kids = (List) bm.get("Kids");

    The kids hashmap does not have the “Page” defined.

  • Searock wrote on 23 July, 2011, 10:12

    You could reduce the number of lines by using PdfStamper class.

    PdfReader pdfReader = new PdfReader(fileDialog.FileName);
    FileOutputStream fout = new FileOutputStream("C:\\output1.pdf");
    PdfStamper splitter = new PdfStamper(pdfReader, fout);
    pdfReader.SelectPages("1-10");
    splitter.Close();
    pdfReader.Close();

  • mani wrote on 2 September, 2011, 16:15

    very useful code for all guys who r in initial stage thank u very much

  • Dennis Lindeman wrote on 3 December, 2011, 1:01

    I tried the concatPDFs code. It concatenates two files OK. But, it does not copy the signatures that were in the files. It also does not copy text data fields.

  • to split wrote on 30 January, 2012, 17:28

    thank you for the great tutorial and comments. They help me a lot, to write my own split function

Copyright © 2012 ViralPatel.net. All rights reserved.