Blazor Office File API: Search and Extract PDF Content - Extract Text from PDF

Extract Text from PDF

This demo uses the DevExpress PDF Document API (PdfDocumentProcessor) to extract text from a PDF document. You can process the predefined sample file or supply your own document. To use your own document, select Upload a File from the file selection drop-down menu.

In the Page Range Settings section, specify the pages to extract text from. Click Extract Text to run the extraction and download the result.
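The page selection above eventually has to become a sequence of page numbers. The source does not show how the demo reads the Custom Page Range input, so the following is only a minimal sketch that assumes the range is typed as a string such as "1-3, 5". The ParsePageRange name and the input format are hypothetical and are not part of the DevExpress API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the demo source):
// converts a string such as "1-3, 5" into the page numbers 1, 2, 3, 5.
List<int> ParsePageRange(string input) {
    var pages = new List<int>();
    if(string.IsNullOrWhiteSpace(input))
        return pages; // Nothing specified: the caller decides what an empty range means.

    foreach(var part in input.Split(',', StringSplitOptions.RemoveEmptyEntries)) {
        var bounds = part.Split('-');
        if(bounds.Length == 1 && int.TryParse(bounds[0], out int single)) {
            // A single page number, for example "5".
            pages.Add(single);
        }
        else if(bounds.Length == 2
                && int.TryParse(bounds[0], out int first)
                && int.TryParse(bounds[1], out int last)
                && first <= last) {
            // An inclusive range, for example "1-3".
            for(int page = first; page <= last; page++)
                pages.Add(page);
        }
        // Malformed parts are skipped silently in this sketch.
    }
    return pages;
}
```

The sketch does not check page numbers against the document length, because it has no access to the loaded document. That check belongs in the code that calls the PDF API.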

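using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using DevExpress.Pdf;

Stream GetText(Stream documentStream, IEnumerable<int> pageRange) {
    using var processor = new PdfDocumentProcessor();
    processor.LoadDocument(documentStream);

    // Materialize the range once so the sequence is not enumerated twice.
    var pages = pageRange.ToList();
    var text = new StringBuilder();

    if(pages.Count == 0) {
        // No pages specified: extract text from the entire document.
        text.Append(processor.GetText());
    }
    else {
        int pageCount = processor.Document.Pages.Count;
        foreach(var pageNumber in pages) {
            // Page numbers are 1-based; skip values outside the document.
            if(pageNumber < 1 || pageNumber > pageCount)
                continue;
            text.Append(processor.GetPageText(pageNumber));
        }
    }

    // The writer is not disposed on purpose: disposing it would close outputStream.
    var outputStream = new MemoryStream();
    var writer = new StreamWriter(outputStream);
    writer.Write(text.ToString());
    writer.Flush();

    // Rewind so the caller can read the result from the beginning.
    outputStream.Position = 0;
    return outputStream;
}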
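
GetText returns a stream rewound to position 0, so a caller can read it back as text or pass it on as a download. The following is a minimal sketch of the first option. The ReadAllText helper is hypothetical, and a MemoryStream stands in for the GetText result so that the sketch runs without a PDF file or the DevExpress library.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the demo source):
// reads a stream, such as the one GetText returns, into a string.
string ReadAllText(Stream stream) {
    // Disposing the reader also closes the underlying stream.
    using var reader = new StreamReader(stream, Encoding.UTF8);
    return reader.ReadToEnd();
}

// Stand-in for the stream returned by GetText.
var sample = new MemoryStream(Encoding.UTF8.GetBytes("Extracted text"));
Console.WriteLine(ReadAllText(sample)); // prints "Extracted text"
```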
