diff --git a/java/developer-guide/rendering-documents/rendering-to-pdf/optimization-pdf-options/pdf-remove-unused-resources.md b/java/developer-guide/rendering-documents/rendering-to-pdf/optimization-pdf-options/pdf-remove-unused-resources.md new file mode 100644 index 0000000..be4c42a --- /dev/null +++ b/java/developer-guide/rendering-documents/rendering-to-pdf/optimization-pdf-options/pdf-remove-unused-resources.md @@ -0,0 +1,75 @@ +--- +id: optimization-pdf-resources +url: viewer/java/optimization-pdf-remove-unused-resources +title: Optimize the PDF file by removing unused resources +linkTitle: Optimize the PDF file by removing unused resources +weight: 10 +description: "This topic describes how to optimize PDF file using the GroupDocs.Viewer Java API by removing the unused (orphaned) resources and thus to reduce the file size." +keywords: convert to pdf, optimize size, pdf reduce size, pdf remove unused resources, pdf remove orphaned resources +productName: GroupDocs.Viewer for Java +hideChildren: False +toc: True +--- + +In some cases [PDF](https://docs.fileformat.com/pdf/) documents may contain different resources, which are unused, which means they are not accessible and visible when viewing the document in any PDF viewer. Starting from the [version 24.12](https://releases.groupdocs.com/viewer/java/release-notes/2024/groupdocs-viewer-for-java-24-12-release-notes/) the GroupDocs.Viewer is able to remove such unused resources using two new public properties of the boolean type: `setRemoveUnusedObjects(...)` and `setRemoveUnusedStreams(...)`, both of which are located in the [`PdfOptimizationOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/pdfoptimizationoptions/) class. By default, both options are disabled (`false`), so the GroupDocs.Viewer will not apply this optimization. + +In order to explain these two options and their differences, we need to dive into the PDF structure a little bit. 
+ +A PDF document consists of PDF objects. Every object has its own number (ID) and belongs to one of the following types: name, string, number, boolean, null object, dictionary and array (which form the PDF document structure), and stream (raw binary data). Objects may be referenced from other objects; for example, a dictionary or an array may contain references to other objects. These references unite all parts of the PDF document and form its structure. Stream objects contain binary data, which may be large. For example, images and fonts are stored as stream objects. After some manipulations with the document, some streams may become "orphaned", i.e. nothing references them anymore. For example, an old image was replaced with a new one, but the binary data of the old image was not removed. In other words, the stream no longer belongs to the document logically but is still contained in it physically. The `RemoveUnusedObjects` property exists for removing such orphaned objects: it finds them in the document and removes them, which decreases the document size if any such objects are found. + +Every document page has its own `Resources` dictionary, which contains data such as images and fonts that are used in the page contents. Resources are referenced by their names in the dictionary; for example, the page may contain an operator that draws the image named "Image12" at a particular place on the page. In some cases a resource may become unused: for example, the image was removed from the page contents but left in the page resources, or the page was extracted from the document but its resources still contain common resources of the document. 
Such a resource is also "orphaned", but note that this situation differs from the one described for `RemoveUnusedObjects`: the object is still referenced from the resources dictionary of the page, yet the page never uses it (its name never appears in the page contents). The `RemoveUnusedStreams` property, when enabled, finds and removes these unnecessary resources. Because the removed resource stream objects are then no longer linked to the document structure, the `RemoveUnusedObjects` option is automatically activated when `RemoveUnusedStreams` is used. + +Here is an example in which each option is applied separately to the same input PDF file, so the Viewer produces two output PDF files, one per option. + +{{< tabs "Example1">}} +{{< tab "Java" >}} +```java +final String filename = "sample.pdf"; + +PdfViewOptions viewOptions1 = new PdfViewOptions("output1.pdf"); +viewOptions1.setPdfOptimizationOptions(new PdfOptimizationOptions()); +viewOptions1.getPdfOptimizationOptions().setRemoveUnusedObjects(true); + +PdfViewOptions viewOptions2 = new PdfViewOptions("output2.pdf"); +viewOptions2.setPdfOptimizationOptions(new PdfOptimizationOptions()); +viewOptions2.getPdfOptimizationOptions().setRemoveUnusedStreams(true); + +try (Viewer viewer = new Viewer(filename)) { + viewer.view(viewOptions1); + viewer.view(viewOptions2); +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +val filename = "sample.pdf" + +val viewOptions1 = PdfViewOptions("output1.pdf").apply { + pdfOptimizationOptions = PdfOptimizationOptions().apply { + removeUnusedObjects = true + } +} + +val viewOptions2 = PdfViewOptions("output2.pdf").apply { + pdfOptimizationOptions = PdfOptimizationOptions().apply { + removeUnusedStreams = true + } +} + +Viewer(filename).use { viewer -> + viewer.view(viewOptions1) + viewer.view(viewOptions2) +} +``` +{{< /tab >}} +{{< /tabs >}} + +As explained above, the effectiveness of the described optimizations depends solely on the specific PDF file: if 
it has no “orphaned” objects or streams, then these optimizations will do nothing, they only increase the document processing time. However, in some cases they can reduce the document size significantly, even several times. + +We checked both `RemoveUnusedObjects` and `RemoveUnusedStreams` on our internal sample PDF documents, and measured their size before and after applied optimizations. Results are shown in the table below. + +| Filename | Original size, bytes | `RemoveUnusedObjects`, bytes | `RemoveUnusedStreams`, bytes | +|------------------------------------------------------------------------------------------|----------------------|------------------------------|-------------------------------| +| [Sample1.pdf](/viewer/java/sample-files/developer-guide/rendering-documents/Sample1.pdf) | 131 832 | 2 274 | 131 832 | +| [Sample2.pdf](/viewer/java/sample-files/developer-guide/rendering-documents/Sample2.pdf) | 131 870 | 131 774 | 2 690 | + diff --git a/java/images/rendering-basics/render-xml-documents/XML-fixed.png b/java/images/rendering-basics/render-xml-documents/XML-fixed.png new file mode 100644 index 0000000..57a31be Binary files /dev/null and b/java/images/rendering-basics/render-xml-documents/XML-fixed.png differ diff --git a/java/images/rendering-basics/render-xml-documents/XML-to-HTML.png b/java/images/rendering-basics/render-xml-documents/XML-to-HTML.png new file mode 100644 index 0000000..b673793 Binary files /dev/null and b/java/images/rendering-basics/render-xml-documents/XML-to-HTML.png differ diff --git a/java/rendering-basics/render-xml-documents.md b/java/rendering-basics/render-xml-documents.md new file mode 100644 index 0000000..6692cd6 --- /dev/null +++ b/java/rendering-basics/render-xml-documents.md @@ -0,0 +1,312 @@ +--- +id: render-xml-documents +url: viewer/java/render-xml-documents +title: Render XML documents as HTML, PDF, PNG, and JPEG files +linkTitle: Render XML documents +weight: 10 +description: "This topic describes how to use 
the GroupDocs.Viewer Java API to convert XML documents to HTML (with and without pagination), PDF documents, PNG, and JPEG raster formats." +keywords: convert xml to html, xml to html, xml to pdf, xml to jpeg, xml to png, xml to image, xml correcter, fix xml structure +productName: GroupDocs.Viewer for Java +hideChildren: False +toc: True +aliases: + - /viewer/java/view-xml-documents + - /viewer/java/how-to-convert-and-view-xml-files +--- +[GroupDocs.Viewer for Java](https://products.groupdocs.com/viewer/java) has supported the XML format for a long time, but XML documents were treated as plain text, which limited the usefulness of this support. Starting from [version 24.12](https://releases.groupdocs.com/viewer/java/release-notes/2024/groupdocs-viewer-for-java-24-12-release-notes/#new-xml-converter), a completely new XML processing module is used, and XML documents are no longer processed as plain text documents. This article explains the new XML processing module. + +## Opening the XML document + +First of all, it should be emphasized that the new XML processing module does not change the public API at all: no new options, classes, properties, or methods were added or modified. To process an input XML document with the new XML processing module, either pass a [`LoadOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/loadoptions/) class instance created with [`FileType.XML`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer/filetype/#XML) in its [constructor](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/loadoptions/#LoadOptions-com.groupdocs.viewer.FileType-), or simply pass the XML document by a filename with the `*.xml` extension. The code example below shows the possible ways: + +{{< tabs "Loading example">}} +{{< tab "Java" >}} +```java +// 1. 
Specify by filename +String inputXmlDocument = "Sample.xml"; +try (Viewer viewer = new Viewer(inputXmlDocument)) { + // do some work... +} + +// 2. Specify by a stream opened from a file path +String inputXmlPath = "path/Sample.xml"; +try (FileInputStream inputXmlFileStream = new FileInputStream(inputXmlPath); + Viewer viewer = new Viewer(inputXmlFileStream)) { + // do some work... +} + +// 3. Specify by load options +// xmlContent is assumed to be a ByteArrayOutputStream already filled with the XML document +LoadOptions loadOptions = new LoadOptions(FileType.XML); +try (Viewer viewer = new Viewer(new ByteArrayInputStream(xmlContent.toByteArray()), loadOptions)) { + // do some work... +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +// 1. Specify by filename +val inputXmlDocument = "Sample.xml" +Viewer(inputXmlDocument).use { viewer -> + // do some work... +} + +// 2. Specify by a stream opened from a file path +val inputXmlPath = "path/Sample.xml" +FileInputStream(inputXmlPath).use { inputXmlFileStream -> + Viewer(inputXmlFileStream).use { viewer -> + // do some work... + } +} + +// 3. Specify by load options +// xmlContent is assumed to be a ByteArrayOutputStream already filled with the XML document +val loadOptions = LoadOptions(FileType.XML) +ByteArrayInputStream(xmlContent.toByteArray()).use { inputStream -> + Viewer(inputStream, loadOptions).use { viewer -> + // do some work... + } +} +``` +{{< /tab >}} +{{< /tabs >}} + + +If the instance of the [`Viewer`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer/viewer/) class is initialized in one of the ways described above, the new XML processing module will be used. + +An XML document normally begins with an [XML declaration](https://developer.mozilla.org/en-US/docs/Web/XML/XML_introduction#xml_declaration), which is located at the very beginning of the document and specifies the encoding of the content that follows, for example: + +`<?xml version="1.0" encoding="UTF-8"?>` + +By default, GroupDocs.Viewer uses the encoding from this declaration. If needed, this character encoding can be overridden. 
In order to do this, set the charset via [`loadOptions.setCharset(...)`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/loadoptions/#setCharset-java.nio.charset.Charset-) before initializing the [`Viewer`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer/viewer/) class, as shown below: + +{{< tabs "Custom encoding example">}} +{{< tab "Java" >}} +```java +LoadOptions loadOpts = new LoadOptions(FileType.XML); +loadOpts.setCharset(java.nio.charset.StandardCharsets.US_ASCII); +try (Viewer viewer = new Viewer(new ByteArrayInputStream(xmlContent.getBytes(java.nio.charset.StandardCharsets.US_ASCII)), loadOpts)) { + // do some work... +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +val loadOpts = LoadOptions(FileType.XML) +loadOpts.charset = Charsets.US_ASCII +Viewer(ByteArrayInputStream(xmlContent.toByteArray(Charsets.US_ASCII)), loadOpts).use { viewer -> + // do some work... +} +``` +{{< /tab >}} +{{< /tabs >}} + + +The rest of this article explains the features of the new XML processing module. + +## Representation + +The main task of the new XML processing module is to present the XML markup supplied by the user in a structured, formatted, hierarchical view, with highlighting of every distinct entity of the XML structure. For GroupDocs.Viewer it does not matter how human-friendly the XML markup is in the original document: it may be split by line breaks into a separate line per element, or the whole document may be stored in a single line of text, and it may have indents or not. GroupDocs.Viewer parses the input XML document, creates a hierarchical Document Object Model (DOM), and then serializes it to HTML, PDF, PNG, or JPEG depending on the user options. + +In particular, when serializing, GroupDocs.Viewer puts every XML element (node) on a new line with a left indent that indicates the nesting level of that element. 
Every entity of the XML document (XML element, attribute, attribute value, text node, XML comment, CDATA section) has its own highlighting: font style, type, color, size, and so on. All quotes used for quoting the attribute values are unified. + +The screenshot below shows such a scenario. On the left side there is a [sample XML document](/viewer/java/sample-files/render-xml-documents/books-single-line.xml), where all content is stored within a single line, with no indents, line breaks, horizontal tabs, or even extra whitespace. On the right side the resultant HTML file, generated by GroupDocs.Viewer, is shown. It has a structured view with correct line breaks and indents, highlighting of every XML entity, and recognized URIs and email addresses. + +![Generate HTML view for input XML](/viewer/java/images/rendering-basics/render-xml-documents/XML-to-HTML.png) + +## Fix incorrect XML structure + +The World Wide Web Consortium has clearly defined what a well-formed XML document is and what it is not. The term “[well-formed document](https://developer.mozilla.org/en-US/docs/Web/XML/XML_introduction#correct_xml_valid_and_well-formed)” denotes an XML document that "adheres to the syntax rules specified by the XML 1.0 specification in that it must satisfy both physical and logical structures" ([ref](http://www.csdservices.com/articles/csdservices)). In particular, a well-formed XML document must contain only valid characters, its start tags and end tags must match and be correctly opened and closed, elements must be properly nested, and so on. Unfortunately, not all existing XML documents are well-formed, and sometimes it is still necessary to view them. Many XML markup viewers are unable to properly show malformed XML documents. + +GroupDocs.Viewer with its new XML processing module is able to correctly parse, process, format, highlight, and view even heavily distorted XML documents. 
Here is a partial list of the kinds of damage in the XML structure that GroupDocs.Viewer can fix and process: + +- Invalid and illegal characters, including `<` and `&` characters in wrong places. +- Start tags that are never closed. +- End tags without a matching start tag. +- Interleaved (overlapped) tags. +- Start and end tags with unmatched letter cases. +- Truncated (cut from the end) markup. +- Attribute names without values. +- Unquoted attribute values (without enclosing quotes). +- Attribute values that are quoted only partially, with only an opening or only a closing quote. +- Attribute values that have a redundant quote inside. + +GroupDocs.Viewer detects and fixes all these and some other issues in the XML markup and also writes them to the log. + +The screenshot below demonstrates this in action. [Sample XML file "InvalidXml.xml"](/viewer/java/sample-files/render-xml-documents/InvalidXml.xml) contains all the kinds of damage described above. It cannot be correctly formatted and highlighted by most popular XML viewing and editing applications. But with the new XML processing module GroupDocs.Viewer fixes its structure and displays it correctly. + +![Generate HTML view for input XML](/viewer/java/images/rendering-basics/render-xml-documents/XML-fixed.png) + +## Recognition of URIs and email addresses + +While processing the XML markup, GroupDocs.Viewer scans the XML content for valid URIs and, if any are found, represents them as external links in the resultant HTML by using the [A element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a). GroupDocs.Viewer searches for URIs in text nodes, CDATA sections, XML comments, attribute values, and DocType definitions. + +As for email addresses, GroupDocs.Viewer searches for them only in attribute values and, if any are found, represents them with the [mailto](https://en.wikipedia.org/wiki/Mailto) scheme and the [A element](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a). 
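
To make the idea of link recognition more concrete, here is a minimal, standalone sketch in plain Java. It is only an illustration and not the algorithm GroupDocs.Viewer uses internally, which this article does not describe. The class `LinkifySketch`, its methods, and both regular expressions are hypothetical and deliberately simplistic.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT the GroupDocs.Viewer implementation.
final class LinkifySketch {
    // Simplistic assumption: a URI is "http(s)://" followed by characters
    // that are not whitespace, angle brackets, or quotes.
    private static final Pattern URI = Pattern.compile("https?://[^\\s<>\"']+");
    // Simplistic assumption about what an email address looks like.
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Escapes characters that have a special meaning in HTML.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wraps every URI found in the text into an A element; other text is only escaped.
    static String linkify(String text) {
        Matcher m = URI.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(text.substring(last, m.start())));
            String uri = escapeHtml(m.group());
            out.append("<a href=\"").append(uri).append("\">").append(uri).append("</a>");
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }

    // Turns an attribute value into a mailto link if the whole value looks like an email.
    static String linkifyEmail(String attributeValue) {
        String escaped = escapeHtml(attributeValue);
        if (EMAIL.matcher(attributeValue).matches()) {
            return "<a href=\"mailto:" + escaped + "\">" + escaped + "</a>";
        }
        return escaped;
    }

    public static void main(String[] args) {
        System.out.println(linkify("see https://example.com/a?b=1&c=2 now"));
        System.out.println(linkifyEmail("xxxyyy@live.com"));
    }
}
```

A real implementation would need a much stricter URI grammar and would have to deal with trailing punctuation, which this sketch ignores.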
+ +If the XML document is saved not to the HTML format, but to the PDF, the URIs and email addresses will be interactive too. But if the output format is PNG or JPEG, the output will be a raster image without any interactive links, of course. + +## Saving to HTML format + +For saving the documents to the HTML format the GroupDocs.Viewer provides a [`HtmlViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/) class. There are two ways of creating an instance of this class: using either [`forExternalResources`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/#forExternalResources--) or [`forEmbeddedResources`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/#forEmbeddedResources--) static methods. First method is designed for saving HTML document in a such way, that all its resources (stylesheets, images, fonts etc) are stored separately, while second method stores all resources of the HTML document inside its content: stylesheets are saved inside the STYLE elements, SVG graphics is inlined inside HTML markup, while all other resources (mostly raster images and fonts) are stored according to the [data URI scheme](https://en.wikipedia.org/wiki/Data_URI_scheme) and converted to the [base64](https://en.wikipedia.org/wiki/Base64) format. + +But in the context of the XML documents the way of creating the [`HtmlViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/) instance is not important, because the XML documents cannot have resources, which may be stored externally or embedded. So, when saving XML documents to the HTML, you can create the [`HtmlViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/) instance in both ways — the result will be the same, no external resources will be produced. 
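
For readers unfamiliar with the data URI scheme mentioned above for embedded resources, the following standalone sketch shows the general shape of such a URI. It uses only the Java standard library; the class `DataUriSketch` and its method are hypothetical and do not reflect how GroupDocs.Viewer produces its output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of the data URI scheme; not GroupDocs.Viewer code.
final class DataUriSketch {
    // Builds "data:<mime type>;base64,<payload>" from raw resource bytes.
    static String toDataUri(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] text = "Hi".getBytes(StandardCharsets.US_ASCII);
        // The resulting string can be used directly as the value of an src or href attribute.
        System.out.println(toDataUri("text/plain", text));
    }
}
```

Because the whole resource travels inside the HTML markup, no separate file has to be written next to the page, which is the trade-off the embedded-resources mode makes in exchange for a larger HTML document.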
+ +Another important point is that the XML format by its nature has no pages: it is a hierarchical structure in which elements are nested inside one another, and it has nothing similar to pages. So the best way to represent it in HTML format is to generate a single-page HTML document that contains all the XML content. To do this, the option [`htmlViewOptions.setRenderToSinglePage(...)`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/#setRenderToSinglePage-boolean-) needs to be set to `true`. By default, this option is `false`, so the output HTML document will be paginated, i.e. split into multiple chunks. + +The code example below shows rendering of an input XML file to HTML in both ways: + +{{< tabs "Saving to HTML example">}} +{{< tab "Java" >}} +```java +HtmlViewOptions paginatedHtmlOptions = HtmlViewOptions.forEmbeddedResources("page-{0}.html"); +HtmlViewOptions singleHtmlOptions = HtmlViewOptions.forEmbeddedResources("single-page.html"); +singleHtmlOptions.setRenderToSinglePage(true); + +String inputXmlDocument = "Sample.xml"; + +try (Viewer viewer = new Viewer(inputXmlDocument)) { + viewer.view(paginatedHtmlOptions); + viewer.view(singleHtmlOptions); +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +val paginatedHtmlOptions = HtmlViewOptions.forEmbeddedResources("page-{0}.html") +val singleHtmlOptions = HtmlViewOptions.forEmbeddedResources("single-page.html").apply { + isRenderToSinglePage = true +} + +val inputXmlDocument = "Sample.xml" +Viewer(inputXmlDocument).use { viewer -> + viewer.view(paginatedHtmlOptions) + viewer.view(singleHtmlOptions) +} +``` +{{< /tab >}} +{{< /tabs >}} + +All other options, which are present in the [`HtmlViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/) class, have no effect when saving XML to HTML, except the 
[`setRenderToSinglePage(...)`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/htmlviewoptions/#setRenderToSinglePage-boolean-) flag. + +## Saving to PDF format + +PDF format by its nature has pages, so if the XML content because of its big size cannot fit in the single PDF page, then it will be paginated. Unlike the HTML, PNG, or JPEG, the GroupDocs.Viewer generates only a single PDF file for a single input XML document, with one or more pages. [`PdfViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/pdfviewoptions/) class is responsible for saving XML to the PDF, and example below shows this: + +{{< tabs "Saving to PDF example">}} +{{< tab "Java" >}} +```java +PdfViewOptions pdfOptions = new PdfViewOptions("output.pdf"); +String inputXmlDocument = "Sample.xml"; +try (Viewer viewer = new Viewer(inputXmlDocument)) { + viewer.view(pdfOptions); +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +val pdfOptions = PdfViewOptions("output.pdf") +val inputXmlDocument = "Sample.xml" +Viewer(inputXmlDocument).use { viewer -> + viewer.view(pdfOptions) +} +``` +{{< /tab >}} +{{< /tabs >}} + +As for the version 24.12 all options, which are present in the [`PdfViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/pdfviewoptions/) class, have no effect when saving XML to PDF. + +## Saving to raster PNG and JPEG formats + +[`PngViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/pngviewoptions/) and [`JpgViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/jpgviewoptions/) are responsible for saving XML to the PNG and JPEG raster image formats. Like for the PDF, if XML content cannot fit into the area of one image, it will be paginated and spread across multiple images. 
+ +The size of the output images is calculated automatically based on the XML content. As of version 24.12 it is not possible to set the size explicitly, and the `setWidth(...)`, `setHeight(...)`, `setMaxWidth(...)`, and `setMaxHeight(...)` properties of the [`PngViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/pngviewoptions/) and [`JpgViewOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/jpgviewoptions/) classes have no effect when saving XML to PNG or JPEG. + +The quality of the output JPEG image can be set with the [`jpgViewOptions.setQuality(...)`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/jpgviewoptions/#setQuality-byte-) instance property, which has a default value of `90`. + +The example below shows saving an input XML document to PNG and JPEG: + +{{< tabs "Saving to PNG and JPEG example">}} +{{< tab "Java" >}} +```java +PngViewOptions pngOptions = new PngViewOptions("page-{0}.png"); +JpgViewOptions jpegOptions = new JpgViewOptions("page-{0}.jpeg"); +jpegOptions.setQuality((byte) 80); // setting output JPEG quality explicitly + +String inputXmlDocument = "Sample.xml"; +try (Viewer viewer = new Viewer(inputXmlDocument)) { + viewer.view(pngOptions); + viewer.view(jpegOptions); +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +val pngOptions = PngViewOptions("page-{0}.png") +val jpegOptions = JpgViewOptions("page-{0}.jpeg").apply { + quality = 80 +} + +val inputXmlDocument = "Sample.xml" +Viewer(inputXmlDocument).use { viewer -> + viewer.view(pngOptions) + viewer.view(jpegOptions) +} +``` +{{< /tab >}} +{{< /tabs >}} + +## Retrieving information about XML view + +As for all other supported formats, GroupDocs.Viewer can return information about a specific XML document. 
Like for all other formats, for doing this you need to call the [`getViewInfo(...)`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer/viewer/#getViewInfo-com.groupdocs.viewer.options.ViewInfoOptions-) instance method of the [`Viewer`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer/viewer/) class, which returns an instance of [`ViewInfo`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.results/viewinfo/) class. This [`ViewInfo`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.results/viewinfo/) instance contains all information about the view depending on [`ViewInfoOptions`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.options/viewinfooptions/), passed to the [`getViewInfo(...)`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer/viewer/#getViewInfo-com.groupdocs.viewer.options.ViewInfoOptions-) method. + +Example below shows obtaining [`ViewInfo`](https://reference.groupdocs.com/viewer/java/com.groupdocs.viewer.results/viewinfo/) for a single XML document for HTML, PDF, and PNG formats. 
+ +{{< tabs "Retrieving information about XML view example">}} +{{< tab "Java" >}} +```java +ViewInfoOptions viewInfoOptionsHtmlSingle = ViewInfoOptions.forHtmlView(true); +ViewInfoOptions viewInfoOptionsPdf = ViewInfoOptions.forPdfView(); +ViewInfoOptions viewInfoOptionsPng = ViewInfoOptions.forPngView(); + +String inputXmlDocument = "Sample.xml"; + +try (Viewer viewer = new Viewer(inputXmlDocument)) { + ViewInfo resultHtmlSingle = viewer.getViewInfo(viewInfoOptionsHtmlSingle); + ViewInfo resultPdf = viewer.getViewInfo(viewInfoOptionsPdf); + ViewInfo resultPng = viewer.getViewInfo(viewInfoOptionsPng); +} +``` +{{< /tab >}} +{{< tab "Kotlin">}} +```kotlin +val viewInfoOptionsHtmlSingle = ViewInfoOptions.forHtmlView(true) +val viewInfoOptionsPdf = ViewInfoOptions.forPdfView() +val viewInfoOptionsPng = ViewInfoOptions.forPngView() + +val inputXmlDocument = "Sample.xml" + +Viewer(inputXmlDocument).use { viewer -> + val resultHtmlSingle = viewer.getViewInfo(viewInfoOptionsHtmlSingle) + val resultPdf = viewer.getViewInfo(viewInfoOptionsPdf) + val resultPng = viewer.getViewInfo(viewInfoOptionsPng) +} +``` +{{< /tab >}} +{{< /tabs >}} + +## Conclusion + +Before the release of the [version 24.12](https://releases.groupdocs.com/viewer/java/release-notes/2024/groupdocs-viewer-for-java-24-12-release-notes/#new-xml-converter) the XML format was supported by the GroupDocs.Viewer, but XML files were treated as the plain text, without any XML-specific features like structure formatting, highlighting, proper pagination, and so on. + +Starting from the version 24.12, the new dedicated XML processing module makes XML support to be a truly powerful and useful feature, and ability to fix and display even the heavily corrupted XML documents allows to use the GroupDocs.Viewer for viewing XML documents in those cases, when all other competitors failed. 
+ + + + + + + + + + + diff --git a/java/sample-files/developer-guide/rendering-documents/Sample1.pdf b/java/sample-files/developer-guide/rendering-documents/Sample1.pdf new file mode 100644 index 0000000..e142845 Binary files /dev/null and b/java/sample-files/developer-guide/rendering-documents/Sample1.pdf differ diff --git a/java/sample-files/developer-guide/rendering-documents/Sample2.pdf b/java/sample-files/developer-guide/rendering-documents/Sample2.pdf new file mode 100644 index 0000000..3429599 Binary files /dev/null and b/java/sample-files/developer-guide/rendering-documents/Sample2.pdf differ diff --git a/java/sample-files/render-xml-documents/InvalidXml.xml b/java/sample-files/render-xml-documents/InvalidXml.xml new file mode 100644 index 0000000..9186d8e --- /dev/null +++ b/java/sample-files/render-xml-documents/InvalidXml.xml @@ -0,0 +1 @@ +xxxyyy@live.comhttps://www.youtube.com/watch?v=euf-GKJV2S8&t=80sTrueTrue10624/http://localhost:5879/FalseFalseFalseJohn Smith]]>
B is openedI is opened inside BB is closed after opened II is closed after closing B
A tag is opened. B tag is opened. C tag is opened. B tag is closed after C was opened. D tag is opened. A tag is closed after D was opened. D tag was closed after closed A. C tag was closed after closed D.
Play with attributesSome Invalid chars: " (quote) ' (apostrophe) & (ampersand)lower casemixed capitalizationIn XML, in counterpart to HTML, names in the start (opening) and end (closing) tags must match in case-sensitive way, so this example, where start tag name is "TEXT", and end tag name is "text", shows invalid XML markup.Truncated after this!!!!!!!!! \ No newline at end of file diff --git a/java/sample-files/render-xml-documents/books-single-line.xml b/java/sample-files/render-xml-documents/books-single-line.xml new file mode 100644 index 0000000..c4c39b0 --- /dev/null +++ b/java/sample-files/render-xml-documents/books-single-line.xml @@ -0,0 +1 @@ +Gambardella, Matthewhttps://www.matthewgambardella.com/XML Developer's GuideComputer44.952000-10-01An in-depth look at creating applications with XML.Ralls, KimMidnight RainFantasy5.952000-12-16A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.Corets, EvaMaeve AscendantFantasy5.952000-11-17After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.Corets, EvaEva]]>Oberon's LegacyFantasy5.952001-03-10In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.Corets, EvaThe Sundered GrailFantasy5.952001-09-10The two daughters of Maeve, half-sisters, battle one another for control of England. 
Sequel to Oberon's Legacy.Randall, CynthiaLover BirdsRomance4.952000-09-02When Carla meets Paul at an ornithology conference, tempers fly as feathers get ruffled.Thurman, PaulaSplish SplashRomance4.952000-11-02A deep sea diver finds true love twenty thousand leagues beneath the sea.Knorr, StefanCreepy CrawliesHorror4.952000-12-06An anthology of horror stories about roaches,centipedes, scorpions and other insects.Kress, PeterParadox LostScience Fiction6.952000-11-02After an inadvertant trip through a HeisenbergUncertainty Device, James Salway discovers the problems of being quantum.O'Brien, TimMicrosoft .NET: The Programming BibleComputer36.952000-12-09Microsoft's .NET initiative is explored in detail in this deep programmer's reference.O'Brien, TimMSXML3: A Comprehensive GuideComputer36.952000-12-01The Microsoft MSXML3 parser is covered in detail, with attention to XML DOM interfaces, XSLT processing, SAX and more.Galos, MikeVisual Studio 7: A Comprehensive GuideComputer49.952001-04-16Microsoft Visual Studio 7 is explored in depth,looking at how Visual Basic, Visual C++, C#, and ASP+ are integrated into a comprehensive development environment. \ No newline at end of file