This Java document transformation engine is based on the XSL-FO description language.
Apache FOP supports XSL-FO version 1.1. However, a number of features described in XSL-FO 1.1 are not implemented by Apache FOP (see the Apache FOP standards compliance documentation for details).
XSL-FO language
XSL-FO describes a document as a set of formatting objects.
Each object has a meaning (chapter, page, paragraph, list, etc.) and behavioral and rendering properties (layout, character font, border, etc.).
The language is based on XML: a document is therefore written as a tree of XML tags.
The following resources provide more information (non-exhaustive list):
- http://w3schools.sinsixx.com/xslfo/default.asp.htm
- https://www.qctutorials.com/learning/xslfo/index.html
- https://www.youscribe.com/BookReader/Index/539403/?documentId=510526
- https://www.antennahouse.com/comprehensive-xsl-fo-tutorials-and-samples-collection/
- http://www.renderx.com/tutorial.html
Here is an example of an XSL-FO document:
<?xml version="1.0" encoding="utf-8"?> <!-- (1) -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> <!-- (2) -->
  <fo:layout-master-set> <!-- (3) -->
    <fo:simple-page-master master-name="A4_portrait" page-width="21cm" page-height="29.7cm">
      <fo:region-body region-name="PageBody" margin="7mm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="A4_portrait"> <!-- (4) -->
    <fo:flow flow-name="PageBody"> <!-- (5) -->
      <fo:block>Hello World</fo:block> <!-- (6) -->
    </fo:flow>
  </fo:page-sequence>
</fo:root>
- (1) XML prolog (declaration). When used in APE, an XSL-FO document must be encoded in UTF-8.
- (2) The <fo:root> tag is the root element of an XSL-FO document. It also declares the namespace for the XSL-FO language. All XSL-FO tags have the prefix "fo".
- (3) The <fo:layout-master-set> element contains one or more declarations of page masters and page sequence masters (elements which define the layouts of single pages and page sequences). This example defines a rudimentary page master (A4 size: 21 cm x 29.7 cm) with a single region. The region has a 7 mm margin on all sides.
- (4) The document pages are grouped into sequences, each declared by an <fo:page-sequence> element. Each sequence starts on a new page. The "master-reference" attribute selects a layout from the masters listed in <fo:layout-master-set>. Referencing a page master by name means that all the pages in this sequence are formatted using this page master.
- (5) The <fo:flow> element is the container for all user text in the document. Everything contained in the flow is formatted into regions on the pages generated by the page sequence. The flow name links the flow to a specific region on the page (defined in the page master). In the example, it is the body region, named "PageBody".
- (6) The <fo:block> element is comparable to the <div> tag in HTML (block definition) and normally contains a paragraph of text. The XSL-FO language does not allow text to be placed directly into a flow.
An XSL-FO document may contain both text and images (via an image inclusion tag such as <fo:external-graphic>).
The image formats supported by Apache FOP (JPEG, PNG, SVG, etc.) depend on the final output format.
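As an illustration of image inclusion, the fragment below places an image after some text inside a block, using the <fo:external-graphic> element. It is only a sketch: the file name logo.png and the 3cm width are hypothetical values, and the image format must be one that Apache FOP supports for the chosen output format.

```xml
<!-- Hypothetical fragment: place it inside an <fo:flow>, in a document
     whose <fo:root> declares the "fo" namespace prefix.
     logo.png is an assumed file name. -->
<fo:block>
  Company logo:
  <fo:external-graphic src="url('logo.png')" content-width="3cm"/>
</fo:block>
```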
Output formats
Apache FOP can transform an XSL-FO document into several output formats. The main formats are defined by their MIME (Multipurpose Internet Mail Extensions) type:
- application/pdf: "Portable Document Format". The supported version is 1.4. If the TransformDoc document is saved in a file, it is recommended that it has the .PDF extension,
- application/postscript: PostScript document format. The supported version is PostScript Level 3. If the TransformDoc document is saved in a file, it is recommended that it has the .PS extension,
- application/x-pcl: "Printer Command Language" document format. The supported version is PCL 5 (with HP GL/2 and PJL support). If the TransformDoc document is saved in a file, it is recommended that it has the .PCL extension,
- application/rtf: "Rich Text Format" document format. This format is no longer maintained by the Apache FOP 2.3 engine. If the TransformDoc document is saved in a file, it is recommended that it has the .RTF extension,
- text/plain: text document format using ASCII encoding. If the TransformDoc document is saved in a file, it is recommended that it has the .TXT extension,
- image/png: "Portable Network Graphics" image format. If the TransformDoc document is saved in a file, it is recommended that it has the .PNG extension,
- image/tiff: "Tagged Image File Format" image format. If the TransformDoc document is saved in a file, it is recommended that it has the .TIFF (or .TIF) extension.
Important
Support for image formats and font formats differs depending on the output format.
Character font management
Apache FOP can manage several fonts in an XSL-FO document.
The text inside a tag is associated with a font via the following attributes:
- font-family: describes the font family (case sensitive),
- font-size: describes the font size,
- font-style: describes the font style (normal, italic, etc.),
- font-weight: describes the font weight (normal, bold, etc.).
The XSL-FO 1.1 standard defines other attributes (such as font-stretch), but these are not implemented in the Apache FOP 2.3 engine.
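As an illustration of the four font attributes listed above, the following fragment sets them on a block. The values are only examples; "Helvetica" is chosen because it is one of the base fonts, and the font-family value must match the configured name exactly, since it is case sensitive.

```xml
<!-- Illustrative fragment: place it inside an <fo:flow>.
     The attribute values are example values only. -->
<fo:block font-family="Helvetica" font-size="12pt"
          font-style="italic" font-weight="bold">
  Bold italic text in 12 pt Helvetica
</fo:block>
```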
Using a font in an XSL-FO document requires the font to be known (and accessible) to Apache FOP.
For this, the font must:
- either be installed on the operating system,
- or be accessible in a directory known by Apache FOP.
Font management by Apache FOP is configured in the configuration file fop.xconf.
Font detection
By default, Apache FOP is configured to automatically load all the fonts installed in the operating system ("auto-detect" tag in the configuration file fop.xconf).
As this feature affects the engine's performance in high load situations (and depends on the number of fonts installed), it has been temporarily disabled in APE (and can be re-enabled in the fop.xconf file).
It is possible to use a font that is not installed on the operating system. In this case, you need to specify the directory that contains the font ("directory" tag in the fop.xconf file).
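The two tags mentioned above could appear in fop.xconf roughly as follows. This is a minimal sketch rather than the configuration shipped with APE: the directory path is a placeholder, and the fonts are declared here for the PDF renderer only, on the assumption that PDF is the target output format.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- Placeholder path: load the fonts found in this directory -->
        <directory>/path/to/fonts</directory>
        <!-- Uncomment to load all fonts installed on the operating system -->
        <!-- <auto-detect/> -->
      </fonts>
    </renderer>
  </renderers>
</fop>
```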
Embedding fonts in the document
By default, when a font is used in an XSL-FO document, it is embedded in the output document. This can affect both the production time and the size of the final document.
However, the main print formats (PDF, PostScript and PCL) require a certain set of fonts to be natively supported by viewing and printing devices.
These fonts ("base 14 fonts") do not need to be embedded in the document. They are as follows:
- Helvetica (normal, bold, italic, bold italic),
- Times (normal, bold, italic, bold italic),
- Courier (normal, bold, italic, bold italic),
- Symbol,
- ZapfDingbats.
Apache FOP associates the following font families with each base 14 font:
| Base font | Associated font families |
|---|---|
| Helvetica | Helvetica, sans-serif, SansSerif |
| Times | Times, Times Roman, Times-Roman, serif, any |
| Courier | Courier, monospace, Monospaced |
| Symbol | Symbol |
| ZapfDingbats | ZapfDingbats |
Please note that recent versions of Adobe Acrobat Reader replace "Helvetica" with "Arial" and "Times" with "Times New Roman".
GhostScript replaces "Helvetica" with "Nimbus Sans L" and "Times" with "Nimbus Roman No9 L". Other document readers may make similar font substitutions. If you need to ensure that no such substitution occurs, specify an explicit font and embed it in the output document.
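One possible way to declare such an explicit font is a <font> entry in fop.xconf, sketched below. The font name "MyFont" and the file path are hypothetical; the entry is assumed to sit inside the <fonts> element of the relevant renderer.

```xml
<!-- Hypothetical entry: place it inside the <fonts> element of a renderer
     in fop.xconf. "MyFont" and the path are placeholder values. -->
<font embed-url="file:///path/to/MyFont.ttf" kerning="yes">
  <font-triplet name="MyFont" style="normal" weight="normal"/>
</font>
```

The XSL-FO document would then refer to the font with font-family="MyFont", which must match the name given in the font-triplet entry.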
When Apache FOP does not have a specific font (because it is neither installed on the operating system nor configured in fop.xconf), the font is replaced by "any". "any" is internally mapped to the base 14 font "Times" (see above).
Each font contains a particular set of glyphs. If no glyph can be found for a given character, Apache FOP issues a warning in the logs and uses the glyph for "#" instead (if available).