Package org.docx4j.fonts.fop.util
Class CharUtilities
java.lang.Object
org.docx4j.fonts.fop.util.CharUtilities
public class CharUtilities
extends java.lang.Object
This class provides utilities to distinguish various kinds of Unicode
whitespace and to get character widths in a given FontState.
-
Field Summary
Fields
Modifier and Type | Field | Description
static char | CARRIAGE_RETURN | carriage return
static char | CODE_EOT | Character code used to signal a character boundary in inline content, such as an inline with borders and padding or a nested block object.
static int | EOT | Character class: Boundary between text runs
static char | IDEOGRAPHIC_SPACE | Ideographic space
static char | LINE_SEPARATOR | line-separator
static int | LINEFEED | Character class: Line feed
static char | LINEFEED_CHAR | linefeed character
static char | LRE | left-to-right embedding
static char | LRM | left-to-right mark
static char | LRO | left-to-right override
static char | MISSING_IDEOGRAPH | missing ideograph
static char | NBSPACE | non-breaking space
static char | NEXT_LINE | next line control character
static int | NONWHITESPACE | Character class: non-whitespace
static char | NOT_A_CHARACTER | Unicode value indicating the character is "not a character".
static char | NULL_CHAR | null char
static char | OBJECT_REPLACEMENT_CHARACTER | Object replacement character
static char | PARAGRAPH_SEPARATOR | paragraph-separator
static char | PDF | pop directional formatting
static char | RLE | right-to-left embedding
static char | RLM | right-to-left mark
static char | RLO | right-to-left override
static char | SOFT_HYPHEN | soft hyphen
static char | SPACE | normal space
static char | TAB | normal tab
static int | UCWHITESPACE | Character class: Unicode white space
static char | WORD_JOINER | word joiner
static int | XMLWHITESPACE | Character class: XML whitespace
static char | ZERO_WIDTH_JOINER | zero-width joiner
static char | ZERO_WIDTH_NOBREAK_SPACE | zero-width no-break space (= byte order mark)
static char | ZERO_WIDTH_SPACE | zero-width space
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | CharUtilities() | Utility class: the constructor is protected to prevent instantiation other than by subclasses.
-
Method Summary
Modifier and Type | Method | Description
static java.lang.String | charToNCRef(int c) | Convert a single Unicode scalar value to an XML numeric character reference.
static int | classOf(int c) | Return the appropriate CharClass constant for the type of the passed character.
static java.lang.Iterable<java.lang.Integer> | codepointsIter(java.lang.CharSequence s) | Creates an iterator over the code points of a CharSequence.
static java.lang.Iterable<java.lang.Integer> | codepointsIter(java.lang.CharSequence s, int beginIndex, int endIndex) | Creates an iterator over the code points of a sub-CharSequence.
static boolean | containsSurrogatePairAt(java.lang.CharSequence chars, int index) | Tells whether there is a surrogate pair starting at the given index in the CharSequence.
static java.lang.String | format(int c) | Format a character for debugging output: prefixed with "0x", padded left with '0', and either 4 or 6 hex characters wide according to whether it is in the BMP or not.
static int | incrementIfNonBMP(int codePoint) | Returns 1 if codePoint is not in the BMP.
static boolean | isAdjustableSpace(int c) | Method to determine if the character is an adjustable space.
static boolean | isAlphabetic(int c) | Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
static boolean | isAnySpace(int c) | Determines if the character represents any kind of space.
static boolean | isBmpCodePoint(int codePoint) | Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP).
static boolean | isBreakableSpace(int c) | Helper method to determine if the character is a space with normal behavior.
static boolean | isExplicitBreak(int c) | Indicates whether the given character is an explicit break character.
static boolean | isFixedWidthSpace(int c) | Method to determine if the character is a (breakable) fixed-width space.
static boolean | isNonBreakableSpace(int c) | Method to determine if the character is a nonbreaking space.
static boolean | isSameSequence(java.lang.CharSequence cs1, java.lang.CharSequence cs2) | Determine if two character sequences contain the same characters.
static boolean | isSurrogatePair(char ch) | Determine if the given character is part of a surrogate pair.
static boolean | isZeroWidthSpace(int c) | Method to determine if the character is a zero-width space.
static java.lang.String | padLeft(java.lang.String s, int width, char pad) | Pad a string s on the left out to width using the padding character pad.
static java.lang.String | toNCRefs(java.lang.String s) | Convert a string to a sequence of ASCII or XML numeric character references.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Field Details
-
CODE_EOT
public static final char CODE_EOT
Character code used to signal a character boundary in inline content, such as an inline with borders and padding or a nested block object.
See Also:
- Constant Field Values
-
UCWHITESPACE
public static final int UCWHITESPACE
Character class: Unicode white space
See Also:
- Constant Field Values
-
LINEFEED
public static final int LINEFEED
Character class: Line feed
See Also:
- Constant Field Values
-
EOT
public static final int EOT
Character class: Boundary between text runs
See Also:
- Constant Field Values
-
NONWHITESPACE
public static final int NONWHITESPACE
Character class: non-whitespace
See Also:
- Constant Field Values
-
XMLWHITESPACE
public static final int XMLWHITESPACE
Character class: XML whitespace
See Also:
- Constant Field Values
-
NULL_CHAR
public static final char NULL_CHAR
null char
See Also:
- Constant Field Values
-
LINEFEED_CHAR
public static final char LINEFEED_CHAR
linefeed character
See Also:
- Constant Field Values
-
CARRIAGE_RETURN
public static final char CARRIAGE_RETURN
carriage return
See Also:
- Constant Field Values
-
TAB
public static final char TAB
normal tab
See Also:
- Constant Field Values
-
SPACE
public static final char SPACE
normal space
See Also:
- Constant Field Values
-
NBSPACE
public static final char NBSPACE
non-breaking space
See Also:
- Constant Field Values
-
NEXT_LINE
public static final char NEXT_LINE
next line control character
See Also:
- Constant Field Values
-
ZERO_WIDTH_SPACE
public static final char ZERO_WIDTH_SPACE
zero-width space
See Also:
- Constant Field Values
-
WORD_JOINER
public static final char WORD_JOINER
word joiner
See Also:
- Constant Field Values
-
ZERO_WIDTH_JOINER
public static final char ZERO_WIDTH_JOINER
zero-width joiner
See Also:
- Constant Field Values
-
LRM
public static final char LRM
left-to-right mark
See Also:
- Constant Field Values
-
RLM
public static final char RLM
right-to-left mark
See Also:
- Constant Field Values
-
LRE
public static final char LRE
left-to-right embedding
See Also:
- Constant Field Values
-
RLE
public static final char RLE
right-to-left embedding
See Also:
- Constant Field Values
-
PDF
public static final char PDF
pop directional formatting
See Also:
- Constant Field Values
-
LRO
public static final char LRO
left-to-right override
See Also:
- Constant Field Values
-
RLO
public static final char RLO
right-to-left override
See Also:
- Constant Field Values
-
ZERO_WIDTH_NOBREAK_SPACE
public static final char ZERO_WIDTH_NOBREAK_SPACE
zero-width no-break space (= byte order mark)
See Also:
- Constant Field Values
-
SOFT_HYPHEN
public static final char SOFT_HYPHEN
soft hyphen
See Also:
- Constant Field Values
-
LINE_SEPARATOR
public static final char LINE_SEPARATOR
line-separator
See Also:
- Constant Field Values
-
PARAGRAPH_SEPARATOR
public static final char PARAGRAPH_SEPARATOR
paragraph-separator
See Also:
- Constant Field Values
-
MISSING_IDEOGRAPH
public static final char MISSING_IDEOGRAPH
missing ideograph
See Also:
- Constant Field Values
-
IDEOGRAPHIC_SPACE
public static final char IDEOGRAPHIC_SPACE
Ideographic space
See Also:
- Constant Field Values
-
OBJECT_REPLACEMENT_CHARACTER
public static final char OBJECT_REPLACEMENT_CHARACTER
Object replacement character
See Also:
- Constant Field Values
-
NOT_A_CHARACTER
public static final char NOT_A_CHARACTER
Unicode value indicating the character is "not a character".
See Also:
- Constant Field Values
-
-
Constructor Details
-
CharUtilities
protected CharUtilities()
Utility class: the constructor is protected to prevent instantiation other than by subclasses.
-
-
Method Details
-
classOf
public static int classOf(int c)
Return the appropriate CharClass constant for the type of the passed character.
Parameters:
c - character to inspect
Returns:
- the determined character class
-
isBreakableSpace
public static boolean isBreakableSpace(int c)
Helper method to determine if the character is a space with normal behavior. Normal behavior means that the space is not non-breaking.
Parameters:
c - character to inspect
Returns:
- true if the character is a normal space
-
isZeroWidthSpace
public static boolean isZeroWidthSpace(int c)
Method to determine if the character is a zero-width space.
Parameters:
c - the character to check
Returns:
- true if the character is a zero-width space
-
isFixedWidthSpace
public static boolean isFixedWidthSpace(int c)
Method to determine if the character is a (breakable) fixed-width space.
Parameters:
c - the character to check
Returns:
- true if the character has a fixed width
-
isNonBreakableSpace
public static boolean isNonBreakableSpace(int c)
Method to determine if the character is a nonbreaking space.
Parameters:
c - character to check
Returns:
- true if the character is a non-breaking space
-
isAdjustableSpace
public static boolean isAdjustableSpace(int c)
Method to determine if the character is an adjustable space.
Parameters:
c - character to check
Returns:
- true if the character is adjustable
-
isAnySpace
public static boolean isAnySpace(int c)
Determines if the character represents any kind of space.
Parameters:
c - character to check
Returns:
- true if the character represents any kind of space
-
isAlphabetic
public static boolean isAlphabetic(int c)
Indicates whether a character is classified as "Alphabetic" by the Unicode standard.
Parameters:
c - the character
Returns:
- true if the character is "Alphabetic"
-
isExplicitBreak
public static boolean isExplicitBreak(int c)
Indicates whether the given character is an explicit break character.
Parameters:
c - the character to check
Returns:
- true if the character represents an explicit break
-
charToNCRef
public static java.lang.String charToNCRef(int c)
Convert a single Unicode scalar value to an XML numeric character reference. If the value is in the BMP, four digits are used; otherwise six digits are used.
Parameters:
c - a Unicode scalar value
Returns:
- a string representing a numeric character reference
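The digit rule above can be illustrated with a minimal sketch written against the JDK only; it is not taken from the docx4j source. It assumes the reference uses the hexadecimal form `&#x...;` with uppercase digits, which this page does not state, and the class name `NcrSketch` is hypothetical.

```java
// Sketch only: mimics the documented digit rule with plain JDK calls.
// Assumption: hexadecimal references of the form &#xHHHH; with uppercase digits.
class NcrSketch {

    /** Builds a numeric character reference: 4 hex digits in the BMP, 6 otherwise. */
    static String charToNCRef(int c) {
        int digits = (c > 0xFFFF) ? 6 : 4;
        // %0<n>X left-pads the hexadecimal value with zeros to n digits
        return String.format("&#x%0" + digits + "X;", c);
    }

    public static void main(String[] args) {
        System.out.println(charToNCRef(0x00E9));   // prints &#x00E9;
        System.out.println(charToNCRef(0x1F600));  // prints &#x01F600;
    }
}
```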
-
toNCRefs
public static java.lang.String toNCRefs(java.lang.String s)
Convert a string to a sequence of ASCII or XML numeric character references.
Parameters:
s - a Java string (encoded in UTF-16)
Returns:
- a string representing a sequence of numeric character references or ASCII characters
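A rough sketch of the conversion described above, using only the JDK. Several details are assumptions rather than documented behaviour: printable ASCII (0x20 to 0x7E) is copied unchanged, everything else becomes a hexadecimal reference, and the string is walked by code point rather than by UTF-16 unit. The class name `NcrStringSketch` is hypothetical.

```java
// Sketch only: not the library implementation.
// Assumptions: printable ASCII passes through; other code points become &#xHHHH; references.
class NcrStringSketch {

    static String toNCRefs(String s) {
        StringBuilder sb = new StringBuilder();
        // codePoints() decodes each surrogate pair into a single code point
        s.codePoints().forEach(cp -> {
            if (cp >= 0x20 && cp < 0x7F) {
                sb.append((char) cp);                // printable ASCII is copied as-is
            } else {
                int digits = (cp > 0xFFFF) ? 6 : 4;  // 4 digits in the BMP, 6 otherwise
                sb.append(String.format("&#x%0" + digits + "X;", cp));
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNCRefs("caf\u00E9"));  // prints caf&#x00E9;
    }
}
```

This sketch does not escape `<`, `>` or `&`; whether the library method does so is not stated on this page.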
-
padLeft
public static java.lang.String padLeft(java.lang.String s, int width, char pad)
Pad a string s on the left out to width using the padding character pad.
Parameters:
s - string to pad
width - width of the field to pad to
pad - character to use for padding
Returns:
- padded string
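Left padding as described above can be sketched in a few lines. This is an illustration, not the library code; it assumes a string already at least `width` characters long is returned unchanged, and the class name `PadSketch` is hypothetical.

```java
// Sketch only. Assumption: input that is already wide enough is returned unchanged.
class PadSketch {

    static String padLeft(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder();
        // add one pad character for every missing position
        for (int i = s.length(); i < width; i++) {
            sb.append(pad);
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(padLeft("7f", 4, '0'));  // prints 007f
    }
}
```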
-
format
public static java.lang.String format(int c)
Format a character for debugging output. The result is prefixed with "0x", padded left with '0', and either 4 or 6 hex characters wide according to whether the character is in the BMP or not.
Parameters:
c - character code
Returns:
- formatted character string
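The formatting rule above, sketched with JDK calls only. The case of the hex digits is not specified on this page; lowercase is assumed here, and the class name `FormatSketch` is hypothetical.

```java
// Sketch only. Assumption: lowercase hexadecimal digits.
class FormatSketch {

    static String format(int c) {
        int width = (c <= 0xFFFF) ? 4 : 6;    // 4 hex digits in the BMP, 6 otherwise
        String hex = Integer.toHexString(c);  // lowercase, without leading zeros
        StringBuilder sb = new StringBuilder("0x");
        for (int i = hex.length(); i < width; i++) {
            sb.append('0');                   // left-pad with zeros
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        System.out.println(format('A'));      // prints 0x0041
        System.out.println(format(0x1F600));  // prints 0x01f600
    }
}
```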
-
isSameSequence
public static boolean isSameSequence(java.lang.CharSequence cs1, java.lang.CharSequence cs2)
Determine if two character sequences contain the same characters.
Parameters:
cs1 - first character sequence
cs2 - second character sequence
Returns:
- true if both sequences have the same length and the same characters in the same order
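A content comparison of this kind is useful because `equals` on different `CharSequence` implementations does not compare characters. The following is a minimal sketch of the documented contract, not the library code; `SequenceSketch` is a hypothetical name.

```java
// Sketch only: element-wise comparison of two CharSequence instances.
class SequenceSketch {

    static boolean isSameSequence(CharSequence cs1, CharSequence cs2) {
        if (cs1.length() != cs2.length()) {
            return false;                       // different lengths can never match
        }
        for (int i = 0; i < cs1.length(); i++) {
            if (cs1.charAt(i) != cs2.charAt(i)) {
                return false;                   // first differing UTF-16 unit
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharSequence a = "abc";
        CharSequence b = new StringBuilder("abc");
        System.out.println(a.equals(b));           // prints false: String vs StringBuilder
        System.out.println(isSameSequence(a, b));  // prints true
    }
}
```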
-
isBmpCodePoint
public static boolean isBmpCodePoint(int codePoint)
Determine whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). Such code points can be represented using a single char.
Parameters:
codePoint - the character (Unicode code point) to be tested
Returns:
- true if the specified code point is between Character.MIN_VALUE and Character.MAX_VALUE inclusive; false otherwise
See Also:
- Character.isBmpCodePoint(int), from Java 1.7
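The range test can be written as a single shift, as in the sketch below. This is an illustration under the documented contract rather than the library source; `BmpSketch` is a hypothetical name. The JDK method `Character.isBmpCodePoint(int)` gives the same answer.

```java
// Sketch only: a code point is in the BMP exactly when its upper 16 bits are zero.
class BmpSketch {

    static boolean isBmpCodePoint(int codePoint) {
        return (codePoint >>> 16) == 0;   // true for 0x0000..0xFFFF only
    }

    public static void main(String[] args) {
        System.out.println(isBmpCodePoint(0xFFFF));                 // prints true
        System.out.println(isBmpCodePoint(0x10000));                // prints false
        System.out.println(Character.isBmpCodePoint(0x10000));      // prints false
    }
}
```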
-
incrementIfNonBMP
public static int incrementIfNonBMP(int codePoint)
Returns 1 if codePoint is not in the BMP. This function is particularly useful in for loops over strings where, in the presence of surrogate pairs, one extra index must be skipped.
Parameters:
codePoint - the code point to test
Returns:
- 1 if codePoint > 0xFFFF, 0 otherwise
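The loop pattern mentioned above might look like the following sketch. The helper reproduces the documented return values; `IncrementSketch` and `countCodePoints` are hypothetical names introduced for the example.

```java
// Sketch only: index-based loop that steps over the low surrogate of each pair.
class IncrementSketch {

    static int incrementIfNonBMP(int codePoint) {
        return codePoint > 0xFFFF ? 1 : 0;
    }

    /** Counts code points by visiting each one exactly once. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            int cp = s.codePointAt(i);     // decodes a surrogate pair starting at i
            i += incrementIfNonBMP(cp);    // skip the second half of the pair
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (two UTF-16 units), 'b'
        System.out.println(countCodePoints("a\uD83D\uDE00b"));  // prints 3
    }
}
```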
-
isSurrogatePair
public static boolean isSurrogatePair(char ch)
Determine if the given character is part of a surrogate pair.
Parameters:
ch - character to be checked
Returns:
- true if ch is a high surrogate or a low surrogate
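The documented return value corresponds to two JDK checks combined, as in this sketch (not the library source; `SurrogateSketch` is a hypothetical name). `Character.isSurrogate(char)` expresses the same test in one call.

```java
// Sketch only: a char is "part of a surrogate pair" if it is either half of one.
class SurrogateSketch {

    static boolean isSurrogatePair(char ch) {
        return Character.isHighSurrogate(ch) || Character.isLowSurrogate(ch);
    }

    public static void main(String[] args) {
        System.out.println(isSurrogatePair((char) 0xD83D));  // prints true (high surrogate)
        System.out.println(isSurrogatePair((char) 0xDE00));  // prints true (low surrogate)
        System.out.println(isSurrogatePair('a'));            // prints false
    }
}
```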
-
containsSurrogatePairAt
public static boolean containsSurrogatePairAt(java.lang.CharSequence chars, int index)
Tells whether there is a surrogate pair starting at the given index in the CharSequence. If the character at index is a high surrogate, the character at index+1 is checked to be a low surrogate. If a malformed surrogate pair is encountered, an IllegalArgumentException is thrown.
high surrogate: [0xD800 - 0xDBFF]
low surrogate: [0xDC00 - 0xDFFF]
Parameters:
chars - CharSequence to check
index - index in the CharSequence at which to start the check
Returns:
- true if there is a well-formed surrogate pair at index
Throws:
java.lang.IllegalArgumentException - if surrogate pairs are used incorrectly
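The check described above can be sketched as follows. This is an illustration, not the library code. It assumes that an isolated low surrogate at `index` also counts as malformed and throws, which this page does not spell out; `SurrogatePairSketch` is a hypothetical name.

```java
// Sketch only. Assumption: a lone low surrogate at index is treated as malformed.
class SurrogatePairSketch {

    static boolean containsSurrogatePairAt(CharSequence chars, int index) {
        char ch = chars.charAt(index);
        if (Character.isHighSurrogate(ch)) {
            // a high surrogate must be followed directly by a low surrogate
            if (index + 1 < chars.length()
                    && Character.isLowSurrogate(chars.charAt(index + 1))) {
                return true;
            }
            throw new IllegalArgumentException(
                    "High surrogate without low surrogate at index " + index);
        }
        if (Character.isLowSurrogate(ch)) {
            throw new IllegalArgumentException(
                    "Isolated low surrogate at index " + index);
        }
        return false;   // ordinary BMP character
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";
        System.out.println(containsSurrogatePairAt(s, 0));  // prints false
        System.out.println(containsSurrogatePairAt(s, 1));  // prints true
    }
}
```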
-
codepointsIter
public static java.lang.Iterable<java.lang.Integer> codepointsIter(java.lang.CharSequence s)
Creates an iterator over the code points of a CharSequence.
Parameters:
s - CharSequence to iterate over
Returns:
- code point iterator for the given CharSequence
See Also:
- codepointsIter(CharSequence, int, int)
-
codepointsIter
public static java.lang.Iterable<java.lang.Integer> codepointsIter(java.lang.CharSequence s, int beginIndex, int endIndex)
Creates an iterator over the code points of a sub-CharSequence.
Parameters:
s - CharSequence to iterate over
beginIndex - lower bound of the range
endIndex - upper bound of the range
Returns:
- code point iterator for the given sub-CharSequence
See Also:
- Bug JDK-5003547
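A code point `Iterable` of this shape can be sketched with JDK calls alone. The version below is an illustration, not the library implementation. It assumes `beginIndex` is inclusive and `endIndex` exclusive, following `CharSequence.subSequence`; `CodePointIterSketch` is a hypothetical name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch only. Assumption: [beginIndex, endIndex) follows subSequence semantics.
class CodePointIterSketch {

    static Iterable<Integer> codepointsIter(CharSequence s) {
        return codepointsIter(s, 0, s.length());
    }

    static Iterable<Integer> codepointsIter(CharSequence s, int beginIndex, int endIndex) {
        // subSequence validates the bounds and keeps decoding inside the range
        final CharSequence sub = s.subSequence(beginIndex, endIndex);
        return () -> new Iterator<Integer>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < sub.length();
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int cp = Character.codePointAt(sub, pos);
                pos += Character.charCount(cp);   // advance by 1 or 2 UTF-16 units
                return cp;
            }
        };
    }

    public static void main(String[] args) {
        for (int cp : codepointsIter("a\uD83D\uDE00b")) {
            System.out.println(Integer.toHexString(cp));  // prints 61, 1f600, 62
        }
    }
}
```

Returning an `Iterable` rather than an `Iterator` lets callers use the enhanced for loop, as `main` does.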
-