public abstract class Collator
extends Object
implements Cloneable, Comparator<Object>
java.lang.Object
   ↳ java.text.Collator
Known Direct Subclasses
RuleBasedCollator

Class Overview

Performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

Following the Unicode Consortium's specification for the Unicode Collation Algorithm (UCA), comparisons use four levels of strength:

  • PRIMARY strength: Typically, this is used to denote differences between base characters (for example, "a" < "b"). It is the strongest difference. For example, dictionaries are divided into different sections by base character.
  • SECONDARY strength: Accents in the characters are considered secondary differences (for example, "as" < "às" < "at"). Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings.
  • TERTIARY strength: Upper and lower case differences in characters are distinguished at tertiary strength (for example, "ao" < "Ao" < "aò"). In addition, a variant of a letter differs from the base form on the tertiary strength (such as "A" and "Ⓐ"). Another example is the difference between large and small Kana. A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.
  • IDENTICAL strength: When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker: the Unicode code point values of the NFD form of each string are compared. For example, Hebrew cantillation marks are only distinguished at this strength. Use this strength sparingly, because strings that differ only in their code point values are extremely rare. Using it substantially slows both comparison and collation key generation, and it also increases the size of the collation key.
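
The way the strength levels mask or expose differences can be tried out with a short program. This is a minimal sketch, not part of the class itself: the class name StrengthLevels and the helper compareAt are made up for illustration, and canonical decomposition is switched on as an assumption so that the precomposed "à" is treated like "a" plus a combining accent.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthLevels {
    // Hypothetical helper: compares a and b with a US English collator at the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        // Assumption: decompose accented letters so precomposed and combining forms behave alike.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Accent difference: ignored at PRIMARY, visible from SECONDARY upward.
        System.out.println(compareAt(Collator.PRIMARY, "as", "\u00e0s") == 0);
        System.out.println(compareAt(Collator.SECONDARY, "as", "\u00e0s") != 0);
        // Case difference: ignored at SECONDARY, visible at TERTIARY.
        System.out.println(compareAt(Collator.SECONDARY, "ao", "Ao") == 0);
        System.out.println(compareAt(Collator.TERTIARY, "ao", "Ao") != 0);
        // A primary difference (s versus t) outranks the accent difference.
        System.out.println(compareAt(Collator.TERTIARY, "\u00e0s", "at") < 0);
    }
}
```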

This Collator deals only with two decomposition modes, the canonical decomposition mode and one that does not use any decomposition. The compatibility decomposition mode java.text.Collator.FULL_DECOMPOSITION is not supported here. If the canonical decomposition mode is set, Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.
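
When canonical decomposition is turned off, one way to meet the requirement above is to normalize the text yourself before comparing. The sketch below assumes java.text.Normalizer is available and uses a made-up helper named toNfd; it is an illustration of the idea, not the approach the collator uses internally.

```java
import java.text.Collator;
import java.text.Normalizer;

class PreNormalized {
    // Hypothetical helper: converts text to NFD before it is handed to the collator.
    static String toNfd(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e0\u0325"; // a-grave followed by combining ring below
        String decomposed = "a\u0325\u0300"; // a, combining ring below, combining grave

        Collator collator = Collator.getInstance();
        collator.setDecomposition(Collator.NO_DECOMPOSITION);

        // After NFD normalization both inputs are the same sequence of code points.
        System.out.println(toNfd(precomposed).equals(toNfd(decomposed)));
        System.out.println(collator.compare(toNfd(precomposed), toNfd(decomposed)) == 0);
    }
}
```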

Examples:

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }
 

The following example shows how to compare two strings using the collator for the default locale.

 // Compare two strings in the default locale
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("\u00e0\u0325 is not equal to a\u0325\u0300 without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: \u00e0\u0325 should be equal to a\u0325\u0300 with decomposition");
     } else {
         System.out.println("\u00e0\u0325 is equal to a\u0325\u0300 with decomposition");
     }
 } else {
     System.out.println("Error: \u00e0\u0325 should not be equal to a\u0325\u0300 without decomposition");
 }
 }
 

Summary

Constants
int CANONICAL_DECOMPOSITION Constant used to specify the decomposition rule.
int FULL_DECOMPOSITION Constant used to specify the decomposition rule.
int IDENTICAL Constant used to specify the collation strength.
int NO_DECOMPOSITION Constant used to specify the decomposition rule.
int PRIMARY Constant used to specify the collation strength.
int SECONDARY Constant used to specify the collation strength.
int TERTIARY Constant used to specify the collation strength.
Protected Constructors
Collator()
Constructs a new Collator instance.
Public Methods
Object clone()
Returns a new collator with the same decomposition mode and strength value as this collator.
abstract int compare(String string1, String string2)
Compares two strings to determine their relative order.
int compare(Object object1, Object object2)
Compares two objects to determine their relative order.
boolean equals(String string1, String string2)
Compares two strings using the collation rules to determine if they are equal.
boolean equals(Object object)
Compares this collator with the specified object and indicates if they are equal.
static Locale[] getAvailableLocales()
Gets the list of installed Locale objects which support Collator.
abstract CollationKey getCollationKey(String string)
Returns a CollationKey for the specified string for this collator with the current decomposition rule and strength value.
int getDecomposition()
Returns the decomposition rule for this collator.
static Collator getInstance()
Returns a Collator instance which is appropriate for the default Locale.
static Collator getInstance(Locale locale)
Returns a Collator instance which is appropriate for the specified Locale.
int getStrength()
Returns the strength value for this collator.
abstract int hashCode()
Returns an integer hash code for this collator.
void setDecomposition(int value)
Sets the decomposition rule for this collator.
void setStrength(int value)
Sets the strength value for this collator.
Inherited Methods
From class java.lang.Object
From interface java.util.Comparator

Constants

public static final int CANONICAL_DECOMPOSITION

Constant used to specify the decomposition rule.

Constant Value: 1 (0x00000001)

public static final int FULL_DECOMPOSITION

Constant used to specify the decomposition rule. This value for decomposition is not supported.

Constant Value: 2 (0x00000002)

public static final int IDENTICAL

Constant used to specify the collation strength.

Constant Value: 3 (0x00000003)

public static final int NO_DECOMPOSITION

Constant used to specify the decomposition rule.

Constant Value: 0 (0x00000000)

public static final int PRIMARY

Constant used to specify the collation strength.

Constant Value: 0 (0x00000000)

public static final int SECONDARY

Constant used to specify the collation strength.

Constant Value: 1 (0x00000001)

public static final int TERTIARY

Constant used to specify the collation strength.

Constant Value: 2 (0x00000002)

Protected Constructors

protected Collator ()

Constructs a new Collator instance.

Public Methods

public Object clone ()

Returns a new collator with the same decomposition mode and strength value as this collator.

Returns
  • a shallow copy of this collator.

public abstract int compare (String string1, String string2)

Compares two strings to determine their relative order.

Parameters
string1 the first string to compare.
string2 the second string to compare.
Returns
  • a negative value if string1 is less than string2, 0 if they are equal and a positive value if string1 is greater than string2.
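
Because Collator is a Comparator, it can be passed directly to a sort. The following is a minimal sketch; the class name CollatorSort and the helper sortForLocale are illustrative names, not part of the API.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSort {
    // Hypothetical helper: returns a copy of words sorted with the collator for locale.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("peach", "apple", "Banana");
        // Plain String ordering would put "Banana" first, because 'B' precedes 'a' by code point;
        // the collator orders by base letter first.
        System.out.println(sortForLocale(words, Locale.US));
    }
}
```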

public int compare (Object object1, Object object2)

Compares two objects to determine their relative order. The objects must be strings.

Parameters
object1 the first string to compare.
object2 the second string to compare.
Returns
  • a negative value if object1 is less than object2, 0 if they are equal, and a positive value if object1 is greater than object2.
Throws
ClassCastException if object1 or object2 is not a String.
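
If the arguments come from untyped data, the ClassCastException can be handled at the call site. This is a sketch under that assumption; compareOrNull is a made-up helper name.

```java
import java.text.Collator;
import java.util.Locale;

class ObjectCompare {
    // Hypothetical helper: returns the comparison result, or null when an argument is not a String.
    static Integer compareOrNull(Collator collator, Object left, Object right) {
        try {
            // Both arguments are typed as Object, so the compare(Object, Object) overload is chosen.
            return collator.compare(left, right);
        } catch (ClassCastException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(compareOrNull(collator, "a", "b"));
        System.out.println(compareOrNull(collator, "a", 42));
    }
}
```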

public boolean equals (String string1, String string2)

Compares two strings using the collation rules to determine if they are equal.

Parameters
string1 the first string to compare.
string2 the second string to compare.
Returns
  • true if string1 and string2 are equal using the collation rules, false otherwise.
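
The result of equals(String, String) depends on the collator's current strength. As an illustrative sketch (the helper name sameIgnoringCase is an assumption), a SECONDARY-strength collator can serve as a locale-aware, case-insensitive equality check:

```java
import java.text.Collator;
import java.util.Locale;

class CollatorEquals {
    // Hypothetical helper: equality that ignores case but still distinguishes base letters.
    static boolean sameIgnoringCase(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.SECONDARY);
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sameIgnoringCase("abc", "ABC"));
        System.out.println(sameIgnoringCase("abc", "abd"));
    }
}
```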

public boolean equals (Object object)

Compares this collator with the specified object and indicates if they are equal.

Parameters
object the object to compare with this object.
Returns
  • true if object is a Collator object and it has the same strength and decomposition values as this collator; false otherwise.
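
The interaction between clone() and equals(Object) can be sketched as follows. The helper withStrength is a made-up name; the example assumes that changing the strength of a copy leaves the original collator untouched.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorCopy {
    // Hypothetical helper: returns a copy of base that uses the given strength.
    static Collator withStrength(Collator base, int strength) {
        Collator copy = (Collator) base.clone();
        copy.setStrength(strength);
        return copy;
    }

    public static void main(String[] args) {
        Collator base = Collator.getInstance(Locale.US);
        Collator same = (Collator) base.clone();
        Collator primary = withStrength(base, Collator.PRIMARY);

        // An unmodified clone compares equal to its source.
        System.out.println(base.equals(same));
        // A copy with a different strength is only equal if base was already PRIMARY.
        System.out.println(base.equals(primary));
    }
}
```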

public static Locale[] getAvailableLocales ()

Gets the list of installed Locale objects which support Collator.

Returns
  • an array of Locale.
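
A possible use of this method is checking whether a particular locale is reported before requesting its collator. The sketch below uses a made-up helper named isListed; which locales appear depends on the runtime, so no particular result is assumed.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorLocales {
    // Hypothetical helper: true if the runtime reports collation support for locale.
    static boolean isListed(Locale locale) {
        return Arrays.asList(Collator.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        Locale[] locales = Collator.getAvailableLocales();
        System.out.println(locales.length + " locales reported");
        System.out.println("en-US listed: " + isListed(Locale.US));
    }
}
```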

public abstract CollationKey getCollationKey (String string)

Returns a CollationKey for the specified string for this collator with the current decomposition rule and strength value.

Parameters
string the source string that is converted into a collation key.
Returns
  • the collation key for string.
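
Collation keys are typically useful when the same strings are compared many times, for example while sorting a long list, because each key is computed once and then compared directly. A minimal sketch, with sortWithKeys as an illustrative helper name:

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class KeySort {
    // Hypothetical helper: sorts words by their collation keys and returns the source strings.
    static List<String> sortWithKeys(List<String> words, Collator collator) {
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word)); // computed once per string
        }
        Collections.sort(keys); // CollationKey is Comparable
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(sortWithKeys(Arrays.asList("peach", "apple", "Banana"), collator));
    }
}
```

Keys produced by different collators, or by the same collator after its strength or decomposition has changed, should not be compared with each other.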

public int getDecomposition ()

Returns the decomposition rule for this collator.

Returns
  • the decomposition rule, either NO_DECOMPOSITION or CANONICAL_DECOMPOSITION. FULL_DECOMPOSITION is not supported.

public static Collator getInstance ()

Returns a Collator instance which is appropriate for the default Locale.

Returns
  • the collator for the default locale.

public static Collator getInstance (Locale locale)

Returns a Collator instance which is appropriate for the specified Locale.

Parameters
locale the locale.
Returns
  • the collator for locale.
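
The ordering returned for the same pair of strings may differ between locales. The sketch below compares "ö" with "z" under a German and a Swedish collator; in Swedish, "ö" is conventionally placed after "z", but whether the runtime reflects this depends on the locale data it ships, so the output is not stated here. The helper compareIn is a made-up name, and canonical decomposition is enabled as an assumption.

```java
import java.text.Collator;
import java.util.Locale;

class LocaleOrder {
    // Hypothetical helper: compares a and b with the collator for the given locale.
    static int compareIn(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        Locale swedish = Locale.forLanguageTag("sv-SE");
        // Print only the sign of each comparison: -1, 0, or 1.
        System.out.println("de: " + Integer.signum(compareIn(Locale.GERMANY, "\u00f6", "z")));
        System.out.println("sv: " + Integer.signum(compareIn(swedish, "\u00f6", "z")));
    }
}
```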

public int getStrength ()

Returns the strength value for this collator.

Returns
  • the strength value, either PRIMARY, SECONDARY, TERTIARY or IDENTICAL.

public abstract int hashCode ()

Returns an integer hash code for this collator.

Returns
  • this collator's hash code.

public void setDecomposition (int value)

Sets the decomposition rule for this collator.

Parameters
value the decomposition rule, either NO_DECOMPOSITION or CANONICAL_DECOMPOSITION. FULL_DECOMPOSITION is not supported.
Throws
IllegalArgumentException if the provided decomposition rule is not valid. This includes FULL_DECOMPOSITION.
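
Code that takes the decomposition mode from configuration may want to guard against rejected values. A minimal sketch, with trySetDecomposition as a made-up helper; the value 99 stands in for any integer that is not one of the defined constants.

```java
import java.text.Collator;
import java.util.Locale;

class DecompositionCheck {
    // Hypothetical helper: returns true if the collator accepted the mode, false if it rejected it.
    static boolean trySetDecomposition(Collator collator, int mode) {
        try {
            collator.setDecomposition(mode);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(trySetDecomposition(collator, Collator.CANONICAL_DECOMPOSITION));
        System.out.println(trySetDecomposition(collator, 99)); // not a defined mode
        // Per the description above, FULL_DECOMPOSITION is rejected by this implementation;
        // other Java runtimes may accept it.
        System.out.println(trySetDecomposition(collator, Collator.FULL_DECOMPOSITION));
    }
}
```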

public void setStrength (int value)

Sets the strength value for this collator.

Parameters
value the strength value, either PRIMARY, SECONDARY, TERTIARY, or IDENTICAL.
Throws
IllegalArgumentException if the provided strength value is not valid.
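
When a shared collator needs a different strength for a single comparison, one option is to set the strength temporarily and restore it afterwards. This is a sketch of that pattern, not a recommendation from the API itself; the helper name compareAt is an assumption, and the code is not thread-safe.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthScope {
    // Hypothetical helper: compares a and b at the given strength, then restores the old strength.
    static int compareAt(Collator collator, int strength, String a, String b) {
        int previous = collator.getStrength();
        collator.setStrength(strength); // throws IllegalArgumentException for an invalid value
        try {
            return collator.compare(a, b);
        } finally {
            collator.setStrength(previous);
        }
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        int before = collator.getStrength();
        System.out.println(compareAt(collator, Collator.PRIMARY, "abc", "ABC") == 0);
        System.out.println(collator.getStrength() == before);
    }
}
```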