// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
/*
 ********************************************************************
 * COPYRIGHT:
 * Copyright (c) 1996-2015, International Business Machines Corporation and
 * others. All Rights Reserved.
 ********************************************************************
 */

#ifndef NORMLZR_H
#define NORMLZR_H

#include "unicode/utypes.h"

#if U_SHOW_CPLUSPLUS_API

/**
 * \file
 * \brief C++ API: Unicode Normalization
 */

#if !UCONFIG_NO_NORMALIZATION

#include "unicode/chariter.h"
#include "unicode/normalizer2.h"
#include "unicode/unistr.h"
#include "unicode/unorm.h"
#include "unicode/uobject.h"

U_NAMESPACE_BEGIN
/**
 * Old Unicode normalization API.
 *
 * This API has been replaced by the Normalizer2 class and is only available
 * for backward compatibility. This class simply delegates to the Normalizer2 class.
 * There is one exception: The new API does not provide a replacement for Normalizer::compare().
 *
 * The Normalizer class supports the standard normalization forms described in
 * <a href="http://www.unicode.org/unicode/reports/tr15/" target="unicode">
 * Unicode Standard Annex #15: Unicode Normalization Forms</a>.
 *
 * The Normalizer class consists of two parts:
 * - static functions that normalize strings or test if strings are normalized
 * - a Normalizer object is an iterator that takes any kind of text and
 *   provides iteration over its normalized form
 *
 * The Normalizer class is not suitable for subclassing.
 *
 * For basic information about normalization forms and details about the C API
 * please see the documentation in unorm.h.
 *
 * The iterator API (the Normalizer constructors and the non-static functions)
 * uses a CharacterIterator as input. It is possible to pass a string which
 * is then internally wrapped in a CharacterIterator.
 * The input text is not normalized all at once, but incrementally where needed
 * (providing efficient random access).
 * This makes it possible to pass in a large text but spend only a small amount
 * of time normalizing a small part of that text.
 * However, if the entire text is normalized, then the iterator will be
 * slower than normalizing the entire text at once and iterating over the result.
 * A possible use of the Normalizer iterator is also to report an index into the
 * original text that is close to where the normalized characters come from.
 *
 * <em>Important:</em> The iterator API was cleaned up significantly for ICU 2.0.
 * The earlier implementation reported the getIndex() inconsistently,
 * and previous() could not be used after setIndex(), next(), first(), and current().
 *
 * Normalizer can start normalizing from anywhere in the input text after a call
 * to setIndexOnly(), first(), or last().
 * Without calling any of these, the iterator will start at the beginning of the text.
 *
 * At any time, next() returns the next normalized code point (UChar32),
 * with post-increment semantics (like CharacterIterator::next32PostInc()).
 * previous() returns the previous normalized code point (UChar32),
 * with pre-decrement semantics (like CharacterIterator::previous32()).
 *
 * current() returns the current code point
 * (respectively the one at the newly set index) without moving
 * the getIndex(). Note that if the text at the current position
 * needs to be normalized, then these functions will do that.
 * (This is why current() is not const.)
 * It is more efficient to call setIndexOnly() instead, which does not
 * normalize.
 *
 * getIndex() always refers to the position in the input text where the normalized
 * code points are returned from. It does not always change with each returned
 * code point.
 * The code point that is returned from any of the functions
 * corresponds to text at or after getIndex(), according to the
 * function's iteration semantics (post-increment or pre-decrement).
 *
 * next() returns a code point from at or after the getIndex()
 * from before the next() call. After the next() call, the getIndex()
 * might have moved to where the next code point will be returned from
 * (from a next() or current() call).
 * This is semantically equivalent to array access with array[index++]
 * (post-increment semantics).
 *
 * previous() returns a code point from at or after the getIndex()
 * from after the previous() call.
 * This is semantically equivalent to array access with array[--index]
 * (pre-decrement semantics).
 *
 * Internally, the Normalizer iterator normalizes a small piece of text
 * starting at the getIndex() and ending at a following "safe" index.
 * The normalized result is stored in an internal string buffer, and
 * the code points are iterated from there.
 * With multiple iteration calls, this is repeated until the next piece
 * of text needs to be normalized, and the getIndex() needs to be moved.
 *
 * The following "safe" index, the internal buffer, and the secondary
 * iteration index into that buffer are not exposed on the API.
 * This also means that it is currently not practical to return to
 * a particular, arbitrary position in the text because one would need to
 * know, and be able to set, in addition to the getIndex(), at least also the
 * current index into the internal buffer.
 * It is currently only possible to observe when getIndex() changes
 * (with careful consideration of the iteration semantics),
 * at which time the internal index will be 0.
 * For example, if getIndex() is different after next() than before it,
 * then the internal index is 0 and one can return to this getIndex()
 * later with setIndexOnly().
 *
 * Note: While the setIndex() and getIndex() refer to indices in the
 * underlying Unicode input text, the next() and previous() methods
 * iterate through characters in the normalized output.
 * This means that there is not necessarily a one-to-one correspondence
 * between characters returned by next() and previous() and the indices
 * passed to and returned from setIndex() and getIndex().
 * It is for this reason that Normalizer does not implement the CharacterIterator interface.
 *
 * @author Laura Werner, Mark Davis, Markus Scherer
 * @stable ICU 2.0
 */
class U_COMMON_API Normalizer : public UObject {
public:
#ifndef U_HIDE_DEPRECATED_API
  /**
   * If DONE is returned from an iteration function that returns a code point,
   * then there are no more normalization results available.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  enum {
      DONE=0xffff
  };

  // Constructors

  /**
   * Creates a new <code>Normalizer</code> object for iterating over the
   * normalized form of a given string.
   * <p>
   * @param str   The string to be normalized. The normalization
   *              will start at the beginning of the string.
   *
   * @param mode  The normalization mode.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  Normalizer(const UnicodeString& str, UNormalizationMode mode);

  /**
   * Creates a new <code>Normalizer</code> object for iterating over the
   * normalized form of a given string.
   * <p>
   * @param str   The string to be normalized. The normalization
   *              will start at the beginning of the string.
   *
   * @param length Length of the string, or -1 if NUL-terminated.
   * @param mode  The normalization mode.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  Normalizer(ConstChar16Ptr str, int32_t length, UNormalizationMode mode);

  /**
   * Creates a new <code>Normalizer</code> object for iterating over the
   * normalized form of the given text.
   * <p>
   * @param iter  The input text to be normalized. The normalization
   *              will start at the beginning of the string.
   *
   * @param mode  The normalization mode.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  Normalizer(const CharacterIterator& iter, UNormalizationMode mode);
#endif  /* U_HIDE_DEPRECATED_API */

#ifndef U_FORCE_HIDE_DEPRECATED_API
  /**
   * Copy constructor.
   * @param copy The object to be copied.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  Normalizer(const Normalizer& copy);

  /**
   * Destructor
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  virtual ~Normalizer();
#endif  // U_FORCE_HIDE_DEPRECATED_API

  //-------------------------------------------------------------------------
  // Static utility methods
  //-------------------------------------------------------------------------

#ifndef U_HIDE_DEPRECATED_API
  /**
   * Normalizes a <code>UnicodeString</code> according to the specified normalization mode.
   * This is a wrapper for unorm_normalize(), using UnicodeStrings.
   *
   * The <code>options</code> parameter specifies which optional
   * <code>Normalizer</code> features are to be enabled for this operation.
   *
   * @param source    the input string to be normalized.
   * @param mode      the normalization mode
   * @param options   the optional features to be enabled (0 for no options)
   * @param result    The normalized string (on output).
   * @param status    The error code.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static void U_EXPORT2 normalize(const UnicodeString& source,
                                  UNormalizationMode mode, int32_t options,
                                  UnicodeString& result,
                                  UErrorCode &status);

  /**
   * Compose a <code>UnicodeString</code>.
   * This is equivalent to normalize() with mode UNORM_NFC or UNORM_NFKC.
   * This is a wrapper for unorm_normalize(), using UnicodeStrings.
   *
   * The <code>options</code> parameter specifies which optional
   * <code>Normalizer</code> features are to be enabled for this operation.
   *
   * @param source    the string to be composed.
   * @param compat    Perform compatibility decomposition before composition.
   *                  If this argument is <code>false</code>, only canonical
   *                  decomposition will be performed.
   * @param options   the optional features to be enabled (0 for no options)
   * @param result    The composed string (on output).
   * @param status    The error code.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static void U_EXPORT2 compose(const UnicodeString& source,
                                UBool compat, int32_t options,
                                UnicodeString& result,
                                UErrorCode &status);

  /**
   * Static method to decompose a <code>UnicodeString</code>.
   * This is equivalent to normalize() with mode UNORM_NFD or UNORM_NFKD.
   * This is a wrapper for unorm_normalize(), using UnicodeStrings.
   *
   * The <code>options</code> parameter specifies which optional
   * <code>Normalizer</code> features are to be enabled for this operation.
   *
   * @param source    the string to be decomposed.
   * @param compat    Perform compatibility decomposition.
   *                  If this argument is <code>false</code>, only canonical
   *                  decomposition will be performed.
   * @param options   the optional features to be enabled (0 for no options)
   * @param result    The decomposed string (on output).
   * @param status    The error code.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static void U_EXPORT2 decompose(const UnicodeString& source,
                                  UBool compat, int32_t options,
                                  UnicodeString& result,
                                  UErrorCode &status);

  /**
   * Performs a quick check on a string to determine whether the string is
   * in a particular normalization format.
   * This is a wrapper for unorm_quickCheck(), using a UnicodeString.
   *
   * Three types of result can be returned: UNORM_YES, UNORM_NO or
   * UNORM_MAYBE. UNORM_YES indicates that the argument
   * string is in the desired normalized format; UNORM_NO indicates that the
   * argument string is not in the desired normalized format. A
   * UNORM_MAYBE result indicates that a more thorough check is required:
   * the user may have to put the string in its normalized form and compare the
   * results.
   * @param source       string for determining if it is in a normalized format
   * @param mode         normalization format
   * @param status       A reference to a UErrorCode to receive any errors
   * @return UNORM_YES, UNORM_NO or UNORM_MAYBE
   *
   * @see isNormalized
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static inline UNormalizationCheckResult
  quickCheck(const UnicodeString &source, UNormalizationMode mode, UErrorCode &status);

  /**
   * Performs a quick check on a string; same as the other version of quickCheck
   * but takes an extra options parameter like most normalization functions.
   *
   * @param source       string for determining if it is in a normalized format
   * @param mode         normalization format
   * @param options      the optional features to be enabled (0 for no options)
   * @param status       A reference to a UErrorCode to receive any errors
   * @return UNORM_YES, UNORM_NO or UNORM_MAYBE
   *
   * @see isNormalized
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static UNormalizationCheckResult
  quickCheck(const UnicodeString &source, UNormalizationMode mode, int32_t options, UErrorCode &status);

  /**
   * Test if a string is in a given normalization form.
   * This is semantically equivalent to source.equals(normalize(source, mode)).
   *
   * Unlike unorm_quickCheck(), this function returns a definitive result,
   * never a "maybe".
   * For NFD, NFKD, and FCD, both functions work exactly the same.
   * For NFC and NFKC where quickCheck may return "maybe", this function will
   * perform further tests to arrive at a true/false result.
   *
   * @param src        String that is to be tested if it is in a normalization format.
   * @param mode       Which normalization form to test for.
   * @param errorCode  ICU error code in/out parameter.
   *                   Must fulfill U_SUCCESS before the function call.
   * @return Boolean value indicating whether the source string is in the
   *         "mode" normalization form.
   *
   * @see quickCheck
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static inline UBool
  isNormalized(const UnicodeString &src, UNormalizationMode mode, UErrorCode &errorCode);

  /**
   * Test if a string is in a given normalization form; same as the other version of isNormalized
   * but takes an extra options parameter like most normalization functions.
   *
   * @param src        String that is to be tested if it is in a normalization format.
   * @param mode       Which normalization form to test for.
   * @param options    the optional features to be enabled (0 for no options)
   * @param errorCode  ICU error code in/out parameter.
   *                   Must fulfill U_SUCCESS before the function call.
   * @return Boolean value indicating whether the source string is in the
   *         "mode" normalization form.
   *
   * @see quickCheck
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static UBool
  isNormalized(const UnicodeString &src, UNormalizationMode mode, int32_t options, UErrorCode &errorCode);

  /**
   * Concatenate normalized strings, making sure that the result is normalized as well.
   *
   * If both the left and the right strings are in
   * the normalization form according to "mode/options",
   * then the result will be
   *
   * \code
   *     dest=normalize(left+right, mode, options)
   * \endcode
   *
   * For details see unorm_concatenate in unorm.h.
   *
   * @param left Left source string.
   * @param right Right source string.
   * @param result The output string.
   * @param mode The normalization mode.
   * @param options A bit set of normalization options.
   * @param errorCode ICU error code in/out parameter.
   *                   Must fulfill U_SUCCESS before the function call.
   * @return result
   *
   * @see unorm_concatenate
   * @see normalize
   * @see unorm_next
   * @see unorm_previous
   *
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static UnicodeString &
  U_EXPORT2 concatenate(const UnicodeString &left, const UnicodeString &right,
                        UnicodeString &result,
                        UNormalizationMode mode, int32_t options,
                        UErrorCode &errorCode);
#endif  /* U_HIDE_DEPRECATED_API */

  /**
   * Compare two strings for canonical equivalence.
   * Further options include case-insensitive comparison and
   * code point order (as opposed to code unit order).
   *
   * Canonical equivalence between two strings is defined as their normalized
   * forms (NFD or NFC) being identical.
   * This function compares strings incrementally instead of normalizing
   * (and optionally case-folding) both strings entirely,
   * improving performance significantly.
   *
   * Bulk normalization is only necessary if the strings do not fulfill the FCD
   * conditions. Only in this case, and only if the strings are relatively long,
   * is memory allocated temporarily.
   * For FCD strings and short non-FCD strings there is no memory allocation.
   *
   * Semantically, this is equivalent to
   *   strcmp[CodePointOrder](NFD(foldCase(s1)), NFD(foldCase(s2)))
   * where code point order and foldCase are all optional.
   *
   * UAX 21 2.5 Caseless Matching specifies that for a canonical caseless match
   * the case folding must be performed first, then the normalization.
   *
   * @param s1 First source string.
   * @param s2 Second source string.
   *
   * @param options A bit set of options:
   *   - U_FOLD_CASE_DEFAULT or 0 is used for default options:
   *     Case-sensitive comparison in code unit order, and the input strings
   *     are quick-checked for FCD.
   *
   *   - UNORM_INPUT_IS_FCD
   *     Set if the caller knows that both s1 and s2 fulfill the FCD conditions.
   *     If not set, the function will quickCheck for FCD
   *     and normalize if necessary.
   *
   *   - U_COMPARE_CODE_POINT_ORDER
   *     Set to choose code point order instead of code unit order
   *     (see u_strCompare for details).
   *
   *   - U_COMPARE_IGNORE_CASE
   *     Set to compare strings case-insensitively using case folding,
   *     instead of case-sensitively.
   *     If set, then the following case folding options are used.
   *
   *   - Options as used with case-insensitive comparisons, currently:
   *
   *   - U_FOLD_CASE_EXCLUDE_SPECIAL_I
   *     (see u_strCaseCompare for details)
   *
   *   - regular normalization options shifted left by UNORM_COMPARE_NORM_OPTIONS_SHIFT
   *
   * @param errorCode ICU error code in/out parameter.
   *                  Must fulfill U_SUCCESS before the function call.
   * @return <0 or 0 or >0 as usual for string comparisons
   *
   * @see unorm_compare
   * @see normalize
   * @see UNORM_FCD
   * @see u_strCompare
   * @see u_strCaseCompare
   *
   * @stable ICU 2.2
   */
  static inline int32_t
  compare(const UnicodeString &s1, const UnicodeString &s2,
          uint32_t options,
          UErrorCode &errorCode);

#ifndef U_HIDE_DEPRECATED_API
  //-------------------------------------------------------------------------
  // Iteration API
  //-------------------------------------------------------------------------

  /**
   * Return the current character in the normalized text.
   * current() may need to normalize some text at getIndex().
   * The getIndex() is not changed.
   *
   * @return the current normalized code point
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UChar32 current(void);

  /**
   * Return the first character in the normalized text.
   * This is equivalent to setIndexOnly(startIndex()) followed by next().
   * (Post-increment semantics.)
   *
   * @return the first normalized code point
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UChar32 first(void);

  /**
   * Return the last character in the normalized text.
   * This is equivalent to setIndexOnly(endIndex()) followed by previous().
   * (Pre-decrement semantics.)
   *
   * @return the last normalized code point
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UChar32 last(void);

  /**
   * Return the next character in the normalized text.
   * (Post-increment semantics.)
   * If the end of the text has already been reached, DONE is returned.
   * The DONE value could be confused with a U+FFFF non-character code point
   * in the text. If this is possible, you can test getIndex()<endIndex()
   * before calling next(), or (getIndex()<endIndex() || last()!=DONE)
   * after calling next(). (Calling last() will change the iterator state!)
   *
   * The C API unorm_next() is more efficient and does not have this ambiguity.
   *
   * @return the next normalized code point
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UChar32 next(void);

  /**
   * Return the previous character in the normalized text and decrement.
   * (Pre-decrement semantics.)
   * If the beginning of the text has already been reached, DONE is returned.
   * The DONE value could be confused with a U+FFFF non-character code point
   * in the text. If this is possible, you can test
   * (getIndex()>startIndex() || first()!=DONE). (Calling first() will change
   * the iterator state!)
   *
   * The C API unorm_previous() is more efficient and does not have this ambiguity.
   *
   * @return the previous normalized code point
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UChar32 previous(void);

  /**
   * Set the iteration position in the input text that is being normalized,
   * without any immediate normalization.
   * After setIndexOnly(), getIndex() will return the same index that is
   * specified here.
   *
   * @param index the desired index in the input text.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void setIndexOnly(int32_t index);

  /**
   * Reset the index to the beginning of the text.
   * This is equivalent to setIndexOnly(startIndex()).
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void reset(void);

  /**
   * Retrieve the current iteration position in the input text that is
   * being normalized.
   *
   * A following call to next() will return a normalized code point from
   * the input text at or after this index.
   *
   * After a call to previous(), getIndex() will point at or before the
   * position in the input text where the normalized code point
   * was returned from with previous().
   *
   * @return the current index in the input text
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  int32_t getIndex(void) const;

  /**
   * Retrieve the index of the start of the input text. This is the begin index
   * of the <code>CharacterIterator</code> or the start (i.e. index 0) of the string
   * over which this <code>Normalizer</code> is iterating.
   *
   * @return the smallest index in the input text where the Normalizer operates
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  int32_t startIndex(void) const;

  /**
   * Retrieve the index of the end of the input text. This is the end index
   * of the <code>CharacterIterator</code> or the length of the string
   * over which this <code>Normalizer</code> is iterating.
   * This end index is exclusive, i.e., the Normalizer operates only on characters
   * before this index.
   *
   * @return the first index in the input text where the Normalizer does not operate
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  int32_t endIndex(void) const;

  /**
   * Returns true when both iterators refer to the same character in the same
   * input text.
   *
   * @param that a Normalizer object to compare this one to
   * @return comparison result
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  bool operator==(const Normalizer& that) const;

  /**
   * Returns false when both iterators refer to the same character in the same
   * input text.
   *
   * @param that a Normalizer object to compare this one to
   * @return comparison result
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  inline bool operator!=(const Normalizer& that) const;

  /**
   * Returns a pointer to a new Normalizer that is a clone of this one.
   * The caller is responsible for deleting the new clone.
   * @return a pointer to a new Normalizer
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  Normalizer* clone() const;

  /**
   * Generates a hash code for this iterator.
   *
   * @return the hash code
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  int32_t hashCode(void) const;

  //-------------------------------------------------------------------------
  // Property access methods
  //-------------------------------------------------------------------------

  /**
   * Set the normalization mode for this object.
   * <p>
   * <b>Note:</b> If the normalization mode is changed while iterating
   * over a string, calls to {@link #next() } and {@link #previous() } may
   * return previously buffered characters in the old normalization mode
   * until the iteration is able to re-sync at the next base character.
   * It is safest to call {@link #setIndexOnly }, {@link #reset() },
   * {@link #setText }, {@link #first() },
   * {@link #last() }, etc. after calling <code>setMode</code>.
   * <p>
   * @param newMode the new mode for this <code>Normalizer</code>.
   * @see #getUMode
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void setMode(UNormalizationMode newMode);

  /**
   * Return the normalization mode for this object.
   *
   * This is an unusual name because there used to be a getMode() that
   * returned a different type.
   *
   * @return the mode for this <code>Normalizer</code>
   * @see #setMode
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UNormalizationMode getUMode(void) const;

  /**
   * Set options that affect this <code>Normalizer</code>'s operation.
   * Options do not change the basic composition or decomposition operation
   * that is being performed, but they control whether
   * certain optional portions of the operation are done.
   * Currently the only available option is obsolete.
   *
   * It is possible to specify multiple options that are all turned on or off.
   *
   * @param   option  the option(s) whose value is/are to be set.
   * @param   value   the new setting for the option. Use <code>true</code> to
   *                  turn the option(s) on and <code>false</code> to turn it/them off.
   *
   * @see #getOption
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void setOption(int32_t option,
                 UBool value);

  /**
   * Determine whether an option is turned on or off.
   * If multiple options are specified, then the result is true if any
   * of them are set.
   * <p>
   * @param option the option(s) that are to be checked
   * @return true if any of the option(s) are set
   * @see #setOption
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  UBool getOption(int32_t option) const;

  /**
   * Set the input text over which this <code>Normalizer</code> will iterate.
   * The iteration position is set to the beginning.
   *
   * @param newText a string that replaces the current input text
   * @param status a UErrorCode
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void setText(const UnicodeString& newText,
               UErrorCode &status);

  /**
   * Set the input text over which this <code>Normalizer</code> will iterate.
   * The iteration position is set to the beginning.
   *
   * @param newText a CharacterIterator object that replaces the current input text
   * @param status a UErrorCode
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void setText(const CharacterIterator& newText,
               UErrorCode &status);

  /**
   * Set the input text over which this <code>Normalizer</code> will iterate.
   * The iteration position is set to the beginning.
   *
   * @param newText a string that replaces the current input text
   * @param length the length of the string, or -1 if NUL-terminated
   * @param status a UErrorCode
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void setText(ConstChar16Ptr newText,
               int32_t length,
               UErrorCode &status);

  /**
   * Copies the input text into the UnicodeString argument.
   *
   * @param result Receives a copy of the text under iteration.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  void getText(UnicodeString& result);

  /**
   * ICU "poor man's RTTI", returns a UClassID for this class.
   * @returns a UClassID for this class.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  static UClassID U_EXPORT2 getStaticClassID();
#endif  /* U_HIDE_DEPRECATED_API */

#ifndef U_FORCE_HIDE_DEPRECATED_API
  /**
   * ICU "poor man's RTTI", returns a UClassID for the actual class.
   * @return a UClassID for the actual class.
   * @deprecated ICU 56 Use Normalizer2 instead.
   */
  virtual UClassID getDynamicClassID() const override;
#endif  // U_FORCE_HIDE_DEPRECATED_API

private:
  //-------------------------------------------------------------------------
  // Private functions
  //-------------------------------------------------------------------------

  Normalizer() = delete; // default constructor not implemented
  Normalizer &operator=(const Normalizer &that) = delete; // assignment operator not implemented

  // Private utility methods for iteration
  // For documentation, see the source code
  UBool nextNormalize();
  UBool previousNormalize();

  void init();
  void clearBuffer(void);

  //-------------------------------------------------------------------------
  // Private data
  //-------------------------------------------------------------------------

  FilteredNormalizer2 *fFilteredNorm2;  // owned if not nullptr
  const Normalizer2   *fNorm2;          // not owned; may be equal to fFilteredNorm2
  UNormalizationMode   fUMode;          // deprecated
  int32_t              fOptions;

  // The input text and our position in it
  CharacterIterator   *text;

  // The normalization buffer is the result of normalization
  // of the source in [currentIndex..nextIndex[ .
  int32_t              currentIndex, nextIndex;

  // A buffer for holding intermediate results
  UnicodeString        buffer;
  int32_t              bufferPos;
};

//-------------------------------------------------------------------------
// Inline implementations
//-------------------------------------------------------------------------

#ifndef U_HIDE_DEPRECATED_API
inline bool
Normalizer::operator!= (const Normalizer& other) const
{ return ! operator==(other); }

inline UNormalizationCheckResult
Normalizer::quickCheck(const UnicodeString& source,
                       UNormalizationMode mode,
                       UErrorCode &status) {
    return quickCheck(source, mode, 0, status);
}

inline UBool
Normalizer::isNormalized(const UnicodeString& source,
                         UNormalizationMode mode,
                         UErrorCode &status) {
    return isNormalized(source, mode, 0, status);
}
#endif  /* U_HIDE_DEPRECATED_API */

inline int32_t
Normalizer::compare(const UnicodeString &s1, const UnicodeString &s2,
                    uint32_t options,
                    UErrorCode &errorCode) {
  // all argument checking is done in unorm_compare
  return unorm_compare(toUCharPtr(s1.getBuffer()), s1.length(),
                       toUCharPtr(s2.getBuffer()), s2.length(),
                       options,
                       &errorCode);
}

U_NAMESPACE_END

#endif /* #if !UCONFIG_NO_NORMALIZATION */

#endif /* U_SHOW_CPLUSPLUS_API */

#endif // NORMLZR_H
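
The iteration contract documented above (next() behaving like array[index++], previous() like array[--index], and DONE sharing its value with U+FFFF) can be illustrated with a small stand-alone model. This is only a sketch: `ToyIter` and `countForward` are hypothetical names introduced here, the code does not use ICU and performs no normalization, and it assumes the simplest case in which every input code point maps to exactly one output code point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the documented iteration contract.
// NOT ICU code: no normalization happens, one input code point yields one output.
struct ToyIter {
    static constexpr char32_t DONE = 0xffff;  // same value as Normalizer::DONE
    std::vector<char32_t> text;
    int32_t index = 0;

    int32_t startIndex() const { return 0; }
    int32_t endIndex() const { return static_cast<int32_t>(text.size()); }
    int32_t getIndex() const { return index; }

    // Post-increment semantics: behaves like text[index++].
    char32_t next() {
        if (index >= endIndex()) return DONE;
        return text[static_cast<std::size_t>(index++)];
    }

    // Pre-decrement semantics: behaves like text[--index].
    char32_t previous() {
        if (index <= startIndex()) return DONE;
        return text[static_cast<std::size_t>(--index)];
    }
};

// Counts code points by testing getIndex() < endIndex() before each next(),
// so a literal U+FFFF in the text is not mistaken for the DONE sentinel.
int32_t countForward(ToyIter &it) {
    int32_t n = 0;
    while (it.getIndex() < it.endIndex()) {
        it.next();
        ++n;
    }
    return n;
}
```

A loop written as `while (it.next() != ToyIter::DONE)` would stop early on text that contains U+FFFF, which is why the header suggests the index comparison instead.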