// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
/*
**********************************************************************
*   Copyright (C) 2001-2011,2014 IBM and others. All rights reserved.
**********************************************************************
*   Date        Name        Description
*  06/28/2001   synwee      Creation.
**********************************************************************
*/
#ifndef USEARCH_H
#define USEARCH_H

#include "unicode/utypes.h"

#if !UCONFIG_NO_COLLATION && !UCONFIG_NO_BREAK_ITERATION

#include "unicode/ucol.h"
#include "unicode/ucoleitr.h"
#include "unicode/ubrk.h"

#if U_SHOW_CPLUSPLUS_API
#include "unicode/localpointer.h"
#endif   // U_SHOW_CPLUSPLUS_API

/**
 * \file
 * \brief C API: StringSearch
 *
 * C APIs for an engine that provides language-sensitive text searching based
 * on the comparison rules defined in a <code>UCollator</code> data struct,
 * see <code>ucol.h</code>. This ensures that language eccentricity can be
 * handled, e.g. for the German collator, characters ß and SS will be matched
 * if case is chosen to be ignored.
 * See the <a href="https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/main/design/collation/ICU_collation_design.htm">
 * "ICU Collation Design Document"</a> for more information.
 * <p>
 * As of ICU4C 4.0 / ICU4J 53, the implementation uses a linear search. In previous versions,
 * a modified form of the Boyer-Moore searching algorithm was used. For more information
 * on the modified Boyer-Moore algorithm see
 * <a href="http://icu-project.org/docs/papers/efficient_text_searching_in_java.html">
 * "Efficient Text Searching in Java"</a>, published in <i>Java Report</i>
 * in February, 1999.
 * <p>
 * There are 2 match options for selection:<br>
 * Let S' be the sub-string of a text string S between the offsets start and
 * end <start, end>.
 * <br>
 * A pattern string P matches a text string S at the offsets <start, end>
 * if
 * <pre>
 * option 1. Some canonical equivalent of P matches some canonical equivalent
 *           of S'
 * option 2. P matches S' and if P starts or ends with a combining mark,
 *           there exists no non-ignorable combining mark before or after S'
 *           in S respectively.
 * </pre>
 * Option 2 is the default.
 * <p>
 * This search has APIs similar to those of other text iteration mechanisms
 * such as the break iterators in <code>ubrk.h</code>. Using these
 * APIs, it is easy to scan through text looking for all occurrences of
 * a given pattern. This search iterator allows changing of direction by
 * calling a <code>reset</code> followed by a <code>next</code> or <code>previous</code>.
 * Though a direction change can occur without calling <code>reset</code> first,
 * this operation comes with some speed penalty.
 * Generally, the matches found in the forward direction are the same as
 * those found in the backward direction, in reverse order.
 * <p>
 * <code>usearch.h</code> provides APIs to specify the starting position
 * within the text string to be searched, e.g. <code>usearch_setOffset</code>,
 * <code>usearch_preceding</code> and <code>usearch_following</code>. Since the
 * starting position will be set as it is specified, please take note that
 * there are some dangerous positions at which the search may return incorrect
 * results:
 * <ul>
 * <li> The midst of a substring that requires normalization.
 * <li> If the following match is to be found, the position should not be the
 *      second character of a pair that must be swapped with the preceding
 *      character.
 *      Vice versa, if the preceding match is to be found, the
 *      position to search from should not be the first character of a pair
 *      that must be swapped with the next character. E.g. certain Thai and
 *      Lao characters require swapping.
 * <li> If a following pattern match is to be found, any position within a
 *      contracting sequence except the first will fail. Vice versa, if a
 *      preceding pattern match is to be found, an invalid starting point
 *      would be any character within a contracting sequence except the last.
 * </ul>
 * <p>
 * A breakiterator can be used if only matches at logical breaks are desired.
 * Using a breakiterator will only give you results that exactly match the
 * boundaries given by the breakiterator. For instance the pattern "e" will
 * not be found in the string "\u00e9" if a character break iterator is used.
 * <p>
 * Options are provided to handle overlapping matches.
 * E.g. in English, overlapping matches produce the results 0 and 2
 * for the pattern "abab" in the text "ababab", whereas mutually
 * exclusive matches only produce the result 0.
 * <p>
 * Options are also provided to implement "asymmetric search" as described in
 * <a href="http://www.unicode.org/reports/tr10/#Asymmetric_Search">
 * UTS #10 Unicode Collation Algorithm</a>, specifically the USearchAttribute
 * USEARCH_ELEMENT_COMPARISON and its values.
 * <p>
 * Though collator attributes will be taken into consideration while
 * performing matches, there are no APIs here for setting and getting the
 * attributes. These attributes can be set by getting the collator
 * from <code>usearch_getCollator</code> and using the APIs in <code>ucol.h</code>.
 * Lastly, to update String Search to the new collator attributes,
 * usearch_reset() has to be called.
 * <p>
 * Restriction: <br>
 * Currently there are no composite characters that consist of a
 * character with combining class > 0 before a character with combining
 * class == 0. However, if such a character exists in the future, the
 * search mechanism does not guarantee the results for option 1.
 *
 * <p>
 * Example of use:<br>
 * <pre><code>
 * const char *tgtstr = "The quick brown fox jumped over the lazy fox";
 * const char *patstr = "fox";
 * UChar target[64];
 * UChar pattern[16];
 * UErrorCode status = U_ZERO_ERROR;
 * u_uastrcpy(target, tgtstr);
 * u_uastrcpy(pattern, patstr);
 *
 * UStringSearch *search = usearch_open(pattern, -1, target, -1, "en_US",
 *                                      NULL, &status);
 * if (U_SUCCESS(status)) {
 *     for (int pos = usearch_first(search, &status);
 *          pos != USEARCH_DONE;
 *          pos = usearch_next(search, &status))
 *     {
 *         printf("Found match at %d pos, length is %d\n", pos,
 *                usearch_getMatchedLength(search));
 *     }
 * }
 *
 * usearch_close(search);
 * </code></pre>
 * @stable ICU 2.4
 */

/**
 * DONE is returned by previous() and next() after all valid matches have
 * been returned, and by first() and last() if there are no matches at all.
 * @stable ICU 2.4
 */
#define USEARCH_DONE -1

/**
 * Data structure for searching
 * @stable ICU 2.4
 */
struct UStringSearch;
/**
 * Data structure for searching
 * @stable ICU 2.4
 */
typedef struct UStringSearch UStringSearch;

/**
 * @stable ICU 2.4
 */
typedef enum {
    /**
     * Option for overlapping matches
     * @stable ICU 2.4
     */
    USEARCH_OVERLAP = 0,
#ifndef U_HIDE_DEPRECATED_API
    /**
     * Option for canonical matches; option 1 in header documentation.
     * The default value will be USEARCH_OFF.
     * Note: Setting this option to USEARCH_ON currently has no effect on
     * search behavior, and this option is deprecated. Instead, to control
     * canonical match behavior, you must set UCOL_NORMALIZATION_MODE
     * appropriately (to UCOL_OFF or UCOL_ON) in the UCollator used by
     * the UStringSearch object.
     * @see usearch_openFromCollator
     * @see usearch_getCollator
     * @see usearch_setCollator
     * @see ucol_getAttribute
     * @deprecated ICU 53
     */
    USEARCH_CANONICAL_MATCH = 1,
#endif  /* U_HIDE_DEPRECATED_API */
    /**
     * Option to control how collation elements are compared.
     * The default value will be USEARCH_STANDARD_ELEMENT_COMPARISON.
     * @stable ICU 4.4
     */
    USEARCH_ELEMENT_COMPARISON = 2,

#ifndef U_HIDE_DEPRECATED_API
    /**
     * One more than the highest normal USearchAttribute value.
     * @deprecated ICU 58 The numeric value may change over time, see ICU ticket #12420.
     */
    USEARCH_ATTRIBUTE_COUNT = 3
#endif  /* U_HIDE_DEPRECATED_API */
} USearchAttribute;

/**
 * @stable ICU 2.4
 */
typedef enum {
    /**
     * Default value for any USearchAttribute
     * @stable ICU 2.4
     */
    USEARCH_DEFAULT = -1,
    /**
     * Value for USEARCH_OVERLAP and USEARCH_CANONICAL_MATCH
     * @stable ICU 2.4
     */
    USEARCH_OFF,
    /**
     * Value for USEARCH_OVERLAP and USEARCH_CANONICAL_MATCH
     * @stable ICU 2.4
     */
    USEARCH_ON,
    /**
     * Value (default) for USEARCH_ELEMENT_COMPARISON;
     * standard collation element comparison at the specified collator
     * strength.
     * @stable ICU 4.4
     */
    USEARCH_STANDARD_ELEMENT_COMPARISON,
    /**
     * Value for USEARCH_ELEMENT_COMPARISON;
     * collation element comparison is modified to effectively provide
     * behavior between the specified strength and strength - 1.
     * Collation elements in the pattern that have the base weight for the
     * specified strength are treated as "wildcards" that match an element
     * with any other weight at that collation level in the searched text. For
     * example, with a secondary-strength English collator, a plain 'e' in
     * the pattern will match a plain e or an e with any diacritic in the
     * searched text, but an e with diacritic in the pattern will only
     * match an e with the same diacritic in the searched text.
     *
     * This supports "asymmetric search" as described in
     * <a href="http://www.unicode.org/reports/tr10/#Asymmetric_Search">
     * UTS #10 Unicode Collation Algorithm</a>.
     *
     * @stable ICU 4.4
     */
    USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD,
    /**
     * Value for USEARCH_ELEMENT_COMPARISON;
     * collation element comparison is modified to effectively provide
     * behavior between the specified strength and strength - 1. Collation
     * elements in either the pattern or the searched text that have the
     * base weight for the specified strength are treated as "wildcards"
     * that match an element with any other weight at that collation level.
     * For example, with a secondary-strength English collator, a plain 'e'
     * in the pattern will match a plain e or an e with any diacritic in the
     * searched text, but an e with diacritic in the pattern will only
     * match an e with the same diacritic or a plain e in the searched text.
     *
     * This option is similar to "asymmetric search" as described in
     * [UTS #10 Unicode Collation Algorithm](http://www.unicode.org/reports/tr10/#Asymmetric_Search),
     * but also allows unmarked characters in the searched text to match
     * marked or unmarked versions of that character in the pattern.
     *
     * @stable ICU 4.4
     */
    USEARCH_ANY_BASE_WEIGHT_IS_WILDCARD,

#ifndef U_HIDE_DEPRECATED_API
    /**
     * One more than the highest normal USearchAttributeValue value.
     * @deprecated ICU 58 The numeric value may change over time, see ICU ticket #12420.
     */
    USEARCH_ATTRIBUTE_VALUE_COUNT
#endif  /* U_HIDE_DEPRECATED_API */
} USearchAttributeValue;

/* open and close ------------------------------------------------------ */

/**
 * Creates a String Search iterator data struct using the argument locale language
 * rule set. A collator will be created in the process, which will be owned by
 * this String Search and will be deleted in <code>usearch_close</code>.
 *
 * The UStringSearch retains a pointer to both the pattern and text strings.
 * The caller must not modify or delete them while using the UStringSearch.
 *
 * @param pattern for matching
 * @param patternlength length of the pattern, -1 for null-termination
 * @param text text string
 * @param textlength length of the text string, -1 for null-termination
 * @param locale name of locale for the rules to be used
 * @param breakiter A BreakIterator that will be used to restrict the points
 *                  at which matches are detected. If a match is found, but
 *                  the match's start or end index is not a boundary as
 *                  determined by the <code>BreakIterator</code>, the match will
 *                  be rejected and another will be searched for.
 *                  If this parameter is <code>NULL</code>, no break detection is
 *                  attempted.
 * @param status for errors if it occurs. If pattern or text is NULL, or if
 *               patternlength or textlength is 0 then an
 *               U_ILLEGAL_ARGUMENT_ERROR is returned.
 * @return search iterator data structure, or NULL if there is an error.
 * @stable ICU 2.4
 */
U_CAPI UStringSearch * U_EXPORT2 usearch_open(const UChar    *pattern,
                                              int32_t         patternlength,
                                              const UChar    *text,
                                              int32_t         textlength,
                                              const char     *locale,
                                              UBreakIterator *breakiter,
                                              UErrorCode     *status);

/**
 * Creates a String Search iterator data struct using the argument collator language
 * rule set. Note, user retains the ownership of this collator, thus the
 * responsibility of deletion lies with the user.
 *
 * NOTE: String Search cannot be instantiated from a collator that has
 * collate digits as numbers (CODAN) turned on (UCOL_NUMERIC_COLLATION).
 *
 * The UStringSearch retains a pointer to both the pattern and text strings.
 * The caller must not modify or delete them while using the UStringSearch.
 *
 * @param pattern for matching
 * @param patternlength length of the pattern, -1 for null-termination
 * @param text text string
 * @param textlength length of the text string, -1 for null-termination
 * @param collator used for the language rules
 * @param breakiter A BreakIterator that will be used to restrict the points
 *                  at which matches are detected. If a match is found, but
 *                  the match's start or end index is not a boundary as
 *                  determined by the <code>BreakIterator</code>, the match will
 *                  be rejected and another will be searched for.
 *                  If this parameter is <code>NULL</code>, no break detection is
 *                  attempted.
 * @param status for errors if it occurs. If collator, pattern or text is NULL,
 *               or if patternlength or textlength is 0 then an
 *               U_ILLEGAL_ARGUMENT_ERROR is returned.
 * @return search iterator data structure, or NULL if there is an error.
 * @stable ICU 2.4
 */
U_CAPI UStringSearch * U_EXPORT2 usearch_openFromCollator(
                                         const UChar     *pattern,
                                         int32_t          patternlength,
                                         const UChar     *text,
                                         int32_t          textlength,
                                         const UCollator *collator,
                                         UBreakIterator  *breakiter,
                                         UErrorCode      *status);

/**
 * Destroys and cleans up the String Search iterator data struct.
 * If a collator was created in <code>usearch_open</code>, then it will be destroyed here.
 * @param searchiter The UStringSearch to clean up
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_close(UStringSearch *searchiter);

#if U_SHOW_CPLUSPLUS_API

U_NAMESPACE_BEGIN

/**
 * \class LocalUStringSearchPointer
 * "Smart pointer" class, closes a UStringSearch via usearch_close().
 * For most methods see the LocalPointerBase base class.
 *
 * @see LocalPointerBase
 * @see LocalPointer
 * @stable ICU 4.4
 */
U_DEFINE_LOCAL_OPEN_POINTER(LocalUStringSearchPointer, UStringSearch, usearch_close);

U_NAMESPACE_END

#endif

/* get and set methods -------------------------------------------------- */

/**
 * Sets the current position in the text string which the next search will
 * start from. Clears previous states.
 * This method takes the argument index and sets the position in the text
 * string accordingly without checking if the index is pointing to a
 * valid starting point to begin searching.
 * Search positions that may render incorrect results are highlighted in the
 * header comments
 * @param strsrch search iterator data struct
 * @param position position to start next search from. If position is less
 *          than or greater than the text range for searching,
 *          an U_INDEX_OUTOFBOUNDS_ERROR will be returned
 * @param status error status if any.
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_setOffset(UStringSearch *strsrch,
                                        int32_t        position,
                                        UErrorCode    *status);

/**
 * Return the current index in the string text being searched.
 * If the iteration has gone past the end of the text (or past the beginning
 * for a backwards search), <code>USEARCH_DONE</code> is returned.
 * @param strsrch search iterator data struct
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_getOffset(const UStringSearch *strsrch);

/**
 * Sets the text searching attributes located in the enum USearchAttribute
 * with values from the enum USearchAttributeValue.
 * <code>USEARCH_DEFAULT</code> can be used for all attributes for resetting.
 * @param strsrch search iterator data struct
 * @param attribute text attribute to be set
 * @param value text attribute value
 * @param status for errors if it occurs
 * @see #usearch_getAttribute
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_setAttribute(UStringSearch         *strsrch,
                                           USearchAttribute       attribute,
                                           USearchAttributeValue  value,
                                           UErrorCode            *status);

/**
 * Gets the text searching attributes.
 * @param strsrch search iterator data struct
 * @param attribute text attribute to be retrieved
 * @return text attribute value
 * @see #usearch_setAttribute
 * @stable ICU 2.4
 */
U_CAPI USearchAttributeValue U_EXPORT2 usearch_getAttribute(
                                         const UStringSearch    *strsrch,
                                         USearchAttribute        attribute);

/**
 * Returns the index to the match in the text string that was searched.
 * This call returns a valid result only after a successful call to
 * <code>usearch_first</code>, <code>usearch_next</code>, <code>usearch_previous</code>,
 * or <code>usearch_last</code>.
 * Just after construction, or after a searching method returns
 * <code>USEARCH_DONE</code>, this method will return <code>USEARCH_DONE</code>.
 * <p>
 * Use <code>usearch_getMatchedLength</code> to get the matched string length.
 * @param strsrch search iterator data struct
 * @return index to a substring within the text string that is being
 *         searched.
 * @see #usearch_first
 * @see #usearch_next
 * @see #usearch_previous
 * @see #usearch_last
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_getMatchedStart(
                                               const UStringSearch *strsrch);

/**
 * Returns the length of text in the string which matches the search pattern.
 * This call returns a valid result only after a successful call to
 * <code>usearch_first</code>, <code>usearch_next</code>, <code>usearch_previous</code>,
 * or <code>usearch_last</code>.
 * Just after construction, or after a searching method returns
 * <code>USEARCH_DONE</code>, this method will return 0.
 * @param strsrch search iterator data struct
 * @return The length of the match in the string text, or 0 if there is no
 *         match currently.
 * @see #usearch_first
 * @see #usearch_next
 * @see #usearch_previous
 * @see #usearch_last
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_getMatchedLength(
                                               const UStringSearch *strsrch);

/**
 * Returns the text that was matched by the most recent call to
 * <code>usearch_first</code>, <code>usearch_next</code>, <code>usearch_previous</code>,
 * or <code>usearch_last</code>.
 * If the iterator is not pointing at a valid match (e.g. just after
 * construction or after <code>USEARCH_DONE</code> has been returned), returns
 * an empty string.
 * If result is not large enough to store the matched text,
 * result will be filled with the partial text and an U_BUFFER_OVERFLOW_ERROR
 * will be returned in status. result will be null-terminated whenever
 * possible. If the buffer fits the matched text exactly, null-termination
 * is not possible and U_STRING_NOT_TERMINATED_WARNING is set in status.
 * Pre-flighting can be done either with a resultCapacity of 0 or with the API
 * <code>usearch_getMatchedLength</code>.
 * @param strsrch search iterator data struct
 * @param result UChar buffer to store the matched string
 * @param resultCapacity length of the result buffer
 * @param status error returned if result is not large enough
 * @return exact length of the matched text, not counting the null-termination
 * @see #usearch_first
 * @see #usearch_next
 * @see #usearch_previous
 * @see #usearch_last
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_getMatchedText(const UStringSearch *strsrch,
                                                UChar               *result,
                                                int32_t              resultCapacity,
                                                UErrorCode          *status);

#if !UCONFIG_NO_BREAK_ITERATION

/**
 * Set the BreakIterator that will be used to restrict the points at which
 * matches are detected.
 * @param strsrch search iterator data struct
 * @param breakiter A BreakIterator that will be used to restrict the points
 *                  at which matches are detected. If a match is found, but
 *                  the match's start or end index is not a boundary as
 *                  determined by the <code>BreakIterator</code>, the match will
 *                  be rejected and another will be searched for.
 *                  If this parameter is <code>NULL</code>, no break detection is
 *                  attempted.
 * @param status for errors if it occurs
 * @see #usearch_getBreakIterator
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_setBreakIterator(UStringSearch  *strsrch,
                                               UBreakIterator *breakiter,
                                               UErrorCode     *status);

/**
 * Returns the BreakIterator that is used to restrict the points at which
 * matches are detected. This will be the same object that was passed to the
 * constructor or to <code>usearch_setBreakIterator</code>. Note that
 * <code>NULL</code>
 * is a legal value; it means that break detection should not be attempted.
 * @param strsrch search iterator data struct
 * @return break iterator used
 * @see #usearch_setBreakIterator
 * @stable ICU 2.4
 */
U_CAPI const UBreakIterator * U_EXPORT2 usearch_getBreakIterator(
                                              const UStringSearch *strsrch);

#endif

/**
 * Set the string text to be searched. Text iteration will hence begin at the
 * start of the text string. This method is useful if you want to re-use an
 * iterator to search for the same pattern within a different body of text.
 *
 * The UStringSearch retains a pointer to the text string. The caller must not
 * modify or delete the string while using the UStringSearch.
 *
 * @param strsrch search iterator data struct
 * @param text new string to look for match
 * @param textlength length of the new string, -1 for null-termination
 * @param status for errors if it occurs. If text is NULL, or textlength is 0
 *               then an U_ILLEGAL_ARGUMENT_ERROR is returned with no change
 *               done to strsrch.
 * @see #usearch_getText
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_setText(      UStringSearch *strsrch,
                                      const UChar         *text,
                                            int32_t        textlength,
                                            UErrorCode    *status);

/**
 * Return the string text to be searched.
 * @param strsrch search iterator data struct
 * @param length returned string text length
 * @return string text
 * @see #usearch_setText
 * @stable ICU 2.4
 */
U_CAPI const UChar * U_EXPORT2 usearch_getText(const UStringSearch *strsrch,
                                               int32_t             *length);

/**
 * Gets the collator used for the language rules.
 * <p>
 * Deleting the returned <code>UCollator</code> before calling
 * <code>usearch_close</code> would cause the string search to fail.
 * <code>usearch_close</code> will delete the collator if this search owns it.
 * @param strsrch search iterator data struct
 * @return collator
 * @stable ICU 2.4
 */
U_CAPI UCollator * U_EXPORT2 usearch_getCollator(
                                         const UStringSearch *strsrch);

/**
 * Sets the collator used for the language rules. User retains the ownership
 * of this collator, thus the responsibility of deletion lies with the user.
 * This method causes internal data such as the pattern collation elements
 * and shift tables to be recalculated, but the iterator's position is unchanged.
 * @param strsrch search iterator data struct
 * @param collator to be used
 * @param status for errors if it occurs
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_setCollator(      UStringSearch *strsrch,
                                          const UCollator     *collator,
                                                UErrorCode    *status);

/**
 * Sets the pattern used for matching.
 * Internal data like the pattern collation elements will be recalculated, but the
 * iterator's position is unchanged.
 *
 * The UStringSearch retains a pointer to the pattern string. The caller must not
 * modify or delete the string while using the UStringSearch.
 *
 * @param strsrch search iterator data struct
 * @param pattern string
 * @param patternlength pattern length, -1 for null-terminated string
 * @param status for errors if it occurs.
 *               If pattern is NULL, or patternlength is 0
 *               then an U_ILLEGAL_ARGUMENT_ERROR is returned with no change
 *               done to strsrch.
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_setPattern(      UStringSearch *strsrch,
                                         const UChar         *pattern,
                                               int32_t        patternlength,
                                               UErrorCode    *status);

/**
 * Gets the search pattern
 * @param strsrch search iterator data struct
 * @param length return length of the pattern, -1 indicates that the pattern
 *               is null-terminated
 * @return pattern string
 * @stable ICU 2.4
 */
U_CAPI const UChar * U_EXPORT2 usearch_getPattern(
                                               const UStringSearch *strsrch,
                                               int32_t             *length);

/* methods ------------------------------------------------------------- */

/**
 * Returns the first index at which the string text matches the search
 * pattern.
 * The iterator is adjusted so that its current index (as returned by
 * <code>usearch_getOffset</code>) is the match position if one was found.
 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
 * @param strsrch search iterator data struct
 * @param status for errors if it occurs
 * @return The character index of the first match, or
 * <code>USEARCH_DONE</code> if there are no matches.
 * @see #usearch_getOffset
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_first(UStringSearch *strsrch,
                                       UErrorCode    *status);

/**
 * Returns the first index equal to or greater than <code>position</code> at which
 * the string text
 * matches the search pattern. The iterator is adjusted so that its current
 * index (as returned by <code>usearch_getOffset</code>) is the match position if
 * one was found.
 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
 * <p>
 * Search positions that may render incorrect results are highlighted in the
 * header comments. If position is less than or greater than the text range
 * for searching, an U_INDEX_OUTOFBOUNDS_ERROR will be returned.
 * @param strsrch search iterator data struct
 * @param position to start the search at
 * @param status for errors if it occurs
 * @return The character index of the first match following <code>position</code>,
 *         or <code>USEARCH_DONE</code> if there are no matches.
 * @see #usearch_getOffset
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_following(UStringSearch *strsrch,
                                           int32_t        position,
                                           UErrorCode    *status);

/**
 * Returns the last index in the target text at which it matches the search
 * pattern. The iterator is adjusted so that its current
 * index (as returned by <code>usearch_getOffset</code>) is the match position if
 * one was found.
 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
 * @param strsrch search iterator data struct
 * @param status for errors if it occurs
 * @return The index of the last match, or <code>USEARCH_DONE</code> if there
 *         are no matches.
 * @see #usearch_getOffset
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_last(UStringSearch *strsrch,
                                      UErrorCode    *status);

/**
 * Returns the first index less than <code>position</code> at which the string text
 * matches the search pattern. The iterator is adjusted so that its current
 * index (as returned by <code>usearch_getOffset</code>) is the match position if
 * one was found.
 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
 * <p>
 * Search positions that may render incorrect results are highlighted in the
 * header comments. If position is less than or greater than the text range
 * for searching, an U_INDEX_OUTOFBOUNDS_ERROR will be returned.
 * <p>
 * When <code>USEARCH_OVERLAP</code> option is off, the last index of the
 * result match is always less than <code>position</code>.
 * When <code>USEARCH_OVERLAP</code> is on, the result match may span across
 * <code>position</code>.
 * @param strsrch search iterator data struct
 * @param position index position the search is to begin at
 * @param status for errors if it occurs
 * @return The character index of the first match preceding <code>position</code>,
 *         or <code>USEARCH_DONE</code> if there are no matches.
 * @see #usearch_getOffset
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_preceding(UStringSearch *strsrch,
                                           int32_t        position,
                                           UErrorCode    *status);

/**
 * Returns the index of the next point at which the string text matches the
 * search pattern, starting from the current position.
 * The iterator is adjusted so that its current
 * index (as returned by <code>usearch_getOffset</code>) is the match position if
 * one was found.
 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
 * @param strsrch search iterator data struct
 * @param status for errors if it occurs
 * @return The index of the next match after the current position, or
 *         <code>USEARCH_DONE</code> if there are no more matches.
 * @see #usearch_first
 * @see #usearch_getOffset
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_next(UStringSearch *strsrch,
                                      UErrorCode    *status);

/**
 * Returns the index of the previous point at which the string text matches
 * the search pattern, starting at the current position.
 * The iterator is adjusted so that its current
 * index (as returned by <code>usearch_getOffset</code>) is the match position if
 * one was found.
 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
 * @param strsrch search iterator data struct
 * @param status for errors, if any occur
 * @return The index of the previous match before the current position,
 *         or <code>USEARCH_DONE</code> if there are no more matches.
 * @see #usearch_last
 * @see #usearch_getOffset
 * @see #USEARCH_DONE
 * @stable ICU 2.4
 */
U_CAPI int32_t U_EXPORT2 usearch_previous(UStringSearch *strsrch,
                                          UErrorCode    *status);

/**
 * Reset the iteration.
 * Search will begin at the start of the text string if a forward iteration
 * is initiated before a backwards iteration. Otherwise, if a backwards
 * iteration is initiated before a forwards iteration, the search will begin
 * at the end of the text string.
 * @param strsrch search iterator data struct
 * @see #usearch_first
 * @stable ICU 2.4
 */
U_CAPI void U_EXPORT2 usearch_reset(UStringSearch *strsrch);

#ifndef U_HIDE_INTERNAL_API
/**
 * Simple forward search for the pattern, starting at a specified index,
 * and using a default set of search options.
 *
 * This is an experimental function, and is not an official part of the
 * ICU API.
 *
 * The collator options, such as UCOL_STRENGTH and UCOL_NORMALIZATION_MODE, are honored.
 *
 * The UStringSearch options USEARCH_CANONICAL_MATCH, USEARCH_OVERLAP and
 * any Break Iterator are ignored.
 *
 * Matches obey the following constraints:
 *
 * Characters at the start or end positions of a match that are ignorable
 * for collation are not included as part of the match, unless they
 * are part of a combining sequence, as described below.
 *
 * A match will not include a partial combining sequence. Combining
 * character sequences are considered to be inseparable units,
 * and either match the pattern completely, or are considered to not match
 * at all. Thus, for example, an A followed by a combining accent mark will
 * not be found when searching for a plain (unaccented) A (unless
 * the collation strength has been set to ignore all accents).
 *
 * When beginning a search, the initial starting position, startIdx,
 * is assumed to be an acceptable match boundary with respect to
 * combining characters. A combining sequence that spans across the
 * starting point will not suppress a match beginning at startIdx.
 *
 * Characters that expand to multiple collation elements
 * (German sharp-S becoming 'ss', or the composed forms of accented
 * characters, for example) also must match completely.
 * Searching for a single 's' in a string containing only a sharp-s will
 * find no match.
 *
 * @param strsrch    the UStringSearch struct, which references both
 *                   the text to be searched and the pattern being sought.
 * @param startIdx   The index into the text to begin the search.
 * @param matchStart An out parameter, the starting index of the matched text.
 *                   This parameter may be NULL.
 *                   A value of -1 will be returned if no match was found.
 * @param matchLimit Out parameter, the index of the first position following the matched text.
 *                   The matchLimit will be at a suitable position for beginning a subsequent search
 *                   in the input text.
 *                   This parameter may be NULL.
 *                   A value of -1 will be returned if no match was found.
 *
 * @param status     Report any errors. Note that no match found is not an error.
 * @return           true if a match was found, false otherwise.
 *
 * @internal
 */
U_CAPI UBool U_EXPORT2 usearch_search(UStringSearch *strsrch,
                                      int32_t        startIdx,
                                      int32_t       *matchStart,
                                      int32_t       *matchLimit,
                                      UErrorCode    *status);

/**
 * Simple backwards search for the pattern, starting at a specified index,
 * and using a default set of search options.
 *
 * This is an experimental function, and is not an official part of the
 * ICU API.
 *
 * The collator options, such as UCOL_STRENGTH and UCOL_NORMALIZATION_MODE, are honored.
 *
 * The UStringSearch options USEARCH_CANONICAL_MATCH, USEARCH_OVERLAP and
 * any Break Iterator are ignored.
 *
 * Matches obey the following constraints:
 *
 * Characters at the start or end positions of a match that are ignorable
 * for collation are not included as part of the match, unless they
 * are part of a combining sequence, as described below.
 *
 * A match will not include a partial combining sequence. Combining
 * character sequences are considered to be inseparable units,
 * and either match the pattern completely, or are considered to not match
 * at all. Thus, for example, an A followed by a combining accent mark will
 * not be found when searching for a plain (unaccented) A (unless
 * the collation strength has been set to ignore all accents).
 *
 * When beginning a search, the initial starting position, startIdx,
 * is assumed to be an acceptable match boundary with respect to
 * combining characters.
 * A combining sequence that spans across the
 * starting point will not suppress a match beginning at startIdx.
 *
 * Characters that expand to multiple collation elements
 * (German sharp-S becoming 'ss', or the composed forms of accented
 * characters, for example) also must match completely.
 * Searching for a single 's' in a string containing only a sharp-s will
 * find no match.
 *
 * @param strsrch    the UStringSearch struct, which references both
 *                   the text to be searched and the pattern being sought.
 * @param startIdx   The index into the text to begin the search.
 * @param matchStart An out parameter, the starting index of the matched text.
 *                   This parameter may be NULL.
 *                   A value of -1 will be returned if no match was found.
 * @param matchLimit Out parameter, the index of the first position following the matched text.
 *                   The matchLimit will be at a suitable position for beginning a subsequent search
 *                   in the input text.
 *                   This parameter may be NULL.
 *                   A value of -1 will be returned if no match was found.
 *
 * @param status     Report any errors. Note that no match found is not an error.
 * @return           true if a match was found, false otherwise.
 *
 * @internal
 */
U_CAPI UBool U_EXPORT2 usearch_searchBackwards(UStringSearch *strsrch,
                                               int32_t        startIdx,
                                               int32_t       *matchStart,
                                               int32_t       *matchLimit,
                                               UErrorCode    *status);
#endif /* U_HIDE_INTERNAL_API */

#endif /* #if !UCONFIG_NO_COLLATION && !UCONFIG_NO_BREAK_ITERATION */

#endif