0001 // © 2016 and later: Unicode, Inc. and others.
0002 // License & terms of use: http://www.unicode.org/copyright.html
0003 /*
0004 **********************************************************************
0005 *   Copyright (C) 2001-2011,2014 IBM and others. All rights reserved.
0006 **********************************************************************
0007 *   Date        Name        Description
0008 *  06/28/2001   synwee      Creation.
0009 **********************************************************************
0010 */
0011 #ifndef USEARCH_H
0012 #define USEARCH_H
0013 
0014 #include "unicode/utypes.h"
0015 
0016 #if !UCONFIG_NO_COLLATION && !UCONFIG_NO_BREAK_ITERATION
0017 
0018 #include "unicode/ucol.h"
0019 #include "unicode/ucoleitr.h"
0020 #include "unicode/ubrk.h"
0021 
0022 #if U_SHOW_CPLUSPLUS_API
0023 #include "unicode/localpointer.h"
0024 #endif   // U_SHOW_CPLUSPLUS_API
0025 
0026 /**
0027  * \file
0028  * \brief C API: StringSearch
0029  *
0030  * C APIs for an engine that provides language-sensitive text searching based 
0031  * on the comparison rules defined in a <code>UCollator</code> data struct,
0032  * see <code>ucol.h</code>. This ensures that language-specific conventions are
0033  * handled; e.g., with the German collator, the characters &szlig; and SS match
0034  * when case differences are ignored.
0035  * See the <a href="https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/main/design/collation/ICU_collation_design.htm">
0036  * "ICU Collation Design Document"</a> for more information.
0037  * <p> 
0038  * As of ICU4C 4.0 / ICU4J 53, the implementation uses a linear search. In previous versions,
0039  * a modified form of the Boyer-Moore searching algorithm was used. For more information
0040  * on the modified Boyer-Moore algorithm see
0041  * <a href="http://icu-project.org/docs/papers/efficient_text_searching_in_java.html">
0042  * "Efficient Text Searching in Java"</a>, published in <i>Java Report</i> 
0043  * in February, 1999.
0044  * <p>
0045  * There are 2 match options for selection:<br>
0046  * Let S' be the substring of a text string S between the offsets start and
0047  * end, denoted <start, end>.
0048  * <br>
0049  * A pattern string P matches a text string S at the offsets <start, end> 
0050  * if
0051  * <pre> 
0052  * option 1. Some canonical equivalent of P matches some canonical equivalent 
0053  *           of S'
0054  * option 2. P matches S' and if P starts or ends with a combining mark, 
0055  *           there exists no non-ignorable combining mark before or after S' 
0056  *           in S respectively. 
0057  * </pre>
0058  * Option 2 is the default.
0059  * <p>
0060  * This search has APIs similar to those of other text iteration mechanisms
0061  * such as the break iterators in <code>ubrk.h</code>. Using these
0062  * APIs, it is easy to scan through text looking for all occurrences of
0063  * a given pattern. The search direction can be changed by calling
0064  * <code>reset</code> followed by <code>next</code> or <code>previous</code>.
0065  * A direction change can also occur without calling <code>reset</code> first,
0066  * but this incurs a speed penalty.
0067  * Generally, the matches found in the forward direction are the same as the
0068  * matches found in the backward direction, in reverse order.
0069  * <p>
0070  * <code>usearch.h</code> provides APIs to specify the starting position 
0071  * within the text string to be searched, e.g. <code>usearch_setOffset</code>,
0072  * <code>usearch_preceding</code> and <code>usearch_following</code>. Since the
0073  * starting position is used exactly as specified, note that starting from
0074  * certain positions may cause the search to return incorrect
0075  * results:
0076  * <ul>
0077  * <li> The middle of a substring that requires normalization.
0078  * <li> When searching for the following match, the position should not be
0079  *      the second of two characters that must be swapped with each other.
0080  *      Conversely, when searching for the preceding match, the position
0081  *      should not be the first of two characters that must be swapped.
0082  *      For example, certain Thai and Lao characters require such
0083  *      swapping.
0084  * <li> When searching for the following match, any position within a
0085  *      contracting sequence except the first will fail. Conversely, when
0086  *      searching for the preceding match, an invalid starting point
0087  *      would be any character within a contracting sequence except the last.
0088  * </ul>
0089  * <p>
0090  * A break iterator can be used if only matches at logical breaks are desired.
0091  * Using a break iterator will only give you results that exactly match the
0092  * boundaries given by the break iterator. For instance, the pattern "e" will
0093  * not be found in the string "\u00e9" if a character break iterator is used.
0094  * <p>
0095  * Options are provided to handle overlapping matches.
0096  * For example, in English, overlapping matching produces the results 0 and 2
0097  * for the pattern "abab" in the text "ababab", whereas mutually
0098  * exclusive (non-overlapping) matching produces only the result 0.
0099  * <p>
0100  * Options are also provided to implement "asymmetric search" as described in
0101  * <a href="http://www.unicode.org/reports/tr10/#Asymmetric_Search">
0102  * UTS #10 Unicode Collation Algorithm</a>, specifically the USearchAttribute
0103  * USEARCH_ELEMENT_COMPARISON and its values.
0104  * <p>
0105  * Although collator attributes are taken into account when
0106  * performing matches, there are no APIs here for setting and getting the
0107  * attributes. These attributes can be set by getting the collator
0108  * from <code>usearch_getCollator</code> and using the APIs in <code>ucol.h</code>.
0109  * Finally, to make String Search use the new collator attributes,
0110  * usearch_reset() has to be called.
0111  * <p> 
0112  * Restriction: <br>
0113  * Currently there are no composite characters that consist of a
0114  * character with combining class > 0 before a character with combining 
0115  * class == 0. However, if such a character exists in the future, the 
0116  * search mechanism does not guarantee the results for option 1.
0117  * 
0118  * <p>
0119  * Example of use:<br>
0120  * <pre><code>
0121  * const char *tgtstr = "The quick brown fox jumped over the lazy fox";
0122  * const char *patstr = "fox";
0123  * UChar target[64];
0124  * UChar pattern[16];
0125  * UErrorCode status = U_ZERO_ERROR;
0126  * u_uastrcpy(target, tgtstr);
0127  * u_uastrcpy(pattern, patstr);
0128  *
0129  * UStringSearch *search = usearch_open(pattern, -1, target, -1, "en_US", 
0130  *                                  NULL, &status);
0131  * if (U_SUCCESS(status)) {
0132  *     for (int32_t pos = usearch_first(search, &status); 
0133  *          pos != USEARCH_DONE; 
0134  *          pos = usearch_next(search, &status))
0135  *     {
0136  *         printf("Found match at %d pos, length is %d\n", pos, 
0137  *                                        usearch_getMatchedLength(search));
0138  *     }
0139  * }
0140  *
0141  * usearch_close(search);
0142  * </code></pre>
0143  * @stable ICU 2.4
0144  */
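The overlap option described above can be illustrated outside ICU. The sketch below is a hypothetical stand-in, not ICU code: `find_matches` compares ASCII bytes exactly instead of collation elements, and its `overlap` flag plays the role of setting USEARCH_OVERLAP to USEARCH_ON or USEARCH_OFF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ICU: records up to max_out start offsets
 * at which pat occurs in text, comparing bytes exactly (no collation).
 * With overlap != 0 the scan resumes one position after each match start;
 * otherwise it resumes after the end of the match, so matches never overlap.
 * Returns the number of matches found (which may exceed max_out). */
static size_t find_matches(const char *text, const char *pat, int overlap,
                           int *out, size_t max_out) {
    size_t tlen = strlen(text), plen = strlen(pat), count = 0, i = 0;
    if (plen == 0 || plen > tlen) {
        return 0;
    }
    while (i + plen <= tlen) {
        if (memcmp(text + i, pat, plen) == 0) {
            if (count < max_out) {
                out[count] = (int)i;
            }
            count++;
            i += overlap ? 1 : plen; /* overlap: retry just after the start */
        } else {
            i++;
        }
    }
    return count;
}
```

Under these assumptions, "abab" in "ababab" yields offsets 0 and 2 when overlapping and only 0 otherwise, matching the example in the comment.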
0145 
0146 /**
0147 * DONE is returned by usearch_previous() and usearch_next() after all valid
0148 * matches have been returned, and by usearch_first() and usearch_last() if there are no matches at all.
0149 * @stable ICU 2.4
0150 */
0151 #define USEARCH_DONE -1
0152 
0153 /**
0154 * Data structure for searching
0155 * @stable ICU 2.4
0156 */
0157 struct UStringSearch;
0158 /**
0159 * Data structure for searching
0160 * @stable ICU 2.4
0161 */
0162 typedef struct UStringSearch UStringSearch;
0163 
0164 /**
0165 * @stable ICU 2.4
0166 */
0167 typedef enum {
0168     /**
0169      * Option for overlapping matches
0170      * @stable ICU 2.4
0171      */
0172     USEARCH_OVERLAP = 0,
0173 #ifndef U_HIDE_DEPRECATED_API
0174     /** 
0175      * Option for canonical matches; option 1 in header documentation.
0176      * The default value will be USEARCH_OFF.
0177      * Note: Setting this option to USEARCH_ON currently has no effect on
0178      * search behavior, and this option is deprecated. Instead, to control
0179      * canonical match behavior, you must set UCOL_NORMALIZATION_MODE
0180      * appropriately (to UCOL_OFF or UCOL_ON) in the UCollator used by
0181      * the UStringSearch object.
0182      * @see usearch_openFromCollator 
0183      * @see usearch_getCollator
0184      * @see usearch_setCollator
0185      * @see ucol_getAttribute
0186      * @deprecated ICU 53
0187      */
0188     USEARCH_CANONICAL_MATCH = 1,
0189 #endif  /* U_HIDE_DEPRECATED_API */
0190     /** 
0191      * Option to control how collation elements are compared.
0192      * The default value will be USEARCH_STANDARD_ELEMENT_COMPARISON.
0193      * @stable ICU 4.4
0194      */
0195     USEARCH_ELEMENT_COMPARISON = 2,
0196 
0197 #ifndef U_HIDE_DEPRECATED_API
0198     /**
0199      * One more than the highest normal USearchAttribute value.
0200      * @deprecated ICU 58 The numeric value may change over time, see ICU ticket #12420.
0201      */
0202     USEARCH_ATTRIBUTE_COUNT = 3
0203 #endif  /* U_HIDE_DEPRECATED_API */
0204 } USearchAttribute;
0205 
0206 /**
0207 * @stable ICU 2.4
0208 */
0209 typedef enum {
0210     /** 
0211      * Default value for any USearchAttribute
0212      * @stable ICU 2.4
0213      */
0214     USEARCH_DEFAULT = -1,
0215     /**
0216      * Value for USEARCH_OVERLAP and USEARCH_CANONICAL_MATCH
0217      * @stable ICU 2.4
0218      */
0219     USEARCH_OFF, 
0220     /**
0221      * Value for USEARCH_OVERLAP and USEARCH_CANONICAL_MATCH
0222      * @stable ICU 2.4
0223      */
0224     USEARCH_ON,
0225     /** 
0226      * Value (default) for USEARCH_ELEMENT_COMPARISON;
0227      * standard collation element comparison at the specified collator
0228      * strength.
0229      * @stable ICU 4.4
0230      */
0231     USEARCH_STANDARD_ELEMENT_COMPARISON,
0232     /** 
0233      * Value for USEARCH_ELEMENT_COMPARISON;
0234      * collation element comparison is modified to effectively provide
0235      * behavior between the specified strength and strength - 1. Collation
0236      * elements in the pattern that have the base weight for the specified
0237      * strength are treated as "wildcards" that match an element with any
0238      * other weight at that collation level in the searched text. For
0239      * example, with a secondary-strength English collator, a plain 'e' in
0240      * the pattern will match a plain e or an e with any diacritic in the
0241      * searched text, but an e with diacritic in the pattern will only
0242      * match an e with the same diacritic in the searched text.
0243      *
0244      * This supports "asymmetric search" as described in
0245      * <a href="http://www.unicode.org/reports/tr10/#Asymmetric_Search">
0246      * UTS #10 Unicode Collation Algorithm</a>.
0247      *
0248      * @stable ICU 4.4
0249      */
0250     USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD,
0251     /** 
0252      * Value for USEARCH_ELEMENT_COMPARISON;
0253      * collation element comparison is modified to effectively provide
0254      * behavior between the specified strength and strength - 1. Collation
0255      * elements in either the pattern or the searched text that have the
0256      * base weight for the specified strength are treated as "wildcards"
0257      * that match an element with any other weight at that collation level.
0258      * For example, with a secondary-strength English collator, a plain 'e'
0259      * in the pattern will match a plain e or an e with any diacritic in the
0260      * searched text, but an e with diacritic in the pattern will only
0261      * match an e with the same diacritic or a plain e in the searched text.
0262      *
0263      * This option is similar to "asymmetric search" as described in
0264      * <a href="http://www.unicode.org/reports/tr10/#Asymmetric_Search">UTS #10 Unicode Collation Algorithm</a>,
0265      * but also allows unmarked characters in the searched text to match
0266      * marked or unmarked versions of that character in the pattern.
0267      *
0268      * @stable ICU 4.4
0269      */
0270     USEARCH_ANY_BASE_WEIGHT_IS_WILDCARD,
0271 
0272 #ifndef U_HIDE_DEPRECATED_API
0273     /**
0274      * One more than the highest normal USearchAttributeValue value.
0275      * @deprecated ICU 58 The numeric value may change over time, see ICU ticket #12420.
0276      */
0277     USEARCH_ATTRIBUTE_VALUE_COUNT
0278 #endif  /* U_HIDE_DEPRECATED_API */
0279 } USearchAttributeValue;
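The three USEARCH_ELEMENT_COMPARISON values can be modeled with a toy representation in which each character is a base letter plus an accent code, and accent 0 stands for the base (unmarked) secondary weight. The `SimpleElement` type, the `CMP_*` names and `element_matches` are hypothetical; real collation elements carry more levels and are produced by the collator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not ICU's data structures: each character is reduced
 * to a primary "base" letter and a secondary "accent" code, where accent 0
 * stands for the base (unmarked) secondary weight. */
typedef struct {
    char base;
    int accent;
} SimpleElement;

/* Hypothetical names mirroring the three USEARCH_ELEMENT_COMPARISON values. */
typedef enum {
    CMP_STANDARD,
    CMP_PATTERN_BASE_WEIGHT_IS_WILDCARD,
    CMP_ANY_BASE_WEIGHT_IS_WILDCARD
} ElementComparison;

/* Compares one pattern element with one text element at "secondary
 * strength": base letters must always be equal; the accent rule depends
 * on the comparison mode. */
static int element_matches(SimpleElement p, SimpleElement t,
                           ElementComparison mode) {
    if (p.base != t.base) {
        return 0;
    }
    switch (mode) {
    case CMP_PATTERN_BASE_WEIGHT_IS_WILDCARD:
        /* An unmarked pattern element matches any accent in the text. */
        return p.accent == 0 || p.accent == t.accent;
    case CMP_ANY_BASE_WEIGHT_IS_WILDCARD:
        /* An unmarked element on either side acts as a wildcard. */
        return p.accent == 0 || t.accent == 0 || p.accent == t.accent;
    case CMP_STANDARD:
    default:
        return p.accent == t.accent;
    }
}
```

In this model, a plain 'e' in the pattern matches an accented 'e' only in the two wildcard modes, and an accented pattern 'e' matches a plain text 'e' only in the "any base weight" mode.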
0280 
0281 /* open and close ------------------------------------------------------ */
0282 
0283 /**
0284 * Creates a String Search iterator data struct using the collation rules of
0285 * the given locale. A collator will be created in the process, which will be owned by
0286 * this String Search and will be deleted in <code>usearch_close</code>.
0287 *
0288 * The UStringSearch retains a pointer to both the pattern and text strings.
0289 * The caller must not modify or delete them while using the UStringSearch.
0290 *
0291 * @param pattern for matching
0292 * @param patternlength length of the pattern, -1 for null-termination
0293 * @param text text string
0294 * @param textlength length of the text string, -1 for null-termination
0295 * @param locale name of locale for the rules to be used
0296 * @param breakiter A BreakIterator that will be used to restrict the points
0297 *                  at which matches are detected. If a match is found, but 
0298 *                  the match's start or end index is not a boundary as 
0299 *                  determined by the <code>BreakIterator</code>, the match will 
0300 *                  be rejected and another will be searched for. 
0301 *                  If this parameter is <code>NULL</code>, no break detection is 
0302 *                  attempted.
0303 * @param status receives any error that occurs. If pattern or text is NULL,
0304 *               or if patternlength or textlength is 0, then
0305 *               U_ILLEGAL_ARGUMENT_ERROR is returned.
0306 * @return search iterator data structure, or NULL if there is an error.
0307 * @stable ICU 2.4
0308 */
0309 U_CAPI UStringSearch * U_EXPORT2 usearch_open(const UChar    *pattern,
0310                                               int32_t         patternlength,
0311                                         const UChar          *text,
0312                                               int32_t         textlength,
0313                                         const char           *locale,
0314                                               UBreakIterator *breakiter,
0315                                               UErrorCode     *status);
0316 
0317 /**
0318 * Creates a String Search iterator data struct using the language rules of the
0319 * given collator. Note that the user retains ownership of this collator and is
0320 * therefore responsible for deleting it.
0321 *
0322 * NOTE: String Search cannot be instantiated from a collator that has
0323 * collate digits as numbers (CODAN) turned on (UCOL_NUMERIC_COLLATION).
0324 *
0325 * The UStringSearch retains a pointer to both the pattern and text strings.
0326 * The caller must not modify or delete them while using the UStringSearch.
0327 *
0328 * @param pattern for matching
0329 * @param patternlength length of the pattern, -1 for null-termination
0330 * @param text text string
0331 * @param textlength length of the text string, -1 for null-termination
0332 * @param collator used for the language rules
0333 * @param breakiter A BreakIterator that will be used to restrict the points
0334 *                  at which matches are detected. If a match is found, but
0335 *                  the match's start or end index is not a boundary as
0336 *                  determined by the <code>BreakIterator</code>, the match will
0337 *                  be rejected and another will be searched for.
0338 *                  If this parameter is <code>NULL</code>, no break detection is
0339 *                  attempted.
0340 * @param status receives any error that occurs. If collator, pattern or text is NULL,
0341 *               or if patternlength or textlength is 0, then
0342 *               U_ILLEGAL_ARGUMENT_ERROR is returned.
0343 * @return search iterator data structure, or NULL if there is an error.
0344 * @stable ICU 2.4
0345 */
0346 U_CAPI UStringSearch * U_EXPORT2 usearch_openFromCollator(
0347                                          const UChar          *pattern,
0348                                                int32_t         patternlength,
0349                                          const UChar          *text,
0350                                                int32_t         textlength,
0351                                          const UCollator      *collator,
0352                                                UBreakIterator *breakiter,
0353                                                UErrorCode     *status);
0354 
0355 /**
0356  * Destroys and cleans up the String Search iterator data struct.
0357  * If a collator was created in <code>usearch_open</code>, then it will be destroyed here.
0358  * @param searchiter The UStringSearch to clean up
0359  * @stable ICU 2.4
0360  */
0361 U_CAPI void U_EXPORT2 usearch_close(UStringSearch *searchiter);
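The ownership rule above (a search opened with usearch_open creates and later deletes its own collator, while usearch_openFromCollator leaves ownership with the caller) can be modeled with an ownership flag. This sketch uses hypothetical `FakeCollator`/`FakeSearch` types and functions; it is an assumption about how such a rule can be modeled, not a description of ICU's internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for UCollator and UStringSearch; they only model
 * who is responsible for freeing the collator. */
typedef struct { int dummy; } FakeCollator;

typedef struct {
    FakeCollator *collator;
    int owns_collator; /* nonzero if the search created (and must free) it */
} FakeSearch;

static int collators_freed = 0; /* counts frees, for illustration only */

/* Models usearch_open: creates its own collator and takes ownership. */
static FakeSearch *fake_open(void) {
    FakeSearch *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->collator = malloc(sizeof *s->collator);
    if (s->collator == NULL) { free(s); return NULL; }
    s->owns_collator = 1;
    return s;
}

/* Models usearch_openFromCollator: borrows the caller's collator. */
static FakeSearch *fake_open_from_collator(FakeCollator *coll) {
    FakeSearch *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->collator = coll;
    s->owns_collator = 0;
    return s;
}

/* Models usearch_close: frees the collator only if the search owns it.
 * This sketch also accepts NULL, so it can be called after a failed open. */
static void fake_close(FakeSearch *s) {
    if (s == NULL) return;
    if (s->owns_collator) {
        free(s->collator);
        collators_freed++;
    }
    free(s);
}
```

With this design, the caller of `fake_open_from_collator` must keep its collator alive until `fake_close` and delete it afterwards, mirroring the documented responsibility.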
0362 
0363 #if U_SHOW_CPLUSPLUS_API
0364 
0365 U_NAMESPACE_BEGIN
0366 
0367 /**
0368  * \class LocalUStringSearchPointer
0369  * "Smart pointer" class, closes a UStringSearch via usearch_close().
0370  * For most methods see the LocalPointerBase base class.
0371  *
0372  * @see LocalPointerBase
0373  * @see LocalPointer
0374  * @stable ICU 4.4
0375  */
0376 U_DEFINE_LOCAL_OPEN_POINTER(LocalUStringSearchPointer, UStringSearch, usearch_close);
0377 
0378 U_NAMESPACE_END
0379 
0380 #endif
0381 
0382 /* get and set methods -------------------------------------------------- */
0383 
0384 /**
0385 * Sets the current position in the text string from which the next search
0386 * will start. Clears any previous state.
0387 * This method sets the position in the text string to the given index
0388 * without checking whether the index is a valid starting point for a
0389 * search.
0390 * Search positions that may produce incorrect results are listed in the
0391 * header comments.
0392 * @param strsrch search iterator data struct
0393 * @param position position to start next search from. If position is
0394 *          outside the text range for searching,
0395 *          U_INDEX_OUTOFBOUNDS_ERROR is returned.
0396 * @param status error status if any.
0397 * @stable ICU 2.4
0398 */
0399 U_CAPI void U_EXPORT2 usearch_setOffset(UStringSearch *strsrch,
0400                                         int32_t        position,
0401                                         UErrorCode    *status);
0402 
0403 /**
0404 * Return the current index in the string text being searched.
0405 * If the iteration has gone past the end of the text (or past the beginning 
0406 * for a backwards search), <code>USEARCH_DONE</code> is returned.
0407 * @param strsrch search iterator data struct
0408 * @see #USEARCH_DONE
0409 * @stable ICU 2.4
0410 */
0411 U_CAPI int32_t U_EXPORT2 usearch_getOffset(const UStringSearch *strsrch);
0412     
0413 /**
0414 * Sets the text searching attributes located in the enum USearchAttribute
0415 * with values from the enum USearchAttributeValue.
0416 * <code>USEARCH_DEFAULT</code> can be used with any attribute to reset it to its default.
0417 * @param strsrch search iterator data struct
0418 * @param attribute text attribute to be set
0419 * @param value text attribute value
0420 * @param status for errors if it occurs
0421 * @see #usearch_getAttribute
0422 * @stable ICU 2.4
0423 */
0424 U_CAPI void U_EXPORT2 usearch_setAttribute(UStringSearch         *strsrch,
0425                                            USearchAttribute       attribute,
0426                                            USearchAttributeValue  value,
0427                                            UErrorCode            *status);
0428 
0429 /**    
0430 * Gets the text searching attributes.
0431 * @param strsrch search iterator data struct
0432 * @param attribute text attribute to be retrieved
0433 * @return text attribute value
0434 * @see #usearch_setAttribute
0435 * @stable ICU 2.4
0436 */
0437 U_CAPI USearchAttributeValue U_EXPORT2 usearch_getAttribute(
0438                                          const UStringSearch    *strsrch,
0439                                                USearchAttribute  attribute);
0440 
0441 /**
0442 * Returns the index to the match in the text string that was searched.
0443 * This call returns a valid result only after a successful call to 
0444 * <code>usearch_first</code>, <code>usearch_next</code>, <code>usearch_previous</code>, 
0445 * or <code>usearch_last</code>.
0446 * Just after construction, or after a searching method returns 
0447 * <code>USEARCH_DONE</code>, this method will return <code>USEARCH_DONE</code>.
0448 * <p>
0449 * Use <code>usearch_getMatchedLength</code> to get the matched string length.
0450 * @param strsrch search iterator data struct
0451 * @return index to a substring within the text string that is being 
0452 *         searched.
0453 * @see #usearch_first
0454 * @see #usearch_next
0455 * @see #usearch_previous
0456 * @see #usearch_last
0457 * @see #USEARCH_DONE
0458 * @stable ICU 2.4
0459 */
0460 U_CAPI int32_t U_EXPORT2 usearch_getMatchedStart(
0461                                                const UStringSearch *strsrch);
0462     
0463 /**
0464 * Returns the length of text in the string which matches the search pattern. 
0465 * This call returns a valid result only after a successful call to 
0466 * <code>usearch_first</code>, <code>usearch_next</code>, <code>usearch_previous</code>, 
0467 * or <code>usearch_last</code>.
0468 * Just after construction, or after a searching method returns 
0469 * <code>USEARCH_DONE</code>, this method will return 0.
0470 * @param strsrch search iterator data struct
0471 * @return The length of the match in the string text, or 0 if there is no 
0472 *         match currently.
0473 * @see #usearch_first
0474 * @see #usearch_next
0475 * @see #usearch_previous
0476 * @see #usearch_last
0477 * @see #USEARCH_DONE
0478 * @stable ICU 2.4
0479 */
0480 U_CAPI int32_t U_EXPORT2 usearch_getMatchedLength(
0481                                                const UStringSearch *strsrch);
0482 
0483 /**
0484 * Returns the text that was matched by the most recent call to 
0485 * <code>usearch_first</code>, <code>usearch_next</code>, <code>usearch_previous</code>, 
0486 * or <code>usearch_last</code>.
0487 * If the iterator is not pointing at a valid match (e.g. just after
0488 * construction or after <code>USEARCH_DONE</code> has been returned), this returns
0489 * an empty string. If result is not large enough to store the matched text,
0490 * result will be filled with the partial text and U_BUFFER_OVERFLOW_ERROR
0491 * will be returned in status. result will be null-terminated whenever
0492 * possible. If the matched text fits the buffer exactly, so that null-termination
0493 * is not possible, U_STRING_NOT_TERMINATED_ERROR is set in status.
0494 * Pre-flighting can be done either with resultCapacity = 0 or with the API
0495 * <code>usearch_getMatchedLength</code>.
0496 * @param strsrch search iterator data struct
0497 * @param result UChar buffer to store the matched string
0498 * @param resultCapacity length of the result buffer
0499 * @param status error returned if result is not large enough
0500 * @return exact length of the matched text, not counting the null-termination
0501 * @see #usearch_first
0502 * @see #usearch_next
0503 * @see #usearch_previous
0504 * @see #usearch_last
0505 * @see #USEARCH_DONE
0506 * @stable ICU 2.4
0507 */
0508 U_CAPI int32_t U_EXPORT2 usearch_getMatchedText(const UStringSearch *strsrch, 
0509                                             UChar         *result, 
0510                                             int32_t        resultCapacity, 
0511                                             UErrorCode    *status);
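The result-buffer rules documented for usearch_getMatchedText follow the usual preflighting pattern. Below is a minimal sketch of that contract over `char` rather than `UChar`; the `CopyStatus` codes are hypothetical stand-ins for U_BUFFER_OVERFLOW_ERROR and U_STRING_NOT_TERMINATED_ERROR, and `copy_matched_text` is not an ICU function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes modeling the documented outcomes; these are
 * not ICU's UErrorCode values. */
typedef enum {
    COPY_OK,
    COPY_STRING_NOT_TERMINATED, /* text fits exactly, no room for NUL */
    COPY_BUFFER_OVERFLOW        /* text was truncated (or preflighting) */
} CopyStatus;

/* Copies the match (length len) into result: copies as much as fits,
 * NUL-terminates when there is room, and always returns the full length
 * so callers can preflight with capacity 0 (result may then be NULL). */
static size_t copy_matched_text(const char *match, size_t len,
                                char *result, size_t capacity,
                                CopyStatus *status) {
    size_t n = len < capacity ? len : capacity;
    if (n > 0) {
        memcpy(result, match, n);
    }
    if (len < capacity) {
        result[len] = '\0';
        *status = COPY_OK;
    } else if (len == capacity) {
        *status = COPY_STRING_NOT_TERMINATED;
    } else {
        *status = COPY_BUFFER_OVERFLOW;
    }
    return len;
}
```

A caller would typically call once with capacity 0 to learn the length, allocate length + 1 units, and call again to receive a terminated copy.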
0512 
0513 #if !UCONFIG_NO_BREAK_ITERATION
0514 
0515 /**
0516 * Set the BreakIterator that will be used to restrict the points at which 
0517 * matches are detected.
0518 * @param strsrch search iterator data struct
0519 * @param breakiter A BreakIterator that will be used to restrict the points
0520 *                  at which matches are detected. If a match is found, but 
0521 *                  the match's start or end index is not a boundary as 
0522 *                  determined by the <code>BreakIterator</code>, the match will 
0523 *                  be rejected and another will be searched for. 
0524 *                  If this parameter is <code>NULL</code>, no break detection is 
0525 *                  attempted.
0526 * @param status for errors if it occurs
0527 * @see #usearch_getBreakIterator
0528 * @stable ICU 2.4
0529 */
0530 U_CAPI void U_EXPORT2 usearch_setBreakIterator(UStringSearch  *strsrch, 
0531                                                UBreakIterator *breakiter,
0532                                                UErrorCode     *status);
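The effect of a break iterator on matching (a candidate is rejected unless both its start and end offsets are boundaries, and the search continues) can be sketched with a simple word-boundary rule. `is_word_boundary` is a hypothetical, ASCII-only approximation of a word UBreakIterator, and matching here is exact byte comparison rather than collation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical boundary test standing in for a word UBreakIterator:
 * offset i is a boundary at either end of the text, or where the
 * characters on either side differ in being alphanumeric. */
static int is_word_boundary(const char *text, size_t len, size_t i) {
    if (i == 0 || i == len) {
        return 1;
    }
    return (isalnum((unsigned char)text[i - 1]) != 0) !=
           (isalnum((unsigned char)text[i]) != 0);
}

/* Returns the first offset >= from at which pat occurs exactly AND both
 * the match start and end are boundaries; candidates failing the boundary
 * check are rejected and the scan continues. Returns -1 (the value of
 * USEARCH_DONE) if no acceptable match exists. */
static int find_at_boundaries(const char *text, const char *pat, size_t from) {
    size_t tlen = strlen(text), plen = strlen(pat);
    if (plen == 0) return -1;
    for (size_t i = from; i + plen <= tlen; i++) {
        if (memcmp(text + i, pat, plen) == 0 &&
            is_word_boundary(text, tlen, i) &&
            is_word_boundary(text, tlen, i + plen)) {
            return (int)i;
        }
    }
    return -1;
}
```

For example, "fox" in "foxes fox" is rejected at offset 0 because offset 3 falls inside "foxes", and accepted at offset 6.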
0533 
0534 /**
0535 * Returns the BreakIterator that is used to restrict the points at which 
0536 * matches are detected. This will be the same object that was passed to the 
0537 * constructor or to <code>usearch_setBreakIterator</code>. Note that 
0538 * <code>NULL</code> 
0539 * is a legal value; it means that break detection should not be attempted.
0540 * @param strsrch search iterator data struct
0541 * @return break iterator used
0542 * @see #usearch_setBreakIterator
0543 * @stable ICU 2.4
0544 */
0545 U_CAPI const UBreakIterator * U_EXPORT2 usearch_getBreakIterator(
0546                                               const UStringSearch *strsrch);
0547     
0548 #endif
0549 
0550 /**
0551 * Set the string text to be searched. Text iteration will then begin at the
0552 * start of the text string. This method is useful if you want to reuse an
0553 * iterator to search for the same pattern within a different body of text.
0554 *
0555 * The UStringSearch retains a pointer to the text string. The caller must not
0556 * modify or delete the string while using the UStringSearch.
0557 *
0558 * @param strsrch search iterator data struct
0559 * @param text new string to look for match
0560 * @param textlength length of the new string, -1 for null-termination
0561 * @param status for errors if it occurs. If text is NULL, or textlength is 0 
0562 *               then a U_ILLEGAL_ARGUMENT_ERROR is returned with no change
0563 *               done to strsrch.
0564 * @see #usearch_getText
0565 * @stable ICU 2.4
0566 */
0567 U_CAPI void U_EXPORT2 usearch_setText(      UStringSearch *strsrch, 
0568                                       const UChar         *text,
0569                                             int32_t        textlength,
0570                                             UErrorCode    *status);
0571 
0572 /**
0573 * Return the string text to be searched.
0574 * @param strsrch search iterator data struct
0575 * @param length returned string text length
0576 * @return string text 
0577 * @see #usearch_setText
0578 * @stable ICU 2.4
0579 */
0580 U_CAPI const UChar * U_EXPORT2 usearch_getText(const UStringSearch *strsrch, 
0581                                                int32_t       *length);
0582 
0583 /**
0584 * Gets the collator used for the language rules. 
0585 * <p>
0586 * Deleting the returned <code>UCollator</code> before calling 
0587 * <code>usearch_close</code> would cause the string search to fail.
0588 * <code>usearch_close</code> will delete the collator if this search owns it.
0589 * @param strsrch search iterator data struct
0590 * @return collator
0591 * @stable ICU 2.4
0592 */
0593 U_CAPI UCollator * U_EXPORT2 usearch_getCollator(
0594                                                const UStringSearch *strsrch);
0595 
0596 /**
0597 * Sets the collator used for the language rules. User retains the ownership 
0598 * of this collator, thus the responsibility of deletion lies with the user.
0599 * This method causes internal data such as the pattern collation elements
0600 * and shift tables to be recalculated, but the iterator's position is unchanged.
0601 * @param strsrch search iterator data struct
0602 * @param collator to be used
0603 * @param status for errors if it occurs
0604 * @stable ICU 2.4
0605 */
0606 U_CAPI void U_EXPORT2 usearch_setCollator(      UStringSearch *strsrch, 
0607                                           const UCollator     *collator,
0608                                                 UErrorCode    *status);
0609 
0610 /**
0611 * Sets the pattern used for matching.
0612 * Internal data like the pattern collation elements will be recalculated, but the 
0613 * iterator's position is unchanged.
0614 *
0615 * The UStringSearch retains a pointer to the pattern string. The caller must not
0616 * modify or delete the string while using the UStringSearch.
0617 *
0618 * @param strsrch search iterator data struct
0619 * @param pattern string
0620 * @param patternlength pattern length, -1 for null-terminated string
0621 * @param status for errors if it occurs. If pattern is NULL, or patternlength
0622 *               is 0, then a U_ILLEGAL_ARGUMENT_ERROR is returned with no change
0623 *               done to strsrch.
0624 * @stable ICU 2.4
0625 */
0626 U_CAPI void U_EXPORT2 usearch_setPattern(      UStringSearch *strsrch, 
0627                                          const UChar         *pattern,
0628                                                int32_t        patternlength,
0629                                                UErrorCode    *status);
0630 
0631 /**
0632 * Gets the search pattern
0633 * @param strsrch search iterator data struct
0634 * @param length return length of the pattern, -1 indicates that the pattern 
0635 *               is null-terminated
0636 * @return pattern string
0637 * @stable ICU 2.4
0638 */
0639 U_CAPI const UChar * U_EXPORT2 usearch_getPattern(
0640                                                const UStringSearch *strsrch, 
0641                                                      int32_t       *length);
0642 
0643 /* methods ------------------------------------------------------------- */
0644 
0645 /**
0646 * Returns the first index at which the string text matches the search 
0647 * pattern.  
0648 * The iterator is adjusted so that its current index (as returned by 
0649 * <code>usearch_getOffset</code>) is the match position if one was found.
0650 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
0651 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
0652 * @param strsrch search iterator data struct
0653 * @param status for errors if it occurs
0654 * @return The character index of the first match, or 
0655 * <code>USEARCH_DONE</code> if there are no matches.
0656 * @see #usearch_getOffset
0657 * @see #USEARCH_DONE
0658 * @stable ICU 2.4
0659 */
0660 U_CAPI int32_t U_EXPORT2 usearch_first(UStringSearch *strsrch, 
0661                                            UErrorCode    *status);
0662 
0663 /**
0664 * Returns the first index greater than or equal to <code>position</code>
0665 * at which the string text matches the search pattern.
0666 * The iterator is adjusted so that its current
0667 * index (as returned by <code>usearch_getOffset</code>) is the match position if
0668 * one was found.
0669 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
0670 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
0671 * <p>
0672 * Search positions that may produce incorrect results are listed in the
0673 * header comments. If position is outside the text range
0674 * for searching, U_INDEX_OUTOFBOUNDS_ERROR is returned.
0675 * @param strsrch search iterator data struct
0676 * @param position to start the search at
0677 * @param status for errors if it occurs
0678 * @return The character index of the first match at or after <code>position</code>,
0679 *         or <code>USEARCH_DONE</code> if there are no matches.
0680 * @see #usearch_getOffset
0681 * @see #USEARCH_DONE
0682 * @stable ICU 2.4
0683 */
0684 U_CAPI int32_t U_EXPORT2 usearch_following(UStringSearch *strsrch, 
0685                                                int32_t    position, 
0686                                                UErrorCode    *status);
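The following sketch models the position and error handling described for usearch_following. It assumes exact byte comparison instead of collation, uses hypothetical names (model_following, MODEL_DONE, MODEL_OK, MODEL_INDEX_OUTOFBOUNDS) in place of the ICU function, USEARCH_DONE, U_ZERO_ERROR and U_INDEX_OUTOFBOUNDS_ERROR, and assumes that position equal to the text length is still in range. Like ICU functions, it returns immediately if the incoming status already holds an error.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_DONE (-1)  /* hypothetical stand-in for USEARCH_DONE */

/* Hypothetical error codes standing in for UErrorCode values. */
enum { MODEL_OK = 0, MODEL_INDEX_OUTOFBOUNDS = 1 };

/* Hypothetical model of usearch_following: first match starting at or
 * after `position`, using exact byte comparison instead of collation. */
static int32_t model_following(const char *text, const char *pat,
                               int32_t position, int *status) {
    if (*status != MODEL_OK) return MODEL_DONE;   /* ICU-style early exit */
    int32_t n = (int32_t)strlen(text), m = (int32_t)strlen(pat);
    if (position < 0 || position > n) {           /* outside the text range */
        *status = MODEL_INDEX_OUTOFBOUNDS;
        return MODEL_DONE;
    }
    for (int32_t i = position; i + m <= n; ++i) {
        if (strncmp(text + i, pat, (size_t)m) == 0) return i;
    }
    return MODEL_DONE;                            /* no match is not an error */
}
```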
0687     
0688 /**
0689 * Returns the last index in the target text at which it matches the search 
0690 * pattern. The iterator is adjusted so that its current 
0691 * index (as returned by <code>usearch_getOffset</code>) is the match position if 
0692 * one was found.
0693 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
0694 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
0695 * @param strsrch search iterator data struct
0696 * @param status for errors, if any occur
0697 * @return The index of the last match, or <code>USEARCH_DONE</code> if there
0698 *         are no matches.
0699 * @see #usearch_getOffset
0700 * @see #USEARCH_DONE
0701 * @stable ICU 2.4
0702 */
0703 U_CAPI int32_t U_EXPORT2 usearch_last(UStringSearch *strsrch, 
0704                                           UErrorCode    *status);
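As a minimal sketch of what usearch_last returns, the model below scans backwards from the end of the text and reports the start of the last match. It assumes exact byte comparison rather than collation; model_last and MODEL_DONE are hypothetical names, not ICU API.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_DONE (-1)  /* hypothetical stand-in for USEARCH_DONE */

/* Hypothetical model of usearch_last: the rightmost start position at
 * which the pattern matches, found by scanning from the end. */
static int32_t model_last(const char *text, const char *pat) {
    int32_t n = (int32_t)strlen(text), m = (int32_t)strlen(pat);
    for (int32_t i = n - m; i >= 0; --i) {    /* skipped entirely if m > n */
        if (strncmp(text + i, pat, (size_t)m) == 0) return i;
    }
    return MODEL_DONE;
}
```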
0705 
0706 /**
0707 * Returns the closest index less than <code>position</code> at which the string text
0708 * matches the search pattern. The iterator is adjusted so that its current
0709 * index (as returned by <code>usearch_getOffset</code>) is the match position if
0710 * one was found.
0711 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
0712 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
0713 * <p>
0714 * Search positions that may render incorrect results are highlighted in the
0715 * header comments. If <code>position</code> is outside the text range
0716 * for searching, a U_INDEX_OUTOFBOUNDS_ERROR will be returned.
0717 * <p>
0718 * When the <code>USEARCH_OVERLAP</code> option is off, the last index of the
0719 * result match is always less than <code>position</code>.
0720 * When <code>USEARCH_OVERLAP</code> is on, the result match may span across
0721 * <code>position</code>.
0722 * @param strsrch search iterator data struct
0723 * @param position index position the search is to begin at
0724 * @param status for errors, if any occur
0725 * @return The character index of the closest match preceding <code>position</code>,
0726 *         or <code>USEARCH_DONE</code> if there are no matches.
0727 * @see #usearch_getOffset
0728 * @see #USEARCH_DONE
0729 * @stable ICU 2.4
0730 */
0731 U_CAPI int32_t U_EXPORT2 usearch_preceding(UStringSearch *strsrch, 
0732                                                int32_t    position, 
0733                                                UErrorCode    *status);
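The USEARCH_OVERLAP rule for usearch_preceding can be illustrated with a small model. Without overlap, the whole match must lie before position (start + length <= position); with overlap, only the start must (start < position), so the match may span position. This is a sketch assuming exact byte comparison instead of collation; model_preceding, MODEL_DONE and the bool overlap flag are hypothetical, and out-of-range positions simply return MODEL_DONE here rather than setting an error.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_DONE (-1)  /* hypothetical stand-in for USEARCH_DONE */

/* Hypothetical model of usearch_preceding with an overlap switch. */
static int32_t model_preceding(const char *text, const char *pat,
                               int32_t position, bool overlap) {
    int32_t n = (int32_t)strlen(text), m = (int32_t)strlen(pat);
    if (m == 0 || position < 0 || position > n) return MODEL_DONE;
    /* Latest admissible start: overlap only needs start < position. */
    int32_t i = overlap ? position - 1 : position - m;
    if (i > n - m) i = n - m;           /* a match cannot run past the text end */
    for (; i >= 0; --i) {
        if (strncmp(text + i, pat, (size_t)m) == 0) return i;
    }
    return MODEL_DONE;
}
```

With text "abcabc", pattern "abc" and position 5, the non-overlapping search skips the match at 3 (it ends at 6, past position) and returns 0, while the overlapping search returns 3.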
0734     
0735 /**
0736 * Returns the index of the next point at which the string text matches the
0737 * search pattern, starting from the current position.
0738 * The iterator is adjusted so that its current 
0739 * index (as returned by <code>usearch_getOffset</code>) is the match position if 
0740 * one was found.
0741 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
0742 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
0743 * @param strsrch search iterator data struct
0744 * @param status for errors, if any occur
0745 * @return The index of the next match after the current position, or 
0746 *         <code>USEARCH_DONE</code> if there are no more matches.
0747 * @see #usearch_first
0748 * @see #usearch_getOffset
0749 * @see #USEARCH_DONE
0750 * @stable ICU 2.4
0751 */
0752 U_CAPI int32_t U_EXPORT2 usearch_next(UStringSearch *strsrch, 
0753                                           UErrorCode    *status);
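A typical use of usearch_next is a loop that runs until the DONE value is returned. The sketch below models that loop under these assumptions: exact byte comparison instead of collation, USEARCH_OVERLAP off so each search resumes just past the previous match, and hypothetical names (ModelSearch, model_next, count_matches, MODEL_DONE) in place of the ICU types and functions. Empty patterns are rejected so the model cannot loop forever.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_DONE (-1)  /* hypothetical stand-in for USEARCH_DONE */

/* Hypothetical iterator state loosely mirroring UStringSearch. */
typedef struct {
    const char *text;
    const char *pat;
    int32_t offset;    /* current match start, or MODEL_DONE */
    int32_t nextFrom;  /* where the next forward search begins */
} ModelSearch;

/* Hypothetical model of usearch_next with overlap off. */
static int32_t model_next(ModelSearch *s) {
    int32_t n = (int32_t)strlen(s->text), m = (int32_t)strlen(s->pat);
    if (m > 0) {
        for (int32_t i = s->nextFrom; i + m <= n; ++i) {
            if (strncmp(s->text + i, s->pat, (size_t)m) == 0) {
                s->offset = i;
                s->nextFrom = i + m;    /* resume after this match */
                return i;
            }
        }
    }
    s->offset = MODEL_DONE;
    s->nextFrom = n;
    return MODEL_DONE;
}

/* Count every non-overlapping match, starting from the beginning. */
static int count_matches(const char *text, const char *pat) {
    ModelSearch s = { text, pat, MODEL_DONE, 0 };
    int count = 0;
    while (model_next(&s) != MODEL_DONE) ++count;
    return count;
}
```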
0754 
0755 /**
0756 * Returns the index of the previous point at which the string text matches
0757 * the search pattern, starting at the current position.
0758 * The iterator is adjusted so that its current 
0759 * index (as returned by <code>usearch_getOffset</code>) is the match position if 
0760 * one was found.
0761 * If a match is not found, <code>USEARCH_DONE</code> will be returned and
0762 * the iterator will be adjusted to the index <code>USEARCH_DONE</code>.
0763 * @param strsrch search iterator data struct
0764 * @param status for errors, if any occur
0765 * @return The index of the previous match before the current position,
0766 *         or <code>USEARCH_DONE</code> if there are no more matches.
0767 * @see #usearch_last
0768 * @see #usearch_getOffset
0769 * @see #USEARCH_DONE
0770 * @stable ICU 2.4
0771 */
0772 U_CAPI int32_t U_EXPORT2 usearch_previous(UStringSearch *strsrch, 
0773                                               UErrorCode    *status);
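Backward iteration with usearch_previous can be pictured with the model below. It is a sketch assuming exact byte comparison, overlap off (each match must end at or before the current position), and hypothetical names (model_previous, collect_backwards, MODEL_DONE); it starts at the end of the text and walks leftwards until the DONE value is returned.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_DONE (-1)  /* hypothetical stand-in for USEARCH_DONE */

/* Hypothetical model of usearch_previous with overlap off: find the
 * closest match ending at or before *pos, then move *pos to its start. */
static int32_t model_previous(const char *text, const char *pat, int32_t *pos) {
    int32_t m = (int32_t)strlen(pat);
    if (m == 0) return MODEL_DONE;
    for (int32_t i = *pos - m; i >= 0; --i) {
        if (strncmp(text + i, pat, (size_t)m) == 0) {
            *pos = i;
            return i;
        }
    }
    *pos = 0;
    return MODEL_DONE;
}

/* Collect match starts from the end of the text backwards; returns count. */
static int collect_backwards(const char *text, const char *pat,
                             int32_t *out, int cap) {
    int32_t pos = (int32_t)strlen(text);  /* begin at the end of the text */
    int count = 0;
    int32_t idx;
    while (count < cap && (idx = model_previous(text, pat, &pos)) != MODEL_DONE) {
        out[count++] = idx;
    }
    return count;
}
```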
0774     
0775 /** 
0776 * Resets the iteration.
0777 * After a reset, the search begins at the start of the text if the next
0778 * call iterates forwards (for example <code>usearch_next</code>), or at the
0779 * end of the text if the next call iterates backwards (for example
0780 * <code>usearch_previous</code>).
0781 * @param strsrch search iterator data struct
0782 * @see #usearch_first
0783 * @stable ICU 2.4
0784 */
0785 U_CAPI void U_EXPORT2 usearch_reset(UStringSearch *strsrch);
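The reset rule, where the direction of the first call after a reset decides whether the search starts at the beginning or the end of the text, can be sketched as follows. The model assumes exact byte comparison and non-overlapping matches; ModelIter, model_reset, model_next, model_previous and MODEL_DONE are hypothetical names, and the isReset flag is only an illustration of the state usearch_reset leaves behind.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_DONE (-1)  /* hypothetical stand-in for USEARCH_DONE */

/* Hypothetical iterator; isReset means the start point is not yet fixed. */
typedef struct {
    const char *text;
    const char *pat;
    int32_t pos;
    bool isReset;
} ModelIter;

static void model_reset(ModelIter *it) { it->isReset = true; }

/* Forward step: after a reset, begin at the start of the text. */
static int32_t model_next(ModelIter *it) {
    int32_t n = (int32_t)strlen(it->text), m = (int32_t)strlen(it->pat);
    if (it->isReset) { it->pos = 0; it->isReset = false; }
    for (int32_t i = it->pos; m > 0 && i + m <= n; ++i) {
        if (strncmp(it->text + i, it->pat, (size_t)m) == 0) {
            it->pos = i + m;
            return i;
        }
    }
    it->pos = n;
    return MODEL_DONE;
}

/* Backward step: after a reset, begin at the end of the text. */
static int32_t model_previous(ModelIter *it) {
    int32_t m = (int32_t)strlen(it->pat);
    if (it->isReset) { it->pos = (int32_t)strlen(it->text); it->isReset = false; }
    for (int32_t i = it->pos - m; m > 0 && i >= 0; --i) {
        if (strncmp(it->text + i, it->pat, (size_t)m) == 0) {
            it->pos = i;
            return i;
        }
    }
    it->pos = 0;
    return MODEL_DONE;
}
```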
0786 
0787 #ifndef U_HIDE_INTERNAL_API
0788 /**
0789   *  Simple forward search for the pattern, starting at a specified index,
0790   *     and using a default set of search options.
0791   *
0792   *  This is an experimental function, and is not an official part of the
0793   *      ICU API.
0794   *
0795   *  The collator options, such as UCOL_STRENGTH and UCOL_NORMALIZATION_MODE, are honored.
0796   *
0797   *  The UStringSearch options USEARCH_CANONICAL_MATCH, USEARCH_OVERLAP and
0798   *  any Break Iterator are ignored.
0799   *
0800   *  Matches obey the following constraints:
0801   *
0802   *      Characters at the start or end positions of a match that are ignorable
0803   *      for collation are not included as part of the match, unless they
0804   *      are part of a combining sequence, as described below.
0805   *
0806   *      A match will not include a partial combining sequence. Combining
0807   *      character sequences are considered to be inseparable units,
0808   *      and either match the pattern completely or do not match
0809   *      at all. Thus, for example, an A followed by a combining accent mark will
0810   *      not be found when searching for a plain (unaccented) A, unless
0811   *      the collation strength has been set to ignore all accents.
0812   *
0813   *      When beginning a search, the initial starting position, startIdx,
0814   *      is assumed to be an acceptable match boundary with respect to
0815   *      combining characters.  A combining sequence that spans across the
0816   *      starting point will not suppress a match beginning at startIdx.
0817   *
0818   *      Characters that expand to multiple collation elements
0819   *      (German sharp-S becoming 'ss', or the composed forms of accented
0820   *      characters, for example) also must match completely.
0821   *      Searching for a single 's' in a string containing only a sharp-s will 
0822   *      find no match.
0823   *
0824   *
0825   *  @param strsrch    the UStringSearch struct, which references both
0826   *                    the text to be searched and the pattern being sought.
0827   *  @param startIdx   The index into the text to begin the search.
0828   *  @param matchStart An out parameter, the starting index of the matched text.
0829   *                    This parameter may be NULL.
0830   *                    A value of -1 will be returned if no match was found.
0831   *  @param matchLimit Out parameter, the index of the first position following the matched text.
0832   *                    The matchLimit will be at a suitable position for beginning a subsequent search
0833   *                    in the input text.
0834   *                    This parameter may be NULL.
0835   *                    A value of -1 will be returned if no match was found.
0836   *          
0837   *  @param status     Report any errors. Note that finding no match is not an error.
0838   *  @return           true if a match was found, false otherwise.
0839   *
0840   *  @internal
0841   */
0842 U_CAPI UBool U_EXPORT2 usearch_search(UStringSearch *strsrch,
0843                                           int32_t        startIdx,
0844                                           int32_t        *matchStart,
0845                                           int32_t        *matchLimit,
0846                                           UErrorCode     *status);
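The out-parameter contract of usearch_search (optional matchStart and matchLimit, both set to -1 when nothing matches, and a boolean result) is modelled below. This is a sketch assuming exact byte comparison with collation, canonical matching and break iterators all ignored; model_search is a hypothetical name, and a negative startIdx simply yields no match here instead of an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the usearch_search out-parameter contract.
 * matchStart and matchLimit may be NULL; both receive -1 on no match. */
static bool model_search(const char *text, const char *pat, int32_t startIdx,
                         int32_t *matchStart, int32_t *matchLimit) {
    int32_t n = (int32_t)strlen(text), m = (int32_t)strlen(pat);
    int32_t start = -1, limit = -1;
    for (int32_t i = startIdx; m > 0 && i >= 0 && i + m <= n; ++i) {
        if (strncmp(text + i, pat, (size_t)m) == 0) {
            start = i;
            limit = i + m;              /* first position after the match */
            break;
        }
    }
    if (matchStart != NULL) *matchStart = start;
    if (matchLimit != NULL) *matchLimit = limit;
    return start != -1;
}
```

Because the comment above describes matchLimit as a suitable place to begin a subsequent search, passing it back in as the next startIdx walks through successive matches.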
0847 
0848 /**
0849   *  Simple backwards search for the pattern, starting at a specified index,
0850   *     and using a default set of search options.
0851   *
0852   *  This is an experimental function, and is not an official part of the
0853   *      ICU API.
0854   *
0855   *  The collator options, such as UCOL_STRENGTH and UCOL_NORMALIZATION_MODE, are honored.
0856   *
0857   *  The UStringSearch options USEARCH_CANONICAL_MATCH, USEARCH_OVERLAP and
0858   *  any Break Iterator are ignored.
0859   *
0860   *  Matches obey the following constraints:
0861   *
0862   *      Characters at the start or end positions of a match that are ignorable
0863   *      for collation are not included as part of the match, unless they
0864   *      are part of a combining sequence, as described below.
0865   *
0866   *      A match will not include a partial combining sequence. Combining
0867   *      character sequences are considered to be inseparable units,
0868   *      and either match the pattern completely or do not match
0869   *      at all. Thus, for example, an A followed by a combining accent mark will
0870   *      not be found when searching for a plain (unaccented) A, unless
0871   *      the collation strength has been set to ignore all accents.
0872   *
0873   *      When beginning a search, the initial starting position, startIdx,
0874   *      is assumed to be an acceptable match boundary with respect to
0875   *      combining characters.  A combining sequence that spans across the
0876   *      starting point will not suppress a match beginning at startIdx.
0877   *
0878   *      Characters that expand to multiple collation elements
0879   *      (German sharp-S becoming 'ss', or the composed forms of accented
0880   *      characters, for example) also must match completely.
0881   *      Searching for a single 's' in a string containing only a sharp-s will 
0882   *      find no match.
0883   *
0884   *
0885   *  @param strsrch    the UStringSearch struct, which references both
0886   *                    the text to be searched and the pattern being sought.
0887   *  @param startIdx   The index into the text to begin the search.
0888   *  @param matchStart An out parameter, the starting index of the matched text.
0889   *                    This parameter may be NULL.
0890   *                    A value of -1 will be returned if no match was found.
0891   *  @param matchLimit Out parameter, the index of the first position following the matched text.
0892   *                    The matchLimit will be at a suitable position for beginning a subsequent search
0893   *                    in the input text.
0894   *                    This parameter may be NULL.
0895   *                    A value of -1 will be returned if no match was found.
0896   *          
0897   *  @param status     Report any errors. Note that finding no match is not an error.
0898   *  @return           true if a match was found, false otherwise.
0899   *
0900   *  @internal
0901   */
0902 U_CAPI UBool U_EXPORT2 usearch_searchBackwards(UStringSearch *strsrch,
0903                                                    int32_t        startIdx,
0904                                                    int32_t        *matchStart,
0905                                                    int32_t        *matchLimit,
0906                                                    UErrorCode     *status);
0907 #endif  /* U_HIDE_INTERNAL_API */
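A backwards counterpart to the forward model can be sketched the same way. The header does not spell out how startIdx bounds a backwards match, so this model makes an explicit ASSUMPTION: a match qualifies only if it ends at or before startIdx, and the one closest to startIdx is reported. It uses exact byte comparison, and model_search_backwards is a hypothetical name, not the ICU implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a backwards search in the style of
 * usearch_searchBackwards. ASSUMPTION: the match must end at or before
 * startIdx; the closest such match is reported. */
static bool model_search_backwards(const char *text, const char *pat,
                                   int32_t startIdx,
                                   int32_t *matchStart, int32_t *matchLimit) {
    int32_t n = (int32_t)strlen(text), m = (int32_t)strlen(pat);
    int32_t start = -1, limit = -1;
    if (m > 0 && startIdx >= 0 && startIdx <= n) {
        for (int32_t i = startIdx - m; i >= 0; --i) {
            if (strncmp(text + i, pat, (size_t)m) == 0) {
                start = i;
                limit = i + m;
                break;
            }
        }
    }
    if (matchStart != NULL) *matchStart = start;
    if (matchLimit != NULL) *matchLimit = limit;
    return start != -1;
}
```

Under this assumption, passing the returned matchStart back in as startIdx steps leftwards through earlier matches.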
0908 
0909 #endif /* #if !UCONFIG_NO_COLLATION  && !UCONFIG_NO_BREAK_ITERATION */
0910 
0911 #endif