// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
/*
******************************************************************************
*
*   Copyright (C) 2000-2012, International Business Machines
*   Corporation and others.  All Rights Reserved.
*
******************************************************************************
*   file name:  ushape.h
*   encoding:   UTF-8
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2000jun29
*   created by: Markus W. Scherer
*/

#ifndef __USHAPE_H__
#define __USHAPE_H__

#include "unicode/utypes.h"

/**
 * \file
 * \brief C API:  Arabic shaping
 *
 */

/**
 * Shape Arabic text on a character basis.
 *
 * <p>This function performs basic operations for "shaping" Arabic text. It is most
 * useful for use with legacy data formats and legacy display technology
 * (simple terminals). All operations are performed on Unicode characters.</p>
 *
 * <p>Text-based shaping means that some character code points in the text are
 * replaced by others depending on the context. It transforms one kind of text
 * into another. In comparison, modern displays for Arabic text select
 * appropriate, context-dependent font glyphs for each text element, which means
 * that they transform text into a glyph vector.</p>
 *
 * <p>Text transformations are necessary when modern display technology is not
 * available or when text needs to be transformed to or from legacy formats that
 * use "shaped" characters. Since the Arabic script is cursive, connecting
 * adjacent letters to each other, computers select images for each letter based
 * on the surrounding letters. This usually results in four images per Arabic
 * letter: initial, middle, final, and isolated forms. In Unicode, on the other
 * hand, letters are normally stored abstractly, and a display system is expected
 * to select the necessary glyphs. (This makes searching and other text
 * processing easier because the same letter has only one code.) It is possible
 * to mimic this with text transformations because there are characters in
 * Unicode that are rendered as letters with a specific shape
 * (or cursive connectivity). They were included for interoperability with
 * legacy systems and codepages, and for unsophisticated display systems.</p>
 *
 * <p>A second kind of text transformation is supported for Arabic digits:
 * For compatibility with legacy codepages that only include European digits,
 * it is possible to replace one set of digits by another, changing the
 * character code points. These operations can be performed for either
 * Arabic-Indic Digits (U+0660...U+0669) or Eastern (Extended) Arabic-Indic
 * digits (U+06f0...U+06f9).</p>
 *
 * <p>Some replacements may result in more or fewer characters (code points).
 * By default, this means that the destination buffer may receive text with a
 * length different from the source length. Some legacy systems rely on the
 * length of the text to be constant. They expect extra spaces to be added
 * or consumed either next to the affected character or at the end of the
 * text.</p>
 *
 * <p>For details about the available operations, see the description of the
 * <code>U_SHAPE_...</code> options.</p>
 *
 * @param source The input text.
 *
 * @param sourceLength The number of UChars in <code>source</code>.
 *
 * @param dest The destination buffer that will receive the results of the
 *             requested operations. It may be <code>NULL</code> only if
 *             <code>destSize</code> is 0. The source and destination must not
 *             overlap.
 *
 * @param destSize The size (capacity) of the destination buffer in UChars.
 *                 If <code>destSize</code> is 0, then no output is produced,
 *                 but the necessary buffer size is returned ("preflighting").
 *
 * @param options This is a 32-bit set of flags that specify the operations
 *                that are performed on the input text. If no error occurs,
 *                then the result will always be written to the destination
 *                buffer.
 *
 * @param pErrorCode must be a valid pointer to an error code value,
 *        which must not indicate a failure before the function call.
 *
 * @return The number of UChars written to the destination buffer.
 *         If an error occurred, then no output was written, or it may be
 *         incomplete. If <code>U_BUFFER_OVERFLOW_ERROR</code> is set, then
 *         the return value indicates the necessary destination buffer size.
 * @stable ICU 2.0
 */
U_CAPI int32_t U_EXPORT2
u_shapeArabic(const UChar *source, int32_t sourceLength,
              UChar *dest, int32_t destSize,
              uint32_t options,
              UErrorCode *pErrorCode);

/**
 * Memory option: allow the result to have a different length than the source.
 * Affects: LamAlef options
 * @stable ICU 2.0
 */
#define U_SHAPE_LENGTH_GROW_SHRINK              0

/**
 * Memory option: allow the result to have a different length than the source.
 * Affects: LamAlef options
 * This option is an alias to U_SHAPE_LENGTH_GROW_SHRINK
 * @stable ICU 4.2
 */
#define U_SHAPE_LAMALEF_RESIZE                  0

/**
 * Memory option: the result must have the same length as the source.
 * If more room is necessary, then try to consume spaces next to modified characters.
 * @stable ICU 2.0
 */
#define U_SHAPE_LENGTH_FIXED_SPACES_NEAR        1

/**
 * Memory option: the result must have the same length as the source.
 * If more room is necessary, then try to consume spaces next to modified characters.
 * Affects: LamAlef options
 * This option is an alias to U_SHAPE_LENGTH_FIXED_SPACES_NEAR
 * @stable ICU 4.2
 */
#define U_SHAPE_LAMALEF_NEAR                    1

/**
 * Memory option: the result must have the same length as the source.
 * If more room is necessary, then try to consume spaces at the end of the text.
 * @stable ICU 2.0
 */
#define U_SHAPE_LENGTH_FIXED_SPACES_AT_END      2

/**
 * Memory option: the result must have the same length as the source.
 * If more room is necessary, then try to consume spaces at the end of the text.
 * Affects: LamAlef options
 * This option is an alias to U_SHAPE_LENGTH_FIXED_SPACES_AT_END
 * @stable ICU 4.2
 */
#define U_SHAPE_LAMALEF_END                     2

/**
 * Memory option: the result must have the same length as the source.
 * If more room is necessary, then try to consume spaces at the beginning of the text.
 * @stable ICU 2.0
 */
#define U_SHAPE_LENGTH_FIXED_SPACES_AT_BEGINNING 3

/**
 * Memory option: the result must have the same length as the source.
 * If more room is necessary, then try to consume spaces at the beginning of the text.
 * Affects: LamAlef options
 * This option is an alias to U_SHAPE_LENGTH_FIXED_SPACES_AT_BEGINNING
 * @stable ICU 4.2
 */
#define U_SHAPE_LAMALEF_BEGIN                    3


/**
 * Memory option: the result must have the same length as the source.
 * Shaping Mode: For each LAMALEF character found, expand LAMALEF using space at end.
 *               If there is no space at end, use spaces at beginning of the buffer. If there
 *               is no space at beginning of the buffer, use the space near the character
 *               (i.e. the space after the LAMALEF character).
 *               If there are no spaces found, an error U_NO_SPACE_AVAILABLE (as defined in utypes.h)
 *               will be set in pErrorCode
 *
 * Deshaping Mode: Perform the same function as the flag U_SHAPE_LAMALEF_END.
 * Affects: LamAlef options
 * @stable ICU 4.2
 */
#define U_SHAPE_LAMALEF_AUTO                     0x10000

/** Bit mask for memory options. @stable ICU 2.0 */
#define U_SHAPE_LENGTH_MASK                      0x10003 /* Changed old value 3 */


/**
 * Bit mask for LamAlef memory options.
 * @stable ICU 4.2
 */
#define U_SHAPE_LAMALEF_MASK                     0x10003 /* updated */

/** Direction indicator: the source is in logical (keyboard) order. @stable ICU 2.0 */
#define U_SHAPE_TEXT_DIRECTION_LOGICAL          0

/**
 * Direction indicator:
 * the source is in visual RTL order,
 * the rightmost displayed character stored first.
 * This option is an alias to U_SHAPE_TEXT_DIRECTION_LOGICAL
 * @stable ICU 4.2
 */
#define U_SHAPE_TEXT_DIRECTION_VISUAL_RTL       0

/**
 * Direction indicator:
 * the source is in visual LTR order,
 * the leftmost displayed character stored first.
 * @stable ICU 2.0
 */
#define U_SHAPE_TEXT_DIRECTION_VISUAL_LTR       4

/** Bit mask for direction indicators. @stable ICU 2.0 */
#define U_SHAPE_TEXT_DIRECTION_MASK             4


/** Letter shaping option: do not perform letter shaping. @stable ICU 2.0 */
#define U_SHAPE_LETTERS_NOOP                    0

/** Letter shaping option: replace abstract letter characters by "shaped" ones. @stable ICU 2.0 */
#define U_SHAPE_LETTERS_SHAPE                   8

/** Letter shaping option: replace "shaped" letter characters by abstract ones. @stable ICU 2.0 */
#define U_SHAPE_LETTERS_UNSHAPE                 0x10

/**
 * Letter shaping option: replace abstract letter characters by "shaped" ones.
 * The only difference with U_SHAPE_LETTERS_SHAPE is that Tashkeel letters
 * are always "shaped" into the isolated form instead of the medial form
 * (selecting code points from the Arabic Presentation Forms-B block).
 * @stable ICU 2.0
 */
#define U_SHAPE_LETTERS_SHAPE_TASHKEEL_ISOLATED 0x18


/** Bit mask for letter shaping options. @stable ICU 2.0 */
#define U_SHAPE_LETTERS_MASK                        0x18


/** Digit shaping option: do not perform digit shaping. @stable ICU 2.0 */
#define U_SHAPE_DIGITS_NOOP                     0

/**
 * Digit shaping option:
 * Replace European digits (U+0030...) by Arabic-Indic digits.
 * @stable ICU 2.0
 */
#define U_SHAPE_DIGITS_EN2AN                    0x20

/**
 * Digit shaping option:
 * Replace Arabic-Indic digits by European digits (U+0030...).
 * @stable ICU 2.0
 */
#define U_SHAPE_DIGITS_AN2EN                    0x40

/**
 * Digit shaping option:
 * Replace European digits (U+0030...) by Arabic-Indic digits if the most recent
 * strongly directional character is an Arabic letter
 * (<code>u_charDirection()</code> result <code>U_RIGHT_TO_LEFT_ARABIC</code> [AL]).<br>
 * The direction of "preceding" depends on the direction indicator option.
 * For the first characters, the preceding strongly directional character
 * (initial state) is assumed to be not an Arabic letter
 * (it is <code>U_LEFT_TO_RIGHT</code> [L] or <code>U_RIGHT_TO_LEFT</code> [R]).
 * @stable ICU 2.0
 */
#define U_SHAPE_DIGITS_ALEN2AN_INIT_LR          0x60

/**
 * Digit shaping option:
 * Replace European digits (U+0030...) by Arabic-Indic digits if the most recent
 * strongly directional character is an Arabic letter
 * (<code>u_charDirection()</code> result <code>U_RIGHT_TO_LEFT_ARABIC</code> [AL]).<br>
 * The direction of "preceding" depends on the direction indicator option.
 * For the first characters, the preceding strongly directional character
 * (initial state) is assumed to be an Arabic letter.
 * @stable ICU 2.0
 */
#define U_SHAPE_DIGITS_ALEN2AN_INIT_AL          0x80

/** Not a valid option value. May be replaced by a new option. @stable ICU 2.0 */
#define U_SHAPE_DIGITS_RESERVED                 0xa0

/** Bit mask for digit shaping options. @stable ICU 2.0 */
#define U_SHAPE_DIGITS_MASK                     0xe0


/** Digit type option: Use Arabic-Indic digits (U+0660...U+0669). @stable ICU 2.0 */
#define U_SHAPE_DIGIT_TYPE_AN                   0

/** Digit type option: Use Eastern (Extended) Arabic-Indic digits (U+06f0...U+06f9). @stable ICU 2.0 */
#define U_SHAPE_DIGIT_TYPE_AN_EXTENDED          0x100

/** Not a valid option value. May be replaced by a new option. @stable ICU 2.0 */
#define U_SHAPE_DIGIT_TYPE_RESERVED             0x200

/** Bit mask for digit type options. @stable ICU 2.0 */
#define U_SHAPE_DIGIT_TYPE_MASK                 0x300 /* I need to change this from 0x3f00 to 0x300 */

/**
 * Tashkeel aggregation option:
 * Replaces any combination of U+0651 with one of
 * U+064C, U+064D, U+064E, U+064F, U+0650 with
 * U+FC5E, U+FC5F, U+FC60, U+FC61, U+FC62 respectively.
 * @stable ICU 3.6
 */
#define U_SHAPE_AGGREGATE_TASHKEEL              0x4000
/** Tashkeel aggregation option: do not aggregate tashkeels. @stable ICU 3.6 */
#define U_SHAPE_AGGREGATE_TASHKEEL_NOOP         0
/** Bit mask for tashkeel aggregation. @stable ICU 3.6 */
#define U_SHAPE_AGGREGATE_TASHKEEL_MASK         0x4000

/**
 * Presentation form option:
 * Don't replace Arabic Presentation Forms-A and Arabic Presentation Forms-B
 * characters with U+06xx characters, before shaping.
 * @stable ICU 3.6
 */
#define U_SHAPE_PRESERVE_PRESENTATION           0x8000
/** Presentation form option:
 * Replace Arabic Presentation Forms-A and Arabic Presentation Forms-B with
 * their unshaped correspondents in range U+06xx, before shaping.
 * @stable ICU 3.6
 */
#define U_SHAPE_PRESERVE_PRESENTATION_NOOP      0
/** Bit mask for preserve presentation form. @stable ICU 3.6 */
#define U_SHAPE_PRESERVE_PRESENTATION_MASK      0x8000

/* Seen Tail option */
/**
 * Memory option: the result must have the same length as the source.
 * Shaping mode: The SEEN family character will expand into two characters using space near
 *               the SEEN family character (i.e. the space after the character).
 *               If there are no spaces found, an error U_NO_SPACE_AVAILABLE (as defined in utypes.h)
 *               will be set in pErrorCode
 *
 * De-shaping mode: Any Seen character followed by Tail character will be
 *                  replaced by one cell Seen and a space will replace the Tail.
 * Affects: Seen options
 * @stable ICU 4.2
 */
#define U_SHAPE_SEEN_TWOCELL_NEAR     0x200000

/**
 * Bit mask for Seen memory options.
 * @stable ICU 4.2
 */
#define U_SHAPE_SEEN_MASK             0x700000

/* YehHamza option */
/**
 * Memory option: the result must have the same length as the source.
 * Shaping mode: The YEHHAMZA character will expand into two characters using space near it
 *              (i.e. the space after the character).
 *               If there are no spaces found, an error U_NO_SPACE_AVAILABLE (as defined in utypes.h)
 *               will be set in pErrorCode
 *
 * De-shaping mode: Any Yeh (final or isolated) character followed by Hamza character will be
 *                  replaced by one cell YehHamza and space will replace the Hamza.
 * Affects: YehHamza options
 * @stable ICU 4.2
 */
#define U_SHAPE_YEHHAMZA_TWOCELL_NEAR      0x1000000


/**
 * Bit mask for YehHamza memory options.
 * @stable ICU 4.2
 */
#define U_SHAPE_YEHHAMZA_MASK              0x3800000

/* New Tashkeel options */
/**
 * Memory option: the result must have the same length as the source.
 * Shaping mode: Tashkeel characters will be replaced by spaces.
 *               Spaces will be placed at beginning of the buffer
 *
 * De-shaping mode: N/A
 * Affects: Tashkeel options
 * @stable ICU 4.2
 */
#define U_SHAPE_TASHKEEL_BEGIN                      0x40000

/**
 * Memory option: the result must have the same length as the source.
 * Shaping mode: Tashkeel characters will be replaced by spaces.
 *               Spaces will be placed at end of the buffer
 *
 * De-shaping mode: N/A
 * Affects: Tashkeel options
 * @stable ICU 4.2
 */
#define U_SHAPE_TASHKEEL_END                        0x60000

/**
 * Memory option: allow the result to have a different length than the source.
 * Shaping mode: Tashkeel characters will be removed, buffer length will shrink.
 * De-shaping mode: N/A
 *
 * Affects: Tashkeel options
 * @stable ICU 4.2
 */
#define U_SHAPE_TASHKEEL_RESIZE                     0x80000

/**
 * Memory option: the result must have the same length as the source.
 * Shaping mode: Tashkeel characters will be replaced by Tatweel if they are connected to adjacent
 *               characters (i.e. shaped on Tatweel) or replaced by space if they are not connected.
 *
 * De-shaping mode: N/A
 * Affects: Tashkeel options
 * @stable ICU 4.2
 */
#define U_SHAPE_TASHKEEL_REPLACE_BY_TATWEEL         0xC0000

/**
 * Bit mask for Tashkeel replacement with Space or Tatweel memory options.
 * @stable ICU 4.2
 */
#define U_SHAPE_TASHKEEL_MASK                       0xE0000


/* Space location Control options */
/**
 * This option affects the meaning of the BEGIN and END options. If this option is not used,
 * the default for BEGIN and END is as follows:
 * The Default (for Visual LTR, Visual RTL and Logical Text)
 *   1. BEGIN always refers to the start address of physical memory.
 *   2. END always refers to the end address of physical memory.
 *
 * If this option is used it will swap the meaning of BEGIN and END only for Visual LTR text.
 *
 * The effect on BEGIN and END Memory Options will be as follows:
 *   A. BEGIN For Visual LTR text: This will be the beginning (right side) of the visual text
 *      (corresponding to the physical memory address end for Visual LTR text, same as END in
 *      default behavior).
 *   B. BEGIN For Logical text: Same as BEGIN in default behavior.
 *   C. END For Visual LTR text: This will be the end (left side) of the visual text (corresponding
 *      to the physical memory address beginning for Visual LTR text, same as BEGIN in default behavior).
 *   D. END For Logical text: Same as END in default behavior.
 * Affects: All LamAlef BEGIN, END and AUTO options.
 * @stable ICU 4.2
 */
#define U_SHAPE_SPACES_RELATIVE_TO_TEXT_BEGIN_END 0x4000000

/**
 * Bit mask for swapping BEGIN and END for Visual LTR text
 * @stable ICU 4.2
 */
#define U_SHAPE_SPACES_RELATIVE_TO_TEXT_MASK      0x4000000

/**
 * If this option is used, shaping will use the new Unicode code point for TAIL (i.e. 0xFE73).
 * If this option is not specified (Default), old unofficial Unicode TAIL code point is used (i.e. 0x200B)
 * De-shaping will not use this option as it will always search for either the new Unicode code point for the
 * TAIL (i.e. 0xFE73) or the old unofficial Unicode TAIL code point (i.e. 0x200B) and de-shape the
 * Seen-Family letter accordingly.
 *
 * Shaping Mode: Only shaping.
 * De-shaping Mode: N/A.
 * Affects: All Seen options
 * @stable ICU 4.8
 */
#define U_SHAPE_TAIL_NEW_UNICODE        0x8000000

/**
 * Bit mask for new Unicode Tail option
 * @stable ICU 4.8
 */
#define U_SHAPE_TAIL_TYPE_MASK          0x8000000

#endif
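
The option values in this header are grouped, and each group has its own bit mask, so a single `options` word can carry one choice per group. The sketch below shows one way to compose such a word and split it back into groups with the masks. It does not call ICU: the constants are copied from the header so the block compiles on its own, and `ShapeOptionGroups`, `splitShapeOptions` and `demoShapeOptions` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from ushape.h so this sketch builds without ICU.
 * Real code would include "unicode/ushape.h" instead of redefining them. */
#define U_SHAPE_LENGTH_FIXED_SPACES_NEAR 1
#define U_SHAPE_LENGTH_MASK              0x10003
#define U_SHAPE_TEXT_DIRECTION_LOGICAL   0
#define U_SHAPE_TEXT_DIRECTION_MASK      4
#define U_SHAPE_LETTERS_SHAPE            8
#define U_SHAPE_LETTERS_MASK             0x18
#define U_SHAPE_DIGITS_EN2AN             0x20
#define U_SHAPE_DIGITS_MASK              0xe0
#define U_SHAPE_DIGIT_TYPE_AN_EXTENDED   0x100
#define U_SHAPE_DIGIT_TYPE_MASK          0x300

/* Hypothetical helper type: one field per option group. */
typedef struct {
    uint32_t length;     /* memory (length) option   */
    uint32_t direction;  /* text direction indicator */
    uint32_t letters;    /* letter shaping option    */
    uint32_t digits;     /* digit shaping option     */
    uint32_t digitType;  /* digit type option        */
} ShapeOptionGroups;

/* Hypothetical helper: isolate each group by AND-ing with its mask. */
static ShapeOptionGroups splitShapeOptions(uint32_t options) {
    ShapeOptionGroups g;
    g.length    = options & U_SHAPE_LENGTH_MASK;
    g.direction = options & U_SHAPE_TEXT_DIRECTION_MASK;
    g.letters   = options & U_SHAPE_LETTERS_MASK;
    g.digits    = options & U_SHAPE_DIGITS_MASK;
    g.digitType = options & U_SHAPE_DIGIT_TYPE_MASK;
    return g;
}

/* Hypothetical usage: build an options word with one value per group,
 * then check that every group can be recovered unchanged. */
static void demoShapeOptions(void) {
    uint32_t options = U_SHAPE_LETTERS_SHAPE
                     | U_SHAPE_DIGITS_EN2AN
                     | U_SHAPE_DIGIT_TYPE_AN_EXTENDED
                     | U_SHAPE_LENGTH_FIXED_SPACES_NEAR;
    ShapeOptionGroups g = splitShapeOptions(options);

    assert(g.length    == U_SHAPE_LENGTH_FIXED_SPACES_NEAR);
    assert(g.direction == U_SHAPE_TEXT_DIRECTION_LOGICAL);
    assert(g.letters   == U_SHAPE_LETTERS_SHAPE);
    assert(g.digits    == U_SHAPE_DIGITS_EN2AN);
    assert(g.digitType == U_SHAPE_DIGIT_TYPE_AN_EXTENDED);
}
```

Because a group's "default" value is 0 (for example `U_SHAPE_TEXT_DIRECTION_LOGICAL`), leaving a group out of the OR expression selects that default. A word built this way is what would be passed as the `options` argument of `u_shapeArabic()`.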