Opened 8 years ago

Closed 2 years ago

#46 closed defect (fixed)

Text does not wrap properly for DBCS

Reported by: Lewis Rosenthal
Owned by:
Priority: major
Milestone:
Component: NewView
Version: 2.19
Keywords:
Cc:

Description

Text does not wrap properly on DBCS systems. This apparently makes NewView nearly unusable on these systems.

Alex can provide some examples.

Change History (8)

comment:1 Changed 8 years ago by ataylor

The issue is that NewView apparently only wraps on whitespace (space/tab/newline). This is a problem because written Chinese, Japanese, and Thai don't have spaces between words.

The standard PM MLE handles this by just allowing wrapping on every single character when using a DBCS codepage. This is ugly but at least it is kind of readable. A better approach (which the IBM VIEW.EXE seems to use) is to observe the text wrapping rules for East Asian languages.

I guess NewView uses codepage text handling rather than Unicode. So I would suggest the following logic.

Check the current codepage.

  • For codepage 932, 942, or 943 (Japanese):
    • Allow wrapping on space/tab/newline
    • Allow wrapping before any single-byte character above 0xA0 except: 0xA1, 0xA3, 0xA4, 0xA5, 0xA7, 0xA8, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF, 0xB0, 0xDE, 0xDF
    • Allow wrapping before any double-byte character except: 0x8141, 0x8142, 0x8145, 0x8148, 0x8149, 0x814A, 0x814B, 0x8158, 0x815A, 0x815B, 0x8160, 0x8168, 0x816C, 0x8172, 0x8174, 0x8176, 0x8178, 0x817A, 0x818B, 0x818E, 0x818F, 0x8190, 0x8191, 0x8192, 0x8193, 0x829F, 0x82A1, 0x82A3, 0x82A5, 0x82A7, 0x82C1, 0x82E1, 0x82E3, 0x82E5, 0x8340, 0x8342, 0x8344, 0x8346, 0x8348, 0x8362, 0x8383, 0x8385, 0x8387
  • For codepage 1381 or 1386 (Simplified Chinese):
    • Allow wrapping on space/tab/newline
    • Allow wrapping before any double-byte character except: 0xA1A4, 0xA1B1, 0xA3A7, 0xA1A5, 0xA1B3, 0xA3A9, 0xA1A6, 0xA1B5, 0xA3AC, 0xA1A7, 0xA1B7, 0xA3AE, 0xA1A8, 0xA1B9, 0xA3BA, 0xA1A9, 0xA1BB, 0xA3BB, 0xA1AA, 0xA1BD, 0xA3BF, 0xA1AB, 0xA1BF, 0xA3DD, 0xA1AC, 0xA1C3, 0xA3E0, 0xA1A2, 0xA1AD, 0xA3A1, 0xA3FC, 0xA1A3, 0xA1AF, 0xA3A2, 0xA3FD
  • For codepage 950 (Traditional Chinese):
    • Allow wrapping on space/tab/newline
    • Allow wrapping before any double-byte character except: 0xA147, 0xA156, 0xA16E, 0xA148, 0xA157, 0xA170, 0xA149, 0xA158, 0xA172, 0xA14A, 0xA159, 0xA174, 0xA14B, 0xA15A, 0xA176, 0xA14C, 0xA15B, 0xA178, 0xA14D, 0xA15C, 0xA17A, 0xA14E, 0xA15E, 0xA17C, 0xA14F, 0xA160, 0xA17E, 0xA141, 0xA150, 0xA162, 0xA1A2, 0xA142, 0xA151, 0xA164, 0xA1A4, 0xA143, 0xA152, 0xA166, 0xA1A6, 0xA144, 0xA153, 0xA168, 0xA1A8, 0xA145, 0xA154, 0xA16A, 0xA1AA, 0xA146, 0xA155, 0xA16C, 0xA1AC
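The rules above could be sketched roughly as follows for the Japanese codepages. This is an illustration in Python, not NewView's Object Pascal code. The exception lists are copied from the Japanese rules above; the hard-coded Shift-JIS lead-byte ranges and the names `is_lead_byte`, `split_chars` and `wrap_offsets` are assumptions made for the example (a real implementation would presumably ask the system for the lead-byte ranges of the active codepage).

```python
# Assumed lead-byte ranges for codepages 932/942/943 (Shift-JIS).
SJIS_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))

# Single-byte characters that must not start a line (from the list above).
NO_BREAK_BEFORE_SBCS = frozenset([
    0xA1, 0xA3, 0xA4, 0xA5, 0xA7, 0xA8, 0xA9, 0xAA,
    0xAB, 0xAC, 0xAD, 0xAE, 0xAF, 0xB0, 0xDE, 0xDF,
])

# Double-byte characters that must not start a line (from the list above).
NO_BREAK_BEFORE_DBCS = frozenset([
    0x8141, 0x8142, 0x8145, 0x8148, 0x8149, 0x814A, 0x814B, 0x8158,
    0x815A, 0x815B, 0x8160, 0x8168, 0x816C, 0x8172, 0x8174, 0x8176,
    0x8178, 0x817A, 0x818B, 0x818E, 0x818F, 0x8190, 0x8191, 0x8192,
    0x8193, 0x829F, 0x82A1, 0x82A3, 0x82A5, 0x82A7, 0x82C1, 0x82E1,
    0x82E3, 0x82E5, 0x8340, 0x8342, 0x8344, 0x8346, 0x8348, 0x8362,
    0x8383, 0x8385, 0x8387,
])

WHITESPACE = frozenset(b" \t\n")


def is_lead_byte(byte):
    """True if the byte starts a double-byte character."""
    return any(lo <= byte <= hi for lo, hi in SJIS_LEAD_RANGES)


def split_chars(data):
    """Yield (offset, code, length) for each character in the byte string."""
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]) and i + 1 < len(data):
            yield i, (data[i] << 8) | data[i + 1], 2
            i += 2
        else:
            yield i, data[i], 1
            i += 1


def wrap_offsets(data):
    """Return the byte offsets at which a line break is permitted."""
    offsets = []
    prev_was_space = False
    for offset, code, length in split_chars(data):
        if offset > 0:
            if prev_was_space:
                allowed = True  # ordinary whitespace wrapping
            elif length == 2:
                allowed = code not in NO_BREAK_BEFORE_DBCS
            else:
                allowed = code > 0xA0 and code not in NO_BREAK_BEFORE_SBCS
            if allowed:
                offsets.append(offset)
        prev_was_space = length == 1 and code in WHITESPACE
    return offsets
```

The Chinese codepages would follow the same pattern with their own lead-byte ranges and exception sets, minus the single-byte rule.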

comment:2 Changed 8 years ago by ataylor

As for Thai, I'm not sure if proper word-wrapping logic is possible because it would actually require dictionary-based word boundary detection. (Thai not only doesn't use spaces, apparently it doesn't even use punctuation.) In the case of Thai (codepage 874), I would suggest simply allowing wrapping before any character between 0xA1 and 0xFB.
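A minimal sketch of that Thai fallback, in Python for illustration only. The function name `thai_wrap_offsets` is hypothetical, and keeping ordinary whitespace wrapping alongside the 0xA1 to 0xFB rule is an assumption.

```python
def thai_wrap_offsets(data):
    """Return byte offsets in codepage 874 text where a break is permitted."""
    offsets = []
    for i in range(1, len(data)):
        after_space = data[i - 1] in b" \t\n"  # assumed: keep whitespace wrapping
        thai_char = 0xA1 <= data[i] <= 0xFB    # rule suggested above
        if after_space or thai_char:
            offsets.append(i)
    return offsets
```

Codepage 874 is single-byte, so every byte offset is also a character boundary and no lead-byte handling is needed here.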

comment:3 Changed 6 years ago by ataylor

Looking at the source, a possible, very simple workaround occurs to me.

In HelpWindowUnit.pas, try changing:

AtLeastOneWordBeforeWrap := true; // not sure

to

AtLeastOneWordBeforeWrap := false; 

I would test this myself, except that I've been unable to set up a working NewView build environment.


EDIT: In retrospect, no, this would not work. The code would still attempt to break characters on byte boundaries, which is obviously fatal to DBCS text.
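To illustrate why breaking on byte boundaries is a problem: a break offset has to be moved back to the start of a character before the text is split. The sketch below is Python, not the NewView code; `snap_to_char_boundary` is a hypothetical helper and the Shift-JIS lead-byte ranges are an assumption for the example.

```python
def is_lead_byte(byte):
    """Assumed Shift-JIS lead-byte ranges (codepages 932/942/943)."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def snap_to_char_boundary(data, offset):
    """Return the largest character boundary that is <= offset.

    The scan starts at the beginning of the text because a trail byte
    can have the same value as an ASCII character or a lead byte, so a
    byte cannot be classified by looking at it in isolation.
    """
    i = 0
    while i < offset:
        step = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        if i + step > offset:
            break  # offset falls inside this character
        i += step
    return i
```

For example, in `b"A\x93\xfa\x96\x7bB"` an offset of 2 lands between the two bytes of the first double-byte character, and the helper moves it back to 1.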

Last edited 5 years ago by ataylor

comment:4 Changed 5 years ago by ataylor

The logic that needs modifying is mainly in components\RichTextLayoutUnit.pas (part of the CompLib rich text control). I know broadly what kind of logic has to be added; however, I cannot implement this myself because I lack the necessary knowledge of both Object Pascal and the program internals.

comment:5 Changed 5 years ago by ataylor

Well, after poking away at it for a while, I think I've come up with a maybe-fix for the DBCS text wrapping. It seems to work in my testing, anyway (with Simplified Chinese, Korean and Japanese). Some of the logic hopefully also works for Thai, but that hasn't been tested yet.

Note that I have not (yet?) tried to write the kind of language-specific intelligent word break rules that I described earlier. At the moment it simply allows wrapping after every double-byte (or single-byte Thai or Japanese) character. This looks a little odd, but it's the same thing the standard PM edit control does, so users will presumably be accustomed to it.
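The simple behaviour described here (a break permitted after every double-byte character) can be sketched as a greedy wrapper. This is a Python illustration under stated assumptions, not the code committed to the rich text control: `wrap_dbcs` is a hypothetical name, the Shift-JIS lead-byte ranges are hard-coded, and the line width is measured in bytes as a stand-in for real font metrics.

```python
def is_lead_byte(byte):
    """Assumed Shift-JIS lead-byte ranges (codepages 932/942/943)."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def wrap_dbcs(data, width):
    """Greedily wrap a byte string to at most `width` bytes per line.

    A break is permitted after every double-byte character and after a
    space. A run with no permitted break is left overlong, not split.
    """
    lines = []
    line_start = 0  # offset where the current line begins
    last_break = 0  # most recent permitted break offset
    i = 0
    while i < len(data):
        step = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        # Wrap before this character if it would overflow the line.
        if i + step - line_start > width and last_break > line_start:
            lines.append(data[line_start:last_break])
            line_start = last_break
        i += step
        if step == 2 or data[i - 1] == 0x20:
            last_break = i
    lines.append(data[line_start:])
    return lines
```

Because the character step is always one or two whole bytes, a line can never end in the middle of a double-byte character.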

Changes have been committed; SVN r418 (rich text control) and r419 (NewView DBCS font default).

Note that cursor movement and horizontal marking are still badly broken with DBCS text (as they always have been). I tried to address those as well, but without success so far.

DBCS character width calculation seems a bit off; with some fonts it seems to wrap slightly sooner than it needs to, but not to an extent that I'd consider it a major problem. (At least it does wrap now...)

Last edited 5 years ago by ataylor

comment:6 Changed 5 years ago by ataylor

r420 seems to have fixed the character width calculation, so the wrap margins are better now.

comment:7 Changed 5 years ago by ataylor

r421 adds logic for coping with cursor positioning calculations in DBCS (or mixed) text. This seems to address the problems with marking and cursor movement, more or less, but I have not tested heavily yet.

Actual drawing of the cursor in DBCS text seems a bit wonky sometimes, at least when using a TrueType font for text. Hopefully that's just a cosmetic thing, and may be font-dependent. I haven't seen the issue so far when using default fonts.

(Unrelated: I also added a small tweak to try and fix the long-standing problem with junk appended to text copied to the clipboard. Jury is still out on whether it works, since the actual error was always somewhat random.)

comment:8 Changed 2 years ago by ataylor

Resolution: fixed
Status: new → closed