Opened 8 years ago

Closed 3 years ago

#46 closed defect (fixed)

Text does not wrap properly for DBCS

Reported by: Lewis Rosenthal
Owned by:
Priority: major
Milestone:
Component: NewView
Version: 2.19
Keywords:
Cc:

Description

DBCS systems have problems with text wrapping. This apparently makes NewView nearly unusable on these systems.

Alex can provide some examples.

Change History (8)

comment:1 by ataylor, 8 years ago

The issue is that NewView apparently only wraps on whitespace (space/tab/newline). This is a problem because written Chinese, Japanese, and Thai don't have spaces between words.

The standard PM MLE handles this by just allowing wrapping on every single character when using a DBCS codepage. This is ugly but at least it is kind of readable. A better approach (which the IBM VIEW.EXE seems to use) is to observe the text wrapping rules for East Asian languages.

I guess NewView uses codepage text handling rather than Unicode. So I would suggest the following logic.

Check the current codepage.

  • For codepage 932, 942, or 943 (Japanese):
    • Allow wrapping on space/tab/newline
    • Allow wrapping before any single-byte character above 0xA0 except: 0xA1, 0xA3, 0xA4, 0xA5, 0xA7, 0xA8, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF, 0xB0, 0xDE, 0xDF
    • Allow wrapping before any double-byte character except: 0x8141, 0x8142, 0x8145, 0x8148, 0x8149, 0x814A, 0x814B, 0x8158, 0x815A, 0x815B, 0x8160, 0x8168, 0x816C, 0x8172, 0x8174, 0x8176, 0x8178, 0x817A, 0x818B, 0x818E, 0x818F, 0x8190, 0x8191, 0x8192, 0x8193, 0x829F, 0x82A1, 0x82A3, 0x82A5, 0x82A7, 0x82C1, 0x82E1, 0x82E3, 0x82E5, 0x8340, 0x8342, 0x8344, 0x8346, 0x8348, 0x8362, 0x8383, 0x8385, 0x8387
  • For codepage 1381 or 1386 (Simplified Chinese):
    • Allow wrapping on space/tab/newline
    • Allow wrapping before any double-byte character except: 0xA1A4, 0xA1B1, 0xA3A7, 0xA1A5, 0xA1B3, 0xA3A9, 0xA1A6, 0xA1B5, 0xA3AC, 0xA1A7, 0xA1B7, 0xA3AE, 0xA1A8, 0xA1B9, 0xA3BA, 0xA1A9, 0xA1BB, 0xA3BB, 0xA1AA, 0xA1BD, 0xA3BF, 0xA1AB, 0xA1BF, 0xA3DD, 0xA1AC, 0xA1C3, 0xA3E0, 0xA1A2, 0xA1AD, 0xA3A1, 0xA3FC, 0xA1A3, 0xA1AF, 0xA3A2, 0xA3FD
  • For codepage 950 (Traditional Chinese):
    • Allow wrapping on space/tab/newline
    • Allow wrapping before any double-byte character except: 0xA147, 0xA156, 0xA16E, 0xA148, 0xA157, 0xA170, 0xA149, 0xA158, 0xA172, 0xA14A, 0xA159, 0xA174, 0xA14B, 0xA15A, 0xA176, 0xA14C, 0xA15B, 0xA178, 0xA14D, 0xA15C, 0xA17A, 0xA14E, 0xA15E, 0xA17C, 0xA14F, 0xA160, 0xA17E, 0xA141, 0xA150, 0xA162, 0xA1A2, 0xA142, 0xA151, 0xA164, 0xA1A4, 0xA143, 0xA152, 0xA166, 0xA1A6, 0xA144, 0xA153, 0xA168, 0xA1A8, 0xA145, 0xA154, 0xA16A, 0xA1AA, 0xA146, 0xA155, 0xA16C, 0xA1AC
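The logic above could be sketched roughly as follows for the Japanese codepages. This is an illustration in Python, not NewView's Object Pascal code. The exception tables are copied from the list above; the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the usual Shift-JIS ones and are an assumption here, and the names `is_lead_byte` and `wrap_points` are made up for the example. The Chinese codepages would use the same shape with their own lead-byte ranges and exception tables.

```python
# Hypothetical sketch of the proposed wrap rule for codepages 932/942/943.
# Not taken from the NewView source.

WHITESPACE = {0x20, 0x09, 0x0A}  # space, tab, newline

# Single-byte characters above 0xA0 that must not start a line.
NO_BREAK_SBCS = {
    0xA1, 0xA3, 0xA4, 0xA5, 0xA7, 0xA8, 0xA9, 0xAA,
    0xAB, 0xAC, 0xAD, 0xAE, 0xAF, 0xB0, 0xDE, 0xDF,
}

# Double-byte characters that must not start a line.
NO_BREAK_DBCS = {
    0x8141, 0x8142, 0x8145, 0x8148, 0x8149, 0x814A, 0x814B, 0x8158,
    0x815A, 0x815B, 0x8160, 0x8168, 0x816C, 0x8172, 0x8174, 0x8176,
    0x8178, 0x817A, 0x818B, 0x818E, 0x818F, 0x8190, 0x8191, 0x8192,
    0x8193, 0x829F, 0x82A1, 0x82A3, 0x82A5, 0x82A7, 0x82C1, 0x82E1,
    0x82E3, 0x82E5, 0x8340, 0x8342, 0x8344, 0x8346, 0x8348, 0x8362,
    0x8383, 0x8385, 0x8387,
}


def is_lead_byte(b: int) -> bool:
    """Assumed Shift-JIS lead-byte ranges for the Japanese codepages."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def wrap_points(data: bytes) -> list:
    """Return byte offsets at which a line break is allowed.

    An offset is listed if the character starting there is whitespace,
    or is a single-byte/double-byte character that may start a line.
    Offset 0 is never listed.
    """
    points = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_lead_byte(b) and i + 1 < len(data):
            # Double-byte character: look it up as a 16-bit code.
            code = (b << 8) | data[i + 1]
            allowed = code not in NO_BREAK_DBCS
            width = 2
        else:
            allowed = b in WHITESPACE or (b > 0xA0 and b not in NO_BREAK_SBCS)
            width = 1
        if allowed and i > 0:
            points.append(i)
        i += width  # always step by whole characters, never by bytes
    return points
```

Walking the text character by character (rather than byte by byte) is what keeps a break from landing in the middle of a double-byte character.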

comment:2 by ataylor, 8 years ago

As for Thai, I'm not sure proper word-wrapping logic is possible, because it would require dictionary-based word boundary detection. (Thai not only doesn't use spaces, apparently it doesn't even use punctuation.) For Thai (codepage 874), I would suggest simply allowing wrapping before any character between 0xA1 and 0xFB.
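A minimal sketch of that Thai fallback, again in Python for illustration only. Codepage 874 is single-byte, so no lead-byte handling is needed. Keeping the whitespace rule from comment:1 is an assumption, and `thai_wrap_points` is a made-up name.

```python
# Hypothetical sketch of the suggested codepage 874 rule.

WHITESPACE = {0x20, 0x09, 0x0A}  # space, tab, newline (assumed to still apply)


def thai_wrap_points(data: bytes) -> list:
    """Return byte offsets where a break is allowed in codepage 874 text."""
    return [
        i
        for i, b in enumerate(data)
        if i > 0 and (b in WHITESPACE or 0xA1 <= b <= 0xFB)
    ]
```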

comment:3 by ataylor, 7 years ago

Looking at the source, a possible, very simple workaround occurs to me.

In HelpWindowUnit.pas, try changing:

AtLeastOneWordBeforeWrap := true; // not sure

to

AtLeastOneWordBeforeWrap := false; 

I would test this myself, except that I've been unable to set up a working NewView build environment.


EDIT: In retrospect, no, this would not work. The code would still attempt to break lines on byte boundaries, which can split a double-byte character in two and is obviously fatal to DBCS text.
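To show why a byte-boundary break is fatal, here is a small Python demonstration using the standard `cp932` codec (not NewView code; `try_decode` is a helper invented for the example). Splitting after an odd number of bytes leaves half a character at the end of one line and the other half at the start of the next.

```python
# Illustration only: splitting cp932 text at an arbitrary byte offset.
from typing import Optional

text = "日本語"
data = text.encode("cp932")  # three double-byte characters, six bytes


def try_decode(chunk: bytes) -> Optional[str]:
    """Decode a cp932 chunk, returning None if it is not valid on its own."""
    try:
        return chunk.decode("cp932")
    except UnicodeDecodeError:
        return None


good_head = try_decode(data[:2])  # break on a character boundary
bad_head = try_decode(data[:3])   # break in the middle of the second character
bad_tail = try_decode(data[3:])   # starts with an orphaned trail byte
```

Here `bad_head` fails to decode at all, and `bad_tail` no longer corresponds to the last two characters of the original text.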

Last edited 6 years ago by ataylor

comment:4 by ataylor, 6 years ago

The logic that needs modifying is mainly in components\RichTextLayoutUnit.pas (part of the CompLib rich text control). I know broadly what kind of logic has to be added; however, I cannot implement this myself because I lack the necessary knowledge of both Object Pascal and the program internals.

comment:5 by ataylor, 6 years ago

Well, after poking away at it for a while, I think I've come up with a maybe-fix for the DBCS text wrapping. It seems to work in my testing, anyway (with both Korean and Japanese).

Changes have been committed; SVN r419.

Note that cursor movement and horizontal marking are still badly broken with DBCS text (as they always have been). I tried to address those as well, but without success so far.

DBCS character width calculation seems a bit off; it seems to wrap slightly sooner than it needs to, but not to an extent that I'd consider it a major problem. (At least it does wrap now...)

Version 0, edited 6 years ago by ataylor

comment:6 by ataylor, 6 years ago

r420 seems to have fixed the character width calculation, so the wrap margins are better now.

comment:7 by ataylor, 6 years ago

r421 adds logic for coping with cursor positioning calculations in DBCS (or mixed) text. This seems to address the problems with marking and cursor movement, more or less, but I have not tested heavily yet.

Actual drawing of the cursor in DBCS text seems a bit wonky sometimes, at least when using a TrueType font for text. Hopefully that's just a cosmetic thing, and may be font-dependent. I haven't seen the issue so far when using default fonts.

(Unrelated: I also added a small tweak to try and fix the long-standing problem with junk appended to text copied to the clipboard. Jury is still out on whether it works, since the actual error was always somewhat random.)

comment:8 by ataylor, 3 years ago

Resolution: fixed
Status: new → closed