Opened 8 years ago
Closed 3 years ago
#46 closed defect (fixed)
Text does not wrap properly for DBCS
Reported by: | Lewis Rosenthal | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | NewView | Version: | 2.19 |
Keywords: | Cc: |
Description
DBCS systems have problems with text wrapping. This apparently makes NewView nearly unusable on these systems.
Alex can provide some examples.
Change History (8)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
As for Thai, I'm not sure proper word-wrapping logic is possible, because it would actually require dictionary-based word boundary detection. (Thai not only doesn't use spaces, apparently it doesn't even use punctuation.) In the case of Thai (codepage 874), I would suggest simply allowing a wrap before any character between 0xA1 and 0xFB.
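A minimal sketch of that Thai rule, for illustration only. It is written in Python rather than the Object Pascal that NewView uses, and the function names are hypothetical; only the codepage number (874) and the byte range 0xA1 to 0xFB come from the suggestion above.

```python
# Hypothetical helper names; only the codepage and byte range come from the ticket.
THAI_CODEPAGE = 874
THAI_FIRST = 0xA1
THAI_LAST = 0xFB


def can_wrap_before(byte, codepage):
    """Return True if a line break may be placed before this byte."""
    if codepage == THAI_CODEPAGE:
        return THAI_FIRST <= byte <= THAI_LAST
    return False


def thai_break_offsets(text, codepage=THAI_CODEPAGE):
    """Byte offsets (other than 0) at which a break is allowed."""
    return [i for i in range(1, len(text)) if can_wrap_before(text[i], codepage)]


# Two Thai bytes embedded in ASCII text: breaks are allowed before each of them.
sample = b"ab\xca\xc7 c"
offsets = thai_break_offsets(sample)
```

Whitespace breaks are left out here on purpose; a real layout routine would presumably combine this check with the existing space/tab/newline handling.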
comment:3 by , 7 years ago
Looking at the source, a possible, very simple workaround occurs to me.
In HelpWindowUnit.pas, try changing:
AtLeastOneWordBeforeWrap := true; // not sure
to
AtLeastOneWordBeforeWrap := false;
I would test this myself, except that I've been unable to set up a working NewView build environment.
EDIT: In retrospect, no, this would not work. The code would still attempt to break the text on arbitrary byte boundaries, which can split a double-byte character in half and is obviously fatal to DBCS text.
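To make the byte-boundary problem concrete, here is a small illustration in Python (not NewView code). Python's "cp932" codec is used as a stand-in for a Japanese DBCS codepage; that choice, and the helper name, are assumptions made for the example.

```python
# Illustration only: "cp932" stands in for a Japanese DBCS codepage.
text = "日本語"
raw = text.encode("cp932")  # three characters, two bytes each


def is_valid(chunk):
    """True if the chunk decodes cleanly, i.e. no character was cut in half."""
    try:
        chunk.decode("cp932")
        return True
    except UnicodeDecodeError:
        return False


# Breaking after byte 1 leaves half a character at the end of the line.
split_mid_char_ok = is_valid(raw[:1])

# Breaking after byte 2 keeps whole characters on both sides of the break.
split_on_char_ok = is_valid(raw[:2]) and is_valid(raw[2:])
```

A wrap routine that counts bytes without knowing which ones are lead bytes can land on either kind of offset, which is why flipping AtLeastOneWordBeforeWrap alone is not enough.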
comment:4 by , 6 years ago
The logic that needs modifying is mainly in components\RichTextLayoutUnit.pas (part of the CompLib rich text control). I know broadly what kind of logic has to be added; however, I cannot implement this myself because I lack the necessary knowledge of both Object Pascal and the program internals.
comment:5 by , 6 years ago
Well, after poking away at it for a while, I think I've come up with a maybe-fix for the DBCS text wrapping. It seems to work in my testing, anyway (with Simplified Chinese, Korean and Japanese). Some of the logic hopefully also works for Thai, but that hasn't been tested yet.
Note that I have not (yet?) tried to write the kind of language-specific intelligent word break rules that I described earlier. At the moment it simply allows wrapping after every double-byte (or single-byte Thai or Japanese) character. This looks a little odd, but it's the same thing the standard PM edit control does, so users will presumably be accustomed to it.
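The "wrap after every double-byte character" behaviour can be sketched roughly as below. This is illustrative Python, not the code committed in r418. The lead-byte ranges are the ones commonly given for codepage 932 and are hard-coded here as an assumption (on OS/2 they would normally be obtained from the system, for example through DosQueryDBCSEnv), and the function names are hypothetical. The single-byte Thai and Japanese cases are left out for brevity.

```python
# Assumed lead-byte ranges for codepage 932; hard-coded for illustration.
LEAD_RANGES_932 = [(0x81, 0x9F), (0xE0, 0xFC)]


def is_lead_byte(byte, ranges):
    """True if the byte starts a double-byte character."""
    return any(lo <= byte <= hi for lo, hi in ranges)


def wrap_offsets(data, ranges=LEAD_RANGES_932):
    """Return byte offsets after which a line break is permitted."""
    offsets = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i], ranges) and i + 1 < len(data):
            i += 2                  # step over the whole double-byte character
            offsets.append(i)       # a break after it is allowed
        else:
            if data[i] in b" \t\n":
                offsets.append(i + 1)   # ordinary whitespace break
            i += 1
    return offsets


sample = b"ab " + "日本".encode("cp932")
breaks = wrap_offsets(sample)
```

Because the scan always advances two bytes at a time over a double-byte character, the trail byte is never examined on its own, so it cannot be mistaken for whitespace or for another lead byte.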
Changes have been committed; SVN r418 (rich text control) and r419 (NewView DBCS font default).
Note that cursor movement and horizontal marking are still badly broken with DBCS text (as they always have been). I tried to address those as well, but without success so far.
DBCS character width calculation seems a bit off; with some fonts it seems to wrap slightly sooner than it needs to, but not to an extent that I'd consider it a major problem. (At least it does wrap now...)
comment:6 by , 6 years ago
r420 seems to have fixed the character width calculation, so the wrap margins are better now.
comment:7 by , 6 years ago
r421 adds logic for coping with cursor positioning calculations in DBCS (or mixed) text. This seems to address the problems with marking and cursor movement, more or less, but I have not tested heavily yet.
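One common way to handle cursor positioning in mixed single-byte and double-byte text is to snap any byte offset back to the start of the character that contains it. The sketch below shows that idea in Python; it is not the logic from r421, the names are hypothetical, and the codepage 932 lead-byte ranges are assumed.

```python
# Assumed lead-byte ranges for codepage 932; hard-coded for illustration.
LEAD_RANGES_932 = [(0x81, 0x9F), (0xE0, 0xFC)]


def is_lead_byte(byte, ranges=LEAD_RANGES_932):
    """True if the byte starts a double-byte character."""
    return any(lo <= byte <= hi for lo, hi in ranges)


def char_starts(line):
    """Byte offsets at which each character in the line begins."""
    starts = []
    i = 0
    while i < len(line):
        starts.append(i)
        i += 2 if is_lead_byte(line[i]) and i + 1 < len(line) else 1
    return starts


def snap_cursor(line, offset):
    """Move a byte offset back to the start of the character containing it."""
    offset = max(0, min(offset, len(line)))
    if offset == len(line):
        return offset               # end of line is always a valid position
    return max(s for s in char_starts(line) if s <= offset)


line = b"A" + "日本".encode("cp932") + b"B"
starts = char_starts(line)
```

The scan starts from the beginning of the line rather than looking only at the byte under the cursor, because a trail byte may itself fall inside a lead-byte range and so cannot be classified in isolation.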
Actual drawing of the cursor in DBCS text seems a bit wonky sometimes, at least when using a TrueType font for text. Hopefully that's just a cosmetic thing, and may be font-dependent. I haven't seen the issue so far when using default fonts.
(Unrelated: I also added a small tweak to try and fix the long-standing problem with junk appended to text copied to the clipboard. Jury is still out on whether it works, since the actual error was always somewhat random.)
comment:8 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The issue is that NewView apparently only wraps on whitespace (space/tab/newline). This is a problem because written Chinese, Japanese, and Thai don't have spaces between words.
The standard PM MLE handles this by just allowing wrapping on every single character when using a DBCS codepage. This is ugly but at least it is kind of readable. A better approach (which the IBM VIEW.EXE seems to use) is to observe the text wrapping rules for East Asian languages.
I guess NewView uses codepage text handling rather than Unicode. So I would suggest the following logic.
Check the current codepage.