Opened 6 years ago

Last modified 4 years ago

#514 assigned defect

Fix "double-free" error

Reported by: jbs Owned by: jbs
Priority: minor Milestone: Release_3.25
Component: fm/2 base Version: 3.21
Keywords: Cc:

Description (last modified by jbs)

Occasionally FM/2 displays a error message regarding an attempt to free an item twice.

For example: FreeCnrItemData attempting to free 00ba0680 data twice, fileName was D:\netlabs\FM2\trunk\warpin\fm2.wis.in

One way to trigger this message is: 1) Use Alt-click in a directory container to rename a file. 2) Have FM/2 rescan the directory. 3) Message pops up.

However if one uses CTRL+r to rename the file, the rescan does not pop up the message.

Note: I do not think renaming is the only way to trigger this message.

Attachments (4)

filldirs.snip (721 bytes) - added by jbs 6 years ago.
Snippet from filldirs.c
filldir.snip2 (4.9 KB) - added by jbs 6 years ago.
Code snippet of pci-tracing debug code
0404bn.log (77.0 KB) - added by jbs 6 years ago.
Log showing double free and its cause
0820attachment.log (41.9 KB) - added by jbs 4 years ago.
Edited, annotated log showing relink error

Download all attachments as: .zip

Change History (18)

comment:1 Changed 6 years ago by jbs

  • Description modified (diff)

comment:2 Changed 6 years ago by gyoung

I think I know what is happening and why it is so hard to reproduce.

From watcom for realloc: Description: When the value of the old_blk argument is NULL, a new block of memory of size bytes is allocated.

If the value of size is zero, the corresponding free function is called to release the memory pointed to by old_blk.

Otherwise, the realloc function re-allocates space for an object of size bytes by either:

  • shrinking the allocated size of the allocated memory block old_blk when size is sufficiently smaller than the size of old_blk.
  • extending the allocated size of the allocated memory block old_blk if there is a large enough block of unallocated memory immediately following old_blk.
  • allocating a new block and copying the contents of old_blk to the new block.

Because it is possible that a new block will be allocated, any pointers into the old memory should not be maintained. These pointers will point to freed memory, with possible disastrous results, when a new block is allocated.

This will free pointers almost randomly because it is dependent on the size of the new memory relative to the old and the size of free memory if any following the old block. If I remember correctly we use realloc with the Alt key container editting. Here is where we use (x)realocate misc.c and strutils.c look like strong candidates to start with.

arccnrs.c:3111: xrealloc(li->list, y * sizeof(CHAR *), pszSrcFile, command.c:628: pszCommandsList = xrealloc(pszCommandsList, command.c:742: pszCommandsList = xrealloc(pszCommandsList, ulSizeCommandsList + strlen(info->title) + 1, dirsize.c:30: 13 Aug 07 SHL ProcessDir?: remove unneeded reallocs. Sanitize code draglist.c:50:#include "wrappers.h" xrealloc draglist.c:317: xrealloc(paDImgIcons, sizeof(DRAGIMAGE) * (ulNumDIAlloc + 4L), draglist.c:336: xrealloc(ppDItem, sizeof(DRAGITEM *) * (ulNumDIAlloc + 4L), draglist.c:424: xrealloc(ppDItem, sizeof(DRAGITEM *) * (ulNumDIAlloc + 5L), draglist.c:647: xrealloc(paDImgIcons, sizeof(DRAGIMAGE) * (ulNumDIAlloc + 4L), draglist.c:666: xrealloc(ppDItem, sizeof(DRAGITEM *) * (ulNumDIAlloc + 4L), droplist.c:42:#include "wrappers.h" xrealloc droplist.c:426: xrealloc(files, numalloc * sizeof(CHAR *), pszSrcFile, LINE); droplist.c:431: xrealloc(pcbFile, numalloc * sizeof(ULONGLONG), pszSrcFile, LINE); droplist.c:436: xrealloc(pulitemID, numalloc * sizeof(ULONG), pszSrcFile, LINE); filldir.c:1044: paffbTemp = xrealloc(paffbTotal, makelist.c:125: Use plain realloc for speed makelist.c:130: pflArray = realloc(*list, (*pnumalloced + 6) * sizeof(FILELIST *)); makelist.c:180: Use plain realloc for speed makelist.c:184: ppsz = realloc(*list, (*pnumalloced + 6) * sizeof(CHAR *)); makelist.c:186: Runtime_Error(pszSrcFile, LINE, "realloc"); makelist.c:232: Use plain realloc for speed makelist.c:237: test = realloc(list, sizeof(CHAR *) * (numfiles + 1)); misc.c:40: 21 Jan 08 GKY Stop reallocating NullStr? by direct editing of empty subject and longname strings. misc.c:875: pci->pszSubject = xrealloc(pci->pszSubject, retlen + 1, pszSrcFile, LINE); misc.c:935: pci->pszLongName = xrealloc(pci->pszLongName, retlen + 1, pszSrcFile, LINE); misc.c:964: pci->pszFileName = xrealloc(pci->pszFileName, sizeof(szData), pszSrcFile, LINE); newview.c:1330: temp = xrealloc(ad->lines, sizeof(CHAR *) * newview.c:1368: xrealloc(ad->lines, sizeof(CHAR *) * ad->numlines, seeall.c:1571: xrealloc(ad->afhead, (ad->afalloc + 1) * sizeof(ALLFILES), seeall.c:1580: xrealloc(ad->afindex, seeall.c:1979: xrealloc(pAD->afhead, pAD->afheadcnt * sizeof(ALLFILES), pszSrcFile, seeall.c:1986: xrealloc(pAD->afindex, pAD->afheadcnt * sizeof(ALLFILES *), pszSrcFile, seeall.c:2079: temp = xrealloc(ad->afhead, (ad->afalloc + 1000) * seeall.c:2193: xrealloc(ad->afhead, sizeof(ALLFILES) * ad->afheadcnt, pszSrcFile, seeall.c:2200: xrealloc(ad->afindex, sizeof(ALLFILES *) * ad->afheadcnt, pszSrcFile, select.c:665: xrealloc(Cnrs, (numwindows + 1) * sizeof(struct Cnr), pszSrcFile, select.c:703: xrealloc(Cnrs[z].ss, (x + 1) * sizeof(struct SS), pszSrcFile, strutil.c:182: PSZ psz2 = xrealloc(psz, curLen + l + 1, pszSrcFile, LINE); strutil.c:228: pNewLoadedStrings = xrealloc(pLoadedStrings, strutil.c:233: Runtime_Error(pszSrcFile, LINE, "realloc failed"); uudecode.c:387: xrealloc(list, sizeof(CHAR *) * (numfiles + 1), pszSrcFile,

comment:3 Changed 6 years ago by gyoung

misc.c:964: pci->pszFileName = xrealloc(pci->pszFileName, sizeof(szData), pszSrcFile, LINE); Isn't pci->pszDisplayName often set equal to pszFileName ?

comment:4 Changed 6 years ago by jbs

In my earlier description of how to replicate the error I did not mention that I use "Details view" for all my directory containers. And, FWIW, I also have the details view display the subject EA's.

I do not know if the type of view or the presence of Subjects are factors. But it might explain why others are having trouble replicating the error.

comment:5 Changed 6 years ago by stevenhl

  • Milestone set to Release_3.23

Changed 6 years ago by jbs

Snippet from filldirs.c

comment:6 Changed 6 years ago by jbs

I have looking to see if I can find the cause of this bug (and perhaps fix it). In my investigation I have found some code that I do not understand and some of it looks suspicious.

So I have attached an edited-for-readability snippet of the code in question. It is from filldirs.c, in the RemoveCnrItems function at or near line 2020.

My questions are: 1) Why the nested while loops? 2) What is the timer code about and does it need some kind of "stop"?

comment:7 Changed 6 years ago by stevenhl

The code you have found has problems. It looks like it was implemented while the programmer was not really paying attention. The nested loop is not needed. That said while the code both non-optimal and confusing, it's not going to break anything.

IdleIfNeeded() reduces the thread priority when an operation takes too long. This improves UI responsiveness. SleepIfNeeded() exists for the same reason. Which one is best for a given operation is a judgement call.

Last edited 6 years ago by stevenhl (previous) (diff)

comment:8 Changed 6 years ago by stevenhl

RemoveCnrItems() has been cleaned up with [1759]

comment:9 Changed 6 years ago by gyoung

RemoveCnrItems? was modified to use the inner loop again to avoid checking the timer on each loop after the thread is switched to idle. A second break which was missing was added. Its absence could in part explain a double free error. I can no longer reproduce this error.[1763]

Last edited 4 years ago by gyoung (previous) (diff)

Changed 6 years ago by jbs

Code snippet of pci-tracing debug code

Changed 6 years ago by jbs

Log showing double free and its cause

comment:10 Changed 6 years ago by jbs

I have uploaded a log of FM/2 that seems to show 1) that the double free error is real 2) the cause of the double free.

I have tried to annotate the file so that it can be read and interpreted more easily. Therefore I will only point out the highlights:

Lines 688-696 show a container record being deleted (as a result of direct editing).

Lines 734-789 show RemoveCnrItems deleting all the records from that same container (because FM/2 is closing down). RemoveCnrItems uses preccNextRecord to iterate through the records in the container. As it does so it encounters the previously deleted record and tries to free the "extra bytes" pointers which had already been freed in lines 638-696 (hence the double free).

There is no evidence in the log to suggest that the deleted pci had been reallocated. Nor was any action taken which would cause another pci to be allocated. (FM/2 was closed immediately after the direct edit.)

In addition to the double free, RemoveCnrItems stops iterating through the records in the container after encountering the deleted record, even though there are more records in the container, because the deleted record's preccNextRecord pointer has been set to zero. So the "extra byte" pointers, like pszFileName, are not freed for all the remaining records in the container (a minor memory leak).

Also, when RemoveCnrItems is changed to use CM_QUERYRECORD with CMA_NEXT to iterate through the records in the container, the deleted record is not re-encountered, there is no double free and the list is fully traversed.

This all seems to suggest that any place where FM/2 iterates through the records in a container and it is important to access all the records and/or important not to encounter deleted records, that we choose between: a) Find a way to ensure that the preccNextRecord pointers are accurate; or b) Stop using preccNextRecord and use CM_QUERYRECORD with CMA_NEXT instead. c) Other?

comment:11 Changed 4 years ago by gyoung

  • Milestone changed from Release_3.23 to Release_3.24

Ticket retargeted after milestone closed

comment:12 Changed 4 years ago by jbs

  • Owner set to jbs
  • Status changed from new to assigned

comment:13 Changed 4 years ago by jbs

What I found was that the direct edit code would first delete the record being edited and then later delete and refill the entire container. When deleting the entire list of records RemoveContainerItems() used preccNextRecord to traverse through the list of container records. The problem was that as it traversed the list it would encounter the single record which had been deleted earlier!? This caused two errors: 1) The second attempt to free that record (i.e. the double free); and 2) The traversal would stop at that record (because its preccNextRecord was null) and not free the remaining items from the list (i.e. a memory leak).

Since I did not know how to guarantee that the preccNextRecord which pointed to the previously deleted record was relinked to the record following the deleted one, I used a simple (but maybe not ideal) solution: Switch from using preccNextRecord in RemoveContainerItems() to sending CM_QUERYRECORD + CMA_NEXT messages to traverse the list. This fixes both errors.

Changed 4 years ago by jbs

Edited, annotated log showing relink error

comment:14 Changed 4 years ago by gyoung

  • Milestone changed from Release_3.24 to Release_3.25
Note: See TracTickets for help on using tickets.