Opened 10 years ago

Closed 4 years ago

#514 closed defect (fixed)

Fix "double-free" error

Reported by: John Small Owned by: Steven Levine
Priority: minor Milestone: Release_3.25
Component: fm/2 base Version: 3.21
Keywords: Cc:

Description (last modified by John Small)

Occasionally FM/2 displays a error message regarding an attempt to free an item twice.

For example: FreeCnrItemData attempting to free 00ba0680 data twice, fileName was D:\netlabs\FM2\trunk\warpin\fm2.wis.in

One way to trigger this message is: 1) Use Alt-click in a directory container to rename a file. 2) Have FM/2 rescan the directory. 3) Message pops up.

However if one uses CTRL+r to rename the file, the rescan does not pop up the message.

Note: I do not think renaming is the only way to trigger this message.

Attachments (4)

filldirs.snip (721 bytes) - added by John Small 10 years ago.
Snippet from filldirs.c
filldir.snip2 (4.9 KB) - added by John Small 10 years ago.
Code snippet of pci-tracing debug code
0404bn.log (77.0 KB) - added by John Small 10 years ago.
Log showing double free and its cause
0820attachment.log (41.9 KB) - added by John Small 9 years ago.
Edited, annotated log showing relink error

Download all attachments as: .zip

Change History (20)

comment:1 Changed 10 years ago by John Small

Description: modified (diff)

comment:2 Changed 10 years ago by Gregg Young

I think I know what is happening and why it is so hard to reproduce.

From watcom for realloc: Description: When the value of the old_blk argument is NULL, a new block of memory of size bytes is allocated.

If the value of size is zero, the corresponding free function is called to release the memory pointed to by old_blk.

Otherwise, the realloc function re-allocates space for an object of size bytes by either:

  • shrinking the allocated size of the allocated memory block old_blk when size is sufficiently smaller than the size of old_blk.
  • extending the allocated size of the allocated memory block old_blk if there is a large enough block of unallocated memory immediately following old_blk.
  • allocating a new block and copying the contents of old_blk to the new block.

Because it is possible that a new block will be allocated, any pointers into the old memory should not be maintained. These pointers will point to freed memory, with possible disastrous results, when a new block is allocated.

This will free pointers almost randomly because it is dependent on the size of the new memory relative to the old and the size of free memory if any following the old block. If I remember correctly we use realloc with the Alt key container editting. Here is where we use (x)realocate misc.c and strutils.c look like strong candidates to start with.

arccnrs.c:3111: xrealloc(li->list, y * sizeof(CHAR *), pszSrcFile, command.c:628: pszCommandsList = xrealloc(pszCommandsList, command.c:742: pszCommandsList = xrealloc(pszCommandsList, ulSizeCommandsList + strlen(info->title) + 1, dirsize.c:30: 13 Aug 07 SHL ProcessDir?: remove unneeded reallocs. Sanitize code draglist.c:50:#include "wrappers.h" xrealloc draglist.c:317: xrealloc(paDImgIcons, sizeof(DRAGIMAGE) * (ulNumDIAlloc + 4L), draglist.c:336: xrealloc(ppDItem, sizeof(DRAGITEM *) * (ulNumDIAlloc + 4L), draglist.c:424: xrealloc(ppDItem, sizeof(DRAGITEM *) * (ulNumDIAlloc + 5L), draglist.c:647: xrealloc(paDImgIcons, sizeof(DRAGIMAGE) * (ulNumDIAlloc + 4L), draglist.c:666: xrealloc(ppDItem, sizeof(DRAGITEM *) * (ulNumDIAlloc + 4L), droplist.c:42:#include "wrappers.h" xrealloc droplist.c:426: xrealloc(files, numalloc * sizeof(CHAR *), pszSrcFile, LINE); droplist.c:431: xrealloc(pcbFile, numalloc * sizeof(ULONGLONG), pszSrcFile, LINE); droplist.c:436: xrealloc(pulitemID, numalloc * sizeof(ULONG), pszSrcFile, LINE); filldir.c:1044: paffbTemp = xrealloc(paffbTotal, makelist.c:125: Use plain realloc for speed makelist.c:130: pflArray = realloc(*list, (*pnumalloced + 6) * sizeof(FILELIST *)); makelist.c:180: Use plain realloc for speed makelist.c:184: ppsz = realloc(*list, (*pnumalloced + 6) * sizeof(CHAR *)); makelist.c:186: Runtime_Error(pszSrcFile, LINE, "realloc"); makelist.c:232: Use plain realloc for speed makelist.c:237: test = realloc(list, sizeof(CHAR *) * (numfiles + 1)); misc.c:40: 21 Jan 08 GKY Stop reallocating NullStr? by direct editing of empty subject and longname strings. misc.c:875: pci->pszSubject = xrealloc(pci->pszSubject, retlen + 1, pszSrcFile, LINE); misc.c:935: pci->pszLongName = xrealloc(pci->pszLongName, retlen + 1, pszSrcFile, LINE); misc.c:964: pci->pszFileName = xrealloc(pci->pszFileName, sizeof(szData), pszSrcFile, LINE); newview.c:1330: temp = xrealloc(ad->lines, sizeof(CHAR *) * newview.c:1368: xrealloc(ad->lines, sizeof(CHAR *) * ad->numlines, seeall.c:1571: xrealloc(ad->afhead, (ad->afalloc + 1) * sizeof(ALLFILES), seeall.c:1580: xrealloc(ad->afindex, seeall.c:1979: xrealloc(pAD->afhead, pAD->afheadcnt * sizeof(ALLFILES), pszSrcFile, seeall.c:1986: xrealloc(pAD->afindex, pAD->afheadcnt * sizeof(ALLFILES *), pszSrcFile, seeall.c:2079: temp = xrealloc(ad->afhead, (ad->afalloc + 1000) * seeall.c:2193: xrealloc(ad->afhead, sizeof(ALLFILES) * ad->afheadcnt, pszSrcFile, seeall.c:2200: xrealloc(ad->afindex, sizeof(ALLFILES *) * ad->afheadcnt, pszSrcFile, select.c:665: xrealloc(Cnrs, (numwindows + 1) * sizeof(struct Cnr), pszSrcFile, select.c:703: xrealloc(Cnrs[z].ss, (x + 1) * sizeof(struct SS), pszSrcFile, strutil.c:182: PSZ psz2 = xrealloc(psz, curLen + l + 1, pszSrcFile, LINE); strutil.c:228: pNewLoadedStrings = xrealloc(pLoadedStrings, strutil.c:233: Runtime_Error(pszSrcFile, LINE, "realloc failed"); uudecode.c:387: xrealloc(list, sizeof(CHAR *) * (numfiles + 1), pszSrcFile,

comment:3 Changed 10 years ago by Gregg Young

misc.c:964: pci->pszFileName = xrealloc(pci->pszFileName, sizeof(szData), pszSrcFile, LINE); Isn't pci->pszDisplayName often set equal to pszFileName ?

comment:4 Changed 10 years ago by John Small

In my earlier description of how to replicate the error I did not mention that I use "Details view" for all my directory containers. And, FWIW, I also have the details view display the subject EA's.

I do not know if the type of view or the presence of Subjects are factors. But it might explain why others are having trouble replicating the error.

comment:5 Changed 10 years ago by Steven Levine

Milestone: Release_3.23

Changed 10 years ago by John Small

Attachment: filldirs.snip added

Snippet from filldirs.c

comment:6 Changed 10 years ago by John Small

I have looking to see if I can find the cause of this bug (and perhaps fix it). In my investigation I have found some code that I do not understand and some of it looks suspicious.

So I have attached an edited-for-readability snippet of the code in question. It is from filldirs.c, in the RemoveCnrItems function at or near line 2020.

My questions are: 1) Why the nested while loops? 2) What is the timer code about and does it need some kind of "stop"?

comment:7 Changed 10 years ago by Steven Levine

The code you have found has problems. It looks like it was implemented while the programmer was not really paying attention. The nested loop is not needed and the timer needs to initialized with InitITimer(). That said while the code is hardly optimal, it's not going to break anything.

IdleIfNeeded() reduces the thread priority when an operation takes too long. This improves UI responsiveness. SleepIfNeeded() exists for the same reason. Which one is best for a given operation is a judgement call.

Version 0, edited 10 years ago by Steven Levine (next)

comment:8 Changed 10 years ago by Steven Levine

RemoveCnrItems() has been cleaned up with [1759]

comment:9 Changed 10 years ago by Gregg Young

RemoveCnrItems? was modified to use the inner loop again to avoid checking the timer on each loop after the thread is switched to idle. A second break which was missing was added. Its absence could in part explain a double free error. I can no longer reproduce this error.[1763]

Last edited 9 years ago by Gregg Young (previous) (diff)

Changed 10 years ago by John Small

Attachment: filldir.snip2 added

Code snippet of pci-tracing debug code

Changed 10 years ago by John Small

Attachment: 0404bn.log added

Log showing double free and its cause

comment:10 Changed 10 years ago by John Small

I have uploaded a log of FM/2 that seems to show 1) that the double free error is real 2) the cause of the double free.

I have tried to annotate the file so that it can be read and interpreted more easily. Therefore I will only point out the highlights:

Lines 688-696 show a container record being deleted (as a result of direct editing).

Lines 734-789 show RemoveCnrItems deleting all the records from that same container (because FM/2 is closing down). RemoveCnrItems uses preccNextRecord to iterate through the records in the container. As it does so it encounters the previously deleted record and tries to free the "extra bytes" pointers which had already been freed in lines 638-696 (hence the double free).

There is no evidence in the log to suggest that the deleted pci had been reallocated. Nor was any action taken which would cause another pci to be allocated. (FM/2 was closed immediately after the direct edit.)

In addition to the double free, RemoveCnrItems stops iterating through the records in the container after encountering the deleted record, even though there are more records in the container, because the deleted record's preccNextRecord pointer has been set to zero. So the "extra byte" pointers, like pszFileName, are not freed for all the remaining records in the container (a minor memory leak).

Also, when RemoveCnrItems is changed to use CM_QUERYRECORD with CMA_NEXT to iterate through the records in the container, the deleted record is not re-encountered, there is no double free and the list is fully traversed.

This all seems to suggest that any place where FM/2 iterates through the records in a container and it is important to access all the records and/or important not to encounter deleted records, that we choose between: a) Find a way to ensure that the preccNextRecord pointers are accurate; or b) Stop using preccNextRecord and use CM_QUERYRECORD with CMA_NEXT instead. c) Other?

comment:11 Changed 9 years ago by Gregg Young

Milestone: Release_3.23Release_3.24

Ticket retargeted after milestone closed

comment:12 Changed 9 years ago by John Small

Owner: set to John Small
Status: newassigned

comment:13 Changed 9 years ago by John Small

What I found was that the direct edit code would first delete the record being edited and then later delete and refill the entire container. When deleting the entire list of records RemoveContainerItems() used preccNextRecord to traverse through the list of container records. The problem was that as it traversed the list it would encounter the single record which had been deleted earlier!? This caused two errors: 1) The second attempt to free that record (i.e. the double free); and 2) The traversal would stop at that record (because its preccNextRecord was null) and not free the remaining items from the list (i.e. a memory leak).

Since I did not know how to guarantee that the preccNextRecord which pointed to the previously deleted record was relinked to the record following the deleted one, I used a simple (but maybe not ideal) solution: Switch from using preccNextRecord in RemoveContainerItems() to sending CM_QUERYRECORD + CMA_NEXT messages to traverse the list. This fixes both errors.

Changed 9 years ago by John Small

Attachment: 0820attachment.log added

Edited, annotated log showing relink error

comment:14 Changed 9 years ago by Gregg Young

Milestone: Release_3.24Release_3.25

comment:15 Changed 4 years ago by Steven Levine

Owner: changed from John Small to Steven Levine
Status: assignednew

comment:16 Changed 4 years ago by Steven Levine

Resolution: fixed
Status: newclosed

Changesets 1890 and 1891 should resolve these errors.

Note: See TracTickets for help on using tickets.