Opened 13 years ago

Closed 12 years ago

#43 closed enhancement (fixed)

Lucide selection of internal plug-ins

Reported by: StuUpdike@… Owned by: eros2
Priority: major Milestone:
Component: Lucide Core Version: 1.0 Beta 1.1
Keywords: Cc:

Description

I have noticed that Lucide cannot select the correct internal plug-in (it gives an error message) when my browser hands it a file without the expected extension. For example, one of my employer's web sites hands off a PDF file with the name showdocument.cgi. I have my browser set to use Lucide as a helper application when either pdf or cgi is the extension. This is the same setup that I used with Acrobat 3.0. Apparently Lucide relies solely upon the extension for plug-in selection. However, if Lucide were to look inside the file when the extension is missing, incorrect, or not understood, it would find that PDF files all begin with something like "%PDF-1.4" and all DjVu files begin with something like "AT&TFORM". These signatures inside the file could then be used to select the proper plug-in.

Thank you,

Stu Updike Bedford, Texas USA
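
The content check requested above can be sketched as follows. This is only an illustration in Python, not Lucide code: the table `SIGNATURES` and the functions `sniff_format` and `sniff_file` are hypothetical names, and the two signatures are the ones quoted in the description.

```python
# Hypothetical signature table: format name -> bytes expected at the
# start of the file (taken from the examples in the description).
SIGNATURES = {
    "pdf": b"%PDF-",
    "djvu": b"AT&TFORM",
}


def sniff_format(data):
    """Return the format whose signature starts `data`, or None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read only the first few bytes of the file and sniff them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

With this approach a PDF delivered as showdocument.cgi would still be recognised, because the file name is never consulted.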

Change History (8)

comment:1 Changed 13 years ago by eros2

  • Type changed from defect to enhancement

comment:2 Changed 12 years ago by eros2

  • Resolution set to fixed
  • Status changed from new to closed

(In [215]) - If the filename does not have an extension or the extension is unknown, it will check file data to determine a suitable plugin (closes #43)

comment:3 follow-up: Changed 12 years ago by guest

  • Resolution fixed deleted
  • Status changed from closed to reopened

Hi,

just had a look at how this is resolved. I found that a plugin can return magic numbers for file type detection via getCheckStruct() and the core checks the file content.

This might be sufficient for simple file formats but could cause problems with file formats that have to be checked at different locations in the file (e.g. reading from the back rather than the front) or that need even more complex detection schemes. Also, when plugins want to use 3rd party libraries, they usually don't want to care about the libraries' internal mechanism for detecting the correct file type (e.g. using libpng, libtiff, ...). The GBM library likewise delegates the detection to the relevant decoder.

Wouldn't it be better to transfer the task of detecting the file type to the plugins also in Lucide (rule: separation of concern)? How about an isFileSupported(filename)/loadFile(filename) couple?

BTW, how does the implementation change affect existing plugins (e.g. GBM)?

The GBM plugin can already now detect the file type automatically independent of the file extension in loadFile(). It is only a matter of passing the filename to it. As the detection mechanism can require multiple steps (in worst case full decoding), it would be difficult to implement getCheckStruct() while isFileSupported(filename) would be relatively easy to do.

Heiko

BTW, if you modify the plugin API, would you please also update plugin toolkit available from the Wiki page? Thanks.
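
To illustrate the concern about signatures that sit at different locations, here is a minimal Python sketch of a declarative check that accepts an offset, where a negative offset counts from the end of the file. The function name `matches` and the (offset, magic) convention are assumptions made for this example; they are not the actual structures returned by getCheckStruct().

```python
import io
import os


def matches(stream, offset, magic):
    """Check whether `magic` is found at `offset` in a seekable binary stream.

    A negative offset is counted from the end of the stream.
    """
    size = stream.seek(0, os.SEEK_END)  # seek() returns the new position
    start = offset if offset >= 0 else size + offset
    if start < 0 or start + len(magic) > size:
        return False  # stream too short for this check
    stream.seek(start)
    return stream.read(len(magic)) == magic


# Made-up data with a marker at the front and another at the back.
sample = io.BytesIO(b"HEAD....TAIL")
front_ok = matches(sample, 0, b"HEAD")
back_ok = matches(sample, -4, b"TAIL")
```

A table of such checks covers fixed positions only. Formats that need multi-step detection, possibly up to full decoding, do not fit this model, which is the argument for a plugin-side isFileSupported(filename).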

comment:4 in reply to: ↑ 3 ; follow-up: Changed 12 years ago by eros2

Replying to Heiko:

Hi,

This might be sufficient for simple file formats but could cause problems with file formats that have to be checked at different locations in the file (e.g. reading from the back rather than the front) or that need even more complex detection schemes. Also, when plugins want to use 3rd party libraries, they usually don't want to care about the libraries' internal mechanism for detecting the correct file type (e.g. using libpng, libtiff, ...). The GBM library likewise delegates the detection to the relevant decoder.

You're right. I also figured this out (after the commit :) and tonight I will make the structures more elaborate and improve them.

Wouldn't it be better to transfer the task of detecting the file type to the plugins also in Lucide (rule: separation of concern)? How about an isFileSupported(filename)/loadFile(filename) couple?

I dislike this...
It means Lucide must create an object of every supported type, call its isFileSupported method, and destroy the object if the file is not supported.
However, isFileSupported can be an optional exported function in the DLL.

BTW, how does the implementation change affect existing plugins (e.g. GBM)?

It does not affect them: getCheckStruct() is used only if it is present, so it's optional.
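
The idea of an optional export can be sketched in Python as below. This only simulates the behaviour: a `SimpleNamespace` stands in for a loaded plugin DLL, and `query_check_struct` and the list-of-tuples return value are hypothetical, not Lucide's real interface.

```python
from types import SimpleNamespace


def query_check_struct(plugin):
    """Return the plugin's signature list, or None if it exports none."""
    export = getattr(plugin, "getCheckStruct", None)
    if export is None:
        return None  # older plugin: the core falls back to other steps
    return export()


# Stand-ins for loaded plugins (assumed shapes, for illustration only).
old_plugin = SimpleNamespace(name="gbm")
new_plugin = SimpleNamespace(
    name="pdf",
    getCheckStruct=lambda: [(0, b"%PDF-")],
)
```

Because the lookup tolerates a missing export, plugins built before the change keep working without modification.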

BTW, if you modify the plugin API, would you please also update plugin toolkit available from the Wiki page? Thanks.

I have not updated the toolkit because getCheckStruct() and its structures are raw and unfinished.
I'll update the toolkit when the improvements are finished.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 12 years ago by guest

Replying to eros2:

Wouldn't it be better to transfer the task of detecting the file type to the plugins also in Lucide (rule: separation of concern)? How about an isFileSupported(filename)/loadFile(filename) couple?

I dislike this...
It means Lucide must create an object of every supported type, call its isFileSupported method, and destroy the object if the file is not supported.
However, isFileSupported can be an optional exported function in the DLL.

Yeah, just like getSupportedExtensions().

But if the export is optional, you will never get rid of the issue for files on the internet. They often have no file extension and sometimes the wrong one. That's what I found out while writing the Mozilla GBM plugin. Some silly sites put images behind links that end with .jpg but are in reality TIFFs, text or whatever. So you can't count on the extension when you want to wrap Lucide into a Mozilla plugin later on. Only mime types work somewhat reliably, but not always either.

So if you can figure out the mime type of a file in the core (maybe there exists a lib for this), you could ask the plugins for the supported mime types in addition to the extensions. That is how the Mozilla plugin interface seems to operate. If the browser knows the mime type of the file (server provided or by file check), it uses it for the plugin lookup. If the mime type is not available, it tries to use the file extension (fallback). Mozilla plugins always register for mime types and extensions (can also be *). Just try about:plugins in Firefox or SeaMonkey.

So I guess a reliable detection scheme could be:

  1. Try to detect the mime type of the file.
  2. If not found, try via the extension but also ask the plugin to check the file.
  3. If step 2 fails, ask every plugin to check directly if it supports the file.
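
The three steps above can be sketched in Python as follows. Everything here is an assumption made for illustration: the `Plugin` record, the function `select_by_mime_first`, and the `detect_mime` callable (which stands in for whatever mime detection the core might use and returns None when the type is unknown).

```python
import os
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plugin:
    name: str
    mime_types: frozenset
    extensions: frozenset
    is_file_supported: Callable[[str], bool]


def select_by_mime_first(path, plugins, detect_mime):
    # Step 1: try the mime type, if one can be determined.
    mime = detect_mime(path)
    if mime is not None:
        for plugin in plugins:
            if mime in plugin.mime_types:
                return plugin
    # Step 2: try the extension, but let the plugin confirm the content.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    for plugin in plugins:
        if ext in plugin.extensions and plugin.is_file_supported(path):
            return plugin
    # Step 3: ask every plugin directly.
    for plugin in plugins:
        if plugin.is_file_supported(path):
            return plugin
    return None
```

Step 3 is what allows a TIFF served under a .jpg name to reach the plugin that can actually decode it.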

The first step could be used once you want to wrap Lucide into a Mozilla plugin. You will need the mime types of the Lucide plugins there because the Mozilla plugin API requires this info. Otherwise your plugin instance will never be created. At least that's my understanding at the moment, based on my experience with the Mozilla GBM plugin. What I dislike about the Mozilla plugin API is that there is no way to really let the plugin detect whether it supports the file format. The plugin is always a slave that has no chance to work around a bad browser file detection. If browsers also did step 3 (see above), many empty areas on some sites would be rendered correctly, because usually there is a plugin that could handle the file format but it is simply not asked to do so. Maybe this works as designed for security reasons.

I guess until you want to go the Mozilla plugin path (if ever), the last step covers most aspects. So I'd be fine with the isFileSupported() export.

BTW, the current Lucide GBM plugin file format detection implements steps 2 and 3 above. So it will find the correct decoder even if the file has the wrong extension. For now, because of how the plugin lookup works, it just has to be an extension that the plugin has registered with Lucide. So you can rename a tif file to gif and it will still show the image correctly.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 12 years ago by eros2

Replying to Heiko:

Well, as Lucide is not a browser plugin, it can't use mime types.

In my opinion, a good sequence to find a suitable plugin is:

  1. Check the file content for signatures (optional, if getCheckStruct() is exported)
  2. If the previous step is omitted or fails, check the extension
  3. If the previous step fails, use isFileSupported() as a last resort (if exported)

I try to minimize file access at the detection stage, as it may take some time. It's not important if the file is on an HDD, but if the file resides on a CD or DVD, access may take some time.

I would even prefer to use step 2 as the first step but, as you rightly said, the file extension may be wrong. A missing extension is not a problem, but to work around a wrong extension we need to check the file content for signatures first.

Second, check the extension if the previous step fails. This step helps if the file has a broken header (but is still displayable) and a correct extension.

Last, if the previous steps fail, call isFileSupported(). As it may read at the end of the file, such seeks will be slow on CD/DVD, so we use this as a last resort.
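
A minimal Python sketch of this ordering is shown below. It is not the code from the changeset: plugins are modelled as plain dicts, the keys "signatures" and "is_file_supported" are hypothetical stand-ins for the optional exports, and `header` is assumed to hold the first bytes of the file, read once by the caller.

```python
import os


def select_by_signature_first(path, header, plugins):
    """Return the name of the first suitable plugin, or None."""
    # Step 1: signature check, only for plugins that provide signatures.
    for plugin in plugins:
        for magic in plugin.get("signatures", ()):
            if header.startswith(magic):
                return plugin["name"]
    # Step 2: fall back to the file extension (no file access needed).
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    for plugin in plugins:
        if ext in plugin["extensions"]:
            return plugin["name"]
    # Step 3: last resort, which may be slow on CD/DVD.
    for plugin in plugins:
        check = plugin.get("is_file_supported")
        if check is not None and check(path):
            return plugin["name"]
    return None
```

Only step 3 lets a plugin touch the file on its own, so the potentially expensive seeks happen only when the cheaper checks have found nothing.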

comment:7 in reply to: ↑ 6 Changed 12 years ago by guest

Replying to eros2:

Well, as Lucide is not a browser plugin, it can't use mime types.

Somewhere I read an enhancement request about this. So I thought it might be important. If you don't plan for it, fine.

In my opinion, a good sequence to find a suitable plugin is:

  1. Check the file content for signatures (optional, if getCheckStruct() is exported)
  2. If the previous step is omitted or fails, check the extension

If this includes a trial load, fine. If not, what would fail?

  3. If the previous step fails, use isFileSupported() as a last resort (if exported)

I try to minimize file access at the detection stage, as it may take some time. It's not important if the file is on an HDD, but if the file resides on a CD or DVD, access may take some time.

I would even prefer to use step 2 as the first step but, as you rightly said, the file extension may be wrong. A missing extension is not a problem, but to work around a wrong extension we need to check the file content for signatures first.

Not necessarily, please see below.

I couldn't implement getCheckStruct() for GBM plugin because the required information is hidden in the format decoders and there is no simple way to get it from there. Most libs hide the detection to stay open for format changes, so I don't think that reverse engineering the magic format patterns and hard-coding them for getCheckStruct() is really a viable option. So for me the detection would start at proposed step 2 which does not mean other plugins couldn't make use of it.

Let me summarize whether I got your proposal right:

To minimize file accesses you optimize for the usual case:

  1. Optionally check magic patterns provided by getCheckStruct(). If not implemented by the plugin or no pattern matches, continue with step 2.

  2. Check the extension first because in more than 90% of cases it will be correct (as today). Select a plugin based on it. Then simply try to load the file. The plugin will return an error if the file format is not supported.

  3. In case step 2 fails, you go to the fallback and look up whether any other plugin can read it via isFileSupported().

Correct?

If yes, I think you can also do step 2 before step 1. This would save an additional file access.
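
The reordering suggested here, extension plus trial load first and other plugins only afterwards, might look like the following Python sketch. The names `UnsupportedFormat` and `open_document` and the dict-based plugin records are hypothetical, and the signature step is left out for brevity.

```python
import os


class UnsupportedFormat(Exception):
    """Raised by a plugin's load function when it cannot decode the file."""


def open_document(path, plugins):
    """Return (plugin name, loaded document), or None if nothing fits."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    tried = set()
    # Usual case: the extension is right, so the first trial load succeeds.
    for plugin in plugins:
        if ext in plugin["extensions"]:
            tried.add(plugin["name"])
            try:
                return plugin["name"], plugin["load"](path)
            except UnsupportedFormat:
                pass  # wrong extension: continue with the fallback
    # Fallback: ask the remaining plugins whether they can read the file.
    for plugin in plugins:
        if plugin["name"] in tried:
            continue
        check = plugin.get("is_file_supported")
        if check is not None and check(path):
            return plugin["name"], plugin["load"](path)
    return None
```

In the common case this costs a single load attempt; the extra checks run only after that attempt has failed.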

Heiko

comment:8 Changed 12 years ago by eros2

  • Resolution set to fixed
  • Status changed from reopened to closed

Improved in changeset [217].
