Opened 5 years ago

Closed 4 years ago

#133 closed task (wontfix)

Integrate an automatic index backup mechanism

Reported by: damato Owned by:
Priority: normal Milestone: YAM 2.7
Component: MIME handling Version:
Severity: major Keywords:
Cc: OS Platform:
Blocked By: Blocking:
Release Notes:

Description (last modified by tboeckel)

It might be worth implementing some kind of automatic index backup mechanism. This mechanism should, for example, automatically rename an index file right before an index is regenerated, or even before every write operation on an index file (e.g. rename it to .index.bakX, where X is a counter).

Then, if an operation fails or an index rebuild is aborted by the user, the latest backup can be restored and the old index used from then on.
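The rename-and-restore idea could be sketched roughly as follows. This is only an illustration in Python, not YAM code: the function names `list_backups`, `backup_index` and `restore_latest_backup` are made up for this example, and the sketch assumes the index lives in a file called `.index` inside the folder directory.

```python
import os
import re

# Matches backup names of the form ".index.bakX" where X is a counter.
BACKUP_RE = re.compile(r"^\.index\.bak(\d+)$")

def list_backups(folder):
    """Return a sorted list of (counter, file name) for all backups in folder."""
    found = []
    for name in os.listdir(folder):
        match = BACKUP_RE.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)

def backup_index(folder):
    """Rename folder/.index to the next free .index.bakX.

    Returns the backup path, or None if there is no index to back up.
    """
    index_path = os.path.join(folder, ".index")
    if not os.path.exists(index_path):
        return None
    backups = list_backups(folder)
    next_counter = backups[-1][0] + 1 if backups else 1
    backup_path = os.path.join(folder, ".index.bak%d" % next_counter)
    os.rename(index_path, backup_path)
    return backup_path

def restore_latest_backup(folder):
    """Move the highest-numbered backup back to .index.

    Returns True if a backup was restored, False if none exists.
    """
    backups = list_backups(folder)
    if not backups:
        return False
    latest_name = backups[-1][1]
    # os.replace overwrites a half-written .index if one is present.
    os.replace(os.path.join(folder, latest_name),
               os.path.join(folder, ".index"))
    return True
```

A caller would run `backup_index(folder)` right before rebuilding or rewriting the index and `restore_latest_backup(folder)` if that operation fails or is aborted. The sketch says nothing about whether the restored index still matches the mail files on disk.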

However, this mechanism would probably require a new index version, as an index might have to contain a field storing the total number of mails and some kind of checksum. Hence the task is scheduled for 2.7+.

Attachments (0)

Change History (5)

comment:1 Changed 5 years ago by damato

  • Status changed from new to accepted

comment:2 Changed 4 years ago by tboeckel

  • Description modified (diff)

Having read this ticket a hundred times by now, I really wonder how it could be accomplished. The task itself is easy: rename the existing index file before creating a new one. If loading an index file is not possible, fall back to the backed-up one.

The big issue I see here is the consistency of the backed-up data. How can we make sure the backed-up index really still represents the current state?

Imagine the following: the current index is backed up because some mails were deleted from a folder. Before YAM is able to save the new index, the machine crashes or is switched off, but the deleted mails no longer exist. The next time YAM is started it will use the backed-up index, because the normal index doesn't exist. But the backed-up index still contains the formerly deleted mails, although the respective files don't exist anymore, and accessing these "phantom" mails will generate further errors.

In this situation only a complete rescan will yield the correct information about the mail files that still exist. And this is exactly what we already have right now. Or am I missing something here?
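To make the "phantom" mail problem concrete, here is a small Python sketch that compares the file names recorded in an index against what is actually on disk. The function name `find_phantom_mails` and the idea of passing the indexed names as a plain list are assumptions made for this illustration. They do not reflect YAM's real index format.

```python
import os

def find_phantom_mails(indexed_names, folder):
    """Compare indexed mail file names against the files in folder.

    Returns (phantoms, unindexed): names listed in the index but missing
    on disk, and files on disk that the index does not know about.
    """
    on_disk = set()
    with os.scandir(folder) as entries:
        for entry in entries:
            # Ignore the index file and its backups (assumed ".index*").
            if entry.is_file() and not entry.name.startswith(".index"):
                on_disk.add(entry.name)
    indexed = set(indexed_names)
    return sorted(indexed - on_disk), sorted(on_disk - indexed)
```

If either list comes back non-empty, the restored index is stale and a rescan is still needed, which is exactly the situation described above.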

One solution would be to make sure that the most recent backed-up index file really represents the current state. But that would require saving the index after each modification, and for folders with lots of mails this must be considered a very expensive approach in terms of CPU load and unresponsiveness of the application. Furthermore, this approach would make a backed-up index useless, as the saved index would always be current and there would be nothing left to back up.

Even the proposed checksum does not help very much from my point of view. Comparing the saved checksum against a checksum of the current state again requires scanning all mails to generate that checksum. So where is the advantage here? OK, the checksum could be a solution if it only takes into account information that can be obtained without deeply scanning each file, i.e. compute the CRC just from each file's name and size, which can be obtained quite fast with a simple directory scan.
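A cheap checksum of that kind might look like the following Python sketch. It uses CRC-32 from the standard `zlib` module over each file's name and size only. The names `folder_checksum` and `index_is_current` are hypothetical, and sorting the entries first is a choice made here so that the result does not depend on the order in which the directory scan returns files.

```python
import os
import zlib

def folder_checksum(folder):
    """Return (mail_count, crc) built from file names and sizes only.

    No mail file is opened; only directory metadata is read.
    """
    entries = []
    with os.scandir(folder) as scan:
        for entry in scan:
            # Skip the index file and its backups (assumed ".index*").
            if not entry.is_file() or entry.name.startswith(".index"):
                continue
            entries.append((entry.name, entry.stat().st_size))
    crc = 0
    for name, size in sorted(entries):
        record = "%s\0%d\n" % (name, size)
        crc = zlib.crc32(record.encode("utf-8"), crc)
    return len(entries), crc

def index_is_current(saved_count, saved_crc, folder):
    """True if the count and CRC saved with an index match the folder now."""
    return folder_checksum(folder) == (saved_count, saved_crc)
```

Because the mail status is part of the file name, any status change renames the file and therefore changes the CRC. The check can thus tell that an index is outdated, but it cannot bring an outdated index up to date.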

Ideas are welcome!

comment:3 Changed 4 years ago by tboeckel

I performed some tests in the meantime with a folder containing approx. 45000 mails. A complete directory scan including some CRC calculation takes about one second on my MicroA1 running OS4.1. I'd say this is a very good basis for further development, especially considering that most people's folders will contain far fewer mails than that.

In any case this approach is fast enough to check whether an index file is outdated or not.

comment:4 Changed 4 years ago by tboeckel

Although I was quite optimistic when I did my test, I must say I have come to the conclusion that an automatic index backup mechanism cannot work.

The reason is quite simple. When YAM "expires" a folder's index, it deletes the corresponding .index file. The backup mechanism would save this file so that it can be restored later in case the new .index file is found to be invalid, corrupted or missing. But here is the pitfall: when YAM has to expire an index, this happens because the information in the saved index file is no longer valid, because a mail file has been renamed (note: we store the mail status in the mail's file name). This also means that restoring a backed-up index file will not be successful either, because the old index file will contain the old mail file name, and hence inconsistent information would be restored, which is even worse.

Thus, from my point of view, the suggested backup mechanism cannot work. At least I don't know how to implement it in such a way that rescans are avoided.

Does anybody have a better idea?

comment:5 Changed 4 years ago by damato

  • Resolution set to wontfix
  • Status changed from accepted to closed
