From: Artem Bityutskiy Date: Thu, 20 Oct 2011 13:53:10 +0000 (+0300) Subject: UBIFS: describe the unstable bit issue X-Git-Url: https://www.infradead.org/git/?a=commitdiff_plain;h=85a778192c77c540271a47e1d593f3cc9722e033;p=mtd-www.git UBIFS: describe the unstable bit issue Signed-off-by: Artem Bityutskiy --- diff --git a/doc/ubifs.xml b/doc/ubifs.xml index 05351ea..9fcfe44 100644 --- a/doc/ubifs.xml +++ b/doc/ubifs.xml @@ -16,6 +16,7 @@
  • Overview
  • Power-cuts tolerance
  • UBIFS and MLC NAND flash
  • +
  • The unstable bits issue
  • Source code
  • Mailing list
  • User-space tools
  • @@ -298,10 +299,136 @@ some specific aspects of MLC NAND flashes:

    emulation, then use the integck test for testing. After all the issues are fixed, a real power-cut tests could be carried out.

    + +
  • [NEED WORK] The "unstable bits issue", which is not + MLC-specific, described + here.
  • +

    The unstable bits issue

    + +

    In the MTD community the "unstable bits" term is used to describe data +instabilities caused by power cuts while writing or erasing. The unstable bits +issue is still not resolved in UBI and UBIFS, and it has been reported several times +on the MTD mailing list. In theory, this issue should be visible on any flash, +but for some reason, back when we developed UBI/UBIFS and +extensively tested them on a robust SLC NAND, we did not observe it. No one +has reported this issue for NOR flash yet. However, on modern SLC and MLC +flashes this problem is reproducible.

    + +

    The unstable bits are the result of a power cut during the program or erase +operation. Depending on when the power cut happens, they can corrupt the +data or the free space. Consider the following 4 situations:

    + +
      +
    1. The power cut happens just before the NAND page program operation + finishes. After the reboot the page may be read correctly and without + a single bit-flip, say, 2 times, and the 3rd time you may get an ECC + error. This happens because the page contains a number of unstable bits + which are sometimes read correctly and sometimes not.
    2. The power cut happens just after the NAND page program operation + starts. After the reboot the page may be read correctly (return all + 0xFFs) most of the time, but sometimes you may get some bits set to + zero. Moreover, if you then program this page, it also may be sometimes + read correctly, but sometimes return an ECC error. The reason is again the + unstable bits in the NAND page.
    3. The power cut happens just before the eraseblock erase operation + finishes. After the reboot the eraseblock may contain unstable bits and + the data in this eraseblock may suddenly become corrupted.
    4. The power cut happens just after the eraseblock erase operation + starts. After the reboot the eraseblock may contain unstable bits and + sometimes return zero bits on read, or corrupted data if you program + it.
    + +
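
    The read behaviour described for type 1 power cuts can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class UnstablePage, the helper read_outcomes, the flip probability and the ECC strength are all hypothetical and are not part of UBI, UBIFS or MTD. Each unstable bit is assumed to read wrongly with a fixed probability, and the read fails once more bits are wrong than the ECC can correct.

```python
import random


class UnstablePage:
    """Toy model of a NAND page left with unstable bits by a power cut."""

    def __init__(self, data, unstable_bits, flip_prob=0.5, ecc_strength=1):
        self.data = bytes(data)
        self.unstable_bits = list(unstable_bits)  # bit offsets within the page
        self.flip_prob = flip_prob                # chance an unstable bit reads wrong
        self.ecc_strength = ecc_strength          # max bit errors ECC can correct

    def read(self, rng):
        """Return the page data, or raise IOError if ECC cannot correct it."""
        flipped = sum(1 for _ in self.unstable_bits
                      if rng.random() < self.flip_prob)
        if flipped > self.ecc_strength:
            raise IOError("uncorrectable ECC error")
        return self.data  # ECC corrected whatever flipped on this read


def read_outcomes(page, attempts, seed=0):
    """Read the same page repeatedly and record which attempts succeed."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(attempts):
        try:
            page.read(rng)
            outcomes.append(True)
        except IOError:
            outcomes.append(False)
    return outcomes
```

    With a few unstable bits, repeated reads of the same page give a mix of successes and ECC errors, which is why a single clean read during recovery proves nothing about later reads.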

    Here is an example scenario of how UBIFS may fail. UBIFS writes data node A to +the journal LEB, and a power cut of type 1 happens. After the reboot, UBIFS +recovery code reads that LEB, no bit-flips are reported by MTD, all the CRCs +match, everything looks fine. UBIFS just assumes that this LEB is all right and +the free space at the end of this LEB can be used for writing more data. UBIFS +performs the commit operations, writes more user data, and everything works +fine until the user reads node A by reading the corresponding file: an ECC +error happens and the user gets the EIO error.

    + +

    The user may also get EIO instead of the data +if a type 2 power cut happens, UBIFS re-uses the corrupted free space for +writing new nodes, and these nodes are then read.

    + +

    The solution is to teach UBIFS to erase-cycle any LEB which could potentially +be written to when the power cut happened. This is not only about the +journal LEBs, but also LPT, log, master and orphan LEBs. This means that the +valid data from this LEB has to be read (and only once!) and then +written back to this LEB using the +atomic LEB change UBI operation. +This has to be done even if the LEB looks all right - no corruptions, all 0xFFs +at the end.

    + +

    Similarly, UBI has to erase-cycle every eraseblock which could potentially be +erased when the power cut happened.
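
    The erase-cycling idea can be sketched with a toy LEB-to-PEB mapping. This is a minimal illustration, not the real UBI interface: the class ToyUBI, its atomic_change method and the erase_cycle helper are hypothetical names, and the "atomic LEB change" is assumed here to mean writing the data to a fresh PEB, switching the mapping, and only then erasing the old PEB.

```python
class ToyUBI:
    """Toy LEB-to-PEB mapping; not the real UBI interface."""

    def __init__(self, peb_count):
        self.pebs = {n: b"" for n in range(peb_count)}  # PEB contents
        self.free = list(range(peb_count))              # erased, unmapped PEBs
        self.mapping = {}                               # LEB number -> PEB number
        self.reads = 0                                  # number of media reads

    def write(self, lnum, data):
        pnum = self.free.pop(0)
        self.pebs[pnum] = bytes(data)
        self.mapping[lnum] = pnum

    def read(self, lnum):
        self.reads += 1
        return self.pebs[self.mapping[lnum]]

    def atomic_change(self, lnum, data):
        """Write data to a fresh PEB, switch the mapping, erase the old PEB."""
        old = self.mapping[lnum]
        new = self.free.pop(0)
        self.pebs[new] = bytes(data)
        self.mapping[lnum] = new   # the LEB now points to the clean copy
        self.pebs[old] = b""       # erase the possibly unstable PEB
        self.free.append(old)


def erase_cycle(ubi, lnum):
    """Read the valid data exactly once and rewrite it into a fresh PEB."""
    data = ubi.read(lnum)
    ubi.atomic_change(lnum, data)
    return data
```

    The point of the sketch is the ordering: the suspicious LEB is read once, and the data obtained from that single read is what ends up in the freshly programmed PEB.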

    + +

    The other requirement is that during the recovery UBI/UBIFS should read data +from the media only once. This is easy to demonstrate on the delayed recovery +example. The delayed recovery happens when after a power cut the file-system is +mounted R/O, in which case UBIFS must not write anything to the flash, and the +real recovery is delayed until the FS is re-mounted R/W. Currently UBIFS just +scans the journal during mounting R/O, drops (or "remembers") corrupted nodes, +and "does not let" users read them. But there is no guarantee that UBIFS +spots all the corrupted nodes during the first scanning, so users may get +EIO while reading data from the R/O-mounted FS.

    + +

    When UBIFS is then remounted R/W, it actually drops the corrupted nodes from +the flash media by erase-cycling the corresponding LEBs. To do this, UBIFS re-reads +all the LEB data, and there is no guarantee that it will get the same +corruptions again.

    + +

    So it is important to make sure that the corrupted LEBs are read only once. +E.g., we can cache the results of the first scanning, and then use that data +when running the delayed recovery, instead of re-reading the data. Probably we +may remember only the last NAND page containing valid nodes, not the whole LEB, +since for the journal only unstable bits of type 1 and 2 are relevant.
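
    The caching idea can be sketched as follows. This is a hedged illustration only: FirstScanCache and the read_leb callback are hypothetical names, and the sketch caches whole LEBs rather than only the last NAND page containing valid nodes.

```python
class FirstScanCache:
    """Remember what the first scan read, so recovery never re-reads the media."""

    def __init__(self, read_leb):
        self._read_leb = read_leb  # function that reads one LEB from the media
        self._cache = {}           # LEB number -> data seen by the first scan

    def read(self, lnum):
        # Touch the media only on the first request for this LEB.
        if lnum not in self._cache:
            self._cache[lnum] = self._read_leb(lnum)
        return self._cache[lnum]
```

    In this model the R/O mount would populate the cache during scanning, and the delayed recovery at R/W remount would call read() again and get exactly the data the first scan saw, even if the media would now return something different.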

    + +

    There are similar double-read issues in UBI scanning - when it finds 2 PEBs +belonging to the same LEB and it has to find out which one is newer. The volume +table has to be erase-cycled as well in UBI.

    + +

    There are more issues related to unstable bits of type 2 and 3 in UBI, I +think. This all needs a very careful look, and this is not trivial to fix +because of the complexity: UBIFS, like any file-system, has many interfaces and a +lot of states. The best strategy to attack this problem would be:

    + +
      +
    1. Improve the existing power cut emulation infrastructure in UBIFS + and start emulating unstable bits. Start with emulating only one type + of unstable bits, e.g., type 1.
    2. Use the integck test to stress the file-system with + power cut emulation enabled - the test can re-start when an emulated + power cut happens. This will allow you to very quickly emulate hundreds + of power cuts in interesting places. Fix all the bugs. Make sure it is + rock solid. Of course, if you have various independent issues, you may + temporarily hack the power cut emulation code to emulate unstable bits + only at certain places, to limit the number of problems you + have to deal with simultaneously.
    3. Start emulating other types of unstable bits, and fix all the + issues one-by-one.
    4. Go down to UBI and add a similar power cut emulation + infrastructure. But emulate unstable bits only in UBI-specific on-flash + data structures - the EC/VID headers and the volume table. Improve the + integck test to support that infrastructure and fix all the + issues.
    5. Run real power cut tests on real hardware.
    + + +

    Source code

    UBIFS is in mainline since 17 July 2008 and the first kernel release which