From 9f15d480aff5ae3365bab9c801a9e530f951f068 Mon Sep 17 00:00:00 2001
From: "Matthew L. Creech"
Date: Wed, 27 Jul 2011 00:32:56 -0400
Subject: [PATCH] UBIFS FAQ: clarify NAND "disturb" errors

Signed-off-by: Matthew L. Creech
Signed-off-by: Artem Bityutskiy
---
 faq/ubifs.xml | 78 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/faq/ubifs.xml b/faq/ubifs.xml
index 82942bb..b6b0b14 100644
--- a/faq/ubifs.xml
+++ b/faq/ubifs.xml
@@ -68,8 +68,10 @@ some specific aspects of MLC NAND flashes:

  • MLC NAND flashes are more "faulty" than SLC, so they use stronger - ECC codes which occupy whole OOB area; this is not a problem - for UBI/UBIFS, because neither UBIFS nor UBI use OOB area;
  • + ECC codes; these ECC codes often occupy whole OOB area (as do the + ECC codes on some newer SLC flashes, which are more error-prone than + previous generations of flash); this is not a problem for UBI/UBIFS, + because neither UBIFS nor UBI use OOB area;
  • when the data are written to an eraseblock, they have to be written sequentially, from the beginning of the eraseblock to the end of it; @@ -81,22 +83,54 @@ some specific aspects of MLC NAND flashes:

    deterministic wear-leveling algorithm (see this section);
  • -
  • MLC flashes have so called "read-disturb" property, which means - that NAND page read operation may introduce a permanent bit change; the - ECC code would fix it, but more read operations may introduce more bit - changes and soft ECC errors may turn into hard ECC errors; well, even - SLC NAND flashes have this property, but the probability of bit changes - is much lower in SLC NAND; however, this should not be a problem - because UBI is doing scrubbing; in other words, once UBI notices that - there is a correctable bit-flip in an eraseblock, it moves the contents - of this physical eraseblock to a different physical eraseblocks, and - re-maps corresponding logical eraseblocks to the new physical - eraseblock; so UBI refreshes the data and gets rid of bit-flips, thus - improving data integrity.
  • +
  • MLC flashes exhibit bit flips as a result of "program disturb" and + "read disturb" errors (see + here). + Note that SLC flashes have these same errors, but they are much more + common on MLC: +
      +
    • NAND flashes have a so-called "read-disturb" property, which + means that a NAND page read operation may introduce a permanent + bit change; the ECC code would fix it, but more read operations + may introduce more bit changes and soft ECC errors may turn + into hard ECC errors; however, when these errors occur on the + same page that is being read, this should not be a problem + because UBI is doing scrubbing; in other words, once UBI notices + that there is a correctable bit-flip in an eraseblock, it moves + the contents of this physical eraseblock to a different physical + eraseblock, and re-maps the corresponding logical eraseblock to + the new physical eraseblock; so UBI refreshes the data and gets + rid of bit-flips, thus improving data integrity.
    • + +
    • "Read-disturb" errors can also occur on a page other + than the one being read, but which is within the same + eraseblock. This is not a problem if the read operations are + spread around somewhat evenly within the eraseblock, since the + bit-flip will soon be detected and corrected through the + "scrubbing" process described above. However, if a particular + page within a block is rarely read, scrubbing will not have a + chance to fix errors, and they may accumulate over time until + they are unfixable. This is very similar to the next problem:
    • + +
    • NAND flashes also have a "program-disturb" property, + which means that if you program a NAND page, you may introduce + a bit-flip in a different NAND page. The bit change can be + fixed by ECC, but with time the changes may accumulate + and become unfixable. Current UBI bit-flip handling only + partially helps here, because it is passive, which means that + UBI notices bit-flips only when performing users' read requests. + So if you never read the NAND page which accumulates bit-flips, + UBI will never notice this. One solution to these problems is + to implement a kind of "flash crawler" which would read all of + the NAND pages in the background from time to time, making UBI + notice and fix bit-flips. However, this is not implemented + today. +
    • +
-

However, there are 2 other aspects which may need closer attention. The -first one is the "paired pages" problem (e.g., see +

There is another aspect of MLC flashes which may need closer attention: the +"paired pages" problem (e.g., see this Power Point presentation). Namely, MLC NAND pages are coupled in a sense that if you cut power while writing to a page, you corrupt not only this page, @@ -107,18 +141,6 @@ distances). So if you write data to, say, page 3 and cut the power, you may end up with corrupted data in page 0. UBIFS is not ready to handle this problem at the moment and this needs some work.

-

The second aspect is the "program-disturb" MLC NAND property (see -here), -which means that if you program an MLC NAND page, you may introduce a bit-change -in a different NAND page. Well, the bit change will be fixed by ECC, but with time -the changes may accumulate and become unfixable. Current UBI bit-flip handling -only partially helps here, because it is passive, which means that UBI notices -bit-flips only when performing users read requests, so if you never read the -MLC NAND area which accumulates bit-flips, UBI will never notice this. However, -it is not difficult to implement a kind of "flash crawler" which would read the -flash in background from time to time and make UBI notice and fix -bit-flips.

-

Nevertheless, UBIFS authors never worked with real raw MLC NAND flash, so we might have missed or misinterpreted some MLC NAND aspects. Any feed-back is appreciated.

-- 2.49.0
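
The "flash crawler" idea mentioned in the patch text (periodically reading everything so that UBI gets a chance to notice correctable bit-flips) could be approximated from user space along the following lines. This is only a minimal sketch, not something the patch or UBI provides: the function name `crawl_device`, its parameters, and the example device path are hypothetical, and the sketch assumes that sequential reads of the chosen device node actually reach the flash through UBI. Reads that are served from a cache would not touch the flash at all, and flash areas that are not reachable through the device being read would not be covered.

```python
import time


def crawl_device(path, chunk_size=128 * 1024, pause=0.0):
    """Read `path` sequentially from start to end and return the byte count.

    The data is discarded. The only purpose is to make the lower layers
    perform the reads, so that correctable bit-flips have a chance to be
    noticed. Whether that actually happens depends on the device behind
    `path` (an assumption of this sketch).
    """
    total = 0
    # buffering=0 disables Python-level buffering, so each read() call
    # is passed straight to the operating system.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:  # empty result means end of device/file
                break
            total += len(chunk)
            if pause:
                # Optional throttle so foreground I/O is not starved.
                time.sleep(pause)
    return total
```

A periodic job could call, for example, `crawl_device("/dev/ubi0_0", pause=0.01)`, where the path is a placeholder for whichever UBI volume device exists on the target system.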