Comay In-Drive-UPS Technology

Why In-Drive-UPS?


Power supplies for enterprise, industrial, embedded, aviation, and aerospace storage systems are designed to the highest reliability standards, and designers take care to select the most reliable regulation components. In reality, however, power to data storage drives does occasionally fail.

Because the SSDs in these systems hold mission‐critical data, it is unacceptable for previously stored data to be lost or for data in flight to nonvolatile storage to be corrupted.

It is vital that these SSDs are designed to survive power reductions and outages without risk to the data itself.

An effective power failure data protection mechanism needs to function before and after a disruptive power failure in order to provide comprehensive data protection.

During a clean shutdown, most host systems issue a command (the STANDBY IMMEDIATE command) to the SSD to give it time to prepare for the shutdown. This allows the SSD to save data currently in transit (in temporary buffers), LBA mapping information, and/or metadata to the non-volatile NAND media.

 



In the event of an unsafe power failure, the SSD abruptly loses power before the host system can issue the STANDBY IMMEDIATE command.

Power loss events range from momentary loss of regulation (transient brown‐out condition) to loss of all power for an extended period of time. Such events can be caused by failure of the facility’s supply grid or UPS unit, failure of the system’s power supply (including fusing and cabling), failure of the SSD’s voltage regulation components, or mechanical failure of PCBs or connectors due to vibration, heat, or impact. Power failure risk at the SSD level depends only partly on the power delivery redundancy measures in place. A power failure can cause system latency (when the drive needs to rebuild mapping tables) or permanent data loss.

 

Why Do SSDs Need In-Drive-UPS? (Data Loss in Power Failure Scenarios)



When programming a NAND flash page, the program operation must complete to ensure the data is stored reliably within the page. Data is at risk if flash memory cells are in the process of being programmed when power to the drive is lost. The risk is compounded for MLC NAND flash memory, which uses the same physical page of memory cells to store two logical pages of data. When power is lost during program operation of the upper page, valid data already stored in the lower page cells can be damaged. This is typically referred to as lower‐page data corruption.
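
The shared-cell risk described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not a description of any real controller: the names `MlcPhysicalPage`, `UnrecoverableData`, and the `power_fails` flag are hypothetical, and the model treats corruption as certain, whereas in a real device an interrupted program operation only puts the lower page at risk.

```python
class UnrecoverableData(Exception):
    """Raised when a logical page can no longer be read back."""


class MlcPhysicalPage:
    """Toy model: one physical MLC page storing two logical pages."""

    def __init__(self):
        self.lower = None           # logical page programmed first
        self.upper = None           # logical page programmed second
        self.lower_damaged = False  # set when an upper-page program is interrupted

    def program_lower(self, data):
        self.lower = data

    def program_upper(self, data, power_fails=False):
        if self.lower is None:
            raise ValueError("lower page must be programmed first")
        if power_fails:
            # Assumption: the interrupted program leaves the shared cells
            # in an indeterminate state, so both logical pages are lost.
            self.upper = None
            self.lower_damaged = True
            return False
        self.upper = data
        return True

    def read_lower(self):
        if self.lower_damaged:
            raise UnrecoverableData("lower page damaged by interrupted upper-page program")
        return self.lower


page = MlcPhysicalPage()
page.program_lower("previously stored data")
page.program_upper("new data", power_fails=True)
try:
    page.read_lower()
except UnrecoverableData as err:
    print("read failed:", err)
```

The point of the sketch is that the data lost is not only the page being written: data that was stored successfully earlier becomes unreadable as well.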

Solid state drives have three areas of potential data loss or corruption when system power fails:

  • Loss of transit/temporary buffer data: This can occur due to the implementation of write caching (also called “write back” or “write behind”) to achieve peak performance. In this case, the host system is informed that a write operation has completed when in fact it is still in process. If power fails while the controller is “catching up” with the write operation, the data in the write buffer is not yet hardened and can be lost. When the data is requested later by the host, the controller can either report the data as irrecoverable or (depending on the controller design) deliver a previous “stale” version of those sectors to the host. The latter case amounts to silent data corruption, since the host system is not informed that the data delivered is incorrect.
  • Loss of mapping information/metadata: Every SSD controller uses mapping information to translate the host’s logical block addresses (LBAs) to physical flash memory locations. Mapping information must be created and maintained if the data is to be retrieved from the drive later, and it must be updated whenever new data is written to a previously written LBA. If the mapping information is lost when power fails, the drive may show data corruption, deliver stale (corrupted) data, or be unable to support logical I/O on the next power-up.
  • Lower page corruption: MLC or E‐MLC NAND flash uses each physical page to store the data of two logical pages; each memory cell represents two bits. The lower page (the logical page addressed by the lower of the two addresses) is programmed first, followed by the upper page. When programming the upper page, programming voltages are applied to the same cells already storing valid data in the lower page. If power fails while the upper page is being programmed, data in that page is lost, and already‐stored data in the lower page is corrupted as well. When that lower page data is requested later by the host, the SSD will report the data irrecoverable.
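
The first two failure modes in the list above can be sketched with a small simulation. It is a toy model under stated assumptions rather than any vendor's firmware design: `ToySSD` and all of its methods are hypothetical names, the drive always acknowledges writes early (write-back), and on power-up it can only use the last mapping checkpoint saved to NAND, with no table rebuild.

```python
class ToySSD:
    """Toy drive: volatile write buffer and live map, persistent NAND."""

    def __init__(self):
        self.nand_pages = []    # persistent: append-only physical pages
        self.saved_map = {}     # persistent: last map checkpoint (LBA -> page index)
        self.ram_map = {}       # volatile: live logical-to-physical map
        self.write_buffer = {}  # volatile: acknowledged but not yet programmed

    def write(self, lba, data):
        self.write_buffer[lba] = data
        return "OK"  # host is told the write completed before it is hardened

    def program_buffer(self):
        """Program buffered data to NAND and update the live map."""
        for lba, data in self.write_buffer.items():
            self.nand_pages.append(data)
            self.ram_map[lba] = len(self.nand_pages) - 1
        self.write_buffer.clear()

    def checkpoint_map(self):
        self.saved_map = dict(self.ram_map)

    def clean_shutdown(self):
        """What a drive can do when it is given time to prepare."""
        self.program_buffer()
        self.checkpoint_map()

    def power_loss(self):
        """Volatile state vanishes; only the saved map survives."""
        self.write_buffer.clear()
        self.ram_map = dict(self.saved_map)

    def read(self, lba):
        index = self.ram_map.get(lba)
        return None if index is None else self.nand_pages[index]


# Case 1: buffered data lost, host silently receives the stale version.
ssd = ToySSD()
ssd.write(7, "v1")
ssd.clean_shutdown()
ssd.write(7, "v2")          # acknowledged, still in the buffer
ssd.power_loss()
print("buffer loss ->", ssd.read(7))

# Case 2: data reached NAND, but the updated map did not.
ssd = ToySSD()
ssd.write(7, "v1")
ssd.clean_shutdown()
ssd.write(7, "v2")
ssd.program_buffer()        # "v2" is physically stored
ssd.power_loss()            # map update was never checkpointed
print("map loss    ->", ssd.read(7), "| v2 in NAND:", "v2" in ssd.nand_pages)
```

In both cases the read returns the older version without any error, which is the silent corruption described above. When `clean_shutdown()` runs before power is removed, the same read returns the newest data.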

 

 

 

What Technology Do Comay SSDs Use?


  • 1. Avoid external DRAM cache
  • An external DRAM cache can improve performance and endurance. However, it also carries a severe risk of data loss when power is lost.
  • CoreRise avoids an external DRAM cache to eliminate this risk and ensure the safety of data and system.
  • 2. High capacity supercapacitor
  • For enterprise, industrial, embedded, and certain other SSDs, performance is important, but data safety is critical.
  • Comay SSDs use a supercapacitor (Supercap) solution for some of these applications.

A supercapacitor stores thousands of times more energy than an ordinary capacitor. It can power the SSD for 300 to 1000 ms while the drive flushes data to NAND.

    3. Bank of discrete capacitors

    This approach requires more design expertise, but overcomes the limitations of supercapacitors. A discrete capacitor‐based voltage hold‐up circuit employs a bank of discrete capacitors connected in parallel, as shown in the figure on the right. This is the approach employed by other kinds of SSDs.
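
How long a capacitor-based hold-up circuit can keep a drive running follows from basic capacitor physics: the usable energy between the starting voltage and the minimum voltage the regulator accepts is E = ½·C·(V_start² − V_min²), and capacitors connected in parallel add their capacitances. The sketch below applies these two facts. Every number in it (capacitance, voltages, load power, converter efficiency) is a hypothetical value chosen for illustration and is not a Comay specification; the function names are likewise made up for this example.

```python
import math


def usable_energy_j(cap_f, v_start, v_min):
    """Energy released while the capacitor discharges from v_start to v_min."""
    return 0.5 * cap_f * (v_start ** 2 - v_min ** 2)


def hold_up_time_s(cap_f, v_start, v_min, load_w, efficiency=1.0):
    """Seconds a constant-power load can run from the stored energy."""
    return usable_energy_j(cap_f, v_start, v_min) * efficiency / load_w


def required_capacitance_f(hold_s, v_start, v_min, load_w, efficiency=1.0):
    """Capacitance needed to sustain load_w for hold_s seconds."""
    return 2.0 * load_w * hold_s / (efficiency * (v_start ** 2 - v_min ** 2))


def parallel_bank_f(caps_f):
    """Capacitors in parallel: total capacitance is the sum."""
    return sum(caps_f)


def capacitors_needed(required_f, each_f):
    """Smallest count of identical parallel capacitors reaching required_f."""
    return math.ceil(required_f / each_f)


# Hypothetical supercapacitor: 0.33 F, 5.0 V down to 3.0 V, 4 W load.
t = hold_up_time_s(cap_f=0.33, v_start=5.0, v_min=3.0, load_w=4.0, efficiency=0.9)
print(f"supercap hold-up: {t * 1000:.0f} ms")

# Hypothetical discrete bank: 20 ms target, 12 V down to 6 V, 3 W load.
need = required_capacitance_f(hold_s=0.020, v_start=12.0, v_min=6.0,
                              load_w=3.0, efficiency=0.9)
n = capacitors_needed(need, each_f=220e-6)
bank = parallel_bank_f([220e-6] * n)
print(f"discrete bank: {n} x 220 uF = {bank * 1e6:.0f} uF")
```

With these assumed values the supercapacitor case works out to roughly 594 ms, which happens to fall inside the 300 to 1000 ms range quoted above. The discrete-bank case shows how the same formula, run in reverse, gives the number of parallel capacitors for a chosen hold-up target.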