Write caching

MattBoB · Tue Jun 16, 2009 5:41 pm

Does the current version of StarWind do write caching when the RAM caching function is turned on? I.e., would there be a risk of data loss in the event of a UPS/power failure?

If so, is it recommended to turn on RAM caching for all workloads?

Robert (staff) · Wed Jun 17, 2009 12:46 pm

This will be implemented in later releases of StarWind (mid-August according to our roadmap, although the actual implementation and release may take longer). We are also planning to implement better communication between StarWind and uninterruptible power supplies (UPS) to prevent any possible data loss.

Aitor_Ibarra · Wed Jun 24, 2009 12:03 pm

I'd like to see how you handle write caching in combination with HA.

Basically, I would want a way to force both synchronous replication and cache mirroring. The overall objective is that if an initiator issues a write command, it's not notified that the write is committed until both StarWind boxes have it in their write cache. Write caching is useless if it increases the risk of data corruption, which would be possible if it's combined with mirroring in the wrong way!

I guess it should work something like this:

The initiator issues a write to the iSCSI target via its primary path as defined by MPIO.

StarWind gets the write data and sticks it in RAM. It communicates with the other StarWind server, which then also has the data in cache.

When both servers agree that they both have the same data in cache, the write is acknowledged to the initiator.

Independently of each other, the StarWind servers write the cached data to disk. Once both servers have written the data successfully, the cached data is marked as safe to delete from RAM, but isn't deleted until either a new write to the same area of disk comes in, or the space is needed for other writes. That way, if a read request comes in, it can be served from cache without having to go to disk.
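The write path described in the steps above can be sketched as follows. This is a minimal, single-process illustration, not StarWind's implementation: it assumes block-granular writes addressed by LBA, simulates both servers and their disks in memory, and reduces the network round trip to the partner to a direct method call. `Node`, `MirroredPair`, `has_room`, `cache_write` and `flush` are hypothetical names.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    data: bytes
    flushed: bool = False  # True once THIS node has written the block to its own disk


class Node:
    """One storage server: a RAM write-back cache in front of a simulated disk."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity                 # max cached blocks (hypothetical unit)
        self.cache: OrderedDict = OrderedDict()  # lba -> CacheEntry, oldest first
        self.disk: dict = {}                     # stands in for the real backing store
        self.partner: Optional["Node"] = None

    def _safe_to_drop(self, lba: int) -> bool:
        # A block may leave RAM only after BOTH servers have it on disk.
        mine = self.cache[lba]
        theirs = self.partner.cache.get(lba) if self.partner else None
        return mine.flushed and (theirs is None or theirs.flushed)

    def has_room(self, lba: int) -> bool:
        if lba in self.cache or len(self.cache) < self.capacity:
            return True
        return any(self._safe_to_drop(k) for k in self.cache)

    def cache_write(self, lba: int, data: bytes) -> None:
        # Caller must check has_room() first. A new write to an already
        # cached area replaces the old entry; otherwise evict the oldest safe block.
        if lba not in self.cache and len(self.cache) >= self.capacity:
            victim = next(k for k in self.cache if self._safe_to_drop(k))
            del self.cache[victim]
        self.cache[lba] = CacheEntry(data)
        self.cache.move_to_end(lba)

    def flush(self) -> None:
        # Each server destages its own cache independently of the other.
        for lba, entry in self.cache.items():
            if not entry.flushed:
                self.disk[lba] = entry.data
                entry.flushed = True

    def read(self, lba: int) -> Optional[bytes]:
        # Flushed blocks stay in RAM, so reads can skip the disk.
        entry = self.cache.get(lba)
        return entry.data if entry else self.disk.get(lba)


class MirroredPair:
    """Synchronous cache mirroring: ack only when both caches hold the data."""

    def __init__(self, primary: Node, secondary: Node) -> None:
        self.primary, self.secondary = primary, secondary
        primary.partner, secondary.partner = secondary, primary

    def write(self, lba: int, data: bytes) -> bool:
        # Refuse up front (no ack) if either cache is full of not-yet-safe
        # data, so the two caches never diverge.
        if not (self.primary.has_room(lba) and self.secondary.has_room(lba)):
            return False
        self.primary.cache_write(lba, data)
        self.secondary.cache_write(lba, data)  # in reality: a network round trip
        return True  # only now is the write acknowledged to the initiator
```

Refusing the write when either cache is full of data that is not yet on both disks is just one possible policy for the sketch; a real target would more likely stall the initiator or fall back to write-through instead of failing the command.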

As for UPS, that and other conditions (e.g. a shutdown command) should automatically disable caching on the affected node, which should concentrate on writing the contents of its cache to disk. Affected iSCSI targets on the other node should also have caching turned off, and replication disabled, until the failing node has shut down completely. There should be a way of limiting the size of the cache so that you can make sure that the UPS will give you enough run-time to write the cache to disk. It might be a good idea for the still-running node to take a snapshot when it gets notification that the other node is going to shut down, so that the changes can be resynced more quickly when the node comes back up again.
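The cache-size limit suggested above comes down to simple arithmetic: the amount of dirty (not yet flushed) data must be destageable within the UPS run-time that is left after everything else the shutdown needs. A minimal sketch, assuming a constant sustained flush rate; the function name, the reserved shutdown time and the safety factor are hypothetical parameters, not StarWind settings.

```python
def max_dirty_cache_bytes(ups_runtime_s: float,
                          flush_rate_bytes_per_s: float,
                          shutdown_overhead_s: float,
                          safety_factor: float = 0.5) -> int:
    """Largest amount of unflushed cache that can be written to disk on battery.

    Assumes the disks sustain flush_rate_bytes_per_s for the whole flush and
    that shutdown_overhead_s of run-time is reserved for the rest of the
    shutdown. safety_factor < 1 leaves headroom for aged batteries and for
    random I/O being slower than the assumed rate.
    """
    usable_s = max(0.0, ups_runtime_s - shutdown_overhead_s) * safety_factor
    return int(usable_s * flush_rate_bytes_per_s)
```

With purely illustrative numbers (600 s of run-time, 120 s reserved for shutdown, a 100 MB/s flush rate and the default factor of 0.5), 240 seconds are usable, which caps dirty data at 24 GB; a battery too small to cover the shutdown overhead yields a limit of zero, i.e. write-through.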

cheers,

Aitor

Robert (staff) · Thu Jun 25, 2009 12:08 pm

Aitor,

Thank you for your input - we really appreciate it.

Right now HA feature is being actively developed and evaluated. There will be a beta available soon.

As for caching in HA, speaking optimistically, this would be really great functionality. Implementing caching in HA without the RAM-to-RAM communication hurting performance can be a challenge for us; however, if we make this happen, we will be invincible.

Thanks

Aitor_Ibarra · Thu Jun 25, 2009 12:49 pm

At least you can be encouraged by the fact that it's been done before. On Windows. Maybe not for an iSCSI target but...

Many moons ago I had the misfortune to have to buy an EMC CX300 SAN for my company (long story, won't go into that). This was a Fibre Channel drive array with dual controllers, each with 2GB of mirrored cache. The unit was totally redundant: dual-ported drives, dual controllers, dual power supplies, and dual UPSes for the cache. I was very surprised to see Windows XP OEM stickers on the controller modules. Basically, EMC had used a custom PC as a controller and built their own software on top of XP. I don't know what they used as a communication channel between the two controllers to keep the caches in sync, but I wouldn't be surprised if it was Ethernet, or possibly another Fibre Channel interface. The SAN performed well, but EMC's pricing and licensing policies were/are ridiculous.

Anyway, it's definitely possible to do, reliably, on top of Windows. And it will be a huge win on top of HA. Modern server motherboards come with SAS / SAS2 interfaces built in and can take 96GB+ of RAM. It would be impossible to get that much write cache into a storage server using RAID cards (which currently max out at 4GB AFAIK). I don't mind having to buy twice the number of servers and disks to achieve no-single-point-of-failure HA, especially when the mark-up that the likes of EMC put on drives ("the spinny things" as one EMC rep referred to the stuff which kept him in BMWs) is way more than 100%.

Robert (staff) · Fri Jun 26, 2009 11:25 am

With 96 GB of RAM one will be able to create RAM drive devices that will be extremely fast. This is, however, not a very cost-effective solution, but anyway...

Thanks