Breaking Down the “VSAN Nightmare” Situation
A Reddit post recently went viral in the storage and virtualization community. It documented a painful outage in which a VMware VSAN implementation ran amok after a rebuild operation was triggered.
A FUD-less Approach
The post itself was well laid out and clearly showed the steps the system's administrator followed to add a VSAN storage node to an existing cluster. An hour after the change, things went sideways, and hours of outages and performance degradation ensued.
The post was written simply to highlight a potential issue. Of course, it is also potential fodder for VSAN opponents, who could easily have used it to create negative buzz around the product. I applaud the author for writing an honest article when the challenges they faced could easily have prompted a negative one.
Read the original Reddit post “My VSAN Nightmare”, followed by the Reddit post “Root cause analysis” here, and then a very well written post by Marcel van den Berg on the difference between being on the hardware compatibility list (HCL) and being tuned for use as per recommendations. Nice work, Marcel!