Yesterday we received notice through a security newsletter that a new OpenVZ
kernel was available that fixed a severe security hole that could potentially
impact our clients and their data. We immediately updated to the latest kernel
on our development nodes and ran tests to confirm that the update was compatible
with Wyvern and that, if any issues arose, we could resolve them with minimal
downtime on our production nodes. Testing passed without any issues or concerns,
so we updated the kernel on all of our production nodes and rebooted them around
4PM EDT yesterday. All of the nodes rebooted fine except our Tampa OpenVZ node
(fl1ovz01), which experienced problems prior to the kernel update.
All nodes (excluding fl1ovz01) were back online within 10 minutes of the
reboot, with only a small number of VPSs requiring manual intervention to come
back online. We then noticed that quite a few iptables modules that some of our
clients use were not enabled, so we needed to restart the OpenVZ and iptables
services to get them working. This resulted in an outage of less than 1 minute
once all of the VPSs were started.
Unfortunately, fl1ovz01 required multiple reboots, plus a manual restart of
each VPS on the node, to resolve some outstanding issues once the kernel was
updated, so some VPSs on fl1ovz01 were unavailable until around 6:30PM EDT (not
all VPSs were offline until 6:30PM). The main problem we experienced on fl1ovz01
was with the checkpoint system that OpenVZ uses: instead of shutting off a VPS
when the node is rebooted, OpenVZ suspends it, which allows the VPS to come back
online faster and with little interruption to the services running on it.
Because this checkpoint system was the source of the problem, we needed to
manually stop and start each VPS to ensure a clean reboot of the node and
prevent data corruption of the VPSs on that node.
At this time, all of your VPSs should be online and all of the nodes are
stable on the latest OpenVZ kernel. We ask that you please log in to your VPSs
and to Wyvern to ensure everything is functioning properly. We have found that
some VPSs are showing as suspended/disabled when they are in fact running, so
please check the status of your VPS in Wyvern and open a ticket with our Support
department if you see anything out of the ordinary.
We would also like to point out that, in addition to the announcement that
was posted while the nodes were being updated, we also posted updates on the
situation, including the fl1ovz01 issues, on our Twitter. In the future, please
be sure to check our Twitter for any communications, as it's the fastest way
for us to post updates and it allows us to converse with you there if needed.
This incident has us researching the ploop storage method that OpenVZ has
been pushing towards recently, which would have prevented the critical exploit
from impacting our clients and would also have prevented the checkpoint system
problem that occurred on fl1ovz01 (along with many other improvements to
performance, functionality, and security, and more features on the client side).
We have already converted our development nodes to ploop, and the results have
been extremely positive. For example, a simple dd test went from around 260MB/s
to 418MB/s without any other changes to the node or VPS. We also like the idea
of being able to add a snapshot feature to Wyvern, since it has been requested
in the past.
Thanks for your understanding in this matter; the security of your data is a top priority for us.
-The Secure Dragon Staff-
Secure Dragon LLC.
www.SecureDragon.net