Update on Dec. 10, 2011
In addition to our existing locations, we have signed a contract with a new SAS 70 certified data center, a modern hosting facility in San Jose with 40% of the world’s internet traffic passing through the building. This is an impressive figure, which in itself speaks volumes about the level of facility security. We’re smoothly migrating most of our capacity to this new location. As before, your data is continuously backed up, but now in the best data center available.
Season’s greetings and warmest wishes for the success of all your endeavors in the coming year!
The past couple of days have been very painful for our team. As some of you may have noticed, there were intermittent network connectivity issues. On behalf of our entire team, we sincerely apologize for the inconvenience and work interruptions this may have caused.
We continuously put our time and our hearts into improving our product, customer service and reliability, so those minutes when the network had problems felt like someone was slowly pulling out our nails. We strive to maintain 99.99% uptime, and we strive to make sure that the remaining 0.01% falls on weekend nights with proper announcements. This was a “Black Swan” event for us - the first in five years.
Those of you who follow our news stream on Facebook could see real-time updates on the status of the outage, but we think it's really important to give a detailed report here as well.
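To put the 99.99% target in perspective, here is a small illustrative calculation of how much downtime that figure leaves room for. The function name and the 365-day and 30-day periods are our own choices for the example, not part of any formal service agreement.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted in a period at a given uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# Hypothetical periods chosen for illustration: a 365-day year and a 30-day month.
yearly = allowed_downtime_minutes(99.99, 365)
monthly = allowed_downtime_minutes(99.99, 30)

print(f"{yearly:.1f} minutes per year")
print(f"{monthly:.1f} minutes per 30-day month")
```

In other words, the remaining 0.01% amounts to a bit under an hour over a whole year.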
- Our data center's internet connectivity was intermittently unavailable for some clients for short periods of time. During these connectivity problems, our own servers and network were up and running, and our infrastructure was absolutely healthy, but some customers couldn't access it.
- The source of the issues was a massive DDoS attack on another company, which happened to be another customer of our data center. If you're not familiar with DDoS, you can read about the havoc it wreaks here. The data center's staff did their best to cope with the attack, but it wasn't enough. When the data center shut down the target, the attacks stopped. The data center then tightened its defenses, brought that unlucky client back up, and the story repeated, which is why there were multiple incidents. Today, along with other measures, the data center completely shut down that client so that the attacks should not recur. This should solve the problem. We are migrating off of that data center in any case (see below), so this is just to explain what happened.
- On top of the measures taken by the data center, our own operations team has been working on other solutions to the problem around the clock. We can't fix the internet provider, but we can move from one to another. The problem with migration is potential downtime. Imagine physically moving an office: it's very hard to move all of the computers while the employees are working, without interrupting the work for even a minute. We can't simply start a parallel instance of the database either, because then the data on the different instances might get out of sync. We obviously did not want to introduce any downtime from our side, so our operations team is now working around the clock on a new, more sophisticated infrastructure that will allow future migrations without interruption. We are also doing a feasibility study for a very complicated replication scheme in which there will always be an isolated and, at the same time, up-to-date read-only instance of Wrike located in a secondary data center. That way, if the primary data center is inaccessible (which hopefully will never happen again) due to, say, an electrical blackout of the whole coast, the secondary data center will still be accessible.
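For the technically curious, the read-only secondary idea above can be sketched roughly as follows. This is only a minimal illustration of the routing decision, not our actual infrastructure: the `Instance` class, the `choose_instance` function and the reachability checks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instance:
    """A hypothetical service instance in one data center."""
    name: str
    read_only: bool
    is_reachable: Callable[[], bool]  # assumed health check, supplied by the caller


def choose_instance(primary: Instance, secondary: Instance,
                    is_write: bool) -> Optional[Instance]:
    """Pick the instance that should serve a request, or None if nothing can."""
    if primary.is_reachable():
        # Normal operation: the primary handles both reads and writes.
        return primary
    if not is_write and secondary.is_reachable():
        # Primary is unreachable: reads can still be served by the secondary.
        return secondary
    # Writes are refused while the primary is down, so the copies cannot diverge.
    return None


primary = Instance("primary", read_only=False, is_reachable=lambda: False)
secondary = Instance("secondary", read_only=True, is_reachable=lambda: True)

read_target = choose_instance(primary, secondary, is_write=False)
write_target = choose_instance(primary, secondary, is_write=True)
print(read_target.name if read_target else "unavailable")
print(write_target.name if write_target else "unavailable")
```

Refusing writes on the secondary is what avoids the out-of-sync problem mentioned above: your data stays viewable during an outage, and changes resume once the primary is reachable again.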
We absolutely understand that such performance disruptions are unacceptable. This is the first downtime of this kind for Wrike in five years. Google, Microsoft, Salesforce and Amazon have all had to deal with similar problems. We are taking it very seriously and making every possible effort to make sure that the migration to a new data center is smooth and unnoticeable to you. Reliability of the service and your trust are key priorities for us.
If you experience any difficulties with Wrike, please contact email@example.com. If it's a broader issue, we post updates in real time on our Facebook page. Please accept our sincere apologies for the inconvenience. Many thanks for your understanding and continued patience during this situation.