MDD hosting

It seems that MDD Hosting has gone down. I got an email from them saying: "We are experiencing a major outage across all services at this time. We are aware of the issue and are working to restore services as quickly as possible.

We will provide more detail when we can, however, we are focused on restoring services and diverting all energy to those tasks presently."

teal.

More info on their server status page.

Hi Niko, it’s not looking too good at the moment, and I don’t like what Google puts up when you try to access my page. It looks bad.

teal.


Don’t worry about that, it’s just Google’s campaign against non-https sites. Your site just isn’t up yet, with https or without :frowning: I see DougW’s site is down too.

I liked the email they sent me today telling me what happened: “While I was hoping to save some of this for the official RFO [Reason For Outage] - enough people are getting tremendously upset over this that I’m going to spell out what I can now - keeping in mind that I will provide more details when I can.

What happened?

First and foremost - this failure is not something that we planned on or expected. A server administrator, the most experienced administrator we have, made a big mistake. During some routine maintenance where they were supposed to perform a file system trim they mistakenly performed a block discard.

What does this mean?

The server administrator essentially told our storage platform to drop all data rather than simply dropping data that had been marked as deleted by our servers.

Why is restoration taking so long?

Initially we believed that only the primary operating system partition of the servers was damaged - so we worked to bring new machines online to connect to our storage to bring accounts back online. Had our initial belief been correct - we’d have been back online in a few hours at most.

As it turns out our local data was corrupted beyond repair - to the point that we could not even mount the file systems to attempt data recovery.

Normally we would rely on snapshots in our storage platform - simply mounting a snapshot from prior to the incident and booting servers back up. It would have taken minutes - if maybe an hour. We are not sure as of yet, and will need to investigate, but snapshots were disabled. I wish I could tell you why - and I wish I knew why - but we don’t know yet and will have to look into it.

We are working to restore cPanel backups from our off-site backup server in Phoenix Arizona. While you would think the distance and connectivity was the issue - the real issue is the amount of I/O that backup server has available to it. While it is a robust server with 24 drives - it can only read so much data so fast. As these are high capacity spinning drives - they have limits on speed.

Our disaster recovery server is our last resort to restore client data and, as it stands, is the only copy we have remaining of all client data - except that which has already been restored which is back to being stored in triplicate.

What will you do to prevent this in the future?

We have, as we’ve been working on this and running into issues getting things back online quickly, been discussing what changes we need to make to ensure that this both doesn’t happen again as well as that we can restore quicker in the future should the need arise. I will go into more detail about this once we are back online.

We are sorry - we don’t want you to be offline any more than you do.

Personally I’m not going to be getting any sleep until every customer affected by this is back online. I wish I could snap my fingers and have everybody back online or that I could go into the past and make a couple of minor changes that would have prevented this. I do wish, now that this has happened, that there was a quick and easy solution.

I understand you’re upset / mad / angry / frustrated. Believe me - I am sitting here listening to each and every one of you about how upset you are - I know you’re upset and I am sorry. We’re human - and we make mistakes. In this case thankfully we do have a last resort disaster recovery that we can pull data from. There are many providers that, having faced this many failures - a perfect storm so to speak - would have simply lost your data entirely.

This is the first major outage we’ve had in over a decade and while this is definitely major - our servers are online and we are actively working as quickly as possible to get all accounts restored and back online. For clarity - the bottleneck here is not a staffing issue. We evaluated numerous options to speed up the process and unfortunately short of copying the data off to faster disks - which we did try - there’s nothing we can do to speed this up. The process of copying the data off to faster disks was going to take just as long, if not longer, than the restoration process is taking on its own.

Once everybody is back online - and there are accounts coming online every minute - we will be performing a complete post-mortem on this and will be writing a clear and transparent Reason For Outage [RFO] which we will be making available to all clients.

I hope that you understand that while this restoration process is ongoing there really isn’t much to report beyond, “Accounts are still being restored as quickly as possible.” I wish there was some interesting update I could provide you like, “Suddenly things have sped up 100x!” but that’s not the case.

I am personally doing my best to make sure clients that have opened tickets are updated as to when their accounts are in the active restoration queue. While we do have thousands of accounts to restore - our disaster recovery system actually transfers data substantially faster with fewer simultaneous transfers. While it sounds counter-intuitive - we’re actively watching the restoration processes and balancing the number of accounts being restored at once against the performance of the disaster recovery system to get as many people back online as quickly as possible.

Most sites are coming back online after restoration without issues, however, if once your account is restored you are still having issues - we are here to help. While we are quite overwhelmed by tickets like, “WHY IS THIS NOT UP YET!?!?!” “WHY ARE YOU DOWN SO LONG!?!??!!” “FIX THIS NOWWWW!” - we are still trying to wade through all of that to help those that have come back online and are having issues - as few and far between as it has been.

If you have any questions - we will definitely answer them - but please understand that while we’re restoring accounts we’re really trying to focus on the restoration of services as well as resolving issues for those that are already restored.

Again - I am sorry for the trouble this is causing you - we definitely don’t want you offline any more than you do and will have all services restored as quickly as we can.

Sincerely,

Michael Denney
MDDHosting LLC - Professional Web Hosting Solutions
https://www.mddhosting.com/ - Rate us @RateLobby!
Check out our blog and community forums!
Follow us on Twitter and Facebook!”

I think that someone is in big TROUBLE.

teal.

What a nightmare :roll: At least you know what’s going on.

They’ve also been posting updates (or that they don’t have any) on Twitter.

https://twitter.com/MDDHosting/with_replies

I don’t have a Twitter account, but I will look on Facebook.

teal.

I only have one because Twitter has somehow become a quasi-official notification service for the fire and emergency services :frowning: (So has Facebook, but hell will have to freeze over before I’ll get an account on there.)

Same reason I have a twitter account, Niko. :slight_smile:

MDD is also posting in their forum.
https://forums.mddhosting.com/topic/1582-major-outage-092118-09222018/

Looks like http://www.ardmoreweatherlive.com/ is back :slight_smile:

Yes it is. Was starting to wonder if the next update from MDD was going to be “all data lost”. Everything on my site appears to be there and working correctly. But, I am at work and haven’t had the chance to look more in depth.

Mine is still offline.

teal

Yeah, of the half dozen MDD wx sites that I know of, Doug’s is the only one back up. It’s a hopeful sign though :slight_smile:

Pingdom shows how big a problem MDD Hosting has…
http://stats.pingdom.com/hgs5b5n9hkb1
I’m on server s4.supportedns.com
http://stats.pingdom.com/hgs5b5n9hkb1/2015470

Gus
https://www.halethorpeweather.com

Not sure what is going on. If I click the shortcut for my site on my PC’s favorites bar, it takes me to this site (https://www.emiliya.com/). If I go to Google and put in wilney weather, it takes me to a default page (http://www.wilneyweather.co.uk/cgi-sys/defaultwebpage.cgi). God knows where emiliya.com came from. I think that MDD Hosting should say on the default page that it is their fault that the site is down. I have tried to log into the forum but it will not let me in, so I have tried to open a new account, but that’s not working either. I think that we should have next year free of charge :smiley: :smiley: :D, but I don’t think that will happen :frowning: :frowning: :frowning:.

teal.

When I try going to https://www.wilneyweather.co.uk/ and https://www.halethorpeweather.com in Microsoft Edge, I get a Certificate Error, and when I view the certificates for both sites, they show info for emiliya.com. It looks to me like something in the SSL setup is really, really messed up.
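
For anyone who wants to check this without a browser, here is a minimal Python sketch (not anything from MDD) that fetches the certificate a server presents and tests whether its DNS names cover the site you asked for. It assumes the certificate chains to a trusted CA; a self-signed one would raise a verification error instead. The function names are made up for illustration, and the wildcard matching is deliberately simplified compared with what a browser does.

```python
import socket
import ssl


def cert_dns_names(cert):
    """Return the DNS names from a certificate dict's subjectAltName field."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def name_matches(pattern, hostname):
    """Simplified match: exact name, or a single leading '*.' wildcard label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # The wildcard stands for exactly one label: *.example.com covers
        # www.example.com but not example.com or a.b.example.com.
        return "." in hostname and hostname.split(".", 1)[1] == pattern[2:]
    return pattern == hostname


def cert_covers_host(cert, hostname):
    """True if any DNS name in the certificate matches the hostname."""
    return any(name_matches(name, hostname) for name in cert_dns_names(cert))


def fetch_cert(hostname, port=443, timeout=10):
    """Fetch the server's certificate as a dict (needs network access)."""
    context = ssl.create_default_context()
    # Keep CA chain validation but skip the hostname check, so a certificate
    # issued for a different site can still be retrieved and inspected.
    context.check_hostname = False
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()
```

If the server is still handing out the emiliya.com certificate, `cert_covers_host(fetch_cert("www.wilneyweather.co.uk"), "www.wilneyweather.co.uk")` should come back `False`, and `cert_dns_names(...)` on the same certificate will list the names it was actually issued for.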

I will see if I can open a ticket about it, as I said I cannot get onto the forum.

teal.

Wow, I fully expected to get up this morning and find you all back online :frowning:

Teal: Do you mean you can’t read the forum, or can’t post?

There are some comments on the forum about SSL. Something like SSL can’t be fixed until the domain is back up, and if they do anything with it before then it makes getting the domain up more difficult.