Fixed

RESOLVED [2019/4/1 0500h - 2019/4/4-1330h PT] ActionTiles Slow Performance & Daily Partial Outages

Terry (ActionTiles) (Co-Founder) 6 years ago in Announcements / Outages updated 6 years ago 55

RESOLVED

Google Cloud Support finally acknowledged that this problem was not caused by ActionTiles's code or database usage - after they decided to test moving us to a different server cluster on Thursday 4/04 afternoon.

We are now escalating a post-mortem review with them to determine why it took 4 days for them to execute this remediation, and why they do not have monitoring processes in place to have automatically detected the severe recurring issues with our allocated server cluster.


NB: We suspect that some Customers may have experienced occasional severely slow performance starting as early as mid-March. There were a few mentions in this vein on social media, but no replicable cases or traceable indicators. It is always a relief to find and resolve the likely "root cause" of such mysteries.

We apologize to all affected Customers and Trial users for the recurring inconvenience over the past few days.
We are proud of our reliability and stability record: Maintaining that will always be our highest priority.

If you experience any further difficulties, please do not hesitate to contact Support@ActionTiles.com

Sincerely,

...Terry & Alex.




Incident history below:

Recurring April 3rd & 4th...

Currently investigating numerous reports that this appears to be recurring - likely around the same time every morning.


We are 100% focused on resolving this.


Resolved 2019-04-02 1100h

While the root cause has not been confirmed, normal resource availability was restored by 1100h this morning.


Recurred April 2nd

Currently investigating numerous reports that this appeared to recur.





We received several reports that ActionTiles is responding very slowly or not at all - including possibly not listing any Panels or other objects like Media, Themes, etc.

  • This may have resulted in the app not loading at all, black screens, empty Panels, or empty Location lists.
  • FYI: We may also post updates on https://status.ActionTiles.com and Twitter.


Please "Follow" or visit this Topic Post for any further updates; or change your personal settings to subscribe to this Forum's various Categories, such as the Announcements Category: https://support.actiontiles.com/knowledge-bases/8/articles/1025-how-do-i-reduce-my-forum-notification-email-messages

We apologize for the temporary inconvenience, 

...Terry & Alex.

Answers


PINNED

Update (April 4th 1500h PDT):

We have been working with Google Support to resolve our database latency and outages. They have moved our database to a different server and we are observing up to a 70% improvement in database performance.

While we are working to figure out the root cause of the issue, we are hoping that the database performance has stabilized for the time being.

We are working very hard to resolve this issue. We appreciate your patience and understanding.

Thanks,

Alex

PINNED

Update (April 4th 0600h PDT):

The issue has been confirmed to be recurring daily, each morning, for as long as approximately 0600h-1200h Pacific Time.


We have not identified the root cause, but hope to avoid or minimize recurrences by continuing intensive research into the issue.

Though this only affects a subset of Customers, we are 100% focused on resolution. We apologize for the inconvenience and will keep this outage Topic updated with any significant information.

Fixed: Monitoring

At the time of this message, the database is slowly coming back to life. You might be able to view your panels.

We will update this page when we have more information.

Mine came back just as you posted that. Thanks!

I’m being affected by this.  Tiles absent.  Locations missing.  Can’t log in to app.actiontiles.com


mine is also back up and running on all devices


mine just came back.

Still affected intermittently, came back for a short while but then began to slow down again, now unable to load any panels or log in 

I cannot access AT on tablet or phone. When I initially accessed AT it was running in Fully browser, but panels were very slow to load and content was blank. Reloaded Fully and now I can't access anything in design mode.

I'm still experiencing this issue. 

I have just tried to use action tiles.  The tiles do not appear.  There is nothing in “My Panels”.

Please help

I also am experiencing the issues again. 

Investigating

Update April 2nd

Currently investigating numerous reports that this appears to be recurring today.

I have no panels also cannot sign in

Any news

At this moment, the panels are slowly coming back (in my account). We don't yet know if the database connection is stable enough.

My panel is not loading either, and having issues signing in at times. I'm a brand new user (just signed up Sunday night) so it's unfortunate to see issues already before I even finished my panel. Fingers crossed it's fixed soon and I can resume my 14 day trial! :)

Certainly unlucky timing, as we've had only 10 outages in over 2 years, and most are very short.

Please do email us at Support@ActionTiles.com to renew your Trial once this is resolved. Thanks!

I've been using action tiles for 2 years now and these are the first issues I've had, bear with it, they're dedicated!


Great to hear all the support in the community. Looking forward to trying it again once restored!

Thanks, Mitch!

We know that it isn't super comforting for us to emphasize our very fortunate high-reliability statistics in the middle of a frustrating outage; but it also helps us focus on the immediate issue and not get stressed out that some component is breaking-down long term.


Also, one of my many panels for your inspiration whilst the issues are rectified! :) 

Fixed: Monitoring

Update (likely as of 2019-04-02 1100h PDT):

There are continuing and increasing indications that service is restored, but may still be slower than usual.

Please try refresh/reload in your browser. You may wish to try rebooting your tablet(s), but that is probably not required.

Still not able to load my panels...

EDIT: now it works.

EDIT: it works slowly..

My panels were working fine when these initial reports started and this post was made.  As of 10am cst, they are no longer even loading. 

Problem happening again.  Blank action tile

problem went away.  But this is the 3rd day in a row where it’s had issues.  I’m guessing it happens between 6am - 11am pdt.  I wonder if the same thing will happen tomorrow morning.

Investigating

Third morning in a row (at the same time range each day!) is very likely not a coincidence. Now that this is definitely a pattern, we can emphasize this to Google Cloud Support and it should help them identify the cause.

As predicted the tiles have vanished this morning for the fourth consecutive day...in the morning pacific time.

Looks like it's having problems again today. Main panels list is empty or extremely slow to load for me, and direct panel links are showing slow/no-load behavior as well.

All of my panels are missing. Is there something I need to do on my end to get them to load?

unfortunately agree with previous commenter seems to be same time it occurs 

Yep, last few days it's up and down, right now down again.

Always in the morning Pacific Time ... Correct? 


Well I'm MST time in PHX but yes seems the mornings are worse and then towards afternoon and evening it seems to be working fine.

Update (as of 2019-04-03 1200h PDT):

There are continuing and increasing indications that service is restored, but may still be slower than usual.


We have not identified the root cause, but hope to avoid or minimize recurrences by continuing intensive research into the issue.

If you currently (i.e., on Wednesday afternoon Pacific Time) are unable to load, then please refresh/reload in your browser. You may wish to try rebooting your tablet(s), but that is probably not required. We do not know the exact number of users affected even during the "peak" of the Outage. It is likely the number of users affected just tapers off until performance is completely normal.

I'm still having occasional issues loading panels (especially since I'm still creating/editing them). I've also noticed an issue loading all of my settings whenever I went to my panel builder this morning. My panels, themes, etc. did not show up until I cleared my cache and reloaded the browser and signed back in. As I continue to edit the panel (on my laptop), sometimes it won't refresh fully on my tablet, where it will "reload" the panel page but won't finish loading all the icons/tiles.

Not a deal breaker for me, but thought I would share there appears to still be some sort of issue going on as you mentioned above.

It's April the 4th and it's down again.

I am seeing this again today, probably since about 2pm GMT

Down and up and down again for me.

So mine has disappeared from the app and the web. I have no locations, no panels, nothing. My panel was apparently cached on the tablet so as soon as I rebooted it, it was gone just like in the web version. I cleared cache in the app, rebooted the tablet again, and logged back in, still gone completely. I have nothing. Should I just give up and create a new panel or is this going to get resolved at some point?


Seeing issues on the 4th from before 16:30 BST (8:30 Pacific time). This time I'm seeing completely black screens even after twenty minutes. Previous times I'd see tiles after several minutes. Might just be a different state of my browser though.

Ah just had panel appear at 16:50 BST but only the few static icons, nothing dynamic.

Update:

What I mean by 'static' was that icons for things like URL and panel shortcuts that aren't dependent on status were displaying. The clock was working and some fixed text like 'Mode' and 'Routines' displayed.

 The panels were certainly behaving normally by about 18:00 BST (10:00 Pacific time). 

mine is down. Unable to log in on chrome on phone or device after a reboot

Edit: logged in now but no panels

My ActionTiles is down today.  Only getting a blank screen instead of my tiles.  


I'm in the UK and it's been hit and miss for about three days now (possibly longer). At first it wouldn't log in, then log in was possible but the tiles were empty, now the tiles are fleshed out but unresponsive. This is on all devices, tablets, phone, PC.

Thanks for the UK perspective, Munter.

From the very little cloud diagnostic information we have so far, we cannot determine how widespread this is. Indications are that "everyone" is likely experiencing at least some degree of slowdown. No data is lost ... it just takes multiple seconds or ... many, many, minutes for data to be returned.

We haven't made any changes to the app, nor had a sudden growth spike, so there is nothing for us to fix on our side unless a diagnostic points to a missing index or bad query. There are many clues, however, that this is an issue that is entirely the responsibility of Google Cloud - e.g., a faulty server, or some new or malignant daily process on their end.


Hi, Terry, FYI, I have returned home from work (17:30 BST) and the system seems to be working fully for now.

...back out on my end again.  East coast, 2pm EST.  Like others, looks like it's in and out, in and out.

Thanks for letting us know, Shaefer.

This seems to be following a rather predictable pattern of relatively the same duration each day; but customer reports which help confirm the start and end times are helpful to ensure we don't make assumptions.


Following thread from day 1, Also located in the UK, 4:30-8:30pm GMT/BST. 


Seems to start with panel response slow down, progresses to complete unresponsiveness, black screens on reload, blank locations and panels on log in (if able to log in at all). Then intermittent panel visibility with very poor response time, until it gradually picks up speed and availability towards the end of the "episode".

If google is to blame, it's like a bad April fools that continues to occur. I'm sure you'll identify, and rectify the issue. You have a very dedicated customer base, myself included, who'll assist in any way possible. 

(new forum user, but have been an action tiles lover for a good 18 months) 


Thank-you Mitch.

Indeed, while we have had short random performance glitches from time to time, this serious issue only occurred for the first time on April 1st, and Google acknowledged it as a major issue affecting several of their customers - during the exact same morning hours.

Subsequent days, even though the symptoms (for us) are identical to April 1st, they have not announced it as a public outage. Too similar to be a coincidence? Or we are the Fools of April 🤡.

So for me being in AZ, seems about 6am things start to act up, slow screen, half loading or not loading. This continues up to about 1230pm - 1pm AZ time and then it's fine the rest of the day and night till pretty much the same times the next day. Then it repeats, been like this since Monday.

That corresponds with the logging information that we have available. The statistics we get from the Google Cloud are delayed, so we can't confirm when utilization returns to normal until 30 to 90 minutes after the fact. This would be super easy to resolve if it happened during any scheduled daily maintenance (backups, etc.) that we perform, or if we were informed of external maintenance - but there's nothing that runs in the morning hours.
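Since reports in this Topic arrive in several time zones (BST, Arizona time, US Eastern), a small conversion sketch may help line them up against the Pacific window. This is only an illustration: the helper name `to_local` is hypothetical, the window of roughly 0600h to 1200h Pacific is assumed from the updates in this Topic, and the code is not part of ActionTiles or any Google Cloud tooling.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")

# Hypothetical helper for illustration only.
def to_local(day: str, hhmm: str, tz_name: str) -> str:
    """Convert a Pacific wall-clock time on `day` (YYYY-MM-DD) to HH:MM in tz_name."""
    pacific_time = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M").replace(tzinfo=PACIFIC)
    return pacific_time.astimezone(ZoneInfo(tz_name)).strftime("%H:%M")

# Assumed approximate daily window, in Pacific Time.
WINDOW = ("06:00", "12:00")

for label, tz_name in [
    ("UK", "Europe/London"),
    ("Arizona", "America/Phoenix"),
    ("US East", "America/New_York"),
]:
    start, end = (to_local("2019-04-04", t, tz_name) for t in WINDOW)
    print(f"{label}: {start}-{end}")
```

Under these assumptions, the window on April 4th works out to 14:00-20:00 in the UK, 06:00-12:00 in Arizona (which does not observe daylight saving and so matches Pacific Daylight Time), and 09:00-15:00 on the US East coast.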


To my knowledge, today is the first day I was impacted as I saw no issues the previous days.  Of course I found out today by trying to demo for some friends at work...  

Good luck on finding the fix!


I haven’t noticed any slow downs today. Will have a good test of it tomorrow during the times it’s previously gone down 

