Development blog

More updates for Blob backups

I wanted to put some of this into the last update but decided it was best to improve things in smaller stages and run more extensive testing before the release.

Page blobs copied as ranges

When the system encounters a page blob to back up from Azure to Azure, it will now copy only the page ranges with data in them. For large files that are only sparsely populated (such as .vhd files) this should result in a considerable improvement in performance.
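
As a rough illustration of the idea (not the service's actual code), the sketch below copies only the populated ranges of a sparse file. Plain byte buffers stand in for page blobs, and `populated_ranges` and `copy_populated_ranges` are hypothetical names. A real implementation would presumably ask the storage service which page ranges hold data rather than scanning the bytes itself.

```python
PAGE_SIZE = 512  # Azure page blobs are addressed in 512-byte pages


def populated_ranges(data: bytes, page_size: int = PAGE_SIZE):
    """Return (start, end) byte offsets of runs of pages containing non-zero data."""
    ranges = []
    run_start = None
    for offset in range(0, len(data), page_size):
        page = data[offset:offset + page_size]
        if any(page):
            if run_start is None:
                run_start = offset  # a new populated run begins here
        elif run_start is not None:
            ranges.append((run_start, offset))  # the run ended at this empty page
            run_start = None
    if run_start is not None:
        ranges.append((run_start, len(data)))
    return ranges


def copy_populated_ranges(source: bytes, page_size: int = PAGE_SIZE):
    """Copy only populated ranges into a zero-filled destination of the same size."""
    destination = bytearray(len(source))
    copied = 0
    for start, end in populated_ranges(source, page_size):
        destination[start:end] = source[start:end]
        copied += end - start
    return bytes(destination), copied
```

For a mostly empty file the number of bytes copied is far smaller than the file size, which is where the saving comes from.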

SQLite instead of Azure tables for temp file list

As followers of the service may be aware, I used an Azure Table per container in a sync job to store the file list. This means that a large file list doesn’t risk running the system out of memory during a sync. I’ve now moved to a SQLite database on local storage as I’ve been having performance problems with Azure Tables in my use case - bumping into the limit of 100 rows per insert was the primary problem.

In testing this replacement has been working well even though local storage on Azure is very slow for IOPS performance - something to be aware of when you start using local storage on Azure.
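
A minimal sketch of the approach, assuming a simple two-column schema (the table name, column names and `build_file_list` helper are all made up for illustration and are not the service's real schema). It writes the file list to SQLite in batches so the whole list never has to be held in memory.

```python
import sqlite3


def build_file_list(db_path, files, batch_size=1000):
    """Store (name, size) pairs in SQLite in batches.

    `files` can be any iterable, including a generator, so the full list is
    never materialised in memory.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, size INTEGER)"
    )
    batch = []
    for entry in files:
        batch.append(entry)
        if len(batch) >= batch_size:
            conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?)", batch)
            conn.commit()
            batch.clear()
    if batch:  # flush whatever is left over
        conn.executemany("INSERT OR REPLACE INTO files VALUES (?, ?)", batch)
        conn.commit()
    return conn
```

Unlike the 100-row batch limit mentioned above, the batch size here is just a tuning knob.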

FQDN no longer needed when entering SQL Azure Server

This was an oversight on my part, probably introduced when I did the redesign a few months ago. The system should recognise just the SQL Azure Server name without having to append the “.database.windows.net”. It’s a simple fix, and thanks to a user for raising it with me yesterday as I’d simply forgotten about it.
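
The fix amounts to something like the following sketch. The function name is hypothetical and the real code may well differ; it just shows accepting either form of the server name.

```python
SQL_AZURE_SUFFIX = ".database.windows.net"


def normalise_server_name(name: str) -> str:
    """Return the fully qualified server name, appending the suffix only if missing."""
    name = name.strip()
    if name.lower().endswith(SQL_AZURE_SUFFIX):
        return name
    return name + SQL_AZURE_SUFFIX
```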

Error - temporary copy left behind

It seems that the system left a temporary database copy behind on a customer’s SQL Azure Server. This seems to have been due to a timing issue when starting the copy. I’ve attempted to fix the issue but this is the first time I’ve seen the error here since starting the service.

Just bear in mind that should a backup fail it’s worth checking the logs in case the temporary database drop has failed.

Really, keep reporting the problems you’re having. I want this to work as well for you as I can.

Wed, 08 May 2013 10:26:36 +0100

Speed and Usability

Another week and I’ve managed to get some “real” work done on the system. This time mainly concentrated around the speed of transfers from Azure to Amazon and the discoverability of features on the Schedules & History page.

Speed from Azure to Amazon

Initially this started off as an investigation into why backing up databases where the .bacpac was greater than 5Gb failed. It transformed into me beating some speed out of the Amazon SDK. It seems the defaults were “please transfer this file as slowly as you can” so I’ve added a lot of complicated threading and stream splitting which I think has speeded up transfers to Amazon by a considerable amount (tests changed from 4 hours to about 40 minutes). This is only apparent on larger files though.
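
The general shape of the technique, splitting a file into parts and uploading several at once, can be sketched as below. This is a deliberately simplified illustration and not the code in the service: `upload_part` is a hypothetical callback standing in for whatever the Amazon SDK call would be, and the data is held in memory rather than streamed.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(total_size, part_size):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_in_parallel(data, part_size, upload_part, max_workers=4):
    """Upload each part on a worker thread and return the results in part order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_part, number, data[offset:offset + length])
            for number, offset, length in split_into_parts(len(data), part_size)
        ]
        # result() re-raises any exception from a worker, so a failure is not silent
        return [future.result() for future in futures]
```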

The reason behind the 5Gb wasn’t to do with the uploading (I discovered after several days) but rather setting metadata on the target file. So I’ve had to refactor my internal API so that I set the metadata on upload which seems to be working just fine.

The risk of this is that I may have introduced a memory leak, but I have a lot of unit and integration tests and live system monitoring, so I hope that if there is an issue I can pounce on it instantly.

Listing of files

Another area of concern is simply listing the files in the source and destination containers prior to sync. I’ve hopefully made this at least twice as fast by listing both source and destination at once and also doing a lot more asynchronous calls to try and reduce time spent blocking.
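
A small sketch of why listing both sides at once helps, using Python's asyncio. `list_container` is a made-up stand-in that sleeps to simulate network latency; with both calls in flight together the total wait is roughly that of the slower listing rather than the sum of the two.

```python
import asyncio


async def list_container(files, delay=0.01):
    """Hypothetical stand-in for a remote listing call."""
    await asyncio.sleep(delay)  # simulate waiting on the network
    return sorted(files)


async def list_both(source_files, destination_files):
    """List source and destination concurrently instead of one after the other."""
    return await asyncio.gather(
        list_container(source_files),
        list_container(destination_files),
    )
```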

I still think there is more work needed here as I use Azure tables to store the file list and that now seems to be slowing me down.

Auto snapshot of .vhd files

Before attempting to copy from a .VHD file the system will take a snapshot of the file at that point in time and sync that file. Once that is complete the snapshot is removed. This should assist customers trying to backup active virtual disks.

More work in this area is needed so I only copy parts of the page file that have been used.

History page improvements

As mentioned before I’ve been taking support requests for people finding it hard to work out how to edit schedules. In an effort to make this more discoverable I’ve modified the history page slightly. Hopefully this should make things more intuitive for new users.

Bulk pricing - you need to opt-in

Just a reminder in order to benefit from bulk pricing you need to opt-in.

This can be done by going to Manage Subscription and clicking on Update. You can confirm the new payment by clicking on Payment Details.

That’s all folks

That’s all for now. Again, keep pressuring me for more features you want; I love hearing from all of you.

Mon, 29 Apr 2013 14:09:03 +0100

Bulk Pricing (finally)

This has been the top requested feature for a long time now and I’m finally happy I’ve got a solution to it that should be fair to all around.

If you use fewer than 6 subscriptions there is no change in price; it’s still simply $10, €8 or £7 per month.

If you use more than 5 subscriptions then there are some serious discounts available. It works out as

First 5 subscriptions             $10.00

Next 5 subscriptions (6-10)       $6.00 - 40% discount

Next 5 subscriptions (11-15)      $4.00 - 60% discount

All subsequent subscriptions      $2.00 - 80% discount
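
To make the arithmetic concrete, here is a sketch of the calculation in US dollars. It assumes each price applies only to the subscriptions within its own band (so the first 5 always cost $10 each), which is my reading of the table rather than a statement of how the billing code works. `TIERS` and `monthly_cost` are names made up for the example.

```python
# (number of subscriptions in the band, price per subscription); None = unlimited
TIERS = [(5, 10.00), (5, 6.00), (5, 4.00), (None, 2.00)]


def monthly_cost(subscriptions: int) -> float:
    """Total monthly price in dollars, charging each band at its own rate."""
    remaining = subscriptions
    total = 0.0
    for size, price in TIERS:
        if remaining <= 0:
            break
        count = remaining if size is None else min(remaining, size)
        total += count * price
        remaining -= count
    return total
```

Under that assumption 10 subscriptions would cost $80 a month (an average of $8 each) and 20 would cost $110 (an average of $5.50 each).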

There is a rather cool calculator on the pricing page which I’m very proud of. Have a play with the spinnies and you’ll see the unit price drop the more subscriptions you purchase.

Invoices easily available

Also, I get a lot of requests for people wanting a way to get access to their invoices. So now when you go to the manage subscription page you should see a section which contains quick links to all of your invoices.

What next?

Well firstly I’m going to let the new pricing settle down and I’m probably going to do some more work on the Schedules & History page to improve discoverability of the features of the page as it’s not performing as well as I’d hope.

Apart from that please keep the suggestions coming.

Wed, 17 Apr 2013 08:03:13 +0100

Sync to single container, table backup retention and more

It’s that time again, it’s update time and there are a couple of little treats for you people this time around.

Sync to single container

It’s been requested by a few users so now you can easily sync from multiple source containers into one target container (or bucket). This means it’s really easy to ensure your Azure storage is available via Amazon and all you have to do is flip a URL should the worst happen and you can carry on with files copied during your last sync operation.

Table backup file retention

Just like you’re used to for SQL Azure Backup files you can now control the retention policy for your table backups. This has proved slightly more difficult as I had to take into account the regex support in table backup.

As you can see from the screenshot I also use this to ensure my performance logs don’t get too out of hand.

Username at the top of the page

Seems a simple thing but if you have more than one account (as I do) it can be difficult to work out which one is currently logged in. No longer, as I’ve put the username at the top of the page. This went missing during the redesign but now it’s back.

Amazon S3 EU/Other region should work

Amazon have a lovely API so that if you’re targeting S3 storage that isn’t in the US you have to use very different urls. This was spotted by users and after some rather horrible coding I’ve made it work.

If you’re interested in how to make it work in a not too terrible way drop me a line and I’ll share the code.

Slow sync operations

I’ve made some changes to try to increase the copy speed during sync operations. I’ll keep an eye on this but may have to do more work in this area. My belief is that I may have been maxing out the bandwidth for my scheduler worker role so I’ve upped the instance size and put more performance counters to track it.

That’s about it

There are a few minor changes, wording here and there, that sort of thing. For now I think it’s good.

The next thing on my list is to give an alternative pricing model for people with lots of small databases, if you’ve got opinions please drop me a line at: richard.mitchell@red-gate.com

As always keep your opinions flooding in. I make this software to take away your pain, so tell me your pain.

Wed, 13 Mar 2013 09:03:44 +0000

Restore Azure Table

Well a couple of weeks after the introduction of Backup Azure Table I released the first version of a Restore Azure Table.

This can read a .json.gz file as created by Red Gate Cloud Services and creates a table with the data contained. You can also choose to upsert the data into an existing table. This is done via the InsertOrMerge functionality so any rows that have been modified before the restore happens may have modified properties overwritten by those in the backup.

File format changes

I’ve also had to make a few changes to the format of the backup file, as before there wasn’t enough type information stored to completely match the source information. For example, if a string looked like a Guid it would get restored as a guid type, and if a value such as 42 was stored as a long it would get restored as an int. The new format preserves type information for all properties, but if you spot anything please let me know.
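
To show what preserving type information means in practice, here is a sketch that stores an explicit type tag next to every value. The tag names and the `encode_property` / `decode_property` helpers are invented for illustration; the real backup file layout is not described here and may look quite different.

```python
import json
import uuid


def encode_property(type_name, value):
    """Wrap a value with an explicit type tag so nothing has to be guessed on restore."""
    return {"type": type_name, "value": str(value)}


def decode_property(encoded):
    """Return (type_name, value), converting the value according to the stored tag."""
    type_name, raw = encoded["type"], encoded["value"]
    if type_name == "Guid":
        return type_name, uuid.UUID(raw)
    if type_name in ("Int32", "Int64"):
        return type_name, int(raw)
    return type_name, raw  # strings stay strings, even if they look like a Guid
```

With the tag stored, a string that merely looks like a Guid comes back as a string, and 42 stored as an Int64 comes back tagged as an Int64.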

Backup files taken previously should restore as best they can with the above provisos.

Cancel jobs - finally

It has taken a while but a user prompted me about cancelling jobs again. I’ve tried to do this in the past and the code has ended up so nasty I’ve had to revert. This time however everything clicked and within a couple of hours we (finally) have a way of cancelling running jobs.

When you mark a job for cancellation the running job will check intermittently for that flag and finish off the job then and there. This can take a few minutes so please have patience, as I didn’t want to create extra overhead by having jobs constantly checking. In fact they check for cancellation only when writing extra information to the progress log.
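
The pattern of piggy-backing the cancellation check on progress logging can be sketched like this. The class and function names are hypothetical and the real job code is certainly more involved.

```python
class JobCancelled(Exception):
    """Raised inside a job when a cancellation request is noticed."""


class Job:
    def __init__(self, is_cancel_requested):
        # is_cancel_requested: callable returning True once the user has asked to cancel
        self._is_cancel_requested = is_cancel_requested
        self.log = []

    def report_progress(self, message):
        """Write to the progress log and, only at this point, check the cancel flag."""
        self.log.append(message)
        if self._is_cancel_requested():
            raise JobCancelled(message)


def run(job, steps):
    """Run each step in turn; return False if the job was cancelled part-way through."""
    try:
        for number, step in enumerate(steps, start=1):
            step()
            job.report_progress(f"step {number} done")
    except JobCancelled:
        job.log.append("cancelled")
        return False
    return True
```

Because the flag is only read when progress is written, a long-running step delays the cancellation, which matches the "this can take a few minutes" caveat above.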

Speed on history page

I’ve been told about performance problems retrieving the data on the history page. I’ve attempted to make some improvements to this but I’m not sure I’ve got to the bottom of it yet. You will have to refresh your history page if you have it loading in the background as the data format returned from the server has changed.

Nasty 30Mb timeout

A couple of users were having problems with sync jobs when files were around 30Mb. This was due to a timeout within Azure that I’ve increased (with the help of the Microsoft support team tracking it down). Above this size the file is split up and the timeout wasn’t hit, smaller than this and the file would copy within the timeout.

Job timeout

Due to larger and larger jobs happening in the system I may have to introduce a 24-hour maximum duration for jobs. This will mainly affect very large sync operations, some of which currently take several days to complete. This is just a heads up that I’ll be doing this.

Wed, 13 Feb 2013 08:41:00 +0000

Largest update to date - Azure table backup & Metro

Yesterday was a big day. I put live some work that has been going on in the background for the last couple of months.

NoSQL Azure table backup

Due to a rush of demand I’ve put live the first version of a tool to back up Azure table data. This will export an entire table to a .json.gz file; it can also remove old rows during the backup process. The data format is simple enough to be read and understood, although due to limitations of JSON the format is not type preserving - for example there is no “guid” type in JavaScript (although they’re easy to recognise) and you’d find it difficult to tell if a small number was stored as an Int32 or an Int64.

Also due to limitations of Azure tables I can’t really give a meaningful percentage of the table backed up, as (as far as I’m aware) there is no way to get a row count or the size of an Azure table without reading all the data first.

This is a first pass at backing up Azure table data and I’d love to hear from you if you find it useful or how you’d like it to work.

Windows 8 UI (Metro)

The most obvious change is the complete redesign of the UI, inspired by the Windows 8 theme and the new Azure portal. This has involved changes to every single UI file and not a small amount of refactoring. If you spot any glitches please let me know.

The thing I’m most proud of is the new history page. This is based on JQuery data tables and allows both sorting and filtering of the last 200 jobs to be run. Hope you like it, compared to the old version it’s far more feature rich and data dense.

What’s next?

Obviously with such a large change there are going to be issues - although to be fair it seemed pretty good overnight - I’ll be keeping a close eye on it and making gradual improvements where I can.

NoSQL Azure table backup will develop as people tell me what they need from such a solution.

Beyond that we’ll see. Hope you like the new site.

Fri, 25 Jan 2013 07:20:27 +0000

Azure table backup in development

I just backed up 5.7Gb of Azure Table to a 174Mb .json.gz file in about 7 hours on my local machine. Not as quick as I’d like but not bad. (No UI for it yet though)

What do you think? Please let me know.

Thu, 10 Jan 2013 07:34:18 +0000

A few smaller releases and what next...

I noticed yesterday I’ve not really put much up on the blog about a few small releases we’ve done recently. So I think it’s time to bring everybody up to speed on what we’ve done and where we’re going.

Resumable backup job

The SQL Azure job is now fully resumable, this means if the scheduler service goes down for any reason it can pick up a job that was in progress and continue it as if nothing had happened. We did this work to the Restore job a while ago and we let it settle before implementing the same logic in the Backup job. Although with other changes our scheduling stability is now excellent so we don’t see a restarting scheduling service anything like as frequently as a few months ago.

The “ - ” separator in files returned

It looks like as part of the work to make jobs resumable we started putting spaces in backup filenames again. I released a fix yesterday for this. The problem with spaces in the backup filenames is that the Microsoft Azure Portal can’t cope with them so to make it more likely that it can work we add no spaces beyond your control. The other parts of a backup are the filename and the timestamp both of which are under your control.

12 hour default timeout for SQL Azure backup/restore

There is now a timeout to cope with stalled jobs in Microsoft’s bacpac import/export service. If a backup or restore job takes longer than this time the job would fail (even if the export may eventually complete). We had been seeing jobs that would remain “pending” for several days stopping backups from happening, this feature prevents that from happening. You can configure this via the settings upping it to at most 24 hours.
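
In outline the timeout works like the polling loop below. This is only a sketch: the status strings, the 30 second polling interval and the `wait_for_export` name are assumptions for the example, and the clock and sleep functions are injectable purely so the loop can be exercised without really waiting.

```python
import time


def wait_for_export(get_status, timeout_seconds=12 * 60 * 60, poll_seconds=30,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until the job finishes or the timeout passes.

    The job is treated as failed once the deadline is reached, even though the
    remote export might still complete eventually.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("Completed", "Failed"):  # hypothetical status strings
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)
```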

I’m also thinking about limiting the duration of file sync jobs in a similar manner, currently we have jobs spanning several days.

You can remove a running job’s schedule

Does exactly what it says on the tin. You can delete a schedule from a job even if it’s running.

Deleted files during sync no longer fail sync job

If files are deleted during a sync job (after the file list has been created) this no longer causes a problem. Previously the job would fail; now it just reports the file as having been deleted and carries on.
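
The new behaviour is roughly the following, sketched with a hypothetical `copy_file` callback that raises `FileNotFoundError` when the source file has gone. The real job deals with blob storage errors rather than Python exceptions.

```python
def sync_files(file_list, copy_file):
    """Copy every file in the list, noting (not failing on) files that have vanished."""
    copied, deleted = [], []
    for name in file_list:
        try:
            copy_file(name)
        except FileNotFoundError:
            # The file was removed after the list was built: report it and carry on.
            deleted.append(name)
        else:
            copied.append(name)
    return copied, deleted
```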

What next?

There are currently two things on the go.

Azure table backup

The requests are coming in thick and fast so I’m looking at doing some form of Azure table backup. This is likely going to be initially just a dump of the given table to a .zip file in blob storage (or possibly a copy to another Azure table). From there it’s up to what people want from it. So far we’ve got requests for differential backups, retention policy of rows ( removing rows older than X days ), backing up to e.g. Amazon SimpleDB. I’m keen on your thoughts so please drop me a line.

Windows 8 redesign

I’ve also started work a while back on a large scale revamp of the appearance of the site. This will not only make the site look a lot better but will also incorporate massive improvements in the history page, showing many more results that are filterable and searchable (another common request).


Tue, 08 Jan 2013 10:33:38 +0000

Couple of rare bug fixes

There was a rare error where a backup job would fail with an error saying “Database already exists” when using the “create temporary copy” option. This was due to the retry behaviour when communicating with SQL Azure. We now check for the existence of the copy rather than simply assuming the create copy call failed when no result was received from Azure.
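
The idea behind the fix can be sketched as below, with hypothetical `create` and `exists` callbacks standing in for the calls to SQL Azure. If no response comes back, the copy may have started anyway, so the code looks for it before trying again.

```python
def create_copy_with_retry(create, exists, name, attempts=3):
    """Start a database copy, treating 'no response' as 'unknown' rather than 'failed'."""
    for _ in range(attempts):
        try:
            create(name)
            return True
        except TimeoutError:
            # No result received. Blindly retrying here is what would produce
            # "Database already exists", so check whether the copy was created.
            if exists(name):
                return True
    raise RuntimeError(f"could not create copy {name!r} after {attempts} attempts")
```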

There was also a restore bug where finding the latest backup wouldn’t include the new filenames without space separators introduced last week.

Hope that makes sense. We’re fixing smaller and smaller things in the system so our stabilisation work is really paying off, thanks so much.

Tue, 27 Nov 2012 14:30:56 +0000

SQL Azure Restore fix and a few more things

There was a serious issue during a SQL Azure restore that if it failed but the database wasn’t immediately removed by the Microsoft Import Service both the renamed copy and the target database would be removed. This only happened if you had “Drop Existing Database” selected and the restore job failed.

Spaces removed from SQL Azure backup files (Microsoft’s new Azure Portal can’t cope with spaces in filenames for import). A file that was previously named “BackupFilename - [timestamp].bacpac” will now be named “BackupFilename-[timestamp].bacpac”

We have a snazzy new price page to explain pricing to people. This is at http://cloudservices.red-gate.com/home/pricing

There is now a payment update link on the subscriptions page (http://cloudservices.red-gate.com/Settings/Subscription). I’ve had many requests for this information even though the link should be in the e-mail from Fastspring each month.

Finally kudos to a customer who spotted the title of Azure to Azure storage backup was actually named Amazon to Azure, must have been that way for at least 5 months.

Thu, 22 Nov 2012 14:43:48 +0000

15 Releases in 2 weeks

You may have noticed reliability problems with the service over the last few weeks.  This blog post is a well overdue explanation of everything that has been going on now that we seem to have finally got things under control.

Firstly it’s worthwhile checking you don’t have any temporary databases on your SQL Azure servers that were left behind after a failed job recently.

Service Bus

Turned out to be a mistake. We spent a lot of time architecting a solution to allow jobs to be broken up into tasks that could be run across the scheduling roles. However the introduction of service bus added a lot of complexity and reliance on another external system. The thing about service bus is that it needs to be implemented with a lot of “best practices” (read lots and lots of retries). The omission of any one of these best practices meant a system you couldn’t completely rely on. If you’re considering service bus I’d strongly advise having a good detailed read of http://windowsazurecat.com/2011/09/best-practices-leveraging-windows-azure-service-bus-brokered-messaging-api/

Sync job performance - out of memory

Some of the errors were introduced when I did a major rework of the storage sync job. I added a lot of parallelism via the .NET task parallel library. However it seems that introducing this also added a major memory leak, most likely due to large object fragmentation using the Azure SDK. Unfortunately this meant that jobs that were in progress at the time on a scheduler that ran out of memory didn’t complete properly or even clean up after themselves.

For now I’ve reverted the code to be largely single threaded before going at it again (more carefully this time). There are significant improvements as it now uses Azure table storage for file lists rather than in-memory and also listing of files in containers is now an order of magnitude faster after having some help from the Cerebrata guys.

SQL Azure Backup jobs with no progress

It seems that Microsoft made modifications to the bacpac import/export service so that the “job pending” string changed format (from ‘Running, Progress = 20.56%’ to ‘Running, Progress = 20.56 done’). This meant that the regular expression we used never recognised a job as in-progress, even though the jobs would eventually complete. I made modifications to the regular expression so that it is less fussy, but also so the job will fail should an unrecognisable response come from Microsoft. I released a buggy version of this code on the 8th, causing a bunch of jobs to fail immediately with ‘Object reference not set to an instance of an object’ around 9am UTC. I spotted it and fixed it within an hour.
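
For illustration, a tolerant pattern along these lines accepts both of the status formats quoted above and still rejects anything it does not recognise. The exact expression used in the service is not shown here, so treat the pattern and the `parse_progress` name as a sketch.

```python
import re

# Accepts "Running, Progress = 20.56%" and "Running, Progress = 20.56 done"
PROGRESS_PATTERN = re.compile(r"^Running, Progress = (\d+(?:\.\d+)?)\s*(?:%|done)?$")


def parse_progress(status: str) -> float:
    """Return the percentage from a status string, or raise if it is unrecognised."""
    match = PROGRESS_PATTERN.match(status.strip())
    if match is None:
        # Fail loudly rather than wait forever on a format we don't understand.
        raise ValueError(f"unrecognised status from export service: {status!r}")
    return float(match.group(1))
```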

DNS lookup timed out

For some reason, attempting a DNS lookup of the backup import/export service from the Azure worker role turned out to be less than reliable. We used to do this to ensure connectivity before attempting to communicate with the export service, but as it’s on a worker role this really serves no purpose other than to introduce errors, so it’s been removed. Communication with the service is fine; just the DNS lookup failed on occasion.

Heartbeat

This is a new feature on the UI that lets us know the last time your job reported progress. This should happen regularly throughout a job and is an instant view to see if your job is still running or hung. Be aware that during large file copies the heartbeat will stall for the duration of the copy.

Monitoring

We are putting in place a monitoring system to keep track of the system so we can spot issues much quicker in future and also gain confidence that we’ve got the system up to an acceptable level of reliability.

Slow down

Lastly we’re going to slow down the development for now to make sure we don’t make things worse, like I did with my work on the sync job.

Hope this all makes sense and thanks for all the words of support from those of you who e-mailed me with failing jobs. You really are the best customers in the world and far more than I deserve.

Wed, 14 Nov 2012 14:14:53 +0000

Mini update - retry for service bus

I’ve put a quick fix live this morning to retry connections to service bus. There looked to have been a failure if connections were closed. They will now retry and reconnect as required (hopefully).

Also there was an issue if SQL Azure backup files contained a period the .bacpac file extension wouldn’t be added causing restore problems as experienced by a user last week.
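
I haven't described the cause above, but one plausible way this kind of bug arises (an assumption on my part for the example) is treating everything after the last period as an existing file extension, so a name like "my.database" appears to already have one. Checking for the specific extension avoids that; `ensure_bacpac_extension` is a made-up name.

```python
import os


def ensure_bacpac_extension(filename: str) -> str:
    """Append .bacpac unless the name already ends with exactly that extension."""
    if filename.lower().endswith(".bacpac"):
        return filename
    return filename + ".bacpac"


# The trap: a generic "has an extension?" check is fooled by a period in the name.
looks_like_extension = os.path.splitext("my.database")[1]  # ".database"
```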

We’re also putting the final touches into improvements in restore and sync reliability which hopefully will go live later this week.

I’ve also got to sit down and do some serious mathematics in order to offer a per-Gb pricing model - I’ll attempt to ensure anybody already paying won’t end up paying any more. Get in touch if you have opinions on this or anything connected to the service.

Tue, 30 Oct 2012 09:57:04 +0000

Services available again - and website update

So it looks like the main database for Cloud Services on SQL Azure was “unavailable” for 2 hours this morning. I’ve still got an open ticket with Microsoft as to the cause so hopefully we’ll be able to do something about it.

You’ll also notice that I’ve updated the website radically, mainly this is in an attempt to improve the support page to reduce support queries as in order to support an ever increasing number of users (not that I’m complaining) I need to make the system as self maintaining as I possibly can.

Do let me know if there’s anything I can do to improve things. Meanwhile we’ll continue to work on the overall stability of the system so we can scale out better.

Mon, 22 Oct 2012 17:04:52 +0100

Cloud Services Azure Database "Unavailable"

Red Gate Cloud Services is currently down as my SQL Azure database is “Unavailable”. I’ve opened a support ticket with Microsoft already. More details as we get them.

Mon, 22 Oct 2012 11:40:37 +0100

(400) - Bad request - Part 2

Just put in the most minor of changes to include the error from Microsoft in the detailed logs.

That’s about it really.

Thu, 18 Oct 2012 15:58:20 +0100

(400) - Bad request

This is now confirmed as an issue by Microsoft. The full details of which can be found on…

http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/8e898f62-3355-44da-bf3e-cee218853f84

The solutions seem to be…

  1. Use the “Create temporary copy before backup for transactional consistency (this will incur a cost from Microsoft)” option on your job
  2. Rename the database and export
  3. Wait for the pending request to clear out - 14 days from the date of the request
  4. Contact Azure support and ask for the pending request to be manually flagged as failed

We’ll be improving the error reporting in cloud services so the actual error from Microsoft isn’t hidden.

Tue, 16 Oct 2012 07:24:16 +0100

Major failure on Sunday morning - DB Size limit hit

All jobs from Sunday until now will have failed. This is due to the SQL Azure database that backs Cloud Services having hit its size limit and immediately stopping working. I’ve just increased its size now and it should sort itself out.

This is due to me not monitoring the size of the database so I’ll put systems in place to ensure this doesn’t happen again.

I’m deeply embarrassed by this. So very sorry to all who rely on this service.

Mon, 08 Oct 2012 07:48:25 +0100

Re-architecture of SQL Azure Backup job (and more)

So quite a lot has gone on behind the scenes over the last month including, but not limited to…

Scalability of platform improved

We’ve rewritten a large amount of the code for the SQL Azure Backup job so this can now scale with the quantity of tasks that are currently running on the system. Of course this has taken a massive effort with very few visible effects, and we’ve been keeping a close eye on the system ever since. We’ll be making similar improvements to Restore and storage synchronise soon.

Unexpected API change by Microsoft

On Thursday of last week it appears that Microsoft updated the bacpac export service, causing all jobs to fail. As I was out of the office, Sam and Lionel stepped into the breach and released a fix to the system the very same day. There are still two customers who have issues associated with this update from Microsoft; so far they are resisting our best attempts to fix them.

Timezone display

Certain users in the Western Pacific were having issues with times being displayed incorrectly. This was due to a programming error which I believe has been fixed now.

Current priorities

For now the main priorities will be to stabilise the functionality of the existing system so any new features won’t happen for a while. I believe that this is the best thing to do for now, please keep telling me of ideas that you feel would improve the system.

Wed, 19 Sep 2012 08:36:57 +0100

Multiple folders, snapshot retention of files

I came in on Saturday unable to stop thinking about a new way of deploying Cloud Services which should make it possible for me to update the live system without any downtime at all. I’ll go into how this is done in a separate post.

This version has a large number of changes - mostly to the storage synchronisation/backup feature and internal structure for unit testing.

Multiple container support

When you choose the source container it now allows you to specify a regular expression to choose multiple containers. The destination container name matches the source container when using this feature. Be aware that on Amazon bucket names are global so you can easily get a name clash unless your container name is highly likely to be unique (guid for example).
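
As a sketch of how the selection behaves, the snippet below picks the containers whose names match a regular expression and maps each one to a destination of the same name. Whether the service matches the whole name or just part of it isn't stated above; this example assumes a whole-name match, and `plan_container_sync` is a hypothetical helper.

```python
import re


def plan_container_sync(pattern, source_containers):
    """Map each matching source container to a destination with the same name."""
    regex = re.compile(pattern)
    return {name: name for name in source_containers if regex.fullmatch(name)}
```

For example, the pattern `logs-.*` would select `logs-2012` and `logs-2013` but leave `images` alone.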

The drop down box has been removed in favour of auto-complete although I’m yet to be convinced this is a good way to go.

Snapshot support on Azure

Also as you can see from the screenshot the destination files can have a snapshot taken before they are overwritten. These snapshots can then be deleted after a retention period has passed. The snapshots are easily retrievable via tools such as Cerebrata’s Cloud Storage Studio (full disclosure: Red Gate own Cerebrata)
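
The retention rule itself is simple and can be sketched as a date comparison. The function name is made up, and the real job works against blob snapshots rather than a plain list of timestamps.

```python
from datetime import datetime, timedelta


def expired_snapshots(snapshot_times, retention_days, now):
    """Return the snapshot timestamps that are older than the retention period."""
    cutoff = now - timedelta(days=retention_days)
    return [taken for taken in snapshot_times if taken < cutoff]
```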

On Amazon S3 this feature isn’t supported as you can seemingly only specify versioning on a bucket in its entirety and you can’t turn this off when it’s enabled or delete old versions as far as I can tell.

Other little fixes

  • Resetting the password of an account works even if the account is locked
  • Full update of all 3rd party packages
  • “test” button for connectivity to storage accounts etc
  • Of course my no-downtime release system

What’s next?

I think for now the toolset is getting there so for a while I’ll concentrate on marketing the tool more widely and just putting up bug fixes and minor features as suggested by users. With my new deployment system this should be much easier for me to do.

Keep the comments coming either by going to https://getsatisfaction.com/redgatecloudservices or e-mailing me directly at richard.mitchell@red-gate.com

Mon, 30 Jul 2012 08:08:56 +0100

SQL Azure Restore and Azure to Azure blob backup

This latest release has taken quite a while, partly because I had a terrible cold that took me out of it for a couple of weeks and also I had the indecency to go on holiday (http://www.downloadfestival.co.uk/ for those interested).

Hopefully this’ll be worth the wait though as we’ve added new tools.

SQL Azure Restore is a long overdue feature. You can schedule a restore so that you always have the latest backup available or just perform a single shot restore from a run job. The “Use the latest backup” is a bit dirty relying on string matching blob names so it’s not infallible.

Just as you can backup from Azure to Amazon or Amazon to Azure I added Azure to Azure blob backups. This involved quite a rework in code but it should mean that adding other cloud storage providers isn’t too much work, if you want another let me know. 

I had a bit of a problem with the release as the timestamp wasn’t correctly appended to files. I’ve released an update to the release and informed anybody who had a backup fail due to the issue. Please keep the feedback coming, it’s worth its weight in gold.

Wed, 20 Jun 2012 09:56:30 +0100