For a long time, a common complaint was that uploading pictures was painfully slow and timed out on many occasions.
The problem was the way we handled uploads: each file was processed very naively, and allowing you to upload many files at once only made things N times slower.
It all started when you pressed the "Upload" button. All files were uploaded sequentially from your web browser to our server in one big request, so the time it took depended on the combined size of all the files.
Once we had the files in our server's memory, we generated one thumbnail for each file (only the one needed for the preview once the page reloads) and saved those thumbnails on the server's local file system, which is where thumbnails are served from.
After generating the thumbnails, we stored the uploaded files, which were still in memory, in long-term storage (we use Amazon S3). This meant uploading each file again, this time from our server to Amazon.
Once that was done, we fired up a parallel job to generate the rest of the thumbnails needed for the uploaded pictures. That job ran on a separate server so it didn't slow us down. We then made the needed database entries and generated the OK response you see.
As you can see, before you could get to the next page you had to wait for each file to be sent from your computer to our server and then from our server to Amazon. Only once all of that was done would we finally prepare your next page.
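To make the cost of the old flow concrete, here is a rough back-of-the-envelope sketch in Python. It is not our actual code: the `Link` class, the function name, and the throughput and thumbnail numbers are all made up for illustration. It only shows that every file paid for both hops, one after the other.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """A hypothetical network hop with a fixed throughput in bytes per second."""
    bytes_per_second: float

    def transfer_seconds(self, size_bytes: int) -> float:
        return size_bytes / self.bytes_per_second


def old_process_wait(file_sizes: List[int],
                     browser_to_server: Link,
                     server_to_s3: Link,
                     thumbnail_seconds: float) -> float:
    """Estimate how long the user waited in the old, fully sequential flow."""
    total = 0.0
    for size in file_sizes:
        total += browser_to_server.transfer_seconds(size)  # browser -> our server
        total += thumbnail_seconds                         # preview thumbnail
        total += server_to_s3.transfer_seconds(size)       # our server -> Amazon S3
    return total


# Illustrative numbers only: two files of 1 MB and 2 MB.
wait = old_process_wait(
    file_sizes=[1_000_000, 2_000_000],
    browser_to_server=Link(bytes_per_second=500_000),
    server_to_s3=Link(bytes_per_second=1_000_000),
    thumbnail_seconds=0.5,
)
print(f"Estimated wait: {wait} seconds")
```

With these assumed numbers the user waits about ten seconds, and the wait grows with every extra file.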
The new process
The new approach is very different from the old one. It uses the time you spend picking each picture's classification to process that picture's upload.
Another big difference is that we now process each picture individually instead of all together in sequence, which allows many to be handled in parallel.
When the upload page loads, it sets aside a temporary picture placeholder for each of the upload fields you see. Once you select a file, the page automatically starts uploading it. This time the file goes straight to Amazon S3 instead of passing through our server. When the upload finishes, we update that file's temporary placeholder with a reference to your file. At this point the file is good to go (this is where you see the "Done" check image). While you select other pictures or move the mouse to click the upload button, we start generating the preview thumbnail, so if you're still working on the 2nd picture, the first one's thumbnail is generated as you click around.
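The per-file flow can be sketched roughly like this. In the real system the browser performs the upload; this Python version only illustrates that each file is handled on its own, so several can be in flight at once. `upload_to_storage`, the `uploads/` prefix, and the placeholder ids are hypothetical stand-ins, not our real code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple


def upload_to_storage(filename: str) -> str:
    """Stand-in for the direct upload to storage; returns a made-up file reference."""
    return f"uploads/{filename}"


def handle_selected_file(placeholder_id: int, filename: str) -> Tuple[int, str]:
    """Upload one file and return the reference to store on its placeholder."""
    file_ref = upload_to_storage(filename)
    return placeholder_id, file_ref


def process_selected(selected: Dict[int, str]) -> Dict[int, str]:
    """Handle every selected file independently, several at a time."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda item: handle_selected_file(*item), selected.items())
        # Map each placeholder id to the reference of its uploaded file.
        return dict(results)


placeholder_refs = process_selected({1: "beach.jpg", 2: "sunset.jpg"})
print(placeholder_refs)
```

Because no file waits for another, a slow upload only delays its own placeholder.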
By the time you hit "Upload", all files have already been persisted. All we have to do is look up those temporary image placeholders and create the real picture references by copying their file references. That is a quick operation, so the time from clicking the final button to getting a response is short.
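As a minimal sketch of that final step, assuming simple in-memory records (the `Placeholder` and `Picture` classes and their fields are hypothetical, not our real schema):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Placeholder:
    id: int
    # Filled in once the direct upload finishes; None if the field was never used.
    file_ref: Optional[str] = None


@dataclass
class Picture:
    file_ref: str


def finalize(placeholders: List[Placeholder]) -> List[Picture]:
    """Create real picture records by copying the reference from each used placeholder."""
    return [Picture(file_ref=p.file_ref) for p in placeholders if p.file_ref is not None]


pictures = finalize([
    Placeholder(id=1, file_ref="uploads/beach.jpg"),
    Placeholder(id=2),  # upload field left empty
    Placeholder(id=3, file_ref="uploads/sunset.jpg"),
])
print(pictures)
```

No file data moves at this point, only references, which is why the final click can return quickly.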
The advantage is that one file gets processed while you are still selecting the 2nd one. One slight disadvantage: in the old process we could generate the thumbnail quickly because we had the file in our server's memory before sending it to Amazon. Now that the file goes to Amazon first, we need to download it to our server to generate the thumbnail. That can take a fraction of a second, which is negligible.
Another issue is that when we prepare the upload page we set aside 3 placeholders, and if you ask to upload more we set aside 3 more. If you don't use them all, we are left with those placeholders lying around. It's not a huge deal, since we clear out all those that haven't been used after 24h. So... if you open the upload picture page, make sure to submit it within 24 hours!!
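A cleanup like that could look roughly like the following sketch. The `Placeholder` fields, the `used` flag, and the function name are assumptions for illustration; only the 24-hour limit comes from the process described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

MAX_AGE = timedelta(hours=24)


@dataclass
class Placeholder:
    id: int
    created_at: datetime
    used: bool = False  # True once the placeholder was turned into a real picture


def purge_unused(placeholders: List[Placeholder], now: datetime) -> List[Placeholder]:
    """Keep placeholders that were used or are still younger than 24 hours."""
    return [p for p in placeholders if p.used or now - p.created_at < MAX_AGE]


now = datetime(2024, 1, 2, 12, 0)
kept = purge_unused(
    [
        Placeholder(id=1, created_at=now - timedelta(hours=30)),             # stale, unused
        Placeholder(id=2, created_at=now - timedelta(hours=30), used=True),  # old but used
        Placeholder(id=3, created_at=now - timedelta(hours=1)),              # fresh, unused
    ],
    now,
)
print([p.id for p in kept])
```

Passing `now` in explicitly keeps the sketch easy to test with fixed dates.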
Along with the above-mentioned changes, a few other features were added:
- There is now a "Manage" picture page for doing operations with your uploaded files. This leaves room for many more options instead of cramming them all into the upload screen.
- It is now possible to edit your pictures. If you made a typo in your picture's text or classified it wrong, you can now fix it after uploading the file.
- We now keep track of your original file, so you will have a link available to download your original file. It is only available to you! In the past there was no way to retrieve your original picture, only the generated watermarked thumbnails.
- You can now edit the order of your pictures! In the past they would always show in the order you added them, so recent pictures would always be last. Now you can edit them as you wish.