Amazon S3 bucket limit

So, today I learned that Amazon S3 limits the number of buckets you are allowed to have at any one time. This was a HUGE pain in the ass for me, as I had just rewritten my S3 caching libraries to separate my caching into daily buckets.

The idea I had was that I would keep 30 days’ worth of buckets, then automatically shuffle all of those buckets into long-term archival at the end of each month.

You see, Amazon S3 doesn’t allow for folders within buckets (not technically, I’ll get to that later), and I think it’s bloody ridiculous for me to store the ~15 million XML cache files I need daily access to in just one massive bucket. So, I thought my plan to create daily buckets was pretty damn good. Apparently Amazon disagrees, as they have set a limit on the number of buckets I can have associated with my account (FYI, I have ~90 buckets right now and that’s where the limit is).

So, I’ve been looking into my options, and I have decided that I’m going to have to go back to my original setup of storing all 15 million (and growing) XML files in one bucket. As some of you may know, S3 GUIs like S3Fox are able to show sub-folders within buckets, so I figure I will go this route. My first step was to take a crack at the Ruby S3 library from Amazon (not the AWS gem, which is crap for multithreaded environments) to see how I can create folders within buckets. Turns out, you can’t. At least, not out of the box. You see, Amazon S3 doesn’t actually SUPPORT sub-folders within buckets.

Stupid, right? I know.

So, how does S3Fox do it? It turns out they create virtualized folders by creating a special object that acts as a folder marker, then you access your stored objects by prefixing the folder name to the actual file key. For a directory named “foo”, you would create an object with the key “foo_$folder$”, and a file in that directory would get a key like “foo/myfile.xml”. Then, to get a directory listing of all files stored under the foo path, you just query S3 for objects with keys that start with “foo/”, and you ignore any objects that end with “_$folder$”.
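Here is a rough sketch of that naming convention in Ruby. To be clear about what is assumed: the "bucket" is just an in-memory Hash standing in for a real S3 bucket, the helper names (create_folder, put_file, list_folder) are made up for illustration, and the keys are assumed to look like “foo/myfile.xml”. It is not the S3Fox code or any library's API, only the key logic.

```ruby
# Suffix that marks an object as a "folder" in this convention.
FOLDER_SUFFIX = '_$folder$'

# Create the empty marker object that GUIs display as a folder.
# `bucket` is a plain Hash (key => body) standing in for a real bucket.
def create_folder(bucket, name)
  bucket["#{name}#{FOLDER_SUFFIX}"] = ''
end

# Store a file "inside" a folder by prefixing its key with the folder name.
def put_file(bucket, folder, filename, body)
  bucket["#{folder}/#{filename}"] = body
end

# List a folder: keep keys under the prefix, drop any folder markers.
def list_folder(bucket, folder)
  prefix = "#{folder}/"
  bucket.keys
        .select { |key| key.start_with?(prefix) }
        .reject { |key| key.end_with?(FOLDER_SUFFIX) }
        .sort
end

bucket = {}
create_folder(bucket, 'foo')
create_folder(bucket, 'foo/bar')          # marker for a nested folder
put_file(bucket, 'foo', 'a.xml', '<a/>')
put_file(bucket, 'foo', 'b.xml', '<b/>')

puts list_folder(bucket, 'foo')
```

Against the real service, the select step would presumably become a prefix query to S3 rather than a scan of every key, which matters a lot with 15 million objects.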

I’m about to waste my day setting this up, and I’m none too happy about it. It seems like a hackish and shitty workaround for an obvious service flaw. I’m sure there is some technically reasonable answer for why Amazon has set an arbitrary limit for buckets and also why they don’t allow you to create folders, but I don’t know what it is. Anyone have any answers?

Hat tip to the Dead Programmer Society for the code to create virtualized folders in S3.
