Getting the sizes of Top level Directories in an AWS S3 Bucket with Boto3

I was recently asked to create a report showing the total size of the files within each top level folder, including all the subdirs under it, in our S3 buckets.

S3 bucket ‘files’ are objects, and each object has a key that contains the path where the object is stored within the bucket.
I came up with this function to take a bucket name and iterate over the objects within the bucket. For each object, the key is examined and the object's size is added to a running total kept in a dictionary.
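
To make the key handling concrete, here is a minimal sketch that needs no AWS access. The sample keys and sizes are made up for illustration, and `tally_by_top_dir` is a hypothetical helper name, not part of Boto3.

```python
def tally_by_top_dir(objects):
    """Sum sizes per top level prefix; keys without a '/' are counted under '.'."""
    totals = {'.': 0}
    for key, size in objects:
        parts = key.split('/')
        # A key with no '/' lives at the root of the bucket.
        top = '.' if len(parts) == 1 else parts[0]
        totals[top] = totals.get(top, 0) + size
    return totals


# Made-up (key, size in bytes) pairs standing in for S3 listing entries.
sample = [
    ('readme.txt', 100),
    ('logs/2019/app.log', 2000),
    ('logs/2020/app.log', 3000),
    ('images/cat.png', 500),
]
print(tally_by_top_dir(sample))
```

Both objects under `logs/` land in the same total, no matter how deep the rest of the key goes.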

Here’s what I ended up with.

import sys

import boto3
from botocore.exceptions import BotoCoreError, ClientError


def get_top_dir_size_summary(bucket_to_search):
    """
    This function takes in the name of an s3 bucket and returns a dictionary
    containing the top level dirs as keys and total filesize in bytes as value.
    :param bucket_to_search: a String containing the name of the bucket
    """
    # Setup the output dictionary for running totals, with 1 entry for '.'
    # to represent objects stored at the root of the bucket.
    dirsizedict = {'.': 0}

    # ------------
    # Setup the AWS client
    s3client = boto3.client('s3')

    # This is a check to ensure a bad bucket name wasn't passed in.   I'm sure there is a better
    # way to check this.   If you have a better method, please comment on the article.
    try:
        s3client.head_bucket(Bucket=bucket_to_search)
    except (ClientError, BotoCoreError):
        print('Bucket ' + bucket_to_search + ' does not exist or is unavailable. - Exiting')
        sys.exit(1)

    # since buckets could have more than 1000 items, have to use paginator to iterate 1000 at a time
    paginator = s3client.get_paginator('list_objects_v2')
    pageresponse = paginator.paginate(Bucket=bucket_to_search)

    # iterate through each page of results from the paginator.
    for pageobject in pageresponse:

        # An empty bucket returns a page with no 'Contents' key, so default to an empty list.
        for file in pageobject.get('Contents', []):

            # Get the top level directory from the file by splitting the key.
            keylist = file['Key'].split('/')

            # If keylist has only 1 item, the file is on the root; otherwise the first item is the top level dir.
            topdir = '.' if len(keylist) == 1 else keylist[0]

            # The listing already includes each object's size, so no extra request per object is needed.
            dirsizedict[topdir] = dirsizedict.get(topdir, 0) + file['Size']

    return dirsizedict
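
Since the goal was a report, one way to print the returned dictionary is sketched below. `format_size_report` is a hypothetical helper and the sample dictionary is made up; the sizes are assumed to be in bytes, which is how S3 reports them.

```python
def format_size_report(dirsizes):
    """Return one line per top level dir, largest first, with sizes in MiB."""
    lines = []
    # Sort by total size, largest first, so the biggest dirs lead the report.
    for name, size in sorted(dirsizes.items(), key=lambda item: item[1], reverse=True):
        lines.append(f"{name:<20} {size / (1024 ** 2):>12.2f} MiB")
    return "\n".join(lines)


# Made-up totals standing in for the dictionary the function above returns.
example_totals = {'.': 0, 'logs': 5 * 1024 ** 2, 'images': 1024 ** 2}
print(format_size_report(example_totals))
```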

That script is probably a little rough to an elite coder, so if you have any thoughts on improvement, let me hear them.

5 Responses to Getting the sizes of Top level Directories in an AWS S3 Bucket with Boto3

  1. siva says:

    Hi,
    Your work in this is awesome, helping a lot to move forward.
    Can you please give a hint on how to extract the security group ID whose CidrIp is 0.0.0.0/0 in IpRanges in IpPermissions from a CloudTrail log, which is in JSON format, using boto3 and Python? I tried everything I could think of but was unable to move forward. Thanks in advance.

    • mike says:

      I think you are trying to find security groups with an allow-all rule using 0.0.0.0/0. Why not iterate over all groups in the account and check each rule in each group for a CIDR of 0.0.0.0/0?
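
      A rough sketch of that suggestion follows. It assumes the input is shaped like the `SecurityGroups` list returned by the EC2 `describe_security_groups` call; `find_open_groups` is a hypothetical helper name and the sample groups are made up so the example runs without AWS access.

```python
def find_open_groups(security_groups):
    """Return the IDs of groups that have any inbound rule open to 0.0.0.0/0."""
    open_ids = []
    for group in security_groups:
        for permission in group.get('IpPermissions', []):
            ranges = permission.get('IpRanges', [])
            if any(r.get('CidrIp') == '0.0.0.0/0' for r in ranges):
                open_ids.append(group['GroupId'])
                break  # one open rule is enough to flag the group
    return open_ids


# Made-up groups; with real credentials the list could instead come from
# boto3.client('ec2').describe_security_groups()['SecurityGroups'].
sample_groups = [
    {'GroupId': 'sg-open', 'IpPermissions': [{'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]},
    {'GroupId': 'sg-closed', 'IpPermissions': [{'IpRanges': [{'CidrIp': '10.0.0.0/8'}]}]},
]
print(find_open_groups(sample_groups))
```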

  2. Mapes says:

    Hey thanks I know this is kinda old but, it helped me
