I was recently asked to create a report showing the total size of the files within each top-level folder, including all the subdirs under it, in our S3 buckets.
The ‘files’ in an S3 bucket are objects, and each object has a key that contains the path where the object is stored within the bucket.
I came up with this function to take a bucket and iterate over the objects within the bucket. For each item, the key is examined and the object's size is added to a running total kept in a dictionary.
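To make the key-splitting idea concrete, here is a minimal sketch that runs without AWS. The helper name top_level_dir and the sample keys and sizes are made up for illustration; they are not part of the function in this post.

```python
def top_level_dir(key):
    """Return the top-level directory for an S3 key, or '.' for root objects."""
    parts = key.split('/')
    # A key with no '/' has a single part, so the object sits in the bucket root.
    return '.' if len(parts) == 1 else parts[0]


# Made-up (key, size in bytes) pairs standing in for a bucket listing.
sample_objects = [
    ('readme.txt', 100),
    ('logs/2019/app.log', 2500),
    ('logs/old.log', 500),
    ('images/cat.png', 4000),
]

totals = {'.': 0}
for key, size in sample_objects:
    top = top_level_dir(key)
    # Add the size to the running total, creating the entry if needed.
    totals[top] = totals.get(top, 0) + size

print(totals)  # {'.': 100, 'logs': 3000, 'images': 4000}
```

Everything below the first path segment is rolled up into that segment's total, which is why the nested logs/2019/app.log counts toward logs.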
Here’s what I ended up with.
import sys

import boto3
from botocore.exceptions import BotoCoreError, ClientError


def get_top_dir_size_summary(bucket_to_search):
    """
    This function takes in the name of an S3 bucket and returns a dictionary
    containing the top-level dirs as keys and total file size in bytes as values.
    :param bucket_to_search: a String containing the name of the bucket
    """
    # Set up the output dictionary for running totals, with one entry for '.'
    # to represent the root folder.
    dirsizedict = {'.': 0}
    # ------------
    # Set up the AWS client
    s3client = boto3.client('s3')
    # This is a check to ensure a bad bucket name wasn't passed in. I'm sure there is a better
    # way to check this. If you have a better method, please comment on the article.
    try:
        s3client.head_bucket(Bucket=bucket_to_search)
    except (BotoCoreError, ClientError):
        print('Bucket ' + bucket_to_search + ' does not exist or is unavailable. - Exiting')
        sys.exit(1)
    # Since buckets could have more than 1000 items, use a paginator to iterate 1000 at a time.
    paginator = s3client.get_paginator('list_objects')
    pageresponse = paginator.paginate(Bucket=bucket_to_search)
    # Iterate through each page of objects returned by the paginator.
    for pageobject in pageresponse:
        # Check to see if the page has contents; without this an empty bucket would throw an error.
        if 'Contents' in pageobject:
            # If there are contents, then iterate through each 'file'.
            for file in pageobject['Contents']:
                # The listing already includes each object's size, so no extra request is needed.
                filesize = file['Size']
                # Get the top-level directory from the file by splitting the key.
                keylist = file['Key'].split('/')
                # If keylist has 1 item, the file is in the root dir.
                if len(keylist) == 1:
                    dirsizedict['.'] += filesize
                else:
                    # Not root: add to the running total, creating the key if needed.
                    dirsizedict[keylist[0]] = dirsizedict.get(keylist[0], 0) + filesize
    return dirsizedict
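Since the goal was a report, here is one possible way to turn the returned dictionary into printable lines. This is only a sketch: format_size_report is a hypothetical helper, and the sample totals are made up rather than taken from a real bucket.

```python
def format_size_report(dirsizedict):
    """Return one report line per directory, largest total first (sizes in bytes)."""
    rows = sorted(dirsizedict.items(), key=lambda item: item[1], reverse=True)
    # Left-align the directory name and right-align the size with thousands separators.
    return ['{:<30} {:>15,d}'.format(name, size) for name, size in rows]


# Made-up totals shaped like the dictionary the function returns.
# With real data this would be something like:
#   sample = get_top_dir_size_summary('my-bucket-name')  # hypothetical bucket name
sample = {'.': 1024, 'logs': 5000000, 'images': 250000}

for line in format_size_report(sample):
    print(line)
```

Sorting by size puts the folders that take up the most space at the top, which is usually what people want to see first in this kind of report.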
That script is probably a little rough to an elite coder, so if you have any thoughts on improvement, let me hear them.


Hi,
Your work on this is awesome and is helping me a lot to move forward.
Can you please give a hint on how to extract “security group ID whose cidrIP is 0.0.0.0/0 in IpRanges in IpPermissions, from a CloudTrail log which is in JSON format using boto3 and python”? I tried every way I could think of but was unable to move forward. Thanks in advance.
I think you are trying to find security groups with an allow-all rule using 0.0.0.0/0. Why not iterate over all groups in the account and check each rule in each group for a CIDR of 0.0.0.0/0?
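A rough sketch of that idea, assuming the groups are read from the EC2 describe_security_groups call rather than from the CloudTrail log. The function names and the sample data are made up for illustration, and only inbound IPv4 rules (IpPermissions and IpRanges) are checked.

```python
def find_open_security_groups(security_groups):
    """Return the IDs of groups with at least one inbound rule open to 0.0.0.0/0.

    security_groups: a list of dicts shaped like the 'SecurityGroups' entries
    in an EC2 describe_security_groups response.
    """
    open_group_ids = []
    for group in security_groups:
        for permission in group.get('IpPermissions', []):
            cidrs = [r.get('CidrIp') for r in permission.get('IpRanges', [])]
            if '0.0.0.0/0' in cidrs:
                open_group_ids.append(group['GroupId'])
                break  # one matching rule is enough for this group
    return open_group_ids


def list_all_security_groups():
    """Fetch every security group in the current region (needs AWS credentials)."""
    import boto3  # imported here so the helper above can be tried without boto3
    ec2client = boto3.client('ec2')
    paginator = ec2client.get_paginator('describe_security_groups')
    groups = []
    for page in paginator.paginate():
        groups.extend(page['SecurityGroups'])
    return groups


# Made-up sample data for illustration only.
sample_groups = [
    {'GroupId': 'sg-open',
     'IpPermissions': [{'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]},
    {'GroupId': 'sg-closed',
     'IpPermissions': [{'IpRanges': [{'CidrIp': '10.0.0.0/16'}]}]},
]
print(find_open_security_groups(sample_groups))  # ['sg-open']
```

Against a real account, the call would look like find_open_security_groups(list_all_security_groups()).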
Hey, thanks. I know this is kinda old, but it helped me.
2020 and still great. Consider updating if you ever get the chance 🙂
Script is not working
Script worked in my test VM. What error are you seeing?