Getting the Size of an S3 Bucket using Boto3 for AWS

I’m writing this on 9/14/2016. I make note of the date because the request to get the size of an S3 Bucket may seem a very important bit of information but AWS does not have an easy method with which to collect that info. I fully expect them to add that functionality at some point. As of this date, I could only come up with 2 methods to get the size of a bucket. One could list of all bucket items and iterate over all the objects while keeping a running total. That method does work, but I found that for a bucket with many thousands of items, this method could take hours per bucket.

A better method uses AWS Cloudwatch logs instead. When an S3 bucket is created, it also creates 2 cloudwatch metrics and I use that to pull the Average size over a set period, usually 1 day.

Here’s what I came up with:

 
import boto3
import datetime

now = datetime.datetime.now()

cw = boto3.client('cloudwatch')
s3client = boto3.client('s3')

# Get a list of all buckets
allbuckets = s3client.list_buckets()

# Header Line for the output going to standard out
print('Bucket'.ljust(45) + 'Size in Bytes'.rjust(25))

# Iterate through each bucket
for bucket in allbuckets['Buckets']:
    # For each bucket item, look up the cooresponding metrics from CloudWatch
    response = cw.get_metric_statistics(Namespace='AWS/S3',
                                        MetricName='BucketSizeBytes',
                                        Dimensions=[
                                            {'Name': 'BucketName', 'Value': bucket['Name']},
                                            {'Name': 'StorageType', 'Value': 'StandardStorage'}
                                        ],
                                        Statistics=['Average'],
                                        Period=3600,
                                        StartTime=(now-datetime.timedelta(days=1)).isoformat(),
                                        EndTime=now.isoformat()
                                        )
    # The cloudwatch metrics will have the single datapoint, so we just report on it. 
    for item in response["Datapoints"]:
        print(bucket["Name"].ljust(45) + str("{:,}".format(int(item["Average"]))).rjust(25))
        # Note the use of "{:,}".format.   
        # This is a new shorthand method to format output.
        # I just discovered it recently. 
Tagged , , , . Bookmark the permalink.

One Response to Getting the Size of an S3 Bucket using Boto3 for AWS

  1. work tree says:

    Oh my goodness! Incredible article dude! Thank you so much,
    However I am encountering issues with your
    RSS. I don?t understand why I am unable to join it. Is there anybody
    else getting similar RSS issues? Anyone who knows the
    solution can you kindly respond? Thanx!!

Leave a Reply

Your email address will not be published. Required fields are marked *

No bots allowed * Time limit is exhausted. Please reload the CAPTCHA.