Getting the Size of an S3 Bucket using Boto3 for AWS

I’m writing this on 9/14/2016. I note the date because the size of an S3 bucket seems like a very important bit of information, yet AWS does not offer an easy way to collect it. I fully expect them to add that functionality at some point. As of this date, I could only come up with 2 methods to get the size of a bucket. One could list all the objects in the bucket and iterate over them while keeping a running total. That method does work, but I found that for a bucket with many thousands of items, it could take hours per bucket.
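
For reference, here is a minimal sketch of that first, slower method. It assumes you pass in a boto3 S3 client; the function name bucket_size_by_listing is my own and not part of boto3. It pages through the bucket with the list_objects_v2 paginator and adds up each object's Size.

```python
def bucket_size_by_listing(s3client, bucket_name):
    """Return the total size in bytes of all objects in a bucket.

    s3client is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    """
    paginator = s3client.get_paginator('list_objects_v2')
    total = 0
    # Each page holds up to 1,000 objects, so large buckets need many requests
    for page in paginator.paginate(Bucket=bucket_name):
        # 'Contents' is missing from the page when there are no objects
        for obj in page.get('Contents', []):
            total += obj['Size']
    return total
```

You would call it as bucket_size_by_listing(boto3.client('s3'), 'my-bucket'), where 'my-bucket' is a placeholder name. Every 1,000 objects costs another request, which is why this gets slow on big buckets.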

A better method uses AWS CloudWatch metrics instead. S3 publishes 2 daily storage metrics to CloudWatch for every bucket, BucketSizeBytes and NumberOfObjects, and I use the first to pull the Average size over a set period, usually 1 day.

Here’s what I came up with:

 
import boto3
import datetime

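# CloudWatch timestamps are in UTC, so use a timezone-aware UTC time for the query window
now = datetime.datetime.now(datetime.timezone.utc)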

cw = boto3.client('cloudwatch')
s3client = boto3.client('s3')

# Get a list of all buckets
allbuckets = s3client.list_buckets()

# Header Line for the output going to standard out
print('Bucket'.ljust(45) + 'Size in Bytes'.rjust(25))

# Iterate through each bucket
for bucket in allbuckets['Buckets']:
    # For each bucket item, look up the corresponding metrics from CloudWatch
    response = cw.get_metric_statistics(Namespace='AWS/S3',
                                        MetricName='BucketSizeBytes',
                                        Dimensions=[
                                            {'Name': 'BucketName', 'Value': bucket['Name']},
                                            {'Name': 'StorageType', 'Value': 'StandardStorage'}
                                        ],
                                        Statistics=['Average'],
                                        Period=3600,
                                        StartTime=(now-datetime.timedelta(days=1)).isoformat(),
                                        EndTime=now.isoformat()
                                        )
    # S3 reports this metric once a day, so the window normally holds a single datapoint; we just report on it.
    for item in response["Datapoints"]:
        # Note the use of "{:,}".format: the comma in the format spec
        # inserts thousands separators, e.g. 1234567 -> 1,234,567.
        print(bucket["Name"].ljust(45) + "{:,}".format(int(item["Average"])).rjust(25))
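
The second metric, NumberOfObjects, can be read the same way. Here is a minimal sketch, assuming you pass in a boto3 CloudWatch client; the function name latest_object_count and the 2-day window are my own choices. For this metric the StorageType dimension is AllStorageTypes.

```python
import datetime


def latest_object_count(cw, bucket_name, now=None):
    """Return the most recent NumberOfObjects value for a bucket, or None.

    cw is assumed to be a boto3 CloudWatch client, e.g. boto3.client('cloudwatch').
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    response = cw.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='NumberOfObjects',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket_name},
            {'Name': 'StorageType', 'Value': 'AllStorageTypes'},
        ],
        Statistics=['Average'],
        Period=86400,  # one day, since the metric is reported daily
        StartTime=now - datetime.timedelta(days=2),
        EndTime=now,
    )
    datapoints = response['Datapoints']
    if not datapoints:
        # No datapoint yet, e.g. a brand-new bucket
        return None
    # A 2-day window may hold more than one datapoint; keep the newest
    newest = max(datapoints, key=lambda d: d['Timestamp'])
    return int(newest['Average'])
```

Looking back 2 days instead of 1 is a guess on my part to avoid an empty result when the daily datapoint has not been published yet.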

6 Responses to Getting the Size of an S3 Bucket using Boto3 for AWS

  1. work tree says:

    Oh my goodness! Incredible article dude! Thank you so much,
    However I am encountering issues with your
    RSS. I don’t understand why I am unable to join it. Is there anybody
    else getting similar RSS issues? Anyone who knows the
    solution can you kindly respond? Thanx!!

  2. Varun says:

    Works like a charm. Awesome.

  3. Mat says:

    Well, this saved my day! Thank you very much! 🙂

  4. Andy says:

    This is exactly what I needed . thanks so much

  5. Basavaraj says:

    It's really awesome 🙂 Can we send the print output by email using boto.ses? I am new to Python; if you can share the code it will be a great help.

    • mike says:

      To send mail through SES, you don't need a Boto3 call. SES, once set up, is just another SMTP email server. You would send mail to the SES host just like you would to any other mail server. You have the SES host, port, ID, and password.
