Getting the Size of an S3 Bucket using Boto3 for AWS

I’m writing this on 9/14/2016. I note the date because the size of an S3 bucket seems like an important piece of information, yet AWS does not offer an easy way to retrieve it. I fully expect them to add that functionality at some point. As of this date, I could come up with only two methods to get the size of a bucket. The first is to list every object in the bucket and keep a running total of the object sizes. That method does work, but I found that for a bucket with many thousands of objects it could take hours per bucket.
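For reference, the listing approach can be sketched roughly as follows. This is a minimal sketch, not the script I ended up using: the function name `bucket_size_by_listing` is made up for illustration, and it assumes you pass in a client created with `boto3.client('s3')`.

```python
def bucket_size_by_listing(s3_client, bucket_name):
    """Sum the Size of every object in a bucket, one page of keys at a time.

    s3_client is assumed to be a boto3 S3 client; the function name is
    illustrative only.
    """
    paginator = s3_client.get_paginator('list_objects_v2')
    total_bytes = 0
    for page in paginator.paginate(Bucket=bucket_name):
        # 'Contents' is absent from the page when no objects are returned.
        for obj in page.get('Contents', []):
            total_bytes += obj['Size']
    return total_bytes
```

Each page holds at most 1,000 keys, so a large bucket needs many round trips, which is where the time goes.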

A better method uses AWS CloudWatch metrics instead. For every S3 bucket, AWS also publishes two CloudWatch storage metrics (BucketSizeBytes and NumberOfObjects), and I use the first one to pull the Average size over a set period, usually 1 day.
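As a side note, BucketSizeBytes is recorded separately for each StorageType dimension, so it can help to check which storage types CloudWatch actually has for a bucket. A minimal sketch, assuming a client from `boto3.client('cloudwatch')`; the function name `storage_types_for_bucket` is hypothetical:

```python
def storage_types_for_bucket(cw_client, bucket_name):
    """Return the StorageType values CloudWatch lists for a bucket's size metric.

    cw_client is assumed to be a boto3 CloudWatch client; the function name
    is illustrative only.
    """
    response = cw_client.list_metrics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[{'Name': 'BucketName', 'Value': bucket_name}],
    )
    storage_types = set()
    for metric in response['Metrics']:
        for dimension in metric['Dimensions']:
            if dimension['Name'] == 'StorageType':
                storage_types.add(dimension['Value'])
    return sorted(storage_types)
```

This reads only the first page of results, which should be enough for a single bucket. An empty list suggests CloudWatch has no size metric for that bucket yet.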

Here’s what I came up with:

 
import boto3
import datetime

# CloudWatch timestamps are in UTC, so build the query window in UTC too
now = datetime.datetime.now(datetime.timezone.utc)

cw = boto3.client('cloudwatch')
s3client = boto3.client('s3')

# Get a list of all buckets
allbuckets = s3client.list_buckets()

# Header Line for the output going to standard out
print('Bucket'.ljust(45) + 'Size in Bytes'.rjust(25))

# Iterate through each bucket
for bucket in allbuckets['Buckets']:
    # For each bucket item, look up the corresponding metrics from CloudWatch
    response = cw.get_metric_statistics(Namespace='AWS/S3',
                                        MetricName='BucketSizeBytes',
                                        Dimensions=[
                                            {'Name': 'BucketName', 'Value': bucket['Name']},
                                            {'Name': 'StorageType', 'Value': 'StandardStorage'}
                                        ],
                                        Statistics=['Average'],
                                        Period=3600,
                                        StartTime=(now-datetime.timedelta(days=1)).isoformat(),
                                        EndTime=now.isoformat()
                                        )
    # CloudWatch normally returns a single datapoint for this window, so we just report on it.
    for item in response["Datapoints"]:
        print(bucket["Name"].ljust(45) + "{:,}".format(int(item["Average"])).rjust(25))
        # Note the use of "{:,}".format, which adds thousands separators.
        # It is a shorthand for formatting output
        # that I discovered only recently.
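The script above prints nothing for a bucket when the window contains no datapoint. One possible variation, sketched here under assumptions rather than taken from my original script, is to widen the window and keep only the most recent datapoint. The function name `latest_bucket_size` and the default of two days are illustrative choices, and the one-day Period assumes the storage metric is reported once per day.

```python
import datetime


def latest_bucket_size(cw_client, bucket_name,
                       storage_type='StandardStorage', days=2):
    """Return the most recent BucketSizeBytes average, or None if no data.

    cw_client is assumed to be a boto3 CloudWatch client. The name and the
    default window are illustrative, not part of the original script.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    response = cw_client.get_metric_statistics(
        Namespace='AWS/S3',
        MetricName='BucketSizeBytes',
        Dimensions=[
            {'Name': 'BucketName', 'Value': bucket_name},
            {'Name': 'StorageType', 'Value': storage_type},
        ],
        Statistics=['Average'],
        Period=86400,  # one day, assuming a daily storage metric
        StartTime=now - datetime.timedelta(days=days),
        EndTime=now,
    )
    datapoints = response['Datapoints']
    if not datapoints:
        return None
    # Datapoints are not guaranteed to be ordered, so pick the newest one.
    latest = max(datapoints, key=lambda point: point['Timestamp'])
    return int(latest['Average'])
```

Returning None instead of skipping the bucket makes it easier to print a placeholder row for buckets that have no metric data.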

17 Responses to Getting the Size of an S3 Bucket using Boto3 for AWS

  1. work tree says:

    Oh my goodness! Incredible article dude! Thank you so much,
    However I am encountering issues with your
    RSS. I don’t understand why I am unable to join it. Is there anybody
    else getting similar RSS issues? Anyone who knows the
    solution can you kindly respond? Thanx!!

  2. Varun says:

    Works like a charm. Awesome.

  3. Mat says:

    Well, this saved my day! Thank you very much! 🙂

  4. Andy says:

    This is exactly what I needed . thanks so much

  5. Basavaraj says:

    It’s really awesome 🙂 Can we send the printed output by email using boto.ses? I am new to Python; if you can share the code, it will be a great help.

    • mike says:

      To send mail through SES, you don’t need a Boto3 call. SES, once set up, is just another SMTP email server. You would send mail to the SES host just like you would to any other mail server, using the SES host, port, ID, and password.
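A minimal sketch of what this reply describes, using only the Python standard library. The function names, the subject line, and every address, host, and credential in the usage note are placeholders, not values from the post.

```python
import smtplib
from email.message import EmailMessage


def build_report_message(sender, recipient, report_text):
    """Wrap the report text in a plain-text email message."""
    message = EmailMessage()
    message['Subject'] = 'S3 bucket sizes'  # placeholder subject
    message['From'] = sender
    message['To'] = recipient
    message.set_content(report_text)
    return message


def send_report(message, host, port, username, password):
    """Send the message through an SMTP server that supports STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(message)
```

Usage would look something like `send_report(build_report_message('me@example.com', 'you@example.com', report), 'email-smtp.us-east-1.amazonaws.com', 587, smtp_user, smtp_password)`, where the host is the SMTP endpoint for your SES region and the username and password are your SES SMTP credentials.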

  6. Andrew FIgaroa says:

    Wow!!! this is awesome! just what I needed man….

  8. eswanth says:

    That’s great work, bro, but I want to put this in a CSV file. How can I do it?

  9. Manasi says:

    This is Awesome!!!

  10. Sam says:

    Thanks Mike! Your script saved several hours of pain for me.

  11. pms says:

    It’s not showing anything in the output:

    pms@:~/asset-python-script$ python s3size.py
    Bucket Size in Bytes
    pms@:~/asset-python-script$

  12. David Frey says:

    First, this is a great method for tackling the problem. Thank you.

    One thing: I noticed that at times I would get no data back. I determined that this depended on the time of day that I ran the report. You can figure out when your data is most complete or, if you run this ad hoc, you can change the StartTime from ‘days=1’ to ‘days=2’. The data will not be quite the same, but I did some tweaking to get this working. I’m very happy with it.

    Thank you.

  13. Sahil says:

    The same issue is happening for me; it is not able to pull the metrics.

  14. Google is my Daddy says:

    Damn, hit the jackpot. This is f**king awesome. Bro, I love you for this.

  15. sam says:

    For some buckets the size is missing, e.g. if only some buckets have CloudWatch metrics, or there could be buckets with other storage types.
