What is AWS S3?

Amazon S3 (Simple Storage Service) is an object storage service designed to store and retrieve any amount of data from anywhere on the Internet. It is one of the fundamental AWS services.

Main Features

1. Durability and Availability

  • Durability: 99.999999999% (11 nines)
  • Availability: 99.99%
  • Automatic replication across multiple availability zones

2. Scalability

  • Virtually unlimited capacity
  • No need to provision capacity
  • Automatically scales according to demand

3. Security

  • Encryption in transit (SSL/TLS)
  • Encryption at rest (SSE-S3, SSE-KMS)
  • Granular access control (IAM, Bucket Policies, ACLs)
  • Object versioning
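
For example, server-side encryption can be requested per upload from the CLI. A minimal sketch, assuming a bucket named my-bucket and a KMS key alias alias/my-key (both placeholders):

```shell
# Upload with S3-managed encryption (SSE-S3)
aws s3 cp file.txt s3://my-bucket/ --sse AES256

# Upload with a customer-managed KMS key (SSE-KMS)
aws s3 cp file.txt s3://my-bucket/ \
    --sse aws:kms \
    --sse-kms-key-id alias/my-key
```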

Key Concepts

Buckets

Buckets are the containers for objects stored in S3:

# Create bucket
aws s3 mb s3://my-bucket --region us-east-1

# List buckets
aws s3 ls

Objects

Objects are the files, together with their metadata, stored in S3:

# Upload file
aws s3 cp file.txt s3://my-bucket/

# Download file
aws s3 cp s3://my-bucket/file.txt ./

# Sync directory
aws s3 sync ./directory/ s3://my-bucket/

Storage Classes

Class                  Use                      Cost      Availability
Standard               Frequent data            High      99.99%
Intelligent-Tiering    Variable access          Variable  99.9%
Standard-IA            Infrequent access        Medium    99.9%
One Zone-IA            Non-critical data        Low       99.5%
Glacier                Long-term archive        Very low  99.99%
Glacier Deep Archive   Very long-term archive   Minimum   99.99%

Use Cases

1. Static Website Hosting

Perfect for static sites generated with Hugo, Jekyll, and similar tools:

# Configure bucket for web hosting
aws s3 website s3://my-website/ \
    --index-document index.html \
    --error-document 404.html

# Policy for public access
cat > policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-website/*"
  }]
}
EOF

aws s3api put-bucket-policy \
    --bucket my-website \
    --policy file://policy.json

2. Backup and Archive

# Backup with lifecycle rules
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-backup \
    --lifecycle-configuration file://lifecycle.json

3. Data Lake

Centralized storage for analytics:

# Configure for use with Athena
aws s3api put-bucket-encryption \
    --bucket my-datalake \
    --server-side-encryption-configuration '{
      "Rules": [{
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "AES256"
        }
      }]
    }'

4. Content Distribution

Combined with CloudFront:

  • Images
  • Videos
  • Software downloads
  • Application updates

Security and Permissions

IAM Policies

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}

Bucket Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSSLRequestsOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}

Versioning

Protection against accidental deletion:

# Enable versioning
aws s3api put-bucket-versioning \
    --bucket my-bucket \
    --versioning-configuration Status=Enabled

# List versions
aws s3api list-object-versions --bucket my-bucket
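
With versioning enabled, a delete only adds a delete marker on top of the object; removing that marker restores it. A sketch, where the version ID is a placeholder you would take from the list-object-versions output:

```shell
# Find the delete marker's version ID
aws s3api list-object-versions --bucket my-bucket --prefix file.txt

# Remove the delete marker to restore the object
aws s3api delete-object \
    --bucket my-bucket \
    --key file.txt \
    --version-id "<delete-marker-version-id>"
```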

Lifecycle Policies

Automate transitions between storage classes:

{
  "Rules": [
    {
      "Id": "Archive-old-logs",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}

Replication

Cross-Region Replication (CRR)

# Configure replication between regions
aws s3api put-bucket-replication \
    --bucket my-source-bucket \
    --replication-configuration file://replication.json
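
The referenced replication.json must name an IAM role that S3 can assume and a destination bucket. A sketch, where both ARNs are placeholders:

```json
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "ReplicateAll",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::my-destination-bucket"
      }
    }
  ]
}
```

Versioning must be enabled on both the source and destination buckets before replication takes effect.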

Events and Notifications

Trigger actions based on S3 events:

# Configure Lambda notification
aws s3api put-bucket-notification-configuration \
    --bucket my-bucket \
    --notification-configuration file://notification.json

Available events:

  • s3:ObjectCreated:*
  • s3:ObjectRemoved:*
  • s3:ObjectRestore:*
  • s3:Replication:*

Monitoring and Logs

Server Access Logs

# Enable access logs
aws s3api put-bucket-logging \
    --bucket my-bucket \
    --bucket-logging-status '{
      "LoggingEnabled": {
        "TargetBucket": "my-logs-bucket",
        "TargetPrefix": "access-logs/"
      }
    }'

CloudWatch Metrics

Automatic metrics:

  • BucketSizeBytes
  • NumberOfObjects
  • AllRequests
  • GetRequests
  • PutRequests

Cost Optimization

1. Use Appropriate Storage Classes

# Change storage class
aws s3 cp s3://my-bucket/file.txt \
    s3://my-bucket/file.txt \
    --storage-class INTELLIGENT_TIERING

2. Implement Lifecycle Policies

Automatically move old objects to cheaper classes.

3. Delete Old Versions

Rule to delete non-current versions after 30 days:

{
  "Rules": [{
    "Id": "DeleteOldVersions",
    "Status": "Enabled",
    "NoncurrentVersionExpiration": {
      "NoncurrentDays": 30
    }
  }]
}

4. Compress Files

Reduce size of stored data:

tar -czf backup.tar.gz /data/
aws s3 cp backup.tar.gz s3://my-bucket/backups/

Best Practices

  1. Enable versioning for critical data
  2. Use encryption (SSE-S3 or SSE-KMS)
  3. Implement Lifecycle Policies to optimize costs
  4. Configure replication for disaster recovery
  5. Enable MFA Delete for critical buckets
  6. Use IAM roles instead of access keys
  7. Monitor with CloudWatch and CloudTrail
  8. Block public access by default
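
Blocking public access (practice 8) is a single CLI call per bucket:

```shell
# Block all forms of public access at the bucket level
aws s3api put-public-access-block \
    --bucket my-bucket \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```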

CloudFront Integration

Improve performance and security:

# Create OAI (Origin Access Identity)
aws cloudfront create-cloud-front-origin-access-identity \
    --cloud-front-origin-access-identity-config '{
      "CallerReference": "my-site-'$(date +%s)'",
      "Comment": "OAI for my-website"
    }'

# Update bucket policy to allow only CloudFront
{
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E1234567890ABC"
    },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-website/*"
  }]
}

Useful Commands

# Calculate bucket size
aws s3 ls s3://my-bucket --recursive --human-readable --summarize

# Copy complete bucket
aws s3 sync s3://source-bucket s3://destination-bucket

# Delete all objects (be careful!)
aws s3 rm s3://my-bucket --recursive

# Grant temporary, expiring access to an object (presigned URL)
aws s3 presign s3://my-bucket/file.txt --expires-in 3600
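
A presigned URL works with any HTTP client, so the download side needs no AWS credentials:

```shell
# Generate the URL and fetch the object with curl
url=$(aws s3 presign s3://my-bucket/file.txt --expires-in 3600)
curl -o file.txt "$url"
```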

Limits

  • Maximum object size: 5 TB
  • Maximum PUT size: 5 GB (use multipart upload for larger files)
  • Number of buckets per account: 100 (soft limit, can be increased)
  • No limit on the number of objects per bucket
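
The high-level aws s3 commands switch to multipart upload automatically above a configurable threshold, so the 5 GB single-PUT limit rarely needs manual handling. The values below are examples:

```shell
# Raise the threshold and part size used for large uploads
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB
```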

Comparison with Other Services

Service   Use                     Access          Cost
S3        Object storage          HTTP/API        $$
EBS       Block storage for EC2   Attached        $$$
EFS       Shared file system      NFS             $$$$
Glacier   Long-term archive       Retrieval time  $

Useful Tools

  • AWS CLI: Command line client
  • s3cmd: Alternative client
  • CloudBerry: GUI client
  • Cyberduck: FTP/SFTP/S3 client
  • Terraform: IaC for S3