gon infra

Provision and manage AWS infrastructure for GON deployments. Each command is an opinionated wizard with cost estimates, account-mismatch protection, and a recoverable destroy path (Terraform state is archived rather than deleted, so a mistaken destroy can be undone until you clean the archive up). The family covers compute (infra:aws-ec2), object storage (infra:aws-s3), audit logging (infra:aws-cloudtrail), monitoring (infra:aws-alarms), and cost guardrails (infra:aws-budget).

Why this family exists

Setting up these services in the AWS console takes dozens of clicks per service and is exactly the kind of thing nobody remembers between uses (Block Public Access flags? bucket policies? CloudTrail data events? alarm thresholds?). gon infra:* reduces each one to a single command with secure-by-default settings, visible monthly cost, and a confirmation gate before any AWS API call.

Prerequisites

aws CLI in PATH (brew install awscli / winget install Amazon.AWSCLI)
terraform in PATH (brew install hashicorp/tap/terraform)
An AWS account with credentials configured (aws configure or AWS SSO)
IAM permissions covering ec2:*, s3:*, iam:*, cloudtrail:*, cloudwatch:*, sns:*, budgets:*, ssm:GetParameter*, sts:GetCallerIdentity

Run gon infra:doctor first — it checks all of the above and prints a tailored fix walkthrough for whatever is missing, including the full IAM policy JSON.

Typical workflow (greenfield AWS account)

gon infra:doctor — verify aws + terraform + IAM perms
gon infra:aws-budget — set this first: free, $0/mo, prevents surprise bills
gon infra:aws-cloudtrail — audit logging baseline (~$2/mo, compliance-ready)
gon infra:aws-ec2 — server (~$15/mo for small) + auto-chain to gon server:setup
gon infra:aws-alarms — attach monitoring to the EC2 instance (~$0.20/mo)
gon infra:aws-s3 — object storage for Laravel uploads (~$1/mo for small bucket)
gon infra:dns example.com --alias=aws-prod-1 — point a domain
gon server:add-project — deploy your project (see gon server)
gon infra:list — see everything with current cost

infra:doctor

Diagnose AWS prerequisites and walk through credential setup. Always start here on a fresh machine — or whenever you come back to gon infra:* after a few months and don't remember how the IAM user was set up.

gon infra:doctor                          # Check tools, auth, profiles, default VPC
gon infra:doctor --profile=staging        # Test a non-default profile
gon infra:doctor --region=eu-west-1       # Check default VPC in another region
gon infra:doctor --guide                  # Always print the full setup guide

What it checks

Tools — aws CLI and terraform installed (with version + path)
Authentication — sts:GetCallerIdentity succeeds; account ID, alias, ARN
IAM — iam:ListAccountAliases permission (best-effort)
Region — default VPC present in the chosen region (the wizard requires one)
Local profiles — what's in ~/.aws/credentials + ~/.aws/config
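The same checks can be approximated by hand when the doctor itself is not available. The sketch below is an assumption about roughly equivalent commands, not the doctor's actual implementation; the function names check_tool and manual_doctor are hypothetical, while the aws subcommands are standard AWS CLI calls.

```shell
# Report whether a tool is on PATH, together with its resolved path.
check_tool() {
  if path=$(command -v "$1" 2>/dev/null); then
    printf 'ok       %s (%s)\n' "$1" "$path"
  else
    printf 'missing  %s\n' "$1"
    return 1
  fi
}

# Run the remaining checks by hand (needs working AWS credentials).
# Usage: manual_doctor <region>
manual_doctor() {
  check_tool aws && check_tool terraform || return 1
  aws sts get-caller-identity --output json            # account ID and ARN
  aws iam list-account-aliases --query 'AccountAliases' --output text
  aws ec2 describe-vpcs --region "$1" \
    --filters Name=is-default,Values=true \
    --query 'Vpcs[].VpcId' --output text               # empty output: no default VPC
  aws configure list-profiles                          # profiles defined under ~/.aws
}
```

Calling manual_doctor eu-central-1 stops at the first missing tool; the AWS calls fail with their own error messages when credentials are absent.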

Setup walkthrough

When something fails, the doctor prints copy-pasteable steps for both common credential paths (IAM user + access key, or AWS SSO / IAM Identity Center) plus a ready-to-paste minimum IAM policy covering every infra:aws-* command. With --guide it shows the full walkthrough regardless of state.

infra:aws-ec2

Interactive wizard that provisions a minimal AWS stack (single EC2, Elastic IP, security group, key pair) and hands off to gon server:setup. Five steps: identity confirm, configuration, plan + cost, terraform apply, server:setup auto-chain.

gon infra:aws-ec2                          # Full interactive wizard
gon infra:aws-ec2 --dry-run                # Resolve config + show plan + cost, no AWS changes
gon infra:aws-ec2 --no-setup               # Provision but skip the server:setup chain
 
# Fully scripted (for CI, no prompts):
gon infra:aws-ec2 \
  --alias=aws-prod-1 \
  --region=eu-central-1 \
  --size=small \
  --disk=30 \
  --ssh-key=~/.ssh/id_ed25519.pub \
  --ssh-cidr=78.123.45.67/32 \
  --email=ssl@example.com \
  --no-interaction

What it creates

EC2 instance (Ubuntu 24.04 LTS, encrypted gp3 root, IMDSv2 required)
Elastic IP attached to the instance — static public IP for DNS
Security group: SSH from your IP only, HTTP/HTTPS from anywhere
SSH key pair (imports your ~/.ssh/id_ed25519.pub or generates a fresh one)

Resources are tagged ManagedBy=gon-cli + GonAlias=<alias>.

Wizard choices (kept narrow on purpose)

Region — eu-central-1 / eu-west-1 / eu-north-1 / us-east-1 / us-west-2
Size — small (t4g.small, ~$13/mo) / medium (t4g.medium, ~$26/mo) / large (t4g.large, ~$52/mo)
Disk — 30 / 50 / 100 GB gp3
SSH key — existing ~/.ssh/id_ed25519.pub, generate new, or custom path
SSH ingress — your detected public IP (default, recommended) or 0.0.0.0/0

CPU architecture auto-detect

The wizard probes the GHCR manifest for rozklad/gon-base: multi-arch → defaults to arm64 (t4g.* — ~20% cheaper); amd64-only → uses amd64 (t3.*); probe failed → asks explicitly. Override with --arch=arm64 or --arch=amd64.
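The decision rule can be sketched as a small function. This is an illustration under assumptions: pick_arch is a hypothetical name, and the input is assumed to be a space-separated platform list such as a manifest probe might report. The real probe and its output format are not documented here.

```shell
# pick_arch "<platforms>": map the probed platform list to the wizard default.
pick_arch() {
  case " $1 " in
    *" linux/arm64 "*) echo arm64 ;;   # arm64 image available: prefer t4g.*
    *" linux/amd64 "*) echo amd64 ;;   # amd64-only image: fall back to t3.*
    *)                 echo ask   ;;   # probe failed or unexpected: prompt the user
  esac
}

pick_arch "linux/amd64 linux/arm64"   # prints: arm64
pick_arch "linux/amd64"               # prints: amd64
pick_arch ""                          # prints: ask
```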

infra:aws-s3

Create a private S3 bucket + scoped IAM user with access keys, ready for Laravel filesystems.disks.s3. Output is a copy-pasteable .env block with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, AWS_BUCKET, AWS_URL.

gon infra:aws-s3                            # Interactive wizard
gon infra:aws-s3 --dry-run                  # Plan + cost, no creation
gon infra:aws-s3 --scope=per-env \
  --project=myapp --env=staging             # Scripted: per-env bucket
gon infra:aws-s3 --scope=standalone \
  --bucket=my-shared-assets-bucket          # Standalone bucket, custom name

Three scopes (wizard offers a picker)

per-env (recommended) — one bucket per project environment. gon-myapp-staging-storage-<account> + gon-myapp-production-storage-<account> are separate, so a staging deploy can never overwrite prod uploads.
per-project — single bucket shared across envs (cheaper, but staging and production write to the same bucket, so a staging deploy can overwrite prod uploads).
standalone — bucket independent of any gon project, named freely. Useful for shared assets, database dumps, cross-team buckets.
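For the per-env scope, the naming pattern shown above can be reproduced to predict a bucket name before running the wizard. A minimal sketch, assuming the pattern is exactly gon-<project>-<env>-storage-<account>; the helper name per_env_bucket is hypothetical, and the 63-character check reflects the general S3 bucket-name limit rather than anything gon is documented to enforce.

```shell
# per_env_bucket <project> <env> <account-id>
per_env_bucket() {
  name="gon-$1-$2-storage-$3"
  # S3 bucket names may be at most 63 characters long.
  if [ "${#name}" -gt 63 ]; then
    echo "bucket name too long: $name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

per_env_bucket myapp staging 123456789012
# prints: gon-myapp-staging-storage-123456789012
```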

Security defaults (no opt-out except where a flag is noted)

All four BlockPublicAccess flags ON (BlockPublicAcls + IgnorePublicAcls + BlockPublicPolicy + RestrictPublicBuckets)
SSE-S3 (AES256) encryption with bucket key enabled
Versioning enabled (override with --no-versioning) — recovery from accidental deletes
Intelligent-Tiering after 90 days (override with --no-tiering) — auto-cheaper for cold data
CORS allows any origin (so Laravel presigned-URL browser uploads work)

IAM user is bucket-scoped

The generated IAM user (gon-<short-id>-s3) gets an inline policy with s3:ListBucket + GetBucketLocation on the bucket itself, plus GetObject / PutObject / DeleteObject + variants on arn:aws:s3:::<bucket>/*. Zero access to any other AWS resource. If the access key leaks, the blast radius is one bucket.
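The shape of such a bucket-scoped policy might look like the output of the sketch below. It is an approximation built from the actions named above: the function name s3_user_policy is hypothetical, and the "variants" the real policy adds are deliberately left out because they are not listed here.

```shell
# s3_user_policy <bucket>: print a bucket-scoped IAM policy document.
s3_user_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::$1"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}

s3_user_policy gon-myapp-staging-storage-123456789012
```

Bucket-level actions attach to the bucket ARN and object-level actions to the /* ARN; no other resource appears in the document, which is what limits the blast radius to one bucket.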

Credentials output

Print-only by design — no auto-write to local .env (cross-project contamination risk) or push to remote (would need the project to already exist). The Secret Access Key is shown ONCE; if you lose it, rotate via AWS Console. The wizard hints at gon env:set commands you can paste to push the values to a server later.

infra:aws-cloudtrail

Set up an account-wide multi-region audit trail with a dedicated S3 log bucket. One trail captures every AWS API call across every region into s3://gon-cloudtrail-<account>-<region>/AWSLogs/....

gon infra:aws-cloudtrail                    # Defaults: management events only, no insights, no data events
gon infra:aws-cloudtrail --insights         # + Insights ($0.35 per 100k events analyzed)
gon infra:aws-cloudtrail --data-events      # + S3 object access logging (cost trap warning!)
gon infra:aws-cloudtrail --region=us-east-1
gon infra:aws-cloudtrail --dry-run

Conservative defaults (cost-aware)

Management events ONLY — every IAM change, every EC2 launch, every S3 bucket create. First copy is free per account.
Data events OFF — opt-in via --data-events. S3 object reads/writes generate one log event each → buckets with heavy traffic can rack up hundreds of dollars.
Insights OFF — opt-in via --insights. Anomaly detection at $0.35 per 100k events analyzed.
Tamper detection ON — enable_log_file_validation writes digest files to the bucket so log tampering is detectable.
Glacier IR transition after 30 days — Standard storage for the recent month, ~80% cheaper Glacier IR for older logs.
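To get a feel for the per-event pricing before opting in, the arithmetic can be scripted. A rough sketch: events_cost is a hypothetical helper, the $0.35 rate is the Insights figure quoted above, and the event volume is made up for illustration. Pass the current data-event rate from the AWS pricing page to estimate --data-events the same way.

```shell
# events_cost <events> <usd-per-100k-events>
events_cost() {
  awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", n / 100000 * rate }'
}

events_cost 5000000 0.35   # 5M events analyzed by Insights; prints 17.50
```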

Bucket policy

The dedicated log bucket has a tightly-scoped policy permitting only cloudtrail.amazonaws.com as service principal, only s3:GetBucketAcl + s3:PutObject, only into the AWSLogs/<account>/* prefix, only with bucket-owner-full-control ACL, and only from this trail's ARN. No cross-account access possible.
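A policy with those five restrictions typically follows the two-statement layout AWS documents for CloudTrail log buckets. The sketch below prints one under assumptions: trail_bucket_policy, the Sid values, and the example trail ARN are hypothetical, and the policy gon actually writes may differ in detail.

```shell
# trail_bucket_policy <bucket> <account-id> <trail-arn>
trail_bucket_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AclCheck",
      "Effect": "Allow",
      "Principal": { "Service": "cloudtrail.amazonaws.com" },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::$1",
      "Condition": { "StringEquals": { "aws:SourceArn": "$3" } }
    },
    {
      "Sid": "Write",
      "Effect": "Allow",
      "Principal": { "Service": "cloudtrail.amazonaws.com" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::$1/AWSLogs/$2/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control",
          "aws:SourceArn": "$3"
        }
      }
    }
  ]
}
EOF
}

trail_bucket_policy gon-cloudtrail-123456789012-eu-central-1 123456789012 \
  arn:aws:cloudtrail:eu-central-1:123456789012:trail/example-trail
```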

infra:aws-alarms

Attach two CloudWatch alarms to a gon-managed EC2 instance: CPU > 80% sustained 15 min (warning), and StatusCheckFailed > 0 for 5 min (critical, instance unreachable). Both feed a shared gon-alarms SNS topic with email subscription.

gon infra:aws-alarms                                  # Interactive: picks from gon-managed instances
gon infra:aws-alarms --instance=aws-prod-1 \
  --email=ops@example.com                             # Use server alias from servers.json
gon infra:aws-alarms --instance=i-0abc... \
  --email=ops@example.com --region=eu-central-1       # Raw instance ID
gon infra:aws-alarms --cpu-threshold=90 \
  --instance=aws-prod-1 --email=ops@example.com       # Higher CPU threshold (90% instead of the 80% default)
gon infra:aws-alarms --dry-run

Why these two metrics

CPUUtilization — 3 datapoints × 5 min = 15-minute window. Catches runaway processes / under-provisioning while tolerating short scheduled-job spikes.
StatusCheckFailed — first datapoint, 5 min window. Fires when AWS's own checks fail (kernel hung, network unreachable, hardware failure). The box is effectively down; alert immediately.

Memory and disk alarms intentionally omitted — they require the CloudWatch Agent installed on the host (cwagent), planned for a future infra:aws-cwagent command.
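For reference, the CPU alarm described above maps onto a single aws cloudwatch put-metric-alarm call: a 300-second period evaluated 3 times gives the 15-minute window. The sketch prints the command instead of running it. The function name, the gon-cpu-<instance> alarm name, and the Average statistic are assumptions; only the period, datapoint count, and threshold come from the text.

```shell
# cpu_alarm_cmd <instance-id> <sns-topic-arn> [threshold]
# Prints the put-metric-alarm command; nothing is sent to AWS.
cpu_alarm_cmd() {
  cat <<EOF
aws cloudwatch put-metric-alarm \\
  --alarm-name "gon-cpu-$1" \\
  --namespace AWS/EC2 --metric-name CPUUtilization \\
  --dimensions Name=InstanceId,Value=$1 \\
  --statistic Average --period 300 --evaluation-periods 3 \\
  --threshold ${3:-80} --comparison-operator GreaterThanThreshold \\
  --alarm-actions "$2"
EOF
}

cpu_alarm_cmd i-0123456789abcdef0 arn:aws:sns:eu-central-1:123456789012:gon-alarms
```

The StatusCheckFailed alarm would follow the same pattern with --metric-name StatusCheckFailed, --evaluation-periods 1, and --threshold 0.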

Shared SNS topic

Every infra:aws-alarms run targets one topic per account: gon-alarms. Each call adds an email subscription to it. You must click the AWS confirmation link in the email within 3 days; until you do, alarms still fire but no notification is sent. gon infra:list can't detect the pending state, so re-check by visiting AWS Console > SNS > Topics > gon-alarms > Subscriptions.
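The pending state can also be checked from the command line, since SNS reports the literal string PendingConfirmation as the subscription ARN until the link is clicked. A sketch under assumptions: both function names are hypothetical, and the topic is assumed to live in the region you pass in.

```shell
# gon_alarms_topic_arn <region> <account-id>: standard SNS topic ARN layout.
gon_alarms_topic_arn() {
  printf 'arn:aws:sns:%s:%s:gon-alarms\n' "$1" "$2"
}

# pending_subscriptions <region> <account-id>
# Lists endpoints that have not confirmed yet (needs AWS credentials).
pending_subscriptions() {
  aws sns list-subscriptions-by-topic --region "$1" \
    --topic-arn "$(gon_alarms_topic_arn "$1" "$2")" \
    --query "Subscriptions[?SubscriptionArn=='PendingConfirmation'].Endpoint" \
    --output text
}
```

Empty output from pending_subscriptions eu-central-1 123456789012 would mean every subscriber on the topic has confirmed.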

infra:aws-budget

Set a monthly USD spend cap with email notifications at 50%, 80%, 100% (actual), and 100% (forecast). The forecast alert is the safety net — AWS predicts month-end spend 1-2 weeks ahead and fires when that prediction crosses 100%, giving runway to throttle before the actual breach.

gon infra:aws-budget                                  # Interactive
gon infra:aws-budget --limit=50 \
  --email=jan@example.com                             # Scripted, single subscriber
gon infra:aws-budget --limit=200 \
  --email=ops@example.com,ceo@example.com             # Multiple subscribers
gon infra:aws-budget --name=staging-cap --limit=20 \
  --email=ops@example.com

Why this matters

One forgotten --data-events on CloudTrail, one busy spider hitting a non-cached endpoint, one accidentally-exposed S3 bucket: a $5/month account quietly becomes $500. Budget is free ($0/mo, AWS gives the first two budgets per account on the free tier) and the four notifications give multiple chances to react before the surprise.

Notification thresholds

50% actual → early warning ("you're running hotter than expected this month")
80% actual → approaching cap ("look at where the money's going")
100% actual → cap reached ("the thing you said you'd cap at $50 is now at $50")
100% forecast → AWS predicts month will end above cap ("you'll cross $50 by end of month at this rate")
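The dollar amounts behind each threshold follow directly from the limit. A small sketch (budget_thresholds is a hypothetical helper, not part of gon) that prints them for a given --limit value:

```shell
# budget_thresholds <limit-usd>
budget_thresholds() {
  awk -v limit="$1" 'BEGIN {
    split("50 80 100", pct, " ")
    for (i = 1; i <= 3; i++)
      printf "actual %d%% = $%.2f\n", pct[i], limit * pct[i] / 100
    printf "forecast 100%% = $%.2f\n", limit
  }'
}

budget_thresholds 50
# actual 50% = $25.00
# actual 80% = $40.00
# actual 100% = $50.00
# forecast 100% = $50.00
```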

No auto-shutdown: Budget only sends email. Hard caps via Budget Actions are out of scope (they require an IAM role + Lambda) but can be set up manually in the AWS Console.

infra:list

Show every gon-managed AWS resource with its current cost estimate. Two tables: SERVERS (from infra:aws-ec2) and AWS RESOURCES (S3, CloudTrail, alarms, budget).

gon infra:list                # Local servers.json view + cost estimates
gon infra:list --check        # Also call AWS to verify each instance still exists
gon infra:list --check --profile=prod

--check calls ec2:DescribeInstances per row to detect drift (someone terminated an instance from the AWS console). Without it, the list is read entirely from ~/.gon/servers.json (offline-capable). The AWS RESOURCES table doesn't currently support drift check — drift detection per resource type is planned for a future iteration.

Sample output

SERVERS
ALIAS         PROVIDER  REGION         IP             INSTANCE     COST/MO    STATE
aws-prod-1    aws_ec2   eu-central-1   3.121.45.78    t4g.small    ~$15.56    tracked
 
AWS RESOURCES
ID                          TYPE         REGION         COST/MO     STATE
budget-095713295289        budget       (account)      $0.00       tracked
cloudtrail-095713295289    cloudtrail   eu-central-1   ~$0.05      tracked
alarms-i-0b823ba17...      ec2_alarms   eu-central-1   $0.20       tracked
s3-myapp-staging           s3_bucket    eu-central-1   ~$1.23      tracked

infra:dns

Print copy-pasteable DNS records for pointing a domain at a gon-managed server. Read-only — never edits DNS for you.

gon infra:dns example.com                                 # Generic zone-file format
gon infra:dns example.com --provider=cloudflare           # Cloudflare-specific UI walkthrough
gon infra:dns example.com --provider=route53              # AWS Route 53 commands + change-batch JSON
gon infra:dns example.com --alias=aws-prod-1              # Pick server when more than one is managed
gon infra:dns example.com --ip=1.2.3.4                    # Bypass servers.json (escape hatch)
gon infra:dns example.com --ttl=60                        # Custom TTL (default 300s)
gon infra:dns example.com --check                         # dig the live records, report drift

Records it recommends

Two A records — apex + wildcard — both pointing at the server's Elastic IP:

example.com.        300  IN  A  3.121.45.78
*.example.com.      300  IN  A  3.121.45.78

The wildcard handles every subdomain (app.example.com, staging.example.com, …) without extra records because Traefik does per-host HTTP-01 ACME challenges, not DNS-01.
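The two records are simple enough to generate for any domain and IP. A minimal sketch of the generic zone-file output, assuming tab-separated fields and the 300-second default TTL; dns_records is a hypothetical name, not the command's internals.

```shell
# dns_records <domain> <ip> [ttl]
dns_records() {
  ttl=${3:-300}
  printf '%s.\t%s\tIN\tA\t%s\n'   "$1" "$ttl" "$2"   # apex record
  printf '*.%s.\t%s\tIN\tA\t%s\n' "$1" "$ttl" "$2"   # wildcard record
}

dns_records example.com 3.121.45.78
```

Once the records are published, dig +short A example.com should return the same IP.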

Cloudflare gotcha

--provider=cloudflare spells out the Proxy = DNS only (grey cloud) requirement: Cloudflare's orange-cloud proxy terminates TLS at the edge, which prevents Let's Encrypt's HTTP-01 challenge from reaching Traefik on port 80. Switch to orange + SSL mode "Full (strict)" only after the first cert is issued.

Route 53 helper

--provider=route53 prints aws route53 create-hosted-zone + change-resource-record-sets commands and stashes a ready-to-use change-batch JSON file in /tmp/gon-dns-<hash>.json.
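A change-batch file for the apex and wildcard A records could look like the output below. This is a sketch in the standard Route 53 change-batch format, not a copy of the file gon writes: change_batch is a hypothetical name and the UPSERT action is an assumption.

```shell
# change_batch <domain> <ip> [ttl]: print a Route 53 change batch as JSON.
change_batch() {
  ttl=${3:-300}
  cat <<EOF
{
  "Changes": [
    { "Action": "UPSERT", "ResourceRecordSet": {
        "Name": "$1.", "Type": "A", "TTL": $ttl,
        "ResourceRecords": [ { "Value": "$2" } ] } },
    { "Action": "UPSERT", "ResourceRecordSet": {
        "Name": "*.$1.", "Type": "A", "TTL": $ttl,
        "ResourceRecords": [ { "Value": "$2" } ] } }
  ]
}
EOF
}

change_batch example.com 3.121.45.78
```

Saved to a file, the output can be applied with aws route53 change-resource-record-sets --hosted-zone-id <zone-id> --change-batch file://<path>.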

infra:destroy

Tear down a gon-managed AWS stack or AWS resource. Same command for both — pass either an EC2 server alias or an AWS resource ID. Multi-step confirmation with type-specific safety gates.

# EC2 server stack
gon infra:destroy aws-prod-1                # Interactive: prints resources, requires typing alias
gon infra:destroy aws-prod-1 --yes          # Skip the typed-alias confirmation
gon infra:destroy aws-prod-1 --profile=prod # Use a non-default profile
 
# AWS resources (S3 / CloudTrail / Alarms / Budget) — same command, resource ID instead of alias
gon infra:destroy budget-095713295289       # Free, simplest
gon infra:destroy cloudtrail-095713295289   # Trail stops; log bucket is kept unless it is empty
gon infra:destroy alarms-i-0b823ba17...     # Drops 2 alarms; shared SNS topic stays
gon infra:destroy s3-myapp-staging --force-empty   # --force-empty purges bucket first

Safety gates

AWS account confirm — refuses to proceed if sts:GetCallerIdentity doesn't match the account where the resource was created.
Type-specific risk reminder — S3: data + IAM user + access keys gone. CloudTrail: trail stops, audit history in bucket survives. Alarms: instance loses monitoring. Budget: cost cap notifications stop.
Project warning (EC2 only) — if any servers.json projects are deployed on this host, they're listed before the prompt.
Typed ID confirmation — you must type the alias or resource ID verbatim. --yes skips this gate (still prints the account check).
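The first gate amounts to comparing two account IDs. A sketch of that comparison, assuming the recorded account ID is available to the caller; same_account is a hypothetical name and the error message is made up for illustration.

```shell
# same_account <recorded-account-id> <current-account-id>
same_account() {
  if [ "$1" != "$2" ]; then
    echo "refusing: resource belongs to account $1, credentials are for $2" >&2
    return 1
  fi
}

# Typical call (needs AWS credentials):
#   same_account 095713295289 \
#     "$(aws sts get-caller-identity --query Account --output text)"
```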

S3 force-empty

Terraform refuses to delete a non-empty S3 bucket — the destroy fails with BucketNotEmpty. Pass --force-empty to run aws s3 rm s3://<bucket> --recursive first. This is irreversible — bucket data is gone, versions and all. CloudTrail's log bucket has force_destroy=false hardcoded so you can't accidentally wipe audit history with a single typo.

Recoverable archive

Instead of rm -rf, the state directory is moved to ~/.gon/infra/_archived/<id>-<timestamp>/. If you destroy the wrong stack, the Terraform state is still there — re-create the resources by re-running the wizard with the same alias and copying the archived state back. Auto-cleanup is not implemented; manually rm archives older than a week or so.
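Because cleanup is manual, a small helper can list the archives that are due for removal. A sketch, assuming archive age is judged by directory modification time and that find supports -mindepth and -maxdepth (GNU and BSD find do); old_archives is a hypothetical name.

```shell
# old_archives [dir] [days]: list archived stacks older than N days (default 7).
old_archives() {
  find "${1:-$HOME/.gon/infra/_archived}" -mindepth 1 -maxdepth 1 \
    -type d -mtime +"${2:-7}"
}

# Review the list first, then delete:
#   old_archives | while IFS= read -r dir; do rm -rf "$dir"; done
```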

State storage

Everything is local to your home directory. No cloud backend by design — single-user MVP.

~/.gon/
├── config.json                       # GitHub auth (existing)
├── servers.json                      # Server registry (extended with `infra` block + `aws_resources` top-level key)
└── infra/
    ├── <ec2-alias>/                  # one dir per EC2 stack
    │   ├── main.tf, variables.tf, outputs.tf, versions.tf
    │   ├── terraform.tfstate, terraform.tfstate.backup
    │   ├── terraform.tfvars
    │   ├── snapshots/<timestamp>.tfstate    # auto-snapshot before each apply / destroy
    │   └── id_ed25519, id_ed25519.pub       # only when "generate new" was picked
    ├── s3-<short-id>/                # one dir per S3 bucket
    ├── cloudtrail-<account>/         # one dir per trail
    ├── alarms-<instance-id>/         # one dir per alarms set
    ├── budget-<account>/             # one dir per budget
    └── _archived/<id>-<timestamp>/   # destroyed stacks land here, recoverable

If you reinstall your OS without backing this up, you lose the ability to terraform destroy from gon. Manual cleanup via the AWS console (or per-service aws ... CLI) still works — every resource carries the ManagedBy=gon-cli tag. aws_resources in servers.json is keyed by canonical IDs (s3-myapp-staging, budget-095713295289, …) so cross-references with infra:list output are unambiguous.
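Both halves of that advice can be scripted. The sketch below is an assumption about a reasonable workflow rather than a gon feature: backup_gon_state and tagged_resources are hypothetical names, while the Resource Groups Tagging API call is a standard AWS CLI command.

```shell
# backup_gon_state <out.tar.gz> [home-dir]: archive the whole ~/.gon directory.
backup_gon_state() {
  tar -czf "$1" -C "${2:-$HOME}" .gon
}

# tagged_resources <region>
# Lists ARNs of everything tagged ManagedBy=gon-cli in one region
# (needs AWS credentials).
tagged_resources() {
  aws resourcegroupstaggingapi get-resources --region "$1" \
    --tag-filters Key=ManagedBy,Values=gon-cli \
    --query 'ResourceTagMappingList[].ResourceARN' --output text
}
```

The archive contains Terraform state and, where the wizard generated a key, a private SSH key, so it belongs in encrypted storage.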

Out of scope (today)

AWS RDS — managed PostgreSQL/MySQL. Bigger surface (subnet groups, parameter groups, automated snapshots) — separate release planned.
AWS SES — transactional email. Sandbox→Production approval is an out-of-band manual process AWS doesn't fully automate.
AWS Backup — automated EBS snapshots for EC2.
AWS Secrets Manager — overlaps with gon secrets on the gon registry; possibly a bridge command later.
AWS Chatbot (Slack) — alternative to email subscription on alarms; nice-to-have follow-up.
CloudWatch Agent for memory / disk alarms — needs in-host install; future infra:aws-cwagent.
S3 backend for Terraform state — local only, single user.
Other providers (Hetzner, DigitalOcean) — architecture supports them but no command yet.
AWS Pricing API — hardcoded snapshot, refreshed per release.