AWS AI/ML Best Practices: SageMaker and Bedrock Architecture Guide

Master AWS AI/ML services. Learn SageMaker training/deployment, Bedrock generative AI, foundation model selection, MLOps patterns, and cost optimization for machine learning workloads.

12 min read
By CloudBridgeHub

Technical TL;DR


AI/ML = Compute-Intensive + Expensive. Architecture decisions matter more here than in most other workloads.


Key takeaways:

  • **SageMaker for custom ML** (end-to-end ML platform)
  • **Bedrock for generative AI** (foundation models via API)
  • **Spot instances for training** (90% cost savings)
  • **Multi-model endpoints** for inference (reduce costs)
  • **MLOps is mandatory** (reproducibility, monitoring, governance)
  • **Model monitoring** (data drift, prediction quality)

---


    1. AI/ML Service Selection


    1.1 Service Decision Framework


    Start with business requirements, not technology.


| Use Case | Best Service | Why |
|----------|--------------|-----|
| **Custom ML model development** | SageMaker | End-to-end platform: label, train, deploy |
| **Generative AI (text, images)** | Bedrock | Foundation models via API, no training required |
| **Anomaly/defect detection (no ML team)** | Lookout for Metrics/Vision | Pre-built ML models, no data science required |
| **Real-time fraud detection** | Fraud Detector | Pre-trained models, configure and deploy |
| **Personalization/recommendations** | Personalize | Fully managed, no model training required |
| **AI-powered search** | OpenSearch Service (vector search) | Semantic search without building models |


    1.2 SageMaker vs. Bedrock: When to Use What


| Factor | SageMaker | Bedrock |
|--------|-----------|---------|
| **Use Case** | Custom ML models | Generative AI via foundation models |
| **Expertise Required** | Data science team | Prompt engineering, no ML background |
| **Training Required** | Yes (bring your own data) | No (models pre-trained) |
| **Deployment** | Managed endpoints | API calls (no infrastructure) |
| **Cost** | Pay for training + inference | Pay per API call |
| **Time to Production** | Weeks-months | Hours-days |
| **Best For** | Proprietary models, domain-specific | Generative AI, chatbots, content generation |


    Decision Tree:

```yaml
Need to train a custom model?
  Yes → SageMaker
  No  → Need generative AI?
          Yes → Bedrock
          No  → Consider pre-built AI services (Lookout, Personalize, etc.)
```


    ---


    2. Amazon SageMaker Best Practices


    2.1 SageMaker Architecture Overview


    SageMaker provides end-to-end ML capabilities.


    Core Components:

    ```yaml

    SageMaker Studio: IDE for ML development (notebooks, experiments)

    SageMaker Processing: Data preprocessing, feature engineering

    SageMaker Training: Model training on managed infrastructure

    SageMaker Clarify: Model explainability, bias detection

    SageMaker Model Monitor: Production model monitoring

SageMaker Feature Store: Centralized store for reusable features

    ```


    2.2 SageMaker Training: Spot Instances Mandatory


Managed Spot Training reduces training costs by up to 90% compared to On-Demand.


    Spot Training Configuration:

    ```yaml

    Managed Spot Training: ENABLE (mandatory for non-urgent training)


    Checkpointing:

    - Required for spot (interruptible)

    - Save model checkpoints to S3

    - Resume from checkpoint if interrupted


Instance Strategy:

- Use Spot for 90%+ of training jobs

- Use On-Demand only for deadline-critical runs


Cost Impact (illustrative):

- On-Demand: $1.00/hour

- Spot: ~$0.10/hour (≈90% savings)

    ```


    Example Configuration:

```python
# SageMaker training job with Managed Spot Training (SageMaker Python SDK v2)
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri=training_image_uri,        # your training container image
    role=role,                           # execution role ARN
    instance_count=2,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,             # enable Spot
    max_run=3600,                        # max training time (seconds)
    max_wait=7200,                       # max wait incl. Spot interruptions (must be >= max_run)
    checkpoint_s3_uri="s3://bucket/checkpoints/",  # resume point after interruption
)
```


    2.3 SageMaker Instance Selection


    Choose the right instance type for training and inference.


| Instance Type | Use Case | Cost/Hour |
|---------------|----------|-----------|
| **ml.c5** (compute-optimized) | CPU-based training, inference | $0.204 - $4.08 |
| **ml.p3** (GPU) | Deep learning training | $3.06 - $31.62 |
| **ml.p4** (GPU) | Large-scale deep learning | $11.90 - $39.69 |
| **ml.g4** (GPU) | Cost-effective GPU training | $0.526 - $4.881 |
| **ml.inf1** (Inferentia) | High-throughput inference | $0.154 - $2.292 |
| **ml.m5** (general purpose) | Small models, CPU inference | $0.135 - $4.00 |


    Selection Guidelines:

    ```yaml

    Training:

    - Small models: ml.c5 (CPU)

    - Deep learning: ml.g4dn (cost-effective GPU)

    - Large models: ml.p3/p4 (high-performance GPU)


    Inference:

    - Low latency: ml.inf1 (Inferentia chips)

    - Cost-effective: ml.c5 (CPU)

    - GPU required: ml.g4dn (cheapest GPU)

    ```


    2.4 SageMaker Hyperparameter Tuning


Automatic model tuning (hyperparameter optimization, HPO) searches for the hyperparameter values that maximize your objective metric.


    Tuning Configuration:

    ```yaml

    Hyperparameter Tuning Jobs:

    - Define search space (hyperparameter ranges)

    - Select objective metric (validation accuracy, F1, etc.)

    - Choose tuning strategy (Bayesian optimization, random search)

    - Set max number of training jobs

    - Enable parallel training (reduce tuning time)


    Best Practices:

    ☐ Start with Bayesian optimization (smarter search)

    ☐ Enable early stopping (waste less time on bad configs)

    ☐ Use Spot instances for tuning jobs

    ☐ Set reasonable max jobs (10-50 typical)

    ```
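
A minimal sketch using the SageMaker Python SDK's `HyperparameterTuner`, assuming the Spot-enabled `estimator` from Section 2.2; the hyperparameter names, metric regex, and S3 paths are illustrative and should be adapted to your algorithm.

```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Search space -- hyperparameter names here are illustrative
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-4, 1e-1, scaling_type="Logarithmic"),
    "batch_size": IntegerParameter(32, 256),
}

tuner = HyperparameterTuner(
    estimator=estimator,                      # Spot-enabled estimator from section 2.2
    objective_metric_name="validation:accuracy",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[                      # regex that extracts the metric from training logs
        {"Name": "validation:accuracy", "Regex": "val_accuracy=([0-9\\.]+)"}
    ],
    strategy="Bayesian",                      # smarter search than random
    max_jobs=20,                              # within the 10-50 guideline
    max_parallel_jobs=4,
    early_stopping_type="Auto",               # stop unpromising jobs early
)

tuner.fit({"train": "s3://bucket/train/", "validation": "s3://bucket/validation/"})
```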


    2.5 SageMaker Endpoints: Multi-Model Deployment


Multi-model endpoints serve many models from a single endpoint, loading model artifacts from S3 on demand.


    When to Use Multi-Model Endpoints:

    ```yaml

    ✓ Multiple models sharing same framework (e.g., 10 TensorFlow models)

    ✓ Models not frequently used (avoid paying for idle endpoints)

    ✓ A/B testing models (host multiple versions)

    ✓ Regional personalization (different models per region)

    ```


    Cost Impact:

    ```yaml

    Single Model Endpoint:

    - 10 models = 10 endpoints = 10x infrastructure cost


    Multi-Model Endpoint:

    - 10 models = 1 endpoint = 1x infrastructure cost

    - Models loaded from S3 on-demand

    - Savings: 80-90% for infrequently used models

    ```
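
A minimal deployment sketch with the SageMaker Python SDK's `MultiDataModel`, assuming all models share one framework container and live under a common S3 prefix; `inference_image_uri`, `role`, `payload`, and the bucket names are placeholders.

```python
from sagemaker.multidatamodel import MultiDataModel

mme = MultiDataModel(
    name="shared-framework-models",
    model_data_prefix="s3://bucket/models/",   # each model.tar.gz lives under this prefix
    image_uri=inference_image_uri,             # one container image for all models
    role=role,                                 # execution role ARN (placeholder)
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    endpoint_name="multi-model-endpoint",
)

# Route a request to a specific model; it is loaded from S3 on first invocation
predictor.predict(payload, target_model="model-a.tar.gz")
```

The first request to a given `target_model` pays a cold-load penalty while the artifact is pulled from S3; frequently used models stay cached on the instance.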


    2.6 SageMaker Model Monitor: Production Monitoring


    Model monitoring is required for production ML systems.


    What to Monitor:

    ```yaml

    Data Quality:

    - Missing values (vs. training baseline)

    - Data distribution drift

    - Feature attribution drift


    Model Quality:

    - Prediction accuracy (vs. ground truth)

    - Prediction distribution drift

    - Model bias detection


    Alerting:

    - Data drift > threshold

    - Model quality degradation

    - Feature attribution changes

    ```


    Configuration:

    ```yaml

    ☐ Create baseline from training data (statistics, constraints)

    ☐ Schedule monitoring hourly/daily

    ☐ Configure CloudWatch alerts for violations

    ☐ Set up SNS notifications for drift detection

    ```
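
A data-quality monitoring sketch with the SageMaker Python SDK's `DefaultModelMonitor`, assuming a CSV training dataset and an existing endpoint with data capture enabled; role, paths, and names are placeholders.

```python
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role=role,                                 # execution role ARN (placeholder)
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# 1. Baseline: compute statistics and constraints from the training data
monitor.suggest_baseline(
    baseline_dataset="s3://bucket/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://bucket/monitoring/baseline/",
)

# 2. Schedule: compare captured endpoint traffic against the baseline every hour
monitor.create_monitoring_schedule(
    monitor_schedule_name="data-quality-hourly",
    endpoint_input="my-endpoint",              # endpoint must have data capture enabled
    output_s3_uri="s3://bucket/monitoring/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```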


    2.7 SageMaker Feature Store: Reusable Features


    Feature Store eliminates duplicate feature engineering.


    Benefits:

    ```yaml

    ☐ Single source of truth for features

    ☐ Reusable across models and teams

    ☐ Online + offline storage (low-latency + batch)

    ☐ Time travel (query historical feature values)

    ☐ Automatic metadata tracking

    ```


    Implementation:

    ```yaml

    Feature Groups:

    - Define feature group (schema + record identifier)

    - Ingest features (batch or streaming)

    - Retrieve for training (offline store)

    - Serve for inference (online store)

    ```
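
A minimal feature group sketch with the SageMaker Python SDK, using a small hypothetical customer DataFrame; in practice the schema, record identifier, and event-time column come from your own feature pipeline, and `role` is a placeholder execution role.

```python
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Hypothetical customer features with a record identifier and an event-time column
df = pd.DataFrame({
    "customer_id": ["c-001", "c-002"],
    "avg_order_value": [52.3, 18.9],
    "event_time": [1700000000.0, 1700000000.0],
})
df["customer_id"] = df["customer_id"].astype("string")  # Feature Store needs explicit string dtype

fg = FeatureGroup(name="customer-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)               # infer the schema from the DataFrame

fg.create(
    s3_uri="s3://bucket/feature-store/",                 # offline store location
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,                                       # execution role ARN (placeholder)
    enable_online_store=True,                            # low-latency reads for inference
)

fg.ingest(data_frame=df, max_workers=2, wait=True)       # writes to online + offline stores
```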


    ---


    3. Amazon Bedrock Best Practices


    3.1 Foundation Model Selection


    Bedrock provides multiple foundation models via API.


| Model | Provider | Best For | Context Window |
|-------|----------|----------|----------------|
| **Claude** | Anthropic | Analysis, coding, writing | 200K tokens |
| **Titan** | AWS | General purpose, embeddings | 8K tokens |
| **Jurassic-2** | AI21 | Text generation, summarization | 8K tokens |
| **Llama 2** | Meta | Open source, fine-tuning | 4K tokens |
| **Command** | Cohere | RAG, text generation | 4K tokens |


    Model Selection Framework:

    ```yaml

    Need long context (100K+ tokens)?

    → Claude (best for large documents)


    Need AWS-native (data governance, HIPAA)?

    → Titan Text (AWS-owned)


    Need open source (self-host, fine-tune)?

    → Llama 2


    Need simple text generation?

    → Titan Text or Jurassic-2

    ```


    3.2 Prompt Engineering Best Practices


    Prompt quality determines output quality.


    Prompt Engineering Principles:

    ```yaml

    1. Be Specific: Clear instructions, not vague requests

    2. Provide Context: Background information, examples

    3. Specify Format: JSON, CSV, markdown, etc.

    4. Set Constraints: Length limits, style guidelines

    5. Use Examples: Few-shot learning (3-5 examples)

    6. Chain of Thought: Ask model to explain reasoning

    ```


    Example Prompt Pattern:

    ```yaml

Context: You are an AWS solutions architect helping with cost optimization.


    Task: Analyze the following AWS bill and provide recommendations.


    Input: [AWS cost data in JSON format]


    Requirements:

    - Focus on EC2, RDS, and S3 (70% of spend)

    - Provide 3-5 specific recommendations

    - Include estimated savings for each recommendation

    - Output in markdown format with H2 headers


    Output: [Model response]

    ```
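
For reference, a hedged sketch of sending a prompt like this to a Claude model on Bedrock with boto3, assuming the Anthropic Messages request format; the model ID is illustrative (check the Bedrock console for the models enabled in your account) and `cost_data_json` is a placeholder for your input.

```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = (
    "You are an AWS solutions architect helping with cost optimization.\n"
    "Analyze the following AWS bill and provide 3-5 recommendations, "
    "with estimated savings, in markdown.\n\n"
    f"Cost data:\n{cost_data_json}"                      # placeholder for your input data
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",    # illustrative model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}],
    }),
)

answer = json.loads(response["body"].read())["content"][0]["text"]
```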


    3.3 RAG (Retrieval-Augmented Generation) Pattern


    RAG combines LLMs with your proprietary data.


    RAG Architecture:

    ```yaml

    1. Ingestion:

    - Document parsing (PDF, HTML, etc.)

    - Text splitting into chunks

    - Embedding generation (via Bedrock Titan Embeddings)

    - Store in OpenSearch vector database


    2. Retrieval:

    - User question → embed query

    - Semantic search in OpenSearch (find relevant chunks)

    - Retrieve top K chunks (3-10 typical)


    3. Generation:

    - Combine user question + retrieved chunks

    - Send prompt to Bedrock model (Claude, Titan)

    - Return answer with source citations

    ```


    Best Practices:

    ```yaml

☐ Chunk size: 500-1,000 tokens (a good starting point for most models)

    ☐ Overlap: 20-30% between chunks (preserve context)

    ☐ Retrieval: Top 3-10 chunks (balance quality vs. cost)

    ☐ Citations: Always include source references

    ☐ Fallback: If no relevant chunks, ask user to rephrase

    ```
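
A stripped-down sketch of the retrieve-then-generate loop, assuming Titan Embeddings for the query embedding and a hypothetical `search_chunks` helper that runs the k-NN query against your OpenSearch vector index; the Claude model ID is illustrative.

```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime")

def embed(text):
    """Embed text with Titan Embeddings (request/response shape per the Titan API)."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def answer_question(question, search_chunks):
    # `search_chunks` is a hypothetical helper that queries your OpenSearch
    # vector index and returns the top-k chunks with their source documents.
    chunks = search_chunks(embed(question), k=5)
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)

    prompt = (
        "Answer the question using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",   # illustrative model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```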


    3.4 Bedrock Cost Optimization


    Bedrock charges per token. Costs scale with usage.


    Pricing (Example: Claude Instant):

    ```yaml

    Input: $0.80 per million tokens

    Output: $2.40 per million tokens


    Typical API Call:

    - Input: 1,000 tokens (prompt + context)

    - Output: 500 tokens (response)

    - Cost: $0.00080 + $0.00120 = $0.002 per call


    10,000 calls/day = $20/day = $600/month

    ```
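
A quick back-of-the-envelope helper that reproduces the arithmetic above; the rates are the example Claude Instant prices shown, not a quote.

```python
def bedrock_call_cost(input_tokens, output_tokens,
                      input_per_m=0.80, output_per_m=2.40):
    """Cost of one call in USD at the example rates above (per million tokens)."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

per_call = bedrock_call_cost(1_000, 500)   # 0.002
monthly = per_call * 10_000 * 30           # 10,000 calls/day -> ~600/month
```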


    Cost Optimization Strategies:

    ```yaml

    1. Use smaller models (Claude Instant vs. Claude)

    2. Minimize context (only include relevant information)

    3. Cache embeddings (avoid re-generating)

    4. Use semantic search (retrieve only relevant chunks)

    5. Rate limiting (prevent API abuse)

6. Monitoring: Set budget alerts (AWS Budgets) and review spend in Cost Explorer

    ```


    3.5 Bedrock Guardrails


    Guardrails enforce safety and content policies.


    Configurable Guardrails:

    ```yaml

    Content Filtering:

    - Hate speech, violence, sexual content

    - Personally identifiable information (PII)

    - Profanity, insults


    Blocked Topics:

    - Medical advice, legal advice

    - Political content, religious content

    - Custom topics (business-specific)


    PII Redaction:

    - Automatically redact sensitive information

    - SSN, credit card, email addresses


    Word Filters:

    - Block specific words or phrases

    - Regex patterns for custom filtering

    ```


    ---


    4. MLOps Best Practices


    4.1 ML Pipeline: Automate Everything


    Manual ML processes don't scale. Automate the entire pipeline.


    Pipeline Stages:

    ```yaml

    1. Data Ingestion:

    - Extract from sources (S3, RDS, API)

    - Validation (schema checks, data quality)

    - Store in feature store


    2. Data Preprocessing:

    - Feature engineering

    - Train/test/validation split

    - Normalization, encoding


    3. Model Training:

    - Hyperparameter tuning

    - Multi-model training (test multiple algorithms)

    - Model evaluation


    4. Model Validation:

    - Performance metrics (accuracy, F1, AUC)

    - Business metrics (revenue lift, cost savings)

    - Bias and fairness checks


    5. Model Deployment:

    - Canary deployment (test with small traffic)

    - A/B testing (new vs. old model)

    - Blue/green deployment (rollback capability)


    6. Monitoring:

    - Data drift detection

    - Model performance monitoring

    - Automated retraining triggers

    ```
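
A minimal SageMaker Pipelines sketch for the training stage, assuming an `estimator` constructed with `sagemaker_session=pipeline_session` (so `.fit()` returns step arguments rather than launching a job); processing, evaluation, and registration attach as additional steps the same way.

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

pipeline_session = PipelineSession()

# `estimator` is the Estimator from section 2.2, built with this pipeline session
train_step = TrainingStep(
    name="TrainModel",
    step_args=estimator.fit(inputs={"train": "s3://bucket/train/"}),
)

pipeline = Pipeline(
    name="ml-training-pipeline",
    steps=[train_step],                  # add processing/evaluation/registration steps here
    sagemaker_session=pipeline_session,
)

pipeline.upsert(role_arn=role)           # create or update the pipeline definition
pipeline.start()                         # typically triggered by CI/CD or a schedule
```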


    4.2 Experiment Tracking


    Track all experiments for reproducibility.


    What to Track:

    ```yaml

    ☐ Hyperparameters (learning rate, batch size, etc.)

    ☐ Model architecture (layers, parameters)

    ☐ Training data version (hash, S3 location)

    ☐ Training metrics (loss, accuracy per epoch)

    ☐ Validation metrics (precision, recall, F1)

    ☐ Code version (git commit hash)

    ☐ Environment (Docker image, library versions)

    ```


    Tools:

    ```yaml

    SageMaker Experiments: Built-in experiment tracking

    MLflow: Open-source alternative

    Weights & Biases: Advanced visualization

    ```
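
As an illustration, a minimal MLflow tracking sketch covering the checklist above; `training_history` and the tag values are placeholders for your own run data.

```python
import mlflow

mlflow.set_experiment("churn-model")

with mlflow.start_run():
    # Hyperparameters, code version, and data version
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 64)
    mlflow.set_tag("git_commit", "abc1234")
    mlflow.set_tag("train_data", "s3://bucket/train/v3/")

    # Per-epoch metrics from a hypothetical `training_history` list of (loss, accuracy)
    for epoch, (loss, acc) in enumerate(training_history):
        mlflow.log_metric("train_loss", loss, step=epoch)
        mlflow.log_metric("val_accuracy", acc, step=epoch)

    mlflow.log_artifact("model/model.tar.gz")   # store the trained artifact with the run
```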


    4.3 Model Registry: Version Control for Models


A model registry provides centralized model management.


    Registry Features:

    ```yaml

    ☐ Model versioning (track all iterations)

    ☐ Model metadata (training data, parameters, metrics)

    ☐ Approval workflow (dev → staging → production)

    ☐ Stage transitions (promote/demote models)

    ☐ Deployment integration (one-click deploy)

    ```
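
A hedged sketch of registering a model version in the SageMaker Model Registry via boto3; `inference_image_uri` and the artifact path are placeholders, and approval-status changes are what drive the stage transitions.

```python
import boto3

sm = boto3.client("sagemaker")

# One-time: create the group that holds all versions of this model
sm.create_model_package_group(ModelPackageGroupName="churn-model")

# Register a new version; it stays pending until a reviewer approves it
sm.create_model_package(
    ModelPackageGroupName="churn-model",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [{
            "Image": inference_image_uri,                    # placeholder container URI
            "ModelDataUrl": "s3://bucket/models/churn/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
)

# Promotion is an approval-status change (the gate between stages), e.g.:
# sm.update_model_package(ModelPackageArn=model_package_arn, ModelApprovalStatus="Approved")
```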


    4.4 CI/CD for ML


ML models need CI/CD pipelines just like application code.


    Pipeline:

    ```yaml

    1. Code Commit (Git)

    2. Automated Tests (unit tests, data validation)

    3. Build Docker Image (model + dependencies)

    4. Train Model (SageMaker training job)

    5. Evaluate Model (performance, bias, explainability)

    6. Register Model (if passes thresholds)

    7. Deploy to Staging (canary deployment)

    8. Integration Tests (validate predictions)

    9. Deploy to Production (blue/green)

    10. Monitor (drift detection, quality metrics)

    ```


    ---


    5. Monitoring and Observability


    Required Metrics

    ```yaml

    Training:

    - GPU/CPU utilization

    - Training loss per epoch

    - Validation metrics per epoch

    - Training time per epoch


    Inference:

    - Request latency (p50, p95, p99)

    - Requests per second

    - Error rate (4xx, 5xx)

    - Model invocation count


    Model Quality:

    - Prediction drift (distribution changes)

    - Data drift (feature distribution changes)

    - Model accuracy (vs. ground truth)

    ```


    Alerting

    ```yaml

    ☐ Model accuracy drops > 10% (model degradation)

    ☐ Data drift detected (feature distribution shift)

    ☐ Prediction latency > 1 second (SLA violation)

    ☐ Error rate > 1% (infrastructure issue)

    ```
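
As one example, a CloudWatch alarm sketch for the latency SLA above, assuming a SageMaker endpoint named `my-endpoint` and an existing SNS topic ARN; note that the `ModelLatency` metric is reported in microseconds.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when p95 latency on a SageMaker endpoint breaches the 1-second SLA
cloudwatch.put_metric_alarm(
    AlarmName="endpoint-latency-p95",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",                 # reported in microseconds
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    ExtendedStatistic="p95",
    Period=300,
    EvaluationPeriods=3,
    Threshold=1_000_000,                       # 1 second = 1,000,000 microseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[sns_topic_arn],              # placeholder SNS topic for notifications
)
```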


    ---


    6. Cost Optimization Checklist


    Immediate Actions (Week 1)

  • [ ] Enable Spot instances for all SageMaker training
  • [ ] Audit idle SageMaker endpoints (delete unused)
  • [ ] Set up budget alerts for Bedrock API usage
  • [ ] Review SageMaker instance sizing (Compute Optimizer)

Short-Term (Month 1)

  • [ ] Implement multi-model endpoints (reduce endpoint count)
  • [ ] Set up Bedrock caching (reduce API calls)
  • [ ] Configure endpoint auto-scaling (or use Serverless Inference to scale to zero when idle)
  • [ ] Enable Spot training for all non-urgent workloads

Long-Term (Quarter 1)

  • [ ] Implement MLOps pipeline (automated training, deployment)
  • [ ] Set up model monitoring (detect degradation early)
  • [ ] Archive unused models to S3 (reduce storage costs)
  • [ ] Establish cost governance (budgets, quotas, approval)

---


    7. Security and Compliance


    ML Security Checklist

    ```yaml

    ☐ VPC: SageMaker endpoints in private subnets

    ☐ Encryption: Model artifacts encrypted with KMS

    ☐ IAM: Least-privilege roles for training/inference

    ☐ Data: Encrypted at rest (S3) and in transit (TLS)

    ☐ Logging: CloudTrail for API calls, CloudWatch for metrics

    ☐ Monitoring: SageMaker Model Monitor for drift detection

    ```


    ---


    Summary: AI/ML Excellence Pillars


    1. **Right service selection** (SageMaker for custom ML, Bedrock for generative AI)

    2. **Spot instances for training** (90% cost savings)

    3. **Multi-model endpoints** (host multiple models, reduce costs)

    4. **MLOps is mandatory** (automate pipelines, ensure reproducibility)

    5. **Monitor production models** (data drift, quality degradation)

6. **Prompt engineering matters** (clear instructions, context, and output format; use a stronger model such as Claude for complex tasks)

    7. **RAG for proprietary data** (combine LLMs with your knowledge base)

    8. **Model governance** (registry, approval workflows, version control)


    ---


    Need Help with AI/ML Architecture?


    Our AI/ML architects can design scalable ML platforms, implement MLOps pipelines, and help you leverage Bedrock for generative AI.


    <a href="/contact" className="text-aws-orange font-semibold hover:text-aws-light">

    Schedule a Free AI/ML Consultation →

    </a>


Related Reading:

  • For **compute**, see [SageMaker Instance Selection](/blog/aws-compute-best-practices)
  • For **security**, refer to [ML Data Privacy + Encryption](/blog/aws-security-best-practices)
  • For **storage**, explore [S3 for Model Artifacts](/blog/aws-storage-best-practices)

---


    *Last updated: January 5, 2025*

