AWS AI/ML Best Practices: SageMaker and Bedrock Architecture Guide
Master AWS AI/ML services. Learn SageMaker training/deployment, Bedrock generative AI, foundation model selection, MLOps patterns, and cost optimization for machine learning workloads.
Technical TL;DR
AI/ML workloads are compute-intensive and expensive, so architecture decisions matter more here than almost anywhere else on AWS.
Key takeaways: pick SageMaker for custom models and Bedrock for generative AI, train on Managed Spot instances, consolidate inference with multi-model endpoints, automate the ML lifecycle with MLOps, and monitor production models for drift and quality degradation.
---
1. AI/ML Service Selection
1.1 Service Decision Framework
Start with business requirements, not technology.
| Use Case | Best Service | Why |
|----------|--------------|-----|
| **Custom ML model development** | SageMaker | End-to-end platform: label, train, deploy |
| **Generative AI (text, images)** | Bedrock | Foundation models via API, no training required |
| **Anomaly detection in metrics/images (no ML team)** | Lookout for Metrics/Vision | Pre-built ML models, no data science required |
| **Real-time fraud/anomaly detection** | Fraud Detector | Pre-trained models, configure and deploy |
| **Personalization/recommendations** | Personalize | Fully managed, no model training required |
| **AI-powered search** | OpenSearch Service (vector search) | Semantic search without building models |
1.2 SageMaker vs. Bedrock: When to Use What
| Factor | SageMaker | Bedrock |
|--------|-----------|---------|
| **Use Case** | Custom ML models | Generative AI via foundation models |
| **Expertise Required** | Data science team | Prompt engineering, no ML background |
| **Training Required** | Yes (bring your own data) | No (models pre-trained) |
| **Deployment** | Managed endpoints | API calls (no infrastructure) |
| **Cost** | Pay for training + inference | Pay per API call |
| **Time to Production** | Weeks-months | Hours-days |
| **Best For** | Proprietary models, domain-specific | Generative AI, chatbots, content generation |
Decision Tree:
```yaml
Need to train custom model?
Yes → SageMaker
No → Need generative AI?
Yes → Bedrock
No → Consider pre-built AI services (Lookout, Personalize, etc.)
```
---
2. Amazon SageMaker Best Practices
2.1 SageMaker Architecture Overview
SageMaker provides end-to-end ML capabilities.
Core Components:
```yaml
SageMaker Studio: IDE for ML development (notebooks, experiments)
SageMaker Processing: Data preprocessing, feature engineering
SageMaker Training: Model training on managed infrastructure
SageMaker Clarify: Model explainability, bias detection
SageMaker Model Monitor: Production model monitoring
SageMaker Feature Store: Centralized store for reusable features
```
2.2 SageMaker Training: Use Managed Spot Training
Managed Spot Training can cut training costs by up to 90% compared to On-Demand instances.
Spot Training Configuration:
```yaml
Managed Spot Training: ENABLE (mandatory for non-urgent training)
Checkpointing:
- Required for spot (interruptible)
- Save model checkpoints to S3
- Resume from checkpoint if interrupted
Instance Strategy:
- Use Spot for 90%+ of training jobs
- Reserve On-Demand for deadline-critical runs
Cost Impact (illustrative):
- On-Demand: $1.00/hour
- Spot: ~$0.10/hour (up to 90% savings)
```
Example Configuration:
```python
# SageMaker training job with Managed Spot Training (SageMaker Python SDK v2)
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri='<training-image-uri>',              # training container image
    role='<execution-role-arn>',                   # SageMaker execution role
    instance_count=2,
    instance_type='ml.p3.2xlarge',
    use_spot_instances=True,                       # enable Spot capacity
    max_run=3600,                                  # max training time (seconds)
    max_wait=7200,                                 # max wait incl. Spot interruptions (>= max_run)
    checkpoint_s3_uri='s3://bucket/checkpoints/',  # resume from checkpoint after interruption
)
```
2.3 SageMaker Instance Selection
Choose the right instance type for training and inference.
| Instance Type | Use Case | Cost/Hour |
|---------------|----------|-----------|
| **ml.c5** (compute-optimized) | CPU-based training, inference | $0.204 - $4.08 |
| **ml.p3** (GPU) | Deep learning training | $3.06 - $31.62 |
| **ml.p4** (GPU) | Large-scale deep learning | $11.90 - $39.69 |
| **ml.g4dn** (GPU) | Cost-effective GPU training | $0.526 - $4.881 |
| **ml.inf1** (Inferentia) | High-throughput inference | $0.154 - $2.292 |
| **ml.m5** (general purpose) | Small models, CPU inference | $0.135 - $4.00 |
Selection Guidelines:
```yaml
Training:
- Small models: ml.c5 (CPU)
- Deep learning: ml.g4dn (cost-effective GPU)
- Large models: ml.p3/p4 (high-performance GPU)
Inference:
- Low latency: ml.inf1 (Inferentia chips)
- Cost-effective: ml.c5 (CPU)
- GPU required: ml.g4dn (cheapest GPU)
```
2.4 SageMaker Hyperparameter Tuning
Automated hyperparameter tuning (HPO) searches for the hyperparameter values that maximize your objective metric.
Tuning Configuration:
```yaml
Hyperparameter Tuning Jobs:
- Define search space (hyperparameter ranges)
- Select objective metric (validation accuracy, F1, etc.)
- Choose tuning strategy (Bayesian optimization, random search)
- Set max number of training jobs
- Enable parallel training (reduce tuning time)
Best Practices:
☐ Start with Bayesian optimization (smarter search)
☐ Enable early stopping (waste less time on bad configs)
☐ Use Spot instances for tuning jobs
☐ Set reasonable max jobs (10-50 typical)
```
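A minimal sketch of a tuning job with the SageMaker Python SDK, reusing the Spot-enabled estimator from section 2.2; the hyperparameter names, metric regex, and S3 paths are illustrative and depend on your training script:
```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Search space: ranges for two common hyperparameters (names must match your script)
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(1e-5, 1e-2),
    'batch_size': IntegerParameter(32, 256),
}

tuner = HyperparameterTuner(
    estimator=estimator,                       # Spot-enabled estimator from section 2.2
    objective_metric_name='validation:accuracy',
    objective_type='Maximize',
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[{'Name': 'validation:accuracy',
                         'Regex': 'val_accuracy: ([0-9\\.]+)'}],
    strategy='Bayesian',                       # smarter search than random
    early_stopping_type='Auto',                # stop unpromising jobs early
    max_jobs=20,
    max_parallel_jobs=4,
)

tuner.fit({'train': 's3://bucket/train/', 'validation': 's3://bucket/validation/'})
```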
2.5 SageMaker Endpoints: Multi-Model Deployment
Multi-model endpoints host many models behind a single endpoint, loading model artifacts from S3 on demand.
When to Use Multi-Model Endpoints:
```yaml
✓ Multiple models sharing the same framework and serving container (e.g., 10 TensorFlow models)
✓ Models not frequently used (avoid paying for idle endpoints)
✓ A/B testing models (host multiple versions)
✓ Regional personalization (different models per region)
```
Cost Impact:
```yaml
Single Model Endpoint:
- 10 models = 10 endpoints = 10x infrastructure cost
Multi-Model Endpoint:
- 10 models = 1 endpoint = 1x infrastructure cost
- Models loaded from S3 on-demand
- Savings: 80-90% for infrequently used models
```
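For illustration, a sketch of invoking one model on a multi-model endpoint with boto3; the endpoint name, model artifact key, and payload are hypothetical:
```python
import json
import boto3

runtime = boto3.client('sagemaker-runtime')

# TargetModel selects which artifact under the endpoint's S3 prefix to invoke;
# it is fetched from S3 on first use and cached on the instance afterwards.
response = runtime.invoke_endpoint(
    EndpointName='multi-model-endpoint',
    TargetModel='region-eu-west-1/model.tar.gz',
    ContentType='application/json',
    Body=json.dumps({'features': [0.2, 1.7, 3.4]}),
)
prediction = json.loads(response['Body'].read())
```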
2.6 SageMaker Model Monitor: Production Monitoring
Model monitoring is required for production ML systems.
What to Monitor:
```yaml
Data Quality:
- Missing values (vs. training baseline)
- Data distribution drift
- Feature attribution drift
Model Quality:
- Prediction accuracy (vs. ground truth)
- Prediction distribution drift
- Model bias detection
Alerting:
- Data drift > threshold
- Model quality degradation
- Feature attribution changes
```
Configuration:
```yaml
☐ Create baseline from training data (statistics, constraints)
☐ Schedule monitoring hourly/daily
☐ Configure CloudWatch alerts for violations
☐ Set up SNS notifications for drift detection
```
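A sketch of the baseline-plus-schedule setup with the SageMaker Python SDK, assuming a CSV training dataset and an endpoint with data capture already enabled; names and S3 paths are placeholders:
```python
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role='<execution-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
)

# 1. Baseline statistics and constraints from the training data
monitor.suggest_baseline(
    baseline_dataset='s3://bucket/train/train.csv',
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri='s3://bucket/monitoring/baseline/',
)

# 2. Hourly data-quality checks against the live endpoint
monitor.create_monitoring_schedule(
    monitor_schedule_name='churn-data-quality',
    endpoint_input='churn-endpoint',
    output_s3_uri='s3://bucket/monitoring/reports/',
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```
Violations reported by the schedule can then feed the CloudWatch alerts and SNS notifications in the checklist above.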
2.7 SageMaker Feature Store: Reusable Features
Feature Store eliminates duplicate feature engineering.
Benefits:
```yaml
☐ Single source of truth for features
☐ Reusable across models and teams
☐ Online + offline storage (low-latency + batch)
☐ Time travel (query historical feature values)
☐ Automatic metadata tracking
```
Implementation:
```yaml
Feature Groups:
- Define feature group (schema + record identifier)
- Ingest features (batch or streaming)
- Retrieve for training (offline store)
- Serve for inference (online store)
```
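A minimal sketch of creating and populating a feature group with the SageMaker Python SDK; the group name, schema, and S3 location are illustrative:
```python
import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

df = pd.DataFrame({
    'customer_id': ['c-001', 'c-002'],
    'avg_order_value': [54.2, 17.9],
    'event_time': [time.time()] * 2,                     # required event-time feature
})

feature_group = FeatureGroup(name='customer-features', sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)    # infer the schema from the DataFrame

feature_group.create(
    s3_uri='s3://bucket/feature-store/',                 # offline store location
    record_identifier_name='customer_id',
    event_time_feature_name='event_time',
    role_arn='<execution-role-arn>',
    enable_online_store=True,                            # low-latency lookups at inference time
)

feature_group.ingest(data_frame=df, max_workers=2, wait=True)
```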
---
3. Amazon Bedrock Best Practices
3.1 Foundation Model Selection
Bedrock provides multiple foundation models via API.
| Model | Provider | Best For | Context Window |
|-------|----------|----------|----------------|
| **Claude** | Anthropic | Analysis, coding, writing | 200K tokens |
| **Titan** | AWS | General purpose, embeddings | 8K tokens |
| **Jurassic-2** | AI21 | Text generation, summarization | 8K tokens |
| **Llama 2** | Meta | Open source, fine-tuning | 4K tokens |
| **Command** | Cohere | RAG, text generation | 4K tokens |
Model Selection Framework:
```yaml
Need long context (100K+ tokens)?
→ Claude (best for large documents)
Need AWS-native (data governance, HIPAA)?
→ Titan Text (AWS-owned)
Need open source (self-host, fine-tune)?
→ Llama 2
Need simple text generation?
→ Titan Text or Jurassic-2
```
3.2 Prompt Engineering Best Practices
Prompt quality determines output quality.
Prompt Engineering Principles:
```yaml
1. Be Specific: Clear instructions, not vague requests
2. Provide Context: Background information, examples
3. Specify Format: JSON, CSV, markdown, etc.
4. Set Constraints: Length limits, style guidelines
5. Use Examples: Few-shot learning (3-5 examples)
6. Chain of Thought: Ask model to explain reasoning
```
Example Prompt Pattern:
```yaml
Context: You are an AWS solutions architect helping with cost optimization.
Task: Analyze the following AWS bill and provide recommendations.
Input: [AWS cost data in JSON format]
Requirements:
- Focus on EC2, RDS, and S3 (70% of spend)
- Provide 3-5 specific recommendations
- Include estimated savings for each recommendation
- Output in markdown format with H2 headers
Output: [Model response]
```
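For reference, a hedged sketch of sending a prompt like the one above to Claude Instant through the Bedrock runtime API; each model family expects its own request body schema, and the cost data here is made up:
```python
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

prompt = """You are an AWS solutions architect helping with cost optimization.
Analyze the following AWS bill and provide 3-5 specific recommendations
with estimated savings, in markdown with H2 headers.

Cost data: {"EC2": 41200, "RDS": 18300, "S3": 9100}"""

# Claude Instant uses the legacy text-completion body; other models differ
body = json.dumps({
    'prompt': f'\n\nHuman: {prompt}\n\nAssistant:',
    'max_tokens_to_sample': 500,
    'temperature': 0.2,
})

response = bedrock.invoke_model(modelId='anthropic.claude-instant-v1', body=body)
answer = json.loads(response['body'].read())['completion']
print(answer)
```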
3.3 RAG (Retrieval-Augmented Generation) Pattern
RAG combines LLMs with your proprietary data.
RAG Architecture:
```yaml
1. Ingestion:
- Document parsing (PDF, HTML, etc.)
- Text splitting into chunks
- Embedding generation (via Bedrock Titan Embeddings)
- Store in OpenSearch vector database
2. Retrieval:
- User question → embed query
- Semantic search in OpenSearch (find relevant chunks)
- Retrieve top K chunks (3-10 typical)
3. Generation:
- Combine user question + retrieved chunks
- Send prompt to Bedrock model (Claude, Titan)
- Return answer with source citations
```
Best Practices:
```yaml
☐ Chunk size: 500-1000 tokens (optimal for most models)
☐ Overlap: 20-30% between chunks (preserve context)
☐ Retrieval: Top 3-10 chunks (balance quality vs. cost)
☐ Citations: Always include source references
☐ Fallback: If no relevant chunks, ask user to rephrase
```
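A simplified sketch of the retrieval step, assuming Titan Embeddings and an OpenSearch index named `documents` with a k-NN vector field named `embedding` (the index name, field name, domain endpoint, and omitted authentication are all assumptions):
```python
import json
import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client('bedrock-runtime')
opensearch = OpenSearch(hosts=[{'host': 'search-domain.example.com', 'port': 443}],
                        use_ssl=True)                    # authentication omitted for brevity

def embed(text: str) -> list[float]:
    """Generate an embedding with Titan Embeddings."""
    response = bedrock.invoke_model(
        modelId='amazon.titan-embed-text-v1',
        body=json.dumps({'inputText': text}),
    )
    return json.loads(response['body'].read())['embedding']

def retrieve_chunks(question: str, k: int = 5) -> list[str]:
    """Semantic search: embed the question and return the top-k chunks."""
    query = {
        'size': k,
        'query': {'knn': {'embedding': {'vector': embed(question), 'k': k}}},
    }
    hits = opensearch.search(index='documents', body=query)['hits']['hits']
    return [hit['_source']['text'] for hit in hits]

chunks = retrieve_chunks('What is our refund policy?')
prompt = 'Answer using only this context:\n' + '\n---\n'.join(chunks)
```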
3.4 Bedrock Cost Optimization
Bedrock charges per token. Costs scale with usage.
Pricing (Example: Claude Instant):
```yaml
Input: $0.80 per million tokens
Output: $2.40 per million tokens
Typical API Call:
- Input: 1,000 tokens (prompt + context)
- Output: 500 tokens (response)
- Cost: $0.00080 + $0.00120 = $0.002 per call
10,000 calls/day = $20/day = $600/month
```
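The arithmetic above as a small helper, useful for sanity-checking estimates at other call volumes (rates are the example Claude Instant prices, not a quote):
```python
# Per-call cost at the example Claude Instant rates above (USD per million tokens)
INPUT_PER_M, OUTPUT_PER_M = 0.80, 2.40

def call_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

daily_cost = 10_000 * call_cost(1_000, 500)   # $20.00/day, roughly $600/month
```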
Cost Optimization Strategies:
```yaml
1. Use smaller models (Claude Instant vs. Claude)
2. Minimize context (only include relevant information)
3. Cache embeddings (avoid re-generating)
4. Use semantic search (retrieve only relevant chunks)
5. Rate limiting (prevent API abuse)
6. Monitoring: Set budget alerts via Cost Explorer
```
3.5 Bedrock Guardrails
Guardrails enforce safety and content policies.
Configurable Guardrails:
```yaml
Content Filtering:
- Hate speech, violence, sexual content
- Personally identifiable information (PII)
- Profanity, insults
Blocked Topics:
- Medical advice, legal advice
- Political content, religious content
- Custom topics (business-specific)
PII Redaction:
- Automatically redact sensitive information
- SSN, credit card, email addresses
Word Filters:
- Block specific words or phrases
- Regex patterns for custom filtering
```
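A hedged sketch of attaching a pre-configured guardrail to a model invocation; the guardrail ID and version are hypothetical placeholders:
```python
import json
import boto3

bedrock = boto3.client('bedrock-runtime')

response = bedrock.invoke_model(
    modelId='anthropic.claude-instant-v1',
    guardrailIdentifier='<guardrail-id>',     # created beforehand in Bedrock
    guardrailVersion='1',
    body=json.dumps({
        'prompt': '\n\nHuman: Summarize this support ticket.\n\nAssistant:',
        'max_tokens_to_sample': 300,
    }),
)
result = json.loads(response['body'].read())
# Content that violates the guardrail is blocked or masked per its configuration
```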
---
4. MLOps Best Practices
4.1 ML Pipeline: Automate Everything
Manual ML processes don't scale. Automate the entire pipeline.
Pipeline Stages:
```yaml
1. Data Ingestion:
- Extract from sources (S3, RDS, API)
- Validation (schema checks, data quality)
- Store in feature store
2. Data Preprocessing:
- Feature engineering
- Train/test/validation split
- Normalization, encoding
3. Model Training:
- Hyperparameter tuning
- Multi-model training (test multiple algorithms)
- Model evaluation
4. Model Validation:
- Performance metrics (accuracy, F1, AUC)
- Business metrics (revenue lift, cost savings)
- Bias and fairness checks
5. Model Deployment:
- Canary deployment (test with small traffic)
- A/B testing (new vs. old model)
- Blue/green deployment (rollback capability)
6. Monitoring:
- Data drift detection
- Model performance monitoring
- Automated retraining triggers
```
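One way to express the first two stages as a SageMaker Pipeline (a sketch; the processor, script name, and S3 locations are placeholders, and the estimator is the Spot-enabled one from section 2.2):
```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingOutput
from sagemaker.inputs import TrainingInput

processor = SKLearnProcessor(framework_version='1.2-1', role='<execution-role-arn>',
                             instance_type='ml.m5.xlarge', instance_count=1)

preprocess = ProcessingStep(
    name='Preprocess',
    processor=processor,
    code='preprocess.py',                                   # your feature-engineering script
    outputs=[ProcessingOutput(output_name='train', source='/opt/ml/processing/train')],
)

train = TrainingStep(
    name='Train',
    estimator=estimator,                                    # Spot-enabled estimator from 2.2
    inputs={'train': TrainingInput(
        preprocess.properties.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri)},
)

pipeline = Pipeline(name='churn-pipeline', steps=[preprocess, train])
pipeline.upsert(role_arn='<execution-role-arn>')
pipeline.start()
```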
4.2 Experiment Tracking
Track all experiments for reproducibility.
What to Track:
```yaml
☐ Hyperparameters (learning rate, batch size, etc.)
☐ Model architecture (layers, parameters)
☐ Training data version (hash, S3 location)
☐ Training metrics (loss, accuracy per epoch)
☐ Validation metrics (precision, recall, F1)
☐ Code version (git commit hash)
☐ Environment (Docker image, library versions)
```
Tools:
```yaml
SageMaker Experiments: Built-in experiment tracking
MLflow: Open-source alternative
Weights & Biases: Advanced visualization
```
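A small MLflow sketch of logging the items in the checklist above (SageMaker Experiments offers an equivalent Run API); `training_loop` is a hypothetical generator yielding per-epoch metrics:
```python
import mlflow

mlflow.set_experiment('churn-model')

with mlflow.start_run():
    # Hyperparameters, code version, and data version
    mlflow.log_params({'learning_rate': 1e-3, 'batch_size': 64})
    mlflow.set_tag('git_commit', 'abc1234')
    mlflow.set_tag('train_data', 's3://bucket/train/v3/')

    # Per-epoch training and validation metrics
    for epoch, (loss, acc) in enumerate(training_loop()):   # hypothetical training loop
        mlflow.log_metric('train_loss', loss, step=epoch)
        mlflow.log_metric('val_accuracy', acc, step=epoch)

    mlflow.log_artifact('model.tar.gz')                     # trained model artifact
```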
4.3 Model Registry: Version Control for Models
Model registry provides centralized model management.
Registry Features:
```yaml
☐ Model versioning (track all iterations)
☐ Model metadata (training data, parameters, metrics)
☐ Approval workflow (dev → staging → production)
☐ Stage transitions (promote/demote models)
☐ Deployment integration (one-click deploy)
```
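With the SageMaker Python SDK, registering a trained model into a model package group looks roughly like this (the group name, instance types, and approval flow are illustrative):
```python
# Register the trained model into a model package group for versioning and approval
model_package = estimator.register(
    model_package_group_name='churn-models',
    content_types=['application/json'],
    response_types=['application/json'],
    inference_instances=['ml.c5.xlarge'],
    transform_instances=['ml.m5.xlarge'],
    approval_status='PendingManualApproval',   # promoted to Approved via the workflow
)
```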
4.4 CI/CD for ML
ML models require CI/CD like software.
Pipeline:
```yaml
1. Code Commit (Git)
2. Automated Tests (unit tests, data validation)
3. Build Docker Image (model + dependencies)
4. Train Model (SageMaker training job)
5. Evaluate Model (performance, bias, explainability)
6. Register Model (if passes thresholds)
7. Deploy to Staging (canary deployment)
8. Integration Tests (validate predictions)
9. Deploy to Production (blue/green)
10. Monitor (drift detection, quality metrics)
```
---
5. Monitoring and Observability
Required Metrics
```yaml
Training:
- GPU/CPU utilization
- Training loss per epoch
- Validation metrics per epoch
- Training time per epoch
Inference:
- Request latency (p50, p95, p99)
- Requests per second
- Error rate (4xx, 5xx)
- Model invocation count
Model Quality:
- Prediction drift (distribution changes)
- Data drift (feature distribution changes)
- Model accuracy (vs. ground truth)
```
Alerting
```yaml
☐ Model accuracy drops > 10% (model degradation)
☐ Data drift detected (feature distribution shift)
☐ Prediction latency > 1 second (SLA violation)
☐ Error rate > 1% (infrastructure issue)
```
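A sketch of the latency alarm from the checklist using boto3 and the SageMaker endpoint invocation metrics; the endpoint name and SNS topic ARN are placeholders, and ModelLatency is reported in microseconds:
```python
import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when p99 latency exceeds 1 second for three consecutive 5-minute periods
cloudwatch.put_metric_alarm(
    AlarmName='endpoint-latency-sla',
    Namespace='AWS/SageMaker',
    MetricName='ModelLatency',
    Dimensions=[{'Name': 'EndpointName', 'Value': 'churn-endpoint'},
                {'Name': 'VariantName', 'Value': 'AllTraffic'}],
    ExtendedStatistic='p99',
    Period=300,
    EvaluationPeriods=3,
    Threshold=1_000_000,                       # 1 second, in microseconds
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ml-alerts'],
)
```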
---
6. Cost Optimization Checklist
Immediate Actions (Week 1)
☐ Enable Managed Spot Training for all non-urgent training jobs
☐ Right-size training and inference instances (see section 2.3)
☐ Set budget alerts for SageMaker and Bedrock spend via Cost Explorer
Short-Term (Month 1)
☐ Consolidate infrequently used models onto multi-model endpoints
☐ Trim prompts and context; prefer smaller Bedrock models where quality allows
☐ Cache embeddings and limit RAG retrieval to the top 3-10 chunks
Long-Term (Quarter 1)
☐ Evaluate ml.inf1 (Inferentia) for high-throughput inference
☐ Automate training and deployment pipelines to eliminate manual re-runs
☐ Reuse Feature Store features to avoid duplicate preprocessing jobs
---
7. Security and Compliance
ML Security Checklist
```yaml
☐ VPC: SageMaker endpoints in private subnets
☐ Encryption: Model artifacts encrypted with KMS
☐ IAM: Least-privilege roles for training/inference
☐ Data: Encrypted at rest (S3) and in transit (TLS)
☐ Logging: CloudTrail for API calls, CloudWatch for metrics
☐ Monitoring: SageMaker Model Monitor for drift detection
```
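Several of these controls map directly onto training-job parameters in the SageMaker Python SDK; a sketch with placeholder VPC and KMS values:
```python
# Network isolation and encryption settings on a training job (values are placeholders)
secure_estimator = Estimator(
    image_uri='<training-image-uri>',
    role='<least-privilege-role-arn>',
    instance_count=1,
    instance_type='ml.m5.xlarge',
    subnets=['<private-subnet-id-1>', '<private-subnet-id-2>'],  # private subnets in your VPC
    security_group_ids=['<security-group-id>'],
    output_kms_key='<kms-key-arn>',               # encrypt model artifacts in S3
    volume_kms_key='<kms-key-arn>',               # encrypt attached training volumes
    encrypt_inter_container_traffic=True,         # TLS between training containers
    enable_network_isolation=True,                # no outbound internet from the containers
)
```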
---
Summary: AI/ML Excellence Pillars
1. **Right service selection** (SageMaker for custom ML, Bedrock for generative AI)
2. **Spot instances for training** (90% cost savings)
3. **Multi-model endpoints** (host multiple models, reduce costs)
4. **MLOps is mandatory** (automate pipelines, ensure reproducibility)
5. **Monitor production models** (data drift, quality degradation)
6. **Prompt engineering matters** (clear, specific prompts drive output quality; prefer Claude over Titan for complex tasks)
7. **RAG for proprietary data** (combine LLMs with your knowledge base)
8. **Model governance** (registry, approval workflows, version control)
---
Need Help with AI/ML Architecture?
Our AI/ML architects can design scalable ML platforms, implement MLOps pipelines, and help you leverage Bedrock for generative AI.
[Schedule a Free AI/ML Consultation →](/contact)
---
*Last updated: January 5, 2025*