AWS AI/ML Best Practices: SageMaker and Bedrock Architecture Guide

Master AWS AI/ML services. Learn SageMaker training/deployment, Bedrock generative AI, foundation model selection, MLOps patterns, and cost optimization for machine learning workloads.

12 min read
By CloudBridgeHub

Technical TL;DR


AI/ML = Compute-Intensive + Expensive. Architecture decisions matter more here than in most other workloads.


Key takeaways:

  • **SageMaker for custom ML** (end-to-end ML platform)
  • **Bedrock for generative AI** (foundation models via API)
  • **Spot instances for training** (90% cost savings)
  • **Multi-model endpoints** for inference (reduce costs)
  • **MLOps is mandatory** (reproducibility, monitoring, governance)
  • **Model monitoring** (data drift, prediction quality)

---


    1. AI/ML Service Selection


    1.1 Service Decision Framework


    Start with business requirements, not technology.


| Use Case | Best Service | Why |
|----------|--------------|-----|
| **Custom ML model development** | SageMaker | End-to-end platform: label, train, deploy |
| **Generative AI (text, images)** | Bedrock | Foundation models via API, no training required |
| **Anomaly/defect detection (no ML team)** | Lookout for Metrics/Vision | Pre-built ML models, no data science required |
| **Real-time fraud detection** | Fraud Detector | Pre-trained models, configure and deploy |
| **Personalization/recommendations** | Personalize | Fully managed, no model training required |
| **AI-powered search** | OpenSearch Service (vector search) | Semantic search without building models |


    1.2 SageMaker vs. Bedrock: When to Use What


| Factor | SageMaker | Bedrock |
|--------|-----------|---------|
| **Use Case** | Custom ML models | Generative AI via foundation models |
| **Expertise Required** | Data science team | Prompt engineering, no ML background |
| **Training Required** | Yes (bring your own data) | No (models pre-trained) |
| **Deployment** | Managed endpoints | API calls (no infrastructure) |
| **Cost** | Pay for training + inference | Pay per API call |
| **Time to Production** | Weeks-months | Hours-days |
| **Best For** | Proprietary models, domain-specific | Generative AI, chatbots, content generation |


    Decision Tree:

```yaml
Need to train a custom model?
  Yes → SageMaker
  No  → Need generative AI?
          Yes → Bedrock
          No  → Consider pre-built AI services (Lookout, Personalize, etc.)
```


    ---


    2. Amazon SageMaker Best Practices


    2.1 SageMaker Architecture Overview


    SageMaker provides end-to-end ML capabilities.


    Core Components:

    ```yaml

    SageMaker Studio: IDE for ML development (notebooks, experiments)

    SageMaker Processing: Data preprocessing, feature engineering

    SageMaker Training: Model training on managed infrastructure

    SageMaker Clarify: Model explainability, bias detection

    SageMaker Model Monitor: Production model monitoring

SageMaker Feature Store: Centralized store for reusable features

    ```


    2.2 SageMaker Training: Spot Instances Mandatory


Managed Spot Training reduces training costs by up to 90% compared to On-Demand.


    Spot Training Configuration:

    ```yaml

    Managed Spot Training: ENABLE (mandatory for non-urgent training)


    Checkpointing:

    - Required for spot (interruptible)

    - Save model checkpoints to S3

    - Resume from checkpoint if interrupted


Instance Strategy:

- Use Spot for 90%+ of training jobs

- Use On-Demand only for deadline-critical runs


Cost Impact (illustrative):

- On-Demand: $1.00/hour

- Spot: ~$0.10/hour (≈90% savings)

    ```


    Example Configuration:

```python
# SageMaker training job with Managed Spot Training (SageMaker Python SDK v2)
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri=training_image_uri,        # your training container image
    role=role,                           # execution role ARN
    instance_count=2,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,             # enable Spot
    max_run=3600,                        # max training time (seconds)
    max_wait=7200,                       # max wait incl. Spot interruptions (must be >= max_run)
    checkpoint_s3_uri="s3://bucket/checkpoints/",  # resume point after interruption
)
```


    2.3 SageMaker Instance Selection


    Choose the right instance type for training and inference.


| Instance Type | Use Case | Cost/Hour |
|---------------|----------|-----------|
| **ml.c5** (compute-optimized) | CPU-based training, inference | $0.204 - $4.08 |
| **ml.p3** (GPU) | Deep learning training | $3.06 - $31.62 |
| **ml.p4** (GPU) | Large-scale deep learning | $11.90 - $39.69 |
| **ml.g4** (GPU) | Cost-effective GPU training | $0.526 - $4.881 |
| **ml.inf1** (Inferentia) | High-throughput inference | $0.154 - $2.292 |
| **ml.m5** (general purpose) | Small models, CPU inference | $0.135 - $4.00 |


    Selection Guidelines:

    ```yaml

    Training:

    - Small models: ml.c5 (CPU)

    - Deep learning: ml.g4dn (cost-effective GPU)

    - Large models: ml.p3/p4 (high-performance GPU)


    Inference:

    - Low latency: ml.inf1 (Inferentia chips)

    - Cost-effective: ml.c5 (CPU)

    - GPU required: ml.g4dn (cheapest GPU)

    ```


    2.4 SageMaker Hyperparameter Tuning


Automatic model tuning (hyperparameter optimization, HPO) searches for the hyperparameter values that maximize your objective metric.


    Tuning Configuration:

    ```yaml

    Hyperparameter Tuning Jobs:

    - Define search space (hyperparameter ranges)

    - Select objective metric (validation accuracy, F1, etc.)

    - Choose tuning strategy (Bayesian optimization, random search)

    - Set max number of training jobs

    - Enable parallel training (reduce tuning time)


    Best Practices:

    ☐ Start with Bayesian optimization (smarter search)

    ☐ Enable early stopping (waste less time on bad configs)

    ☐ Use Spot instances for tuning jobs

    ☐ Set reasonable max jobs (10-50 typical)

    ```
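
A minimal sketch using the SageMaker Python SDK's `HyperparameterTuner`, assuming the Spot-enabled `estimator` from Section 2.2; the hyperparameter names, metric regex, and S3 paths are illustrative and should be adapted to your algorithm.

```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Search space -- hyperparameter names here are illustrative
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-4, 1e-1, scaling_type="Logarithmic"),
    "batch_size": IntegerParameter(32, 256),
}

tuner = HyperparameterTuner(
    estimator=estimator,                      # Spot-enabled estimator from section 2.2
    objective_metric_name="validation:accuracy",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[                      # regex that extracts the metric from training logs
        {"Name": "validation:accuracy", "Regex": "val_accuracy=([0-9\\.]+)"}
    ],
    strategy="Bayesian",                      # smarter search than random
    max_jobs=20,                              # within the 10-50 guideline
    max_parallel_jobs=4,
    early_stopping_type="Auto",               # stop unpromising jobs early
)

tuner.fit({"train": "s3://bucket/train/", "validation": "s3://bucket/validation/"})
```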


    2.5 SageMaker Endpoints: Multi-Model Deployment


Multi-model endpoints serve many models from a single endpoint, loading model artifacts from S3 on demand.


    When to Use Multi-Model Endpoints:

    ```yaml

    ✓ Multiple models sharing same framework (e.g., 10 TensorFlow models)

    ✓ Models not frequently used (avoid paying for idle endpoints)

    ✓ A/B testing models (host multiple versions)

    ✓ Regional personalization (different models per region)

    ```


    Cost Impact:

    ```yaml

    Single Model Endpoint:

    - 10 models = 10 endpoints = 10x infrastructure cost


    Multi-Model Endpoint:

    - 10 models = 1 endpoint = 1x infrastructure cost

    - Models loaded from S3 on-demand

    - Savings: 80-90% for infrequently used models

    ```
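
A minimal deployment sketch with the SageMaker Python SDK's `MultiDataModel`, assuming all models share one framework container and live under a common S3 prefix; `inference_image_uri`, `role`, `payload`, and the bucket names are placeholders.

```python
from sagemaker.multidatamodel import MultiDataModel

mme = MultiDataModel(
    name="shared-framework-models",
    model_data_prefix="s3://bucket/models/",   # each model.tar.gz lives under this prefix
    image_uri=inference_image_uri,             # one container image for all models
    role=role,                                 # execution role ARN (placeholder)
)

predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    endpoint_name="multi-model-endpoint",
)

# Route a request to a specific model; it is loaded from S3 on first invocation
predictor.predict(payload, target_model="model-a.tar.gz")
```

The first request to a given `target_model` pays a cold-load penalty while the artifact is pulled from S3; frequently used models stay cached on the instance.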


    2.6 SageMaker Model Monitor: Production Monitoring


    Model monitoring is required for production ML systems.


    What to Monitor:

    ```yaml

    Data Quality:

    - Missing values (vs. training baseline)

    - Data distribution drift

    - Feature attribution drift


    Model Quality:

    - Prediction accuracy (vs. ground truth)

    - Prediction distribution drift

    - Model bias detection


    Alerting:

    - Data drift > threshold

    - Model quality degradation

    - Feature attribution changes

    ```


    Configuration:

    ```yaml

    ☐ Create baseline from training data (statistics, constraints)

    ☐ Schedule monitoring hourly/daily

    ☐ Configure CloudWatch alerts for violations

    ☐ Set up SNS notifications for drift detection

    ```
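
A data-quality monitoring sketch with the SageMaker Python SDK's `DefaultModelMonitor`, assuming a CSV training dataset and an existing endpoint with data capture enabled; role, paths, and names are placeholders.

```python
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role=role,                                 # execution role ARN (placeholder)
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# 1. Baseline: compute statistics and constraints from the training data
monitor.suggest_baseline(
    baseline_dataset="s3://bucket/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://bucket/monitoring/baseline/",
)

# 2. Schedule: compare captured endpoint traffic against the baseline every hour
monitor.create_monitoring_schedule(
    monitor_schedule_name="data-quality-hourly",
    endpoint_input="my-endpoint",              # endpoint must have data capture enabled
    output_s3_uri="s3://bucket/monitoring/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```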


    2.7 SageMaker Feature Store: Reusable Features


    Feature Store eliminates duplicate feature engineering.


    Benefits:

    ```yaml

    ☐ Single source of truth for features

    ☐ Reusable across models and teams

    ☐ Online + offline storage (low-latency + batch)

    ☐ Time travel (query historical feature values)

    ☐ Automatic metadata tracking

    ```


    Implementation:

    ```yaml

    Feature Groups:

    - Define feature group (schema + record identifier)

    - Ingest features (batch or streaming)

    - Retrieve for training (offline store)

    - Serve for inference (online store)

    ```
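
A minimal feature group sketch with the SageMaker Python SDK, using a small hypothetical customer DataFrame; in practice the schema, record identifier, and event-time column come from your own feature pipeline, and `role` is a placeholder execution role.

```python
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Hypothetical customer features with a record identifier and an event-time column
df = pd.DataFrame({
    "customer_id": ["c-001", "c-002"],
    "avg_order_value": [52.3, 18.9],
    "event_time": [1700000000.0, 1700000000.0],
})
df["customer_id"] = df["customer_id"].astype("string")  # Feature Store needs explicit string dtype

fg = FeatureGroup(name="customer-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)               # infer the schema from the DataFrame

fg.create(
    s3_uri="s3://bucket/feature-store/",                 # offline store location
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,                                       # execution role ARN (placeholder)
    enable_online_store=True,                            # low-latency reads for inference
)

fg.ingest(data_frame=df, max_workers=2, wait=True)       # writes to online + offline stores
```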


    ---


    3. Amazon Bedrock Best Practices


    3.1 Foundation Model Selection


    Bedrock provides multiple foundation models via API.


| Model | Provider | Best For | Context Window |
|-------|----------|----------|----------------|
| **Claude** | Anthropic | Analysis, coding, writing | 200K tokens |
| **Titan** | AWS | General purpose, embeddings | 8K tokens |
| **Jurassic-2** | AI21 | Text generation, summarization | 8K tokens |
| **Llama 2** | Meta | Open source, fine-tuning | 4K tokens |
| **Command** | Cohere | RAG, text generation | 4K tokens |


    Model Selection Framework:

    ```yaml

    Need long context (100K+ tokens)?

    → Claude (best for large documents)


    Need AWS-native (data governance, HIPAA)?

    → Titan Text (AWS-owned)


    Need open source (self-host, fine-tune)?

    → Llama 2


    Need simple text generation?

    → Titan Text or Jurassic-2

    ```


    3.2 Prompt Engineering Best Practices


    Prompt quality determines output quality.


    Prompt Engineering Principles:

    ```yaml

    1. Be Specific: Clear instructions, not vague requests

    2. Provide Context: Background information, examples

    3. Specify Format: JSON, CSV, markdown, etc.

    4. Set Constraints: Length limits, style guidelines

    5. Use Examples: Few-shot learning (3-5 examples)

    6. Chain of Thought: Ask model to explain reasoning

    ```


    Example Prompt Pattern:

    ```yaml

Context: You are an AWS solutions architect helping with cost optimization.


    Task: Analyze the following AWS bill and provide recommendations.


    Input: [AWS cost data in JSON format]


    Requirements:

    - Focus on EC2, RDS, and S3 (70% of spend)

    - Provide 3-5 specific recommendations

    - Include estimated savings for each recommendation

    - Output in markdown format with H2 headers


    Output: [Model response]

    ```
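
For reference, a hedged sketch of sending a prompt like this to a Claude model on Bedrock with boto3, assuming the Anthropic Messages request format; the model ID is illustrative (check the Bedrock console for the models enabled in your account) and `cost_data_json` is a placeholder for your input.

```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = (
    "You are an AWS solutions architect helping with cost optimization.\n"
    "Analyze the following AWS bill and provide 3-5 recommendations, "
    "with estimated savings, in markdown.\n\n"
    f"Cost data:\n{cost_data_json}"                      # placeholder for your input data
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",    # illustrative model ID
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}],
    }),
)

answer = json.loads(response["body"].read())["content"][0]["text"]
```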


    3.3 RAG (Retrieval-Augmented Generation) Pattern


    RAG combines LLMs with your proprietary data.


    RAG Architecture:

    ```yaml

    1. Ingestion:

    - Document parsing (PDF, HTML, etc.)

    - Text splitting into chunks

    - Embedding generation (via Bedrock Titan Embeddings)

    - Store in OpenSearch vector database


    2. Retrieval:

    - User question → embed query

    - Semantic search in OpenSearch (find relevant chunks)

    - Retrieve top K chunks (3-10 typical)


    3. Generation:

    - Combine user question + retrieved chunks

    - Send prompt to Bedrock model (Claude, Titan)

    - Return answer with source citations

    ```


    Best Practices:

    ```yaml

☐ Chunk size: 500-1,000 tokens (a good starting point for most models)

    ☐ Overlap: 20-30% between chunks (preserve context)

    ☐ Retrieval: Top 3-10 chunks (balance quality vs. cost)

    ☐ Citations: Always include source references

    ☐ Fallback: If no relevant chunks, ask user to rephrase

    ```
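
A stripped-down sketch of the retrieve-then-generate loop, assuming Titan Embeddings for the query embedding and a hypothetical `search_chunks` helper that runs the k-NN query against your OpenSearch vector index; the Claude model ID is illustrative.

```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime")

def embed(text):
    """Embed text with Titan Embeddings (request/response shape per the Titan API)."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def answer_question(question, search_chunks):
    # `search_chunks` is a hypothetical helper that queries your OpenSearch
    # vector index and returns the top-k chunks with their source documents.
    chunks = search_chunks(embed(question), k=5)
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)

    prompt = (
        "Answer the question using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",   # illustrative model ID
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```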


    3.4 Bedrock Cost Optimization


    Bedrock charges per token. Costs scale with usage.


    Pricing (Example: Claude Instant):

    ```yaml

    Input: $0.80 per million tokens

    Output: $2.40 per million tokens


    Typical API Call:

    - Input: 1,000 tokens (prompt + context)

    - Output: 500 tokens (response)

    - Cost: $0.00080 + $0.00120 = $0.002 per call


    10,000 calls/day = $20/day = $600/month

    ```
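
A quick back-of-the-envelope helper that reproduces the arithmetic above; the rates are the example Claude Instant prices shown, not a quote.

```python
def bedrock_call_cost(input_tokens, output_tokens,
                      input_per_m=0.80, output_per_m=2.40):
    """Cost of one call in USD at the example rates above (per million tokens)."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

per_call = bedrock_call_cost(1_000, 500)   # 0.002
monthly = per_call * 10_000 * 30           # 10,000 calls/day -> ~600/month
```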


    Cost Optimization Strategies:

    ```yaml

    1. Use smaller models (Claude Instant vs. Claude)

    2. Minimize context (only include relevant information)

    3. Cache embeddings (avoid re-generating)

    4. Use semantic search (retrieve only relevant chunks)

    5. Rate limiting (prevent API abuse)

6. Monitoring: Set budget alerts (AWS Budgets) and review spend in Cost Explorer

    ```


    3.5 Bedrock Guardrails


    Guardrails enforce safety and content policies.


    Configurable Guardrails:

    ```yaml

    Content Filtering:

    - Hate speech, violence, sexual content

    - Personally identifiable information (PII)

    - Profanity, insults


    Blocked Topics:

    - Medical advice, legal advice

    - Political content, religious content

    - Custom topics (business-specific)


    PII Redaction:

    - Automatically redact sensitive information

    - SSN, credit card, email addresses


    Word Filters:

    - Block specific words or phrases

    - Regex patterns for custom filtering

    ```


    ---


    4. MLOps Best Practices


    4.1 ML Pipeline: Automate Everything


    Manual ML processes don't scale. Automate the entire pipeline.


    Pipeline Stages:

    ```yaml

    1. Data Ingestion:

    - Extract from sources (S3, RDS, API)

    - Validation (schema checks, data quality)

    - Store in feature store


    2. Data Preprocessing:

    - Feature engineering

    - Train/test/validation split

    - Normalization, encoding


    3. Model Training:

    - Hyperparameter tuning

    - Multi-model training (test multiple algorithms)

    - Model evaluation


    4. Model Validation:

    - Performance metrics (accuracy, F1, AUC)

    - Business metrics (revenue lift, cost savings)

    - Bias and fairness checks


    5. Model Deployment:

    - Canary deployment (test with small traffic)

    - A/B testing (new vs. old model)

    - Blue/green deployment (rollback capability)


    6. Monitoring:

    - Data drift detection

    - Model performance monitoring

    - Automated retraining triggers

    ```
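
A minimal SageMaker Pipelines sketch for the training stage, assuming an `estimator` constructed with `sagemaker_session=pipeline_session` (so `.fit()` returns step arguments rather than launching a job); processing, evaluation, and registration attach as additional steps the same way.

```python
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

pipeline_session = PipelineSession()

# `estimator` is the Estimator from section 2.2, built with this pipeline session
train_step = TrainingStep(
    name="TrainModel",
    step_args=estimator.fit(inputs={"train": "s3://bucket/train/"}),
)

pipeline = Pipeline(
    name="ml-training-pipeline",
    steps=[train_step],                  # add processing/evaluation/registration steps here
    sagemaker_session=pipeline_session,
)

pipeline.upsert(role_arn=role)           # create or update the pipeline definition
pipeline.start()                         # typically triggered by CI/CD or a schedule
```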


    4.2 Experiment Tracking


    Track all experiments for reproducibility.


    What to Track:

    ```yaml

    ☐ Hyperparameters (learning rate, batch size, etc.)

    ☐ Model architecture (layers, parameters)

    ☐ Training data version (hash, S3 location)

    ☐ Training metrics (loss, accuracy per epoch)

    ☐ Validation metrics (precision, recall, F1)

    ☐ Code version (git commit hash)

    ☐ Environment (Docker image, library versions)

    ```


    Tools:

    ```yaml

    SageMaker Experiments: Built-in experiment tracking

    MLflow: Open-source alternative

    Weights & Biases: Advanced visualization

    ```
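
As an illustration, a minimal MLflow tracking sketch covering the checklist above; `training_history` and the tag values are placeholders for your own run data.

```python
import mlflow

mlflow.set_experiment("churn-model")

with mlflow.start_run():
    # Hyperparameters, code version, and data version
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 64)
    mlflow.set_tag("git_commit", "abc1234")
    mlflow.set_tag("train_data", "s3://bucket/train/v3/")

    # Per-epoch metrics from a hypothetical `training_history` list of (loss, accuracy)
    for epoch, (loss, acc) in enumerate(training_history):
        mlflow.log_metric("train_loss", loss, step=epoch)
        mlflow.log_metric("val_accuracy", acc, step=epoch)

    mlflow.log_artifact("model/model.tar.gz")   # store the trained artifact with the run
```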


    4.3 Model Registry: Version Control for Models


A model registry provides centralized model management.


    Registry Features:

    ```yaml

    ☐ Model versioning (track all iterations)

    ☐ Model metadata (training data, parameters, metrics)

    ☐ Approval workflow (dev → staging → production)

    ☐ Stage transitions (promote/demote models)

    ☐ Deployment integration (one-click deploy)

    ```
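
A hedged sketch of registering a model version in the SageMaker Model Registry via boto3; `inference_image_uri` and the artifact path are placeholders, and approval-status changes are what drive the stage transitions.

```python
import boto3

sm = boto3.client("sagemaker")

# One-time: create the group that holds all versions of this model
sm.create_model_package_group(ModelPackageGroupName="churn-model")

# Register a new version; it stays pending until a reviewer approves it
sm.create_model_package(
    ModelPackageGroupName="churn-model",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [{
            "Image": inference_image_uri,                    # placeholder container URI
            "ModelDataUrl": "s3://bucket/models/churn/model.tar.gz",
        }],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
)

# Promotion is an approval-status change (the gate between stages), e.g.:
# sm.update_model_package(ModelPackageArn=model_package_arn, ModelApprovalStatus="Approved")
```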


    4.4 CI/CD for ML


ML models need CI/CD pipelines just like application code.


    Pipeline:

    ```yaml

    1. Code Commit (Git)

    2. Automated Tests (unit tests, data validation)

    3. Build Docker Image (model + dependencies)

    4. Train Model (SageMaker training job)

    5. Evaluate Model (performance, bias, explainability)

    6. Register Model (if passes thresholds)

    7. Deploy to Staging (canary deployment)

    8. Integration Tests (validate predictions)

    9. Deploy to Production (blue/green)

    10. Monitor (drift detection, quality metrics)

    ```


    ---


    5. Monitoring and Observability


    Required Metrics

    ```yaml

    Training:

    - GPU/CPU utilization

    - Training loss per epoch

    - Validation metrics per epoch

    - Training time per epoch


    Inference:

    - Request latency (p50, p95, p99)

    - Requests per second

    - Error rate (4xx, 5xx)

    - Model invocation count


    Model Quality:

    - Prediction drift (distribution changes)

    - Data drift (feature distribution changes)

    - Model accuracy (vs. ground truth)

    ```


    Alerting

    ```yaml

    ☐ Model accuracy drops > 10% (model degradation)

    ☐ Data drift detected (feature distribution shift)

    ☐ Prediction latency > 1 second (SLA violation)

    ☐ Error rate > 1% (infrastructure issue)

    ```
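
As one example, a CloudWatch alarm sketch for the latency SLA above, assuming a SageMaker endpoint named `my-endpoint` and an existing SNS topic ARN; note that the `ModelLatency` metric is reported in microseconds.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when p95 latency on a SageMaker endpoint breaches the 1-second SLA
cloudwatch.put_metric_alarm(
    AlarmName="endpoint-latency-p95",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",                 # reported in microseconds
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    ExtendedStatistic="p95",
    Period=300,
    EvaluationPeriods=3,
    Threshold=1_000_000,                       # 1 second = 1,000,000 microseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[sns_topic_arn],              # placeholder SNS topic for notifications
)
```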


    ---


    6. Cost Optimization Checklist


    Immediate Actions (Week 1)

  • [ ] Enable Spot instances for all SageMaker training
  • [ ] Audit idle SageMaker endpoints (delete unused)
  • [ ] Set up budget alerts for Bedrock API usage
  • [ ] Review SageMaker instance sizing (Compute Optimizer)

Short-Term (Month 1)

  • [ ] Implement multi-model endpoints (reduce endpoint count)
  • [ ] Set up Bedrock caching (reduce API calls)
  • [ ] Configure endpoint auto-scaling (or use Serverless Inference to scale to zero when idle)
  • [ ] Enable Spot training for all non-urgent workloads

Long-Term (Quarter 1)

  • [ ] Implement MLOps pipeline (automated training, deployment)
  • [ ] Set up model monitoring (detect degradation early)
  • [ ] Archive unused models to S3 (reduce storage costs)
  • [ ] Establish cost governance (budgets, quotas, approval)

---


    7. Security and Compliance


    ML Security Checklist

    ```yaml

    ☐ VPC: SageMaker endpoints in private subnets

    ☐ Encryption: Model artifacts encrypted with KMS

    ☐ IAM: Least-privilege roles for training/inference

    ☐ Data: Encrypted at rest (S3) and in transit (TLS)

    ☐ Logging: CloudTrail for API calls, CloudWatch for metrics

    ☐ Monitoring: SageMaker Model Monitor for drift detection

    ```


    ---


    Summary: AI/ML Excellence Pillars


    1. **Right service selection** (SageMaker for custom ML, Bedrock for generative AI)

    2. **Spot instances for training** (90% cost savings)

    3. **Multi-model endpoints** (host multiple models, reduce costs)

    4. **MLOps is mandatory** (automate pipelines, ensure reproducibility)

    5. **Monitor production models** (data drift, quality degradation)

6. **Prompt engineering matters** (clear instructions, context, and output format; use a stronger model such as Claude for complex tasks)

    7. **RAG for proprietary data** (combine LLMs with your knowledge base)

    8. **Model governance** (registry, approval workflows, version control)


    ---


    Need Help with AI/ML Architecture?


    Our AI/ML architects can design scalable ML platforms, implement MLOps pipelines, and help you leverage Bedrock for generative AI.


    <a href="/contact" className="text-aws-orange font-semibold hover:text-aws-light">

    Schedule a Free AI/ML Consultation →

    </a>


Related Reading:

  • For **compute**, see [SageMaker Instance Selection](/blog/aws-compute-best-practices)
  • For **security**, refer to [ML Data Privacy + Encryption](/blog/aws-security-best-practices)
  • For **storage**, explore [S3 for Model Artifacts](/blog/aws-storage-best-practices)

---


    *Last updated: January 5, 2025*

