Nexus Compute

Lenovo ThinkSystem SR685a V3 — 8x MI300X for Large-Memory Inference

192GB-per-GPU inference node tuned for high-throughput ROCm serving.

Full manufacturer warranty

Authorized channel

48-hour quote

We help you choose, configure, and deliver the right system — no obligation.

Configuration at a Glance

Accelerator	8x AMD Instinct MI300X, 192GB HBM3 each (1.5TB total)
GPU Interconnect	AMD Infinity Fabric, 6.4TB/s GPU-to-GPU
CPU	Dual AMD EPYC 9004/9005 (Genoa/Turin)
System Memory	Lenovo TruDDR5 up to 6000 MHz (24 DIMMs)

Tailored per engagement. Full technical overview below.

Configuration Options

Core specifications for this system. Every component is configurable to your workload — request a quote for a tailored build.

GPU / Accelerator

8x AMD Instinct MI300X, 192GB HBM3 each (1.5TB total)

Processor

Dual AMD EPYC 9004/9005 (Genoa/Turin)

Memory

Lenovo TruDDR5 up to 6000 MHz (24 DIMMs)

Storage

Up to 16x hot-swap PCIe Gen5 NVMe

Overview

The Lenovo ThinkSystem SR685a V3 is an 8U platform that pairs eight AMD Instinct MI300X accelerators with dual AMD EPYC processors. It is optimized for serving large models where per-GPU memory capacity is the binding constraint. Nexus Compute configures the ROCm inference stack, GPUDirect networking, and storage, then delivers each system tested and warranty-backed through authorized Lenovo channels.

Who This Solution Is For

AI product teams serving large models to many users

Enterprises deploying private LLM inference on AMD

Lenovo-standardized data centers adding MI300X capacity

Teams whose models exceed competing GPUs' memory limits

Business Benefits

Capacity-led inference

192GB per GPU, 1.5TB of HBM3 in aggregate across the eight accelerators, keeps large models resident for high-throughput serving with less need to shard a model across GPUs.

Tuned serving stack

Configured with ROCm and vLLM-class engines so token throughput is production-ready on delivery.

Lenovo platform support

Sourced through authorized Lenovo channels with enterprise warranty and serviceability.

Typical Business Use Cases

High-concurrency LLM and generative-model serving

Private, on-premises inference for regulated data

Retrieval-augmented generation at scale

Long-context inference workloads

Industry Applications

AI & Machine Learning

SaaS & Software

Financial Services

Healthcare & Life Sciences

Telecom

Technical Overview

Built on the Lenovo ThinkSystem SR685a V3 8U platform with eight MI300X accelerators linked by AMD Infinity Fabric at 6.4TB/s GPU-to-GPU bandwidth, driven by dual AMD EPYC 9004/9005 processors. Lenovo TruDDR5 memory, up to 16 PCIe Gen5 NVMe drives, and eight front-accessible GPUDirect PCIe Gen5 slots support inference-serving deployments.

Accelerator	8x AMD Instinct MI300X, 192GB HBM3 each (1.5TB total)
GPU Interconnect	AMD Infinity Fabric, 6.4TB/s GPU-to-GPU
CPU	Dual AMD EPYC 9004/9005 (Genoa/Turin)
System Memory	Lenovo TruDDR5 up to 6000 MHz (24 DIMMs)
Storage	Up to 16x hot-swap PCIe Gen5 NVMe
Networking/Fabric	8x front PCIe Gen5 FHHL slots with GPUDirect
Form Factor	8U rackmount
Management	Lenovo XClarity Controller (XCC)
Warranty	Lenovo enterprise warranty (configurable term)

Specifications are indicative and configured to each engagement. Request a quote for a configuration tailored to your requirements.

Warranty, Support & Fulfillment

Every system ships from an authorized channel, configured and tested, with the documentation enterprise buyers need — backed by warranty and a dedicated account team.

Enterprise Warranty

Full manufacturer warranty with optional on-site, next-business-day support and extended coverage.

Authorized Channel

Sourced through Tier-1 distribution and OEM partners — never grey market. Asset & warranty records included.

Lead Time & Deployment

48-hour quotes, then configured, burn-in tested, and delivered on a committed schedule.

Nationwide Fulfillment

Coordinated logistics, rack-and-stack, and delivery wherever your infrastructure lives.

Frequently Asked Questions

Why pick MI300X for inference specifically?

Its 192GB per GPU keeps large models resident with room for big KV caches and long contexts, raising concurrency before you add nodes. We benchmark your model and SLA to confirm fit.
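
As a rough illustration of the memory arithmetic behind this answer, the Python sketch below estimates how many tokens of KV cache fit once model weights are resident. The model shape in the usage example (70 billion parameters, 80 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example, and the 10 percent reserve for runtime overhead is an assumption rather than a measured figure. Real capacity depends on the serving engine, quantization, and activation memory, so treat the result as a first estimate before benchmarking.

```python
from dataclasses import dataclass

GB = 10**9  # decimal gigabytes; an assumption made for simple arithmetic


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer shape; substitute your model's real values."""
    params: int               # total parameter count
    layers: int               # number of transformer layers
    kv_heads: int             # key/value heads (after grouped-query attention)
    head_dim: int             # dimension of each attention head
    bytes_per_value: int = 2  # 2 bytes for FP16/BF16 weights and cache


def weight_bytes(m: ModelShape) -> int:
    """Memory needed to hold the weights alone."""
    return m.params * m.bytes_per_value


def kv_bytes_per_token(m: ModelShape) -> int:
    """One key vector and one value vector per layer per KV head."""
    return 2 * m.layers * m.kv_heads * m.head_dim * m.bytes_per_value


def kv_token_budget(m: ModelShape, gpus: int,
                    hbm_per_gpu_gb: int = 192,
                    reserve_percent: int = 10) -> int:
    """Tokens of KV cache that fit after weights and a reserved overhead.

    reserve_percent is an assumed allowance for activations and runtime
    overhead, not a vendor figure.
    """
    total = gpus * hbm_per_gpu_gb * GB
    usable = total * (100 - reserve_percent) // 100
    free = usable - weight_bytes(m)
    return max(free, 0) // kv_bytes_per_token(m)


if __name__ == "__main__":
    example = ModelShape(params=70 * 10**9, layers=80, kv_heads=8, head_dim=128)
    for gpus in (1, 8):
        print(gpus, "GPU(s):", kv_token_budget(example, gpus), "cached tokens")
```

Dividing the token budget by your target context length gives an approximate ceiling on concurrent sequences, which is the figure to compare against your SLA.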

Which serving software do you configure?

Commonly vLLM on ROCm, with PyTorch and your gateway of choice. We tune batching and parallelism to your latency and throughput targets before delivery.
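
To make the tuning knobs concrete, here is a minimal Python sketch that assembles a vLLM launch command without running it. The flags shown (tensor parallel size, maximum concurrent sequences, maximum model length, GPU memory utilization) are vLLM engine options at the time of writing, so check them against your installed version. The model name and every default value are placeholders chosen for illustration, not the settings Nexus Compute ships.

```python
import shlex
from typing import List


def build_vllm_serve_command(
    model: str,
    tensor_parallel_size: int = 8,         # shard across all eight GPUs (assumed)
    max_num_seqs: int = 256,               # batching ceiling (placeholder value)
    max_model_len: int = 32768,            # context length cap (placeholder value)
    gpu_memory_utilization: float = 0.90,  # fraction of HBM the engine may claim
) -> List[str]:
    """Return the argument list for a `vllm serve` launch; nothing is executed."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    if not 0 < gpu_memory_utilization <= 1:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--max-num-seqs", str(max_num_seqs),
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    # "example-org/example-model" is a hypothetical model identifier.
    cmd = build_vllm_serve_command("example-org/example-model")
    print(shlex.join(cmd))
```

Raising the concurrent-sequence limit generally favors throughput, while lowering it favors per-request latency. The right balance depends on your targets and is worth confirming with a load test.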

How does this differ from the training-focused MI300X servers?

Same accelerator, different tuning: this SR685a V3 build is optimized for serving (memory residency, throughput, availability), while liquid-cooled training nodes prioritize sustained clocks and multi-node fabric.

Hardware Assistance

Configure the Lenovo ThinkSystem SR685a V3 — 8x MI300X for Large-Memory Inference with Nexus Compute

Tell us your requirements and a hardware specialist will help you specify, configure, and quote the right system — typically within two business days. No obligation.
