AI Model Stack Builder
Calculate the total VRAM for a multi-model AI setup.
Quick start:
Model Browser
- DeepSeek R1 14B (Popular): 14B · 28.0 GB
- DeepSeek R1 32B (Popular): 32B · 64.0 GB
- DeepSeek R1 70B (Popular): 70B · 140.0 GB
- DeepSeek R1 7B (Popular): 7B · 14.0 GB
- DeepSeek R1 8B (Popular): 8B · 16.0 GB
- Gemma 3 12B (Popular): 12B · 24.0 GB
- Llama 3.1 70B (Popular): 70B · 140.0 GB
- Llama 3.1 8B (Popular): 8B · 16.0 GB
- Llama 3.3 70B (Popular): 70B · 140.0 GB
- Llama 4 Scout 109B (Popular): 109B · 218.0 GB
- Ministral 7B (Popular): 7B · 14.0 GB
- Mistral 7B v0.3 (Popular): 7B · 14.0 GB
- Mistral NeMo 12B (Popular): 12B · 24.0 GB
- Phi-3 Medium 14B (Popular): 14B · 28.0 GB
- Phi-4 14B (Popular): 14B · 28.0 GB
- Qwen 2.5 14B (Popular): 14B · 28.0 GB
- Qwen 2.5 7B (Popular): 7B · 14.0 GB
- Qwen 3 14B (Popular): 14B · 28.0 GB
- Qwen 3 32B (Popular): 32B · 64.0 GB
- Qwen 3 8B (Popular): 8B · 16.0 GB
- Command R 35B: 35B · 70.0 GB
- Command R+ 104B: 104B · 208.0 GB
- DeepSeek R1 1.5B: 1.5B · 3.0 GB
- DeepSeek V2 236B: 236B · 472.0 GB
- DeepSeek V2 Lite 16B: 16B · 32.0 GB
- DeepSeek V3 671B: 671B · 1342.0 GB
- Gemma 2 27B: 27B · 54.0 GB
- Gemma 2 2B: 2B · 4.0 GB
- Gemma 2 9B: 9B · 18.0 GB
- Gemma 3 1B: 1B · 2.0 GB
- Gemma 3 27B: 27B · 54.0 GB
- Gemma 3 4B: 4B · 8.0 GB
- Llama 3.1 405B: 405B · 810.0 GB
- Llama 3.2 1B: 1B · 2.0 GB
- Llama 3.2 3B: 3B · 6.0 GB
- Llama 4 Maverick 400B: 400B · 800.0 GB
- Ministral 3B: 3B · 6.0 GB
- Mistral Large 3 675B: 675B · 1350.0 GB
- Mistral Small 24B: 24B · 48.0 GB
- Mixtral 8x22B: 141B · 282.0 GB
- Mixtral 8x7B: 46.7B · 93.4 GB
- Phi-3 Mini 3.8B: 3.8B · 7.6 GB
- Phi-4 Mini 3.8B: 3.8B · 7.6 GB
- Qwen 2.5 0.5B: 0.5B · 1.0 GB
- Qwen 2.5 1.5B: 1.5B · 3.0 GB
- Qwen 2.5 32B: 32B · 64.0 GB
- Qwen 2.5 3B: 3B · 6.0 GB
- Qwen 2.5 72B: 72B · 144.0 GB
- Qwen 3 0.6B: 0.6B · 1.2 GB
- Qwen 3 1.7B: 1.7B · 3.4 GB
- Qwen 3 4B: 4B · 8.0 GB
- Qwen 3 MoE 235B-A22B: 235B · 470.0 GB
- Qwen 3 MoE 30B-A3B: 30B · 60.0 GB
My Stack
Add models from the browser to start building your stack.
Hardware
Last updated: March 16, 2026
How It Works
Build your stack by adding AI models from six categories. Each model's VRAM is calculated from its parameter count and settings. 'Always loaded' models are loaded simultaneously, while for 'load on demand' models only the largest one is added to peak VRAM. The total includes 0.5 GB of system overhead.
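The peak VRAM rule above can be sketched in a few lines of Python. This is an illustrative simplification of the calculation the tool describes, not its actual source; the model sizes in the example are assumptions taken from the browser list.

```python
SYSTEM_OVERHEAD_GB = 0.5  # fixed system overhead included in the total

def peak_vram_gb(always_loaded, on_demand):
    """Estimate peak VRAM in GB for a model stack.

    always_loaded / on_demand: lists of per-model VRAM figures in GB.
    """
    resident = sum(always_loaded)                  # all 'always loaded' models coexist
    largest_on_demand = max(on_demand, default=0)  # only the biggest on-demand model counts
    return resident + largest_on_demand + SYSTEM_OVERHEAD_GB

# Example: a 16 GB LLM always loaded, plus a 3 GB speech model and a
# 7 GB image model loaded on demand:
print(peak_vram_gb([16.0], [3.0, 7.0]))  # 23.5
```

Only the largest on-demand model is counted because on-demand models are assumed never to be resident at the same time.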
Methodology & Sources
This calculator aggregates VRAM requirements for multiple AI models running simultaneously, using the same per-model calculation methodology as our LLM VRAM Checker. For each model in the stack, VRAM is estimated based on parameter count, quantization level, and context length.
The total VRAM calculation sums individual model requirements plus a shared overhead estimate for the inference framework and CUDA context. When models share the same GPU, there is typically 200-500MB of additional fixed overhead per model instance beyond the model weights.
Model specifications are sourced from official model cards on Hugging Face and manufacturer documentation. GPU specifications come from NVIDIA, AMD, and Apple official data sheets.
Categories covered: Large Language Models, Embedding Models, Image Generation (Stable Diffusion, DALL-E equivalents), Speech-to-Text (Whisper variants), Text-to-Speech, and Video Generation models.
Limitations: Running multiple models simultaneously on a single GPU requires VRAM for all models to be loaded at once. Some inference frameworks support dynamic model loading/unloading, which would reduce peak VRAM usage but increase latency. Actual memory usage varies by framework, batch size, and concurrent request volume.
Frequently Asked Questions
What is an AI model stack?
An AI stack is a combination of multiple AI models running on the same GPU, for example an LLM for text generation, Stable Diffusion for images, and Whisper for speech recognition. This tool helps you check whether your GPU can handle all of them.
What is the difference between 'always loaded' and 'load on demand'?
'Always loaded' models stay resident in VRAM at all times (e.g. your main LLM). 'Load on demand' models are loaded only when needed and unloaded after use. Peak VRAM = all 'always loaded' models + the largest 'load on demand' model.
How can I reduce my stack's VRAM usage?
Use Q4 quantization for LLMs (roughly 75% savings versus FP16). Set infrequently used models to 'load on demand'. Use smaller model variants where quality allows. Enable low-VRAM mode on TTS models that support it.
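The quantization savings quoted above follow from bytes per parameter. A rough sketch, using approximate rule-of-thumb figures (the exact bytes per weight vary by quantization format):

```python
# Approximate bytes per parameter by quantization level (rule of thumb).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def weights_gb(params_billions, quant="fp16"):
    """Rough weight-memory estimate in GB for a model of the given size."""
    return params_billions * BYTES_PER_PARAM[quant]

fp16 = weights_gb(14, "fp16")  # 28.0 GB, matching the 14B entries above
q4 = weights_gb(14, "q4")      # 7.0 GB
print(f"Q4 saves {1 - q4 / fp16:.0%} vs FP16")  # Q4 saves 75% vs FP16
```

Note this covers weights only; KV cache and activations add more on top, so real savings on total VRAM are somewhat lower.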
Can I run multiple models on a single GPU?
Yes, as long as the total VRAM requirement of all loaded models fits within your GPU's memory. Some inference frameworks like Ollama and text-generation-inference support loading multiple models simultaneously. Others can swap models in and out of VRAM on demand, which uses less peak memory but adds latency when switching between models. This calculator shows the peak VRAM needed when all selected models are loaded at once.
Does running multiple models slow down inference?
Running multiple models simultaneously can reduce per-model inference speed because they compete for GPU compute resources and memory bandwidth. The impact depends on whether models are being queried concurrently or sequentially. For sequential use (one model at a time), performance impact is minimal as long as all models fit in VRAM. For concurrent inference, expect some throughput reduction. Using separate GPUs for different models eliminates this contention.
Related Tools
Related Guides
Learn more about the concepts behind this tool.