vLLM Serving Performance Benchmark Notes
1. Test Environment Setup
Install the dependencies, preferably inside a virtual environment:
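conda create -n vllm_bench python=3.10 -y  # Python version is an assumption; use one your vLLM release supports
conda activate vllm_bench
pip install vllm openai pandas datasets  # pip install anything else that turns out to be missing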
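Before moving on, it can help to confirm that the packages actually resolved in the active environment. The following is a minimal sketch, assuming only the four package names from the pip command above; the helper name `missing_packages` is hypothetical and not part of any library.

```python
import importlib.util

# Package names taken from the pip install command above.
REQUIRED = ["vllm", "openai", "pandas", "datasets"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates a package without importing it, so the check stays fast even for a heavy package such as vllm.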
2. Model Service Deployment
2.1 Start the API Server
vllm serve \
/model_path/ \
--max-model-len 80000 \
--gpu-memory-utilization 0.4 \
--swap-space 512 \
--device auto \
--no-enable-prefix-caching
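When a benchmark is repeated with different settings, it may be convenient to generate the launch command from a dictionary rather than editing it by hand. The sketch below is one possible way to do that; `build_serve_command` and `OPTIONS` are hypothetical names, and the flag values are simply those used in the command above.

```python
import shlex


def build_serve_command(model_path, options):
    """Turn an options dict into an argument list for `vllm serve`."""
    cmd = ["vllm", "serve", model_path]
    for key, value in options.items():
        # Underscores in keys become dashes, e.g. swap_space -> --swap-space.
        flag = "--" + key.replace("_", "-")
        if value is True:
            cmd.append(flag)  # boolean switch, no value
        elif value is False or value is None:
            continue  # switch disabled, emit nothing
        else:
            cmd.extend([flag, str(value)])
    return cmd


# Values copied from the launch command above.
OPTIONS = {
    "max_model_len": 80000,
    "gpu_memory_utilization": 0.4,
    "swap_space": 512,
    "device": "auto",
    "no_enable_prefix_caching": True,
}

if __name__ == "__main__":
    print(shlex.join(build_serve_command("/model_path/", OPTIONS)))
```

The returned list can also be passed directly to `subprocess.Popen`, which avoids shell quoting issues when the model path contains spaces.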
Log output once initialization completes:
INFO 07-25 11:16:59 [serving_chat.py:125] Using default chat sampling params from model: {'temperature': 0.6, 'top_p': 0.9}
INFO 07-25 11:16:59 [serving_completion.py:72] Using default completion sampling params from model: {'temperature': 0.6, 'top_p': 0.9}
INFO 07-25 11:16:59 [api_server.py:1457] Starting vLLM API server 0 on http://0.0.0.0:8000
INFO 07-25 11:16:59 [launcher.py:29] Available routes are:
INFO 07-25 11:16:59 [launcher.py:37] Route: /openapi.json, Methods: GET, HEAD
INFO 07-25 11:16:59 [launcher.py:37] Route: /docs, Methods: GET, HEAD
INFO 07-25 11:16:59 [launcher.py:37] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 07-25 11:16:59 [launcher.py:37] Route: /redoc, Methods: GET, HEAD
INFO 07-25 11:16:59 [launcher.py:37] Route: /health, Methods: GET
INFO 07-25 11:16:59 [launcher.py:37] Route: /load, Methods: GET
INFO 07-25 11:16:59 [launcher.py:37] Route: /ping, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /ping, Methods: GET
INFO 07-25 11:16:59 [launcher.py:37] Route: /tokenize, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /detokenize, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/models, Methods: GET
INFO 07-25 11:16:59 [launcher.py:37] Route: /version, Methods: GET
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/chat/completions, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/completions, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/embeddings, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /pooling, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /classify, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /score, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/score, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/audio/transcriptions, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/audio/translations, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /rerank, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v1/rerank, Methods: POST
INFO 07-25 11:16:59 [launcher.py:37] Route: /v2/rerank, Methods: POST
INFO 07-25 11:16:59 [launcher
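Rather than watching the log by hand, a benchmark script could poll the `/health` route listed above until the server answers. This is a minimal sketch under a few assumptions: the server is reachable at `http://localhost:8000` (the log shows port 8000), `/health` returns HTTP 200 once the engine is ready, and the names `http_status` and `wait_until_ready` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def wait_until_ready(base_url, probe=http_status, attempts=60, delay=5.0,
                     sleep=time.sleep):
    """Poll <base_url>/health until it returns 200 or attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next probe
    return False


if __name__ == "__main__":
    # Assumed address; adjust host and port to match the deployment.
    ready = wait_until_ready("http://localhost:8000", attempts=3, delay=1.0)
    print("server ready" if ready else "server not ready")
```

The `probe` and `sleep` parameters are injectable so the retry logic can be exercised without a running server.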





