Inference Pricing
LLM, image, audio, video, and signal models are available through the Inference API. For these models, you can pay as you go or pay a flat rate of $1,000 per month for your hosting. Prices are per 1,000 tokens and count both input and output tokens.
Model Size | Large Language Model, Chat (price per 1,000 tokens) | Image/Audio/Video/Signal (price per 1,000 tokens) | Hosting (price per hour) |
---|---|---|---|
40.1B - 70B | $0.0010 | $0.01 | $6.17 |
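As a rough illustration of how pay-as-you-go charges add up, the sketch below estimates the cost of a workload from the table's 40.1B - 70B rates. It assumes both token rates are per 1,000 tokens, as the introduction states, and that input and output tokens are simply summed. The function name `pay_as_you_go_cost` and the keys `llm_chat` and `media_signal` are hypothetical and not part of any API.

```python
from decimal import Decimal

# Rates copied from the 40.1B - 70B row of the pricing table.
# Assumption: both rates are in USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = {
    "llm_chat": Decimal("0.0010"),      # Large Language Model, Chat
    "media_signal": Decimal("0.01"),    # Image/Audio/Video/Signal
}


def pay_as_you_go_cost(model_kind: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost; input and output tokens are billed together."""
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_1K_TOKENS[model_kind] * total_tokens / 1000


# Example: 1.5M input tokens and 0.5M output tokens on a chat model.
cost = pay_as_you_go_cost("llm_chat", input_tokens=1_500_000, output_tokens=500_000)
print(f"${cost:.2f}")  # prints $2.00
```

`Decimal` is used instead of `float` so that small per-token rates do not pick up binary rounding error when multiplied by large token counts.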
Fine-tuned models
After you fine-tune a model with the Fine-tuning API, you can host it for inference. When hosting your own model, you pay hourly for the GPU instances. You can start or stop your instance at any time through the web-based Playground or the start/stop instance APIs.
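Because hosting is billed by the hour, stopping an idle instance directly lowers the bill. The sketch below estimates a monthly hosting cost and the number of running hours at which hourly billing would match the $1,000 flat rate. It assumes a fine-tuned model in the 40.1B - 70B tier is billed at the $6.17 hourly hosting price from the table above, and that the flat rate can be applied to it; the page does not confirm either point, and the function names are hypothetical.

```python
from decimal import Decimal

# Assumption: the table's hourly hosting price applies to the hosted fine-tuned model.
HOURLY_RATE = Decimal("6.17")
# Flat monthly hosting rate quoted in the introduction.
FLAT_MONTHLY_RATE = Decimal("1000")


def hourly_hosting_cost(hours_running: int) -> Decimal:
    """USD cost of keeping the instance started for the given number of hours."""
    return HOURLY_RATE * hours_running


def break_even_hours() -> Decimal:
    """Running hours per month at which hourly billing equals the flat rate."""
    return FLAT_MONTHLY_RATE / HOURLY_RATE


# Example: instance started 8 hours a day on 22 working days.
print(hourly_hosting_cost(8 * 22))         # prints 1085.92
print(round(break_even_hours(), 1))        # roughly 162 hours
```

Under these assumptions, an instance that runs fewer than about 162 hours in a month costs less on hourly billing than on the flat rate.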