My 2 cents:
Besides picking the "model of the moment", there is also the choice of GPU cloud/provider when hunting for the cheapest vendor (stand up a private GPU server hosting your LLM and try to escape the per-token cost in favor of the more predictable per-server cost).
At the moment, the references I have been using are:
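The per-token vs. per-server trade-off boils down to a break-even volume: above a certain number of tokens per month, a flat-rate dedicated server beats API pricing. A minimal sketch, with all prices hypothetical placeholders (plug in real quotes from the providers below):

```python
# Back-of-envelope break-even between per-token API pricing and a flat
# monthly GPU server bill. All numbers here are hypothetical examples,
# not quotes from any specific provider.

def breakeven_tokens_per_month(server_usd_month: float,
                               api_usd_per_1m_tokens: float) -> float:
    """Monthly token volume above which a dedicated server is cheaper."""
    return server_usd_month / api_usd_per_1m_tokens * 1_000_000

# Example: a $1,200/month GPU server vs. $2.00 per 1M tokens (hypothetical).
tokens = breakeven_tokens_per_month(1200.0, 2.00)
print(f"Break-even: {tokens:,.0f} tokens/month")  # 600,000,000 tokens/month
```

Below that volume the API is cheaper; above it the server wins, before counting the ops time of running it yourself.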
Lista de provedores
https://devinschumacher.github.io/cloud-gpu-servers-services-providers/
https://gist.github.com/devinschumacher/87dd5b87234f2d0e5dba56503bfba533
https://research.aimultiple.com/cloud-gpu-providers/
https://research.aimultiple.com/cloud-gpu/
Some of them
https://www.vultr.com/pricing/#cloud-gpu
https://www.hetzner.com/dedicated-rootserver/matrix-gpu/
https://lambda.ai/service/gpu-cloud#pricing
https://www.liquidweb.com/gpu-hosting/
https://www.interserver.net/dedicated/gpu.html
https://www.runpod.io/pricing
https://www.cherryservers.com/dedicated-gpu-servers
https://gthost.com/gpu-dedicated-servers