NRP-Managed LLMs
The NRP provides several hosted open-weights LLMs, available either through API access or through our hosted chat interfaces.
Chat Interfaces
If you are looking for a ChatGPT-style chat experience, we provide a chat service based on the LibreChat project. It is a simple interface to all of the NRP-hosted models, which you can use to chat with the models or to try them out.
Visit the LibreChat interface
On macOS you can keep it readily available in the Dock: with LibreChat open in Safari, click File->Add to Dock.
API Access to LLMs via Envoy
(Envoy AI Gateway, work in progress)
To access our LLMs through the Envoy AI Gateway, you need to be a member of a group with the LLM flag set. Your membership info can be found on the namespaces page.
Start by creating a token. You can use this token to query the LLM endpoint with curl or any OpenAI API-compatible tool.
curl -H "Authorization: Bearer <your_token>" https://ellm.nrp-nautilus.io/v1/models
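Because the gateway exposes an OpenAI-compatible API, the same token also works with the OpenAI Python client. Below is a minimal sketch; the ENVOY_TOKEN environment variable is our own naming for wherever you store the token created above, not something the gateway requires:

import os

from openai import OpenAI

# Point the standard OpenAI client at the Envoy AI Gateway.
client = OpenAI(
    api_key=os.environ["ENVOY_TOKEN"],  # hypothetical env var holding your token
    base_url="https://ellm.nrp-nautilus.io/v1",
)

# Equivalent to the curl call above: list the models available to you.
for model in client.models.list():
    print(model.id)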
API Access to LLMs via LiteLLM
(LiteLLM, to be deprecated soon)
API access to the LLMs is provided through a LiteLLM proxy. To access our LLMs, you need to:
Login to NRP’s LiteLLM instance.
Create an API key. During key creation, you will select the models that the key is allowed to access (or all models).
With the API key, you can access the API at the endpoint https://llm.nrp-nautilus.io/. An example of calling the API from Python is shown below.
Visit the NRP LiteLLM interface
Examples
Python Code
You can use the OpenAI Python client to access the NRP LLMs, as in the example below.
import os

from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        {"role": "developer", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
)
print(completion.choices[0].message.content)
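The same client can also stream tokens as they are generated rather than waiting for the full reply. A minimal sketch, assuming the model you pick supports streaming through the proxy:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

# Request a streamed response; each chunk carries an incremental delta.
stream = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()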
Bash+Curl
curl -H "Authorization: Bearer <TOKEN>" https://ellm.nrp-nautilus.io/v1/models
curl -H "Authorization: Bearer <TOKEN>" -X POST "https://ellm.nrp-nautilus.io/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
    "messages": [
      {"role": "user", "content": "Hey!"}
    ]
  }'
Available Models
main - Model is generally supported. You can report issues with the service.
dep - LLM is deprecated and is likely to go away soon.
eval - The LLM was added for testing and we're evaluating its capabilities. It can be unavailable at times, and its configuration can change without notice.
You can follow all updates in our Matrix Machine Learning channel.
LiteLLM name | Status | Model | Features |
---|---|---|---|
deepseek-r1 | main | QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium | 685B parameters, INT4/INT8 mixed quantization, 163,840-token context, tool calling, performance comparable to Claude and o3 |
gemma3 | main | google/gemma-3-27b-it | agentic AI workflows, 131,072-token context, speaks 140+ languages |
llama3 | main | meta-llama/Llama-3.2-90B-Vision-Instruct | multimodal (vision), 131,072-token context |
llama3-sdsc | dep | meta-llama/Llama-3.3-70B-Instruct | 8 languages, 131,072-token context, tool use |
embed-mistral | main | intfloat/e5-mistral-7b-instruct | embeddings |
qwen3 | eval | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144-token context, tool calling, performance comparable to Claude and o3 |
gorilla | eval | gorilla-llm/gorilla-openfunctions-v2 | function calling |
llava-onevision | eval | llava-hf/llava-onevision-qwen2-7b-ov-hf | vision |
olmo | eval | allenai/OLMo-2-0325-32B-Instruct | open source |
phi3 | eval | microsoft/Phi-3.5-vision-instruct | vision |
watt | eval | watt-ai/watt-tool-8B | function calling |
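The embed-mistral model in the table serves embeddings rather than chat completions. A minimal sketch of using it through the same OpenAI client, assuming the LiteLLM proxy exposes the standard OpenAI-compatible embeddings route:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

# Request embedding vectors for a batch of input strings.
response = client.embeddings.create(
    model="embed-mistral",
    input=["What is the National Research Platform?", "hosted open-weights LLMs"],
)
for item in response.data:
    print(len(item.embedding))  # dimensionality of each returned vector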
