
NRP-Managed LLMs

The NRP provides several hosted open-weights LLMs, available through API access or through our hosted chat interfaces.

Chat Interfaces

If you want to chat with an LLM through an interface similar to ChatGPT, we provide an instance of the LibreChat project: a simple chat interface to all of the NRP-hosted models. You can use it to chat with the models or to test them out.

Visit the LibreChat interface

On macOS, you can keep LibreChat in the Dock for quick access: with LibreChat open in Safari, click File -> Add to Dock.

API Access to LLMs via Envoy

(Envoy AI Gateway, work in progress)

To access our LLMs through the Envoy AI Gateway, you need to be a member of a group with the LLM flag. Your membership info can be found on the namespaces page.

Start by creating a token. You can use this token to query the LLM endpoint with curl or any OpenAI-API-compatible tool.

curl -H "Authorization: Bearer <your_token>" https://ellm.nrp-nautilus.io/v1/models
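If you prefer Python, the same token should work with the OpenAI Python client. Below is a minimal sketch, assuming the gateway exposes the standard OpenAI-compatible /v1 routes (the curl examples on this page use the same paths); the ELLM_TOKEN variable name is just an illustration.

import os

from openai import OpenAI

# A sketch: point the OpenAI client at the Envoy AI Gateway.
# ELLM_TOKEN is an illustrative name for the token created above.
client = OpenAI(
    api_key=os.environ["ELLM_TOKEN"],
    base_url="https://ellm.nrp-nautilus.io/v1",
)

# List the models the gateway exposes (same as the curl call above).
for model in client.models.list():
    print(model.id)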

API Access to LLMs via LiteLLM

(LiteLLM, to be deprecated soon)

API access to the LLMs is provided through a LiteLLM proxy. To access our LLMs, you need to:

  1. Log in to NRP’s LiteLLM instance.

  2. Create an API key. During key creation, you will select the models that the key is allowed to access (or all models).

  3. With the API key, you can access the API at the endpoint https://llm.nrp-nautilus.io/. The Python code below shows an example of how to use the API.

Visit the NRP LiteLLM interface

Examples

Python Code

To access the NRP LLMs, you can use the OpenAI Python client, as in the example below.

nrp-llm.py
import os

from openai import OpenAI

client = OpenAI(
    # Reads your LiteLLM API key from OPENAI_API_KEY; this is the
    # client default, so the line can be omitted.
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        # The system message sets the assistant's behavior.
        {"role": "system", "content": "Talk like a pirate."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class?",
        },
    ],
)
print(completion.choices[0].message.content)
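The same client also supports streaming responses. Here is a minimal sketch that prints tokens as they arrive, reusing the client from the example above:

stream = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries a partial delta; skip empty chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()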

Bash+Curl

curl -H "Authorization: Bearer <TOKEN>" https://ellm.nrp-nautilus.io/v1/models

curl -X POST "https://ellm.nrp-nautilus.io/v1/chat/completions" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
    "messages": [
      {"role": "user", "content": "Hey!"}
    ]
  }'
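To extract just the assistant's reply from the JSON response, you can pipe the output through jq (a sketch; assumes jq is installed):

curl -s -X POST "https://ellm.nrp-nautilus.io/v1/chat/completions" \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-90B-Vision-Instruct", "messages": [{"role": "user", "content": "Hey!"}]}' \
  | jq -r '.choices[0].message.content'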

Available Models

Each model in the table below carries one of the following status tags:

main - The model is generally supported. You can report issues with the service.

dep - The LLM is deprecated and is likely to go away soon.

eval - The LLM was added for testing and we are evaluating its capabilities. It may occasionally be unavailable, and its configuration may change without notice.

You can follow all updates in our Matrix Machine Learning channel.

| LiteLLM name | Status | Model | Features |
| --- | --- | --- | --- |
| deepseek-r1 | main | QuantTrio/DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Medium | 685B parameters, INT4/INT8 mixed quantization, 163,840 tokens, tool calling, Claude and o3 performance |
| gemma3 | main | google/gemma-3-27b-it | Agentic AI workflows, 131,072 tokens, speaks 140+ languages |
| llama3 | main | meta-llama/Llama-3.2-90B-Vision-Instruct | multimodal (vision), 131,072 tokens |
| llama3-sdsc | dep | meta-llama/Llama-3.3-70B-Instruct | 8 languages, 131,072 tokens, tool use |
| embed-mistral | main | intfloat/e5-mistral-7b-instruct | embeddings |
| qwen3 | eval | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | 235B parameters, FP8 quantization, 262,144 tokens, tool calling, Claude and o3 performance |
| gorilla | eval | gorilla-llm/gorilla-openfunctions-v2 | function calling |
| llava-onevision | eval | llava-hf/llava-onevision-qwen2-7b-ov-hf | vision |
| olmo | eval | allenai/OLMo-2-0325-32B-Instruct | open source |
| phi3 | eval | microsoft/Phi-3.5-vision-instruct | vision |
| watt | eval | watt-ai/watt-tool-8B | function calling |
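Since embed-mistral serves an embeddings model, you can call the standard OpenAI embeddings route through the LiteLLM endpoint. A minimal sketch, reusing the setup from the Python example above:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="https://llm.nrp-nautilus.io/",
)

# A sketch: request an embedding vector from embed-mistral.
response = client.embeddings.create(
    model="embed-mistral",
    input=["The NRP hosts several open-weights LLMs."],
)
print(len(response.data[0].embedding))  # dimensionality of the vector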
This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.