Secure by design
The enterprise API key belongs only on your backend. Every client-side interaction should use temporary session credentials returned by your server.
This guide shows how to create an API key, configure an agent, generate a session from your backend, and pass only the short-lived credentials to the browser.
Browser receives
Short-lived session token
LiveKit websocket URL
Keep `x-api-key` and any backend credentials server-side. The frontend only needs enough information to connect to the audio room.
Sessions can inherit agent defaults or override language, prompt, greeting, and voice for a single call.
Once your backend returns the session payload, the frontend uses livekit-client to stream the agent's voice and exchange text-chat messages over the same LiveKit data channel.
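The "inherit or override" behavior described above can be sketched as a small body builder. This is a minimal illustration, not part of any SDK: `buildSessionRequest` and `SessionOverrides` are hypothetical names, and dropping blank overrides so that agent defaults apply is an assumption based on the request fields documented in this guide.

```typescript
// Optional per-session overrides; field names follow the request body
// documented in this guide.
type SessionOverrides = {
  language?: string;
  system_prompt?: string;
  greeting?: string;
  reference_audio_url?: string;
};

type SessionRequest = { agent: string } & SessionOverrides;

// Hypothetical helper: builds the JSON body for a session request.
// Missing or blank overrides are left out (an assumption) so the
// agent's configured defaults are used for those fields.
function buildSessionRequest(
  agentId: string,
  overrides: SessionOverrides = {},
): SessionRequest {
  const body: SessionRequest = { agent: agentId };
  const keys: (keyof SessionOverrides)[] = [
    "language",
    "system_prompt",
    "greeting",
    "reference_audio_url",
  ];
  for (const key of keys) {
    const value = overrides[key];
    if (value && value.trim()) body[key] = value;
  }
  return body;
}
```

Calling `buildSessionRequest("agent_id")` yields a body with only the required `agent` field, while `buildSessionRequest("agent_id", { language: "en" })` overrides the language for that one session.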
Setup flow
Open app.oshara.ai/settings?tab=developer and create an API key. Keys start with `sk_...`. The key is shown only once, so store it in your backend secrets store right away and never expose it to the browser.
Go to app.oshara.ai/agents and create an agent. Configure the default reference voice, system prompt, greeting message, and language. Keep the Agent ID from the agent detail page because you will send it when creating a session.
Optionally attach documents to the Knowledge Base and connect an MCP server if the agent needs external tools or actions during calls.
Call the enterprise API directly from your backend. Send the API key in the `x-api-key` header, include the required `agent` field, and override `language`, `system_prompt`, `greeting`, or `reference_audio_url` only when that session needs it.
A successful response returns a short-lived token, the LiveKit websocket URL, and a unique room name. These credentials are generated server-side for a single session handshake.
Return only the token and livekit_url to the browser. Keep the API key and any other backend-only secrets away from the client at all times.
Install livekit-client in your frontend app and join the room with the temporary session credentials. The same room delivers the agent's voice over an audio track and exchanges chat messages over the data channel — publish user messages on the voice.user_text topic and render agent replies received on voice.reply.
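The backend part of the setup flow above (create the session with the API key, then hand the browser only `token` and `livekit_url`) might look like the sketch below. `createBrowserSession`, `HttpPost`, and `BrowserSession` are hypothetical names introduced for illustration. The HTTP function is injected so the sketch does not assume a particular server framework; the global `fetch` in Node 18+ should satisfy the `HttpPost` shape.

```typescript
// Minimal HTTP types so the sketch does not depend on a specific runtime.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

type BrowserSession = { token: string; livekit_url: string };

const SESSION_URL = "https://api.oshara.ai/api/agents/agent-session/";

// Hypothetical backend helper: creates a session with the enterprise API
// and returns only the fields the browser needs.
async function createBrowserSession(
  apiKey: string, // read from your secrets store, never from the client
  body: Record<string, string> & { agent: string },
  post: HttpPost,
): Promise<BrowserSession> {
  const response = await post(SESSION_URL, {
    method: "POST",
    headers: { "x-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Session request failed with status ${response.status}`);
  }
  const payload = (await response.json()) as {
    data?: { token?: string; livekit_url?: string };
  };
  const token = payload.data?.token;
  const livekit_url = payload.data?.livekit_url;
  if (!token || !livekit_url) {
    throw new Error("Session response is missing token or livekit_url");
  }
  // room_name and any other fields stay on the server.
  return { token, livekit_url };
}
```

The frontend helper in this guide reads `payload.data`, so a route such as `/api/voice-agent/session` that wraps this function would respond with `{ data: session }`.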
Call the session endpoint from your backend service. Send the API key as a request header, and include the agent ID as the required JSON field.
POST https://api.oshara.ai/api/agents/agent-session/
x-api-key: sk_...
Content-Type: application/json

{
  "agent": "agent_id",
  "language": "en",
  "system_prompt": "...",
  "greeting": "...",
  "reference_audio_url": "..."
}

After the backend creates the session, return only the credentials needed by the frontend. The token authorizes the room connection, and the LiveKit URL tells the client where to connect.
{
  "success": true,
  "message": "Request successful",
  "data": {
    "token": "eyJ...",
    "livekit_url": "wss://audio-inference.oshara.ai",
    "room_name": "unique_room_id"
  }
}

Install `livekit-client` in the frontend app and connect using the credentials returned by your backend. The same room handles both the spoken audio track and a text-chat channel: publish user messages on the `voice.user_text` topic and listen for agent replies on `voice.reply`.
import { Room, RoomEvent, Track } from "livekit-client";

type AgentSession = {
  token: string;
  livekit_url: string;
  room_name?: string;
};

type ChatMessage = {
  role: "user" | "assistant";
  text: string;
};

// The agent echoes typed messages back on "voice.reply" with
// type: "user_text", the same shape it uses for STT transcripts of
// spoken input. We track texts the user just typed so we can render them
// optimistically and skip the matching echo when it arrives.
const pendingUserTextsByRoom = new WeakMap<Room, string[]>();

// Ask your own backend for a session. The API key never reaches the browser.
async function getAgentSession(agentId: string): Promise<AgentSession> {
  const response = await fetch("/api/voice-agent/session", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      agent: agentId,
      language: "en",
      system_prompt: "...",
      greeting: "...",
      reference_audio_url: "...",
    }),
  });
  if (!response.ok) {
    const message = await response.text();
    throw new Error(message || "Failed to create voice-agent session");
  }
  const payload = await response.json();
  return payload.data as AgentSession;
}

export async function startVoiceAgentCall(
  agentId: string,
  onMessage?: (message: ChatMessage) => void,
) {
  const room = new Room({
    adaptiveStream: true,
    dynacast: true,
  });
  pendingUserTextsByRoom.set(room, []);

  room.on(RoomEvent.Connected, () => {
    console.log("Connected to voice agent room");
  });

  room.on(RoomEvent.Disconnected, () => {
    pendingUserTextsByRoom.delete(room);
    console.log("Disconnected from room");
  });

  // Subscribe to the agent's spoken audio track.
  room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
    if (track.kind === Track.Kind.Audio) {
      const element = track.attach();
      element.autoplay = true;
      document.body.appendChild(element);
      console.log("Subscribed to audio track from", participant.identity);
    }
  });

  // Remove the audio element again when the track goes away, so repeated
  // calls do not leave stale elements in the DOM.
  room.on(RoomEvent.TrackUnsubscribed, (track) => {
    track.detach().forEach((element) => element.remove());
  });

  // Receive agent replies and STT transcripts on the "voice.reply" topic.
  room.on(RoomEvent.DataReceived, (payload, _participant, _kind, topic) => {
    if (topic !== "voice.reply") return;
    try {
      const data = JSON.parse(new TextDecoder().decode(payload));
      const text = (data.text || data.message || "").trim();
      if (!text) return;
      // Drop echoes of messages the user just typed (already rendered).
      if (data.type === "user_text") {
        const pending = pendingUserTextsByRoom.get(room);
        if (pending) {
          const index = pending.indexOf(text);
          if (index !== -1) {
            pending.splice(index, 1);
            return;
          }
        }
      }
      onMessage?.({
        role: data.type === "user_text" ? "user" : "assistant",
        text,
      });
    } catch {
      // Ignore unparseable data frames.
    }
  });

  try {
    const session = await getAgentSession(agentId);
    if (!session.token || !session.livekit_url) {
      throw new Error("Session response is missing token or livekit_url");
    }
    await room.connect(session.livekit_url, session.token, {
      autoSubscribe: true,
    });
    await room.localParticipant.setMicrophoneEnabled(true);
    return room;
  } catch (error) {
    // Clean up the half-open room before surfacing the error.
    pendingUserTextsByRoom.delete(room);
    await room.disconnect();
    throw error;
  }
}

// Send a user-typed message into the live agent session. Render the
// message in your UI immediately after calling this. The matching echo
// from the agent will be deduped by the DataReceived handler above.
export async function sendTextMessage(room: Room, text: string) {
  const trimmed = text.trim();
  if (!trimmed) return;
  pendingUserTextsByRoom.get(room)?.push(trimmed);
  await room.localParticipant.publishData(
    new TextEncoder().encode(
      JSON.stringify({
        type: "user_text",
        text: trimmed,
      }),
    ),
    {
      reliable: true,
      topic: "voice.user_text",
    },
  );
}

// Mute the microphone and leave the room. Safe to call with null.
export async function endVoiceAgentCall(room: Room | null) {
  if (!room) return;
  pendingUserTextsByRoom.delete(room);
  await room.localParticipant.setMicrophoneEnabled(false);
  await room.disconnect();
}

A drop-in React component that wires the helpers above to a small UI: a connection indicator, start/end buttons, a message list, and a text input. Drop it into any client route and pass an agentId.
"use client";
import { useEffect, useRef, useState } from "react";
import type { Room } from "livekit-client";
import {
  endVoiceAgentCall,
  sendTextMessage,
  startVoiceAgentCall,
} from "./voice-agent";

type ChatMessage = {
  role: "user" | "assistant";
  text: string;
};

export function CallView({ agentId }: { agentId: string }) {
  const roomRef = useRef<Room | null>(null);
  // Incremented on every start, end, and unmount. A connection attempt that
  // finishes after the counter has moved on is stale and gets torn down.
  const attemptRef = useRef(0);
  const [status, setStatus] = useState<"idle" | "connecting" | "live">("idle");
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [draft, setDraft] = useState("");

  useEffect(() => {
    return () => {
      attemptRef.current += 1;
      endVoiceAgentCall(roomRef.current);
      roomRef.current = null;
    };
  }, []);

  const handleStart = async () => {
    if (status !== "idle") return;
    const attempt = ++attemptRef.current;
    setStatus("connecting");
    try {
      const room = await startVoiceAgentCall(agentId, (message) => {
        setMessages((prev) => [...prev, message]);
      });
      if (attempt !== attemptRef.current) {
        // "End call" was pressed (or the component unmounted) while connecting.
        await endVoiceAgentCall(room);
        return;
      }
      roomRef.current = room;
      setStatus("live");
    } catch (error) {
      console.error(error);
      if (attempt === attemptRef.current) setStatus("idle");
    }
  };

  const handleEnd = async () => {
    attemptRef.current += 1;
    await endVoiceAgentCall(roomRef.current);
    roomRef.current = null;
    setStatus("idle");
  };

  const handleSend = async (event: React.FormEvent) => {
    event.preventDefault();
    const text = draft.trim();
    if (!text || !roomRef.current) return;
    // Render optimistically; the agent's echo is deduped in voice-agent.ts.
    setMessages((prev) => [...prev, { role: "user", text }]);
    setDraft("");
    await sendTextMessage(roomRef.current, text);
  };

  return (
    <div className="mx-auto flex h-[520px] w-full max-w-md flex-col rounded-2xl border border-slate-200 bg-white shadow-sm">
      <header className="flex items-center justify-between border-b border-slate-200 px-4 py-3">
        <div className="flex items-center gap-2">
          <span
            className={
              "h-2.5 w-2.5 rounded-full " +
              (status === "live"
                ? "bg-emerald-500"
                : status === "connecting"
                  ? "bg-amber-500"
                  : "bg-slate-300")
            }
          />
          <p className="text-sm font-medium text-slate-700">
            {status === "live"
              ? "Connected"
              : status === "connecting"
                ? "Connecting…"
                : "Not connected"}
          </p>
        </div>
        {status === "idle" ? (
          <button
            type="button"
            onClick={handleStart}
            className="rounded-full bg-orange-500 px-3 py-1.5 text-sm font-medium text-white"
          >
            Start call
          </button>
        ) : (
          <button
            type="button"
            onClick={handleEnd}
            className="rounded-full bg-slate-900 px-3 py-1.5 text-sm font-medium text-white"
          >
            End call
          </button>
        )}
      </header>
      <div className="flex-1 space-y-3 overflow-y-auto px-4 py-4">
        {messages.length === 0 ? (
          <p className="text-sm text-slate-400">
            Start the call and speak, or type a message below.
          </p>
        ) : (
          messages.map((message, index) => (
            <div
              key={index}
              className={
                "flex " +
                (message.role === "user" ? "justify-end" : "justify-start")
              }
            >
              <span
                className={
                  "max-w-[80%] rounded-2xl px-3 py-2 text-sm " +
                  (message.role === "user"
                    ? "bg-orange-500 text-white"
                    : "bg-slate-100 text-slate-800")
                }
              >
                {message.text}
              </span>
            </div>
          ))
        )}
      </div>
      <form
        onSubmit={handleSend}
        className="flex items-center gap-2 border-t border-slate-200 px-4 py-3"
      >
        <input
          value={draft}
          onChange={(event) => setDraft(event.target.value)}
          placeholder="Type a message"
          disabled={status !== "live"}
          className="flex-1 rounded-full border border-slate-200 bg-slate-50 px-4 py-2 text-sm outline-none focus:border-orange-300"
        />
        <button
          type="submit"
          disabled={status !== "live" || !draft.trim()}
          className="rounded-full bg-orange-500 px-4 py-2 text-sm font-medium text-white disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  );
}
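
If you want to unit-test the echo handling without a live room, the logic inside the `voice.reply` handler can be pulled out into a pure function. The sketch below is one possible shape, assuming the frame has already been decoded to a string; `classifyReply` is a hypothetical name and is not part of `livekit-client` or the helpers above.

```typescript
type ChatMessage = { role: "user" | "assistant"; text: string };

// Hypothetical pure version of the "voice.reply" handling.
// `raw` is the decoded data frame. `pending` holds texts the user typed
// that have not been echoed back yet; it is mutated when an echo is consumed.
// Returns the message to render, or null when nothing should be shown.
function classifyReply(raw: string, pending: string[]): ChatMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // unparseable frame
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const data = parsed as { type?: unknown; text?: unknown; message?: unknown };

  const rawText = data.text || data.message;
  const text = typeof rawText === "string" ? rawText.trim() : "";
  if (!text) return null;

  const isUser = data.type === "user_text";
  if (isUser) {
    const index = pending.indexOf(text);
    if (index !== -1) {
      pending.splice(index, 1); // echo of a typed message, already rendered
      return null;
    }
  }
  return { role: isUser ? "user" : "assistant", text };
}
```

A typed message that is still in `pending` is swallowed once, while a `user_text` frame with no pending match (for example an STT transcript of spoken input) is returned with the `user` role.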