TL;DR: This post walks through building a personalized AI assistant that helps coaches track and summarize client progress using OpenAI and Pinecone. We'll cover why this setup is useful, installation steps, and a working code implementation.
Table of Contents:
- Why Use Pinecone and OpenAI?
- The Concept
- Installation
- Implementation Overview
- 1. Setting Up Pinecone and OpenAI
- 2. Chunking and Saving Client Memories
- 3. Querying Memories and Generating a Response
- 4. Creating the API Route
- Final Thoughts
Why Use Pinecone and OpenAI?
By combining a vector database like Pinecone with a powerful LLM like OpenAI’s GPT-4o-mini, we can build an intelligent system that remembers everything a coach’s client has shared—then uses that memory to generate insightful, personalized summaries and updates.
Instead of dumping a massive context into every AI request, we vectorize client notes, transcripts, and summaries, then selectively query only the relevant memories when needed.
This keeps the AI’s input concise, focused, and affordable to run.
The Concept
- Store client data (like coaching notes, session transcripts, action steps) as vector embeddings inside Pinecone.
- Query Pinecone dynamically for the most relevant memories.
- Generate AI responses using OpenAI by feeding in the retrieved memories.
- Wrap it all in a simple API endpoint the coach can hit from the frontend.
This approach ensures the AI stays grounded in true client history rather than making up information.
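Here is the shape of that loop as a sketch. The helper names mirror what we build in the sections below, so treat this as a preview rather than a finished implementation:

// Sketch only: queryClientMemories and openai are defined in the sections below
async function summarizeClient(clientId: string, question: string) {
  // Steps 1-2: pull only the memories relevant to this question
  const memories = await queryClientMemories(clientId, question);
  // Step 3: feed them to the model as grounding context
  return openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      { role: 'system', content: memories.map((m) => JSON.stringify(m.fields)).join('\n') },
      { role: 'user', content: question },
    ],
  });
}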
Installation
You'll need a few key dependencies:
yarn add openai @pinecone-database/pinecone
Note: I used a Node.js server with Next.js for my backend. If you use a different stack, check the Pinecone Docs for setup instructions specific to your framework/language.
Also make sure you have environment variables for:
- OPENAI_API_KEY
- PINECONE_API_KEY
- PINECONE_HOST
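In a Next.js app these live in .env.local. The values below are placeholders; copy the real ones from the OpenAI and Pinecone dashboards (PINECONE_HOST is your index's host URL):

OPENAI_API_KEY=sk-...
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_HOST=https://your-index-host.svc.pinecone.io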
Implementation Overview
Let’s walk through the high-level pieces needed to make this work.
1. Setting Up Pinecone and OpenAI
First, create reusable clients to interact with OpenAI and Pinecone:
import OpenAI from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';
// Reusable singletons: import these wherever you need model or vector access
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
You can now use openai to create completions and pinecone to upsert/query vectors.
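If you want to confirm the wiring before going further, a quick smoke test (assuming the environment variables above are set) can hit both services:

// Smoke test: confirm both clients are wired up
async function smokeTest() {
  const indexList = await pinecone.listIndexes();
  console.log(indexList.indexes?.map((i) => i.name)); // should include "clients"

  const chat = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Reply with the word "ready".' }],
  });
  console.log(chat.choices[0]?.message?.content);
}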
2. Chunking and Saving Client Memories
To keep memory sizes manageable, I chunk text into smaller pieces before inserting them into Pinecone:
export function chunkTextForE5(
  text: string,
  maxChars: number = 1800,
  overlapChars: number = 300
): string[] {
  if (!text.trim()) return [];

  const chunks: string[] = [];
  let start = 0;

  // Slide a window across the text; each step advances by
  // (maxChars - overlapChars), so consecutive chunks share context.
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    start += maxChars - overlapChars;
  }

  return chunks;
}
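For example, with the defaults the window advances 1,500 characters per step, so a roughly 5,100-character transcript produces four overlapping chunks:

// Example: chunking a long transcript (dummy text, ~5,100 chars)
const transcript = 'Session notes... '.repeat(300);
const chunks = chunkTextForE5(transcript);
console.log(chunks.length); // 4; each chunk shares 300 chars with its neighbor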
Then upsert those chunks into Pinecone using a namespace based on the client’s ID:
export async function upsertClientMemory(
  clientId: string,
  inputId: string,
  textChunks: string[],
  userId: string,
  coachId: string,
  createdAt: Date,
  type: 'transcript' | 'notes' | 'summary' | 'actionSteps'
) {
  // Namespacing by client ID keeps each client's memories isolated
  const namespace = pinecone
    .index('clients', process.env.PINECONE_HOST!)
    .namespace(clientId);
  const now = new Date().toISOString();

  await namespace.upsertRecords(
    textChunks.map((text, index) => ({
      // Deterministic IDs let re-upserts overwrite stale chunks
      _id: `session-${inputId}-${type}${index > 0 ? `-${index}` : ''}`,
      text, // the field Pinecone embeds; must match your index's field map
      clientId,
      type,
      createdAt: createdAt ? createdAt.toISOString() : now,
      updatedAt: now,
      userId,
      coachId,
    }))
  );
}
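One thing to flag: upsertRecords relies on the index having integrated embedding configured (e.g., a hosted E5 model), so Pinecone embeds the text field server-side. Saving a new session transcript then looks something like this (the IDs here are made up):

// Example: store a session transcript for a client (hypothetical IDs)
const sessionTranscript = 'Client reported progress on their morning routine...';
const chunks = chunkTextForE5(sessionTranscript);

await upsertClientMemory(
  'client_123',   // clientId (also used as the Pinecone namespace)
  'session_456',  // inputId (keys the chunk IDs for this source document)
  chunks,
  'user_789',
  'coach_001',
  new Date(),
  'transcript'
);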
3. Querying Memories and Generating a Response
When the coach wants a summary, we query the most relevant memories:
export async function queryClientMemories(clientId: string, query: string) {
  const namespace = pinecone
    .index('clients', process.env.PINECONE_HOST!)
    .namespace(clientId);

  // Pinecone embeds the query text and returns the closest records.
  // Note: the requested fields must match what we upserted above ('text', 'type').
  const response = await namespace.searchRecords({
    query: { topK: 2, inputs: { text: query } },
    fields: ['text', 'type'],
  });

  return response?.result?.hits || [];
}
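Each hit comes back with an _id, a similarity _score, and the fields we asked for, so inspecting results looks like:

// Example: inspect the hits returned for a client question
const hits = await queryClientMemories('client_123', 'What are their current goals?');
for (const hit of hits) {
  console.log(hit._id, hit._score); // record id and similarity score
  console.log(hit.fields);          // { text, type } per the fields list above
}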
Then use those memories in an OpenAI chat completion:
export async function generateResponse(clientId: string, query: string) {
  const memories = await queryClientMemories(clientId, query);

  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    store: true,
    messages: [
      {
        // Ground the model in the retrieved memories before the question
        role: 'system',
        content: `Use these client memories as context:\n${memories
          .map((memory) => JSON.stringify(memory.fields))
          .join('\n')}`,
      },
      { role: 'user', content: query },
    ],
  });

  return completion;
}
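A caller that just wants the text can pull it off the first choice:

// Example: surface only the generated text
const completion = await generateResponse('client_123', 'How is this client progressing?');
console.log(completion.choices[0]?.message?.content);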
4. Creating the API Route
Here’s a simple pages/api/coach/clients/[id]/ai.ts file in Next.js that puts it all together:
import { queryClientMemories } from '@/lib/ai/ai.util';
import OpenAI from 'openai';
import { prisma } from '@/lib/prisma';
import { CoachMiddleware } from '@/lib/middleware/coach';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
async function handler(req, res, session) {
  if (req.method !== 'POST') {
    return res.status(405).json({ message: 'Method Not Allowed' });
  }

  const { id } = req.query;
  if (typeof id !== 'string') {
    return res.status(400).json({ message: 'Missing client id' });
  }
  if (!session?.user?.id) {
    return res.status(401).json({ message: 'Missing session' });
  }

  // Scope the lookup to this coach so one coach can't read another's clients
  const client = await prisma.client.findUnique({
    where: { id, coachId: session.user.id },
    include: { client: { select: { name: true } } },
  });
  if (!client) {
    return res.status(404).json({ message: 'Client not found' });
  }

  const coach = await prisma.user.findUnique({ where: { id: session.user.id } });

  // Pull the memories most relevant to a general progress summary
  const memories = await queryClientMemories(id, 'summary of the client goals challenges progress');

  const messages = [
    {
      role: 'system',
      content: 'You are a helpful coaching assistant. Use the following memories to answer questions about a client.',
    },
    ...memories.map((memory) => ({
      role: 'system',
      content: `Memory:\n${JSON.stringify(memory.fields)}`,
    })),
    {
      role: 'user',
      content: `
Coach: ${coach?.name || 'Coach'}
Client: ${client.client?.name || 'Unnamed Client'}
Client Details: ${client.notes || 'No extra notes available.'}

Please give me a summary of the client. Include their goals, challenges, and progress. Explain what they are currently working on.
      `.trim(),
    },
  ];

  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    temperature: 0,
    store: true,
    messages,
  });

  return res.status(200).json({ response: completion.choices[0]?.message?.content ?? '' });
}
}
export default CoachMiddleware(handler);
This endpoint lets the frontend simply POST to /api/coach/clients/:id/ai and get back an AI-generated coaching summary!
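For example, from a React component (the client id here is illustrative):

// Example frontend call (hypothetical client id)
const res = await fetch('/api/coach/clients/client_123/ai', { method: 'POST' });
const { response } = await res.json();
console.log(response); // the AI-generated summary text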
Final Thoughts
Using Pinecone and OpenAI together creates a memory-enhanced AI experience that stays personalized to each individual client. It's fast, cost-efficient, and sharply reduces the hallucinations that occur when the model lacks grounding context.
This setup is also highly extendable—next I'll use it to track client habits, action steps, session insights, and more.
You can use this setup in any application to create "trained" AIs tailored to your specific platform and use case! Let me know in the comments below if you found this helpful.