Advanced AI Customization
Envoy's core intelligence for drafting cover letters, evaluating job fit, and conducting profile interviews is powered by large language models (LLMs). While Envoy comes with sensible defaults, advanced users can customize these AI components to suit their specific needs, integrate different LLM providers, or experiment with various models and parameters.
This section outlines how to configure Envoy's AI behavior.
Configuring LLM Providers and Models
Envoy is designed to be compatible with OpenAI's API interface, which means you can use OpenAI's official models, self-hosted solutions, or other services that mimic the OpenAI API.
The following environment variables control which LLM provider and model Envoy uses:
- OPENAI_BASE_URL: The base URL for the OpenAI-compatible API endpoint.
- OPENAI_API_KEY: The API key required to authenticate with your chosen LLM service.
- OPENAI_MODEL: The specific model identifier to use (e.g., gpt-4o-mini, ollama/llama3).
You can set these environment variables in your shell before starting Envoy, or in a .env file in the Envoy agent directory.
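For example, a .env file in the Envoy agent directory might look like the following (placeholder values; substitute the details for your own provider):

OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=sk-YOUR_OPENAI_API_KEY
OPENAI_MODEL=gpt-4o-mini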
Example: Using OpenAI's GPT-4o-mini
To use OpenAI's cost-efficient gpt-4o-mini model:
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="sk-YOUR_OPENAI_API_KEY"
export OPENAI_MODEL="gpt-4o-mini"
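Before starting Envoy, you can confirm that the endpoint, key, and model work together. The following stand-alone sanity check is not part of Envoy; it assumes the openai Python package (the same v1-style client interface the agent code shown later uses) and the environment variables set above:

import os
from openai import OpenAI

# Read the same variables Envoy uses, then issue a one-off test completion.
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)
response = client.chat.completions.create(
    model=os.environ["OPENAI_MODEL"],
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
    max_tokens=5,
)
print(response.choices[0].message.content)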
Example: Using a Self-Hosted LLM (e.g., Ollama, LM Studio)
Many local LLM runners (like Ollama or LM Studio) provide an OpenAI-compatible API endpoint. This allows you to run powerful models locally without incurring API costs or sending your data externally.
To use a local Ollama instance running llama3:
- Ensure Ollama is running and has the llama3 model downloaded (ollama pull llama3).
- Start Envoy with the following environment variables:

export OPENAI_BASE_URL="http://localhost:11434/v1"  # Default Ollama API endpoint
export OPENAI_API_KEY="ollama"                      # An API key is often not required for local setups, but some clients expect a value; 'ollama' is a common placeholder.
export OPENAI_MODEL="ollama/llama3"                 # Format depends on your local LLM setup. For Ollama, it's typically 'ollama/<model_name>'.
To use LM Studio with a model like Meta-Llama-3-8B-Instruct-GGUF (replace with your chosen model):
- Ensure LM Studio is running, you've downloaded your desired model, and the local server is started.
- Start Envoy with the following environment variables:

export OPENAI_BASE_URL="http://localhost:1234/v1"  # Default LM Studio API endpoint
export OPENAI_API_KEY="lm-studio"                  # Placeholder API key
export OPENAI_MODEL="lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF"  # Or whatever model name LM Studio exposes
Adjusting LLM Generation Parameters
Envoy uses parameters like temperature and max_tokens when making API calls to the LLM. These parameters influence the creativity, verbosity, and cost of the generated text.
- temperature: Controls the randomness of the output. Higher values (e.g., 0.7-1.0) make the output more varied and creative, while lower values (e.g., 0.1-0.5) make it more focused and deterministic.
- max_tokens: The maximum number of tokens (words/sub-words) the LLM will generate in its response.
Current Implementation Note: As of the current version, temperature and max_tokens are hardcoded within the agent/app/services/profile_interview_ai.py file (and similar AI service files).
For example, in plan_profile_question:
response = await client.chat.completions.create(
model=settings.openai_model,
messages=[
{"role": "system", "content": system},
{"role": "user", "content": user},
],
temperature=0.2, # Hardcoded temperature
max_tokens=180, # Hardcoded max_tokens
)
To adjust these parameters:
- Locate the relevant AI service files (e.g., agent/app/services/profile_interview_ai.py).
- Modify the temperature and max_tokens values directly in the client.chat.completions.create calls.
- Restart the Envoy agent service for changes to take effect.
Caution: Modifying these values directly in the source code requires a basic understanding of Python and Envoy's agent structure. Ensure you back up any changes and consider the potential impact on generation quality and performance.
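If you would rather not edit several call sites by hand, one possible pattern is to read overrides from the environment, falling back to the currently hardcoded values. This is only a sketch: the ENVOY_TEMPERATURE and ENVOY_MAX_TOKENS variable names and the helper functions below are hypothetical, not part of Envoy:

import os

# Hypothetical helpers (not part of Envoy): read overrides from the
# environment, falling back to the values currently hardcoded in the service.
def llm_temperature(default: float = 0.2) -> float:
    return float(os.environ.get("ENVOY_TEMPERATURE", default))

def llm_max_tokens(default: int = 180) -> int:
    return int(os.environ.get("ENVOY_MAX_TOKENS", default))

Inside plan_profile_question you would then pass temperature=llm_temperature() and max_tokens=llm_max_tokens() to client.chat.completions.create.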
Customizing AI Prompts (Advanced)
The "personality" and specific instructions given to the LLM are defined by its system and user prompts. These prompts guide the LLM on how to interpret requests and format its responses for tasks like planning interview questions or interpreting answers.
For example, the plan_profile_question function uses the following system prompt to define the LLM's role:
system = (
"You are a professional resume writer interviewing a candidate. "
"Ask exactly one concise follow-up question that will best improve the missing STAR signal. "
"Also provide a cautious example answer based only on the evidence already present, "
"plus 1-3 short source-basis bullets and one practical coaching hint. "
"Do not ask multiple questions. Do not invent facts or metrics. "
"Return ONLY valid JSON with this shape: "
"{\"question\": \"...\", \"suggested_answer\": \"...\", "
"\"source_basis\": [\"...\"], \"improvement_hint\": \"...\"}."
)
Customization: Changing these prompts allows for a high degree of control over Envoy's AI behavior. You could:
- Modify the persona (e.g., "a direct recruiter" instead of "a resume writer"); see the sketch after this list.
- Adjust the output format or content requirements.
- Guide the LLM towards specific types of questions or interpretations.
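For instance, swapping only the persona while keeping the single-question rule and the JSON contract intact might look like this (an illustrative variant, not the shipped prompt):

system = (
    "You are a direct, detail-oriented recruiter interviewing a candidate. "  # only the persona line changes
    "Ask exactly one concise follow-up question that will best improve the missing STAR signal. "
    "Also provide a cautious example answer based only on the evidence already present, "
    "plus 1-3 short source-basis bullets and one practical coaching hint. "
    "Do not ask multiple questions. Do not invent facts or metrics. "
    "Return ONLY valid JSON with this shape: "
    "{\"question\": \"...\", \"suggested_answer\": \"...\", "
    "\"source_basis\": [\"...\"], \"improvement_hint\": \"...\"}."
)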
Implementation: To customize prompts:
- Identify the specific AI service function responsible for the behavior you want to change (e.g., plan_profile_question, interpret_profile_answer in agent/app/services/profile_interview_ai.py).
- Edit the system and user prompt strings directly in the source code.
- Restart the Envoy agent service.
Considerations: Prompt engineering is an iterative process. Small changes can have significant impacts. Ensure your custom prompts are clear, concise, and guide the LLM effectively to achieve your desired outcome while maintaining the required JSON output structure.
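Because these prompts instruct the model to return only JSON, it is worth verifying that a customized prompt still produces parseable output with the expected keys. A minimal stand-alone check (a hypothetical helper, not Envoy code) might look like:

import json

REQUIRED_KEYS = {"question", "suggested_answer", "source_basis", "improvement_hint"}

def validate_reply(raw: str) -> dict:
    """Parse an LLM reply and confirm the keys the prompt asks for are present."""
    data = json.loads(raw)  # raises json.JSONDecodeError if the reply is not valid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"LLM reply missing keys: {sorted(missing)}")
    return data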