
LlamaIndex – A Design Pattern Using the Chat Method of the OpenAI Class (Part 1)

Working with Large Language Models (LLMs) as a developer can be challenging due to their complexity and the broad scope of their capabilities. There are numerous ways to interact with them, each with its own nuances and potential pitfalls. In this article, we demonstrate code that implements the "chat" functionality fundamental to how models like ChatGPT operate. The basic process involves providing the LLM with a system instruction, which sets the context for the interaction. You then send a user message, which the LLM processes to generate a response. The exchange can continue with follow-up messages, creating a dynamic conversation. This interaction pattern showcases the conversational abilities of LLMs, making them ideal for virtual assistants, customer-service bots, and other interactive applications that require nuanced, human-like interactions.
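To make the turn structure concrete, here is an illustrative sketch of a chat transcript as a list of role-tagged messages, which is essentially the shape that chat APIs such as OpenAI's consume under the hood (the message contents are invented for illustration):

# Illustrative only: a chat transcript is an ordered list of role-tagged
# messages. The "system" turn sets context, "user" turns carry queries,
# and "assistant" turns record the model's replies.
conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Tell me the best day to visit Paris."},
    {"role": "assistant", "content": "Late spring, for mild weather and thinner crowds."},
    {"role": "user", "content": "Summarize that in 30 words."},
]

LlamaIndex wraps each of these turns in a ChatMessage object, as the code later in this article shows.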

LLM programming intro series:

LlamaIndex is a popular open-source library that makes it easy to integrate LLMs using standardized design patterns and access models, across a wide range of LLMs.

Series

Focus: In this series of articles, we will demonstrate the programming models supported by LlamaIndex. It is meant for technical software architects, developers, and LLMOps engineers, as well as technical enthusiasts. We provide actual working code that you can copy and use as you please.

Links

Link                                     Purpose
https://www.llamaindex.ai/               Main website
https://docs.llamaindex.ai/en/stable/    Documentation website


Objective of this code sample

In this code sample, we create a Python script that interacts with LlamaIndex's OpenAI module. The script begins by checking whether an OpenAI API key is present in the environment. If one is found, it initializes two instances of the OpenAI client, one for the gpt-3.5-turbo model and another for gpt-4, with the API key set for both. Each client is used to send a series of chat messages predefined in the script, which simulate a conversation asking for travel advice about Paris. The messages include an initial system instruction and a user query. After sending these messages, the script captures and processes the responses from both GPT models. It makes these API calls synchronously, collecting responses and appending follow-up messages that prompt the models to summarize the travel advice. Finally, the script measures and prints the elapsed time for these operations alongside the models' responses, demonstrating the performance and interaction capabilities of different GPT models within the LlamaIndex framework.

Learning objectives


1. Get introduced to LlamaIndex as a programming model for interacting with LLMs.
2. Try out a simple design pattern.
3. Get set up for more advanced concepts in future articles.

We begin by importing the required packages.

				
# Note 1: make sure to pip install llama-index (or llama-index-core plus
#         llama-index-llms-openai) before proceeding.
# Note 2: make sure your OpenAI API key is set as an environment variable.

# Import required standard packages
import os
import sys
import time

# Import required LlamaIndex subpackages
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage


A helper function checks whether the key is present as an environment variable:

				
# Helper function
def check_key() -> bool:
    # Check for the OpenAI API key in the environment.
    # Setting the key in the environment is the best way to keep
    # llama_index from throwing an exception.
    if "OPENAI_API_KEY" in os.environ:
        print("\nOPENAI_API_KEY detected in env")
        return True
    else:
        return False


Use the helper function check_key() to verify that an API key is set. If you do not have one, you can create one here: https://platform.openai.com/api-keys
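Before running the script, export the key in your shell (for example, export OPENAI_API_KEY="sk-..." in a bash-like shell; the value shown is a placeholder, not a real key). As a fallback, you can also set it from Python for the current process only; a minimal sketch:

# Optional fallback: set the key for this process only. The value is a
# hypothetical placeholder; prefer exporting OPENAI_API_KEY in your shell
# rather than hard-coding a real key in source.
import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, not a real key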

				
					
def main():

    if check_key():
        # The OpenAI constructor also picks up OPENAI_API_KEY from the
        # environment; we assign it explicitly here for clarity.
        openai_client_gpt_3_5_turbo = OpenAI(model="gpt-3.5-turbo")
        openai_client_gpt_3_5_turbo.api_key = os.environ["OPENAI_API_KEY"]

        openai_client_gpt_4 = OpenAI(model="gpt-4")
        openai_client_gpt_4.api_key = os.environ["OPENAI_API_KEY"]
    else:
        print("OPENAI_API_KEY not in env")
        sys.exit(1)  # Exit if no API key is found

    # Define the GPT-3.5 chat messages to initiate the chat
    messages_3_5 = [
        ChatMessage(role="system", content="You are a helpful AI assistant."),
        ChatMessage(role="user", content="Tell me the best day to visit Paris. Then, elaborate.")
    ]

    # Define the GPT-4 chat messages to initiate the chat
    messages_4 = [
        ChatMessage(role="system", content="You are a helpful AI assistant."),
        ChatMessage(role="user", content="Tell me the best day to visit Paris. Then, elaborate.")
    ]

Here we use the synchronous method of calling the LlamaIndex APIs. Each call blocks (i.e., waits) until all operations encapsulated in that line of code complete.

				
    # Timed section
    # Get the current time
    start_time = time.time()

    # Synchronously (blocking) call gpt-3.5-turbo, append its answer to the
    # history as an assistant turn, then ask it to summarize
    response_3_5 = openai_client_gpt_3_5_turbo.chat(messages_3_5)
    messages_3_5.append(ChatMessage(role="assistant", content=response_3_5.message.content))
    next_prompt_3_5 = ChatMessage(role="user", content="Summarize it in 30 words.")
    messages_3_5.append(next_prompt_3_5)
    response_3_5 = openai_client_gpt_3_5_turbo.chat(messages_3_5)

    # Do the same, synchronously, with gpt-4
    response_4 = openai_client_gpt_4.chat(messages_4)
    messages_4.append(ChatMessage(role="assistant", content=response_4.message.content))
    next_prompt_4 = ChatMessage(role="user", content="Summarize it in 30 words.")
    messages_4.append(next_prompt_4)
    response_4 = openai_client_gpt_4.chat(messages_4)

    # Get the end time
    end_time = time.time()

In this final step, we print the responses from the LLMs, along with the time the run took.

				
					
    # Print the responses with labels
    print("\nResponse from GPT-3.5-turbo:")
    print(response_3_5.message.content)

    print("\nResponse from GPT-4:")
    print(response_4.message.content)

    # Calculate the elapsed time in seconds
    elapsed_time = end_time - start_time

    # Format the elapsed time to two decimal places and print it
    formatted_time = "{:.2f}".format(elapsed_time)
    print(f"Elapsed time: {formatted_time} seconds")


if __name__ == "__main__":
    main()

We are working on a Jupyter notebook version of this code, along with an easy way for you to get it from GitHub. We will also be publishing a YouTube video series that goes over this material and more. Please stay tuned.
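In the meantime, you can paste the code blocks above into a single file (for example li_chat_demo.py; the name is arbitrary) and run it with python li_chat_demo.py. Sample output from one of our runs follows.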

				
# Sample output
#
# OPENAI_API_KEY detected in env
#
# Response from GPT-3.5-turbo:
# The best time to visit Paris is in spring or fall for mild weather, fewer crowds, blooming flowers, vibrant foliage, cultural events, and charming ambiance in the city.
#
# Response from GPT-4:
# The best time to visit Paris is spring (April-June) or fall (September-November) for pleasant weather and smaller crowds. Spring offers blooming parks, while fall hosts cultural events and beautiful autumn colors.
#
# Elapsed time: 22.70 seconds
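One pointer for what lies ahead in this series: the chat() calls above block until each model finishes, so the two conversations run back to back. LlamaIndex LLMs also expose asynchronous counterparts such as achat(), which let you overlap the calls. A minimal, self-contained sketch under those assumptions (illustrative, not a drop-in replacement for the script above):

# Minimal async sketch: achat() is the awaitable counterpart of chat(),
# and asyncio.gather() runs both model calls concurrently.
import asyncio

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

async def chat_concurrently():
    llm_3_5 = OpenAI(model="gpt-3.5-turbo")
    llm_4 = OpenAI(model="gpt-4")
    messages = [
        ChatMessage(role="system", content="You are a helpful AI assistant."),
        ChatMessage(role="user", content="Tell me the best day to visit Paris."),
    ]
    # Both requests are in flight at the same time
    response_3_5, response_4 = await asyncio.gather(
        llm_3_5.achat(messages), llm_4.achat(messages)
    )
    print(response_3_5.message.content)
    print(response_4.message.content)

# asyncio.run(chat_concurrently())  # requires OPENAI_API_KEY in the environment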
