code
February 4, 2023

Make GPT-3 work for you

Little demo of final results (with slow frame rate). GPT-3 generate a prompt itself and final SQL query.

Hi! I have never written articles, this will be my first time.

At my job, we use PostgreSQL for data storage and often need to write many small, simple queries for the analytics department. I wanted to try GPT-3 and see how it could help with this task. However, using the GPT-3 Playground is inconvenient for this task because you need to manually describe the structure of all the tables first.

So, I decided to write a small script that would convert the database structure into a text query that could be used with instructions to create real SQL queries. I chose to use Python for this project because I wanted to try a new programming language.

I figured this would be a good way to try out a new programming language and try to write something useful.

Here was the plan for solving the problem:

  1. Connect to the database and retrieve the table structure with fields and field comments
  2. Generate a text query based on the retrieved structure
  3. Wait for the user to enter a request
  4. Ask GPT-3 to generate a real SQL query by passing it the text query with the database structure and the user’s request
  5. Allow the user to edit the received query and check it for errors (using https://sqlformat.org/)
  6. Run the query and display the result (with a minimal UI and the ability to export data for further analysis)

Writing in new languages thanks to Copilot (it based on GPT-3 Codex model) is a great experience, and I highly recommend trying it with any programming language you are new to.

For those who are unfamiliar with GPT-3: it is the most advanced neural network based on the GPT (Generative Pre-trained Transformer) principle. It can generate text, perform natural language processing tasks, solve image processing tasks, and more.

Andréj Karpathy talks about how GPT-3 works in this video: https://www.youtube.com/watch?v=kCc8FmEb1nY.

Also, it is worth mentioning that this is the same network that powers ChatGPT (https://chat.openai.com).

To connect to the database and retrieve the table structure, I used the psycopg2 library, which was recommended to me by GPT-3. Before we get started, let’s install all the necessary dependencies for the project:

pip install psycopg2 openai

Now, start with Schema class:

class Schema:
	"""Generate SQL Schema from PostgreSQL"""

	def __init__(self, schema = 'public'):
		"""Connect to PostgreSQL database"""
		self.schema = schema
		try:
			self.conn = psycopg2.connect(DATABASE_URL)
		except psycopg2.OperationalError as err:
			print(f'Unable to connect!\n{err}')
			sys.exit(1)
		else:
			print('Connected to PostgreSQL database successfully.')
		self.cur = self.conn.cursor()
		self.comments = []
		self.tables = []
		self.columns = []

Next, we retrieve the list of tables and their comments. Since comments are stored in a separate location in PostgreSQL, we need to execute an additional query to get all comments for all tables:

def get_tables(self):
	"""Get list of tables"""
	self.cur.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = %s", (self.schema,))
	tables = self.cur.fetchall()
	self.tables = tables
	return tables

def get_all_comments(self):
	"""Get list of all comments"""
	self.cur.execute('select c.table_schema, c.table_name,  c.column_name, pgd.description from pg_catalog.pg_statio_all_tables as st inner join pg_catalog.pg_description pgd on (pgd.objoid = st.relid) inner join information_schema.columns c on (pgd.objsubid   = c.ordinal_position and c.table_schema = st.schemaname and c.table_name   = st.relname);')
	comments = self.cur.fetchall()
	self.comments = comments
	return comments

	

Next, we obtain a list of fields and their data types for each table.

def get_columns(self, table):
	"""Get list of columns for a table"""
	self.cur.execute("SELECT column_name, data_type FROM information_schema.columns WHERE table_schema = %s AND table_name = %s", (self.schema, table))
	columns = self.cur.fetchall()
	return columnsWe will generate the full structure of the base. In my repository on GitHub you can find the generation for the tables you need, but in this example I will simply generate the structure for all tables.

The less extra tables are used in generation, the easier it is to get the correct result and the cheaper the request to the OpenAI API will cost.

We compile all the information into a single string to create the text structure of the database:

def index(self):
	"""Generate SQL Schema"""
	prompt = ''
	tables = self.get_tables()
	comments = self.get_all_comments()
	for table in tables:
		columns = self.get_columns(table[0])
		prompt += f'The "{table[0]}" table has columns: '
		for column in columns:
			cmnt = ''
			for comment in comments:
				if comment[0] == self.schema and comment[1] == table[0] and comment[2] == column[0]:
					cmnt = comment[3]
					break
			if cmnt == '':
				prompt += f'{column[0]} ({column[1]}), '
			else:
				prompt += f'{column[0]} ({column[1]} - {cmnt}), '
		prompt = prompt[:-2] + '. '    
	return prompt

We now have the text structure of the database. At this point, I created a simple UI using Vue and Bootstrap for ease of use with the API. You can see what it looks like in the first GIF at the beginning of the article.

The code for the UI is available in the GitHub repository.

For demonstration purposes, I will be using a simple text input, but you can use any other convenient method. We save the resulting class code in the schema.py file and then import it in the main.py file.

Remember to obtain an OpenAI API key to access the API. You can get one at: https://platform.openai.com/account/api-keys.

import openai
from schema import Schema
openai.api_key = 'sk-......'
prompt = input('Enter prompt: ')

Now we need to properly address GPT-3 and tell what we expect from it. Here you need to connect your creativity in making a request. I was inspired by some requests I found during the implementation on the seekwell.io blog.

My request ends up looking like this:

Given an input question, respond with syntactically correct PostgreSQL. Be creative but the SQL must be correct, not nessesary to use all tables. {sql_schema}\n\nInstructions: {prompt}\n\nSQL:

Where is:

  1. {sql_schema} — the base text structure we got earlier
  2. {prompt} — the user’s input from CLI

For example, we get the following request:

Given an input question, respond with syntactically correct PostgreSQL. Be creative but the SQL must be correct, not nessesary to use all tables. The “public”.”users” table has columns: id (integer — user id), name (text — user name), email (text — user email). The “public”.”posts” table has columns: id (integer — post id), title (text — post title), body (text — post body), user_id (integer — user id). The “public”.”comments” table has columns: id (integer — comment id), body (text — comment body), post_id (integer — post id), user_id (integer — user id). Instructions: Give me all posts by user where email hosted on gmail.com SQL:

As a result from GPT-3, we can get the following answer (the example was generated by copilot at the time of writing the article, this is mind-blowing!):

SELECT * FROM posts WHERE user_id IN (SELECT id FROM users WHERE email LIKE ‘%@gmail.com’);

OK. How to do it in code!

First, we need to set the correct set of parameters for the request in the GPT-3 API. We have these important parameters to pass into query:

  1. query_temperture - is the temperature that determines how much the model will experiment with the responses. The higher the temperature, the more experiments the model will do. The lower the temperature, the more specific the answers will be. I recommend using 0.5–0.7
  2. max_tokens is the maximum number of tokens that the model can generate. I recommend using 100–150 if you don’t expect a large SQL query to be returned.
  3. engine is the model itself that we want to use. I recommend using davinci, but you can try other models like ada or babbage. The most recent version at the time of writing is text-davinci-003. Different models have different response quality and price, so I recommend using davinci if you have the opportunity.
  4. prompt is what we would like to receive in response. In our case, this is the question we ask the user.
  5. stop — gpt-3 will stop generating text when it encounters one of the stop tokens. In our case, we will use the \n\n token, which means that the model will stop generating text when it encounters two new lines.

Let’s now make a request to the GPT-3 API and get a response:

query_temperture = 0.5
final_prompt = f'Given an input question, respond with syntactically correct PostgreSQL. Be creative but the SQL must be correct, not nessesary to use all tables.\n\n{sql_schema}\n\nInstructions: {prompt}\n\nSQL:\n'

gpt_response = openai.Completion.create(
	engine="text-davinci-003",
	prompt=final_prompt,
	temperature=float(query_temperture),
	max_tokens=500,
	stop=["\n\n"]
)

print(f'GPT-3 response: {gpt_response["choices"][0]["text"]}')

You can find the full implementation here: https://github.com/Hormold/gpt-sql-box/blob/master/cli.py

That’s all! Now we can get a response from the GPT-3 API and send it to PostgreSQL for execution.

Be careful because the GPT-3 API may return invalid SQL query, which can lead to data loss — carefully check the model responses before executing the query.

We can easily change the database to a SQL like database such as MySQL or MariaDB by simply changing the way the database schema is built and making small changes to the final_prompt.

For the convenient use of this code, I made a small project on GitHub with minimal functionality and interface for easy use: https://github.com/Hormold/gpt-sql-box

Here is Midjourney illustration for this small project

If you want to use the GPT-3 API in your projects, then you will need to register at https://platform.openai.com and get an API key.

I hope you enjoyed the article and will be able to use the GPT-3 API in your next projects.

If you have questions, then write to me on Telegram @define and I will try to help you.