Docker with postgres and pgvector extension
There is an official image for this, and it’s the better option, but building my own was a good challenge for my limited Docker ability, and it would be cool to edit the Laravel Sail config to do this at a later point.
Obviously, if you need this in a production environment, find someone who actually knows what they’re doing, or use Supabase, which offers Postgres with a vector embedding option.
This image is for Postgres and pgvector. The idea is that a single docker compose up command spins up a Docker container with no manual commands to run against the database afterwards, so the vector field can be used right away.
Oh, and vector fields are for storing embeddings, which use some math magic that is beyond my mental capacities to measure the similarity between two pieces of text. Embeddings are generated by an embedding model, like ada-002 in the case of OpenAI.
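To make the "math magic" a little less magical, here is a minimal Python sketch of cosine similarity, one common way to compare two embeddings. The three-number vectors are made up for illustration (real ada-002 embeddings have 1536 dimensions), and the function name is hypothetical rather than anything from a library.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings", purely for illustration.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar direction
print(cosine_similarity(cat, invoice))  # close to 0: very different direction
```

In pgvector the <=> operator returns cosine distance, which is 1 minus this value, so smaller numbers mean more similar text.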
These files can be seen over on my GitHub.
First up we have the docker-compose.yml file
version: "3"
name: vectorexample
services:
  postgres:
    build:
      context: ./postgres
      dockerfile: postgres.Dockerfile
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/vector_extension.sql:/docker-entrypoint-initdb.d/0-vector_extension.sql
      # - ./postgres/0-vector-extension.sh:/docker-entrypoint-initdb.d/0-vector-extension.sh
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=vectorexample
volumes:
  postgres_data:
- This sets up the build to look in the postgres folder and use the postgres.Dockerfile.
- It sets the volumes (the named volume is declared at the same level as services), but most importantly it mounts our vector_extension.sql file into the docker-entrypoint-initdb.d directory, whose scripts are run when the database is first initialised. The start of the file name determines the order if we were to have more than one script, e.g. it’s 0-vector_extension.sql after it’s mounted.
- There is also a .sh file (commented out here); this was just to experiment with both.
- The environment variables are used by the postgres image to set up the user, password and database name. The nice thing here is that it sets up these details before running our vector_extension.sql script, so the database will exist when we try to install the extension.
- The top-level volumes section is where we name the volume, so when we restart Docker our data is all still there.
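The ordering point above has one gotcha worth a quick sketch: the names are compared as text, not as numbers. The Python snippet below only approximates the ordering with a plain string sort (the image's documentation describes it as sorted name order for the current locale), and every file name other than 0-vector_extension.sql is hypothetical.

```python
# Hypothetical init scripts; only 0-vector_extension.sql exists in this project.
scripts = [
    "2-seed_data.sql",
    "10-create_indexes.sql",
    "0-vector_extension.sql",
]

# A plain string sort compares character by character, so "10-..." lands
# before "2-..." because "1" sorts before "2".
for name in sorted(scripts):
    print(name)

# Zero-padding the prefixes keeps text order and numeric order in agreement.
padded = ["00-vector_extension.sql", "02-seed_data.sql", "10-create_indexes.sql"]
print(sorted(padded) == padded)
```

With a single script none of this matters, but it is worth keeping in mind before adding a tenth one.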
Next up the postgres.Dockerfile
# This is installing the pgvector extension for postgres
FROM postgres:latest
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-all \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /tmp
RUN git clone https://github.com/pgvector/pgvector.git
WORKDIR /tmp/pgvector
RUN make
RUN make install
- We use the latest postgres base image hosted on Docker Hub.
- We run apt-get update and install so that git and the Postgres build dependencies are in place before setting up pgvector.
- Git then clones down the repo with the pgvector extension.
- Move into the directory where pgvector was cloned and run the make commands as explained in the install guide.
The vector_extension.sql file
-- Create the 'vector' extension within the database that is set in the docker-compose.yml
CREATE EXTENSION IF NOT EXISTS vector;
- Run the create extension command; this should mean that when we connect to the database the vector field is available. The reason we can run this without creating a database or supplying connection details is that the base image has already done that using the details from the docker-compose.yml file.
And we’re done. Connect with your preferred SQL client using the details specified in the docker-compose file.
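Once connected, a quick smoke test confirms the vector type works. The Python sketch below only builds and prints the SQL statements so they can be pasted into any client; it does not talk to the database itself. The items table, its three-dimensional column and the helper names are illustrative assumptions, while the connection URL is assembled from the values in the docker-compose.yml.

```python
# Connection details taken from the environment section of docker-compose.yml.
DATABASE_URL = "postgresql://postgres:postgres@localhost:5432/vectorexample"


def to_vector_literal(values: list[float]) -> str:
    """Format numbers as a pgvector text literal, e.g. [1, 2, 3] -> '[1,2,3]'."""
    return "[" + ",".join(str(v) for v in values) + "]"


def smoke_test_statements(query: list[float]) -> list[str]:
    """Build SQL that creates a tiny table, inserts two rows and finds the nearest one."""
    rows = [[1, 2, 3], [4, 5, 6]]  # made-up vectors, not real embeddings
    values = ", ".join(f"('{to_vector_literal(r)}')" for r in rows)
    return [
        # "items" is a hypothetical table name used only for this check.
        "CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));",
        f"INSERT INTO items (embedding) VALUES {values};",
        # <-> is pgvector's Euclidean (L2) distance operator.
        f"SELECT id, embedding FROM items ORDER BY embedding <-> '{to_vector_literal(query)}' LIMIT 1;",
    ]


print(f"-- connect to: {DATABASE_URL}")
for statement in smoke_test_statements([3, 1, 2]):
    print(statement)
```

If the extension installed correctly, the final query should come back with the [1,2,3] row, since it is the closer of the two to [3,1,2]. Building SQL with string formatting is fine for a throwaway check like this, but application code should use parameterised queries instead.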