Aug 25, 2023

Docker with postgres and pgvector extension

There is an official image for this, and it’s the better option, but building my own was a good challenge for my limited Docker ability. It would also be cool to edit the Laravel Sail config to do this at a later point.

Obviously, if you need this in a production environment, find someone who actually knows what they’re doing, or use Supabase, which offers Postgres with a vector embedding option.

This image is for Postgres and pgvector. The idea is that a single docker compose up command spins up a container with no manual commands to run on the database afterwards, so the vector field can be used right away.

Oh, vector fields are for embeddings, which use some math magic beyond my mental capacities to measure the similarity between two pieces of text. Embeddings are generated by an embedding model, such as text-embedding-ada-002 in the case of OpenAI.
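
To make the math magic a little less magic, here is a rough Python sketch of cosine similarity, one common way to compare two embeddings. The three vectors are made up for illustration (real embeddings come from a model and have far more dimensions), and the function name is just a placeholder rather than anything from pgvector or OpenAI.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings", purely for illustration
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # close to 1: similar
print(cosine_similarity(cat, invoice))  # close to 0: unrelated
```

pgvector exposes the same idea as a cosine distance operator (<=>), which is 1 minus this value, so a smaller distance means the two pieces of text are more similar.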

These files can be seen over on my GitHub.

First up we have the docker-compose.yml file

version: "3"
name: vectorexample 
services:
  postgres:
    build:
      context: ./postgres
      dockerfile: postgres.Dockerfile
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./postgres/vector_extension.sql:/docker-entrypoint-initdb.d/0-vector_extension.sql
      # - ./postgres/0-vector-extension.sh:/docker-entrypoint-initdb.d/0-vector-extension.sh

    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=vectorexample
volumes:
  postgres_data:

This sets up the build to look in the postgres folder and use the postgres.Dockerfile.
It sets the volumes (the named volume is declared at the same level as services), but most importantly it mounts our vector_extension.sql file into the docker-entrypoint-initdb.d directory. Scripts in that directory are run the first time the container starts with an empty data directory, and the start of the file name determines the order if we were to have more than one script, e.g. ours is 0-vector_extension.sql once it’s mounted.
There is also a commented-out .sh file; this was just to experiment with both.
The environment variables are used by the postgres image to set up the user, password and database name. The nice thing here is that it sets these up before running our vector_extension.sql script, so the database will exist when we try to install the extension.
The top-level volumes key is where we name this volume, so when we restart Docker our data is all still there.

Next up the postgres.Dockerfile

# This is installing the pgvector extension for postgres
FROM postgres:latest

RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-all \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /tmp
RUN git clone https://github.com/pgvector/pgvector.git

WORKDIR /tmp/pgvector
RUN make
RUN make install

We use the latest postgres base image hosted on Docker Hub.
We run apt-get update and install so that git, the build tools and the Postgres development headers are in place before setting up pgvector.
Git then clones down the repo with the pgvector extension.
We move into the directory where pgvector was cloned and run make and make install, as explained in the pgvector install guide.

The vector_extension.sql file

-- Create the 'vector' extension within the database that is set in the docker-compose.yml
CREATE EXTENSION IF NOT EXISTS vector;

This runs the create extension command, which should mean the vector field is available as soon as we connect to the database. The reason we can run this without creating a database or supplying connection details is that the base image has already done it, using the details from the docker-compose.yml file.
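
As a rough sketch of what using the vector field might look like once connected, the Python below formats a list of numbers the way pgvector expects and holds a few SQL statements as strings. The items table, its embedding column, the 3 dimensions and the %s placeholders (psycopg style) are all assumptions for illustration; none of them come from the repo.

```python
def to_vector_literal(values):
    """Format numbers as pgvector's text form, e.g. [1.0,2.0,3.0]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Hypothetical table; 3 dimensions keeps the example readable
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS items "
    "(id bigserial PRIMARY KEY, embedding vector(3));"
)
# %s is a driver-side placeholder (psycopg style); other clients differ
INSERT_ROW = "INSERT INTO items (embedding) VALUES (%s::vector);"
# <-> is pgvector's L2 distance operator, so the closest rows come back first
NEAREST = "SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT 5;"

print(to_vector_literal([1, 2, 3]))  # [1.0,2.0,3.0]
```

The string from to_vector_literal would be passed as the parameter for INSERT_ROW or NEAREST, with the vector(3) size swapped for whatever dimension the embedding model produces.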

And we’re done. Connect with your preferred SQL client using the details specified in the docker-compose.yml file.
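
Many clients accept a single connection URL rather than separate fields, so here is a small Python sketch that builds one from the values in docker-compose.yml. The host is assumed to be localhost because the port is published to the host machine, and the helper name is made up for this example.

```python
from urllib.parse import quote


def postgres_url(user, password, database, host="localhost", port=5432):
    """Build a postgresql:// URL, percent-encoding anything unsafe."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Values copied from the environment and ports sections of docker-compose.yml
print(postgres_url("postgres", "postgres", "vectorexample"))
# postgresql://postgres:postgres@localhost:5432/vectorexample
```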

