Postgres in Docker with persistent storage

Keep the Postgres data when containers are removed and recreated

Yes, it’s perfectly fine to run databases in containers. The only challenge is to make sure that the data stored by the database does not reside within the file system of the container. Otherwise, after removing the container, the data will be gone, too.

§basic docker

Let’s have a look at the most basic example from the Postgres Docker Hub:

docker run \
    --name some-postgres \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -d postgres

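The problem is that once we are done with this container and run docker stop some-postgres && docker rm some-postgres, the data stored in the database is effectively gone. Without an explicit mount, the data directory lives in storage that belongs to this one container (the container file system or an anonymous volume created for it), and a new container started with the same command begins with an empty database.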
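One way to check where a container keeps its data is to list its mounts. The following is a minimal sketch, assuming the some-postgres container from the command above still exists; the function name show_mounts and the DOCKER override variable are hypothetical helpers, not part of Docker.

```shell
# DOCKER can be overridden (for example DOCKER=echo) to print the command
# instead of running it. This variable is an assumption of this sketch.
DOCKER="${DOCKER:-docker}"

# Print the mounts of the given container as JSON.
show_mounts() {
    "$DOCKER" inspect --format '{{ json .Mounts }}' "$1"
}
```

Calling show_mounts some-postgres prints a JSON list. For the official image, the output is expected to include an anonymous volume for the data directory, which does not carry over to a new container.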

To prevent this from happening, the Postgres container can be started with a bind mount that maps a host directory onto the data directory inside the container. For example, to store the data in the /tmp/postgres-data directory of the host (an example path only, since /tmp is often cleared on reboot), the container can be started like this:

docker run -d \
    --name some-postgres \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -e PGDATA=/var/lib/postgresql/data/pgdata \
    -v /tmp/postgres-data:/var/lib/postgresql/data \
    postgres

The -v option (long form --volume) takes the form [host path]:[container path]. The PGDATA environment variable tells docker-entrypoint.sh to initialize the database cluster in that directory, here the pgdata subdirectory of the mounted path. Because the container starts with /var/lib/postgresql/data mapped from the host, the files Postgres writes end up in /tmp/postgres-data/pgdata on the host.

The -v option can be specified multiple times to map different directories or files. It is described in the Docker documentation for docker run.
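To confirm that the bind mount really preserves the data, the container can be removed and recreated with a query before and after. The sketch below is one possible way to do that, under a few assumptions: the function names, the DOCKER override variable, and the persistence_check table are hypothetical, and the image tag is pinned to 13.2 (the tag used in the Compose example later in this post) so that both runs use the same major version.

```shell
#!/bin/sh
# Sketch: check that data survives removing and recreating the container.
DOCKER="${DOCKER:-docker}"      # DOCKER=echo prints commands instead of running them
NAME="some-postgres"
IMAGE="postgres:13.2"           # pinned tag, an assumption of this sketch
HOST_DIR="/tmp/postgres-data"

# Start Postgres with the host directory bind-mounted as the data location.
start_pg() {
    "$DOCKER" run -d \
        --name "$NAME" \
        -e POSTGRES_PASSWORD=mysecretpassword \
        -e PGDATA=/var/lib/postgresql/data/pgdata \
        -v "$HOST_DIR:/var/lib/postgresql/data" \
        "$IMAGE"
}

# Poll until the server accepts TCP connections inside the container,
# giving up after roughly 60 seconds.
wait_ready() {
    tries=0
    until "$DOCKER" exec "$NAME" pg_isready -h 127.0.0.1 -U postgres >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge 60 ] && return 1
        sleep 1
    done
}

# Run SQL inside the container and print unaligned, tuples-only output.
run_sql() {
    "$DOCKER" exec "$NAME" psql -U postgres -At -c "$1"
}

# Write a row, recreate the container, then count the rows again.
check_persistence() {
    start_pg && wait_ready &&
    run_sql "CREATE TABLE IF NOT EXISTS persistence_check (id int); INSERT INTO persistence_check VALUES (1);" &&
    "$DOCKER" stop "$NAME" && "$DOCKER" rm "$NAME" &&
    start_pg && wait_ready &&
    run_sql "SELECT count(*) FROM persistence_check;"
}
```

Running check_persistence should end by printing a row count of at least 1 if the data survived. Without the -v line in start_pg, the final query would be expected to fail because the table no longer exists.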

§docker compose

When working with Docker Compose, the best option is to use a bind mount, declared with the long volume syntax. The simplest example would be:

version: '3.3'
services:
  postgres:
    image: postgres:13.2
    restart: unless-stopped
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mysecretpassword
      PGDATA: /var/lib/postgresql/data/pgdata
    ports:
      - "5432:5432"
    volumes:
      -
        type: bind
        source: /tmp/postgres-data
        target: /var/lib/postgresql/data
    networks:
      - reference
networks:
  reference:
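The Compose setup can be exercised in the same spirit: bring the stack up, write something, tear it down, and bring it up again. The helpers below are a sketch under stated assumptions: the file above is saved as docker-compose.yml in the current directory, the Compose v2 plugin (docker compose) is installed, and the function names and the DOCKER and DATA_DIR variables are hypothetical. The host directory is created up front so the sketch does not depend on whether Compose creates a missing bind source.

```shell
#!/bin/sh
DOCKER="${DOCKER:-docker}"              # DOCKER=echo prints commands instead of running them
DATA_DIR="${DATA_DIR:-/tmp/postgres-data}"

# Thin wrapper around the Compose v2 plugin.
compose() {
    "$DOCKER" compose "$@"
}

# Make sure the bind source exists, then start the stack in the background.
stack_up() {
    mkdir -p "$DATA_DIR" && compose up -d
}

# Run SQL in the "postgres" service as myuser against mydb.
stack_sql() {
    compose exec -T postgres psql -U myuser -d mydb -At -c "$1"
}

# Stop and remove the containers and the network; the bind-mounted
# host directory is left untouched.
stack_down() {
    compose down
}
```

A typical round trip would be stack_up, a short wait for initialization, stack_sql "CREATE TABLE IF NOT EXISTS t (id int);", then stack_down followed by stack_up and stack_sql "SELECT count(*) FROM t;". The table name t is only an illustration.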

§running in the cloud

When running containers in a cloud environment, say on an EC2 instance in AWS, the preferred way is to back the container's host path with an EBS-like block storage volume mounted on the host. The data then survives the replacement of both containers and VM instances, as long as the volume holding the host directory is attached to, and mounted on, the host that runs the Docker container.
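On the host, this comes down to mounting the block device and pointing the bind mount at the mount point. The following is a rough sketch only: the device name /dev/xvdf, the mount point /mnt/postgres-data, the pinned image tag, the function names, and the RUN prefix variable are all assumptions, and the volume is assumed to be attached and already formatted with a file system.

```shell
#!/bin/sh
RUN="${RUN:-sudo}"                  # RUN=echo prints commands instead of running them
DEVICE="/dev/xvdf"                  # hypothetical device name, differs per instance type
MOUNT_POINT="/mnt/postgres-data"    # hypothetical mount point on the host

# Mount the already formatted block device on the host.
mount_data_volume() {
    $RUN mkdir -p "$MOUNT_POINT" &&
    $RUN mount "$DEVICE" "$MOUNT_POINT"
}

# Start Postgres with its data directory on the mounted volume.
start_pg_on_volume() {
    $RUN docker run -d \
        --name some-postgres \
        -e POSTGRES_PASSWORD=mysecretpassword \
        -e PGDATA=/var/lib/postgresql/data/pgdata \
        -v "$MOUNT_POINT:/var/lib/postgresql/data" \
        postgres:13.2
}
```

The PGDATA subdirectory is useful here: the root of a freshly formatted file system often contains a lost+found directory, and Postgres refuses to initialize a cluster in a directory that is not empty.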