Knora stack set-up and container memory limits

Hello everybody,

we recently suffered memory exhaustion on our prod server.
The server is only running the knora stack.
The OS's OOM killer chose to kill webapi (which then failed to restart, but that's another issue).

In order to avoid this, I want to set memory limits on the containers.
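For reference, this is roughly how such a cap is expressed in a Compose file (a minimal sketch assuming Compose file format 2.x, where mem_limit is a hard per-container limit; with the 3.x format the equivalent setting lives under deploy.resources.limits.memory):

version: "2.4"

services:
  api:
    # Hard cap: if the container exceeds it, the OOM killer acts inside
    # this cgroup only, instead of picking a victim anywhere on the host.
    mem_limit: 2g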

So I looked at the documentation and the available advice.

It is not recommended to put GraphDB's indexes/caches (the entity pool) on the heap, so the total size of the GraphDB process cannot be fixed exactly. According to Ontotext, a rule of thumb is to give the heap about two thirds of the host's RAM.

I would dedicate 1G to the host itself.

Commodity services and front ends are frugal, but I think it is not wise to let a Docker container run with less than 500M of RAM. app2, app1, traefik and redis at 500M each make 2G.

That makes 3G and leaves the rest of the server's RAM for db, iiif and api.
Let's assume that Sipi is light and can manage with a minimum of 1G; that brings us to 4G used.

In my case, the server has 16G, which I would split into 4.5G for api (4G of Java heap, 500M for the rest) and 7.5G for db, which makes 5G for GDB_HEAP_SIZE.
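As a Compose sketch of that split (the db service already reads GDB_HEAP_SIZE in our stack; for the api heap I'm assuming the JVM's standard JAVA_TOOL_OPTIONS variable, the actual knob in the knora-api image may be named differently):

services:
  db:
    environment:
      - GDB_HEAP_SIZE=5G            # ~2/3 of the container cap
    mem_limit: 7680m                # 7.5G: heap + off-heap entity pool + JVM overhead
  api:
    environment:
      - JAVA_TOOL_OPTIONS=-Xmx4g    # 4G Java heap
    mem_limit: 4608m                # 4.5G: 500M headroom for metaspace, threads, etc.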

Does someone have a wiser plan?

For running the tests on GitHub CI, we have set GDB_HEAP_SIZE to 5G, which is not always enough: we get intermittent test failures because of memory problems.

In production, we run with a GDB_HEAP_SIZE of 8G, which has never given us any problems so far. In your case, I would try 8G.

Also, the memory configuration in production for knora-api is 1G (see docker-compose.yml in ops-deploy).
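In Compose terms that corresponds to something like this (just a sketch with the values above, not the literal file from ops-deploy):

services:
  db:
    environment:
      - GDB_HEAP_SIZE=8G
  api:
    mem_limit: 1g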

Thanks @subotic!

Right now we also have GDB_HEAP_SIZE set to 8G, without the 1G limit on the api, and docker stats currently gives this:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
ab0aaa70171b        db                  0.06%               8.618GiB / 15.51GiB   55.55%              1.41GB / 372MB      1.3GB / 950kB       41
47a1e97ef53b        api                 0.37%               1.561GiB / 15.51GiB   10.06%              569MB / 205MB       316MB / 0B          41
78a64b6867c3        app2                0.00%               1.605MiB / 15.51GiB   0.01%               171kB / 7.66MB      24.4MB / 24.6kB     3
8967fc898ff9        iiif                0.00%               510.1MiB / 15.51GiB   3.21%               1.96GB / 2.59GB     201MB / 0B          2
08ef24bf1aad        app1                0.26%               264.4MiB / 15.51GiB   1.66%               6.46MB / 81.6MB     170MB / 0B          31
9803a277b65c        redis               0.09%               2.934MiB / 15.51GiB   0.02%               5.99MB / 456MB      14.6MB / 0B         4
e19004fd4239        traefik             0.01%               27.62MiB / 15.51GiB   0.17%               4.79GB / 4.82GB     180MB / 0B          15

So I thought the api would benefit from more room than 1G: being short on memory triggers GC more often and results in slowdowns. I didn't monitor the GC stats, though; I probably should.
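If I want to check that, GC activity can be logged without rebuilding the image, e.g. by passing flags through the JVM's standard JAVA_TOOL_OPTIONS variable (a sketch; -Xlog:gc* is the Java 9+ syntax, on Java 8 it would be -verbose:gc -XX:+PrintGCDetails instead):

services:
  api:
    environment:
      # Picked up automatically by any HotSpot JVM started in the container.
      - JAVA_TOOL_OPTIONS=-Xlog:gc*:file=/tmp/gc.log:time,uptime,level,tags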

So I can rebalance 2G from api to db, set GDB_HEAP_SIZE to 8G, and ask our IT department to raise our server RAM to 18.5G (I'll ask for 20 to make it even).

and I could then set the containers' RAM limits as follows (sketched below):

  • app2, app1, traefik, redis: 2G
  • sipi: 1G
  • db (8G heap): 12G
  • api: 2.5G
    (1G is reserved for the host)
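As a Compose sketch (same assumptions as above; the limits add up to 17.5G, leaving 1G of the 18.5G for the host):

services:
  db:
    environment:
      - GDB_HEAP_SIZE=8G    # ~2/3 of the 12G cap
    mem_limit: 12g
  api:
    mem_limit: 2560m        # 2.5G
  iiif:
    mem_limit: 1g           # sipi
  app1:
    mem_limit: 500m
  app2:
    mem_limit: 500m
  redis:
    mem_limit: 500m
  traefik:
    mem_limit: 500m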