Load ontologies and timeout

Dear all,

when knora starts and loads ontologies, the response is consistently timing out after 10s:

# api asks for ontologies
api_1_8e5b0eafd583 | [2019-10-16 05:28:22,272] INFO  - ApplicationStateActor - appStateChanged - to state: LoadingOntologies
# api fails... only 10s after last call
api_1_8e5b0eafd583 | [2019-10-16 05:28:32,689] ERROR - OneForOneStrategy - Failed to connect to triplestore
api_1_8e5b0eafd583 | org.knora.webapi.TriplestoreConnectionException: Failed to connect to triplestore

Despite the config saying:
triplestore.query-timeout = 30 seconds

I tried looking at loadOntologies on OntologyResponderV2.scala but could not figure out why.

It is quite problematic because it means that our production environment can not be restarted.

The trick to do so, as improbable and not logical as it may seem, is:

  • export the triple-store to a trig file
  • thin out the data graphs out of the trig file and reserve for later use
  • wipe out the tripplestore
  • load the lightened trig file into tripple store
  • start knora (success, relief)
  • load the data
  • update lucene index

which is a bit nerve wrecking.

Thanks, but if you would like us to help you with this, you need to provide a way for us to reproduce the issue.

The best thing would be to put your TriG file somewhere we can download it.

Thanks Ben, but I don’t think that the trig file will help as it is not systematically reproducible (it happens on prod, not on test or on my laptop).

But when it happens though, what I can observe is that there is a consistent 10s time-out.

Therefore, I wanted to know if there was a logical explanation by scrutinizing the code itself, or by looking at the log and telling me that the two end-points of the 10s period that I look at, are meaningless because they are not linked by any code logic (and thus no timeout). So some sort of forensic rather than live investigation.

Would it be possible?

In the log, when the org.knora.webapi.TriplestoreConnectionException is printed, there should be a stack trace. That would show you where in the code the exception is being thrown.

I would try to find out which SPARQL query is taking so long.

OntologyResponderV2.loadOntologies first does a SELECT query to find out which named graphs have ontologies in them. The SPARQL template is in src/main/twirl/queries/sparql/v2/getAllOntologyMetadata.scala.txt.

Then for each of those graphs, it does a CONSTRUCT query to get the entire contents of the graph.
The template is in src/main/twirl/queries/sparql/v2/getOntologyGraph.scala.txt.

You can print out each query in HttpTriplestoreConnector before it is sent to the triplestore.

For the SELECT query, you can uncomment the first line of HttpTriplestoreConnector.sparqlHttpSelect:

        // println(sparql)

For the CONSTRUCT queries, you can insert the same println at the beginning of HttpTriplestoreConnector.sparqlHttpExtendedConstruct.

Then you should see each query printed on the console. If a query times out, you can copy and paste it from the console into the GraphDB workbench and see if it takes just as long that way.

I don’t know why your 30-second triplestore.query-timeout setting isn’t being respected. I’ll look into that.

Also, how long does Knora take to load the ontologies with the same TriG file on your laptop?

Thank you for answering Ben :wink:

here are the first answers.

the stack I saved doesn’t show much:

api_1_8e5b0eafd583 | [2019-10-16 05:18:09,274] ERROR - Dispatcher - Failed to connect to triplestore
api_1_8e5b0eafd583 | org.knora.webapi.TriplestoreConnectionException: Failed to connect to triplestore
api_1_8e5b0eafd583 |    at org.knora.webapi.TriplestoreConnectionException$.apply(Exceptions.scala:249)
api_1_8e5b0eafd583 |    at org.knora.webapi.store.triplestore.http.HttpTriplestoreConnector$$anonfun$getTriplestoreHttpResponse$2.applyOrElse(HttpTriplestoreConnector.scala:608)
api_1_8e5b0eafd583 |    at org.knora.webapi.store.triplestore.http.HttpTriplestoreConnector$$anonfun$getTriplestoreHttpResponse$2.applyOrElse(HttpTriplestoreConnector.scala:606)
api_1_8e5b0eafd583 |    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:34)
api_1_8e5b0eafd583 |    at scala.util.Failure.recover(Try.scala:230)
api_1_8e5b0eafd583 |    at org.knora.webapi.store.triplestore.http.HttpTriplestoreConnector.getTriplestoreHttpResponse(HttpTriplestoreConnector.scala:606)
api_1_8e5b0eafd583 |    at org.knora.webapi.store.triplestore.http.HttpTriplestoreConnector.org$knora$webapi$store$triplestore$http$HttpTriplestoreConnector$$sparqlHttpExtendedConstruct(HttpTriplestoreConnector.scala:342)
api_1_8e5b0eafd583 |    at org.knora.webapi.store.triplestore.http.HttpTriplestoreConnector$$anonfun$receive$1.applyOrElse(HttpTriplestoreConnector.scala:121)
api_1_8e5b0eafd583 |    at akka.actor.Actor.aroundReceive(Actor.scala:517)
api_1_8e5b0eafd583 |    at akka.actor.Actor.aroundReceive$(Actor.scala:515)
api_1_8e5b0eafd583 |    at org.knora.webapi.store.triplestore.http.HttpTriplestoreConnector.aroundReceive(HttpTriplestoreConnector.scala:61)
api_1_8e5b0eafd583 |    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:588)
api_1_8e5b0eafd583 |    at akka.actor.ActorCell.invoke(ActorCell.scala:557)
api_1_8e5b0eafd583 |    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
api_1_8e5b0eafd583 |    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
api_1_8e5b0eafd583 |    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
api_1_8e5b0eafd583 |    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
api_1_8e5b0eafd583 |    at java.base/java.lang.Thread.run(Thread.java:844)
api_1_8e5b0eafd583 | Caused by: java.net.SocketTimeoutException: Read timed out

Here is the log with a bit larger scope, we see that graphdb continues its duties and finishes happily, it is really knora that does not manage to wait for it:

# api asks for ontologies
api_1_8e5b0eafd583 | [2019-10-16 05:28:22,272] INFO  - ApplicationStateActor - appStateChanged - to state: LoadingOntologies

# db works
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,396 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 1636182412
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,404 [repositories/knora-prod | c.o.c.AbstractParameter] Configured parameter 'deduplicate.construct' to default value 'true'
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,404 [repositories/knora-prod | c.o.c.AbstractParameter] Configured parameter 'reuse.vars.in.subselects' to default value 'false'
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,423 [repositories/knora-prod | c.o.g.s.TupleQueryResultView] Request for query 1636182412 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,662 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query -1254606180
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,771 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 18232400
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,795 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 1165864634
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,837 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 1572943862
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,848 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 1394326240
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,849 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 536716006
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,858 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query -1561808608
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,874 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query -497482518
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,896 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 893209034
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:22,927 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 1381733697

# api fails... only 10s after last call
api_1_8e5b0eafd583 | [2019-10-16 05:28:32,689] ERROR - OneForOneStrategy - Failed to connect to triplestore
api_1_8e5b0eafd583 | org.knora.webapi.TriplestoreConnectionException: Failed to connect to triplestore

# db finishes a long query
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:33,074 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 742947293
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:33,288 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 496334016
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:33,317 [repositories/knora-prod | c.o.f.s.RepositoryController] POST query 1311641049
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:35,780 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 893209034 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:35,923 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 1381733697 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:36,220 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 536716006 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:36,743 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 18232400 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:36,809 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query -1254606180 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:36,895 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query -1561808608 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:40,458 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 742947293 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:42,082 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 1311641049 is finished
db_1_6635d7366e63 | [INFO ] 2019-10-16 05:28:42,173 [repositories/knora-prod | c.o.g.s.GraphQueryResultView] Request for query 496334016 is finished

# 20s after initial call, api complains again about not being able to connect to db
api_1_8e5b0eafd583 | [2019-10-16 05:28:42,205] ERROR - Dispatcher - Failed to connect to triplestore
api_1_8e5b0eafd583 | org.knora.webapi.TriplestoreConnectionException: Failed to connect to triplestore

The query that was executing is:

CONSTRUCT { ?s ?p ?o . }

FROM <http://www.ontotext.com/explicit>

WHERE {
    BIND(IRI("http://www.knora.org/ontology/0114/elites-cio") AS ?ontologyGraph)

    GRAPH ?ontologyGraph { ?s ?p ?o . }
}

it was amongst many of its kind requesting other projects’ ontologies.

Outside of the start-up sequence, it executes within seconds.

I will try again to reproduce and maybe debug.

Also, how long does Knora take to load the ontologies with the same TriG file on your laptop?

I am not sure yet

Thanks for these clues.

the stack I saved doesn’t show much

The more interesting part would be after this line:

Caused by: java.net.SocketTimeoutException: Read timed out

That would be the stack trace for the cause of the exception, i.e. the original exception (a java.net.SocketTimeoutException) which has been wrapped in an org.knora.webapi.TriplestoreConnectionException. It could be useful to know which line of code it was thrown from.

Looking at your log, it seems that this happens when Knora has already sent a number of concurrent requests to the triplestore. We have a thread pool with a limited size for HttpTriplestoreConnector, and I wonder if the problem is that the Knora thread that wants to do this query has to wait for a number of other requests to finish.

As an experiment, you could increase the value of akka.actor.deployment/applicationManager/storeManager/triplestoreManager/httpTriplestoreRouter.core-pool-size-max and see if that makes a difference.

Outside of the start-up sequence, it executes within seconds.

I think it should be faster than that. I wonder if we could make it faster by using the SPARQL Graph Store Protocol instead of of a CONSTRUCT query.

I haven’t had time yet to look into the timeout setting issue, and I’ll be on holiday next week, but I’ll be able to look at this again when I get back to work on 4 November.

It could also be that GraphDB has a limited number of threads to work with, and its thread pool is too busy with other concurrent requests.

My hypothesis is something like this: when you have a lot of ontologies, Knora makes a lot of concurrent requests to load them, and some of those requests then have to wait in a queue to be run in a thread pool (whether it’s Knora’s thread pool or GraphDB’s). When there’s also a lot of data in the triplestore, this slows down the ontology requests enough so that it takes too long for the queued requests to execute before the timeout.

Another idea would be to cache the ontologies in Redis.

… and do fewer concurrent requests when loading the ontologies from GraphDB.

It would be great if you could reproduce this on your laptop (so then I could reproduce it, too).

Thanks Ben, in the meantime will replicate on my laptop (it hangs now on importing the prod data set). And gather more materials and report the meaningful stack.

The threadpool size rising a lot of hope as it could by an easy and immediate quick fix!
I see it is 10, and we have 14 projects on prod (some I should clean-up but the 14 projects is still a reasonable case) and there is one concurrent request for each of them. I don’t understand though why having data there would slow the process, but that’s not important for now.
We see graphdb finishing to answer all the queries fine.

Cache ontologies in Redis: warm boot optimization, it reminds me of a previous life.

Enjoy your hollidays :wink:

Trying to load our prod on my laptop (16G, 8 cores, SSD), graphdb dies, loadRdf keeps on running without finishing, preload ends but the next restart of graphdb complains about:

Caused by: org.eclipse.rdf4j.repository.RepositoryException: org.eclipse.rdf4j.sail.SailException: Preload recovery point detected. Abort initialization!

and lets an unusable graphdb.

I have to say that I am using graphdb 9.0.0 because it is the distribution version that I could download, do you guys have a 8.5 that I could grab?

You should be able to get all versions from here:

http://maven.ontotext.com/#browse/browse:owlim-releases:com%2Fontotext%2Fgraphdb%2Fgraphdb-se%2F8.5.0%2Fgraphdb-se-8.5.0-dist.zip

Just grab the *-dist.zip variant.

cool, thanks @subotic

If I understand correctly, Preload doesn’t do inference, and LoadRDF does. Knora assumes inference is enabled with GraphDB, so I think that means Preload isn’t an option.

yes, maybe I was too optimistic thinking that graphdb would fill the gaps on restart

I have some more information regarding dropping graphs.

With the help of Ontotext, I am now able to deactivate and activate KnoraRules on the fly.

With deactivated KnoraRules, loading a ~700 MB repository takes 1.5 minutes, and dropping a named Graph (beol-data) around 2 minutes.

With activated KnoraRules, loading a ~700 MB repository takes around 18 minutes, and dropping a named Graph longer than 12 h (exactly how long I don’t know, since I newer waited longer than 12h).

Also, I think that how fast GraphDB is executing these operations with activated rules is highly dependent on the number of cores the license covers and how fast the CPU is. So I guess, having a 16 core license and server, should speedup these operations considerably. I don’t think that this is the way to go, because this solution doesn’t scale.

The way to go is probably to find a safe workflow involving deactivating KnoraRules before certain operations.