Knora releases and system upgrades

Dear all,

Here is some user-experience feedback on our upgrade from Knora API v6 to v8.

I don’t know how much effort should be dedicated to this, because we are the only such users (aren’t we?), and in particular nobody else will ever go through this exercise of upgrading a production set-up from v6. I was even sceptical about the usefulness of writing this up, but some of these issues might hit back at any future upgrade.

clone

Not knowing how long the upgrade takes or what will be necessary to fix problems, I don’t just back up the data, I execute the upgrades on a clone. It is the only reasonable way, imho. So we might write that in the documentation.
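
For reference, my clone step boils down to dumping the whole repository and loading the dump into a second, empty repository. Here is a sketch of the idea, assuming GraphDB’s standard RDF4J REST endpoints (host names, repository ids and file names are made up):

import requests

PROD = "http://prod-host:7200/repositories/knora-prod"    # assumed URL
CLONE = "http://test-host:7200/repositories/knora-clone"  # assumed URL

# Dump prod to disk, streaming so the dump never has to fit in memory.
with requests.get(f"{PROD}/statements",
                  headers={"Accept": "application/trig"},
                  stream=True) as resp:
    resp.raise_for_status()
    with open("prod-dump.trig", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            out.write(chunk)

# Load the dump into the already created, empty clone repository.
with open("prod-dump.trig", "rb") as dump:
    resp = requests.post(f"{CLONE}/statements", data=dump,
                         headers={"Content-Type": "application/trig"})
resp.raise_for_status()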

auto-update starts from v7

The earliest automatic upgrade plugin is pr1307, which is in v8.0.0, so the Knora stack should first be brought to v7.

doc lag

The page Release Notes/Migration Notes should be deleted or merged into Deploying Knora/Updating Repositories When Upgrading Knora: it is outdated, and having found such a page is misleading, because one will stop looking before reaching the actual information on the latter page.

Also, it would be nice to have the docs for v8.1.0 (knora.org is on v8.0.0). Granted, there was no official v8.1.0 tag, and anyway v9.0.0 is out.

v7 upgrade

Some passwords were lost during the run of the upgrade scripts.

Strict consistency checker

The consistency checker has been made stricter, and existing data might not comply anymore. This kind of problem won’t be fixed by a script (which is not there to invent data), but a standalone checker would be helpful.
One might object that the check already exists: it runs when we import and when we start Knora.

graphs segregation

Importing a big TriG file takes a long time; when it fails because of a couple of lines, it is frustrating to start over.
When Knora fails on a graph, it stops, so one problematic project can stop the whole set-up and all of the projects.

v8 upgrade

The script ran fine until uploading the final TriG file to GraphDB, where it failed with a MemoryError.
Uploading the file directly worked, though.

code and architecture upgrade

v8 saw some other changes, either in the behaviour of the processes or in the way they interact (Sipi needs a test.html, Knora checks on Sipi, there is a Redis cache that is not optional).
Nothing bad, but all discovered along the way by trial and error.
To avoid people (just me for now) feeling miserable, we should add a section to the Deploying Knora documentation describing a recommended way to set up the Knora ecosystem and how to upgrade from one version to the next.

Again, it doesn’t make sense to catch up on the history, as nobody other than us (Basel and Lausanne) has installed it so far. And it might not matter if we all end up on the same infrastructure at SWITCH.
But having a section “Updating Repositories When Upgrading Knora” without another one saying “How to upgrade a Knora set-up” lacks some legitimacy.

Importing a big TriG file takes a long time; when it fails because of a couple of lines, it is frustrating to start over.

The update script used to upload one graph at a time, but we changed it to upload the whole repository in one transaction, because there can be dependencies between graphs (which would cause consistency errors if the dependencies are not met), and there is no simple way to automatically determine what those dependencies are.
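
For completeness: if someone still wanted a per-graph import as a recovery tool despite that caveat, a rough sketch (this is not the actual update script; the repository URL and file name are assumptions) could look like this:

import urllib.parse
import requests
from rdflib import ConjunctiveGraph

REPO = "http://localhost:7200/repositories/knora-test"  # assumed URL

dataset = ConjunctiveGraph()
dataset.parse("knora-dump.trig", format="trig")  # assumed file name

for graph in dataset.contexts():
    payload = graph.serialize(format="turtle")
    if isinstance(payload, str):  # rdflib >= 6 returns str, older versions bytes
        payload = payload.encode("utf-8")
    ctx = urllib.parse.quote(f"<{graph.identifier}>", safe="")
    resp = requests.post(f"{REPO}/statements?context={ctx}",
                         data=payload,
                         headers={"Content-Type": "text/turtle"})
    if resp.status_code != 204:
        # Report and continue, so one bad graph doesn't abort the rest;
        # inter-graph dependencies would still surface as errors here.
        print(f"{graph.identifier}: HTTP {resp.status_code} {resp.text[:200]}")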

Could you make a test case and open an issue for this?

Exactly. What would a separate consistency checker do that Knora doesn’t already do when it starts?

A hell of a lot. Right now the workflow is:

  • run the upgrade script (which takes a long time because it’s for all of the projects)
  • start Knora, get errors and fix them (one at a time)

Now, for a production site update, I cannot do that live, so I test it first, and the workflow looks like this:

  • set up a working upgrade procedure (data, software, whatever is needed)
    • clone prod
    • run upgrade on clone (knora, errors, fixes, iterate)
  • fix on prod
  • rerun the upgrade procedure defined in the first step and derive an amended upgrade procedure
  • apply the amended upgrade procedure on prod

Where it could be:

  • check prod data
  • fix prod data
  • set up the upgrade procedure
  • apply upgrade procedure

=> the upgrade scripts are run twice instead of three times

But as the upgrade changes the data, I don’t think such a consistency check before the data change is even possible.

(By the way, now you know my definition of hell: running the upgrade procedure three times instead of two :wink: Sorry about that, I am truly grateful for the existence of your upgrade scripts.)

About the Python memory error:

I didn’t really check whether it is possible to set memory limits for Python scripts, but the usual Stack Overflow answer is to chunk the problem rather than to raise some limit variable.

But as long as we have the TriG file, it is fine.
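
For what it’s worth, chunking happens for free if the final step is a plain HTTP POST to GraphDB: handing requests the open file object makes it stream the body instead of reading the whole file into memory. A sketch (URL and file name are made up):

import requests

REPO = "http://localhost:7200/repositories/knora-test"  # assumed URL

with open("knora-upgraded.trig", "rb") as dump:  # assumed file name
    resp = requests.post(f"{REPO}/statements", data=dump,  # streamed body
                         headers={"Content-Type": "application/trig"})
resp.raise_for_status()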

About lost passwords:

I’ll try to

  • run the upgrade script (which takes a long time because it’s for all of the projects)
  • start Knora, get errors and fix them (one at a time)

If the upgrade script is working properly, you shouldn’t get any errors when you start Knora. If you do, it means that there’s a bug in the upgrade script.

  • check prod data
  • fix prod data

I don’t understand. You just said you don’t want to do this on the live server, you want to make a clone. So to check for errors, you can start Knora using the cloned repository. So I think Knora is your checker.

I call it an upgrade procedure because it is a compound set of actions: it is more than running the data repository scripts, it is also checking the configs, seeing what has changed, and testing side effects like Sipi moving images between directories or installing Redis.
It takes time, but it is then replicable.

So I leave the prod server up and running, and when the procedure is ready, I replicate it on the prod server with a predictable and acceptable downtime.

So I leave the prod server up and running, and when the procedure is ready, I replicate it on the prod server with a predictable and acceptable downtime.

That seems totally reasonable to me. So I still don’t understand: why is it a problem to use Knora to check the cloned repository? What would you gain by having a separate checker that would check the production repository directly?

Also, keep in mind that Knora only checks ontologies on startup, not data. It aims to ensure that data is consistent with ontologies by checking update requests, but it expects existing data to be correct.

The upgrade scripts don’t aim to do this either. They just know that when you upgrade from version X to version Y, there are specific things that need to be changed. They don’t try to check anything else.

Writing a tool to check all existing data would be … a big job. And such a checker would probably be very slow. Knora’s consistency rules in GraphDB do most of it, probably much more efficiently than a separate tool could do it.

Also I think that if you really want to be sure that version X of Knora is going to work with your repository, the only way to be 100% certain is to actually run that version of Knora: first on a clone, then on production.

Because I could fix the data on prod before starting to set up the upgrade procedure, and it would save me one run of the upgrade procedure.

Well understood, and I fully agree.

Exactly, that’s why I have to run it three times.

Discuss topics are not sub-threaded (unlike git), right? Or am I doing something wrong?

Because I could fix the data on prod before starting to set up the upgrade procedure, and it would save me one run of the upgrade procedure.

The upgrade script should fix your data for you (in the cloned repository). You shouldn’t have to do it by hand. You shouldn’t even need to know what needs to be fixed. If there are things you have to fix by hand, let us know so we can fix the upgrade script.

Discuss topics are not sub-threaded

It looks like each topic is just one thread.

I do have to fix data by hand, not because of your script, but because the data is faulty in the first place.
For example, on prod with Knora v6, there is a triple like this:

_:node3112 <http://www.knora.org/ontology/salsah-gui#> "3"^^xsd:nonNegativeInteger .

And prod is not bothered by it.

After the upgrade, Knora is bothered and says:

api_1    | _:node3112 <http://www.knora.org/ontology/salsah-gui#> "3"^^xsd:nonNegativeInteger .
api_1    |
api_1    | ================================================================================
api_1    |
api_1    | org.knora.webapi.DataConversionException: Couldn't parse IRI: http://www.knora.org/ontology/salsah-gui#

So I have to fix the data by hand and start over.

What I actually do is fix on prod, fix on clone, and restart the clone to check for the next error; otherwise I would even have to run your scripts once more.
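
For this specific kind of junk, the by-hand fix can at least be scripted as a SPARQL update against GraphDB. A sketch (the repository URL is an assumption, and deleting may not be the right fix if the predicate should rather be corrected):

import requests

REPO = "http://localhost:7200/repositories/knora-prod"  # assumed URL

# Delete every triple whose predicate is the unparseable IRI from the
# error message above, in whatever named graph it occurs.
update = """
DELETE { GRAPH ?g { ?s <http://www.knora.org/ontology/salsah-gui#> ?o } }
WHERE  { GRAPH ?g { ?s <http://www.knora.org/ontology/salsah-gui#> ?o } }
"""

resp = requests.post(f"{REPO}/statements", data={"update": update})
resp.raise_for_status()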

OK, I understand. But why is it a problem to do this:

  1. Make the clone
  2. Run the update script on the clone
  3. Start Knora on clone and find out about this error
  4. Fix the error and repeat (3)
  5. When there are no more errors on clone, fix prod.

It seems to me that you will always have to do steps 1-3 in any case, even if there are no such inconsistencies.

There is no problem in doing that, but while I am doing it, the initial state (prod) has changed, so I have to re-test the upgrade procedure once more before finally applying it for real, with a warning to users that we are going into maintenance.

so it is:

  1. set up the upgrade procedure
    1.1 Make the clone
    1.2 Run the update script on the clone
    1.3 Upgrade the Knora ecosystem
    1.4 Start the new Knora on the clone and find out about this error
    1.5 Fix the error and repeat (1.3)
    1.6 Fix prod
  2. make sure that the said procedure still applies on the updated prod
    2.1 Make the clone
    2.2 Run the update script on the clone
    2.3 Upgrade the Knora ecosystem
    2.4 Start the new Knora on the clone; if there are errors, jump to 1.5
  3. apply the upgrade procedure for real
    3.1 Warn users, shut down the service, make a back-up
    3.2 Run the update script on prod
    3.3 Upgrade the Knora ecosystem
    3.4 Start the new Knora; on error, roll back to the back-up from 3.1, scratch head, and think of 1.3 and 1.4 before jumping back to 1

With a checker, it would look like:

  1. fix prod inconsistencies
    1.1 Run the check
    1.2 Fix prod
  2. set up the upgrade procedure, as described above
  3. skip the re-test (step 2 above), because we didn’t touch prod while elaborating step 1
  4. apply the upgrade procedure for real, as described above
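
The check in 1.1 would not even need to be the full-blown checker discussed above. A crude sketch that would have caught my case, scanning the dump for IRIs with an empty local name like the salsah-gui# predicate shown earlier (file name and heuristic are assumptions):

import re

SUSPECT = re.compile(r"<[^<>\s]+[#/]>")  # IRI ending in '#' or '/', i.e. no local name

with open("knora-dump.trig", encoding="utf-8") as dump:  # assumed file name
    for lineno, line in enumerate(dump, start=1):
        if line.lstrip().startswith(("@prefix", "PREFIX")):
            continue  # namespace declarations legitimately end in '#'
        for iri in SUSPECT.findall(line):
            print(f"line {lineno}: suspicious IRI {iri}")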

Next time I should try what @subotic suggests in https://github.com/dasch-swiss/knora-api/issues/1511#issuecomment-552202450

This would translate into: turn off the consistency checker, update to the new KnoraRules, enable it again and fix the resulting errors, then run the upgrade script.

That would come close to providing the consistency-checker script discussed earlier in this thread.