For example, I have a DATA_DIR environment variable which points to the directory where disk-data is stored
Depending on where the agent is, the value of DATA_DIR might change
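A minimal sketch of what I mean, assuming the code only ever reads the location from DATA_DIR and never hardcodes it. The helper names `resolve_data_dir` and `data_path`, and the `./data` fallback, are made up for illustration:

```python
import os
from pathlib import Path


def resolve_data_dir(default="./data"):
    """Return the data directory taken from DATA_DIR.

    Falls back to `default` (a hypothetical fallback) when the variable
    is not set on the machine the agent runs on.
    """
    return Path(os.environ.get("DATA_DIR", default))


def data_path(*parts):
    """Build a path under the data directory, e.g. data_path("raw", "a.csv")."""
    return resolve_data_dir().joinpath(*parts)
```

This way each agent machine can export its own DATA_DIR and the task code stays the same.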
Okay Jake, so that basically means I don't have to touch any server configuration regarding the file-server on the trains server. It will simply get ignored, and all I/O initiated by clients with the right configuration will cover for that?
Oh I get it, that also makes sense with the docs directing this at inference jobs and avoiding GPU - because of the 1-N thing
cluster.routing.allocation.disk.watermark.low:
I manually deleted the allegroai/trains:latest image, that didn't help either
a machine that had a previous installation, but I deleted the /opt/trains directory beforehand
I followed the upgrading steps, still nothing
Not manually. I assume that if I deleted the image and then ran docker-compose up, and I can see the pull working, it should pull the correct one
Where is the docker-compose file? It's not at /opt
(again, I didn't place it anywhere, I'm just using the AMI)
Sorry I meant this link
https://azuremarketplace.microsoft.com/en-us/marketplace/apps/apps-4-rent.clearml-on-centos8
I think a good idea is to add a suggestion to the error message when the clearml agent fails due to an import error: to try again with pip freeze
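Roughly what I have in mind, as a sketch only. This is not the agent's actual code; `import_with_hint` and the hint text are hypothetical:

```python
import importlib

# Hypothetical hint text; the real wording would be up to the maintainers.
PIP_FREEZE_HINT = (
    "Hint: a package may be missing from the detected requirements; "
    "try recording the full environment with pip freeze."
)


def import_with_hint(module_name):
    """Import a module, re-raising ImportError with the pip freeze hint appended."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original error as the cause so the traceback is not lost.
        raise ImportError(f"{exc}. {PIP_FREEZE_HINT}") from exc
```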
sudo curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o /opt/trains/docker-compose.yml
Can you lend a few words about how the non-pip-freeze mechanism of detecting packages works?
but shouldn't the :latest tag make it redownload the right image?
this is the selection from the column setting menu
But does it disable the agent? or will the tasks still wait for the agent to dequeue?
cool, didn't know about the PAT
One sec I'll paste the relevant pieces of code