
This is the log of the remote run (the local one is just a bunch of “Sleeping until the next pool in 1.5 minutes” messages plus the logs of my schedule function, as expected). The SDK version is 1.17.1
Do I maybe need to do something more that I wasn’t aware of to start the trigger scheduler on the services queue? Or is it better at this point to just manually run a script on the machine and call “start()” instead of “start_remotely()”?
The log of the “clearml-agent-services” only contained this:
I can see this being a problem, at least when someone wants to migrate the machine to a new domain/IP. In any case, a warning in the documentation would be useful: the default deploy shown is for localhost, but it breaks as soon as someone tries to access that data from the local network instead
The container is still running and doesn’t show any log entry when I start the trigger scheduler remotely
I just checked, and indeed the clearml.conf file of the user that I used to upload the dataset has the various server hosts set to localhost (since for this test that user was on the same machine as the server). Is this expected behavior? I would have thought that such a config only influences the connection between the user’s CLI/SDK and the server, but once the data is uploaded it’s the server’s duty to provide the right URL to whoever is accessing the data
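For anyone hitting the same thing: the relevant section of the uploader’s clearml.conf, with the default localhost hosts swapped for the server’s LAN address (the IP is the one from this thread; the ports are the stock clearml-server defaults, adjust if yours differ):

```
api {
    web_server: http://192.168.1.83:8080
    api_server: http://192.168.1.83:8008
    files_server: http://192.168.1.83:8081
}
```

The files_server value is what gets baked into the uploaded artifact URLs, which is why a localhost value there only works from the server machine itself.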
Yes, in fact, if I take the URLs of the files that the webserver provides me and replace the localhost part with that IP, I can view the underlying data from my browser
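That manual host swap can be scripted; a small stand-alone sketch using only the standard library, with the server IP as an assumed example value:

```python
from urllib.parse import urlsplit, urlunsplit

SERVER_HOST = "192.168.1.83"  # assumed LAN address of the ClearML server


def rewrite_host(url: str, new_host: str = SERVER_HOST) -> str:
    """Replace the hostname in a file-server URL, keeping port, path, query."""
    parts = urlsplit(url)
    netloc = f"{new_host}:{parts.port}" if parts.port else new_host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(rewrite_host("http://localhost:8081/proj/dataset.1234/artifacts/img.png"))
# → http://192.168.1.83:8081/proj/dataset.1234/artifacts/img.png
```

Of course this only patches the symptom client-side; fixing the hosts in clearml.conf before uploading is the proper cure.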
I only used the env variables I mentioned (I also checked inside the docker-compose.yaml and noticed that only CLEARML_HOST_IP doesn’t have a default value, so I tried to set only that env variable, but the result didn’t change). I don’t have any configuration other than apiserver.conf in /opt/clearml/config with the users. I definitely haven’t seen any configuration.json file so far. PS: the docker-compose.yaml is just the one from the repo, without any changes.
I noticed the problem in the preview section of the dataset files: they cannot be shown because they point to localhost, but if I click on “open image” and then replace localhost with the server IP, everything works as expected
environment:
  WEBSERVER__fileBaseUrl: '"http://192.168.1.83:8081"'
  WEBSERVER__useFilesProxy: 'true'
Exactly, I guess that’s the problem, probably caused by start_remotely() accidentally terminating the process. The server is 2.0.0-613
I don’t think it’s possible: the checkpoints are output models, and as such can be accessed from the task, but from the model class I can only get the URL of the last version (as I expected from using update_weights, since it sounds like it replaces the old weights with the new ones). Also, even if I could get the URLs of the old checkpoints, how could I delete them from the file server? StorageManager doesn’t seem to have any method to delete remote files
Both the web app and server versions are 2.0.0-613; the API version is 2.31