Sorry, it's actually task.update_requirements(["."])
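For reference, this is roughly the context - a minimal sketch where the project/task names are placeholders, assuming the repo has a setup.py / pyproject.toml so "." is pip-installable:

from clearml import Task

task = Task.init(project_name="examples", task_name="local-package")  # placeholder names
# replace the auto-detected requirements with "." so that the agent
# pip-installs the experiment repository itself when running the task
task.update_requirements(["."])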
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
It failed as well
And now that I restarted the server and went back into the project where I initially deleted the archived experiments, some of them are still there - I will leave them alone, too scared to do anything now 😄
It could be, yes, but the difference between now and last_report_time doesn't match the difference I observe
I can also access these files directly if I enter the url in the browser
I get the following error:
Thanks SuccessfulKoala55 for the answer! One followup question:
When I specify agent.package_manager.pip_version: '==20.2.3' in the trains.conf, I get:
trains_agent: ERROR: Failed parsing /home/machine1/trains.conf (ParseException): Expected end of text, found '=' (at char 326), (line:7, col:37)
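For anyone hitting the same parse error: I suspect the single quotes, since the HOCON parser used for trains.conf only understands double-quoted strings - so I would try the same pin written as (an assumption, not verified):

agent.package_manager.pip_version: "==20.2.3"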
Note: could be related to https://github.com/allegroai/clearml/issues/790, not sure
Done! Also, I tried to use the git credential cache ( https://git-scm.com/docs/git-credential-cache ) as a workaround (hoping that the first time it clones the experiment repo, it caches the creds for the next times), but I then get a different error: fatal: unable to find a suitable socket path; use --socket
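In case it helps: the error itself suggests passing the socket path explicitly, so I would try something like this (an untested guess; the socket path is arbitrary, the directory just has to exist and be writable):

mkdir -p ~/.git-credential-cache
git config --global credential.helper 'cache --socket ~/.git-credential-cache/socket'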
Yes, that's what it looks like. Somehow, when you clone the experiment repo, you correctly set the git creds in the URL, but when the dependencies are installed, the git creds are not taken into account
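(Side note: clearml.conf does expose agent.git_user / agent.git_pass - whether the agent also injects those when pip installs the git dependencies is an assumption on my part, but it might be worth checking.)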
And after the update, the loss graph appears
Thanks! I would like to use this opportunity to split the indices into multiple shards, as explained here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html#indices-split-index
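Something along these lines is what I have in mind - a rough sketch with the elasticsearch Python client, where the endpoint, index names, and shard count are placeholders:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# per the docs, the source index must be made read-only before splitting
es.indices.put_settings(index="events-old", body={"index.blocks.write": True})

# split into a target index with more primary shards
# (the new count must be a multiple of the old one)
es.indices.split(
    index="events-old",
    target="events-new",
    body={"settings": {"index.number_of_shards": 4}},
)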
Same, it also returns a ProxyDictPostWrite, which is not supported by OmegaConf.create
You mean to run it on the CI machine?
yes
That should not happen, no? Maybe there is a bug that needs fixing in clearml-agent?
It's just to test that the logic executed in if not Task.running_locally() is correct
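i.e. something like this - a minimal sketch where the names and the helper are made up:

from clearml import Task

task = Task.init(project_name="examples", task_name="ci-test")  # placeholder names
if not Task.running_locally():
    # runs only when an agent executes the task (e.g. on the CI machine) -
    # this is the branch whose logic I want to verify
    prepare_remote_environment()  # hypothetical helper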
Yea, the config is not appearing in the webUI anymore with this method 😞
I am not using hydra, I am reading the conf with:
config_dict = read_yaml(conf_yaml_path)
config = OmegaConf.create(task.connect_configuration(config_dict))
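A workaround I'm considering is round-tripping through JSON before handing the result to OmegaConf - a sketch assuming the config is plain JSON-serializable data:

import json
from omegaconf import OmegaConf

proxied = task.connect_configuration(config_dict)  # returns a ProxyDictPostWrite
# the JSON round-trip turns the proxy (including nested ones) back into
# plain dicts/lists, which OmegaConf.create accepts
config = OmegaConf.create(json.loads(json.dumps(proxied)))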
Actually I think I am approaching the problem from the wrong angle
Yes, I would like to update all references to the old bucket, unfortunately… I think I'll simply delete the old S3 bucket, wait for its name to become available again, recreate it in the other AWS account, and move the data there. This way I don't have to mess with the ClearML data - I am afraid of doing something wrong and losing data
sorry, the clearml-session. The error is the one I shared at the beginning of this thread
Thanks! I will investigate further, I am thinking that the AWS instance might have been stuck for an unknown reason (becoming unhealthy)
Interesting - I can reproduce easily
They are, but this doesn't work - I guess it's because temporary IAM credentials come with an extra session token that should be passed as well, but there is no such option in the web UI, right?
In the comparison the problem will be the same, right? If I choose last/min/max values, it won't tell me the corresponding values for the other metrics. I could switch to graphs, group by metric, and look manually for the corresponding values, but that quickly becomes cumbersome as the number of compared experiments grows
Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion