I am not using hydra, I am reading the conf with: config_dict = read_yaml(conf_yaml_path); config = OmegaConf.create(task.connect_configuration(config_dict))
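Spelled out, that pattern looks roughly like this (using a plain yaml.safe_load in place of my read_yaml helper, and placeholder file/project names):

```python
import yaml
from omegaconf import OmegaConf
from clearml import Task

task = Task.init(project_name="examples", task_name="omegaconf-config")

# load the raw YAML into a plain dict (this is what read_yaml does for me)
with open("conf.yaml") as f:
    config_dict = yaml.safe_load(f)

# connect_configuration registers the dict with the task (and returns the
# server-side values when running remotely); OmegaConf then wraps the result
config = OmegaConf.create(task.connect_configuration(config_dict))
```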
So the migration from one server to another + adding new accounts with passwords worked, thanks for your help!
no, at least not in clearml-server version 1.1.1-135 • 1.1.1 • 2.14
AgitatedDove14 I see that the default is sample_frequency_per_sec=2., but in the UI I don't see that resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30.?
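Something like this is what I have in mind - the ResourceMonitor import path and the start() call are assumptions on my side, not something I verified:

```python
from clearml import Task
# assumed location of the SDK resource monitor
from clearml.utilities.resource_monitor import ResourceMonitor

# disable the automatic monitor so the custom one below is the only reporter
task = Task.init(project_name="examples", task_name="resource-monitor-frequencies",
                 auto_resource_monitoring=False)

# sample machine stats twice per second, but only aggregate and send a report
# to the server every 30 seconds - which would explain the ~30s resolution in the UI
monitor = ResourceMonitor(task, sample_frequency_per_sec=2.0, report_frequency_sec=30.0)
monitor.start()
```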
To help you debug this: in the /dashboard endpoint, all projects were still there, but empty (no experiments inside). No experiments were archived either.
So it seems like it doesn't copy /root/clearml.conf and it doesn't pass the environment variables (CLEARML_API_HOST, CLEARML_API_ACCESS_KEY, CLEARML_API_SECRET_KEY)
Oh wow! Is it possible to not specify a remote task? (if I am working with Task.set_offline(True))
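For context, this is the kind of offline flow I mean (the session zip path is just a placeholder):

```python
from clearml import Task

# switch the SDK to offline mode before creating the task - nothing is sent to any server
Task.set_offline(offline_mode=True)

task = Task.init(project_name="examples", task_name="offline-run")
task.get_logger().report_scalar(title="debug", series="loss", value=0.5, iteration=0)
task.close()  # writes the offline session to a local zip

# later, from a machine that can reach the server (placeholder path):
# Task.import_offline_session("/path/to/offline_session.zip")
```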
Now I am trying to restart the cluster with docker-compose while pointing it to the previous volume, how can I do that?
Looks like it's a hurray then 😄 🎉 🍾
So it could be that when restarting the docker-compose, it used another volume, hence the loss of data
Nevertheless there might still be some value in that, because it would allow reducing the startup time by removing the initial setup of the agent and the download of the data to the instance - but not by as much as I described initially, if stopped instances are bound to the same capacity limitations as newly launched instances
nvm, the bug might be on my side. I will open an issue if I find an easily reproducible example
So either I specify agent.python_binary: python3.8 in the clearml-agent config, as you suggested, or I force the task locally to run with python3.8 using task.data.script.binary
These images are actually stored there and I can access them via the URL shared above (the one written in the pop-up message saying that these files could not be deleted)
Usually one or two tags. Indeed, task IDs are not so convenient, but only because they are not displayed on the page, so I have to go back to another page to check the ID of each experiment. Maybe just showing the ID of each experiment on the SCALARS page would already be great, wdyt?
CostlyOstrich36, this also happens with clearml-agent 1.1.1 on an AWS instance…
Alright SuccessfulKoala55, I was able to make it work by downgrading clearml-agent to 0.17.2
Ok, so it seems that the single quotes are the reason; using double quotes works
There’s a reason for the ES index max size
Does ClearML enforce a max index size? What typically happens when that limit is reached?
Just tried, still the same issue
What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?
The only thing that changed is the new auth.fixed_users.pass_hashed field, which I don't have in my config file
The main issue is that task_logger.report_scalar() is not reporting the scalars
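For reference, this is the minimal pattern I am testing with (project/task names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-report-check")
task_logger = task.get_logger()

# report a dummy scalar every iteration - it should appear under SCALARS in the UI
for iteration in range(10):
    task_logger.report_scalar(title="debug", series="dummy", value=0.1 * iteration, iteration=iteration)

# make sure everything is sent to the server before the process exits
task_logger.flush()
```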
The clean-up service is awesome, but it would require having another agent running in services mode on the same machine, which I would rather avoid
AgitatedDove14 I finally solved it: the problem was that --network='host' should be --network=host
And so in the UI, in the Workers & Queues tab, I randomly see one of the two experiments for the worker that is running both experiments