@<1576381444509405184:profile|ManiacalLizard2> Just so I understand correctly:
You are saying that in your local, user-specific clearml.conf you set api.files_server, but in your remote clearml-agent clearml.conf you left it empty?
Maybe this opens up another question, which is more about how clearml-agent is supposed to be used. The "pure" way would be to make the docker image provide everything, and clearml-agent should do no setup at all.
What I currently do instead is let the docker image provide all system dependencies and let clearml-agent set up all the Python dependencies. This allows me to reuse one docker image for many different experiments. However, then it would make sense to have as many configs as possib...
So clearml 1.0.1, clearml-agent 1.0.0, and clearml-server from master
I am currently on the Open Source version, so no Vault. The environment variables are not meant to be used on a per-task basis, right?
I am referring to the UI. The default cleanup service should work with S3 with a correctly configured clearml service agent if I understand the workings correctly.
I'll add creating an issue to my todo list
Thanks for answering. I don't quite get your explanation. You mean if I have 100 experiments and I start up another one (experiment "101"), then experiment "0" logs will get replaced?
I think I still don't get how clearml is supposed to work/be used. Why wouldn't the following work currently?
Example:
` task = Task.init(...)
if not running_remotely:
    task_dict = task.export_task()
    requirements = task_dict["script"]["requirements"]["pip"].splitlines()
    requirement_torch = [r for r in requirements if r.startswith("torch==")]
    requirements.remove(requirement_torch[0])
    requirements.append("torch >= 1.8.1")
    task_dict["script"]["requirements"]["pip"] = "\n"....
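For reference, a minimal self-contained sketch of the requirement rewrite attempted above, applied to a plain pip requirements string rather than a real task. The helper name `relax_pin` and the sample string are hypothetical. Unlike the snippet above, it leaves the text unchanged when no exact pin is found instead of raising an IndexError. Whether clearml-agent then honors the edited requirements is a separate question this does not answer.

```python
def relax_pin(pip_text: str, package: str, new_spec: str) -> str:
    """Replace an exact `package==...` pin in a pip requirements string.

    Hypothetical helper for illustration; not part of the clearml API.
    """
    lines = pip_text.splitlines()
    prefix = package + "=="
    # Keep every line that is not an exact pin of the target package.
    kept = [line for line in lines if not line.strip().startswith(prefix)]
    if len(kept) == len(lines):
        # No exact pin was found, so return the text unchanged.
        return pip_text
    kept.append(new_spec)
    return "\n".join(kept)


# Made-up sample requirements, only to exercise the helper.
sample = "numpy==1.19.5\ntorch==1.7.1\ntorchvision==0.8.2"
print(relax_pin(sample, "torch", "torch >= 1.8.1"))
```

Matching on `torch==` rather than `torch` keeps related packages such as torchvision untouched.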
If I understood correctly, if I tried to print(os.environ["MUJOCO_GL"]) after the clearml Task is created, this should be set?
Good to know the --debug flag exists in master! 😄
Hi @<1523701087100473344:profile|SuccessfulKoala55> Thank you very much.
Is there some way to verify that the server uses the correct configuration files (e.g. see it in the logs or web UI)? I just tried it and it does not work.
At least I can see the async_delete service complains about a missing secret, so I can start debugging there. I am using the same config as for my agents, but somehow for async_delete it does not work...
Thank you very much!
I just tested with remote_execution and the problem seems to exist there, too. It is just that when the task switches from local to remote execution (i.e. exits the local script), the local scalars appear, but no scalars from the remote execution show up, so the iteration does not update either. However, at least for remote execution I get live console output.
@<1523701435869433856:profile|SmugDolphin23> Good catch. I have a good but unsatisfying message for you guys: I restarted the whole machine (server and agent) and now it works fine ...
I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.
How can I see that?
I have venv_update.enabled: true and detect_with_conda_freeze: true
Or maybe even better: How can I get all the information of the "INFO" page in the WebUI of a task?
I mean that locally I was able to install the correct version without a problem.
I forgot to add this:
` Here is my error:
Traceback (most recent call last):
  File "src/run_gym.py", line 25, in <module>
    print(os.environ["MUJOCO_GL"])
  File "/home/tim/.clearml/venvs-builds/3.7/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'MUJOCO_GL' `
This is at the top of my script.
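A small sketch of reading the variable without crashing, which may help tell "the variable never reached this process" apart from other failures. The helper name `read_env` and the fallback value "egl" are assumptions for illustration; this only inspects the current process environment and says nothing about how clearml-agent passes variables along.

```python
import os


def read_env(name: str, default: str = "") -> str:
    """Return an environment variable, or `default` if it is not set."""
    value = os.environ.get(name)
    if value is None:
        # Report the missing variable instead of raising KeyError.
        print(f"{name} is not set in this process; using {default!r}")
        return default
    return value


# "egl" is an assumed fallback, chosen only for this example.
mujoco_gl = read_env("MUJOCO_GL", default="egl")
print(mujoco_gl)
```

Using `os.environ.get` instead of `os.environ[...]` avoids the KeyError shown in the traceback while still making the missing variable visible in the console log.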
Thank you. The reports feature is super cool! Greetings to the team. One of the best features for educational use!
Thank you very much. I am going to try that.
Can you give me an example of how I can add a second fileserver?
Perfect, will try it. FYI: the conda_channels that I used are from clearml-agent init