Thank you. The reports feature is super cool! Greetings to the team. One of the best features for educational use!
Interesting. It will probably only matter for very small experiments, or experiments where validation is run very infrequently.
Ok. I just wanted to make sure I have configured my agent properly, and to confirm whether I have to set it on all agents.
When you say it is an SDK parameter, does that mean I only have to specify it on the computer I start the task from? So a clearml-agent would read this parameter from the task itself.
Thank you very much, good to know!
For me this does not work (at least with nested tqdm bars; I did not try single ones yet).
Here is what my start_carla.py task currently looks like:
```
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.config import running_remotely


def create_task(node):
    # Create a draft task that clones my repo and runs src/start_carla_task.py
    task = Task.create(
        project_name="examples",
        task_name="start-carla",
        repo="myrepo",
        branch="carla-clearml-integration",
        script="src/start_carla_task.py",
        working_directory="src",
        packages=["clearml"],
        add_task_init_call=...
```
But it is not possible to aggregate scalars, right? Like taking the mean, median or max of the scalars of multiple experiments.
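To make it concrete, this is the kind of aggregation I mean, done client-side with the SDK. A rough sketch; the project/task names and the "val" / "accuracy" scalar names are just placeholders from my setup:
```
# Rough sketch: fetch the same scalar series from several experiments and
# aggregate it client-side. Project/task names and the "val"/"accuracy"
# title/series are placeholders -- adjust to whatever is actually reported.
import numpy as np
from clearml import Task

tasks = Task.get_tasks(project_name="examples", task_name="start-carla")

last_values = []
for t in tasks:
    scalars = t.get_reported_scalars()  # {title: {series: {"x": [...], "y": [...]}}}
    series = scalars.get("val", {}).get("accuracy")
    if series and series.get("y"):
        last_values.append(series["y"][-1])  # last reported value per experiment

if last_values:
    print("mean:  ", np.mean(last_values))
    print("median:", np.median(last_values))
    print("max:   ", np.max(last_values))
```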
I have no idea myself, but what the ServerFault thread says about man-in-the-middle attacks makes sense. However, that also rules out an automatic solution, except maybe for a shared known_hosts file.
Is there a clearml.conf for this agent somewhere?
Is there a simple way to get the raw response of the MinIO instance? Then I could verify whether the problem is the MinIO instance or my client.
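Something like the following is what I had in mind: talking to the MinIO endpoint directly with boto3 so I can see the raw S3 response independent of ClearML. The endpoint, credentials and bucket below are placeholders:
```
# Sketch: query the MinIO endpoint directly with boto3 to see the raw S3
# response/error, bypassing ClearML. Endpoint, credentials and bucket name
# are placeholders -- use the same values as in clearml.conf.
import boto3
from botocore.client import Config

s3 = boto3.client(
    "s3",
    endpoint_url="http://my-minio-host:9000",   # placeholder
    aws_access_key_id="minio-access-key",       # placeholder
    aws_secret_access_key="minio-secret-key",   # placeholder
    config=Config(signature_version="s3v4"),
)

# If these calls succeed, the MinIO side is fine and the problem is on the client side.
print(s3.list_buckets())
print(s3.list_objects_v2(Bucket="clearml", MaxKeys=5))  # placeholder bucket
```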
I will try again tomorrow. It's getting late! Thank you for helping so far!
Thank you. I am still having the issue. I verified that the `output_uri` of `Task.init` works and that `clearml-data` with MinIO storage works, but the logger still throws errors.
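Roughly what I verified, as a simplified sketch of my actual code (endpoint and bucket are placeholders): uploads via `output_uri` reach MinIO fine, while the Logger calls are where I still see errors.
```
# Simplified sketch of the check. Endpoint and bucket are placeholders.
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="minio-logger-check",
    output_uri="s3://my-minio-host:9000/clearml",  # placeholder endpoint/bucket
)

logger = task.get_logger()
# Logger reporting like this is where the errors still show up on my side
logger.report_scalar(title="debug", series="value", value=1.0, iteration=0)
```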
No. Here is a better example. I have two types of workstations: type X can execute tasks of type A and B, and type Y can execute only tasks of type B. This could be the case if, for example, type X workstations have more VRAM, newer drivers, etc.
I have two queues. Queue A and Queue B. I submit tasks of type A to queue A and tasks of type B to queue B.
Here is what can happen:
Enqueue the first task of type B. Workstations of type X will run this task. Enqueue the second task of type A. Workstation ...
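In code the setup looks roughly like this (queue names are placeholders): type X workstations pull from both queues, type Y workstations only from queue_b.
```
# Sketch of the setup described above (queue names are placeholders).
# queue_a is only served by type X workstations, queue_b by both types.
from clearml import Task

task_a = Task.create(project_name="examples", task_name="type-a-task")
task_b = Task.create(project_name="examples", task_name="type-b-task")

Task.enqueue(task_a, queue_name="queue_a")  # needs a type X workstation
Task.enqueue(task_b, queue_name="queue_b")  # any workstation can take it
```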
Nvm, that does not seem to be a problem. I added part of the logs to the post above. It shows that some packages are resolved from conda.
Maybe the difference is that I am using pip now, whereas I used to use conda! The NVIDIA PyTorch container uses conda. Could that be the reason?
Is there a way to specify this on a per task basis? I am running clearml-agent in docker mode btw.
I am getting permission errors when I try to use the clearml-agent with docker containers. The .ssh directory is mounted, but it is owned by my local user, so root inside the docker container does not seem to have the correct permissions.
What exactly do you mean by docker run permissions?
clearml will register conda packages that cannot be installed when clearml-agent is configured to use pip. So although it is nice that a complete package list is tracked, it makes rerunning the experiment cumbersome.
Is there a way for me to configure/add the run arguments for the `docker run` call?
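What I am imagining is something per task, roughly like the sketch below. As far as I understand, Task.set_base_docker can attach the docker image plus extra `docker run` flags to the task itself so the agent in docker mode picks them up; the image name and mount path are just examples, and I have not verified this.
```
# Sketch (not verified): attach the docker image plus extra `docker run`
# arguments to the task itself via Task.set_base_docker, so the agent in
# docker mode uses them for this task only. Image and mount path are examples.
from clearml import Task

task = Task.init(project_name="examples", task_name="docker-args-example")

# Everything after the image name should be passed to `docker run`,
# e.g. mounting my .ssh folder read-only into the container.
task.set_base_docker("nvcr.io/nvidia/pytorch:22.04-py3 -v /home/me/.ssh:/root/.ssh:ro")
```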
For example, I get the following error if I simply clone and rerun:
ERROR: Could not find a version that satisfies the requirement ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28)) (from versions: none)
ERROR: No matching distribution found for ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28))
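A workaround I am considering (not sure it is the intended way): force the recorded requirements to come from a pip-style requirements.txt, so conda-only packages such as ruamel_yaml_conda never end up in the installed-packages list. A sketch, assuming Task.force_requirements_env_freeze works the way I think it does:
```
# Sketch of the workaround: record requirements from a pip-style file instead
# of auto-detecting the conda environment, so conda-only packages are not
# registered. Must be called before Task.init; the file name is an example.
from clearml import Task

Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="pip-only-requirements")
```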
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion 😄