Any idea why deletion of artifacts on my second fileserver does not work?
```yaml
fileserver_datasets:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver-datasets
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver-datasets:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8082:8081"
```
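For context, a task would write to this second fileserver by pointing output_uri at its published port; a minimal sketch, where "clearml-server" is a placeholder for your server's address:
```python
from clearml import Task

# Send this task's artifacts to the second fileserver instead of the default one.
task = Task.init(
    project_name="examples",
    task_name="dataset-task",
    output_uri="http://clearml-server:8082",  # second fileserver from the compose snippet
)
```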
ClearML successfu...
One more thing: The cuda_version that clearml finds automatically is wrong.
Locally it works fine.
Let me check again.
I just tried the environment setup steps that clearml-agent does, locally, but with my environment.yml instead of the one that clearml generates.
By host you mean the machine on which the agent is running? How does clearml-agent find the cuda_version?
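As far as I know, the agent honors an explicit CUDA_VERSION environment variable before probing the driver, so a wrong detection can be overridden; a sketch, assuming your agent version supports this override (the value is an example):
```python
import os

# Must be set in the agent's environment before clearml-agent starts
# (normally exported in the shell/service unit; Python shown for illustration).
os.environ["CUDA_VERSION"] = "11.2"  # example value
```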
AnxiousSeal95 Thanks a lot. Seems to be working fine for me. I see the clearml-agent version that pip installs in the docker is now fixed to the host version 🙂 PyTorch Nightly is also installed correctly now!
Well, I guess the trade-off between no hurdles and safety is inherently not solvable. I am all for hurdles, as long as it is clear how to overcome them. And in my opinion, referring to clearml-init is something that makes sense from both a developer and a user perspective.
Thank you, good to know!
(btw: the simulator is called carla, not clara :))
Yea, and the script ends with
```
clearml.Task - INFO - Waiting to finish uploads
481.2130692792125 seconds
Done
```
Yes, I did not change this part of the config.
I am wondering where to put my experiment logic so that it gets lazily executed and not at task definition time (i.e., how to get my experiment logic into get_task_experiment() without running it).
Wow, thank you very much. And how would I bind my code to the task? Should I still use Task.init, which will just use the file it is called in as the entry point, or should I create a task using Task.create and specify the script?
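For what it's worth, a sketch of both routes (project and script names are illustrative): Task.init called inside the script makes the calling file the entry point, while Task.create takes the script explicitly.
```python
from clearml import Task

# Route 1: inside start_carla.py itself; the calling file becomes the entry point.
task = Task.init(project_name="examples", task_name="start-carla")

# Route 2: from the outside; point Task.create at the script explicitly.
task = Task.create(
    project_name="examples",
    task_name="start-carla",
    script="start_carla.py",  # illustrative path
)
```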
Another example of what I would expect:
```python
### start_carla.py
from clearml import Task


def get_task():
    task = Task.init(project_name="examples", task_name="start-carla", task_type="application")
    # The experiment is not run here. It only runs when this file is
    # executed standalone or by a clearml-agent.
    return task


def run_experiment(task):
    ...


# This task can also be run standalone or by a clearml-agent
if __name__ == "__main__":
    task = get_task()
    run_experiment(task)
    run_pi...
```
Also, here is how I run my experiments right now, so I can execute them locally and remotely:
```python
from clearml import Task
from clearml.config import running_remotely

# Initialize ClearML Task
task = (
    Task.init(
        project_name="examples",
        task_name=args.name,
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)

# Execute remotely via ClearML
if enqueue is not None and not running_remotely():
    if enqueue == "None":
        queue_name = None
        task.reset()
    ...
```
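As a side note, the SDK has a built-in for this local-vs-remote switch; a sketch using Task.execute_remotely, which stops the local run and enqueues the task (the queue name is an example):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="start-carla", output_uri=True)

# Stops local execution and enqueues the task; when a clearml-agent later
# runs the task, this call is a no-op and the script continues past it.
task.execute_remotely(queue_name="default", exit_process=True)
```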
Is there a way to capture uncommitted changes with Task.create just like Task.init does? Actually, I would like to populate the repo, branch, and packages automatically...
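As far as I can tell, Task.create does not capture the uncommitted diff the way Task.init does; it takes the repository details as explicit arguments instead. A sketch, with illustrative values:
```python
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="start-carla",
    repo="https://github.com/me/my-repo.git",  # hypothetical repository
    branch="main",
    script="start_carla.py",
    packages=["clearml", "torch"],  # explicit list; auto-detection behavior varies by version
)
```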
@<1576381444509405184:profile|ManiacalLizard2> Maybe you are using the enterprise version with the vault? I suppose the enterprise version runs differently, but I don't have experience with it.
For the open-source version, each clearml-agent uses its own clearml.conf
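One way to point an agent at a specific config file is the CLEARML_CONFIG_FILE environment variable, sketched here (the path is hypothetical):
```python
import os

# Set in the agent's environment before clearml-agent starts; Python shown
# for illustration, normally this is exported in the shell/service unit.
os.environ["CLEARML_CONFIG_FILE"] = "/opt/clearml/agent-datasets.conf"  # hypothetical path
```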
So my network seems to be fine. Downloading artifacts from the server to the agents runs at around 100 MB/s, while uploading from the agent to the server is slow.
An upload of 11 GB took around 20 hours, which cannot be right. Do you have any idea whether ClearML could have something to do with this slow upload speed? If not, I am going to start debugging the hardware/network.
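One way to isolate ClearML from the network would be timing a raw upload through the SDK; a minimal sketch using StorageManager, where the local file and remote URL are placeholders:
```python
import time

from clearml import StorageManager

start = time.time()
# Upload a single large test file straight to the fileserver and time it.
StorageManager.upload_file(
    local_file="/tmp/upload_test.bin",  # placeholder test file
    remote_url="http://clearml-server:8081/debug/upload_test.bin",  # placeholder URL
)
print(f"Upload took {time.time() - start:.1f} s")
```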