@<1576381444509405184:profile|ManiacalLizard2> Yes, exactly. I just didn't know how, but now it is all working 🙂
And yes, I have multiple credentials in the clearml.conf of the agents. It's not a good solution, but since I am currently limited to the free version of ClearML, it is the best I could do.
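For anyone with the same setup, the per-bucket credentials in the agents' clearml.conf look roughly like this (hosts, bucket names and keys below are placeholders):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # MinIO-style endpoint: host includes the port
                    host: "minio-host-1:9000"
                    bucket: "clearml"
                    key: "ACCESS_KEY_1"
                    secret: "SECRET_KEY_1"
                    multipart: false
                    secure: false
                },
                {
                    host: "minio-host-2:9000"
                    bucket: "clearml"
                    key: "ACCESS_KEY_2"
                    secret: "SECRET_KEY_2"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```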
Thanks a lot, now I think I understand.
Debug samples can only be controlled via api.files_server (or programmatically)
Could you guide me on how to approach this programmatically? Can I implement my own storage adapter for debug samples using ClearML interfaces, or am I on my own?
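Is it something along these lines, i.e. pointing the logger's upload destination at MinIO per task? (Host and bucket here are just placeholders, and I'm not sure this is the intended way.)
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="debug-samples")

# Send debug samples / reported media to a MinIO bucket instead of the default file server
task.get_logger().set_default_upload_destination("s3://my_minio_instance:9000/clearml")
```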
Depends on how you start the task, afaik. I think clearml-task uses requirements.txt by default; otherwise clearml will parse your files' dependencies, or, if you changed it in clearml.conf, it will use your conda/pip environment to generate the requirements.
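If I understand that correctly, pinning the requirements explicitly would look roughly like this (project and queue names are just examples):
```bash
# Hand clearml-task an explicit requirements file instead of relying on auto-detection
clearml-task --project examples --name start-carla \
    --script src/start_carla_task.py \
    --requirements requirements.txt \
    --queue default
```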
Setting api.files_server: s3://myhost:9000/clearml in clearml.conf works!
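Concretely, the clearml.conf section now looks like this (host and bucket are from my local MinIO setup):
```
api {
    # Upload debug samples and artifacts to MinIO instead of the default file server
    files_server: "s3://myhost:9000/clearml"
}
```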
@<1576381444509405184:profile|ManiacalLizard2> Thank you, but afaik this only works locally and not if you run your task on a clearml-agent!
When I add the file to the repo it works fine, just like you said.
Yes, I am also talking about agents on different machines. I had two agents on the server machine, which also seem to have been killed. The ones on different machines kept working until 1 or 2 minutes after the clearml-server restarted.
Can you explain what you meant by entry point file? In a new git repository my code works fine.
Thank you. I am still having the issue. I verified that output_uri of Task.init works and also clearml-data with MinIO storage works, but the logger still throws errors.
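For reference, the part that does work looks roughly like this (bucket name is a placeholder):
```python
from clearml import Task

# Artifacts and models land in MinIO once output_uri is set explicitly
task = Task.init(
    project_name="examples",
    task_name="minio-output-test",
    output_uri="s3://my_minio_instance:9000/clearml",
)
```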
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000):
```
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key...
```
Okay, this seems to work fine.
SweetBadger76 I am using the Cleanup Service
Nvm. I forgot to start my agent with --docker. So here comes my follow-up question: it seems like there is no way to define that a Task requires docker support from an agent, right?
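For the record, restarting the agent in docker mode fixed it for me; roughly (queue name is just an example):
```bash
# Run the agent in docker mode so tasks execute inside containers
clearml-agent daemon --queue default --docker --detached
```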
With clearml==1.4.1 it works, but with the current version it aborts. Here is a log with latest clearml
Well, after restarting the agent (to set it into --detached mode), it set cleanup_task.py into service mode, but my monitoring tasks are just executed on the agent itself (no new service clearml-agent is started) and they are aborted right after starting.
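For completeness, this is roughly how I start that agent (queue name is an example, and I'm assuming --services-mode is the relevant flag here):
```bash
# Agent meant to pick up long-running service tasks, each spun up in its own container
clearml-agent daemon --queue services --services-mode --docker --detached
```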
Here is what my start_carla.py task currently looks like:
```python
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.config import running_remotely


def create_task(node):
    task = Task.create(
        project_name="examples",
        task_name="start-carla",
        repo="myrepo",
        branch="carla-clearml-integration",
        script="src/start_carla_task.py",
        working_directory="src",
        packages=["clearml"],
        add_task_init_call=...
```
@<1523701205467926528:profile|AgitatedDove14> Thank you very much for your guidance. Setting these manually works for me!
```
docker-compose ps
         Name                      Command                State                          Ports
clearml-agent-services     /usr/agent/entrypoint.sh       Restarting
clearml-apiserver          /opt/clearml/wrapper.sh ap ... Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
```
Mhhm, good hint! Unfortunately, I can't find anywhere in the logs where the server creates a delete request.
I also usually have no problems restarting the clearml-server. It seems like it has to do with the OOM (or whatever issue I'm having).
It didn't revert. Just one of my colleagues that I wanted to introduce to clearml put his clearml.conf in the wrong directory and pushed his experiments to the public server.
So I do not blame clearml for this mistake, but generally, designing the system to be fail-safe is better than hoping that everything is used as it was designed 🙂
Wouldn't it be enough to just require a call to clearml-init and throw an error when running without clearml.conf which tells the user to run clearml-init first?
Okay, I found something out: when I use the docker image ubuntu:22.04 it does not spin up a service agent and aborts the task. With python:latest everything works fine!
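If it helps anyone else, one way to pin a Python-based image per task would presumably be something like this (image tag is just an example):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="start-carla")

# Request an image that ships with a Python interpreter, so the agent's container can run the task
task.set_base_docker("python:3.10-slim")
```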