Setting api.files_server: s3://myhost:9000/clearml in clearml.conf works!
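For reference, this is roughly the clearml.conf fragment I mean (host and bucket name are placeholders from my setup):

```
api {
    # point the default file server at the MinIO bucket
    files_server: "s3://myhost:9000/clearml"
}
```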
@<1576381444509405184:profile|ManiacalLizard2> Thank you, but afaik this only works locally and not if you run your task on a clearml-agent!
When I add the file to the repo, it works fine, just like you said.
Yes, I am also talking about agents on different machines. I had two agents on the server machine, which also seem to have been killed. The ones on different machines kept working until 1 or 2 minutes after the clearml-server restarted.
Can you explain what you meant by entry point file? In a new git repository my code works fine.
Thank you. I am still having the issue. I verified that output_uri of Task.init works and that clearml-data with MinIO storage works, but the logger still throws errors.
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000):
```
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key...
```
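If it helps anyone: as far as I understand, a bucketless s3:// files_server still needs matching credentials listed under sdk.aws.s3 in clearml.conf. A sketch of what I mean (all values are placeholders; secure/multipart set to false assuming MinIO runs over plain HTTP):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # MinIO endpoint as host:port, no scheme
                    host: "my_minio_instance:9000"
                    key: "<minio-access-key>"
                    secret: "<minio-secret-key>"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```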
Okay, this seems to work fine.
SweetBadger76 I am using the Cleanup Service
With clearml==1.4.1 it works, but with the current version it aborts. Here is a log with latest clearml
Well, after restarting the agent (to set it into --detached mode) it put cleanup_task.py into services mode, but my monitoring tasks are just executed on the agent itself (no new services clearml-agent is started), and then the task is aborted right after starting.
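For context, this is roughly how I start that agent for the services queue (a sketch, not the exact command; the queue name is from my setup):

```
clearml-agent daemon --services-mode --detached --queue services --docker
```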
Here is what my start_carla.py task currently looks like:
```python
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.config import running_remotely


def create_task(node):
    task = Task.create(
        project_name="examples",
        task_name="start-carla",
        repo="myrepo",
        branch="carla-clearml-integration",
        script="src/start_carla_task.py",
        working_directory="src",
        packages=["clearml"],
        add_task_init_call=...
```
@<1523701205467926528:profile|AgitatedDove14> Thank you very much for your guidance. Setting these manually works for me!
```
docker-compose ps
        Name                      Command                  State                        Ports
clearml-agent-services   /usr/agent/entrypoint.sh      Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap    Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
```
I usually also experience no problems with restarting the clearml-server. It seems like it has to do with the OOM (or whatever issue I have).
It didn't revert. One of my colleagues whom I wanted to introduce to clearml put his clearml.conf in the wrong directory and pushed his experiments to the public server.
So I do not blame clearml for this mistake, but in general, designing the system to be fail-safe is better than hoping everything is used as designed 🙂
Wouldn't it be enough to require a call to clearml-init and throw an error when running without a clearml.conf, telling the user to run clearml-init first?
Okay, I found something out: when I use the docker image ubuntu:22.04, it does not spin up a services agent and the task is aborted. When I use python:latest, everything works fine!
@<1576381444509405184:profile|ManiacalLizard2> Yea, that makes sense. However, my problem is that I do not want to set it on the remote clearml-agent, since every user may have a different storage destination. E.g. one user pushes to Azure, while another pushes to S3.
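What I would like is for each user to set their own destination in their local clearml.conf; as far as I know, sdk.development.default_output_uri does exactly that. A sketch (the URI is just an example from my setup):

```
sdk {
    development {
        # per-user default destination for task outputs
        default_output_uri: "s3://my_minio_instance:9000/clearml"
    }
}
```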
Okay, great! I just want to run the cleanup services, however I am running into ssh issues so I wanted to restart it to try to debug.
Thank you very much. I am going to try that.
Is there a clearml.conf for this agent somewhere?