I am getting permission errors when I try to use the clearml-agent with docker containers. The .ssh directory is mounted, but it is owned by my local user, so the docker container's root user does not seem to have the correct permissions.
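Roughly the workaround I have in mind (untested sketch; I am assuming my agent's clearml.conf supports docker_init_bash_script and that the keys end up under /root/.ssh inside the container):

agent {
    # assumption: these commands run inside the container before the task starts
    docker_init_bash_script: [
        "chown -R root:root /root/.ssh",
        "chmod 700 /root/.ssh",
        "chmod 600 /root/.ssh/*",
    ]
}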
Okay, thanks for explaining!
When I add the file to the repo it works fine, just like you said.
Yes, I am also talking about agents on different machines. I had two agents on the server machine, which also seem to have been killed. The ones on different machines kept working until 1 or 2 minutes after the clearml-server restarted.
Can you explain what you meant by entry point file? In a new git repository my code works fine.
Thank you. I am still having the issue. I verified that output_uri of Task.init works and that clearml-data with MinIO storage works, but the logger still throws errors.
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000):
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key and secret for S3 storage access ()
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key...
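For reference, this is roughly the clearml.conf section I am pointing the SDK at for MinIO (bucket name and keys are placeholders, and multipart/secure are assumptions for a plain-HTTP MinIO instance):

sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my_minio_instance:9000"
                    bucket: "my-bucket"
                    key: "MINIO_ACCESS_KEY"
                    secret: "MINIO_SECRET_KEY"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}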
There is no way to create an artifact/model/dataset without a task, right? So just always inherit the user from the parent task, and if the task is cloned, change the user to whoever did the cloning.
(Just for my own interest: how much does the enterprise version diverge from the open source version? Is it just extended, or are there core changes in the enterprise version?)
It is weird though. The task is submitted by the original user and then run on the agent, but it stays registered to the original user, since that is who created it.
Wouldn't it make more sense to inherit the user from the task rather than from the agent?
Okay, this seems to work fine.
SweetBadger76 I am using the Cleanup Service
Nvm. I forgot to start my agent with --docker. So here comes my follow-up question: it seems like there is no way to define that a Task requires docker support from an agent, right?
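The closest thing I found is pinning a container image on the task itself, so an agent running in --docker mode will use it (sketch, the image name is a placeholder; it still does not force a non-docker agent to refuse the task):

from clearml import Task

task = Task.init(project_name="examples", task_name="needs-docker")
# container image the agent should use when executing this task remotely
task.set_base_docker("python:3.10")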
With clearml==1.4.1 it works, but with the current version it aborts. Here is a log with latest clearml
Well, after restarting the agent (to set it into --detached mode) it set cleanup_task.py into service mode, but my monitoring tasks are just executed on the agent itself (no new service clearml-agent is started) and then they are aborted right after starting.
Or better, some cache option. Otherwise the cron job is what I will use 🙂 Thanks again
Here is how my start_carla.py task currently looks:
import os
import subprocess
from time import sleep
from clearml import Task
from clearml.config import running_remotely
def create_task(node):
task = Task.create(
project_name="examples",
task_name="start-carla",
repo="myrepo",
branch="carla-clearml-integration",
script="src/start_carla_task.py",
working_directory="src",
packages=["clearml"],
add_task_init_call=...
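After create_task returns I enqueue it with something along these lines (the queue name and node argument are just placeholders):

carla_task = create_task("node-1")
Task.enqueue(carla_task, queue_name="services")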
@<1523701205467926528:profile|AgitatedDove14> Thank you very much for your guidance. Setting these manually works for me!
docker-compose ps
Name Command State Ports
clearml-agent-services /usr/agent/entrypoint.sh Restarting
clearml-apiserver /opt/clearml/wrapper.sh ap ... Up 0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
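To see why the services container keeps restarting, I check its logs with the standard docker command (container name taken from the listing above):

docker logs --tail 100 clearml-agent-services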
Mhhm, good hint! Unfortunately I can see nowhere in the logs when the server creates a delete request.
I usually also experience no problems with restarting the clearml-server. It seems like it has to do with the OOM (or whatever issue I have).
It didn't revert. Just one of my colleagues that I wanted to introduce to clearml put his clearml.conf in the wrong directory and pushed his experiments to the public server.
So I do not blame clearml for this mistake, but generally, designing the system to be fail-safe is better than hoping that everything is used the way it was designed 🙂
Wouldn't it be enough to just require a call to clearml-init and throw an error when running without a clearml.conf, telling the user to run clearml-init first?
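Something along these lines is what I have in mind (rough sketch of the proposed behavior, not how clearml works today; the default path and the CLEARML_CONFIG_FILE override are the only parts I am relying on):

import os
from pathlib import Path

def require_clearml_conf():
    # refuse to start instead of silently falling back to a default/public server
    conf = Path(os.environ.get("CLEARML_CONFIG_FILE", Path.home() / "clearml.conf"))
    if not conf.exists():
        raise RuntimeError("No clearml.conf found - please run `clearml-init` first")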
Okay, I found something out: when I use the docker image ubuntu:22.04 it does not spin up a service agent and aborts the task. When I use python:latest everything works fine!
@<1576381444509405184:profile|ManiacalLizard2> Yea, that makes sense. However, my problem is that I do not want to set it on the remote clearml-agent, since every user may have a different storage. E.g. one user pushes to Azure, while another one pushes to S3.
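What would work better for us is each user setting the destination per task instead of on the agent, e.g. (bucket/container names are placeholders):

from clearml import Task

# each user points their own runs at their own storage backend
task = Task.init(
    project_name="examples",
    task_name="per-user-output",
    output_uri="s3://my_minio_instance:9000/my-bucket",  # another user could pass e.g. "azure://mycontainer/path"
)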