Alright, thanks. Would be a nice feature 🙂
I can put anything there: s3://my_minio_instance:9000/bucket_that_does_not_exist
and it will work.
I have set default_output_uri
to s3://my_minio_instance:9000/clearml
If I set files_server
to s3://my_minio_instance:9000/bucket_that_does_not_exist
it fails at uploading metrics, but model upload still works:
WARNING - Failed uploading to
s3://my_minio_instance:9000/bucket_that_does_not_exist
('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to
s3://my_minio_instance:9000/clearml
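For reference, both settings live in clearml.conf; a minimal MinIO sketch might look like this (host, bucket, and credentials are placeholders; exact option names should be checked against the ClearML docs). The credentials block is what the "Missing key and secret for S3 storage access" error points at:

```
# clearml.conf -- sketch with placeholder values
api {
    # default destination for metric/debug-sample uploads
    files_server: "s3://my_minio_instance:9000/clearml"
}
sdk {
    development {
        # default output location for models and artifacts
        default_output_uri: "s3://my_minio_instance:9000/clearml"
    }
    aws {
        s3 {
            credentials: [
                {
                    host: "my_minio_instance:9000"  # MinIO endpoint
                    key: "minio_access_key"         # placeholder
                    secret: "minio_secret_key"      # placeholder
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```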
What is `default_out...
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000):
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to
('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key...
It is only a single agent that is sending a single artifact. server-->agent is fast, but agent-->server is slow.
Agent runs in docker mode. I ran the agent on the same machine as the server this time.
So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.
Yea, it was finished after 20 hours. Since the artifact only started uploading when the experiment finished, there is no reporting for the time during which it uploaded. I will debug it and report what I find out.
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
Yea, correct! No problem. Uploading such large artifacts as I am doing seems to be an absolute edge case 🙂
AgitatedDove14 Yea, I also had this problem: https://github.com/allegroai/clearml-server/issues/87 I have a Samsung 970 Pro 2TB on all machines, but maybe something is misconfigured like SuccessfulKoala55 suggested. I will take a look. Thank you for now!
Okay, thanks for the info! I am currently not using k8s, but may be good to know for the future.
Oh, interesting!
So setting the pip version on a per-task basis makes sense ;D?
I just tested with remote_execution and the problem seems to exist there, too. It is just that when the task switches from local to remote execution (i.e. exits the local script), the local scalars will appear, but no scalars of the remote execution will show up. So the iteration count will not update either. However, at least for remote execution I get live console output.
I don't think so. It is related to issue with the clearml-server I posted in the other thread. Essentially the clearml-server hangs, then I restart it with docker-compose down && docker-compose up -d
and the experiments sometimes show as running, but on the clearml-agents I see that actually nothing is running or they show as aborted.
I know that usually clearml-agents do not abort on server restart and just continue.
If I understood correctly: if I print(os.environ["MUJOCO_GL"])
after the clearml Task is created, it should be set?
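That check can be sketched with the stdlib alone (the surrounding Task.init call is omitted here):

```python
import os

# If clearml.conf / the agent injects MUJOCO_GL into the process environment,
# this should print its value once the task is running:
value = os.environ.get("MUJOCO_GL")
print(f"MUJOCO_GL={value!r}")  # None here means the variable was not set
```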
Very nice!
Maybe for the long-term future you could look into how to make better use of vertical space. Currently, there are 7 (5 in fullscreen mode) different sections from the content to the top of the page. Maybe a compact mode would be nice, or less space for the content headlines.
I use this snippet:
Logger.current_logger().set_default_upload_destination(
    ""  # or ""
)
Alright, that's unfortunate. But thank you very much!
Both, actually. So what I personally would find intuitive is something like this:
class Task:
    def load_statedict(self, state_dict):
        pass

    async def synchronize(self):
        ...

    async def task_execute_remotely(self):
        await self.synchronize()
        ...

    def add_requirement(self, requirement):
        ...

    @classmethod
    async def init(cls, task_name):
        task = Task()
        task.load_statedict(await Task.load_or_create(task_name))
        await tas...
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion 😄
Then I could also do this:

# My custom very special use case
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
await task.synchronize()
await run_code_analysis()
task.add_requirement("myreq")
await task.synchronize()
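A self-contained toy version of that proposal (every method body is a stand-in I made up, not clearml's API; notably load_statedict returns self here so the chained assignment works):

```python
import asyncio

class Task:
    def __init__(self):
        self.state = {}
        self.requirements = []

    def load_statedict(self, state_dict):
        # returning self makes `task = task.load_statedict(...)` chainable
        self.state.update(state_dict)
        return self

    async def synchronize(self):
        await asyncio.sleep(0)  # stand-in for talking to the server

    def add_requirement(self, requirement):
        self.requirements.append(requirement)

    @classmethod
    async def load_or_create(cls, task_name):
        return {"name": task_name}  # stand-in for a server lookup

async def run_code_analysis():
    pass  # stand-in for the code-analysis step

async def main():
    task = Task().load_statedict(await Task.load_or_create("my-task"))
    await task.synchronize()
    await run_code_analysis()
    task.add_requirement("myreq")
    await task.synchronize()
    print(task.state, task.requirements)  # prints: {'name': 'my-task'} ['myreq']

asyncio.run(main())
```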
Is there a way for me to configure/add the run arguments for the docker run call?
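If this is about clearml-agent, one place to look is the agent section of clearml.conf; a sketch (the argument values are placeholders, the exact key should be checked against the clearml-agent docs):

```
# clearml.conf (agent side) -- sketch with placeholder values
agent {
    # extra arguments passed to `docker run` when the agent starts a task container
    extra_docker_arguments: ["--network=host", "-v", "/data:/data"]
}
```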
Perfect, thanks! The only issue left is that it seems like .ssh
is used even when I provide SSH_AUTH_SOCK
. I created an issue here: https://github.com/allegroai/clearml-agent/issues/45
For everyone who had the patience to read through everything, here is my solution to make clearml work with ssh-agent forwarding in the current version:
1. Start an ssh-agent.
2. Add your ssh keys to the agent with ssh-add.
3. echo $SSH_AUTH_SOCK and paste the value into clearml.conf as here: https://github.com/allegroai/clearml-agent/issues/45#issuecomment-779302144 (replace $SSH_AUTH_SOCKET with the actual value).
4. Move all the files except known_hosts out of ~/.ssh of the clearml-agent workstation.
5. Start the...
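The first three steps can be sketched like this (the key path is a placeholder):

```shell
eval "$(ssh-agent -s)" > /dev/null            # 1. start an ssh-agent
ssh-add ~/.ssh/id_ed25519 2>/dev/null || true # 2. add your key (placeholder path)
echo "$SSH_AUTH_SOCK"                         # 3. paste this socket path into clearml.conf
```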
However, I have not yet found a flexible solution other than ssh-agent forwarding.
Yes, but this seems pretty reasonable to assume imo.