I got some warnings about broken packages. I cleaned the conda cache with `conda clean -a` and now it installed fine!
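For reference, the cleanup that did the trick, as a minimal sketch (the final install line is just a placeholder for whatever failed before):
```bash
# clear conda's index cache, package tarballs and unused packages
conda clean --all --yes
# then retry the install that previously reported broken packages
conda install <package-name>
```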
Thank you very much, good to know!
For example, I get the following error if I simply clone and rerun:
ERROR: Could not find a version that satisfies the requirement ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28)) (from versions: none)
ERROR: No matching distribution found for ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28))
I am referring to the UI. The default cleanup service should work with S3 with a correctly configured clearml service agent if I understand the workings correctly.
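To be clear what I mean by "correctly configured": the agent running the cleanup service needs S3 credentials in its clearml.conf, roughly like this sketch (all values are placeholders; adjust to your bucket setup):
```
sdk {
    aws {
        s3 {
            # placeholder credentials for the bucket that holds the artifacts
            key: "<access-key>"
            secret: "<secret-key>"
            region: "<region>"
        }
    }
}
```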
I usually also experience no problems with restarting the clearml-server. It seems like it has to do with the OOM (or whatever issue I have).
Tested with clearml-agent 1.0.1rc4/1.2.2 and clearml 1.3.2
It shows some logs, but nothing of relevance, I think. Only infos and warnings about deprecated stuff that is still used ;D ...
Thank you. The reports feature is super cool! Greetings to the team. One of the best features for educational use!
Could be that the log is clean because of the restart. Unfortunately, I restarted the server right away 😞 I'll post the appropriate logs if it happens again.
But yeah, I see the point of enterprise having this feature and basic not 🙂
SuccessfulKoala55 I just had the issue again. The logs show nothing of interest. It looks like OOM to me, but I will test this again with a much larger swap, so the server only slows down but does not kill anything. Unfortunately, the kernel logs also do not show much (maybe I have my server logs misconfigured, I am no expert).
What is interesting though is that docker only showed my nginx, minio and docker-registry containers as exited, while all the clearml containers were still running. I restarted ...
For example: I run a task remotely. Now I decide I want to rerun it, but with a slightly changed parameter. So I clone the task, edit the parameter in the WebUI, and submit the clone to a queue. When the clearml-agent pulls the task and tries to install the requirements, it fails, because the task requirements now contain packages that were preinstalled in the environment (e.g. in the nvidia docker image). These packages may not be available via pip, so the run fails.
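For reference, this is the clone-and-edit flow I mean, done via the SDK instead of the WebUI; just a sketch of the workflow, not a fix (task ID, parameter name and queue name are placeholders):
```python
from clearml import Task

# get the original (remotely executed) task - ID is a placeholder
original = Task.get_task(task_id="<original-task-id>")

# clone it and tweak one hyperparameter before enqueueing
cloned = Task.clone(source_task=original, name="clone with changed parameter")
cloned.set_parameter("General/learning_rate", 0.001)  # example parameter name

# submit the clone; the agent will then try to reinstall the captured requirements
Task.enqueue(cloned, queue_name="default")
```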
I see, so it is actually not related to clearml 🎉
I see, I just checked the logs and it shows urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f246f0d6c18>: Failed to establish a new connection: [Errno 111] Connection refused [2022-04-29 08:45:55,018] [9] [WARNING] [elasticsearch] POST [status:N/A request:0.000s]
Unfortunately, there are no logs in /usr/share/elasticsearch/logs to see what elastic was up to.
AnxiousSeal95 Thanks a lot. Seems to be working fine for me. I see the clearml-agent version that pip installs in the docker is now fixed to the host version 🙂 PyTorch Nightly is also installed correctly now!
Now the pip packages seem to ship with CUDA, so this does not seem to be a problem anymore.
Nvm. I think I understood: when a file has never been added to the repository, it is not tracked.
With clearml==1.4.1 it works, but with the current version it aborts. Here is a log with the latest clearml.
Maybe this is something that is only possible with the vault of the enterprise version?
First one is the original, second one the clone
I was wrong: I think it uses the agent.cuda_version, not the local env CUDA version.
For everyone who had the patience to read through everything, here is my solution to make clearml work with ssh-agent forwarding in the current version:
1. Start an ssh-agent.
2. Add the ssh keys to the agent with ssh-add.
3. echo $SSH_AUTH_SOCK and paste the value into clearml.conf as shown here: https://github.com/allegroai/clearml-agent/issues/45#issuecomment-779302144 (replace $SSH_AUTH_SOCKET with the actual value).
4. Move all the files except known_hosts out of ~/.ssh of the clearml-agent workstation (a shell sketch of steps 1-4 follows below).
5. Start the...
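A rough shell sketch of those steps, assuming the agent runs on that workstation (key path and backup directory are placeholders; the clearml.conf part is described in the linked GitHub comment):
```bash
# 1. start an ssh-agent in the current shell
eval "$(ssh-agent -s)"

# 2. add the ssh key(s) your repositories need (placeholder key path)
ssh-add ~/.ssh/id_ed25519

# 3. print the socket path and paste it into clearml.conf as in the linked comment
echo "$SSH_AUTH_SOCK"

# 4. move everything except known_hosts out of ~/.ssh on the agent workstation
mkdir -p ~/.ssh_backup
find ~/.ssh -maxdepth 1 -type f ! -name known_hosts -exec mv {} ~/.ssh_backup/ \;
```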
The problem is that clearml installs cudatoolkit=11.0, but cudatoolkit=11.1 is needed. By setting agent.cuda_version=11.1 in clearml.conf it uses the correct version and installs fine. With version 11.0, conda will resolve the conflicts by installing the pytorch cpu-version.
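In case it helps anyone, the override in clearml.conf is just this one key; a minimal sketch (the exact value format may depend on your clearml-agent version):
```
agent {
    # force the CUDA version the agent assumes when resolving packages
    cuda_version: 11.1
}
```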
clearml==0.17.4
task dca2e3ded7fc4c28b342f912395ab9bc pulled from a238067927d04283842bc14cbdebdd86 by worker redacted-desktop:0
Running task 'dca2e3ded7fc4c28b342f912395ab9bc'
Storing stdout and stderr log to '/tmp/.clearml_agent_out.vjg4k7cj.txt', '/tmp/.clearml_agent_out.vjg4k7cj.txt'
Current configuration (clearml_agent v0.17.1, location: /tmp/.clearml_agent.us8pq3jj.cfg):
agent.worker_id = redacted-desktop:0
agent.worker_name = redacted-desktop
agent.force_git_ssh...
MortifiedDove27 Sure did, but I do not understand it very well. Else I would not be asking here for an intuitive explanation 🙂 Maybe you can explain it to me?
My clearml-server crashed for some reason, so I won't be able to verify until tomorrow.
However, I have not yet found a flexible solution other than ssh-agent forwarding.