AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 Hi, We Are Having An Interesting Issue Here. We Serve Many Users And Each User Has Their Own Credentials In Accessing The Private Git Repo. We Can'T Seem To Find A Way For The End User To Pass In Their Git Credentials When They Run Their Codes In Both Age

Hi SubstantialElk6

We can't seem to find a way for the end user to pass in their git credentials when they run their codes in both agent and non-agent scenarios. Any advice here?

The bottom line is the agent needs to have read-only access to all the repositories so it can launch any Task. I would recommend to create an agent git user with read-only credentials and configure the agent to use it. wdyt?

3 years ago

0 Hey, I'M Looking Into The Aws Autoscaler. I Couldn'T Find The Task In My Ui, So I Ran The

https://github.com/allegroai/clearml/blob/8658198f8bac976e5c2766190d8c593c2308bfa1/examples/services/aws-autoscaler/aws_autoscaler.py#L82

3 years ago

0 Hi All, I Am Getting A Bunch Of This Kind Of Log Messages "Clearml.Storage - Info - Starting Upload: /Tmp/.Clearml.Upload_Model_6Ou50Pb1.Tmp =>" I Am Pretty Sure They Happen As A Part Of The Model Initialization About 10 Of Those, My Guess Is That Every T

Hi RipeGoose2
What exactly is being uploaded ? Are those the actual model weights or intermediate files ?

3 years ago

0 Hey! I'M Having A Weird Issue When I Run Pip Freeze Locally It'S Showing Version "Clearml==0.17.5Rc6" But When I Initiate The Task It'S Always Starting With "Clearml==0.17.2" - This Version Isn'T Accepting Tags Through The Code Etc. (I'M Manually Fixing I

Hmmm, are you running inside pycharm, or similar ?

3 years ago

0 Trying To Setup A Trains-Agent Worker On A Remote Machine; When I Run Trains-Init And Follow The Steps To Give It Credentials For Our Trains Server I Get This

this issue on when trying to set up on our remote machines

You mean setting up the trains-server on remote machine?

3 years ago

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

Instead you can do: TRAINS_WORKER_NAME = "trains-agent":$DYNAMIC_INSTANCE_ID
Then the Worker ID will running instance appended to the worker name. This means that even if you use the same $DYNAMIC_INSTANCE_ID twice, you will not have two agent registering on the same name.

4 years ago

0 Hi, I Went Through This Slack'S History And The Problem Already Popped Up A Couple Of Times But Doesn'T Look Like Solved. On My Machine I Currently Have 4 Gpus, No Problems If I Want To Allocate All 4 Or Just 1 Using

Are you using zsh by any chance?

3 years ago

0 Hi, Can We Upload Our Project Repository To Trains Server? If We Can, How Should We Do? I Know When We Write "Task.Init()", It Uploads Our Experiment Into Server, But It Also Run The Experiment. However, I Want To Upload All My Experiments In Draft Status

MysteriousBee56 I see...
So yes, you can with the APIClient you have full RESTful access to the backend.
I think there was a similar discussion https://allegroai-trains.slack.com/archives/CTK20V944/p1593524144116300
HandsomeCrow5 how did you end up solving it? I think you had a similar use case?!

4 years ago

0 Hi! I Have Local Minio Setup, Via Minio Browser I Can Upload 50-100 Mb Per Second As Its Local. But When I Try To Use Task.Upload_Artifact It Uploads 500 Kb Per Second. Does Anyone Have An Idea About This?

Thanks MuddyCrab47 !!!
I found it!
It turns out the artifact upload will always upload from stream (aka no multi-upload). I will make sure we fix it in the next RC (I think the plan is to have it out this weekend)

4 years ago

0 Question About The Trains Agent And The Git Credentials When Setting A Trains Agent, It Is Possible To Configure Git Credentials For It And I'M Trying To Figure Out In Which Cases It Is Necessary. When Executing A Task Remotely (

WackyRabbit7 basically yes 🙂

4 years ago

0 Clearml-Session Question: I’M Using The Tool With An On-Prem Machine. Normal Tasks Are Being Executed Normally - But When Using

2023-02-15 12:49:22,813 - clearml - WARNING - Could not retrieve remote configuration named 'SSH'

This is fine, it means it uses the default identity keys

The thing is - when I try to connect with normal SSH there are no issues

Now I'm lost, so when exactly do you see the issue ?

one year ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

CourageousLizard33 Are you using the docker-compose to setup the trains-server?

4 years ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

SteadyFox10 I suspect you are correct 🙂
CourageousLizard33 see also section (4) here:
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md#launching-the-trains-server-docker-in-linux-or-macos

4 years ago

0 Hey Since Hydra Does Not Work With

So I think there are two bugs here?
--args overrides="key=value" does not work request: add --hydra to override hydra arguments (and if this is added the first one is not needed)Is that correct?

one year ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

CourageousLizard33 specifically section (4) is the issue (and it's related to any elastic docker, nothing specific to trains-server)
echo "vm.max_map_count=262144" > /tmp/99-trains.conf sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf sudo sysctl -w vm.max_map_count=262144 sudo service docker restartDid you try the above, and you are still getting the same error ?

4 years ago

0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

Lambda’s are designed to be short-lived, I don’t think it’s a fine idea to run it in a loop TBH.

Yeah, you are right, but maybe it would be fine to launch, have the lambda run for 30-60sec (i.e. checking idle time for 1 min, stateless, only keeping track inside the execution context) then take it down)
What I'm trying to solve here, is (1) quick way to understand if the agent is actually idling or just between Tasks (2) still avoid having the "idle watchdog" short lived, to that it can...

2 years ago

0 Hi Guys, Does Anybody Have The Same Issue Like Me? Is There Any Workaround?

Hi VivaciousWalrus21 I tested the sample code, and the gap was evident in Tensorboard as well. This is not clearml generating this jump this is internal (like the auto de/serialization and continue of the code base)

2 years ago

0 Hi Everyone, I'M Using The

So as you say, it seems hydra kills these

Hmm let me check in the code, maybe we can somehow hook into it

2 years ago

0 Automatic Ssh Keys Export To Agent In Docker Mode

Many thanks! I'll pass on to technical writers 🙂

2 years ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

CourageousLizard33 VM?! I thought we are talking fresh install on ubuntu 18.04?!
Is the Ubuntu in a VM? If so, I'm pretty sure 8GB will do, maybe less, but I haven't checked.
How much did you end up giving it?

4 years ago

0 Hi There. I'M Trying To Switch Pipeline Code From A Local Run Using

I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.

Use the HPO, it is basically doing the same thing with some more sophisticated algorithm (HBOB):
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py

For example - how would this task-based example be done with pipelines?

Sure, you could do something like:
` from clearml import Pi...

2 years ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

CourageousLizard33 so you have a Linux server running Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8gb is a good recommendation for a VM it's reasonable enough to start with, I'll make sure we add it to the docs

4 years ago

0 Hello Everyone. I'M Getting Started With Clearml. I'M Trying Hpo Atm And Have Successfully Run The Base Task. When Running The Clone Of The Base Task In One Of The Agents, I'M Getting Following Error. Any Suggestions? Tia

Hi YummyFish22
Looks like the task does not have "Task.init" call on the main script (or an import of clearml)? could that be the case?

one year ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

Probably less secure though :)

4 years ago

0 Hi. Is This Line In The Roadmap Article Still Valid, Is It Showing Up In Clearml-Serving?

Hi SubstantialElk6
ClearML-Serving is already out with a new version, the ETA for the next ClearML-serving full 1.0 (which is the new redesign version) is the end of May

2 years ago

0 Hey Everybody - I Am Using The Pipelinecontroller With Add_Function_Step To Add Different Step To The Pipeline. Is There A Way To Specify A Callback Upon An Abort Action From The User ? I Tried Using The Post_Execute_Callback Or The Status_Change_Callback

Hi @<1523715429694967808:profile|ThickCrow29>

Is there a way to specify a callback upon an abort action from the user

You mean abort of the entire pipeline?
None

one year ago

0 Hello I'M New Here, I Found This Error When Running This Command "Docker-Compose --Env-File Example.Env -F Docker-Compose-Triton.Yml Up". Actually, When I Run This Command For The First Time, It Worked. And Then When I Try To Change To My Friend'S Workspa

Hi MoodyCentipede68 , I think I saw something like it, can you post the full log? The triton error is above, also I think it restarted the container automatically and then it worked

2 years ago

0 Hello I'M New Here, I Found This Error When Testing My Tensorflow / Keras Model. I Already Create The Model Endpoint By Running Command 'Clearml-Serving --Id <Service_Id> Model Add --Engine Triton --Endpoint "Model_Name"... '. Also My Tensorflow / Keras M

NICE! MoodyCentipede68 this is awesome 🙂

2 years ago

MoodyCentipede68 could it be that the model is on one account (workspace) and your credentials (the ones provided to the docker compose) are from another workspace?
The error itself point to the triton helper failing to get the model ID from the backend. The models are uploaded to a a specific workspace, and it looks like a mismatch (I.e. the model Id is nowhere to be found) wdyt?

2 years ago

0 Hey, I'M Trying To Set Up A Clearml Server On Docker As Per Documentation. Everything Goes Well Until The Docker-Compose Up Step, That'S When I Get This Error; Error: Error Pulling Image Configuration: Download Failed After Attempts=6: X509: Certificate

WickedElephant66 this seems like a general network issue, like the docker service is missing your companies firewall certificate.
Can you pull any container from docker hub ?

2 years ago

Show more results