Nice! So out of curiosity why didn't it work this time and you had to do it manually?
CheerfulGorilla72 could it be the server address changed when migrating?
Yes, it is reproducible. Do you want a snippet?
Already fixed 🙂 please ping tomorrow, I think an RC with the fix should be out soon
this is very odd, can you post the log?
@<1523702932069945344:profile|CheerfulGorilla72> use the following bucket name when you are configuring your files/output uri
s3://<iphere>:<porthere>/<bucket_here>
From there everything should work as expected
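For reference, a minimal sketch of what that looks like in code (project/task names are just examples, and the endpoint/bucket are the placeholders from above; the matching credentials still need to be set in your clearml.conf under sdk.aws.s3):

    from clearml import Task

    # point the task's output (artifacts/models) at the S3-compatible server
    task = Task.init(
        project_name="examples",
        task_name="s3 output",
        output_uri="s3://<iphere>:<porthere>/<bucket_here>",
    )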
Yep it is the scale 🙂 and yes it should appear once you upgrade
Hi @<1739818374189289472:profile|SourSpider22>
could you send the entire console log? maybe there is a hint somewhere there?
(basically what happens after that is the agent is supposed to be running from inside the container, but maybe it cannot access the clearml-server for some reason)
we run in containers without venv, in the main section, and then delete it or use it for similar experiments
If this is the case, then the idea is that the venv creation is actually cached; you can turn it on here (uncomment the line):
https://github.com/allegroai/clearml-agent/blob/51eb0a713cc78bd35ca15ed9440ddc92ffe7f37c/docs/clearml.conf#L116
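For convenience, this is roughly the relevant block in clearml.conf (a sketch based on the linked default config; the exact defaults may differ in your version):

    # clearml.conf
    agent {
        venvs_cache: {
            max_entries: 10
            free_space_threshold_gb: 2.0
            # uncomment ("unmark") this line to enable venv caching
            path: ~/.clearml/venvs-cache
        }
    }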
@<1671689437261598720:profile|FranticWhale40> could you test the fix? just pull & run
allegroai/clearml-serving-triton:1.3.1
allegroai/clearml-serving-inference:1.3.1
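i.e. something along the lines of (assuming a plain docker setup, adjust to your compose file if needed):

    docker pull allegroai/clearml-serving-triton:1.3.1
    docker pull allegroai/clearml-serving-inference:1.3.1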
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message on a reply to avoid polluting the channel?
Yes, that is appreciated 🙂
Basically logs in the thread of the initial message.
To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only it will default to trying to access the GPUs
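For example (a sketch; the queue name is just an example):

    clearml-agent daemon --queue default --docker --cpu-only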
Nice!
ElegantKangaroo44 I think TrainsCheckpoint
would probably be the easiest solution. I mean it will not be a must, but another option to deepen the integration, and allow us more flexibility.
which part of the code?
the main script?!
but is not part of the package
is the repo itself a package?
But first I want to make sure the verify argument is actually used, hence False
Maybe we should rename it?! it actually creates a Task but will not auto connect it...
Added -v /home/uname/.ssh:/root/.ssh and it resolved the issue. I assume this is some sort of a bug then?
That is supposed to be automatically mounted. Having SSH_AUTH_SOCK defined means the agent has to mount the SSH_AUTH_SOCK socket so that the container can access it.
Try running with SSH_AUTH_SOCK undefined and keep force_git_ssh_protocol
(no need to manually add the .ssh mount it will do that for you)
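Roughly, the idea is something like this (a sketch, adjust to your environment; the queue name is just an example):

    # make sure the ssh-agent socket is NOT forwarded to the agent
    unset SSH_AUTH_SOCK
    # keep forcing SSH for git in clearml.conf:
    #   agent.force_git_ssh_protocol: true
    clearml-agent daemon --queue default --docker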
When I start the serving containers it can't retrieve the model:
Hi BrightRabbit75
I think you need to pass the credentials for your S3 account to the clearml-serving containers
Basically just add AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY to your docker compose:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/docker/docker-compose-triton-gpu.yml#L110
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666e...
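i.e. something like this under the relevant serving services in the compose file (a sketch; the values are placeholders):

    environment:
      AWS_ACCESS_KEY_ID: "<your_access_key>"
      AWS_SECRET_ACCESS_KEY: "<your_secret_key>"
      AWS_DEFAULT_REGION: "<your_region>"  # if your bucket needs it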
os.environ['TRAINS_PROC_MASTER_ID'] = args.trains_id
it should be '1:'+args.trains_id
os.environ['TRAINS_PROC_MASTER_ID'] = '1:{}'.format(args.trains_id)
Also str(randint(1, sys.maxsize))
Could it be in a python at_exit event ?
Hi AttractiveCockroach17
In your "Installed Packages" (when the task is in draft mode, you can edit it like any requirements.txt), you need to add:package @ git+
You can also make sure you have in in the first place bu addingTask.add_requirements("package", "@ git+
") task = Task.init(...)
so if I plot an image with matplotlib it would not upload? I need to use the logger.
Correct, if you have no "main" task, no automagic 🙂
so how can i make it run with the "auto magic"
Automagic logs a single instance... unless those are subprocesses, in which case, the main task takes care of "copying" itself to the subprocess.
Again what is the use case for multiple machines?
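(For the matplotlib question above, a minimal sketch of getting the automagic capture; project/task names are just examples:)

    from clearml import Task
    import matplotlib.pyplot as plt

    # as long as a "main" task exists, shown figures are captured automatically
    task = Task.init(project_name="examples", task_name="matplotlib automagic")

    plt.plot([1, 2, 3], [4, 5, 6])
    plt.title("my plot")
    plt.show()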
Hi @<1523701132025663488:profile|SlimyElephant79>
I would like to save only the last & best checkpoints and not all of them if possible.
Basically it will mimic the local file system, so if you overwrite the local files it will overwrite the remote model.
You can also disable auto logging, and manually upload the models. In Task.init, pass auto_connect_frameworks with False for the specific framework, see:
https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-lo...
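Something along these lines (a sketch; the framework key and file name are examples):

    from clearml import Task, OutputModel

    # disable automatic model logging for a specific framework (e.g. pytorch)
    task = Task.init(
        project_name="examples",
        task_name="manual model upload",
        auto_connect_frameworks={"pytorch": False},
    )

    # ... training loop, keep overwriting best_model.pt locally ...

    # manually upload only the checkpoint you care about
    output_model = OutputModel(task=task)
    output_model.update_weights(weights_filename="best_model.pt")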
RoundMosquito25 how is that possible? Could it be they are connected to a different server?
Hi RoundMosquito25
Hi, are there available somewhere examples of testing in ClearML? For example unit tests that check if parameters are passed correctly to new tasks etc.?
What do you mean by "testing in ClearML" ?
For example unit tests that check if parameters are passed correctly
Passed where / how? Are we thinking agents here ?
Hmm, I see the jump from 50 to 100, is that consistent with the last iteration on the aborted Task (before continuing)?
TenseOstrich47 this looks like elasticsearch is out of space...
we will try to use Triton, but it's a bit hard with a transformer model.
Yes ...
All extra packages we add in serving)
So it should work. You can also run your preprocess class manually on your own machine (for debugging) if you pass it a local file (basically the model file downloaded from the UI), it should work
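e.g. roughly (a sketch, assuming your Preprocess class follows the clearml-serving preprocess.py template; method names/signatures may differ in your version):

    # run the serving preprocess code locally, outside the container
    from preprocess import Preprocess  # your own clearml-serving preprocess.py

    p = Preprocess()
    p.load("/path/to/model_file_downloaded_from_the_ui")  # local copy of the model
    sample_body = {"some_input": [1, 2, 3]}               # whatever your endpoint expects
    print(p.preprocess(sample_body, state={}, collect_custom_statistics_fn=None))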
it. But it's maybe not the best solution
Yes... it is not; separating the pre/post to a CPU instance and letting Triton do the GPU serving is a lot more efficient...
Sure thing, anyhow we will fix this bug so next version there is no need for a workaround (but the workaround will still hold so you won't need to change anything)
Oh, then just make sure you call Task.init in your code,
as long as you have clearml.conf in the container or pass the ENV variables to configure your clearml, it should just work
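For example (a sketch; hosts, keys, image and script names are all placeholders, and the env var names are the standard ClearML configuration variables):

    docker run \
      -e CLEARML_API_HOST="https://api.clear.ml" \
      -e CLEARML_WEB_HOST="https://app.clear.ml" \
      -e CLEARML_FILES_HOST="https://files.clear.ml" \
      -e CLEARML_API_ACCESS_KEY="<access_key>" \
      -e CLEARML_API_SECRET_KEY="<secret_key>" \
      my-training-image python train.py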
@<1533620191232004096:profile|NuttyLobster9> I think we found the issue: when you are passing a direct link to the python venv, the agent fails to detect the python version, and since the python version is required for fetching the correct torch, it fails to install it. This is why passing CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none worked,
because it skipped resolving the torch / cuda version (which requires parsing the python version)
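i.e. as a workaround, something like (the queue name is just an example):

    CLEARML_AGENT_PACKAGE_PYTORCH_RESOLVE=none clearml-agent daemon --queue default --docker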