AgitatedDove14
Moderator
49 Questions, 8126 Answers
  Active since 10 January 2023
  Last activity one year ago

0 Hey There! I’M Having A Problem With Clearml-Sessions, Maybe Someone Had A Similar Problem Already: I’M Running An Agent In Docker Mode On A Remote Machine. When I Run

BitingKangaroo95 nice work 🎊
I think that what did it was:
Change the sshd_config so that it allows port forwarding, agent forwarding and X11 forwarding.
But just in case, it might be that there was a pre-existing SSH identifier on your machine, hence the error.
Clearing known_hosts under ~/.ssh is also something I would try 🙂

3 years ago
0 Hello, I Want To Set Up Clean Up Services For Our A Self-Hosted Clearml (I Used Aws Ami To Spin Up A Server). On What Machine Is It Best Practice To Run The Clean Up Service, Local Machine Or Should It Be On The Clearml Server ?

Hi @<1573119955400921088:profile|CloudyPelican46>

On what machine is it best practice to run the clean up service, local machine or should it be on the clearml server ?

The easiest is to run it on the server machine itself. In practice you can put it anywhere, but most of the time this service is sleeping and does not use much RAM, so running it on the server kind of makes sense.

2 years ago
0 Hi All, There Is A Way To Get From A Task-Object The Experiment Source Code? In Other Words, Assume I Have Access To A Specific Trains Server And Want To Store From A Particular Task The Experiment Source Code In A Temp File. There Is A Convenient Way To

It should be under script.diff:
'script': {'binary': '', 'repository': '', 'tag': '', 'branch': '', 'version_num': '', 'entry_point': '', 'working_dir': '', 'requirements': {'pip': ''}, 'diff': ''}
For some reason this is empty in your case, are you seeing it in the UI?
If you are querying the current task (i.e. running) it might not be there yet.
You can call this internal function that returns only after the repo detection is done.
task._wait_for_repo_detection()
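
Here is a minimal sketch of dumping that diff into a temp file, assuming the task dict has the `script` layout shown above. `save_diff_to_tempfile` and `fetch_task_dict` are hypothetical helper names, and the `Task.get_task(...).export_task()` call is what I would expect to work on a recent clearml install (older trains versions may need a different accessor), so please verify it on your setup:

```python
import tempfile


def save_diff_to_tempfile(task_dict):
    """Write the 'script.diff' entry of an exported task dict to a temp file.

    Returns the path of the file. An empty or missing diff gives an empty file.
    """
    diff = (task_dict.get("script") or {}).get("diff") or ""
    with tempfile.NamedTemporaryFile("w", suffix=".diff", delete=False) as f:
        f.write(diff)
    return f.name


def fetch_task_dict(task_id):
    """Hypothetical helper: fetch a task from the server and export it as a dict."""
    # Assumption: clearml is installed and configured for your server.
    from clearml import Task

    return Task.get_task(task_id=task_id).export_task()
```

Usage would be something like `save_diff_to_tempfile(fetch_task_dict("task_id_here"))`.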

5 years ago
0 What Happens If The Task.Init Doesn'T Happen In The Same Py File As The "Data Science" Stuff I Have A List Of Classes That Do The Coding And I Initialise The Task Outside Of Them. Something Like

but here I can tell them: return a dictionary of what you want to save

If this is the case you have two options: either store the dict as an artifact (this makes sense if it is not a standalone model you would like to use later), or store it as a model.
Artifact example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
getting them back
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Model example:
https:/...
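
For the artifact option, a rough sketch of what the glue code could look like, assuming each class returns a plain dict. `store_results` is a hypothetical helper name; `upload_artifact` is the Task method used in the artifacts example linked above:

```python
def store_results(task, results, name="results"):
    """Upload the dict returned by one of the classes as an artifact on the task."""
    if not isinstance(results, dict):
        raise TypeError("expected a dict of things to save")
    # The dict is stored on the task under the given artifact name.
    task.upload_artifact(name=name, artifact_object=results)
    return name


# Usage (assumes clearml is installed and configured):
# from clearml import Task
# task = Task.init(project_name="examples", task_name="dict artifact")
# store_results(task, {"accuracy": 0.9})
```

This way the classes never need to know about the Task, they only return what they want saved.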

3 years ago
0 Performance Under Docker Is 10% Lower Than On Bare Metal

Hi DullCamel78

Hi everyone! Has anyone tried running

aws_autoscaler.py without docker?

Well, generally, since this is a remote machine the easiest way to control the environment is with containers, hence the default use case. In theory you can change it to use venv, but then of course you are somewhat limited by the different drivers/CUDA/Python environment.

performance under docker is 10% lower than on bare metal

add to your extra docker args
` extra_docker_arguments: ["...

3 years ago
0 Another Question Is If I Have A Conda Env Available On My Workers Systemwide.. Can I Use That Env Directly When Running Tasks With

hmm... try to run the trains-agent from the ml environment with "system_site_packages: true", it might do the trick. Anyhow please let me know if it worked 🙂

5 years ago
0 Hi, I Have A Question Regarding The Autoscaler. I Implemented A Custom Driver For Gcp And I Manager To Launch The Clearml.Automation.Auto_Scaler.Autoscaler Which Runs Smoothly (Kudos!!). I Can See Instance Being Created/Destroyed On Demand As Expected. Th

Hi @<1523715429694967808:profile|ThickCrow29>

clearml.automation.auto_scaler.AutoScaler which runs smoothly (kudos!!).

NICE!

The only thing I am missing is the in the clearml dashboard/orchestration --> Is there a way to make it

hmm kind of needs backend support for that 😞

For now, I can just see the log of the clearML task to monitor what’s happening
Or is this restricted to pro users?

Yeah the GCP and AWS autoscalers dashboards are paid tier feature. But...

one year ago
0 Okay Another Question !! Okay So I Would Like To Edit Parameters Through The Ui And Run It. So This Is My Script

Hi CluelessElephant89
When you edit the args (General section) in the UI, you are editing the args for "remote execution"
(i.e. when executed by the agent, the args dict will get the values from the UI, as opposed to "manual execution" where the UI gets the values from code)
In order to simulate the "remote execution" inside your development environment
Try:
` from clearml import Task

# simulate remote execution of a specific Task instance

Task.debug_simulate_remote_task(task_id='R...

4 years ago
0 Hi, I Have A Question Regarding The Autoscaler. I Implemented A Custom Driver For Gcp And I Manager To Launch The Clearml.Automation.Auto_Scaler.Autoscaler Which Runs Smoothly (Kudos!!). I Can See Instance Being Created/Destroyed On Demand As Expected. Th

so I wanted to keep our “fork” of the autoscaler but I guess this is not supported.

you are correct 😞
I wonder, regarding "I customized it a bit to our workflow", what did you add?

one year ago
0 Question About Using S3 As Artifact Storage - Do We Need To Setup S3 Credentials On Every System That Is Using Those Artifacts (E.G. In Clearml-Agent Where Model Upload Happens, Or In A Prediction Service, That Needs To Download The Latest Model)

Hi FiercePenguin76
So currently the idea is that you have full control over per-user credentials (i.e. stored locally). Agents (depending on how they are deployed) can have shared credentials (with AWS the easiest is to push them to the OS environment).

4 years ago
0 Hi, It Seems Like We Have A Bug In Metrics Reporting While Comparing Between Several Experiments (Under Scalars). The Loss Report Includes Only One Experiment Results While All The Other Metrics Show All Of Them. The Data Is Exist At Each Experiment, But

Hi GrotesqueMonkey62 any chance you can be a bit more specific? Maybe a screen grab?
Here is how it works: if you look at an individual experiment, scalars are grouped by title (i.e. multiple series appear on the same graph if they have the same title).
When comparing experiments, any unique combination of title/series gets its own graph, and the different lines on that graph are the experiments themselves.
Where do you think the problem lies?
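
To make the grouping concrete, here is a toy sketch in plain Python (this is not the actual UI code, and the function names are made up for illustration). Scalars are given as (title, series) pairs:

```python
from collections import defaultdict


def single_experiment_graphs(scalars):
    """One graph per title; each graph holds the series sharing that title."""
    graphs = defaultdict(list)
    for title, series in scalars:
        graphs[title].append(series)
    return dict(graphs)


def comparison_graphs(experiments):
    """One graph per unique (title, series); each line on it is an experiment.

    experiments maps an experiment name to its list of (title, series) pairs.
    """
    graphs = defaultdict(list)
    for name, scalars in experiments.items():
        for title, series in scalars:
            graphs[(title, series)].append(name)
    return dict(graphs)
```

With this model, a comparison graph shows a single experiment whenever only that experiment reported that exact title/series pair, which might be what you are seeing with the loss.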

5 years ago
0 I Am Using `

Hi SarcasticSparrow10

Is it better to post such questions on Stackoverflow so they benefit everybody?

Yes, I think you are correct, it would. Please do 🙂

Try passing reuse_last_task_id='task_id_here' to specify the exact Task to continue (click on the ID button next to the task name in the UI).
If this value is True, it will try to continue the last task on the current machine (based on the project/name combination); if the task was executed on another machine, it will just start a ...
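
A small sketch of the two ways of passing that argument. `init_kwargs` is a hypothetical helper, and the project/task names are placeholders; only `reuse_last_task_id` comes from the explanation above:

```python
def init_kwargs(project, name, continue_task_id=None):
    """Build keyword arguments for Task.init.

    reuse_last_task_id is either True (reuse the last task on this machine)
    or a task ID string (continue that exact task).
    """
    return {
        "project_name": project,
        "task_name": name,
        "reuse_last_task_id": continue_task_id if continue_task_id else True,
    }


# Usage (assumes trains/clearml is installed and configured):
# from clearml import Task  # `from trains import Task` on older installs
# task = Task.init(**init_kwargs("examples", "my experiment", "task_id_here"))
```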

5 years ago
0 I Am Using `

Many thanks!

5 years ago
0 Hi Everyone, I Was Looking Into Clearml Integration With Nvidia For Transfer Learning. Does Clearml Have Plans To Integrate With The New Tao? Looks Like Nvidia Is Focusing Tao As A Low Code Transfer Learning Tool With Everything Done In Command Line, Whic

The latest TAO doesn't use python for fine tuning, rather it uses the CLI entirely

It's a good question, but I think the CLI actually just runs Python code (the CLI is their interface). Generally speaking, I'm pretty sure it will not be complicated to convert the TLT integration to support TAO (Nvidia helps with that, and I think we had a similar process with Nvidia Clara/MONAI)
BTW: how are you using Nvidia TAO ?

3 years ago
0 Hi, I Am Trying To Run Experiment From Clearml Web Ui. I Did Experiment Copy, Enqueue, But In The Execution Log I See That It Runs Command

orchestration module
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
regarding the exception stack
It's pointing to a stdout that was closed?! How could that be? Any chance you can provide a toy example for us to debug?

4 years ago
0 Hi Again, I Was Wondering What Would Be A Good Practice With Respect To Saving Different Datasets (While Preprocessing It In Several Steps/Stages). Mainly With The Use Of Remove_Files(). Is It Ok To Delete Raw Data After Preprocessing For Example? In That

Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders and clearml takes care of cache cleanup
That said, notice that with get mutable copy the target folder is one you specify, so in this case you should definitely delete it after usage. Wdyt?

2 years ago
0 Clearml (Remote Execution) Sometimes Doesn'T "Pick-Up" Gpu. After I Rerun The Task It Picks It Up. Seems Random, Doesn'T Happen Too Often (Maybe Once In 30-40 Times) And I Cannot Seem To Detect Any Pattern. Did Anyone Else Notice This? Agents Are Vms On G

Is there an easy way to add a docker argument in the python script?

On the task itself in the UI you can edit the docker arguments and add any missing flags
(task.set_base_docker will do the same from code)
You can also edit the configuration and always add this flag:
None
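
A minimal sketch of the from-code option. `set_container` is a hypothetical helper, and the keyword names `docker_image` and `docker_arguments` are what I would expect on a recent clearml version, so please check them against the one you have installed. The image and flag in the usage comment are placeholders:

```python
def set_container(task, image, arguments):
    """Set the container image and extra docker arguments on a task."""
    if not image:
        # As noted in this thread, docker arguments do not propagate
        # when no container image is selected.
        raise ValueError("a docker image is required")
    task.set_base_docker(docker_image=image, docker_arguments=arguments)


# Usage (assumes clearml is installed and configured):
# from clearml import Task
# task = Task.init(project_name="examples", task_name="docker args")
# set_container(task, "my-registry/my-image:latest", "--ipc=host")
```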

one year ago
0 Clearml (Remote Execution) Sometimes Doesn'T "Pick-Up" Gpu. After I Rerun The Task It Picks It Up. Seems Random, Doesn'T Happen Too Often (Maybe Once In 30-40 Times) And I Cannot Seem To Detect Any Pattern. Did Anyone Else Notice This? Agents Are Vms On G

I know about clearml.conf but wanted to avoid ssh-ing through 50 instances to edit it.

LOL yeah, btw: this is exactly the reason the enterprise version has a vault feature, so one could edit the base configuration in the UI and it automatically propagates everywhere

but docker_arguments doesn't propagate if I leave docker_image as None

yeah, that's correct, you have to select a container to be used

one year ago
0 Clearml (Remote Execution) Sometimes Doesn'T "Pick-Up" Gpu. After I Rerun The Task It Picks It Up. Seems Random, Doesn'T Happen Too Often (Maybe Once In 30-40 Times) And I Cannot Seem To Detect Any Pattern. Did Anyone Else Notice This? Agents Are Vms On G

I'm not sure how to debug it, that would be my first question. So I should first check if docker is executed with --gpus? I'll pay attention to this next time this happens, thanks.

The first line of the Task console log should have the exact docker command that was used, this could be a good start
also check if there is any chance there is another agent listening to this queue, maybe it actually runs somewhere without a gpu at all?
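
If you want to automate the first check, something along these lines could work. It is only a sketch: it assumes you have already copied the docker command line out of the console log, and `docker_cmd_has_gpus` is a made-up helper name:

```python
import shlex


def docker_cmd_has_gpus(line):
    """Return True if a docker command line contains a --gpus flag."""
    tokens = shlex.split(line)
    # Accept both "--gpus all" and "--gpus=all" spellings.
    return any(t == "--gpus" or t.startswith("--gpus=") for t in tokens)
```

Running it on the logged command of a failing run and of a good run should tell you quickly whether the flag is the difference.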

one year ago
0 Clearml (Remote Execution) Sometimes Doesn'T "Pick-Up" Gpu. After I Rerun The Task It Picks It Up. Seems Random, Doesn'T Happen Too Often (Maybe Once In 30-40 Times) And I Cannot Seem To Detect Any Pattern. Did Anyone Else Notice This? Agents Are Vms On G

Hi @<1631102016807768064:profile|ZanySealion18>

ClearML (remote execution) sometimes doesn't "pick-up" GPU. After I rerun the task it picks it up.

What do you mean by "does not pick up"? Is it that the container is up but was not executed with --gpus, so there is no GPU access?

one year ago
0 Hey, Just Trying Out Clearml-Serving And Getting The Following Error

Hi RobustRat47

My guess is it's something from the converting PyTorch code to TorchScript. I'm getting this error when trying the

I think you are correct, see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it

3 years ago
0 Another Question, I Have Written A Code That Includes A Task Scheduler That Calls A Function. That Function Watches A Folder And If There Are Sufficient Images, It Creates And Publishes The Dataset, After Which It Clears The Folder. Problem, For Some Rea

VexedCat68

a Dataset is published, that activates a Dataset trigger. So if every day I publish one dataset, I activate a Dataset Trigger that day once it's published.

From this description it sounds like you created a trigger cycle, am I missing something ?
Basically you can break the cycle by saying, trigger only on New Dataset with a specific Tag (or create the auto dataset in a different project/sub-project).
This will stop your automatic dataset creation from triggering the "orig...
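
The tag idea in plain Python, just to show the logic (this is not the trigger API itself; `should_trigger` and the tag name are made up for the example):

```python
def should_trigger(dataset_tags, required_tag="manual-upload"):
    """Fire only for datasets carrying the required tag."""
    return required_tag in set(dataset_tags or [])


# A dataset published by a person carries the tag and fires the trigger.
# A dataset created automatically by the triggered code has no such tag,
# so it does not fire the trigger again and the cycle is broken.
```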

3 years ago
0 Hey Has Anyone Managed To Capture Darts Logging With Clearml When Using The Temporal Fusion Transformers ? Even When Overriding Their Trainer With A Custom Pytorch Lightning Trainer It Seems That Clearml Cannot Retrieve The Iteration Log...

No, I was pointing out the lack of one

Sounds like a great idea, could you open a GitHub issue (if not already opened)? Just so we do not forget

set the pytorch lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing-out

Hmm, that should not have an effect on the training time, all logs are sent in the background. That said, checkpoints might slow it a bit (i.e.; i...

2 years ago
0 Another Question, I Have Written A Code That Includes A Task Scheduler That Calls A Function. That Function Watches A Folder And If There Are Sufficient Images, It Creates And Publishes The Dataset, After Which It Clears The Folder. Problem, For Some Rea

why are there indefinitely growing anonymous tasks, even after i've closed the main schedulers.

The anonymous Tasks are the Datasets you are creating (a Dataset version is also a Task of a certain type with artifacts; the idea is that Datasets are usually created from code, hence the need to combine the two).
Make sense ?

4 years ago