Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Profile picture
AgitatedDove14
Moderator
48 Questions, 8051 Answers
  Active since 10 January 2023
  Last activity 7 months ago

Reputation

0

Badges 1

25 × Eureka!
0 Is It Possible To Avoid The Clearml-Agent For Local Installations, And Have The File Server Automatically Use An S3 Bucket? I'Ve Found

Do I set the 

CLEARML_FILES_HOST

 to the end point instead of an s3 bucket?

Yes you are right this is not straight forward:
CLEARML_FILES_HOST=" s3://minio_ip:9001 "
Notice you must specify "port" , this is how it knows this is not AWS. I would avoid using an IP and register the minio as a host on your local DNS / firewall. This way if you change the IP the links will not get broken 🙂

2 years ago
0 Automatic Ssh Keys Export To Agent In Docker Mode

Many thanks! I'll pass on to technical writers 🙂

2 years ago
0 Is There Any Difference In:

Hi HelplessCrocodile8
yes there is:
in the first case, the new_key will be automatically logged:
a_dict = {} a_dict = task.connect(a_dict) a_dict['new_key'] = 42In the second example changes to the "object" passed to connect are not tracked
make sense ?

3 years ago
0 Are There Any Particular System Dependencies Needed To Enable

Hi @<1533620191232004096:profile|NuttyLobster9>

I, but no system stats. ,,,

If the job is too short (I think 30 seconds), it doesn't have enough time to collect stats (basically it collects them over a 30 sec window, but the task ends before it sends them)
does that make sense ?

7 months ago
0 Hello! Since Today I Get

Where again does clearml place the venv?

Usually ~/.clearml/venvs-builds/<python version>/
Multiple agents will be venvs-builds.1 and so on

3 years ago
0 Another Strange Behavior Of The Python Sdk Cli: After Executing Python My_Task.Py, Where My_Task.Py Creates And Send To The Queue An Experiment, The Command Returns But After Some Time Some Messages Are Printed In The Console, Such As

yes, so it does exist the local process (at least, the command returns),

What do you mean the command returns ? are running the scipt from bash and it returns to bash ?

3 years ago
0 Task Struck At

ShallowGoldfish8 the models are uploaded in the background, task.close() is actually waiting for them, but wait_for_upload is also a good solution.

where it seems to be waiting for the metrics, etc but never finishes. No retry message is shown as well.

From the description it sounds like there is a problem with sending the metrics?! the task.close is waiting for all the metrics to be sent, and it seems like for some reason they are not, and this is why close is waiting on them
A...

one year ago
0 Task Struck At

I think this was the issue: None
And that caused TF binding to skip logging the scalars and from that point it broke the iteration numbering and so on.

one year ago
0 Hello! Since Today I Get

I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)

My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with mode debug prints for you to check (basically it will print the conda env YAML it is using)?

3 years ago
0 Running Into A Strange Issue—

Could it be you have old OS environment overriding the configuration file ?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?

3 years ago
0 What Sort Of Integration Is Possible With Clearml And Sagemaker? On The Page

Hmm what do you have here?

os.system("cat /var/log/studio/kernel_gateway.log")
one year ago
0 I Am Using `

Many thanks!

4 years ago
0 We Are Facing Performance Issues Of Our Self-Hosted Clearml Server Looking At The Cpu Utilization \ Memory \ Networking We Couldn'T Identify A Bottleneck We Are At The Moment Using ~100 Workers For Some Hpo, And The Main Performance Issues We Observe Are

Hi DepressedChimpanzee34
I think main issue here is slow response time from the API server, I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM so you have on the machine ?

3 years ago
0 Another Question: Is It Possible To Specify In Which Directory To Save All The Files That Clearml-Agent Creates (E.G. Cache Files Or Results Of The Currently Running Experiments)

What do you mean cache files ? Cache is machine specific and is set in the clearml.conf file.
Artifacts / models are uploaded to the files server (or any other object storage solution)

3 years ago
0 Task Struck At

it was uploading fine for most of the day

What do you mean by uploading fine most of the day ? are you suggesting the upload stuck to the GS ? are you seeing the other metrics (scalars console logs etc) ?

2 years ago
0 Question About Pipeline And Long-Waiting Tasks: Say I Want To Generate A Dataset. The Workflow I Have Requires

RoughTiger69 I think this could work, a pseudo example:
` @PipelineDecorator.component(...)
def the_last_step_before_external_stuff():
print("doing some stuff")

@PipelineDecorator.pipeline()
def logic():
the_last_step_before_external_stuff()
if not check_if_data_was_ingested_to_the_system:
print("aborting ourselves")
Task.current_task().abort()
# we will not get here, the agent will make sure we are stopped
sleep(60)
# better safe than sorry
exit(0) `wdyt? (the...

2 years ago
0 Running Into A Strange Issue—

Can you share the clearml.conf ? Maybe something will pop ?

3 years ago
0 Hi Guys, Is It Possible To Spin Up Two Agents On One Gpu? Something Like

Hi JitteryCoyote63 you can bus obviously you should be careful they might both try to allocate more GPU memory than they the HW actually has.
TRAINS_WORKER_NAME=machine_gpu0A trains-agent daemon --gpus 0 --queue default --detached TRAINS_WORKER_NAME=machine_gpu0B trains-agent daemon --gpus 0 --queue default --detached

3 years ago
0 Hi, Can I Run An

RoundMosquito25 you are absolutely correct !

2 years ago
0 Hello! I’M Wondering If There Is An Option To Run A Termination Hook Script

And maybe adding idle time spent without a job to API is not that a bad idea 😉
yes, adding that to the feature list 🙂

What if I write the last active state in an instance tag? This could be a solution…

I love this hack, yes this should just work.
BTW: if you lambda is a for loop that is constantly checking there is no need to actually store "last idle timestamp check as tag", no?

2 years ago
0 Hi, I'M Following The Instructions For

This really makes little sense to me...

Can you send the full clearml-session --verbose console output ?

Something is not working as it should obviously, console output will be a good starting point

2 years ago
0 Hello I'M Running A Local Agent . While Its Running The Task I Get This Error. Any Suggestion? Uccessfully Installed Numpy-1.24.4 Found Pytorch Version Torch==2.0.1 Matching Cuda Version 0 Found Pytorch Version Torchaudio==2.0.2 Matching Cuda Version 0 Er

Can you do the following
Clone the Task you previously sent me the installed packages of, then enqueue the cloned task to the queue the agent with the conda.
Then send me the full log of the task that the agent run

one year ago
Show more results compactanswers