ScantChimpanzee51
Moderator
15 Questions, 49 Answers
Active since 10 January 2023
Last activity 8 months ago

Reputation: 0
Badges: 1 (49 × Eureka!)
0 Votes 4 Answers 1K Views
2 years ago
0 Votes 16 Answers 1K Views
[Injecting secrets into a ClearML Agent / accessing clearml.conf from within a Task] Hi everyone, we are using the ClearML AWS Autoscaler (still awesome 😉)...
2 years ago
0 Votes 12 Answers 1K Views
[Task gets interrupted / aborted / reset when in offline mode] For local testing, we have added a --no-clearml option to our code that sets task.set_offline(...
2 years ago
0 Votes 6 Answers 934 Views
[Errors when migrating ClearML Server from AWS to GCP] Hi everyone! As we’re using ClearML quite a bit, we’d love to take it with us when migrating our cloud...
one year ago
0 Votes 18 Answers 1K Views
How do I view Debug Samples images in the browser when the output_uri is on Google Cloud Storage? Unlike for AWS storage, I do not get a popup windo...
one year ago
0 Votes 10 Answers 1K Views
[ClearML with PyTorch-based distributed training] Hi everyone! Is the combination of ClearML with torch.distributed.launch or torchrun actively supported? A ...
one year ago
0 Votes 5 Answers 1K Views
Hi everyone, quick question: Is there any easy way to get a task's full output directory ? E.g. when I create a task with task = Task.init(..., output_uri=" ...
2 years ago
0 Votes 1 Answers 847 Views
Quick question: Is there a way for a task that is executing remotely to find out which ClearML queue it is in or was in?
one year ago
0 Votes 2 Answers 1K Views
[Potential bug where the script path option is changed for remote runs] Hi everyone! We’re still using ClearML quite a bit, usually by running the first, sma...
one year ago
0 Votes 3 Answers 1K Views
one year ago
0 Votes 2 Answers 952 Views
[Auto scaler / API client does not see tasks in queue] We had used the AWS auto scaler (based on the aws_autoscaler.py script in the repo) and it worked grea...
one year ago
0 Votes 7 Answers 1K Views
Hi everyone, I’m getting an error during model upload to S3. The error shows up in the console like below and I don’t see any uploaded objects in S3: 2022-10...
2 years ago
0 Votes 4 Answers 991 Views
[Caching of environment and storage when using AWS auto scaler] First off: We are aiming to set up ClearML for large-scale DL training for multiple projects...
2 years ago
0 Votes 2 Answers 1K Views
[WebUI-based options injection not working] Hey everyone! Since our training repo has gotten quite complex, we configure all setup in an options.yml file whi...
2 years ago
0 Votes 3 Answers 916 Views
[Instance AutoScaler for GCP] In case someone else is interested, we have built an AutoScaler for GCP, too. It works similar to the AWS one in the ClearML re...
one year ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

SuccessfulKoala55 AgitatedDove14 So I’ve tried the approach and it does work; however, this of course results in the credentials being visible in the ClearML web interface output, which comes close to just hard-coding them in…
Is there any way to send the secrets safely?
Is there any way to access the clearml.conf file contents from within code? (AFAIK, the file does not get sent over to the container; otherwise I could just YAML-read it myself…)
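Whatever mechanism ends up delivering the credentials into the container, one way to keep the task side clean is to read them from environment variables and fail with a message that names only the missing variables, never their values. A minimal sketch in plain Python; `require_secrets` and `MissingSecretError` are hypothetical helpers, and this does not address how the variables reach the container in the first place:

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secrets(names, environ=None):
    """Return {name: value} for each required variable.

    Hypothetical helper: the error message lists only variable *names*,
    so a failure never leaks a secret value into the console log.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingSecretError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

Usage, assuming the two AWS variables were injected into the container: `creds = require_secrets(["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"])`.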

2 years ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Although, a correction here: while the secret is indeed hidden in the logs, it is still visible in the “execution” tab of the experiment (see the two screenshots below).
Once again, I set them with
task.set_base_docker(docker_arguments=["..."])

2 years ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Hi SuccessfulKoala55 , thanks for getting back to me!
In the docs of Task.set_base_docker() it says “When running remotely the call is ignored”. Does that mean that this call is executed when running locally to “record” the arguments, and that when I clone the experiment and run it remotely, the call is ignored and the recorded values are used?

2 years ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Hey guys, really appreciating the help here!
So what I meant by “it does work” is that the environment variables go through to the container: I can use them there and everything runs.

The remaining problem is that this way, they are visible in the ClearML web UI which is potentially unsafe / bad practice, see screenshot below.

2 years ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Sorry to ask again, but the values are still showing up in the WebUI console logs this way (see screenshot).
Here is the config that I paste into the EC2 Autoscaler Setup:

    agent {
        extra_docker_arguments: ["-e AWS_ACCESS_KEY_ID=XXXXXX", "-e AWS_SECRET_ACCESS_KEY=XXXXXX"]

        hide_docker_command_env_vars {
            enabled: true
            extra_keys: ["AWS_SECRET_ACCESS_KEY"]
            parse_embedded_urls: true
        }
    }

Never mind, it came from setting the options wrong; it has to be ...
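The idea behind `hide_docker_command_env_vars`, as it comes up in this thread, is to mask the values of selected `-e KEY=VALUE` arguments before the Docker command is printed. The sketch below only illustrates that idea and is not ClearML's implementation; `mask_env_args` is a hypothetical helper that handles both the single-string form used in the config above (`"-e KEY=VALUE"`) and the split form (`"-e", "KEY=VALUE"`):

```python
def _mask_assignment(assignment, secret_keys, mask):
    """Mask the value of a KEY=VALUE pair if KEY is listed in secret_keys."""
    key, sep, _value = assignment.partition("=")
    if sep and key in secret_keys:
        return key + "=" + mask
    return assignment


def mask_env_args(args, secret_keys, mask="********"):
    """Return a copy of a Docker argument list with secret -e values masked."""
    masked = []
    expect_assignment = False  # True right after a bare "-e" / "--env" token
    for arg in args:
        if expect_assignment:
            masked.append(_mask_assignment(arg, secret_keys, mask))
            expect_assignment = False
        elif arg in ("-e", "--env"):
            masked.append(arg)
            expect_assignment = True
        elif arg.startswith(("-e ", "--env ")):
            # Single-string form, e.g. "-e AWS_SECRET_ACCESS_KEY=..."
            flag, _, assignment = arg.partition(" ")
            masked.append(flag + " " + _mask_assignment(assignment.strip(), secret_keys, mask))
        else:
            masked.append(arg)
    return masked
```

With the config above, only `AWS_SECRET_ACCESS_KEY` is listed in `extra_keys`, so in this sketch `AWS_ACCESS_KEY_ID` would still be printed in clear text.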

2 years ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Ahhh, ok got it! Thanks 👍

2 years ago
0 Is It Possible To Run Multiple Agent On Ec2 Machines Started By The Autoscaler? Or Have The One Agent Run Multiple Queue Jobs At Once? E.G. Having The Autoscaler Start 1X P3.8Xlarge (4 Gpu) On Aws Might Be Better Than 4X P3.2Xlarge (1 Gpu) In Terms Of Ava

Yes totally, but we’ve been having problems getting these GPUs specifically (even manually in the EC2 console and across regions), so I thought maybe it’s easier to get one big one than many small ones, but I’ve never actually checked if that is true 🙂 Thanks anyhow!
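If a bigger multi-GPU instance does become available, one way to use it is to run one agent per GPU (or per GPU group) on the same machine. A minimal sketch that only builds the command lines; `split_gpus`, `agent_commands` and the queue name are hypothetical, and the `clearml-agent daemon` flags used here (`--queue`, `--gpus`, `--detached`) should be checked against the clearml-agent documentation for your version:

```python
def split_gpus(gpu_ids, gpus_per_agent):
    """Partition GPU ids into consecutive groups, one group per agent."""
    if gpus_per_agent <= 0 or len(gpu_ids) % gpus_per_agent:
        raise ValueError("number of GPUs must be a multiple of gpus_per_agent")
    return [gpu_ids[i:i + gpus_per_agent] for i in range(0, len(gpu_ids), gpus_per_agent)]


def agent_commands(gpu_ids, gpus_per_agent, queue):
    """Build one `clearml-agent daemon` command line per GPU group."""
    return [
        "clearml-agent daemon --queue {} --gpus {} --detached".format(
            queue, ",".join(str(gpu) for gpu in group))
        for group in split_gpus(gpu_ids, gpus_per_agent)
    ]
```

For a 4-GPU instance, `agent_commands([0, 1, 2, 3], 1, "gpu_queue")` yields four commands, each pinned to a single GPU.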

one year ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Won’t they be printed out in the Web UI as well? That shows the full Docker command for running the task, right…

2 years ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

That was the missing piece - thank you!
Awesome how many details you have considered in ClearML 😉

2 years ago
0 [Task Gets Interrupted / Aborted / Reset When In Offline Mode] For Local Testing, We Have Added A

Hi AgitatedDove14, so it took some time but I’ve finally managed to reproduce it. The issue seems to be related to writing images via TensorBoard:

    from torch.utils.tensorboard import SummaryWriter
    import torch
    from clearml import Task, Logger

    if __name__ == "__main__":
        task = Task.init(project_name="ClearML-Debug", task_name="[Mac] TB Logger, offline")
        tb_logger = SummaryWriter(log_dir="tb_logger/demo/")
        image_tensor = torch.rand(256, 256, 3)
        for iter in range(10):
            t...

2 years ago
0 [Task Gets Interrupted / Aborted / Reset When In Offline Mode] For Local Testing, We Have Added A

It might be broken for me: as I said, the program works without offline mode but gets interrupted and shows the results from above in offline mode. But there might of course be another issue in between; any idea how to debug?
The environment variable is good to know, I will try with that as well and report back.
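For the `--no-clearml` option mentioned in the question title, the environment-variable route could look like the sketch below. It assumes the variable in question is `CLEARML_OFFLINE_MODE` (check the ClearML docs for your version) and that it must be set before `Task.init()` runs; `parse_args` and `configure_offline_mode` are hypothetical helpers:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the local-testing flag described in the question."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-clearml", action="store_true",
                        help="run ClearML in offline mode for local testing")
    return parser.parse_args(argv)


def configure_offline_mode(args, environ=None):
    """Set the (assumed) offline-mode variable before ClearML is initialized."""
    environ = os.environ if environ is None else environ
    if args.no_clearml:
        # Assumed variable name; set it before Task.init() is called.
        environ["CLEARML_OFFLINE_MODE"] = "1"
    return environ
```

In a script, this would run first, e.g. `configure_offline_mode(parse_args())`, followed by `Task.init(...)`.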

2 years ago
0 [Auto Scaler / Api Client Does Not See Tasks In Queue]

Hi SuccessfulKoala55, sorry, there was a mistake on my end: clearml.conf pointed to the wrong URL 🙈

one year ago
0 [Errors When Migrating Clearml Server From Aws To Gcp]

CostlyOstrich36 thank you, now everything works so far!
Last thing: Is there any way to change all the links in the new ClearML server such that an artifact that was previously under s3://… is now taken from gs://…? The actual data is already available under the gs:// link, of course.
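The string part of such a migration, turning `s3://bucket/key` into `gs://bucket/key`, is a small prefix rewrite. This sketch covers only that mapping; where the server stores the links and how to update them in place is not covered in this thread and is not shown here. `remap_uri` and the bucket names are hypothetical:

```python
def remap_uri(uri, bucket_map):
    """Rewrite s3://<bucket>/<key> to gs://<new bucket>/<key>.

    URIs that are not s3:// or whose bucket is not in bucket_map
    are returned unchanged.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        return uri
    bucket, sep, key = uri[len(prefix):].partition("/")
    if bucket not in bucket_map:
        return uri
    return "gs://" + bucket_map[bucket] + sep + key
```

Example: `remap_uri("s3://old-bucket/proj/model.pkl", {"old-bucket": "new-bucket"})` gives `"gs://new-bucket/proj/model.pkl"`.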

one year ago
0 Hi Everyone, Quick Question: Is There Any Easy Way To

Unfortunately not, task.data.output just contains <tasks.Output: { "destination": " s3://some_bucket " }> and when I convert task.data to a string and search for the desired uri, I cannot find it either.
But on the other hand, putting the URL together from its name, ID, etc. seems to work. It might be a little unsafe if the task gets renamed, but otherwise it should be fine.
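Putting the URL together could look like the sketch below. The layout `<destination>/<project name>/<task name>.<task id>` is an assumption about how the upload folders are named, not something confirmed here, and as noted it breaks if the task is renamed. `task_output_prefix` is a hypothetical helper:

```python
def task_output_prefix(destination, project_name, task_name, task_id):
    """Build the assumed output folder for a task.

    Assumed layout: <destination>/<project name>/<task name>.<task id>
    """
    # Strip stray whitespace and a trailing slash from the destination.
    base = destination.strip().rstrip("/")
    return "{}/{}/{}.{}".format(base, project_name, task_name, task_id)
```

For example, `task_output_prefix("s3://some_bucket", "proj", "train", "abc123")` gives `"s3://some_bucket/proj/train.abc123"`.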

2 years ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

When running on our bigger research repository which includes saving checkpoints and uploading to S3, the training ends with errors as shown below and a Killed message for the main process (I do not abort the main process manually):

2023-01-26 17:37:17,527 INFO: Save the latest model.
2023-01-26 17:37:19,158 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_cvqpor8r.tmp => glass-clearml/RealESR/Glass-ClearML Demo/[Lambda] FMEN distributed check, v10 fileserver u...
one year ago
0 [Potential Bug Where The

Hi John, thanks for getting back to me!
So it shows up in the UI as shown below. It happens both when “recording” the local run on Mac and on Linux.

one year ago
0 [Injecting Secrets Into A Clearml Agent / Accessing

Yes for example, or some other way to get credentials over to the container safely without them showing up in the checked-in code or web UI

2 years ago
0 Hi Everyone, Quick Question: Is There Any Easy Way To

I actually wanted to load a specific artifact, but didn’t think of looking through the task's output models. I have now switched to that approach, which feels much safer, so we should be all done here. Thanks!

2 years ago
0 [Task Gets Interrupted / Aborted / Reset When In Offline Mode] For Local Testing, We Have Added A

I meant that maybe activating offline mode somehow changes something else in the runtime, which in turn leads to the interruption. Let me try to build a minimal reproducible version 🙂

2 years ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

So I’m launching my own repo with either
torchrun --nproc_per_node 2 --standalone --master_addr 127.0.0.1 --master_port 29500 -m my_folder.my_script --some_option
or
python3 -m torch.distributed.launch --nproc_per_node 2 --master_addr 127.0.0.1 --master_port 29500 -m my_folder.my_script --some_option
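Both launchers start one process per GPU and tell each worker its rank through environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE` under torchrun). One option to try (an assumption here, not something confirmed in this thread) is to read those variables and do one-off work such as `Task.init()` only on rank 0. A minimal sketch with hypothetical helpers; note that with `torch.distributed.launch`, `LOCAL_RANK` may instead arrive as a `--local_rank` argument unless `--use_env` is passed:

```python
import os


def dist_info(environ=None):
    """Return (rank, local_rank, world_size) from the launcher's variables.

    Falls back to a single-process setup when the variables are absent,
    e.g. for a plain `python -m my_folder.my_script` run.
    """
    environ = os.environ if environ is None else environ
    rank = int(environ.get("RANK", 0))
    local_rank = int(environ.get("LOCAL_RANK", 0))
    world_size = int(environ.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


def is_main_process(environ=None):
    """True only for the global rank-0 worker."""
    return dist_info(environ)[0] == 0
```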

one year ago
0 [Errors When Migrating Clearml Server From Aws To Gcp]

More stack trace:

clearml-elastic   | ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];
clearml-elastic   | Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
clearml-elastic   |     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
clearml-elastic   |     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
clearml-el...
one year ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

AgitatedDove14 maybe to come at this from a broader angle:
Is ClearML combined with DataParallel or DistributedDataParallel officially supported / should that work without many adjustments? If so, would it be started via python ... or via torchrun ... ? What about remote runs, how will they support the parallel execution? To go even deeper, what about the machines started via ClearML Autoscaler? Can they either run multiple agents on them and/or start remote distribu...

one year ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

Sorry that these issues are so deep and chaotic; we would appreciate any help or ideas you can think of!

one year ago
0 [Errors When Migrating Clearml Server From Aws To Gcp]

To recap, the server started up on GCP as expected before migrating the data over. The migration was done by

  • deleting the current data sudo rm -fR /opt/clearml/data/*
  • unpacking the backup sudo tar -xzf ~/clearml_backup_data.tgz -C /opt/clearml/data
  • setting permissions sudo chown -R 1000:1000 /opt/clearml
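Since the Elasticsearch error is an `AccessDeniedException`, a quick check after restoring is whether anything under the data directory escaped the `chown`. A minimal sketch; `paths_not_owned_by` is a hypothetical helper, and the expected owner 1000:1000 is taken from the recap above:

```python
import os


def paths_not_owned_by(root, uid, gid):
    """List root and every path below it whose owner differs from uid:gid.

    Uses lstat, so symlinks are checked themselves rather than followed.
    """
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is checked once, when os.walk visits it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            st = os.lstat(path)
            if st.st_uid != uid or st.st_gid != gid:
                offenders.append(path)
    return offenders
```

On the server this would be something like `paths_not_owned_by("/opt/clearml/data", 1000, 1000)`, run with `sudo` if some directories are not readable by the current user.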
one year ago
0 [Task Gets Interrupted / Aborted / Reset When In Offline Mode] For Local Testing, We Have Added A

By the way, if we don’t wrap other calls in is_offline(), we get errors like “DateTime object is not serializable”, but that’s a secondary issue.
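The thread doesn't say which call raised the serialization error. If the `datetime` values come from your own metadata or configuration, one workaround (an assumption, not a confirmed fix) is to convert them to ISO-8601 strings before handing them on. A minimal sketch with a hypothetical `to_jsonable` helper:

```python
import json
from datetime import date, datetime


def to_jsonable(obj):
    """Recursively convert datetime/date values to ISO-8601 strings."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(value) for value in obj]
    return obj


# The converted structure can be serialized with the standard json module.
print(json.dumps(to_jsonable({"started": datetime(2023, 1, 26, 17, 37, 17)})))
```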

2 years ago
0 [Webui-Based Options Injection Not Working] Hey Everyone! Since Our Training Repo Has Gotten Quite Complex, We Configure All Setup In An

Well duh, now it makes total sense! Should have checked the docs or examples more closely.
Yes, if that works reliably, then I think that option could make sense; it would have made things somewhat easier in my case, but this is just as good.

2 years ago
0 [Caching Of Environment And Storage When Using Aws Auto Scaler]

Ok, I re-checked and saw that the data was indeed cached and re-loaded - maybe I waited a little too long last time and it was already a new instance. Awesome implementation guys!

2 years ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

Ok great! I will debug starting with a simpler training script.
Just as a last question, is torchrun also supported rather than the (now deprecated but still usable) torch.distributed.launch ?

one year ago
0 [Plot Not Showing Up In Ui When Setting File_Server To S3 Bucket] As A Somewhat In Depth Question, We’ve Set Our Output_Uri And File_Server To An S3 Bucket To Prevent The Server From Running Out Of Space As Discussed In This Message. However, I’ve Noticed

Yes, when the WebUI prompted me for them. They also seem to work since images in Debug Samples (also in S3) show up after I entered them.
Also, I can see that the plot is also saved in Debug Samples after explicit reporting, even though I don’t set report_interactive=False.

2 years ago