ScantChimpanzee51
Moderator
15 Questions, 49 Answers
  Active since 10 January 2023
  Last activity 5 months ago

Reputation

0

Badges 1

49 × Eureka!
0 Votes 3 Answers 736 Views
[Instance AutoScaler for GCP] In case someone else is interested, we have built an AutoScaler for GCP, too. It works similarly to the AWS one in the ClearML re...
one year ago
0 Votes 4 Answers 815 Views
[Caching of environment and storage when using AWS auto scaler] First off : We are aiming to set up ClearML for large-scale DL training for multiple projects...
one year ago
0 Votes 5 Answers 876 Views
Hi everyone, quick question: Is there any easy way to get a task's full output directory ? E.g. when I create a task with task = Task.init(..., output_uri=" ...
one year ago
0 Votes 1 Answer 682 Views
Quick question: Is there a way for a task that is executing remotely to find out which ClearML queue it is in or was in?
one year ago
0 Votes 2 Answers 857 Views
[Potential bug where the script path option is changed for remote runs] Hi everyone! We’re still using ClearML quite a bit, usually by running the first, sma...
one year ago
0 Votes 16 Answers 855 Views
[Injecting secrets into a ClearML Agent / accessing clearml.conf from within a Task] Hi everyone, we are using the ClearML AWS Autoscaler (still awesome 😉 )...
one year ago
0 Votes 6 Answers 748 Views
[Errors when migrating ClearML Server from AWS to GCP] Hi everyone! As we’re using ClearML quite a bit, we’d love to take it with us when migrating our cloud...
one year ago
0 Votes 18 Answers 940 Views
How do I view Debug Samples images in the browser when the output_uri is on Google Cloud Storage? Unlike for AWS storage, I do not get a popup windo...
one year ago
0 Votes 4 Answers 903 Views
one year ago
0 Votes 2 Answers 756 Views
[Auto scaler / API client does not see tasks in queue] We had used the AWS auto scaler (based on the aws_autoscaler.py script in the repo) and it worked grea...
one year ago
0 Votes 2 Answers 894 Views
[WebUI-based options injection not working] Hey everyone! Since our training repo has gotten quite complex, we configure all setup in an options.yml file whi...
one year ago
0 Votes 12 Answers 957 Views
[Task gets interrupted / aborted / reset when in offline mode] For local testing, we have added a --no-clearml option to our code that sets task.set_offline(...
one year ago
0 Votes 7 Answers 1K Views
Hi everyone, I’m getting an error during model upload to S3. The error shows up in the console like below and I don’t see any uploaded objects in S3: 2022-10...
one year ago
0 Votes 3 Answers 1K Views
one year ago
0 Votes 10 Answers 917 Views
[ClearML with Pytorch-based distributed training] Hi everyone! Is the combination of ClearML with torch.distributed.launch or torchrun actively supported? A ...
one year ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

Ok great! I will debug starting with a simpler training script.
Just as a last question, is torchrun also supported rather than the (now deprecated but still usable) torch.distributed.launch ?

one year ago
0 How Do I View Debug Samples Images In The Browser When The Output_Uri Is On Google Cloud Storage (

@<1523701070390366208:profile|CostlyOstrich36> , you mean the ClearML server needs access to Cloud Storage in its clearml.conf file?
Just tried it by creating a ~/clearml.conf file and setting the entry as below - unfortunately the same result. I’ve re-started the docker-compose of course.

Did I miss something here?

    google.storage {
        credentials_json: "/home/.../my-crendetials.json"
    }
one year ago
0 [Instance Autoscaler For Gcp]

Hi Jake, yes I’d love to! Just a question: how clean and complete does the example need to be? For example, this code relies on you building a correct Machine Image on GCP (which is somewhat unrelated to ClearML) and it does not get the logs from the agent instances - is that still good enough?

one year ago
0 [Webui-Based Options Injection Not Working] Hey Everyone! Since Our Training Repo Has Gotten Quite Complex, We Configure All Setup In An

Well duh, now it makes total sense! Should have checked docs or examples more closely 🙏
Yes if that works reliably then I think that option could make sense, it would have made things somewhat easier in my case - but this is just as good.

one year ago
0 How Do I View Debug Samples Images In The Browser When The Output_Uri Is On Google Cloud Storage (

Ok I see, that is what I thought. But do you have any idea why I am not seeing these images? I am logged into my Gmail account and into the Google Cloud Console and can access both in another tab of the same browser. Am I missing something here?

one year ago
0 [Plot Not Showing Up In Ui When Setting File_Server To S3 Bucket] As A Somewhat In Depth Question, We’Ve Set Our Output_Uri And File_Server To An S3 Bucket To Prevent The Server From Running Out Of Space As Discussed In This Message. However, I’Ve Noticed

Yes, when the WebUI prompted me for them. They also seem to work since images in Debug Samples (also in S3) show up after I entered them.
Also, I can see that the plot is also saved in Debug Samples after explicit reporting, even though I don’t set report_interactive=False

one year ago
0 [Errors When Migrating Clearml Server From Aws To Gcp]

More stack trace:

clearml-elastic   | ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];
clearml-elastic   | Likely root cause: java.nio.file.AccessDeniedException: /usr/share/elasticsearch/data/nodes
clearml-elastic   |     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
clearml-elastic   |     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
clearml-el...
one year ago
0 [Errors When Migrating Clearml Server From Aws To Gcp]

To recap, the server started up on GCP as expected before migrating the data over. The migration was done by

  • deleting the current data: sudo rm -fR /opt/clearml/data/*
  • unpacking the backup: sudo tar -xzf ~/clearml_backup_data.tgz -C /opt/clearml/data
  • setting permissions: sudo chown -R 1000:1000 /opt/clearml
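
The three steps above could be scripted roughly as follows. This is only a minimal sketch, not an official procedure: the function names `migration_commands` and `run_migration` are made up for illustration, the paths are the ones quoted in the post, and it assumes the ClearML server containers are stopped while the data directory is replaced. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Paths as quoted in the post; adjust for your own setup.
DATA_DIR = "/opt/clearml/data"
BACKUP = "~/clearml_backup_data.tgz"


def migration_commands(data_dir=DATA_DIR, backup=BACKUP, owner="1000:1000"):
    """Build the three shell commands from the recap, in order (hypothetical helper)."""
    return [
        f"sudo rm -fR {data_dir}/*",              # delete the current data
        f"sudo tar -xzf {backup} -C {data_dir}",  # unpack the backup
        f"sudo chown -R {owner} /opt/clearml",    # set permissions
    ]


def run_migration(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            # shell=True so that the shell expands '*' and '~'
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run_migration(migration_commands(), dry_run=True)
```

Running it as-is only prints the commands, so they can be reviewed before switching `dry_run` to `False`.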
one year ago
0 [Errors When Migrating Clearml Server From Aws To Gcp]

@<1523701070390366208:profile|CostlyOstrich36> thank you, now everything works so far!
Last thing: Is there any way to change all the links in the new ClearML server such that an artifact that was previously under s3://… is now taken from gs://… ? The actual data is already available under the gs:// link of course
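
To make the intended mapping concrete, here is a minimal sketch of the prefix rewrite being asked about. It is plain string handling only: the function name `rewrite_uri` and the bucket names are hypothetical, and how such a rewrite would be applied to the records stored by the ClearML server is not covered here.

```python
def rewrite_uri(uri, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix if uri starts with it; otherwise return uri unchanged."""
    if uri.startswith(old_prefix):
        return new_prefix + uri[len(old_prefix):]
    return uri


if __name__ == "__main__":
    # Hypothetical bucket names, for illustration only.
    print(rewrite_uri("s3://old-bucket/project/model.pt",
                      "s3://old-bucket/", "gs://new-bucket/"))
```

Anchoring the match at the start of the string leaves URIs from other buckets or schemes untouched.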

one year ago
0 How Do I View Debug Samples Images In The Browser When The Output_Uri Is On Google Cloud Storage (

I’m on Safari actually, but I just checked on Chrome (which shows this insecure connection indicator) and images are activated. Might it still be due to the non-HTTPS connection? We should get on that anyhow

one year ago
0 How Do I View Debug Samples Images In The Browser When The Output_Uri Is On Google Cloud Storage (

Yes and yes. Is that the issue, and would it likely go away if we hosted it via HTTPS?

one year ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

So my own repo I’m launching with either
torchrun --nproc_per_node 2 --standalone --master_addr 127.0.0.1 --master_port 29500 -m my_folder.my_script --some_option
or
python3 -m torch.distributed.launch --nproc_per_node 2 --master_addr 127.0.0.1 --master_port 29500 -m my_folder.my_script --some_option
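
For context, torchrun passes rank information to each worker process through the environment variables `RANK`, `LOCAL_RANK` and `WORLD_SIZE`, so a launched script can tell which process it is. A minimal sketch of reading them (the helper name `read_rank_info` is made up for illustration, and the fallback values assume a plain single-process run):

```python
import os


def read_rank_info(env=None):
    """Return (rank, local_rank, world_size) from torchrun-style environment variables."""
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))              # global rank of this process
    local_rank = int(env.get("LOCAL_RANK", 0))  # rank within this node
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = read_rank_info()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

With `--nproc_per_node 2` on a single node, this would be run twice, once per worker, each seeing a different `RANK`.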

one year ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

Hi @<1523701205467926528:profile|AgitatedDove14> , so I’ve managed to reproduce a bit more.
When I run very basic code via torchrun or torch.distributed.run then multiple ClearML tasks are created and visible in the UI (screenshot below). The logs and scalars are not aggregated but the task of each rank reports its own.

If however I branch out via torch.multiprocessing like below, everything works as expected. The “script path” just shows the single python script, all logs an...

one year ago
0 [Clearml With Pytorch-Based Distributed Training] Hi Everyone! Is The Combination Of Clearml With

Results of a bit more investigation:

The ClearML example does use the Pytorch dist package but none of the DistributedDataParallel functionality; instead, it reduces gradients “manually”. This script is also not prepared for torchrun, as it launches more processes itself (without using the multiprocessing of Python or Pytorch).

When running a simple example (code attached...

one year ago
0 [Caching Of Environment And Storage When Using Aws Auto Scaler]

Ok, I re-checked and saw that the data was indeed cached and re-loaded - maybe I waited a little too long last time and it was already a new instance. Awesome implementation guys!

one year ago