GrievingTurkey78
Moderator
34 Questions, 125 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

119 × Eureka!
0 Votes
13 Answers
869 Views
3 years ago
0 Votes
6 Answers
967 Views
Hi! I am saving some intermediate .pt files on the experiments and clearml automatically detects them as models, this makes the clearml.model - INFO message ...
3 years ago
0 Votes
4 Answers
902 Views
Hi! I am having some problems with a loss after a good amount of training, what would be the best way to log a value to have a better idea of what is happening?
2 years ago
0 Votes
2 Answers
879 Views
Hi all! Currently I am trying to create a tool that can perform certain operations on dataset ids, this is a skeleton of what I have in mind (based on the ex...
3 years ago
0 Votes
10 Answers
887 Views
I am also experiencing a weird behaviour when running a script using the module flag. For example, I run: python -m module.script arg1 arg2. And after the scri...
4 years ago
0 Votes
30 Answers
917 Views
Hi! Is there something happening with the ModelCheckpoint callback on tensorflow==2.4.0 ? Using 2.2.0 gave me an input model on the artifacts tab in the GUI 😢
3 years ago
0 Votes
15 Answers
933 Views
Hi 👋 I am trying to set up a trains server on GCP. I followed all the steps listed here https://allegro.ai/docs/deploying_trains/trains_server_gcp/ . I also...
4 years ago
0 Votes
11 Answers
941 Views
Hi! Is there a way to run a task without reporting to the server? For example if I want to debug a script by running it locally without it appearing on the s...
3 years ago
0 Votes
2 Answers
841 Views
Hi! Regarding the artifact.get_local_copy() method, since there is no way to specify the path where the artifact will be downloaded, I wanted to confirm that...
4 years ago
0 Votes
3 Answers
941 Views
Hi! I am trying to run some experiments on an agent I have configured to use the requirements.txt; the problem is it only shows Cython on the list of installe...
3 years ago
0 Votes
2 Answers
907 Views
Hi ! While restarting the server I got ERROR: for agent-services removal of container 8f1d8539340d6d073eb5b51294f5f5d802048a3614d459b5c4fb1d38a05ce538 is alr...
3 years ago
0 Votes
9 Answers
984 Views
Hi! Does ClearML have a way to turn virtual machines on/off depending on whether there are experiments in the queue?
3 years ago
0 Votes
17 Answers
849 Views
3 years ago
0 Votes
2 Answers
859 Views
Hi! I have the previous trains server configured with multiple experiments; I created it using the gcloud images provided. If I want to update the server to ...
3 years ago
0 Votes
12 Answers
877 Views
Hi all! Is there a way for trains to recognize the CLI arguments when using https://github.com/google/python-fire instead of argparse?
4 years ago
0 Votes
2 Answers
905 Views
Hi! What would be the way for manually uploading a model? I have intermediate .pt files which I don't want to upload. Is there a way to turn off clearml capt...
3 years ago
0 Votes
2 Answers
935 Views
Hi! I changed from trains to clearml and ran some experiments using keras but it seems the metrics are not being tracked automagically, has anyone run into t...
3 years ago
0 Votes
2 Answers
919 Views
I am trying to upgrade from clearml server 0.16 to the newest version but I am getting some errors when spinning up the new containers: WiredTiger error (-31...
3 years ago
0 Votes
3 Answers
924 Views
Hi! I have some ClearML agents on GCP and sometimes the instance seems to reboot making the experiment fail and all the progress is lost. What is the best wa...
2 years ago
0 Votes
5 Answers
964 Views
Hi! I was taking a look at the https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_cli.html and wanted to know if anyone has used clearml wit...
3 years ago
0 Votes
7 Answers
1K Views
Hi! If I have a pipeline on gitlab that uses ClearML for some tests is there some way to setup the credentials so that it doesn’t fail?
3 years ago
0 Votes
21 Answers
908 Views
Hi! Any idea why clearml fails to detect iteration reporting? ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-fr...
3 years ago
0 Votes
8 Answers
769 Views
Hello 👋 I am using a self-hosted clearml setup using the requirements file of the project. When I run the task it is failing and I get: Collecting torch==2.0...
one year ago
0 Votes
5 Answers
881 Views
Hi, with the upcoming version of Hydra it seems the binding breaks. Specifically in the run_job function the argument order changed from https://github.com/f...
3 years ago
0 Votes
10 Answers
903 Views
Hi! I am getting the following error on an agent: /usr/local/bin/python3.8: No module named virtualenv clearml_agent: ERROR: Command '['python3.8', '-m', 'vi...
2 years ago
0 Votes
3 Answers
883 Views
Hi! I recently updated my server and my clearml version, now when I set a task to be executed remotely its default state is aborted hence I have to reset and...
3 years ago
0 Votes
7 Answers
1K Views
Hi! I am currently using Hydra+ClearML and wanted to know if there are still some updates coming. At the moment, if I change the defaults hydra uses from the...
3 years ago
0 Votes
2 Answers
897 Views
Hi AgitatedDove14 ! Regarding the Hydra integration, which pattern should be used? Call the task inside the decorated function? Will this store the parameter...
3 years ago
0 Votes
4 Answers
912 Views
Hi! If I have a folder with multiple ckpt files would the manual way to upload them be the following: output_model = OutputModel(task) output_model.update_we...
2 years ago
0 Votes
6 Answers
924 Views
Hi! I have some agents on GCP. Lately I have been getting some experiments that simply stop running (no signs that the experiment crashed). Here is a plot th...
3 years ago
0 Hello

What additional context do you need?

one year ago
0 Hi! Regarding The

Thanks for the info AgitatedDove14 !

4 years ago
0 Hi! I Have Some Agents On Gcp. Lately I Have Been Getting Some Experiments That Simply Stop Running (No Signs That The Experiment Crashed). Here Is A Plot That Shows The Resource Monitoring. Any Ideas On What Could Be Causing This?

Hey CostlyOstrich36! I am using clearml==1.1.2 and clearml-agent==1.1.0. Stopped is not the right word, more like frozen; it just froze at an epoch. The console on the agent shows epoch 33 first batch and the one on the server shows epoch 32 last batch. The experiment had been running for ~6 hours.

3 years ago
0 Hi

The plot is generated and added to tensorboard but seems clearml is not catching it.

2 years ago
0 Hey Everyone- I Have An Issue Started Today With Trains-Agent Which I’m Getting This Error On Startup:

` File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/trains/backend_api/session/token_manager.py", line 72, in _get_token_exp
return jwt.decode(token, verify=False).get('exp', sys.maxsize)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 113, in decode
decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
File "/home/ramon/.trains/venvs-builds/3.7/lib/python3.7/site-packages/jwt/api_jwt.py", line 80, in decode_c...

3 years ago
0 Hey Everyone- I Have An Issue Started Today With Trains-Agent Which I’m Getting This Error On Startup:

I am still getting the error even with the v0.16.3 agent, is there something else we have to do other than updating it?

3 years ago
0 Hello! There Is Great Alternative For Argparse Developed By Facebook For Ml Named

AgitatedDove14, from this thread I understand hydra is not supported and therefore overriding the parameters from the UI won’t work, but is there still a way to track and add the parameters to the experiment? Will task.connect_configuration work with the yaml files?

3 years ago
0 Hello! There Is Great Alternative For Argparse Developed By Facebook For Ml Named

Just to make sure I get everything right, AgitatedDove14:
We have to define the Task inside the function decorated with @hydra.main.
We can modify the parameters that are overridden in the UI under: configuration tab -> Args -> overrides -> modify the list.
Additional question: Will the sweep functionality work?

3 years ago
0 Hi! I Am Getting The Following Error On An Agent:

It is the latest RC, I get the following:
` Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults 'pip<20.2' --quiet --json
Pass
Trying pip install: /home/ramon/.clearml/venvs-builds/3.8/task_repository/my-rep.git/requirements.txt
Executing Conda: /opt/conda/bin/conda install -p /home/ramon/.clearml/venvs-builds/3.8 -c pytorch -c conda-forge -c defaults numpy==1.20.3 --quiet --json
Pass
Warning, could not locate PyTorch to...

2 years ago
0 Hi! I Am Getting The Following Error On An Agent:

I have the agent configured to force install requirements.txt

2 years ago
0 Hi! I Am Getting The Following Error On An Agent:

Give me a couple of minutes 🙌

2 years ago
0 Hi, I Was Getting A Really Weird Error Due To Mismatch On The Versions Between The Installed Libraries In My Environment And The Ones Ran In The Node (I Manually Changed The Installed Packages And Everything Worked). How Can I Force Trains To Use Exactly

Pigar is capturing different versions than the ones I have installed on my local machine (not a problem except for one). I just want to force the version of that package so that I don’t have to manually change it from the UI for every experiment.

3 years ago
0 Hi, I Was Getting A Really Weird Error Due To Mismatch On The Versions Between The Installed Libraries In My Environment And The Ones Ran In The Node (I Manually Changed The Installed Packages And Everything Worked). How Can I Force Trains To Use Exactly

TimelyPenguin76 I found out it’s just one package that is causing the error (cloudpickle breaks everything). Is there a way to use Pigar but force a single package to have a version?

3 years ago
0 Hi, I Was Getting A Really Weird Error Due To Mismatch On The Versions Between The Installed Libraries In My Environment And The Ones Ran In The Node (I Manually Changed The Installed Packages And Everything Worked). How Can I Force Trains To Use Exactly

No, I have all the packages with a version. I just want to know if there is a way to override the requirement versions detected by Pigar when using detect_with_pip_freeze: false. Locally I have cloudpickle==1.4.1, but when running the code and sending the task to the node the environment uses cloudpickle==1.6.0. I have to manually change the version in the UI. Is there a way to force this single package to have a version? Maybe in the requirements.txt or something similar.
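
To make the request concrete: the override being asked for amounts to a post-processing step on the detected requirement lines. The snippet below is only a rough sketch of that idea, not how ClearML or Pigar behave; `pin_requirement` is a hypothetical helper, and it assumes plain lines such as `name==1.2.3` with no extras or environment markers.

```python
import re


def pin_requirement(lines, package, version):
    """Return a copy of `lines` with `package` forced to `package==version`.

    Hypothetical helper for illustration only; not part of ClearML or Pigar.
    Assumes simple requirement lines such as "name==1.2.3" or "name>=1.0".
    """
    # Match the package name followed by end-of-line or a version specifier,
    # so that "cloudpickle-extra" is not mistaken for "cloudpickle".
    pattern = re.compile(rf"^\s*{re.escape(package)}\s*($|[=<>!~])", re.IGNORECASE)
    pinned = f"{package}=={version}"
    result = []
    replaced = False
    for line in lines:
        if pattern.match(line):
            # Keep a single pinned entry and drop any duplicate entries.
            if not replaced:
                result.append(pinned)
                replaced = True
        else:
            result.append(line)
    if not replaced:
        # The package was not detected at all, so add the pin at the end.
        result.append(pinned)
    return result


# Versions taken from the post: 1.6.0 was detected, 1.4.1 is installed locally.
detected = ["numpy==1.20.3", "cloudpickle==1.6.0"]
forced = pin_requirement(detected, "cloudpickle", "1.4.1")
```

Here `forced` keeps every detected line untouched except the one for cloudpickle, which is the behaviour described in the question.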

3 years ago
0 Hi

I enabled both https and http

4 years ago
0 Hi

SuccessfulKoala55 just to let you know: since I opened the link straight from the GCP console it was using https on the address instead of http hence the error. Thanks a lot for your help!

4 years ago
0 Hi

SuccessfulKoala55 on both 8080 and 8008 I get: Safari can’t open the page http://<External IP>:80XX because Safari can’t establish a secure connection to the server http://<External IP>:80XX .

4 years ago
0 Hi

I configured a firewall rule that opened the ports for the instance (not 100% sure if this is the right way) using network tags. Yes, the whole screen is black and no trains logo shows up: Safari can’t open the page because the server where this page is located isn’t responding.

4 years ago
0 Hi All! Is There A Way For Trains To Recognize The Cli Arguments When Using

Yes, exactly! Unfortunately I am not so familiar with the internals of the library but I could take a look and figure that out.

4 years ago
0 Hi All! Is There A Way For Trains To Recognize The Cli Arguments When Using

Yes AgitatedDove14, I added the git user name and password to the trains.conf file. On the results tab of the UI, the clone command in the logs shows the SSH command instead of the HTTPS one:
Repository cloning failed: Command ['clone', 'git@gitlab.com: ...

4 years ago
0 Hi All! Is There A Way For Trains To Recognize The Cli Arguments When Using

Yes, it’s similar; somewhat more automatic since it detects the classes of the function arguments and generates the CLI. What do you mean by that, AgitatedDove14: get all the parameters and use task.connect?

4 years ago
0 Hi! I Am Trying To Download Data From Gs Using

Yes! How can I help? AgitatedDove14

4 years ago