Hi CheekyFox58
If you are running the HPO+training on your own machine, it should work just fine in the Free tier
The HPO with the UI and everything is designed to run the actual training on remote machines, and I think this makes it a Pro feature.
Hi PunyGoose16,
The next release includes it (ETA after this weekend)
Now in case I needed to do it, can I add new parameters to cloned experiment or will these get deleted?
Adding new parameters is supported
Full markdown edit on the project so you can create your own reports and share them (you can also put links to the experiments themselves inside the markdown). Notice this is not per experiment reporting (we kind of assumed maintaining a per experiment report is not realistic)
JitteryCoyote63 what am I missing?
What are the errors you are getting (with / without the envs)
What's the trains-server version ?
GloriousPanda26 Are you getting multiple Tasks or is it a single Task ?
Hi SpotlessLeopard9
I got many tasks that were just hang at the end of the script without ...
I remember this exact issue was fixed with 1.1.5rc0, see here:
https://clearml.slack.com/archives/CTK20V944/p1634910855059900
Can you verify with the latest RC?
pip install clearml==1.1.5rc3
Ohh try to add --full-monitoring to the clearml-agent execute
I think that listing them all would just clutter up the results tab for that pipeline task
Can you share a screen so we better understand the clutter ?
Also "1000 components" ?! and not using them ? could you expand on how/why?
that means I need to pass a single zip file to the path argument in add_files, right?
Actually the opposite: you pass a folder (of files) to add_files. Then add_files remembers the files' locations (and pre-calculates the hash of each file's content). When you call upload, it compresses the files that changed into a zip file (or several files, depending on the chunk size) and uploads them to the destination (as specified in the upload call...
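To make the flow described above concrete, here is a minimal standard-library sketch of the idea: remember files and their content hashes, then zip only what changed. This is a toy illustration, not ClearML's implementation; the class `FolderTracker` and its attributes are hypothetical names, and only the method names `add_files` and `upload` mirror the calls mentioned in the reply.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


class FolderTracker:
    """Toy stand-in for the add_files / upload flow (hypothetical, not ClearML code)."""

    def __init__(self):
        self.known = {}    # relative path -> content hash recorded at the last upload
        self.pending = {}  # relative path -> (absolute path, hash) waiting for upload

    def add_files(self, folder):
        """Walk a folder, hash every file, and queue the new or changed ones."""
        folder = Path(folder)
        for f in sorted(folder.rglob("*")):
            if f.is_file():
                digest = hashlib.sha256(f.read_bytes()).hexdigest()
                rel = f.relative_to(folder).as_posix()
                if self.known.get(rel) != digest:
                    self.pending[rel] = (f, digest)

    def upload(self, zip_path):
        """Compress only the queued files into a zip and return their names."""
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for rel, (f, digest) in self.pending.items():
                zf.write(f, arcname=rel)
                self.known[rel] = digest
        uploaded = sorted(self.pending)
        self.pending = {}
        return uploaded


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "a.txt").write_text("one")
    (data / "b.txt").write_text("two")

    tracker = FolderTracker()
    tracker.add_files(data)                       # pass the folder, not a zip
    first = tracker.upload(Path(tmp) / "v1.zip")  # both files are new

    (data / "b.txt").write_text("two, edited")
    tracker.add_files(data)
    second = tracker.upload(Path(tmp) / "v2.zip")  # only the changed file
```

The point of the sketch is only that the caller hands over a folder and the zipping happens later, at upload time, for the files whose hash changed.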
Hi @<1625303806923247616:profile|ItchyCow80>
Could you add some prints? Is it working without the Task.init call? The code looks okay, and the "No repository found" message basically says it is logged as a standalone script (which makes sense).
I'm so glad you mentioned the cron job, it would have taken us hours to figure
Hi DepressedChimpanzee34
How do I reproduce the issue ?
What are we expecting to get there ?
Is that a Colab issue or hyper-parameter encoding issue ?
WackyRabbit7 If you have an idea on an interface to shut it down, please feel free to suggest?
If i point directly to the data.yaml the training starts without any problem
what do you mean? how do you know where the extracted file is?
basically:
data_path = Dataset.get(...).get_local_copy()
then you should be able to open your file with open(data_path + "/data.yaml", "rt")
Does that work?
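A small sketch of the same idea with an explicit existence check, so a wrong folder fails with a clear message. The helper `read_dataset_file` is a hypothetical name, and a temporary folder stands in for the path that Dataset.get(...).get_local_copy() would return, since that call needs a configured server.

```python
import tempfile
from pathlib import Path


def read_dataset_file(data_path, name="data.yaml"):
    """Open a file inside the extracted dataset folder (hypothetical helper)."""
    file_path = Path(data_path) / name
    if not file_path.is_file():
        raise FileNotFoundError(f"{name} not found under {data_path}")
    return file_path.read_text()


# Stand-in for: data_path = Dataset.get(...).get_local_copy()
with tempfile.TemporaryDirectory() as data_path:
    (Path(data_path) / "data.yaml").write_text("nc: 2\n")
    content = read_dataset_file(data_path)
```

Listing the folder (for example with sorted(Path(data_path).rglob("*"))) is a quick way to check where data.yaml actually ended up after extraction.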
Hi ZealousSeal58
What's the clearml version you are using ?
If there was a "debug mode" for viewing the stack trace before the crash that would've been most helpful...
import traceback
traceback.print_stack()
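As a sketch of how this can be used, the call can be dropped into the function suspected of running just before the crash. The function names `risky_step` and `train` are made up for illustration.

```python
import traceback


def risky_step():
    # Print the chain of calls that led here to stderr.
    traceback.print_stack()
    # format_stack() returns the same information as a list of strings,
    # which is handy if you would rather log it than print it.
    return traceback.format_stack()


def train():
    return risky_step()


frames = train()
```

Note that this shows the stack at the point of the call; it will not help if the process dies before reaching it.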
Hi SmoggyGoat53
There is a storage limit on the file server (basically a 2GB per file limit), and this is the cause of the error.
You can upload the 10GB to any S3-like solution (or a shared folder). Just set the "output_uri" on the Task (either at Task.init or with Task.output_uri = "s3://bucket")
Hmm so yes that is true, if you are changing the bucket values you will have to manually also adjust it in grafana. I wonder if there is a shortcut here, the data is stored in Prometheus, and I would rather try to avoid deleting old data, Wdyt?
I can't find out how to pass my custom clearml.conf
Hi @<1544491301435609088:profile|TeenyElk27>
The easiest is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
No, TB (TensorBoard) is not enabled.
That explains it. Did you manage to get it working?
Can you see the repo itself ? the commit id ?
From the docs I think what's going on is that the https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html#opennmt.Runner.train is spinning a new subprocess, and the training itself happens on the subprocess.
If this is the case this will explain the lack of automagic, as the subprocess is lacking the "Task.init" call
wdyt, could that be the case ?
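A minimal sketch of why a freshly spawned interpreter would not see anything set up in the parent process. The marker `HOOK_INSTALLED` is a hypothetical stand-in for whatever in-process hooks Task.init installs; this only illustrates the hypothesis above and says nothing about how OpenNMT-tf actually launches its training.

```python
import builtins
import subprocess
import sys

# Parent process: install an in-process marker (stand-in for the hooks set by Task.init).
builtins.HOOK_INSTALLED = True


def hook_visible_in_child():
    """Start a fresh interpreter and ask whether it can see the parent's marker."""
    code = "import builtins; print(hasattr(builtins, 'HOOK_INSTALLED'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


parent_has_hook = hasattr(builtins, "HOOK_INSTALLED")
child_has_hook = hook_visible_in_child()
```

In-memory state set up in the parent does not carry over to a new interpreter, which would fit the missing automagic if the training really runs in such a subprocess.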
Hmm DepressedChimpanzee34 my bad, it seems the loading is done via a YAML loader, but the dumping is straightforward str casting...
https://github.com/allegroai/clearml/blob/6e6271fb91f2aeb2aa7a13c6d07d4e635baaa670/clearml/backend_interface/task/task.py#L934
What would you expect to get? (BTW "value\blah" is probably not the string you intended in Python: \b is the backspace escape character, so it should be "value\\blah", which translates into the text "value\blah")
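To see what each literal actually contains, a quick check in plain Python (the variable names are only for illustration):

```python
single = "value\blah"    # \b is parsed as one backspace character (0x08)
escaped = "value\\blah"  # \\ is parsed as one literal backslash
raw = r"value\blah"      # a raw string keeps the backslash as written

has_backspace = "\x08" in single
same_text = escaped == raw
lengths = (len(single), len(escaped))
```

Printing repr(single) and repr(escaped) side by side makes the difference visible.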
This is by design, they cannot use the exact same venv because if the code starts creating files/change them it happens inside the venv and might cause them to crash.
That said if you are running with venv cache, the first one will create the venv and the second one will create a copy from the cache.
Seems like an okay clearml.conf file
Notice this is the error: 404. Can you curl to this address? Are you sure you have https and not http? Was the DNS configured?
Hi @<1528908687685455872:profile|MassiveBat21>
However no useful template is created for downstream executions - the source code template is all messed up,
Interesting, could you provide the code that is "created", or even better some way to reproduce it ? It sounds like sort of a bug? or maybe a feature support that is missing.
My question is - what is a best practice in this case to be able to run exported scripts (python code not made availa...
UnevenDolphin73 I would use APIClient:
from clearml.backend_api.session.client import APIClient
APIClient().projects.edit(project=project_id, system_tags=[])
*I might have a few typos above but that should be the gist