So I shouldn't even need to call the task.set_initial_iteration function
I think just removing this call should solve it. What's probably going on is that it's called twice (once internally, once manually by your code).
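Something along these lines (just a sketch; the project/task names and the training loop are placeholders for your own code):

from clearml import Task

# Placeholder project/task names, just to illustrate the point
task = Task.init(project_name="examples", task_name="resume-run", continue_last_task=True)

# The SDK already applies the iteration offset internally when continuing,
# so the explicit call below is redundant and would offset twice:
# task.set_initial_iteration(last_iteration)  # <- remove this manual call

for i in range(100):
    task.get_logger().report_scalar("loss", "train", value=1.0 / (i + 1), iteration=i)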
UnsightlyShark53 Awesome, the RC is still not available on pip, but we should have it in a few days.
I'll keep you posted here :)
One last question: Is it possible to set the pip_version task-dependent?
No... but why would it matter on a Task basis? (Meaning, what would be a use case for changing the pip version per Task?)
(As I see it, the services worker is only on the services queue, and not on my default queue, where my other servers/workers are.)
So basically the service-mode is just a flag passed to the agent, and the services queue is the name of the queue it will pull from.
If I want a normal worker as well?
You can just add another section to the docker-compose, or run it manually after you spin up the docker-compose.
LazyFox65 wdyt ?
You should pass only_published:
https://github.com/allegroai/clearml/blob/071caf53026330f3bb8019ee5db3d039562072f3/clearml/model.py#L444
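If this is about querying models, a minimal sketch could look like this (assuming Model.query_models is the call behind that link; the project name is a placeholder):

from clearml import Model

# Only return models that were published (placeholder project name)
models = Model.query_models(project_name="my_project", only_published=True)
for m in models:
    print(m.id, m.name)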
This is exactly what I did here, and it is working :)
https://demoapp.demo.clear.ml/projects/0e919ea1cc5c499b99e1ab85004b6e97/experiments/887edef09d4549e88b829a34c87d4d5b/output/execution
In my repo I maintain a bash script to set up a separate Python env.
Hmm interesting, now I have to wonder what the difference is? Meaning, why doesn't the agent build a similar one based on the requirements?
It's the same but done from the outside; you want the same but "offline" as well, right?
Hi DefeatedCrab47
You mean by trains-agent, or accumulated over all experiments?
Simply record the type of each argument when you store it, and keep it in the database, transparently to the user. What do you say?
This is now supported, but then you still need to flatten the dict.
Maybe we can just support "empty_dict/new_value = 42" if the original was "empty_dict = {}"
WDYT?
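Something like this might be what flattening looks like in practice (just a sketch; the flatten helper and the names are made up for illustration):

from clearml import Task

def flatten(d, prefix=""):
    # Turn nested dicts into "parent/child" keys so values such as
    # "empty_dict/new_value" can be overridden later
    flat = {}
    for k, v in d.items():
        key = "{}{}".format(prefix, k)
        if isinstance(v, dict) and v:
            flat.update(flatten(v, prefix=key + "/"))
        else:
            flat[key] = v
    return flat

task = Task.init(project_name="examples", task_name="nested-config")  # placeholder names
config = {"empty_dict": {}, "lr": 0.001}
task.connect(flatten(config), name="General")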
I think latest:
clearml==1.17.0
matplotlib==3.6.2
shap==0.46.0
Python 3.10
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good idea.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, not that it will just sleep for 99999 sec :)
(this is the part that is not in the background, so if the epoch is short it might have an effect)
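For reference, this is roughly what I mean (a sketch; assuming the "resource timeout" above is Task.set_resource_monitor_iteration_timeout, and download() is your own blocking call):

from clearml import Task

task = Task.init(project_name="examples", task_name="long-download")  # placeholder names

# Give the resource monitor more time before it expects the first reported
# iteration, so a long blocking download at the start does not throw it off
task.set_resource_monitor_iteration_timeout(seconds_from_start=99999)

download()  # your long blocking call (user-defined)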
I think this was the issue: None
And that caused the TF binding to skip logging the scalars, and from that point it broke the iteration numbering and so on.
Okay, I found it. This is due to the fact that the newer versions send the events/images in a subprocess (it used to be a thread).
The creation of the object is done in the main process, which updates the file index (in a round-robin manner), but the check itself happens in the subprocess, which is not "aware" of the used indexes (i.e. it is always 0, hence when exceeding the history size, it skips it).
Can you fix locally, just to verify ?
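Not the actual ClearML code, just a generic illustration of why a counter updated in the main process is invisible to an already-started subprocess:

import multiprocessing as mp
import time

class Recorder:
    def __init__(self):
        self.file_index = 0  # round-robin index, bumped by the main process

    def worker(self):
        # The subprocess got a copy of the object when it started, so any
        # increment done later in the parent is not visible here
        time.sleep(1)
        print("subprocess sees index:", self.file_index)  # prints 0

if __name__ == "__main__":
    r = Recorder()
    p = mp.Process(target=r.worker)
    p.start()
    for _ in range(5):
        r.file_index += 1  # parent keeps advancing the index after the start
    print("main process index:", r.file_index)  # prints 5
    p.join()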
help_models is a dir in the git repo
And the git is registered on the experiment correctly ?
Sorry, you are correct this is where the json is created:
https://github.com/huggingface/transformers/blob/040283170cd559b59b8eb37fe9fe8e99ff7edcbc/src/transformers/feature_extraction_utils.py#L470
The other links are the functions calling it. Make sense?
Hi ShallowArcticwolf27
However, the AMI for version 0.16.1 has the following docker-compose file
I think we moved the docker-compose yaml when we upgraded from trains to clearml. Any reason you are installing the old docker-compose?
So you want these two on two different graphs ?
Hi @<1668065560107159552:profile|VivaciousPenguin20>
I think you are looking at the wrong experiment; this is a 3-year-old experiment? It does not seem to be your currently executing experiment, right?
StorageHelper is used internally.
I'll make sure we remove it from the examples/docs
Hmm that is odd, could it be you are changing the sys.path ?
(What I'm assuming is happening is that it detects the packages in the PYTHONPATH and for some reason the order is different so it finds the "system" package before the "venv" package, hence the incorrect version)
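A quick way to check (generic Python, nothing ClearML-specific; swap in whichever package shows the wrong version):

import sys

import numpy  # placeholder: use the package whose version looks wrong

# Which copy did Python actually pick up, and in what order are paths searched?
print(numpy.__version__, numpy.__file__)
for p in sys.path:
    print(p)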
Seems correct.
I'm assuming something is wrong with the key/secret quoting ?!
Could you generate another one and test it ?
(you can have multiple key/secrets on the same user)
What about output_uri?
If you are using StorageManager directly, output_uri is not relevant
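To illustrate the difference (a sketch; bucket, file names and project/task names are placeholders):

from clearml import Task, StorageManager

# output_uri only controls where artifacts/models registered through the Task go
task = Task.init(
    project_name="examples",              # placeholder
    task_name="storage-demo",             # placeholder
    output_uri="s3://my-bucket/clearml",  # placeholder bucket
)

# StorageManager uploads directly to the URL you give it and ignores output_uri
remote = StorageManager.upload_file(
    local_file="model.pkl",                       # placeholder file
    remote_url="s3://my-bucket/manual/model.pkl", # explicit destination
)
print(remote)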
Correct (if this is running on k8s, it is most likely passed via env variables, CLEARML_WEB_HOST etc.)
Done :)
Basically try with the latest RC :)
pip install trains==0.15.2rc0