Task.debug_simulate_remote_task
simulates the Task being executed by the agent (basically the same behaviour, only local). The argument it gets is the Task ID (string).
The way to see how it works is to run the code once (without the debug_simulate call), get the Task ID we created, then rerun with debug_simulate_remote_task passing the previous Task ID.
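A minimal sketch of that flow, assuming a recent clearml SDK (the ID, project and task names are placeholders):
```python
from clearml import Task

# Use the Task ID from the first (regular) run of your script;
# 'aabb1122ccdd' is just a placeholder.
Task.debug_simulate_remote_task(task_id='aabb1122ccdd')

# From here on, Task.init() behaves as if the agent were executing the Task,
# pulling parameters/configuration from the server instead of from the code.
task = Task.init(project_name='examples', task_name='remote debug')
```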
Make sense ?
Is trains-agent using docker-mode or virtual-env ?
Please attach the log 🙂
Hi MelancholyElk85
Can I manually delete .zip files with datasets in .clearml/cache/storage_manager/datasets directory?
Yes, you can. I "think" the .zip is stored for easier access, but you can delete it; as long as the "extracted" folder exists, it should be fine.
If I try to connect a dictionary of type dict[str, list] with task.connect, when retrieving this dictionary with...
Wait, this should work out of the box, do you have any specific example?
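For reference, a quick sketch of what should work out of the box (names and values are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='connect dict')

# a dict[str, list], as in the question above
params = {'layers': [128, 64, 32], 'tags': ['baseline', 'v2']}
task.connect(params)

# locally this is a pass-through; when executed by an agent the values
# are overridden with whatever was edited in the UI
print(params)
```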
Still feels super hacky tho, think it would be nice to have a simpler way or at least some nice documentation
YES you are absolutely correct, we should add it to the Task interface.
Any chance you add a GitHub issue so we do not forget ?
Hi @<1657918724084076544:profile|EnergeticCow77>
Can I launch training with HuggingFace's accelerate package using multi-gpu?
Yes,
It detects torch distributed but I guess I need to setup main task?
It should 🤞
Under the Execution tab, in the script path, you should see something like -m torch.distributed.launch ...
Hi @<1730033904972206080:profile|FantasticSeaurchin8>
Is this only related to this
https://github.com/coqui-ai/Trainer/issues/7
Or is it a clearml sdk issue?
However, SNPE performs quantization with precompiled CLI binary instead of python library (which also needs to be installed). What would be the pipeline in this case?
I would imagine a container with preinstalled SNPE compiler / quantizer, and a python script triggering the process ?
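Something along these lines, as a sketch (the binary name and flags are placeholders for whatever SNPE CLI is preinstalled in the container):
```python
import subprocess
from clearml import Task

task = Task.init(project_name='examples', task_name='snpe quantize')

# hypothetical CLI invocation; replace with the real SNPE quantizer command
proc = subprocess.run(
    ['snpe-dlc-quantize',
     '--input_dlc', 'model.dlc',
     '--output_dlc', 'model_quantized.dlc'],
    capture_output=True, text=True, check=True,
)
print(proc.stdout)  # stdout ends up in the Task console log

# register the quantized model so downstream steps can fetch it
task.upload_artifact('quantized_model', artifact_object='model_quantized.dlc')
```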
one more question: in case of triggering the quantization process, will it be considered a separate task?
I think this makes sense, since you probably want a container with the SNPE environment, m...
Please let me know what you find 🤞
Hi @<1610083503607648256:profile|DiminutiveToad80>
<h1>Request Entity Too Large</h1>
What's the size of the file? how are you running your clearml-server?
Hmm, can you test with the latest RC?
pip install clearml==0.17.6rc1
Also, how would one ensure immutability ?
I guess this is the big question, assuming we "know" a file was changed, this will invalidate all versions using it, this is exactly why the current implementation stores an immutable copy. Or are you suggesting a smarter "sync" function ?
Hi ElegantCoyote26
If there is, it will have to be using the docker-mode, but I do not think this is actually possible because this is not a feature of docker. It is possible to do on k8s, but that's a different level of integration 🙂
EDIT:
FYI we do support k8s integration
Regarding the missing packages, you might want to test with:
force_analyze_entire_repo: false
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L162
Or if you have a full venv you like to store instead:
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L169
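Putting the two options together, a trains.conf sketch (I'm assuming both keys live under sdk.development, per the linked lines):
```
sdk {
    development {
        # analyze only the packages actually imported, not the entire repo
        force_analyze_entire_repo: false
        # or: store the full venv (pip freeze) instead of analyzing imports
        detect_with_pip_freeze: true
    }
}
```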
BTW:
What is the missed package?
Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
See here:
https://github.com/googleapis/python-storage/releases/tag/v1.36.0
https://github.com/googleapis/python-storage/pull/374
Can you test with the latest release, see if the issue was fixed?
https://github.com/googleapis/python-storage/releases/tag/v1.41.0
Hi DeliciousBluewhale87
My theory is that the clearml-agent is configured correctly (which means you see it in the clearml-server). The issue (I think) is that the Task itself (running inside the docker) is missing the configuration. The way the agent passes the configuration into the docker is by mapping a temporary configuration file into the docker itself. If the agent is running bare-metal, this is quite straight forward. If the agent is running on k8s (or basically inside a docker) th...
SmarmyDolphin68 okay, what's happening is the process exits before the actual data is sent (report_matplotlib_figure is an async call, and data is sent in the background).
Basically you should just wait for all the events to be flushed:
task.flush(wait_for_uploads=True)
That said, quickly testing it it seems it does not wait properly (again I think this is due to the fact we do not have a main Task here, I'll continue debugging)
In the meantime you can just do:
sleep(3.0)
And it wil...
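Putting it together, a sketch of the workaround (assuming a Task was initialized; names are placeholders):
```python
import time
from clearml import Task

task = Task.init(project_name='examples', task_name='report figures')
# ... report_matplotlib_figure(...) calls happen here ...

# block until all background events and uploads are sent
task.flush(wait_for_uploads=True)
# temporary extra wait, while flush does not fully wait on its own
time.sleep(3.0)
```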
In the Task log itself it will list the versions of all the packages; basically I wonder whether maybe it is using an older clearml version, and this is why I cannot reproduce it...
Thanks PompousBaldeagle18 !
Which software did you use to create the graphics?
Our designer, should I send your compliments 😉 ?
You should add which tech is being replaced by each product.
Good point! we are also missing a few products from the website, they will be there soon, hence the "soft launch"
Hi HarebrainedBear62
What's the type of data ?
This is odd... can you post the entire trigger code ?
also what's the clearml version?
Just to clarify, where do I run the second command?
Anywhere, just open a python console and import the offline task:
from trains import Task
Task.import_offline_session('./my_task_aaa.zip')
Related, how do I specify in my code the cache_dir where the zip is saved?
This is the Trains cache folder, you can set it in the trains.conf file:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/docs/trains.conf#L24
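For reference, a trains.conf sketch (assuming the cache key from the linked line):
```
sdk {
    storage {
        cache {
            # folder where downloaded artifacts (the zip) are stored
            default_base_dir: "~/.trains/cache"
        }
    }
}
```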
I think so. When you are saying "clearml (bash script..." you basically mean "put my code + packages together and run it", correct?
Hi EnchantingWorm39
Great question!
Regarding the data management, I know the enterprise edition has full support for unstructured data, and we plan to soon have a solution for structured data as part of the open source (soon = hopefully within a month).
Regarding model serving, I know you can integrate with TFServing or Seldon with very little effort (usually the challenge is creating triggers etc., but in most cases this is custom code anyhow 🙂 )
I do not have experience with Cortex/B...
Hi, Is there a way to stop a clearml-agent from within an experiment?
It is possible but only in the paid tier (it needs backend support for that) 😞
My use case is: in a spot instance marked for termination after 2 mins by AWS
Basically what you are saying is you want the instance to spin down after the job is completed, correct?