Hi @<1523701132025663488:profile|SlimyElephant79>
I would like to save only the last & best checkpoints and not all of them if possible.
Basically it will mimic the local file system, so if you overwrite the local files it will overwrite the remote model.
You can also disable auto logging, and manually upload the models
In Task.init, pass auto_connect_frameworks with the specific framework set to False
see:
[None](https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-lo...
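A minimal sketch of the manual route, assuming PyTorch is the framework being saved and that a local file named `best_model.pt` already exists (the project, task, model and file names are all hypothetical). Frameworks missing from the `auto_connect_frameworks` dict keep their automatic logging. Reusing one `OutputModel` for every upload should keep a single model entry whose weights get overwritten.

```python
def framework_flags(disabled=("pytorch",)):
    """Build the auto_connect_frameworks dict: the listed frameworks are switched off."""
    return {name: False for name in disabled}


def upload_best_checkpoint(weights_path="best_model.pt"):
    # Imported lazily so framework_flags() can be used without clearml installed.
    from clearml import Task, OutputModel

    task = Task.init(
        project_name="examples",          # hypothetical project name
        task_name="manual model upload",  # hypothetical task name
        auto_connect_frameworks=framework_flags(),
    )
    # One OutputModel object: each update_weights call replaces its weights file.
    best_model = OutputModel(task=task, name="best")
    # auto_delete_file=False keeps the local copy after the upload.
    best_model.update_weights(weights_filename=weights_path, auto_delete_file=False)
    return best_model
```

Call `upload_best_checkpoint()` whenever a new best checkpoint is written; a second `OutputModel` named "last" would cover the last-checkpoint case the same way.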
Hi @<1523715429694967808:profile|ThickCrow29>
Is there a way to specify a callback upon an abort action from the user
You mean abort of the entire pipeline?
Not at all, we love ideas on improving ClearML.
I do not think there is a need to replace feast, it seems to do a lot, I'm just thinking of integrating it into the ClearML workflow. Do you have a specific use case we can start to work on? Or maybe a workflow that would make sense to implement?
Hi @<1523701066867150848:profile|JitteryCoyote63>
Thank you for bringing it up! Can you verify with the latest clearml-agent 1.5.3rc2?
are models technically Tasks and can they be treated as such? If not, how to delete a model permanently (both from the server and from AWS storage)?
When you call Task.delete() it actually goes over all the models/artifacts and deletes them from the storage
The wheel you download from pip, for example this one torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl
is actually both CPU and cuda 117
So is there any tutorial on this topic
Dude, we just invented it
Any chance you feel like writing something in a github issue, so other users know how to do this ?
Guess I'll need to implement job schedule myself
You have a scheduler, it will pull jobs from the queue by order, then run them one after the other (one at a time)
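To make the "by order, one at a time" behaviour concrete, here is a toy sketch of a FIFO worker loop. It is only an illustration of the idea, not the agent's code, and `run_queue` and the job names are made up for the example.

```python
from collections import deque


def run_queue(jobs):
    """Run (name, callable) jobs strictly in the order they were enqueued."""
    queue = deque(jobs)
    finished = []
    while queue:
        name, job = queue.popleft()  # oldest job first
        job()                        # blocks until this job is done
        finished.append(name)
    return finished


log = []
# Three hypothetical jobs; each one just records that it ran.
jobs = [(f"job-{i}", (lambda i=i: log.append(i))) for i in range(3)]
order = run_queue(jobs)
print(order)
```

The next job is only popped after the previous one returns, which is the same guarantee a single agent listening on a single queue gives you.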
SlipperyDove40 following on the missing section name, this seems like a backwards compatibility issue. Try calling with backwards_compatibility=False
` my_params = Task.get_parameters(backwards_compatibility=False) `
This should always add the section name prefix.
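Once the section prefix is always there, the flattened keys can be split back into sections. A small sketch, assuming keys of the form `Section/name`; the sample values and the `group_by_section` helper are hypothetical, not part of ClearML.

```python
from collections import defaultdict


def group_by_section(flat_params):
    """Turn {'Section/name': value} into {section: {name: value}}."""
    grouped = defaultdict(dict)
    for key, value in flat_params.items():
        section, sep, name = key.partition("/")
        if not sep:
            # No prefix at all: the legacy (backwards compatible) layout.
            section, name = "<no section>", key
        grouped[section][name] = value
    return dict(grouped)


# Hypothetical values, shaped like a flattened parameters dict.
example = {"Args/batch_size": "32", "Args/lr": "0.001", "General/seed": "1"}
print(group_by_section(example))
```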
So there is a hack for it:
` CLEARML_OFFLINE_MODE=1 python3 my_main.py `
Which is the same as calling Task.set_offline
Then inside the code, after the Task.init call:
` task = Task.init(...)
# not sure what the if here is?!
Task.debug_simulate_remote_task(task_id="offline-1") `
This will make things act as if this is running remotely, i.e. your Task.running_remotely() logic will be called.
Do notice that in remote mode, all the arguments / data is read from the clearml-server into the cod...
now realise that the ignite events callbacks seem to not be fired
So this is an ignite issue ?
I get the same "white" image in both TB & ClearML
Hi MiniatureCrocodile39
I would personally recommend the ClearML show
https://www.youtube.com/watch?v=XpXLMKhnV5k
https://www.youtube.com/watch?v=qz9x7fTQZZ8
Hi EagerOtter28
The agent knows how to do the http->ssh conversion on the fly, in your clearml.conf (on the agent's machine) set force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
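For a feel of what an http->ssh conversion looks like, here is a rough sketch. It only illustrates the general shape of the rewrite and is not the agent's actual implementation; the `git@host:path` output form and the `https_to_ssh` helper are assumptions.

```python
from urllib.parse import urlparse


def https_to_ssh(url):
    """Rewrite an http(s) git URL as an scp-style ssh URL (illustrative only)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return url  # already ssh (or something else): leave it untouched
    path = parsed.path.lstrip("/")
    return f"git@{parsed.hostname}:{path}"


print(https_to_ssh("https://github.com/allegroai/clearml-agent.git"))
```

With force_git_ssh_protocol enabled the experiment can keep its https repository link, and the SSH keys on the agent's machine take care of authentication.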
Maybe the only thing to worry about is making sure the IP address is stable, so if k8s replaces the node, you do not have to reconfigure the clients
Can you please elaborate on the latter point? My jupyterhub's fully containerized and allows users to select their own containers (from a list I built) at launch, and launch multiple containers at the same time, not sure I follow how toes are stepped on.
Definitely a great start, usually it breaks on memory / GPU-mem where too many containers on the same machine are eating each other's GPU RAM (that cannot be virtualized)
If there is new issue will let you know in the new thread
Thanks! I would really like to understand what is the correct configuration
` from time import sleep
from clearml import Task
import tqdm
task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
    sleep(1)
print('done') `
The above example code will output a line every 10 seconds (with the default console_cr_flush_period=10), can you verify it works for you?
Follow-up; any ideas how to avoid PEP 517 with the auto scaler?
Takes a long time to build the wheels
enable venv caching ?
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L116
Hi @<1686547375457308672:profile|VastLobster56>
where are you getting stuck? are you getting any errors ?
JitteryCoyote63 How can I reproduce it quickly?
I would suggest deleting them immediately when they're no longer needed,
This is the idea for the next RC, it will delete them after it is done using
it handles 2FA if my repo lies in Github and my account needs 2FA to sign in
It does not
Maybe this one?
https://github.com/allegroai/clearml/issues/448
I think it is already there (i.e. 1.1.1)
PompousParrot44 What is the "working directory" on the experiment itself? and the "script path"?
Based on what you wrote above, in order for it to work you should have:
working directory: "."
script path: "-m test.scripts.script"
notice no "--args" and working directory is "." (i.e. the root of the repository)
So the thing is clearml automatically detects the last iteration of the previous run, my assumption is that you also add it, hence the double shift.
SourOx12 could that be it?
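The arithmetic behind the suspected double shift, as a tiny sketch. It assumes the automatic offset equals the last reported iteration of the previous run; the numbers and the `reported_iteration` helper are made up for illustration.

```python
def reported_iteration(local_step, auto_offset, manual_offset=0):
    """Iteration shown in the UI: the local step plus every offset applied to it."""
    return local_step + auto_offset + manual_offset


last_iteration = 100  # hypothetical last iteration of the previous run

# The user code adds the offset on top of the automatic one: shifted twice.
print(reported_iteration(0, last_iteration, manual_offset=last_iteration))  # 200

# Only the automatic offset is applied: the run continues where it stopped.
print(reported_iteration(0, last_iteration))  # 100
```

If that is what is happening, either drop the manual offset from the reporting code, or call `task.set_initial_iteration(0)` after Task.init and keep handling the offset yourself.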