And the agent continues running.
oh just kill all the processes with clearml-agent in the cmd line:
pkill -9 -f clearml-agent
I see, so basically fix old links that are now not accessible? If this is the case you might need to manually change the document on the MongoDB running in the backend
GrumpySeaurchin29 you can pass S3 credentials to the autoscaler, but all the tasks will have them. Are you saying two different sets of credentials is the issue, or is it the visibility?
It does not use key auth, instead sets up some weird password and then fails to auth:
AdventurousButterfly15 it SSHes into the container; inside the container it sets up a new daemon with a new random, very long password.
It will not SSH into the host machine (i.e. the agent needs to run in docker mode, not venv mode). Make sense?
But it's running in docker mode and it is trying to SSH into the host machine and failing
It is not SSHing into the host machine, it is SSHing directly into the container.
Notice the port it is SSHing to is 10022, which is mapped into the container
- ...that file and the logs of the agent service always say the same thing as before:
Oh, in that case you need to fill in your credentials here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L137
Basically CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY will let the agent running inside the docker talk to the server itself. Just put your own credentials there as a start; it should solve the issue
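If you want to sanity-check that a key/secret pair is valid before wiring it into the compose file, a minimal sketch like this (assuming a default local server on the standard ports; host URLs and key values are placeholders) should authenticate from Python:
```
from clearml import Task

# placeholders: adjust hosts/keys to your deployment
Task.set_credentials(
    api_host="http://localhost:8008",
    web_host="http://localhost:8080",
    files_host="http://localhost:8081",
    key="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)
task = Task.init(project_name="debug", task_name="credentials check")
task.close()
```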
No it will not 🙂 the closer it is to the actual print.
That said, I'm sure it would not be complicated to add.
But I have to wonder: this will really create a mess in the console log, so if someone wants it, it will be global (i.e. also in the visible console, not only in the backend). So the case where the console on the machine itself is "clean" but the backend log is full of debug stuff is not clear to me
but I believe it should have worked with 0.14.1 as well
Correct
Thanks!
Hmm from here: None
Could it be you do not have privileges to the resource, or that you did not provide credentials?
Did that autoscaler work before?
Why would that require refactoring? The Dataset class should take care of it internally, no?
The reason my_name is a subproject is so that every version can be a "Task" inside that project, just easier to manage (or at least that was the idea)
@<1556812486840160256:profile|SuccessfulRaven86> is the issue with flask reproducible? If so, could you open a GitHub issue so we do not forget to look into it?
Hi @<1610083503607648256:profile|DiminutiveToad80>
I think we will need more context for the log...
but I think there is something wrong with the GCP resource configuration of your autoscaler
Can you send the full autoscaler log and the configuration?
Hmm there was this one:
https://github.com/allegroai/clearml/commit/f3d42d0a531db13b1bacbf0977de6480fedce7f6
Basically it was always caching steps (hence the skip); you can install from the main branch to verify this is the issue. An RC is due in a few days (it was already supposed to be out but got a bit delayed)
Hi MelancholyElk85
So the way datasets now work is that each dataset is actually an entity (a folder) inside a project, all under the hidden .datasets sub-project.
This is so data and tasks are both in the same project, but at the same time will not intersect with sub-projects of the same name. Does that make sense?
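For example, creating and fetching a dataset version could look like this (a sketch; project/dataset names and the file path are placeholders):
```
from clearml import Dataset

# each new version is an entity inside the project,
# stored under the hidden ".datasets" sub-project behind the scenes
ds = Dataset.create(dataset_name="my_name", dataset_project="my_project")
ds.add_files("/path/to/local/files")
ds.upload()
ds.finalize()

# later, fetch the latest version by project/name
latest = Dataset.get(dataset_project="my_project", dataset_name="my_name")
```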
Sorry, I mean a vault on the clearml-server holding the credentials per user; the agent then pulls them based on the user, and it is transparent from the user's perspective
Hi BurlySeagull48
you mean for the clearml-server?
It's always the details... Is the new Task running inside a new subprocess?
basically there is a difference between:
- remote task spawning new tasks (as subprocesses, or as jobs on a remote machine), remote task still running
- remote task being replaced by a spawned task (same process?!)
UnevenDolphin73 am I missing a 3rd option? Which of these is your case?
p.s. I have a suspicion that there might be a misuse of "Task" here?! What are you considering a Task? (from clearml perspective a Task...
Or did you mean I can couple a short "mini config" with the package and redirect clearml to use this local one (instead of the one at ~/clearml.conf)?
Actually yes, you can set a "fixed" config and point to it with an ENV variable, then set up just the access/secret per user.
wdyt?
(I was also pointing to the fact that you do not have to use clearml-init; you can create a simple partial config template and let the user just fill in the missing "key"/"secret")
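A minimal sketch of that setup (the config path and key values are placeholders; the variables must be set before clearml is first used):
```
import os

# shared partial clearml.conf, without credentials (hypothetical path)
os.environ["CLEARML_CONFIG_FILE"] = "/opt/shared/clearml.conf"
# per-user credentials injected via the standard environment variables
os.environ["CLEARML_API_ACCESS_KEY"] = "per-user-access-key"
os.environ["CLEARML_API_SECRET_KEY"] = "per-user-secret-key"

from clearml import Task
task = Task.init(project_name="examples", task_name="shared config")
```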
current task fetches the good Task
Assuming you fork the process, then the global instance is passed to the subprocess. Assuming the sub-process was spawned (e.g. Popen), then an environment variable with the Task's unique ID is passed. Then when you call Task.current_task it "knows" the Task was already created and it will fetch the state from the clearml-server and create a new Task object for you to work with.
BTW: please use the latest RC (we fixed an issue with exactly this...
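A small sketch of the spawned-process case described above (project/task names are placeholders):
```
from multiprocessing import get_context
from clearml import Task

def worker():
    # in the spawned child, current_task() re-creates the Task object
    # from the server, based on the unique ID found in the environment
    task = Task.current_task()
    print("got task:", task.id)

if __name__ == "__main__":
    task = Task.init(project_name="examples", task_name="spawn demo")
    ctx = get_context("spawn")  # spawn, not fork
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```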
Can you try to manually install it and see what you are getting?
python3.10 -m pip install /home/boris/.clearml/pip-download-cache/cu117/torch-1.12.1+cu116-cp310-cp310-linux_x86_64.whl
that embed seems to be slightly off with regards to where the link is actually pointing to
I think this is the Slack preview... 🙂
Since you are running in venv mode, setting the OS environment variable before launching the clearml-agent will basically make sure it propagates to the process itself.
ReassuredTiger98 make sense?
OmegaConf is the configuration; the overrides are in the Hyperparameters "Hydra" section
None
WackyRabbit7
I do 'pkill -f trains' but it's the same...
If you need to debug and test, run with --foreground and just hit ctrl-c to end the process (it will never switch to background...). Helps?
Hi @<1523701132025663488:profile|SlimyElephant79>
I would like to save only the last & best checkpoints and not all of them if possible.
Basically it will mimic the local file system, so if you overwrite the local files it will overwrite the remote model.
You can also disable auto logging, and manually upload the models
In Task.init pass auto_connect_frameworks=False for the specific framework, see:
[None](https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-lo...
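For example, a sketch of the manual route (the framework key and filenames are placeholders; shown here for PyTorch):
```
from clearml import Task, OutputModel

# disable automatic model logging for PyTorch only
task = Task.init(
    project_name="examples",
    task_name="manual model upload",
    auto_connect_frameworks={"pytorch": False},
)

# ... training loop saves checkpoints locally ...

# manually upload only the checkpoints you care about
output_model = OutputModel(task=task)
output_model.update_weights(weights_filename="best.pt")
```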
Could it be you have some custom SSL certificate installed, or a policy?
Can you reach other https sites? (for example your clearml-server)
Also, how do pipelines compare here?
Pipelines are a type of Task, so like Tasks you can clone and enqueue them, or set them as the target of the trigger.
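For instance (the task ID and queue name are placeholders):
```
from clearml import Task

# a pipeline controller is itself a Task: clone it and enqueue the clone
pipeline = Task.get_task(task_id="<pipeline-controller-task-id>")
cloned = Task.clone(source_task=pipeline, name="pipeline re-run")
Task.enqueue(cloned, queue_name="services")
```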
the most flexible solution would be to have some way of triggering the execution of a script in the parent task environment,
This is the exact idea of the TriggerScheduler None
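A minimal sketch of that idea (project, queue, and task ID are placeholders; exact trigger arguments may vary between versions):
```
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3.0)
# when a task in the watched project completes, enqueue a handler task
trigger.add_task_trigger(
    name="on-parent-done",
    trigger_project="examples",
    trigger_on_status=["completed"],
    schedule_task_id="<handler-task-id>",
    schedule_queue="default",
)
trigger.start()
```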
What am I missing here?
Hmm, you are correct
Which means this is some conda issue; basically when installing from an env file, conda is not resolving the correct pytorch version 🙂
Not sure why... Could you try to upgrade conda?
Hi TenseOstrich47, what's the matplotlib version and clearml version you are using?