Task.create will create a new Task (and return an object), but it does not do any of the auto-magic (like console logging, TensorBoard capture, etc.)
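For illustration, something like this (just a sketch; project/task names are placeholders and a configured clearml.conf is assumed):
from clearml import Task

# Task.init hooks the running process: console output, TensorBoard, framework models, etc.
task = Task.init(project_name="examples", task_name="auto-logged run")

# Task.create only registers a new Task on the server and returns the object, no auto-logging.
new_task = Task.create(project_name="examples", task_name="manually managed task")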
hmm that is odd.
Can you send the full log ?
I'm trying to get a task to run using a specific Docker image and to source a bash script before execution of the Python script.
Are you running an agent in docker mode? If so, you should be able to see the output of your bash script first thing in the log
(and it will appear in the docker CMD)
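If it helps, roughly the kind of thing I mean, assuming a recent clearml version where set_base_docker accepts a setup script (the image name and script path are placeholders, please check the signature for your version):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker run")
# Container image plus a bash script the agent sources before launching the python script.
task.set_base_docker(
    docker_image="python:3.9-bullseye",
    docker_setup_bash_script=["source /opt/setup_env.sh"],
)
task.execute_remotely(queue_name="default")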
Should work, follow the backup process, and restore into a new machine:
That makes total sense.
So right now you can probably use clearml-session to spin up a session in any container, and add jupyterhub to the requirements like so:
clearml-session --packages jupyterhub
Then ssh + run jupyterhub + tunnel the port?
ssh root@IP -p 10022 -L 6666:localhost:6666
$ jupyterhub
Would that work?
Maybe it is better to add an option to use jupyterhub instead of jupyterlab ?
wdyt?
Tested with two sub folders, seems to work.
Could you please test with the latest RC:
pip install clearml==0.17.5rc4
My bad, I see I worded my question wrong.
LOL no worries 🙂
Any chance you have some "debug" leftover in the Pipeline code:
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/examples/pipeline/pipeline_from_decorator.py#L113
Maybe we should show a warning when it is being called, or ignore it when running via an agent...
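i.e. the usual debug leftover looks something like this (just a sketch):
from clearml import PipelineDecorator

# Debug helper: forces all pipeline steps to run locally instead of being
# launched through the agents. Handy while developing, but it should be
# removed (or guarded) before executing the pipeline via an agent.
PipelineDecorator.run_locally()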
K8s + clearml-agent integration.
Hmm is this an on-prem k8s cluster?
Does it work if you remove the Task.init call?
Task.force_requirements_env_freeze()
This might be very brittle if users are running on a different OS or Python version...
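(For reference, the call above is typically used like this; a sketch, names are placeholders:)
from clearml import Task

# Call it before Task.init so the Task stores the full pip freeze of the local
# environment instead of only the automatically detected imports.
Task.force_requirements_env_freeze()
task = Task.init(project_name="examples", task_name="frozen requirements")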
I would actually go with:
If you like poetry, update your lock file in git.
If you do not use poetry, work on your own branch and delete the poetry lock file.
wdyt?
MuddySquid7
Are you saying that for some reason the models pick up the artifacts? Is that reproducible? (They are two different things.)
Can you see the df.pkl on the Models section of the Task (in the UI) ?
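You can also check programmatically, something like this (task ID is a placeholder; the point is that artifacts and models are two separate collections on the Task):
from clearml import Task

t = Task.get_task(task_id="<your_task_id>")
print(list(t.artifacts.keys()))   # uploaded artifacts, e.g. the df.pkl you stored
print(t.models["output"])         # models registered by the frameworks / OutputModel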
instead of terminating them once they are inactive, so that they could be available immediately when they are needed.
JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler, and achieve the same behavior, no?
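Something along these lines in the autoscaler configuration (sketch only; the exact parameter names may differ between versions, so please verify against your autoscaler's settings):
# Hypothetical excerpt of the autoscaler hyper-parameters:
hyper_params = {
    "max_idle_time_min": 60,          # keep instances up for an hour of inactivity
    "polling_interval_time_min": 5,   # how often the autoscaler checks the queues
}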
Hi @<1523711619815706624:profile|StrangePelican34>
Hmm, I think this is missing from the docs, let me ping the guys about that 🙂
I see them run reliably (not killed), are they running in service mode?
How do you deploy agents, with the clearml k8s glue ?
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
But the provided command is missing the URL target for the curl call, so it is not complete.
Not sure I followed. Did you specify "NEW_ADDRESS"?
Or is it that in both cases the URL is localhost?
for example train.py & eval.py under the same repo
so that one app I am using inside the Task can use the python packages installed by the agent and I can control the packages using clearml easily
That's the missing part for me. You have all the requirements on the Task (which you can fully control), and the agent sets up a brand new venv for each Task inside a container (the venv is cached, and you can also make the agent just use the default python without installing anything). The part where I'm lost is why would you need the path to t...
one of the two experiments for the worker that is running both experiments
So this is the actual bug ? I need some more info on that, what exactly are you seeing?
Hmm, what's the clearml-agent version ?
LuckyRabbit93 We do!!!
Looks like a great idea, I'll make sure to pass it along and have someone reply 🙂
Looking at the supervisor method of the base AutoScaler class, where are the worker IDs kept? Is it in the class attribute queues?
Actually the supervisor is passing a fixed prefix, then it queries the clearml-server for workers whose names start with that prefix.
This way we can have a fixed init script for all agents, while still being able to differentiate them from the other agent instances in the system. Make sense?
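Roughly like this (a sketch; the prefix value is made up):
from clearml.backend_api.session.client import APIClient

client = APIClient()
prefix = "aws_asg_prefix"   # the fixed prefix the supervisor was started with
workers = [w for w in client.workers.get_all() if w.id.startswith(prefix)]
print([w.id for w in workers])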
JitteryCoyote63 yes this is very odd, seems like a PyPI flop?!
On the website they do say there is 0.5.0 ... I do not get it
https://pypi.org/project/pytorch3d/#history
I generate some more graphs with a file called graphs.py and want to attach/upload them to this training task.
Makes total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
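e.g. in graphs.py, something like this (a sketch; the task ID is a placeholder):
import matplotlib.pyplot as plt
from clearml import Task

task = Task.get_task(task_id="<training_task_id>")   # or project_name=..., task_name=...
logger = task.get_logger()

fig = plt.figure()
plt.plot([0, 1, 2], [10, 20, 15])
logger.report_matplotlib_figure(title="extra graphs", series="graphs.py", iteration=0, figure=fig)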
Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
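Roughly (sketch only; keys, hosts, IDs and artifact names are placeholders, and in practice you would run the two halves as separate scripts so the credential switch actually takes effect):
from clearml import Task

# Step 1 - against the local server (default clearml.conf): grab what you need.
local_task = Task.get_task(task_id="<local_task_id>")
local_copy = local_task.artifacts["my_artifact"].get_local_copy()

# Step 2 - against the hosted server: override credentials, then upload again.
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="<access_key>",
    secret="<secret_key>",
)
new_task = Task.init(project_name="migrated", task_name="copy of local task")
new_task.upload_artifact("my_artifact", artifact_object=local_copy)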
It's just the print (__repr__) not showing the data:
for w in client.workers.get_all():
    print(w.data)
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that as you mentioned, ClearML does not "detect" the exit code when os.exit() is called, and this is why it is "missing" the failed test (because as mentioned, all exceptions are caught). This should be fixed in the next RC
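As general Python background (not ClearML-specific, and assuming os.exit here refers to os._exit), the difference looks like this in a sketch:
import atexit
import os

atexit.register(lambda: print("cleanup handlers ran"))

# sys.exit(1) raises SystemExit, so cleanup handlers (and anything hooked into
# the interpreter shutdown) still get a chance to run and see the exit code.

# os._exit(1) terminates the process immediately: no exception, no atexit
# handlers, which is why a hard exit like this is easy to miss.
os._exit(1)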
Is the code in this "other" repo downloaded to the agent's machine? Or is the component's code pushed to the machine on which the repository is?
Yes, this repo is downloaded onto the agent's machine, so your code has access to it.
How can I ensure that additional tasks arenβt created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, is the git diff warning still there with the latest clearml? I think there was a fix related to that.)
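In case it helps, the usual pattern in a notebook is something like this (a sketch; project/task names are placeholders):
from clearml import Task

# Close the current task explicitly before starting a new one, otherwise
# re-running the init cell can leave an extra task behind.
task = Task.init(project_name="notebooks", task_name="exploration")
# ... notebook work ...
task.close()

# With the same project/name (and the default reuse_last_task_id=True),
# re-initializing will generally reuse the previous task rather than create a new one.
task = Task.init(project_name="notebooks", task_name="exploration")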