Hmm that is odd.
Can you send the full log?
Should work, follow the backup process and restore into a new machine.
That makes total sense.
So right now you can probably use clearml-session to spin a session in any container. Add jupyterhub to the requirements like so:
clearml-session --packages jupyterhub
Then ssh + run jupyterhub + tunnel the port?
ssh root@IP -p 10022 -L 6666:localhost:6666
$ jupyterhub
Would that work?
Maybe it is better to add an option to use jupyterhub instead of jupyterlab?
wdyt?
Tested with two sub folders, seems to work.
Could you please test with the latest RC:
pip install clearml==0.17.5rc4
My bad, I see I worded my question wrong.
LOL no worries
Any chance you have some "debug" leftover in the Pipeline code:
https://github.com/allegroai/clearml/blob/7016138c849a4f8d0b4d296b319e0b23a1b7bd9e/examples/pipeline/pipeline_from_decorator.py#L113
Maybe we should show a warning when it is being called, or ignore it when running via an agent ...
K8s + clearml-agent integration.
Hmm is this an on-prem k8s cluster?
Task.force_requirements_env_freeze()
This might be very brittle, if users are running on a diff OS, or python versions...
I would actually go with:
If you like poetry, update your lock file in git.
If you do not use poetry, work on your own branch and delete the poetry lock file.
wdyt?
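For reference, this is roughly what that freeze call looks like in code (a minimal sketch, project/task names are placeholders; it has to be called before Task.init):

from clearml import Task

# Store the fully pip-frozen environment in the Task's "Installed Packages"
# instead of only the directly imported packages (call before Task.init)
Task.force_requirements_env_freeze()

task = Task.init(project_name="examples", task_name="frozen requirements example")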
so that one app I am using inside the Task can use the python packages installed by the agent and I can control the packages using clearml easily
That's the missing part for me. You have all the requirements on the Task (that you can fully control), the agent is setting up a brand new venv for each Task inside a container (the venv is cached, and you can also make the agent just use the default python without installing anything). The part where I'm lost is why would you need the path to t...
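Just to illustrate the "fully control the requirements" part, a minimal sketch (package names and versions are placeholders, and the calls go before Task.init):

from clearml import Task

# Explicitly add/pin packages in the Task's "Installed Packages" section,
# which is exactly what the agent installs into the per-Task venv
Task.add_requirements("jupyterhub")        # make sure the package is listed
Task.add_requirements("torch", "1.13.1")   # optionally pin a specific version

task = Task.init(project_name="examples", task_name="requirements controlled from code")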
Hmm, what's the clearml-agent version?
LuckyRabbit93 We do!!!
Looking at the supervisor method of the base AutoScaler class, where are the worker IDs kept? Is it in the class attribute queues?
Actually the supervisor is passing a fixed prefix, then it asks the clearml-server for workers starting with this name.
This way we can have a fixed init script for all agents, while we can still differentiate them from the other agent instances in the system. Make sense?
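Something along these lines, conceptually (a rough sketch using the APIClient, not the actual AutoScaler internals; the prefix is a placeholder):

from clearml.backend_api.session.client import APIClient

# List the workers currently registered on the clearml-server and keep only
# the ones whose id starts with the fixed prefix the supervisor passed
client = APIClient()
prefix = "aws_autoscaler"   # placeholder prefix
workers = [w for w in client.workers.get_all() if w.id.startswith(prefix)]
print([w.id for w in workers])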
JitteryCoyote63 yes this is very odd, seems like a PyPI flop?!
On the website they do say there is 0.5.0 ... I do not get it
https://pypi.org/project/pytorch3d/#history
I generate some more graphs with a file called graphs.py and want to attach/upload them to this training task
Makes total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
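For completeness, a minimal sketch of the Task.get_task route from graphs.py (project/task names are placeholders):

import matplotlib.pyplot as plt
from clearml import Task

# Fetch the existing training task and attach extra plots/artifacts to it
task = Task.get_task(project_name="my_project", task_name="my_training_task")

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])

task.get_logger().report_matplotlib_figure(
    title="extra graphs", series="graphs.py", figure=fig, iteration=0
)
task.upload_artifact(name="graphs_source", artifact_object="graphs.py")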
Hi JumpyPig73
Funny enough this is being fixed as we speak
The main issue is that as you mentioned, ClearML does not "detect" the exit code when os.exit() is called, and this is why it is "missing" the failed test (because as mentioned, all exceptions are caught). This should be fixed in the next RC
Do you have your Task.init call inside the "train.py" script? (and if you do, what are you getting in the Execution tab of the task?)
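To be explicit, by "Task.init call inside train.py" I mean something like this at the top of the script (project/task names are placeholders):

# train.py
from clearml import Task

# Initialize as early as possible so the execution (repo, packages, args) is captured
task = Task.init(project_name="examples", task_name="train")

# ... rest of the training code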
Can you share the StorageManager usage and the error you are getting?
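For reference, the kind of StorageManager usage I have in mind looks roughly like this (the URLs are placeholders):

from clearml import StorageManager

# Download a remote object into the local cache and get the local path
local_copy = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/dataset.zip")

# Upload a local file to remote storage
StorageManager.upload_file(
    local_file="./results.csv",
    remote_url="s3://my-bucket/results/results.csv",
)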
Hi ResponsiveCamel97
Let me explain how it works: essentially it creates a new venv inside the docker, inheriting all the packages from the main system packages.
This allows it to use the installed packages if the versions match, and upgrade/change them if you need, all without having to rebuild a new container. Make sense?
One last question: Is it possible to set the pip_version task-dependent?
No... but why would it matter on a Task basis? (meaning, what would be a use case to change the pip version per Task?)
My bad, I wrote refresh and then edited it to the correct "reload"
Yes you can drag it in the UI :) it's a new feature in v1
Is this a common case? Maybe we should change the run_pipeline_steps_locally argument to False?
(The idea of run_pipeline_steps_locally=True is that it will be easier to debug the entire pipeline on the same machine)
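For context, this is the flag I mean, assuming the PipelineController flavor (a minimal sketch, all names are placeholders):

from clearml.automation import PipelineController

pipe = PipelineController(name="debug pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="step_one",
    base_task_project="examples",
    base_task_name="step one task",
)

# Run the controller logic locally; with run_pipeline_steps_locally=True the
# steps also run as local subprocesses instead of being sent to a queue
pipe.start_locally(run_pipeline_steps_locally=True)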
SoggyFrog26 there is a full pythonic interface, why don't you use that one instead? Much cleaner
CheerfulGorilla72
yes, IP-based access,
hmm so this is the main downside of using an IP-based server: the links (debug images, models, artifacts) store the full URL (e.g. http://IP:8081/...). This means that if you switched the IP they will no longer work. Any chance to fix the new server to the old IP?
(the other option is to somehow edit the DB links, I guess doable but quite risky)
So it seems the decorator is simply the superior option?
Kind of, yes
In which case would we use the add_task() option?
When you have existing Tasks and the piping is very straightforward (i.e. the input/output in the code is basically referencing other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps).
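A rough sketch of that flavor, assuming it maps to PipelineController.add_step with pre-existing Tasks (all names are placeholders):

from clearml.automation import PipelineController

# Wire existing Tasks together; each step is cloned from a Task that already
# exists in the system, and parents define the execution order
pipe = PipelineController(name="tasks pipeline", project="examples", version="1.0.0")

pipe.add_step(
    name="prepare_data",
    base_task_project="examples",
    base_task_name="data preparation",
)
pipe.add_step(
    name="train_model",
    parents=["prepare_data"],
    base_task_project="examples",
    base_task_name="model training",
)

pipe.start(queue="services")   # or pipe.start_locally() while debugging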
CheerfulGorilla72 could it be the server address has changed when migrating?
Regulatory reasons and proprietary data is what I had in mind. We have some projects that may need to be fully self hosted in the end
If this is the case then yes, go self-hosted, or talk to ClearML sales to get the VPC option, but SaaS is just not the right option
I might take a look at it when I get a chance but I think I'd have to see if ClearML is a good fit for our use case before I can justify the commitment
I hope it is
Is it possible to do something so that a change of the server address is supported, and the images are pulled up in the new server's UI from the new server?
The link itself (the full link) is stored inside the server. Can I assume the access is IP-based, not host-based (i.e. DNS)?