Hi PanickyAnt52
hi, is there a way to get back the pipeline object when given a pipeline id?
Yes, basically a pipeline is a specific type of Task, so anything you stored on it can be accessed via the Task object, i.e.:
pipeline_task = Task.get_task(pipeline_id)
I'm curious, how would you use it?
BTW: since pipeline is also a Task you can have a pipeline launch a step that is a pipeline by its own
Oh, so the pipeline basically makes itself their parent, which means you can get their IDs:
from clearml import Task

steps_ids = Task.query_tasks(task_filter=dict(parent='<pipeline_id_here>'))
for task_id in steps_ids:
    task = Task.get_task(task_id)
Question - why is this the expected behavior?
It is 🙂 I mean the original python version is stored, but pip does not support replacing the python version. It is doable with conda, but then you have to use conda for everything...
Hover near the edge of the plot, then you should get a "bar" you can click on to resize
So basically development on a "shared" GPU?
Hi BattyLizard6
does clearml orchestration have the ability to break gpu devices into virtual ones?
So this is fully supported on A100 with MIG slices. That said, dynamic multi-tenant GPU on Kubernetes is a Kubernetes issue... We do support multiple agents on the same GPU on bare metal, or shared GPU instances over k8s with:
https://github.com/nano-gpu/nano-gpu-agent
https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin#fractional-resources
http...
Sure thing, any specific reason for asking about multiple pods per GPU?
Is this for remote development process ?
BTW: the funny thing is, on bare metal machines multi GPU works out of the box, and deploying it with bare metal clearml-agents is very simple
So it seems to get the "hint" from the type:
This will work:
tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)
wdyt, should it actually check min/max and manually cast it ?
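For reference, a minimal sketch of what such a check-and-cast could look like (the helper name and the [0, 1] range assumption are mine, not an existing API):

import numpy as np
import tensorflow as tf

def log_image_uint8(name, img, step):
    # assumption: float images are in [0, 1]; cast them to uint8 so TB renders them correctly
    img = np.asarray(img)
    if np.issubdtype(img.dtype, np.floating):
        img = (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)
    # expects a 4-D batch [k, h, w, c], same as tf.summary.image itself
    tf.summary.image(name, img, step=step, max_outputs=10)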
StaleButterfly40 are you sure you are getting the correct image on your TB (toy255) ?
GiddyTurkey39
I would guess your VM cannot access the trains-server
, meaning actual network configuration issue.
What are VM ip and the trains-server IP (the first two numbers are enough, e.g. 10.1.X.Y 174.4.X.Y)
Hi CheekyAnt38
However now I would like to evaluate my machine learning model directly via API requests, directly over ClearML. Is that possible?
This basically means serving the model, is this what you mean?
Hi SolidSealion72
"/tmp" contained alot of artifacts from ClearML past runs (1.6T in our case).
How did you end up with 1.6TB of artifacts there? What are the workflows on that machine? At least in theory, there should not be any leftovers in the tmp folder after the process is completed.
I want a blacklist of things I DONT want to report
😞 only whitelisting is currently supported
Should not be very complicated to add:
https://github.com/allegroai/clearml/blob/b24ed1937cf8a685f929aef5ac0625449d29cb69/clearml/task.py#L4096
Maybe everything that starts with an exclamation mark is an exclusion, e.g. "!*.bin" will skip only *.bin files
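If it helps, the proposed syntax could look something like this (purely hypothetical, this API does not exist as described; project/task names are illustrative):

from clearml import Task

# hypothetical: a leading "!" in a pattern would mean "do not auto-log matching files"
task = Task.init(
    project_name='examples',
    task_name='blacklist-proposal',
    auto_connect_frameworks={'pytorch': ['!*.bin']},
)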
Hi SpicyCrab51 ,
Hmm, how exactly is the Dataset opened?
If the Dataset object is alive for 30h, it will keep the dataset alive. Why isn't it being closed?
What's the host you have in the clearml.conf ?
is it something like " http://localhost:8008 " ?
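For reference, the api section of clearml.conf on a self-hosted server usually points at these default ports:

api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}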
so for example, if there was an idle GPU and Q3 takes it, and then a task comes to Q2 (which we specified as 3 GPUs) but Q3 has already taken some of these GPUs, what will happen?
This is a standard "race" the first one to come will "grab" the GPU and the other will wait for it.
I'm pretty sure enterprise edition has preemption support, but this is not currently part of the open source version (btw: also the dynamic GPU allocation, I think, is part of the enterprise tier, in the opensource ...
Hi WickedBee96
Queue1 will take 3GPUs, Queue2 will take another 3GPUs, so in Queue3 can I put 2-4 GPUs??
Yes exactly !
if there are idle GPUs, will it take them to process the task?
Correct, basically you are saying: this queue needs a minimum of 2 GPUs, but if you have more available, allocate them to the Task it pulled (with a maximum of 4 GPUs)
Make sense ?
Hmm can you try:
--args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']"
How does deferred_init affect the process?
It defers all the networking and stuff to the background (usually the part that might slow down the Task initialization process)
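In code it is just a flag on Task.init, something like this (project/task names are illustrative):

from clearml import Task

# deferred_init=True returns immediately and completes the backend setup in the background
task = Task.init(project_name='examples', task_name='fast-start', deferred_init=True)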
Also, is there a way of specifying a blacklist instead of a whitelist of features?
BurlyPig26 you can whitelist per framework and file name, example:
task = Task.init(..., auto_connect_frameworks={'pytorch': '*.pt', 'tensorflow': ['*.h5', '*.hdf5']})
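For completeness, a runnable version of the above (project/task names are illustrative):

from clearml import Task

# only auto-log PyTorch *.pt files and TensorFlow/Keras *.h5 / *.hdf5 files
task = Task.init(
    project_name='examples',
    task_name='framework-filter',
    auto_connect_frameworks={'pytorch': '*.pt', 'tensorflow': ['*.h5', '*.hdf5']},
)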
What am I missing ?
I see TrickyFox41, try the following:
--args overrides="param=value"
Notice this will change the Args/overrides argument that will be parsed by hydra to override its params
Was going crazy for a short amount of time yelling to myself: I just installed clear-agent init!
oh noooooooooooooooooo
I can relate so much, happens to me too often that copy pasting into bash just uses the unicode character instead of the regular ascii one
I'll let the front-end guys know, so we do not make ppl go crazy 🙂
Hmm that should have worked ...
I'm assuming the Task itself is running on a remote agent, correct ?
Can you see the changes in the OmegaConf section ?
what happens when you pass:
--args overrides="['dataset.path=abcd']"
So to conclude: it has to be executed manually first, then with trains agent?
Yes. That said, as you mentioned, you can always edit the "installed packages" manually; from that point you are basically cloning the experiment, including the "installed packages", so it should work if the original worked.
Make sense ?
Now I'm just wondering if I could remove the PIP install at the very beginning, so it starts straightaway
AbruptCow41 CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 does exactly that 🙂 BTW, I would just set the venv cache, which means it will be able to restore the entire thing (even if you have changed the requirements):
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L115
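For reference, the venv cache section in clearml.conf looks roughly like this (values mirror the linked example; uncommenting path is what enables it):

agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # uncomment to enable the venv cache
        # path: ~/.clearml/venvs-cache
    }
}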
Yes, actually ensuring pip is there cannot be skipped (I think in the past it caused too many issues, hence the version limit etc.)
Are you saying it takes a lot of time when running? How long is the actual process that the Task is running (just to normalize times here)
so when inside the docker, I don't see the git repo and that's why ClearML doesn't see it
Correct ...
I could map the root folder of the repo into the container, but that would mean everything ends up in there
This is the easiest, you can put it in an ENV variable:
None
Hi SkinnyBat30
Streamlit apps are backend run (i.e. the python code drives the actual web app)
This means the agent runs your Task's code and exposes the streamlit web app (i.e. over HTTP).
This is fully supported with ClearML, but unfortunately only in the paid tiers 😞
You can however run your Task with an agent, make sure the agent's machine is accessible and report the full IP+URL as a hyper-parameter or property, and then use that to access your streaml...