Is this still an issue? (If you provide a queue name, the default tag is not used, so no error should be printed.)
Hi @<1541954607595393024:profile|BattyCrocodile47>
see here: None
Try with app.clearml.mlops-club.org
and the rest of them
ReassuredTiger98 I can verify the code snippet reproduces the issue of packages missing from "installed packages".
If you feel this is important, please open a GitHub issue.
Also, you can manually add packages:
Task.add_requirements('package_name_here', 'optional version here')
So when you manually add the package you can make sure it will be listed; just remember to call it before the Task.init call.
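For example, a minimal sketch (the package name and version here are placeholders):
```python
from clearml import Task

# add_requirements must be called before Task.init for the package to be listed
Task.add_requirements('tensorflow', '2.4.0')  # the version argument is optional
task = Task.init(project_name='examples', task_name='manual requirements example')
```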
Is it possible for two agents to be utilizing the same GPU?
It is, as long as memory-wise they do not limit one another.
(If you are using k8s and clearml enterprise, then it supports GPU slicing and dynamic memory allocation)
Hi VexedCat68
(sorry I just saw the message)
I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step
So to do so you can do:
```python
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # if we want to skip this node (and the subtree of this node) we return False
    ...
    # we decided to skip, so we return False
    return False

pipe.add_step(name='...
```
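And a minimal sketch of wiring that callback into a step, assuming the PipelineController from the clearml SDK (step and base task names here are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name='pipeline demo', project='examples', version='1.0.0')

pipe.add_step(
    name='stage_train',
    base_task_project='examples',
    base_task_name='train task',
    # returning False from the callback skips this node (and its subtree)
    pre_execute_callback=pre_execute_callback_example,
)
```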
I'm glad it worked out, thanks SmallBluewhale13 🙂
Hi CheerfulGorilla72
is it ideological...
Lol, no 😀
Since some of the comparisons are done client-side (in the browser, mostly the text comparisons) it is a bit heavy, so we added a limit. We want to change it so some of it is done on the backend, but in the meantime we can actually expand the limit, and maybe only lazily compare the text areas. Hopefully in the next version 🤞
(Obviously, if you have dependencies, they will be installed first, and then the correct torch will be installed over the previous version.)
OddAlligator72 FYI, in your current code you can always do:
```python
if use_trains:
    from trains import Task
    Task.init()
```
Might be easier 😉
Hi @<1697056701116583936:profile|JealousArcticwolf24>
Can you run your pipeline on an agent (i.e. remotely) but launch it from the UI (not the TaskScheduler)?
@<1523711619815706624:profile|StrangePelican34> are you saying that after the `with` block the task is marked completed? How is that possible? Is this done manually?
OddAlligator72 sure thing 🙂
This should sort it out:
```python
Task.init('examples', 'train', continue_last_task=True)
```
If you want to continue a specific Task: `continue_last_task='task_id_here'`
Getting the previous model: `last_checkpoint = task.models['output'][-1]`
What do you think?
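Putting it together, a minimal sketch (project/task names are placeholders):
```python
from clearml import Task

# continue the last Task of this project/name pair
# (or pass continue_last_task='task_id_here' to continue a specific Task)
task = Task.init('examples', 'train', continue_last_task=True)

# grab the last output model registered on the continued Task
last_checkpoint = task.models['output'][-1]
weights_path = last_checkpoint.get_local_copy()  # download the weights locally
```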
basically @<1554638166823014400:profile|ExuberantBat24> you can think of hyper-datasets as a "feature-store for unstructured data"
GiganticTurtle0
What do you mean by "reuse_last_task_id"? Each component is always a new Task generated (unless it is cached, in which case it will reuse the previously executed one).
What am I missing here?
You are correct, it is currently not supported in venv mode. We could not find a good use case for it. What is yours?
We should probably change it so it is more human readable 🙂
Thanks a lot. I meant running a bash script after cloning the repository and setting up the environment
Hmm that is currently not supported 😞
The main issue in adding support is where to store this bash script...
Perhaps somewhere inside ClearML there is an order of actions for startup that can be changed?
Not that I can think of,
but let's assume you could have such a thing, what would you have put in the bash script (basically I want to see maybe there is a worka...
No worries, I'll see what I can do 🙂
PunySquid88 do you want to test a fix?
My model files are also there, just placed in some usual non-shared linux directory.
So this is the issue: how would the container get to these models? You either need to mount the folder into the container,
or push them to the ClearML model repo with the OutputModel class. Does that make sense?
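A minimal sketch of the OutputModel route (the weights path here is a placeholder):
```python
from clearml import Task, OutputModel

task = Task.init(project_name='examples', task_name='register model')

# register the local weights file as this Task's output model;
# the file is uploaded to the configured storage/files server
output_model = OutputModel(task=task)
output_model.update_weights(weights_filename='/path/to/model.pt')
```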
Hi JuicyFox94
I think you are correct, this bug will explain the entire thing.
Basically what happens is that remote_execute stops the local run before the configuration is set on the Task. Then, when running remotely, the code pulls the configuration, sees that it is empty, and does nothing.
Let me see if I can reproduce it...
PompousHawk82 unfortunately this is kind of binary, either you have full tracking of load/save operations or you do not.
This warning message will disappear in the next version as we will be able to log multiple models under the same Task :)
If you do not have a lot of workers, then I would guess console outputs
ContemplativeGoat37
1. It seems the DNS resolving to the server fails? ("Temporary failure in name resolution")
2. Is this running on an agent, or manually?
3. "clearml.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###": is this you manually aborting the Task, or is it aborting itself due to the connectivity?
4. What are the clearml/clearml-agent versions?
and then?
The thing is, programmatically this is not easy to do as an API, because in the end the "function" (i.e. LCI) never returns; it connects over SSH and stays.
But you can query the Task it creates: the project is known, the user is known, and it is of a special type/tag.
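For example, a minimal sketch of such a query (the project name and tag below are assumptions, not necessarily the values clearml-session actually uses):
```python
from clearml import Task

# hypothetical filter: session Tasks created under the 'DevOps' project with an 'interactive' tag
session_task_ids = Task.query_tasks(
    project_name='DevOps',
    tags=['interactive'],
)
print(session_task_ids)
```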
actually no
hmm, are those packages correct ?
Bummer... that seems like a bit of an oversight tbh.
There is never a solution for those, unless the helm chart "knows" something about the server before spinning it up for the first time, which basically means a predefined access key, and I do not think we want that 😉