SmarmySeaurchin8 regarding the original question: `task.set_project(project_id)` to set the project, and `Task.get_projects()` to get all the project names/ids
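Something like this (a quick sketch, 'aabb' and the project id are placeholders):
```python
from clearml import Task

# list all projects (each entry exposes .id and .name)
for p in Task.get_projects():
    print(p.id, p.name)

# move an existing task into one of them
task = Task.get_task(task_id='aabb')
task.set_project(project_id='<project_id>')
```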
OddAlligator72 so what you are saying is: take the repository / packages from the runtime, i.e. the Python code calling `Task.create(start_task_func)`?
Is that correct?
BTW: notice that the execution itself will be launched on other remote machines, not on this local machine
You actually have to log in / SSH as said user, have another dedicated mountpoint, and spin up the agent from that user.
This is odd, it says 1.0.0, but then it was updated just weeks ago ...
@<1657918706052763648:profile|SillyRobin38> out of curiosity, did you compare the performance of TensorRT-LLM vs vLLM?
(the jury is still out on that, just wondered if you had a chance)
Why does my task execution freeze after pip installation (running agent in foreground mode)?
Hi AdventurousButterfly15
Are you running in agent docker mode or venv mode ?
What do you mean by freeze? Do you see anything in the Task console log in the UI? What's the host OS?
yea, does the enterprise version have more functionality like this?
yes, all sorts of bits and pieces for easier DevOps / K8s etc.
I got everything working using the default queue. I can submit an experiment, and a new GPU node is provisioned, all good
Nice!
My next question, how do I add more queues?
You can create new queues in the UI and spin up a new glue for the queue (basically think of a queue as an abstraction for a specific type of resource)
Make sense ?
So I'd create the queue in the UI, then update the helm yaml as above, and install? How would I add a 3rd queue?
Same process?!
Also I'd like to create the queues programmatically, is that possible?
Yes, you can. You can also pass an argument for the agent to create the queue if it does not already exist, just add --create-queue to the agent execution command line
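For example, a sketch using the APIClient (assuming valid credentials in your clearml.conf):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# create a new queue programmatically
client.queues.create(name="my_new_queue")

# verify it exists
for q in client.queues.get_all():
    print(q.id, q.name)
```
And on the agent side, something like: clearml-agent daemon --queue my_new_queue --create-queue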
create inside another task that would again run remotely
This Task will be run on another node; user / permissions will be dealt with by the agent on the other node running the Task
Hmm maybe we should add a test once the download is done, comparing the expected file size against the actual file size, and if they differ, re-download?
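Roughly something like this (a hedged sketch, not the actual implementation; `download_fn` and `expected_size` are hypothetical names):
```python
import os

def download_with_verification(remote_url, local_path, expected_size, download_fn, max_retries=3):
    # hypothetical helper: re-download whenever the on-disk size does not match the expected size
    for _ in range(max_retries):
        download_fn(remote_url, local_path)
        if os.path.getsize(local_path) == expected_size:
            return local_path
        os.remove(local_path)  # size mismatch, discard and retry
    raise IOError("download size mismatch after {} retries".format(max_retries))
```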
Hi ItchyJellyfish73
You can always archive a Task/Model even when published
In the UI you can right-click and choose archive.
From code you need to add a system tag "archived":
```python
from clearml import Task

t = Task.get_task(task_id='aabb')
t.set_system_tags(t.get_system_tags() + ['archived'])
```
And similarly for Model(model_id='aabb')
Okay, could you try to run again with the latest clearml package from GitHub?
pip install -U git+
confirmed that the change had been added by
Make sure you see them in the Task log in the UI (the agent prints it when it starts)
Any insight on how we can reproduce the issue?
Can this be reproduced using a simple script that we can also run?
(currently I think the implementation expects that if the download completed, it was successful)
RipeWhale0 I think this is installing an older version of clearml, try to pull the latest chart 🙂
Hi WittyOwl57
That's actually how it works (the original idea/design was borrowed from libcloud): basically you need to create a Driver, then the storage manager will use it.
Abstract class here:
https://github.com/allegroai/clearml/blob/6c96e6017403d4b3f991f7401e68c9aa71d55aa5/clearml/storage/helper.py#L51
Is this what you had in mind ?
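For illustration only (the method names here are hypothetical; the real interface is the abstract class linked above):
```python
class MyCustomDriver(object):
    # hypothetical custom storage driver; implement the abstract methods from helper.py
    scheme = "mystore"  # e.g. URLs like mystore://bucket/path

    def upload_object(self, file_path, container, object_name, **kwargs):
        # push a local file to the custom backend
        raise NotImplementedError

    def download_object(self, obj, local_path, **kwargs):
        # fetch an object from the custom backend
        raise NotImplementedError
```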
@<1547390422483996672:profile|StaleElk72> when you go to the dataset in the UI, and press on "Full Details" then go to the Artifacts tab, what is the link you see there?
VirtuousFish83 I can confirm clearml-server 1.3 solves the issue.
VirtuousFish83 I remember an issue on GitHub with something similar, what's the clearml-server version you are using?
And are you sure you are pointing to the correct API server and not mixing the API with the WEB address?
Also what's the clearml-server version?
Hi QuizzicalDove0
I guess the reason is that the integration is literally 2 lines, and it will take less time to execute the code on a system with a working env (we assume there is one) than to configure all the git, python packages, arguments, etc...
All that said, you can create an experiment from code using Task.import_task https://allegro.ai/docs/task.html#trains.task.Task.import_task
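Something like this (a sketch; 'aabb' and the project name are placeholders):
```python
from clearml import Task

# export an existing task's definition and import it as a new experiment
source = Task.get_task(task_id='aabb')
task_data = source.export_task()  # serializable dict describing the task
new_task = Task.import_task(task_data, target_project='my_project')
```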
EnviousStarfish54 could you send the conda / pip environment?
Maybe that's the diff between machine A/B ?
I think what you need is to create an OutputModel, then call update_weights when you have the better model; this will also allow you to tag the model object. Would that help? Or would it make sense to use Task.models and count on the auto-logging?
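For the OutputModel route, something like this (a sketch, I haven't tested it; the names and filenames are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model logging")

# register an output model on the task
output_model = OutputModel(task=task, framework="PyTorch")

# whenever a better checkpoint appears, update (and upload) the weights file
output_model.update_weights(weights_filename="best_model.pt")

# tag the model object itself
output_model.tags = ["best"]
```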
This might work (I have to admit I haven't had the time to test, please let me know if it works, so we could push it as a cool new feature 🙂 )
```python
class LocalClearmlJob(ClearmlJob):
    def __init__(self, *args, **kwargs):
        super(LocalClearmlJob, self).__init__(*args, **kwargs)

    def launch(self, queue_name=None):
        # type: (str) -> bool
        if self._is_cached_task:
            return False
        # create the subprocess
        cmd = self.task.data.execution.script.ent...
```