now realise that the Ignite event callbacks seem to not be fired
So this is an Ignite issue?
Hi @<1564785037834981376:profile|FrustratingBee69>
It's the previous container I've used for the task.
Notice that what you are configuring is the Default container, i.e. if the Task does not "request" a specific container, then this is what the agent will use.
On the Task itself (see the Execution tab, below Container Image) you set the specific container for the Task. After you execute the Task on an Agent, the agent will put the container it ended up using there. This means that ...
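For reference, a minimal sketch of setting a Task's container from code (project/task names and the image are placeholders), equivalent to filling in Execution > Container Image in the UI:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="container demo")  # placeholder names
# same as setting Execution > Container Image in the UI
task.set_base_docker(docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04")
```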
tried it and restarted the agent, but it's not working properly
What do you mean by "not working"? Can you provide logs?
In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future).
Hmm I see, wouldn't it make more sense to separate clients like a multi-tenant SaaS solution?
FrothyShark37 any chance you can share snippet to reproduce?
Hi TrickyFox41
is there a way to cache the docker containers used by the agents?
You mean for the apt-get install part? Or the venv?
(the apt packages themselves are cached on the host machine)
for the venv I would recommend turning on the cache here:
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L131
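i.e. uncomment the venvs_cache path in the agent section of clearml.conf - a sketch, exact defaults may vary by version:
```
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # uncomment the path to enable venv caching
        path: ~/.clearml/venvs-cache
    }
}
```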
Hi @<1657918706052763648:profile|SillyRobin38>
Hi everyone, I wanted to inquire if it's possible to have some type of model unloading.
What do you mean by "unloading"? You mean removing it from the clearml-serving endpoint?
If this is from clearml-serving, then yes, you can do it online:
None
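If I remember the CLI correctly, removing a model endpoint looks something like this (service ID and endpoint name are placeholders):
```
clearml-serving --id <service-id> model remove --endpoint "my_endpoint"
```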
However, can you see the inconsistency between the key and the name there:
Yes that was my point on "uniqueness" ... 😞
the model-key must be unique, and it is based on the filename itself (the context is known, since it is inside the Task), but the Model Name is an entity, so it should have the Task Name as part of the entity name. Does that make sense?
Here, this new entry in the log is 2 min after the env setup completed =>
1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415
This seems to be something in your code. Just add print("starting") in your entry python file, before any imports (because they might actually do something)
Because from the agent's perspective, after printing Starting Task Execution:
it literally calls the python script, nothing else...
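i.e. something like this at the very top of the entry script (file name is just an example):
```
# entry.py - very first statement, before any import side effects run
print("starting", flush=True)

# heavy imports (torch, tensorflow, etc.) go below this line,
# since they can take a while on first run
import torch  # noqa: E402
```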
Hi
, It works if I don't specify the project name and just give the task name
But now it searches for it globally, which is not very stable:
Let me check why it fails to find the project...
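For context, this is the scoped lookup in question (project/task names are placeholders):
```
from clearml import Task

# scoping by project avoids the global task-name search
task = Task.get_task(project_name="my_project", task_name="my_task")
```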
EnviousPanda91 so which frameworks are being missed? Is it a request to support a new framework, or are you saying there is a bug somewhere?
while I want to upload converted
.onnx
weights with custom tags to my custom project
Oh I see, sure, see this one?
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py
Or:
output_model.update_weights(weights_filename="/path/to/file.onnx")
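Putting it together for the .onnx case, a minimal sketch (project/task names and tags are placeholders):
```
from clearml import Task, OutputModel

task = Task.init(project_name="my_custom_project", task_name="onnx export")
# attach custom tags to the model entity
output_model = OutputModel(task=task, name="my-onnx-model", tags=["converted", "onnx"])
output_model.update_weights(weights_filename="/path/to/file.onnx")
```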
feature value distribution over time
You mean how to create this chart? None
K8s + clearml-agent integration.
Hmm is this an on-prem k8s cluster?
ContemplativePuppy11
yes, nice move. My question was to make sure that the steps are not run in parallel, because each one builds upon the previous one
if they are "calling" one another (or passing data) then the pipeline logic will deduce they cannot run in parallel 🙂 basically it is automatic
so my takeaway is that if the funcs are class methods the decorators won't break, right?
In theory, but the idea of the decorator is that it tracks the return value so it "knows" how t...
MelancholyChicken65 found it ! thank you for finding this issue.
I'm hoping to get an update soon 🙂
Yes, there is no real limit; I think the only requirement is docker v19+
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
Hi CrookedAlligator14
Hi, I just started using clearml, and it is amazing!
Thank you! 😍
When I enqueue the task, the venv is set up and starts to install all the packages from the
requirements.txt
file, but at the end I get the following in the console:
Can you try with the latest agent? We improved the support for pytorch (they now have a proper pypi-compatible repo), can you see if that solves it?
pip3 install clearml-agent==1.5.0rc0
But what should I do? It does not work, it says incorrect password as you can see
How are you spinning up the agent machine?
Basically, port 10022 from the host (agent machine) is routed into the container, but it still needs to be open on the host machine. Could it be that it is behind a firewall? Are you (the client side running clearml-session) on the same network as the machine running the agent?
My question is if there is an easy way to track gradients similar to
wandb.watch
@<1523705099182936064:profile|GrievingDeer61> not at the moment, but should be fairly easy to add.
Usually torch examples just use TB as the default logging, which goes directly to clearml
, but this is a great idea to add
Could probably go straight to the next version 🙂
wdyt?
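In the meantime, a minimal workaround sketch - report per-parameter gradient norms manually after backward() (the helper name is hypothetical):
```
import torch
from clearml import Task

def report_gradients(model: torch.nn.Module, logger, iteration: int):
    # hypothetical helper: log the L2 norm of each parameter's gradient,
    # roughly what wandb.watch reports
    for name, param in model.named_parameters():
        if param.grad is not None:
            logger.report_scalar(
                title="gradients", series=name,
                value=param.grad.norm().item(), iteration=iteration,
            )

# usage inside the training loop, after loss.backward():
#   report_gradients(model, Task.current_task().get_logger(), iteration)
```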
Hi DepressedFish57
In my case downloading each part takes ~5 seconds, and unzipping ~15.
We ran into that, and the new version will employ a multithreaded approach for the unzip (meaning the unzipping will happen in the background)
You can do that programmatically: clone the pipeline Task (a pipeline is also a Task) and change the Args section of that Task, wdyt?
Example:
None
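Roughly like this (project/pipeline names, the parameter and the queue are placeholders):
```
from clearml import Task

# a pipeline controller is just a Task - clone it and override its Args
pipeline_task = Task.get_task(project_name="pipelines", task_name="my pipeline")
cloned = Task.clone(source_task=pipeline_task, name="my pipeline (new args)")
cloned.set_parameters({"Args/learning_rate": 0.001})
Task.enqueue(cloned, queue_name="services")
```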