I remember there were some issues with it ...
I hope not 🙂 Anyhow, the only thing that matters is the auto_connect arguments (meaning if you want to disable some, you should pass them when calling Task.init)
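For example, a minimal sketch of disabling some of the auto-logging through those arguments (the project/task names and which frameworks get disabled are just placeholders):

```python
from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='my-experiment',
    # pass the auto_connect_* arguments to turn off specific auto-logging
    auto_connect_frameworks={'matplotlib': False},  # keep the other frameworks enabled
    auto_connect_arg_parser=False,                  # skip argparse auto-logging
)
```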
JitteryCoyote63 fix should be pushed later today 🙂
Meanwhile you can manually add the Task.init() call at the top of the original script, it is basically the same 🙂
JitteryCoyote63
Should be added before the if __name__ == "__main__": ?
Yes, it should.
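For example, a minimal sketch of that layout (project/task names are placeholders):

```python
from clearml import Task

# Task.init() sits at the top of the script, above the __main__ guard
task = Task.init(project_name='examples', task_name='my-experiment')

def main():
    ...  # the original script logic stays unchanged

if __name__ == "__main__":
    main()
```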
From your code I understand it is not ?
What's the clearml version you are using ?
Task.add_requirements('.')
Should work
Hmm I assume it is not running from the code directory...
(I'm still amazed it worked the first time)
Are you actually using "." ?
JitteryCoyote63 I found it 🙂
Are you working in docker mode or venv mode ?
JitteryCoyote63 instead of _update_requirements, call the following before Task.init:
Task.add_requirements('torch', '1.3.1')
Task.add_requirements('git+ ')
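For illustration, a minimal sketch of that ordering (the project/task names and the git URL are placeholders, since the original link was truncated):

```python
from clearml import Task

# add_requirements() must be called before Task.init() so the extra
# packages make it into the task's recorded requirements
Task.add_requirements('torch', '1.3.1')
Task.add_requirements('git+https://github.com/example/repo.git')  # placeholder URL

task = Task.init(project_name='examples', task_name='requirements-example')
```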
Hi @<1523701066867150848:profile|JitteryCoyote63>
Could you please push the code for that version to GitHub?
oh seems like it is not synced, thank you for noticing (it will be taken care of immediately)
Regarding the issue:
Look at the attached images
None does not contain a specific wheel for cuda117 on x86, they use the pip default one
I am not sure what switching back will solve; here the wheel should have been correct, it's just that the architecture of the card is incompatible
So I tested the "old" code that did the parsing and matching, and it did resolve to the correct wheel (i.e. found that there is no 117, only 115, and installed that one)
I think we should switch back, and have a configuration to control which mechanism the agent uses, wdyt?
Hi @<1523701066867150848:profile|JitteryCoyote63>
RC is out,
pip3 install clearml-agent==1.5.3rc3
Then set pytorch_resolve: "direct"
None
Let me know if it worked
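For reference, a sketch of where that setting would go, assuming it lives under agent.package_manager in clearml.conf like the other package-manager options:

```
# clearml.conf (agent side)
agent {
    package_manager {
        # controls which mechanism the agent uses to resolve torch wheels
        pytorch_resolve: "direct"
    }
}
```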
@<1523701066867150848:profile|JitteryCoyote63>
I just created a new venv and ran
pip install "torch==1.11.0.*" --extra-index-url
Then started python:
import torch
torch.cuda.is_available()
And I get True
what are you getting?
Hi @<1523701066867150848:profile|JitteryCoyote63>
Thank you for bringing it up! Can you verify with the latest clearml-agent 1.5.3rc2 ?
If this is the case, pytorch really messed things up; this means they removed packages
Let me check something
DeliciousBluewhale87 out of curiosity, what do you mean by "deployment functionality" ? is it model serving ?
So I might be a bit out of sync, but I think there should be Triton serving and OpenVino serving built into it (or at least in progress).
Hi DeliciousBluewhale87 ,
Yes they do (I think it's ClearML Enterprise or Allegro ClearML). I also know it has extended capabilities in data management, permissions, and security.
More than that, you should probably talk to them directly ( https://clear.ml/contact-us/ ) 🙂
After it finishes the 1st Optimization task, what's the next job that will be pulled ?
The one in the highest queue (if you have multiple queues)
If you use fairness it will pull in round-robin from all queues (obviously inside every queue it is based on the order of the jobs).
fyi, you can reorder the jobs inside the queue from the UI 🙂
DeliciousBluewhale87 wdyt?
DeliciousBluewhale87 Yes I think so, do notice that you might end up with a maximum of 12 pods.
You can also do the following with max 10 nodes (notice --queue can always take a list of queues; it will pull based on the order of the queues):
python k8s_glue_example.py --queue high_priority_q low_priority_q --ports-mode --num-of-services 10
Is this some sort of polling ?
yes
End of the day, we are just worried whether this will hog resources compared to a web-hook ? Any ideas?
No need to worry, it pulls every 30 sec, and this is negligible (as a comparison any task will at least send a write request every 30 sec, if not more)
Actually webhooks might be more taxing on the server, as you need to always have a webhook up (i.e. wasting a socket ...)
Yes, just set system_site_packages: true in your clearml.conf
https://github.com/allegroai/clearml-agent/blob/d9b9b4984bb8a83914d0ec6d53c86c68bb847ef8/docs/clearml.conf#L57
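For reference, a sketch of the relevant clearml.conf section (matching the line linked above):

```
agent {
    package_manager {
        # create the venv with access to the system site-packages
        system_site_packages: true
    }
}
```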
Yes
Are you trying to upload_artifact to a Task that is already completed ?
I commented out the upload_artifact at the end of the code and it finishes correctly now
upload_artifact caused the "failed" issue ?
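For reference, a minimal sketch of uploading an artifact while the task is still running (names and the artifact object are placeholders):

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='artifact-example')

# upload while the task is still running, not after it has been marked completed
task.upload_artifact(name='results', artifact_object={'accuracy': 0.9})

task.close()  # close the task only after all uploads are done
```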
Hmm... any idea on what's different with this one ?