MuddySquid7 I might have found something, and this is very, very odd: it seems it will not upload any new images past the history size, which is strange considering the number of users actively using this feature...
Do you want to try a hack to see if it solves your issue?
So when the agent fires up, it gets the hostname, which you can then retrieve from the API.
I think it does something like "getlocalhost", a Python function that is OS agnostic.
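For reference, the Python standard library exposes an OS-agnostic hostname lookup in the socket module. A minimal sketch, assuming the agent does something equivalent (the exact call it uses is not confirmed here; agent_hostname is a hypothetical helper name):

```python
import socket


def agent_hostname() -> str:
    """Return this machine's hostname; works the same on Linux, macOS and Windows."""
    # socket.gethostname() is the portable stdlib call; the agent's real lookup may differ
    return socket.gethostname()


if __name__ == "__main__":
    print(agent_hostname())
```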
@SillyRobin38 out of curiosity, did you compare the performance of TensorRT-LLM vs vLLM?
(The jury is still out on that; I just wondered if you had a chance.)
Hmm... That's what happens, with the exception of None/'' when the type is str... There is no way to differentiate between them in the UI.
This is why we opted for type=str to "cast" everything to str, so you always get a str, while not specifying a type leaves the variable as is... If you have an idea on how to support both, feel free to suggest 🙂
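To make the trade-off concrete, here is a hypothetical sketch (to_ui and cast_param are illustrative names, not ClearML internals): a string-only UI field renders None and '' identically, so declaring type=str simply normalizes everything to str, while leaving the type out passes the value through untouched.

```python
from typing import Any, Optional


def to_ui(value: Any) -> str:
    """Hypothetical: a UI text field can only display a string, so None shows up as ''."""
    return "" if value is None else str(value)


def cast_param(value: Any, declared_type: Optional[type] = None) -> Any:
    """Hypothetical: apply the declared type to a parameter value."""
    if declared_type is str:
        # type=str: always return a str (None becomes '', 5 becomes '5')
        return to_ui(value)
    # no type declared: leave the value exactly as it is
    return value


if __name__ == "__main__":
    # None and '' render identically, so the UI alone cannot tell them apart
    print(to_ui(None) == to_ui(""))     # True
    print(repr(cast_param(None, str)))  # ''
    print(cast_param(None) is None)     # True
```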
Just run it once (from your Python console / PyCharm, etc.):
https://github.com/allegroai/clearml/blob/master/examples/automation/toy_base_task.py
Hi OutrageousSheep60
Do you mean something like:
https://github.com/allegroai/clearml/tree/master/examples/datasets
?
We are working on 1.3.0, so this is right on time.
Hi @NuttyCamel41
How are you creating the model? Specifically, what do you have in "config.pbtxt"?
Also, any Python code should be in the pre/post-processing code (which does not actually run on the GPU instance).
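As a rough illustration of that split, assuming the clearml-serving convention of a Preprocess class with preprocess/postprocess methods (check the clearml-serving examples for the exact signatures), the Python side might look like this. The feature names x0/x1 and the output key "y" are made up for the sketch:

```python
from typing import Any, Callable, Optional


class Preprocess(object):
    """Hypothetical pre/post-processing hooks; they run on the CPU side, not inside the GPU engine."""

    def preprocess(self, body: dict, state: dict,
                   collect_custom_statistics_fn: Optional[Callable] = None) -> Any:
        # turn the JSON request body into the batch layout the model expects
        # (assumed here: a single row with two float features)
        return [[float(body.get("x0", 0.0)), float(body.get("x1", 0.0))]]

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn: Optional[Callable] = None) -> dict:
        # wrap the raw model output into a JSON-serializable response
        return {"y": list(data)}
```

Keeping this logic in plain Python means the model itself (described by config.pbtxt) only has to deal with tensors.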
Done!
Thanks
fatal: unable to find a suitable socket path; use --socket
I think that's the root cause; we should probably also add https://github.com/allegroai/trains-agent/issues/16
@TritePigeon86 +1
Hi FloppyDeer99
Since this thread is a bit old, I might have missed something 🙂
Are we saying the links are not working in the UI?
(Notice the links themselves are generated by the clearml package, so if there was a bug, which I'm still not sure about, old links will remain invalid until manually fixed.) Can you verify that the latest clearml generates working links?
Hmm @UnevenDolphin73 I think this is the reason: None
and this means that even without a full lock file, poetry can still build an environment.
Great! BTW, the final v1.2.0 should be out after the weekend.
Hi @CumbersomeBee33
What do you mean by "will the dependencies be removed or not"?
The next time the agent spins up a new Task, it will create a new venv and delete the previous one.
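Conceptually, that cleanup boils down to deleting the old environment directory and creating a fresh one. A minimal sketch using the standard library venv module (recreate_venv and the directory layout are hypothetical; this is not the agent's actual code):

```python
import os
import shutil
import tempfile
import venv


def recreate_venv(env_dir: str) -> None:
    """Hypothetical sketch: drop the previous environment and build a fresh one."""
    # remove whatever the previous Task left behind (no error if it does not exist)
    shutil.rmtree(env_dir, ignore_errors=True)
    # create an empty virtual environment; pip is skipped to keep the sketch fast
    venv.create(env_dir, with_pip=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "task_venv")
        recreate_venv(path)
        print(os.path.exists(os.path.join(path, "pyvenv.cfg")))  # True
```

Because the old directory is removed first, nothing installed by the previous Task survives into the next one.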
Hi DepressedChimpanzee34
Why do you need to add the configuration manually? Isn't clearml.conf easier? If not, I think OS environment variables are easier, no? I ran the above code and everything worked with no exception/warning... What exactly does the try/except solve?
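If you go the environment-variable route, the settings are exported in the shell (or in the container definition) and the code itself never touches credentials. A minimal sketch, assuming the standard CLEARML_API_HOST / CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY variables, that only checks they are set before Task.init is called (missing_clearml_env is a hypothetical helper):

```python
import os
from typing import List, Mapping, Optional

# Environment variables ClearML can read its server address and credentials from
CLEARML_ENV_VARS = ("CLEARML_API_HOST", "CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY")


def missing_clearml_env(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the ClearML variables that are unset or empty (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in CLEARML_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_clearml_env()
    if missing:
        print("export these before calling Task.init:", ", ".join(missing))
```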
Hi DeliciousKoala34
Happened when cloning and running a task on an agent on a different machine. I
Sounds like a torch internal issue. Can you send the full log of the remote Task?
Hi SkinnyPanda43
No idea what the ImageId actually is.
That's the AMI image ID string that the new EC2 instance will be started with. Makes sense?
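An AMI ID has the form "ami-" followed by 8 (older) or 17 (newer) hex characters, and it is specific to an AWS region. A small format check, sketched under that assumption (is_ami_id is a hypothetical helper, and "ami-0123456789abcdef0" is an illustrative value, not a real image); it does not verify that the AMI actually exists in your region:

```python
import re

# "ami-" followed by 8 (older) or 17 (newer) lowercase hex characters
_AMI_RE = re.compile(r"ami-(?:[0-9a-f]{8}|[0-9a-f]{17})")


def is_ami_id(image_id: str) -> bool:
    """Rough sanity check for an ImageId value before handing it to the autoscaler."""
    return _AMI_RE.fullmatch(image_id) is not None


if __name__ == "__main__":
    print(is_ami_id("ami-0123456789abcdef0"))  # True
    print(is_ami_id("ubuntu-22.04"))           # False
```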
Hi @NonchalantOx99
I would assume the clearml-server configuration / access key is misconfigured in your copy of example.env.
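A quick way to spot this is to scan the file for keys that are missing, empty, or still hold a template placeholder. A sketch, assuming the file uses simple KEY=VALUE lines and that the key names below match your copy of example.env (they are an assumption; adjust REQUIRED_KEYS as needed):

```python
from typing import Dict, List

# Keys assumed to be required in example.env (adjust to your copy of the file)
REQUIRED_KEYS = ("CLEARML_API_HOST", "CLEARML_API_ACCESS_KEY", "CLEARML_API_SECRET_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments, stripping quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def misconfigured_keys(text: str) -> List[str]:
    """Return required keys that are missing, empty, or still hold a <placeholder>."""
    values = parse_env(text)
    bad = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            bad.append(key)
    return bad


if __name__ == "__main__":
    with open("example.env") as f:
        print(misconfigured_keys(f.read()))
```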
I am using the pipeline-from-tasks method, not the pipeline-from-decorator method.
Wait, I'm confused now. If this is a pipeline from Tasks, then the Tasks themselves should have clearml in their "installed packages", no? And if they do not, how were they created?
Long story short, this is done internally when you call Task.init (I think; there is a chance it is called before that).
One way of controlling it would be to have something like:
Task.init(auto_connect_frameworks={'hydra': {'log_before_resolve': True}})
That said, I think it will be simpler to store both (in different sections, of course).
Maybe "Configuration Object: OmegaConf" and "Configuration Object: OmegaConfDefinition" ?
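To illustrate the "store both" idea, here is a hypothetical sketch that keeps the raw definition and a resolved copy under the two section names above. It uses string.Template's ${key} syntax as a single-level stand-in for OmegaConf interpolation (both_config_views is an illustrative name; unlike OmegaConf, Template always produces strings and does not resolve nested keys):

```python
from string import Template
from typing import Any, Dict


def both_config_views(definition: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical: keep the raw definition and a resolved copy side by side."""
    # single-level ${key} resolution against top-level keys (stand-in for OmegaConf)
    resolved = {
        key: Template(value).safe_substitute(definition) if isinstance(value, str) else value
        for key, value in definition.items()
    }
    return {
        "Configuration Object: OmegaConf": resolved,
        "Configuration Object: OmegaConfDefinition": dict(definition),
    }


if __name__ == "__main__":
    views = both_config_views({"base_lr": 0.1, "lr": "${base_lr}"})
    print(views["Configuration Object: OmegaConf"]["lr"])            # 0.1
    print(views["Configuration Object: OmegaConfDefinition"]["lr"])  # ${base_lr}
```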
Hi LackadaisicalOtter14
Is it possible to remove this line to stop it from being executed?
Everything is possible 🙂 I think the main question is why it is there (which, to the best of my understanding, is to handle any CUDA drivers and installed packages, meaning anything that is installed at runtime).
I think we can suppress the error, wdyt?
'echo "ldconfig" 2>/dev/null >> /etc/profile && '
Both are fully implemented in the enterprise version. I remember a few medical use cases, and I think they are working on publishing a blog post about it, though I'm not sure. Anyhow, I suggest you contact the sales people; I'm sure they will gladly set up a call/demo/PoC.
https://allegro.ai/enterprise/#contact
And of course, if your Docker image has packages preinstalled, they are automatically used (not reinstalled).
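A minimal sketch of that idea, assuming a simple exact-pin check with importlib.metadata (packages_to_install is a hypothetical helper; the agent's real resolution logic is more involved and is not shown here): requirements the image already satisfies are filtered out, and only the rest would be installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List


def packages_to_install(requirements: List[str]) -> List[str]:
    """Hypothetical: keep only requirements the current environment does not already satisfy.

    Handles bare names and exact "name==version" pins only.
    """
    missing = []
    for req in requirements:
        name, _, pinned = req.partition("==")
        try:
            installed = version(name.strip())
        except PackageNotFoundError:
            missing.append(req)  # not installed at all
            continue
        if pinned and installed != pinned.strip():
            missing.append(req)  # installed, but not the pinned version
    return missing
```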
I found "scheduler" on the allegroai GitHub; is it related to the case I want to build?
MoodyCentipede68, it is exactly what you are looking for 🙂
Do notice that you need to have your services queue configured and running for that to work 🙂
Hi, I changed it to 1.13.0, but it still threw the same error.
This is odd. Just so we can make the agent better, any chance you can send the Task log?
Check the log: the container has torch 1.13.0, but the Task requires torch==1.13.1.
Now, the torch packages inside those NVIDIA prepackaged containers are compiled a bit differently. What I suspect happens is that the torch wheel from PyTorch is not compatible with this container. The easiest fix: change the Task requirements to 1.13.
WDYT?
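Why 1.13 rather than 1.13.1? Under PEP 440 version matching, the shorter release in ==1.13 is zero-padded to 1.13.0, so the container's 1.13.0 satisfies it, while ==1.13.1 does not. A simplified sketch of that rule (numeric release segments only; real tooling such as the packaging library also handles pre-release and local tags like +cu117):

```python
from itertools import zip_longest
from typing import Tuple


def release(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric release such as '1.13.0' into (1, 13, 0)."""
    return tuple(int(part) for part in version.split("."))


def matches_pin(installed: str, pinned: str) -> bool:
    """Simplified '==' check: the shorter release is zero-padded, as PEP 440 specifies."""
    return all(a == b for a, b in zip_longest(release(installed), release(pinned), fillvalue=0))


if __name__ == "__main__":
    print(matches_pin("1.13.0", "1.13"))    # True
    print(matches_pin("1.13.0", "1.13.1"))  # False
```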
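I'm sorry, my bad, it's use_current_task:
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
```python
from clearml import Dataset, Task

task = Task.init(...)
# attach the new dataset to the Task created above instead of creating a separate one
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
```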