TenseOstrich47 this looks like Elasticsearch is out of space...
Hi @<1556812486840160256:profile|SuccessfulRaven86>
Every clearml-serving session (you can have multiple different "sessions") is assumed to be homogeneous. This means it will serve the same models on as many nodes as possible, supporting multiple models per pod.
In your example I think the easiest is to create two serving sessions, one with a node selector for the 24GB node and another for the 16GB node, wdyt?
Hi FunnyTurkey96
what's the clearml server you are using?
or `pip install -U trains`
As I suspected, from your log: `agent.package_manager.system_site_packages = false`
This is exactly the cause of the missing tensorflow (basically the agent creates a new venv inside the docker, and without this flag turned on, the venv does not inherit the packages preinstalled in the docker image).
This flag should have been true.
Could it be that the clearml.conf you are providing for the glue includes this value?
(basically you should only have the sections that are either credentials or missing from the default, there...
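To see what the flag changes at the Python level, here is a minimal sketch using the standard-library `venv` module (not the agent's own code): the same `system_site_packages` switch decides whether a new venv can see the packages already installed in the base interpreter, such as the ones preinstalled in a docker image. The helper name `create_task_venv` is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_task_venv(target: Path, inherit_system_packages: bool) -> dict:
    """Create a venv and return the key/value pairs from its pyvenv.cfg."""
    # with_pip=False keeps the sketch fast and offline; a real agent also needs pip
    builder = venv.EnvBuilder(system_site_packages=inherit_system_packages, with_pip=False)
    builder.create(target)
    cfg = {}
    for line in (target / "pyvenv.cfg").read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


with tempfile.TemporaryDirectory() as tmp:
    isolated = create_task_venv(Path(tmp) / "isolated", inherit_system_packages=False)
    inherited = create_task_venv(Path(tmp) / "inherited", inherit_system_packages=True)
    print(isolated["include-system-site-packages"])   # false
    print(inherited["include-system-site-packages"])  # true
```

Only the "inherited" venv would pick up a tensorflow that was preinstalled in the image, which is the behaviour `agent.package_manager.system_site_packages = true` is meant to enable.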
Yep 🙂
Basically:
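```python
from time import sleep

from clearml import Task

task = Task.get_task(task_id='aaaa')
while task.status not in ('completed', 'stopped', 'failed'):
    # do something while waiting ('failed' added so a crashed Task does not loop forever)
    sleep(15)
```
(Notice that `task.status` / `task.get_status()` will refresh the Task status on every call.)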
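If the loop should not wait forever, the same polling can be wrapped with a timeout. This is a minimal sketch; the helper name `poll_until_final` and the one-hour default timeout are assumptions, and `get_status` would be something like `lambda: task.get_status()` in real use.

```python
import time
from typing import Callable, Iterable


def poll_until_final(
    get_status: Callable[[], str],
    final_states: Iterable[str] = ("completed", "stopped", "failed"),
    poll_interval: float = 15.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a final state or the timeout expires."""
    final = set(final_states)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()  # e.g. task.get_status(), which refreshes on each call
        if status in final:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still '{status}' after {timeout} seconds")
        time.sleep(poll_interval)


# Usage with a fake status source instead of a real ClearML Task:
statuses = iter(["queued", "in_progress", "completed"])
print(poll_until_final(lambda: next(statuses), poll_interval=0.01))  # completed
```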
But I am considering just failing the task.
This will of course work: just raise an exception in the Task itself, and protect the call from the pipeline logic function with try/except.
Regarding the second option, try to nullify the hash on the Component Task:
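A minimal sketch of the "fail the step, catch it in the pipeline logic" pattern. Plain Python functions stand in for the pipeline component and the pipeline logic function; `StepFailed`, `validate_step` and `pipeline_logic` are hypothetical names. In a real ClearML pipeline the exception type that reaches the pipeline logic may differ, so catching a broader `Exception` there may be needed.

```python
class StepFailed(RuntimeError):
    """Hypothetical exception a pipeline step raises when it decides to fail."""


def validate_step(rows: list) -> int:
    # Inside the step (the Task itself): raise to mark it as failed
    if not rows:
        raise StepFailed("no rows to process, failing this step on purpose")
    return len(rows)


def pipeline_logic(rows: list) -> str:
    # In the pipeline logic function: protect the call so one failed step
    # does not abort the whole pipeline
    try:
        count = validate_step(rows)
    except StepFailed as err:
        return f"skipped: {err}"
    return f"processed {count} rows"


print(pipeline_logic([1, 2, 3]))  # processed 3 rows
print(pipeline_logic([]))         # skipped: no rows to process, failing this step on purpose
```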
```python
from clearml import Task

# ... the component's work runs here ...
# if we do not want other runs to reuse this step (its cached result), clear its hash:
Task.current_task()._set_runtime_properties({"pipeline_job_hash": None})
```
but I still need the load balancer ...
No, you are good to go. As long as someone registers the pod's IP automatically on a DNS service (local/public), you can use the registered address instead of the IP itself (obviously with the port suffix).
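A small sketch of "use the registered address, keep the port suffix": the DNS name only replaces the IP part of the URL. The hostname and the helper name `endpoint_url` are hypothetical.

```python
from urllib.parse import urlunsplit


def endpoint_url(hostname: str, port: int, path: str = "/", scheme: str = "http") -> str:
    """Build a URL from a DNS-registered pod name instead of its raw IP."""
    # The port suffix is still required; DNS only replaces the IP part
    return urlunsplit((scheme, f"{hostname}:{port}", path, "", ""))


print(endpoint_url("serving-pod-1.example.internal", 8080))
# http://serving-pod-1.example.internal:8080/
```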
Thanks for your support
With pleasure!
Hi @<1590514584836378624:profile|AmiableSeaturtle81>
I think you should use `add_external_files` instead of `add_files` (which is for local files).
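A rough rule of thumb as code: sources that already live in remote storage go through `add_external_files`, local paths go through `add_files`. The helper `dataset_add_method` and the set of schemes treated as remote are assumptions for illustration; the returned name is the `Dataset` method you would then call.

```python
from urllib.parse import urlparse

# Assumed set of remote schemes; adjust to the storage you actually use
REMOTE_SCHEMES = {"s3", "gs", "azure", "http", "https"}


def dataset_add_method(source: str) -> str:
    """Return which Dataset method fits a source location."""
    scheme = urlparse(source).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        return "add_external_files"  # data stays where it is, only links are registered
    return "add_files"  # local path: files are copied into the dataset on upload


for src in ["s3://my-bucket/images/", "/home/user/images"]:
    print(src, "->", dataset_add_method(src))
```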
Hi ConvolutedSealion94
Yes this seems like the correct curl
How did you spin up the clearml-serving containers? Is it with the docker-compose or with the helm chart? (I remember there are some pitfalls with the helm chart, and I would actually start with the local docker-compose to debug it.)
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
But the provided command is missing the URL target for the curl, so it is not complete.
Not sure I followed. Did you specify "NEW_ADDRESS"?
Or is it that in both cases the URL is localhost?
Assuming from previous threads that this runs on K8s, I think a configuration is missing; use system packages:
https://github.com/allegroai/clearml-agent/blob/cb6bdece39751eaef975287609b8bab603f116e5/docs/clearml.conf#L57
But they are all running inside the same pod, correct ?
Can't figure out what made it get to this point
I "think" this has something to do with loading the configuration and setting up the "StorageManager"
(in other words, setting up google.storage)... Or maybe it is the lack of the google storage package?!
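A quick way to check the "missing package" theory from inside the same environment: test whether the Google Cloud Storage client (`google.cloud.storage`, installed via the `google-cloud-storage` pip package) can be found. This is a standalone sketch, not part of ClearML; the helper name is hypothetical.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the (possibly dotted) module can be found without importing it."""
    try:
        # Parent packages (e.g. "google", "google.cloud") do get imported here
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # ModuleNotFoundError: a parent package is missing altogether
        return False


print(is_module_available("google.cloud.storage"))
```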
Let me check
instead of the one that I want or the one of the env which it is started from.
The default is the python that is used to run the agent. To override it:
`agent.ignore_requested_python_version = true`
`agent.python_binary = /my/selected/python3.8`
Hi @<1554275802437128192:profile|CumbersomeBee33>
What do you mean by "will the dependencies will be removed or not"?
The next time the agent spins up a new Task, it will create a new venv and delete the previous one.
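A minimal sketch of that lifecycle using the standard-library `venv` module, under the assumption that each Task gets a fresh venv in the same directory; it mirrors the behaviour described above and is not the agent's actual implementation. The helper name `fresh_task_venv` is hypothetical.

```python
import shutil
import tempfile
import venv
from pathlib import Path


def fresh_task_venv(venv_dir: Path) -> Path:
    """Delete any venv left by the previous task, then build a new empty one."""
    if venv_dir.exists():
        shutil.rmtree(venv_dir)  # whatever the previous task installed is gone
    venv.EnvBuilder(with_pip=False).create(venv_dir)
    return venv_dir


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "task_venv"
    fresh_task_venv(env_dir)
    (env_dir / "leftover_from_task_1.txt").write_text("old dependency")
    fresh_task_venv(env_dir)  # next Task
    leftover_exists = (env_dir / "leftover_from_task_1.txt").exists()
    print(leftover_exists)  # False
```

`venv.EnvBuilder(clear=True)` achieves the same reset in one call; the explicit `rmtree` just makes the deletion step visible.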
Hi AstonishingWorm64
I think you are correct, there is no external interface to change the docker image.
Could you open a GitHub issue so we do not forget to add an interface for that?
As a temporary hack, you can manually clone "triton serving engine" and edit the container image (under the Execution tab).
wdyt?
Hi ProudChicken98
`task.connect(input)` preserves the types based on the `input` dict types; on the flip side, `get_parameters` returns the string representation (as stored on the clearml-server).
Is there a specific reason for using `get_parameters` over `connect`?
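To illustrate the difference, here is a pure-Python simulation (not ClearML's implementation): the server keeps every value as a string, and a `connect`-style read casts each value back using the type of the original dict entry. `restore_types` and the values are hypothetical; in practice the keys returned by `get_parameters` may also carry a section prefix such as `General/`.

```python
def restore_types(template: dict, stored: dict) -> dict:
    """Cast string values read back from the server to the types in `template`."""
    restored = {}
    for key, default in template.items():
        raw = stored.get(key, str(default))
        if isinstance(default, bool):  # check bool before int: bool is a subclass of int
            restored[key] = raw.strip().lower() in ("true", "1", "yes")
        elif isinstance(default, (int, float)):
            restored[key] = type(default)(raw)
        else:
            restored[key] = raw
    return restored


params = {"epochs": 10, "lr": 0.001, "use_amp": True, "optimizer": "adam"}
# What a get_parameters-style read gives back: everything as strings
stored = {"epochs": "20", "lr": "0.01", "use_amp": "False", "optimizer": "sgd"}
print(restore_types(params, stored))
# {'epochs': 20, 'lr': 0.01, 'use_amp': False, 'optimizer': 'sgd'}
```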
Woot woot!
Awesome, this RC is stable, so feel free to use it. The official release is probably due out next week :)
Worker just installs by name from pip, and it installs not my package!
Oh dear ...
Did you configure additional pip repositories in the Agent's clearml.conf? https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L77 It might be that (1) is not enough, as pip will first search for the package in the pip repository, and only then in the private one. To avoid that, in your code you can point directly to an https of your package` Ta...
Thanks @<1569496075083976704:profile|SweetShells3> ! let me see if I can reproduce the issue
@<1523706266315132928:profile|DefiantHippopotamus88> seems like you are missing the ports 🙂
CLEARML_WEB_HOST="
"
CLEARML_API_HOST="
"
CLEARML_FILES_HOST="
"
Hi @<1697056701116583936:profile|JealousArcticwolf24>
Awesome deployment 🤩
Yes if you need another scalable model serving you can just run another instance of the clearml-serving-inference
https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/docker/docker-compose.yml#L77
So you end up with two of them, one per models environ...
Hi @<1697419082875277312:profile|OutrageousReindeer5>
Is the NetApp S3 protocol enabled, or are you referring to NFS mounts?
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's already on a branch by SuccessfulKoala55) to fix the bug ElegantKangaroo44 found. It should be in an RC next week.
Hi @<1528908687685455872:profile|MassiveBat21>
However, no useful template is created for downstream executions - the source code template is all messed up,
Interesting. Could you provide the code that is "created", or even better, some way to reproduce it? It sounds like a bug, or maybe missing feature support.
My question is - what is a best practice in this case to be able to run exported scripts (python code not made availa...