Correct (with the port mapping service in it)
Yeah I think this kind of makes sense to me, any chance you can open a GH issue on this feature request?
Also I can't call the "preprocess" function since there is no valid endpoint to be hitting
Wait now I'm confused, when you are calling " None " you are actually calling the preprocess function running on the inference container, and this one in turn (automatically) calls the Triton container.
Are you calling the Triton manually?
Could you share your preprocess.py, and the command line you have used to register the two model versions?
(based on ...
Hi @<1569858449813016576:profile|JumpyRaven4> could you test the fix? just pull & run
allegroai/clearml-serving-triton:1.3.1
allegroai/clearml-serving-inference:1.3.1
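If it helps, pulling the updated images is just (plain docker CLI, tags taken from the message above):
```
docker pull allegroai/clearml-serving-triton:1.3.1
docker pull allegroai/clearml-serving-inference:1.3.1
```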
Okay we have located the issue, thanks guys! We will push a patch release hopefully later today
@<1671689437261598720:profile|FranticWhale40> I might have found something, let me see if I can reproduce it
Thanks @<1671689437261598720:profile|FranticWhale40> !
I was able to locate the issue, fix should be released later today (or worst case tomorrow)
@<1671689437261598720:profile|FranticWhale40> this one: None
what is the best approach to update the package if we have frequent updates on this common code?
since this package has an indirect effect on the model endpoint, I would package it with the preprocess code of the endpoint.
Each server updates its own local copy, and it will make sure it can take it and deploy it hand over hand without breaking its ability to serve these endpoints.
The "wastefulness" of holding multiple copies is negligible compared to a situation where everyone ...
we also provide a custom aux-config file. We also had to make sure to update the name inside config.pbtxt so that Triton is happy:
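For reference, a minimal Triton config.pbtxt where the name field is the part that has to match; the model name, platform and shapes below are illustrative only:
```
name: "my_model"  # must match the registered model name
platform: "onnxruntime_onnx"
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ -1, 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ -1, 10 ]
  }
]
```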
Good point, what would be the logic of the auto "config.pbtxt" patching we should employ?
@<1671689437261598720:profile|FranticWhale40> could you test the fix? just pull & run
allegroai/clearml-serving-triton:1.3.1
allegroai/clearml-serving-inference:1.3.1
Should have worked, the error you are getting is docker-compose parsing the yml file
Is this exactly the one from the trains-server repo?
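A quick way to check whether the yml itself parses, assuming the docker-compose CLI, is to let it validate the file directly:
```
docker-compose -f docker-compose.yml config
```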
@<1524922424720625664:profile|TartLeopard58> @<1545216070686609408:profile|EnthusiasticCow4>
Notice that when you are spinning multiple agents on the same GPU, the Tasks should request the "correct" fractional GPU container, i.e. if they pick a "regular" container there is no memory limit enforced.
So something like
```
CLEARML_WORKER_NAME=host-gpu0a clearml-agent daemon --gpus 0 clearml/fractional-gpu:u22-cu12.3-2gb
CLEARML_WORKER_NAME=host-gpu0b clearml-agent daemon --gpus 0 clearml/fractional-gpu:u22-cu12.3-2gb
```
...
making me realize that this may have been optional
I think it is optional, and this is why it was not entered in the first place.
Could you double check and just remove it from your manual pbtxt?
I guess it won't due to the nature of services?
Correct, k8s glue works differently, that said I would actually use the helm chart to spin a pod with the agent in services mode and venv mode.
No worries, just found it. Thanks!
I'll make sure to follow up on the GitHub issue for better visibility 🙂
Still, my problem is that calling pipe.start() crashes.
is supposed to kill the process
2022-08-19 09:17:56,626 - clearml - WARNING - Terminating local execution process
This is what it writes before killing the local process.
```
/opt/homebrew/anaconda3/envs/py39/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 16 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be ...
```
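For context, a minimal sketch of the pipe.start() call being discussed (project, name and steps are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0")
# ... pipeline steps added here ...
pipe.start()  # this is the call that crashes
```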
Hi CleanPigeon16
You need to pass the private repository docker credentials to the aws instance, I would use the custom bash script option of the aws autoscaler to create the docker credentials file.
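A hedged sketch of such an init script (registry URL and auth value are placeholders):
```bash
#!/bin/bash
# runs at instance startup; writes docker credentials so the agent can pull private images
mkdir -p /root/.docker
cat > /root/.docker/config.json <<'EOF'
{
  "auths": {
    "myregistry.example.com": { "auth": "<base64 of user:password>" }
  }
}
EOF
```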
ok, but this happens on my local machine, not in the agent
resource monitoring is always running in the background, even on local machines. (of course you can turn it off)
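If you do want it off for a local run, something along these lines should work (assuming a recent clearml SDK):
```python
from clearml import Task

# disable the background resource monitor for this run
task = Task.init(
    project_name="examples",
    task_name="no-monitoring",
    auto_resource_monitoring=False,
)
```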
Hi SuperficialGrasshopper36
/home/ubuntu/.clearml/venvs-builds.1/3.8/task_repository/repository_name/.venv
This is the problem, they should not be installed there, it should be in /home/ubuntu/.clearml/venvs-builds.1/3.8/
Could you post the poetry.lock file? Maybe it is something there?
What's the poetry version and clearml-agent versions?
Hi ClumsyElephant70
So do you need both requirements.txt files combined?
How will the agent be able to reproduce both repos on the remote machine?
Thanks NonchalantDeer14 !
BTW: how do you submit the multi-GPU job? Is it multi-GPU or multi-node?
Regarding the demoapp, this is just a default server that allows you to start playing around with ClearML without needing to set up any of your own servers or sign up
That said, I would recommend signing up (totally free) on the community server
https://app.community.clear.ml/