Yea, added an issue. We can follow up from there. Really hope that clearml-serving can work; it's a nice project.
OK, let me try adding it to the volume mount.
@<1523701205467926528:profile|AgitatedDove14> when my code gets the ClearML datasets, it stores them in the cache, e.g. /$HOME/.clearml/cache....
I wanted it to be in a mounted PV instead, so other pods (on the same node) that need the same datasets can use them without pulling again.
I have yet to figure out how to do so; would appreciate it if you could give some guidance.
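A minimal sketch of what I mean, assuming the PV is mounted at /mnt/pv-cache (hypothetical path) and using the sdk.storage.cache.default_base_dir setting in clearml.conf:
```
sdk {
    storage {
        cache {
            # point the dataset/artifact cache at the shared PV mount
            default_base_dir: "/mnt/pv-cache/clearml"
        }
    }
}
```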
@<1523701205467926528:profile|AgitatedDove14> do you mean not using Helm, but filling in the values and installing with the YAML files directly, e.g. kubectl apply ...?
Thanks AgitatedDove14 and TimelyMouse69. The intention was to have some traceability between the two setups. I think the best way is to enforce a naming convention (for project and name) so we know how they are related. Any better suggestions?
https://clear.ml/docs/latest/docs/integrations/storage/
Try adding the <path to your cert> for s3.credentials.verify.
Do you have an example of how I can define the packages to be installed for every step of the pipeline?
Thanks, I just realised I didn't add --docker.
For example: I build my Docker image from an image on Docker Hub. In this image, I installed the torch and cupy packages. But when I run my experiment in this image, the packages are not found.
Yes, I ran the experiment inside.
OK. Can I check whether only the main script was stored in the task, and not the dependent packages?
I guess the more correct way is to upload to some repo that the remote task can still pull from?
Seems like it was broken with numpy 1.24.1.
Tried numpy 1.23.5 and it works.
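A small sketch of pinning the working version for remote runs (project/task names are placeholders); Task.add_requirements has to be called before Task.init:
```
from clearml import Task

# force the remote environment to install the known-good numpy version
Task.add_requirements("numpy", "==1.23.5")
task = Task.init(project_name="demo", task_name="numpy-pin")  # hypothetical names
```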
Hi CostlyOstrich36, I ran this task locally at first, and that attempt was successful.
When I use this task in a pipeline (the task runs remotely), it cannot find the external package. This seems logical, but I'm not sure how to resolve it.
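For reference, a sketch of declaring per-step packages with the decorator API (step names and package list are placeholders):
```
from clearml import PipelineDecorator

# the packages listed here are installed in the step's environment
# before the function body runs on the remote worker
@PipelineDecorator.component(packages=["torch", "cupy-cuda11x"], execution_queue="default")
def preprocess(raw):
    import torch  # import inside the step, after the packages are installed
    return torch.tensor(raw).mean().item()

@PipelineDecorator.pipeline(name="demo-pipeline", project="demo")  # hypothetical names
def run_pipeline():
    print(preprocess([1.0, 2.0, 3.0]))

if __name__ == "__main__":
    run_pipeline()
```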
I was browsing the clearml-agent GitHub and saw this. Isn't this for spinning up clearml-agent in a Docker container to run as a daemon?
May I know which env variable to set the cert in?
Do you want to share your clearml.conf here?
By the way, how can I start up the ClearML agent using the clearml-agent image instead of the SDK? Do you have an example of the docker run command that includes the queue, GPUs, etc.?
Nice. That should work. Thanks
Not exactly sure yet, but I would think a user tag for "deployed" makes sense, as it should be a deliberate user action. An additional system state is required too, since a deployed state should have some prerequisite system state.
I would also like to ask whether ClearML has different states for a task or model, or even different task types? Right now I don't see differences; is this a deliberate design?
Nice. It is actually dataset.id.
I guess we need to understand the purpose of the various states. So far I see only "archive, draft, publish". Did I miss any?
Yes. But I'm not sure what the agent is running. I only know how to stop it if I have the agent ID.
I'm not very sure, tbh. Just want to see if this is useful...
I got an SSL error a few days back and solved it by adding the cert to /etc/ssl/certs and running update-ca-certificates, then:
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
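A minimal sketch of that workaround, assuming a Debian/Ubuntu base (update-ca-certificates picks up .crt files from /usr/local/share/ca-certificates) and a hypothetical cert file name:
```
# copy the internal CA cert into the trust store (hypothetical file name)
sudo cp my-internal-ca.crt /usr/local/share/ca-certificates/
# rebuild the system certificate bundle
sudo update-ca-certificates
# point Python's requests library (used by the ClearML SDK) at the rebuilt bundle
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
```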
Add this. Note that verify might not work under sdk.aws.s3.verify, but it does under sdk.aws.s3.credentials. Please see the attached image.
Example:
```
aws {
    s3 {
        credentials: [
            {
                ...
                verify: "<path to your cert>"
            }
        ]
    }
}
```
SDK meaning I run the agent using clearml-agent daemon ....
Alternatively, I understand I can also run the agent using docker run allegroai/clearml-agent:latest, but I cannot figure out how to add the --restart, --queue, --gpus flags to the container.
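A rough sketch of what I mean, assuming the image lets you pass the daemon command and reads the usual CLEARML_API_* env vars (credentials and queue name are placeholders):
```
docker run -d --restart unless-stopped --gpus all \
  -e CLEARML_API_HOST=https://api.clear.ml \
  -e CLEARML_API_ACCESS_KEY=<access_key> \
  -e CLEARML_API_SECRET_KEY=<secret_key> \
  allegroai/clearml-agent:latest \
  clearml-agent daemon --queue default --docker
```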
Yup. But I happened to reinstall my server, so the data was lost, and the agent continued running.
I see. Was wondering if there's any advantage to doing it one way or the other.
@<1523701070390366208:profile|CostlyOstrich36> Is this output_uri, or where do I put this URL?
Yup, was thinking of a bash script.
The intent is to generate some outputs from the ClearML task, and I'm thinking of packaging them into a Docker image for ease of sharing with others who are not plugged into our network, so they can run the image directly.
Hello CostlyOstrich36, I am facing an issue now. Basically, I installed all the necessary Python packages in my Docker image, but somehow clearml-agent does not seem to detect these global packages. I don't see them in the "installed packages". Any advice?
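One thing I might try, a sketch assuming the agent builds a virtualenv inside the container: tell it to inherit the image's preinstalled packages via clearml.conf on the agent:
```
agent {
    package_manager {
        # let the agent's virtualenv see the system/site packages
        # already baked into the Docker image
        system_site_packages: true
    }
}
```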