I built my Docker image on top of an image from Docker Hub and installed the torch and cupy packages in it. But when I run my experiment in this image, the packages are not found.
Yes, I ran the experiment inside.
Thanks AgitatedDove14. Specifically, I wanted to use my own ClearML server and Triton, so I tried passing --engine-container-args at launch but got an error saying there is no such flag. I looked at --help, but I guess it is not updated yet.
SuccessfulKoala55 Nope. I didn't even get to enter my name. I suspect there is a mistake in the mapping of the data folder.
I was using the template in https://github.com/allegroai/clearml-helm-charts to deploy.
Thanks, I just realised I didn't add --docker.
I have yet to figure out how to do so; I would appreciate it if you could give some guidance.
By SDK I mean that I run the agent using clearml-agent daemon ....
Alternatively, I understand I can also run the agent using docker run allegroai/clearml-agent:latest, but I cannot figure out how to pass the --restart, --queue and --gpus flags to the container.
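A rough sketch of how such a docker run command could be assembled, in case it helps. `--restart` and `--gpus` are options of docker run itself, so they go before the image name. How the image picks up the queue is an assumption here: the sketch passes `--queue` as trailing arguments and assumes the image entrypoint forwards them to the agent, which should be checked against the image's documentation. The helper name `build_agent_docker_run` is made up for illustration, and the API credentials the agent needs are left out.

```python
import shlex


def build_agent_docker_run(image, queue, gpus="all", restart="unless-stopped"):
    """Return the argv for `docker run`, with Docker options before the image."""
    # Docker's own options must come before the image name.
    docker_opts = ["--restart", restart, "--gpus", gpus]
    # ASSUMPTION: anything after the image name is forwarded to the agent
    # by the image entrypoint. Verify this for the image you actually use.
    agent_args = ["--queue", queue]
    return ["docker", "run", "-d", *docker_opts, image, *agent_args]


cmd = build_agent_docker_run("allegroai/clearml-agent:latest", "default")
# Print the command as a single shell-quoted string for copy and paste.
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to hand to `subprocess.run(cmd)` later without worrying about shell quoting.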
Do you want to share your clearml.conf here?
Ah, I think I was not very clear about my requirement. I was looking at porting at the project level, not moving the entire ClearML data over. Is that possible instead?
Ok. Can I check: was only the main script stored in the task, and not the dependent packages?
I guess the more correct way is to upload them to some repo that the remote task can still pull from?
Yup, was thinking of bash script.
The intent is to generate some outputs from the ClearML task, and I am thinking of packaging them into a Docker image for ease of sharing with others who are not plugged into our network, so that they can run the image directly.
By the way, how can I start the ClearML agent using the clearml-agent image instead of the SDK? Do you have an example of a docker run command that includes the queue, GPUs, etc.?
I'm not very sure, to be honest. I just want to see if this is useful....
I got an SSL error a few days back and solved it by adding the cert to /etc/ssl/certs and running update-ca-certificates.
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
Add this. Note that verify might not work as sdk.aws.s3.verify, but it does under sdk.aws.s3.credentials. Please see the attached image.
Example: aws { s3 { credentials: [{ ...
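For the CA bundle part, the same thing can be done from inside a Python script instead of the shell. This is only a sketch: the helper name `use_ca_bundle` is made up, the path is the one from the export line above, and the variable has to be set before the first HTTPS request is made.

```python
import os

# Path taken from the export line above; adjust if your distro differs.
CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"


def use_ca_bundle(path=CA_BUNDLE):
    """Point requests-based clients at `path`; return True if the file exists."""
    os.environ["REQUESTS_CA_BUNDLE"] = path
    return os.path.isfile(path)


if not use_ca_bundle():
    # The bundle is normally generated by update-ca-certificates.
    print("CA bundle not found at", CA_BUNDLE)
```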
Hi CostlyOstrich36, I ran this task locally at first, and that attempt was successful.
When I use this task in a pipeline (where the task runs remotely), it cannot find the external package. This seems logical, but I am not sure how to resolve it.
May I know which env variable to set the cert in?
AgitatedDove14 I am still trying to figure out how to do so, because when I add a task to the queue, the ClearML agent basically creates a pod with the container. How can I make a task that does a helm install or kubectl create deployment.yaml?
BoredBat47 Just to check, do you need to run update-ca-certificates or an equivalent?
Thanks. The examples use upload_artifact, which stores the files in output_uri. What if I do not want to save the file but simply pass it to the next step? Is there a way to do so?
JuicyFox94 and SuccessfulKoala55 Thanks a lot. Indeed, it was caused by dirty cookies.
By the way, will the download still happen if the dataset is already available in the cache folder? Are there any specific settings to add to Dataset.get_local_copy()?
Cool thanks guys. I am clearer now. Was confused by the obsolete info. Thanks for the clarification.
Hello CostlyOstrich36, I am facing an issue now. Basically, I installed all the necessary Python packages in my Docker image, but somehow the clearml-agent does not seem to detect these global packages. I don't see them under "installed packages". Any advice?
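One possible explanation, sketched below with only the standard library: if the task runs inside a freshly created virtual environment (an assumption about what the agent does inside the container), packages installed globally in the image are hidden unless the venv is created with system site-packages enabled. The helper name `venv_sees_system_packages` is made up for illustration. As far as I recall, the agent has a matching `agent.package_manager.system_site_packages` setting in clearml.conf, but please check that against your agent version.

```python
import tempfile
import venv
from pathlib import Path


def venv_sees_system_packages(inherit):
    """Create a throwaway venv and report its include-system-site-packages flag."""
    with tempfile.TemporaryDirectory() as tmp:
        # with_pip=False keeps this fast; only the venv layout is needed here.
        venv.EnvBuilder(system_site_packages=inherit, with_pip=False).create(tmp)
        cfg = (Path(tmp) / "pyvenv.cfg").read_text()
    for line in cfg.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip() == "true"
    return False


# A default venv is isolated from globally installed packages such as torch.
print(venv_sees_system_packages(False))
# With the flag enabled, the venv can import what the image already has.
print(venv_sees_system_packages(True))
```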
ClearML 1.1.1. Yes, I have boto3 installed too.
It returns false. Just to share a bit more: I have the requirements.txt in GitLab together with my code, organised in folders. Do I need to provide a GitLab path?
To clarify, there might be cases where we get a Helm chart or k8s manifests to deploy an inference service, which is a black box to us.
Users may need to deploy this service where needed to test it against other software components. This needs GPU resources, and a queue system would let them queue up and eventually get it deployed, instead of hard-allocating resources for this purpose.
Hi TimelyPenguin76, nope, I don't see any errors. That's why I'm not sure what went wrong.
Hi ExasperatedCrab78, I managed to get it. It was due to the IP address set in examples.env.
It seems it was broken for numpy version 1.24.1.
I tried numpy 1.23.5 and it works.
AgitatedDove14 when my code gets the ClearML datasets, it stores them in the cache, e.g. /$HOME/.clearml/cache....
I wanted the cache to be in a mounted PV instead, so that other pods (on the same node) that need the same datasets can use them without pulling again.
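A sketch of one way this might be done from the task code, assuming the SDK honours the `CLEARML_CACHE_DIR` environment variable (to my understanding it does, and `sdk.storage.cache.default_base_dir` in clearml.conf is the config-file equivalent, but please verify for your version). The helper name `point_cache_at` and the `clearml-cache` subfolder are made up, and a temporary directory stands in for the PV mount path.

```python
import os
import tempfile
from pathlib import Path


def point_cache_at(mount_path):
    """Create a cache folder under `mount_path` and export it for the SDK."""
    cache_dir = Path(mount_path) / "clearml-cache"  # hypothetical subfolder name
    cache_dir.mkdir(parents=True, exist_ok=True)
    # ASSUMPTION: the SDK reads CLEARML_CACHE_DIR; set it before importing clearml.
    os.environ["CLEARML_CACHE_DIR"] = str(cache_dir)
    return cache_dir


# Stand-in for the PV mount point inside the pod.
demo_mount = tempfile.mkdtemp()
cache = point_cache_at(demo_mount)
print("cache directory:", cache)
```

In a pod, the same variable could instead be set in the container spec so that every task on the node shares the mount.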