Hi SuccessfulKoala55 Thanks for pointing me to this repo. I was using this repo.
I couldn't find anything in this repo about whether we still need to label the node with app=clearml, like what was mentioned in the deprecated repo. From the values.yaml, the node selector is empty. Would you be able to advise?
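For context, this is roughly what I'd expect a node-selector override to look like in a Helm values file. The key path below is my guess based on typical chart layouts, not taken from the chart itself, so please check the actual values.yaml for the exact structure:

```yaml
# Hypothetical values override -- verify the key path against the chart's values.yaml
apiserver:
  nodeSelector:
    app: clearml
```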
How is the clearml data handled now then? Thanks
Cool, thanks guys. It's clearer to me now; I was confused by the obsolete info. Thanks for the clarification.
Hello CostlyOstrich36 I'm facing an issue now. Basically, I installed all the necessary Python packages in my Docker image, but somehow the clearml-agent doesn't seem to detect these global packages. I don't see them in the "installed packages". Any advice?
For example, I built my Docker image from an image on Docker Hub. In this image, I installed the torch and cupy packages, but when I run my experiment in this image, the packages are not found.
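One thing worth checking (my assumption, not confirmed in this thread): the agent's package manager may be creating a clean virtualenv that ignores the image's globally installed packages. There is a setting in the agent's clearml.conf that lets the created environment inherit system packages:

```
# clearml.conf on the agent machine -- let the created venv
# see packages already installed globally in the image
agent {
    package_manager {
        system_site_packages: true
    }
}
```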
Yes, I ran the experiment inside.
Hi ExasperatedCrab78 I managed to get it. It was due to the IP address set in examples.env.
OK, let me try adding it to the volume mount.
SuccessfulKoala55 Nope. I didn't even get to enter my name. I suspect there is some mistake in mapping the data folder.
Was using the template in https://github.com/allegroai/clearml-helm-charts to deploy.
JuicyFox94 and SuccessfulKoala55 Thanks a lot. Indeed it was caused by dirty cookies.
Hi TimelyPenguin76 , nope. I don't see any errors. That's why I'm not sure what went wrong.
This is what I got, and I see an HTTP 400 error in the console.
@<1526734383564722176:profile|BoredBat47> Just to check: do you need to run update-ca-certificates or the equivalent?
@<1523701205467926528:profile|AgitatedDove14> do you mean not using Helm, but filling in the values and installing with the YAML files directly? E.g. kubectl apply ...
Yup, I was thinking of a bash script.
The intent is to generate some outputs from the ClearML task and then, most likely, package them into a Docker image for ease of sharing with others who are not plugged into our network, so they can run the image directly.
I guess we need to understand the purpose of the various states. So far I've only seen "archived, draft, published". Did I miss any?
Not exactly sure yet, but I would think a user tag for "deployed" makes sense, as it should be a deliberate user action. An additional system state is probably required too, since a deployed state should have some prerequisite system state.
I would also like to ask whether ClearML has different states for a task, a model, or even for different task types? Right now I don't see any differences; is this a deliberate design?
By the way, how can I start up the ClearML agent using the clearml-agent image instead of the SDK? Do you have an example of the docker run command that includes the queue, GPUs, etc.?
Thanks, I just realised I didn't add --docker
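For anyone landing here later, this is roughly the invocation that worked for me. The queue name, server addresses, and credentials are placeholders, and the image name/entrypoint is my assumption; double-check the flags against clearml-agent daemon --help:

```shell
# Run the agent daemon in docker mode, pulling tasks from the "default" queue
clearml-agent daemon --queue default --docker --gpus all

# Or, running the agent itself from the agent image (placeholders throughout;
# mounting the docker socket is needed so the agent can spawn task containers)
docker run --rm --gpus all \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e CLEARML_API_HOST=http://<server>:8008 \
  -e CLEARML_WEB_HOST=http://<server>:8080 \
  -e CLEARML_FILES_HOST=http://<server>:8081 \
  -e CLEARML_API_ACCESS_KEY=<key> \
  -e CLEARML_API_SECRET_KEY=<secret> \
  allegroai/clearml-agent \
  clearml-agent daemon --queue default --docker
```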
@<1523701070390366208:profile|CostlyOstrich36> Is this output_uri, or where do I put this URL?
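If it is indeed output_uri (my assumption from the thread), it is typically passed to Task.init; the project, task, and bucket names below are placeholders:

```python
from clearml import Task

# output_uri tells ClearML where to upload artifacts and models
# (placeholder URL -- replace with your own storage endpoint)
task = Task.init(
    project_name="my-project",
    task_name="my-task",
    output_uri="s3://my-bucket/clearml-outputs",
)
```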
Yes. But I'm not sure which agents are running. I only know how to stop one if I have the agent ID.
Yup. But I happened to reinstall my server and the data was lost, and the agent continued running.
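For reference, stopping a daemon on the machine it runs on can be done from the CLI; this is a sketch from memory, so verify against clearml-agent daemon --help:

```shell
# Stop the agent daemon running on this machine
# (add --docker if it was started in docker mode)
clearml-agent daemon --stop
```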
Nice. That should work. Thanks
@<1523701070390366208:profile|CostlyOstrich36> Yes. I'm running on k8s
@<1523701205467926528:profile|AgitatedDove14> when my code gets the ClearML datasets, it stores them in the cache, e.g. $HOME/.clearml/cache....
I wanted it to be in a mounted PV instead, so other pods (on the same node) that need the same datasets can use them without pulling again.
By the way, will downloading still happen if the dataset is already available in the cache folder? Are there any specific settings to add to Dataset.get_local_copy()?
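One way to relocate the cache (hedged; the mount path is a placeholder and the key should be verified against your clearml.conf) is to point the SDK's cache base directory at the mounted PV:

```
# clearml.conf on the pod -- point the SDK cache at the mounted PV
sdk {
    storage {
        cache {
            default_base_dir: "/mnt/clearml-cache"
        }
    }
}
```

My understanding is that Dataset.get_local_copy() reuses a valid cached copy rather than re-downloading, so no extra parameter should be needed once the cache lives on the shared mount.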
It gets rerouted to http://app.clearml.home.ai/dashboard , with the same network error.
May I know which environment variable to set the cert in?
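In case it helps, for Python clients built on the requests library a custom CA bundle can usually be supplied via a standard environment variable; this is generic requests behaviour rather than anything ClearML-specific I can confirm, and the path is a placeholder:

```shell
# Point Python HTTP clients at the custom CA bundle
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/my-ca.pem
```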
I have yet to figure out how to do so; I'd appreciate it if you could give some guidance.
Thanks. The examples use upload_artifact, which stores the files in output_uri. What if I don't want to save them but simply pass them to the next step? Is there a way to do so?
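For the record, with pipelines built from functions, a step's return value can be passed directly to the next step; ClearML still serializes it behind the scenes, but you don't call upload_artifact yourself. A minimal sketch, with the pipeline and project names as placeholders:

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def step_one():
    # Return an in-memory output instead of uploading an artifact explicitly
    return [1, 2, 3]

@PipelineDecorator.component()
def step_two(data):
    # Receives step_one's return value as a regular argument
    print(sum(data))

@PipelineDecorator.pipeline(name="pass-outputs", project="my-project")
def pipe():
    data = step_one()
    step_two(data)

if __name__ == "__main__":
    # Run locally for debugging (normally you'd enqueue the pipeline instead)
    PipelineDecorator.run_locally()
    pipe()
```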