Can clearml-serving do a helm install or upgrade? We have cases where the ML models do not come from ML experiments in ClearML, but we would still like to tap on the ClearML queue to enable resource queuing.
I figured out that it may be possible to do this: experiment_task = Task.current_task() and then OutputModel(experiment_task).update_weights('model.pt'), to attach it to the ClearML experiment task.
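Spelled out a bit more, a minimal sketch of that idea (the 'model.pt' path is a placeholder for wherever the externally produced weights actually live, and this assumes the code runs under a ClearML task):

```python
from clearml import Task, OutputModel

# Grab the task this code is running under.
experiment_task = Task.current_task()

# Register an external weights file against the task so the model
# shows up as an output model of the experiment.
output_model = OutputModel(task=experiment_task)
output_model.update_weights("model.pt")  # placeholder path
```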
Thanks @<1523701205467926528:profile|AgitatedDove14> . What I could think of is to write a task that runs a Python subprocess to do "helm install". In that Python script, we could point to / download the helm chart from somewhere (e.g. NFS, S3).
Does this sound right to you?
One thing I was wondering is whether we could pass the helm charts/files when we use the ClearML SDK, so we could skip the step of pushing them to NFS/S3.
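A minimal sketch of the subprocess idea, assuming the chart has already been fetched to a local path (the release name and chart path below are placeholders):

```python
import subprocess

def run_command(args: list) -> "subprocess.CompletedProcess":
    """Run an external command, raising if it exits non-zero."""
    return subprocess.run(args, check=True, capture_output=True, text=True)

def helm_install(release: str, chart_path: str, helm_bin: str = "helm") -> str:
    """Install (or upgrade) a helm release from a chart that was already
    downloaded, e.g. from NFS or S3. Returns helm's stdout."""
    result = run_command([helm_bin, "upgrade", "--install", release, chart_path])
    return result.stdout

# Inside the ClearML task this could be called after pulling the chart, e.g.:
# helm_install("my-inference-svc", "/mnt/nfs/charts/my-chart-0.1.0.tgz")
```

The task itself can then be cloned and enqueued on a ClearML queue like any other job.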
This is what I got, and I see an HTTP 400 error in the console.
When I run it as a regular remote task it works, but when I run it as a step in a pipeline, it cannot access the same folder on my local machine.
It gets rerouted to http://app.clearml.home.ai/dashboard with the same network error.
Do you have an example of how I can define the packages to be installed for every step of the pipeline?
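For anyone else looking: a sketch of per-step packages with PipelineController, assuming function steps (the pipeline/step names and package pins below are placeholders):

```python
from clearml import PipelineController

def preprocess():
    # placeholder step body
    pass

pipe = PipelineController(name="demo-pipeline", project="demo")  # placeholder names

# `packages` takes a list of pip requirements (or a path to a requirements.txt)
# that the agent installs for this step.
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    packages=["pandas==2.2.*", "scikit-learn"],  # placeholder requirements
)
```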
A more advanced case would be to decide how long this job should run and terminate it after that, to improve GPU utilization.
Not exactly sure yet, but I would think a user tag for "deployed" makes sense, as it should be a deliberate user action. Additional system state is required too, since a "deployed" state should have some prerequisite system state.
I would also like to ask if ClearML has different states for a task, a model, or even different task types? Right now I don't see any differences; is this a deliberate design?
I guess we need to understand the purpose of the various states. So far I only see "archived, draft, published". Did I miss any?
Hi @<1523701070390366208:profile|CostlyOstrich36> , basically
- I uploaded a dataset using ClearML Datasets. The output_uri points to my S3, so the dataset is stored in S3. My S3 is set up with HTTP only.
- When I retrieve the dataset for training using Dataset.get(), I encountered an SSL cert error, as the URL used to retrieve the data was https://<s3url>/... instead of s3://<s3url>/..., which is HTTP. This is weird, as the dataset URL is without HTTPS.
- I am not too sure why and I susp...
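In case it helps, forcing plain HTTP for a non-AWS endpoint is usually done in clearml.conf; a fragment along these lines (host/key/secret are placeholders, and I believe the secure flag is what controls http vs https):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my-s3-host:9000"  # placeholder, your s3 endpoint
                    key: "access-key"        # placeholder
                    secret: "secret-key"     # placeholder
                    multipart: false
                    secure: false            # use plain http instead of https
                }
            ]
        }
    }
}
```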
Thanks AgitatedDove14 and TimelyMouse69. The intention was to have some traceability between the two setups. I think the best way is to enforce some naming convention (for project and name) so we know how they are related? Any better suggestions?
I see. I was wondering if there is any advantage to doing it either way.
Hi SuccessfulKoala55, thanks for pointing me to this repo. I was using this repo.
I didn't manage to find in this repo whether we still need to label the node app=clearml, as mentioned in the deprecated repo, although from the values.yaml the node selector is empty. Would you be able to advise?
How is the ClearML data handled now, then? Thanks
https://clear.ml/docs/latest/docs/integrations/storage/
Try adding the <path to your cert> for s3.credentials.verify.
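In clearml.conf that would look something like this (endpoint and credentials are placeholders; only the verify entry is the relevant part):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my-s3-host:9000"          # placeholder endpoint
                    key: "access-key"                # placeholder
                    secret: "secret-key"             # placeholder
                    verify: "/path/to/ca-bundle.pem" # <path to your cert>
                }
            ]
        }
    }
}
```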
@<1523701070390366208:profile|CostlyOstrich36> Yes. I'm running on k8s
Yeah, I added an issue; we can follow up from there. I really hope clearml-serving can work, it is a nice project.
Hi Bart, yes. Running with inference container.
@<1523701070390366208:profile|CostlyOstrich36> Is this the output_uri, or where do I put this URL?
Yes, but I'm not sure which agent is running. I only know how to stop it if I have the agent id.
Yup, but I happened to reinstall my server and the data was lost, and the agent continues running.
Nice. It is actually dataset.id .
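For anyone else, a minimal sketch of getting that id (the project/name below are placeholders):

```python
from clearml import Dataset

# Look up the dataset by project/name; ds.id is the unique dataset id.
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
print(ds.id)
```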
And just a suggestion which maybe I can post in GitHub issue too.
It is not very clear what the purpose of the project name and name is, even after reading the --help. Perhaps this is something that could be made clearer when updating the documentation?
CostlyOstrich36 I mean the dataset object in clearml as well as the data that is tied to this object.
The intent is to bring it over to another ClearML setup and keep some form of traceability.
@<1523701205467926528:profile|AgitatedDove14> I am looking at a queue system, which the ClearML queue offers, that allows users to queue jobs to deploy an app / inference service. This can be as simple as a pod, or a more complete helm chart.
Nice. That should work. Thanks