I see. I was wondering whether there is any advantage to doing it one way or the other.
Not exactly sure yet, but I would think a user tag for "deployed" makes sense, since it should be a deliberate user action. An additional system state is required too, since a deployed state should have some prerequisite system state.
I would also like to ask: does ClearML have different states for a task, a model, or even for different task types? Right now I don't see any differences; is this a deliberate design?
To clarify, there might be cases where we get Helm charts / k8s manifests to deploy an inference service. A black box to us.
Users may need to deploy this service wherever needed to test it against other software components. This needs GPU resources, so a queue system would let them queue up and eventually get it deployed, instead of allocating resources permanently for this purpose.
I guess we need to understand the purpose of the various states. So far I only see "archive, draft, publish". Did I miss any?
Hello CostlyOstrich36, I am facing an issue now. Basically, I installed all the necessary Python packages in my Docker image, but somehow the clearml-agent does not seem to detect these global packages. I don't see them in the "installed packages". Any advice?
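Side note: I understand there is an agent-side setting that lets the task's virtualenv inherit the container's globally installed packages; a sketch of the agent's clearml.conf (worth verifying against the agent docs):
```
agent {
    package_manager {
        # let the virtualenv created for the task see packages
        # already installed in the container's system Python
        system_site_packages: true
    }
}
```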
Hi @<1523701070390366208:profile|CostlyOstrich36>, basically:
- I uploaded a dataset using ClearML Datasets. The output_uri points to my S3, so the dataset is stored in S3. My S3 is set up with HTTP only.
- When I retrieve the dataset for training using `Dataset.get()` (sketched below), I encountered an SSL cert error because the URL to retrieve the data was `https://<s3url>/...` instead of `s3://<s3url>/...`, which is HTTP. This is weird, as the dataset URL is without https. - I am not too sure why and I susp...
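A minimal sketch of the retrieval call I mean (project and dataset names are placeholders):
```
from clearml import Dataset

# Fetch the registered dataset and materialize a local copy;
# this is the call that downloads the files from S3
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()
```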
@<1523701070390366208:profile|CostlyOstrich36> Is this the output_uri, or where do I put this URL?
@<1523701205467926528:profile|AgitatedDove14> when my code gets the ClearML datasets, they are stored in the cache, e.g. $HOME/.clearml/cache/...
I wanted them to be in a mounted PV instead, so other pods (on the same node) that need the same datasets can use them without pulling again.
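Something like this in clearml.conf is what I have in mind (the mount path is a placeholder):
```
sdk {
    storage {
        cache {
            # point the ClearML cache at a PV mounted into every pod
            # on the node, so datasets are pulled only once
            default_base_dir: "/mnt/clearml-cache"
        }
    }
}
```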
Do you want to share your clearml.conf here?
https://clear.ml/docs/latest/docs/integrations/storage/
Try adding the <path to your cert> for s3.credentials.verify.
Thanks, I just realised I didn't add --docker.
A more advanced case would be deciding how long this job should run and terminating it after that, to improve GPU utilisation.
SDK meaning I run the agent using clearml-agent daemon ...
Alternatively, I understand I can also run the agent using docker run allegroai/clearml-agent:latest, but I cannot figure out how to add the --restart, --queue, and --gpus flags to the container.
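Roughly what I am trying, assuming the image's entrypoint forwards extra arguments to clearml-agent daemon (credentials and queue name are placeholders; worth verifying against the image docs):
```
docker run -d \
  --restart=unless-stopped \
  --gpus all \
  -e CLEARML_API_HOST="https://api.clear.ml" \
  -e CLEARML_API_ACCESS_KEY="<access_key>" \
  -e CLEARML_API_SECRET_KEY="<secret_key>" \
  allegroai/clearml-agent:latest --queue default
```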
When I run it as a regular remote task, it works. But when I run it as a step in a pipeline, it cannot access the same folder on my local machine.
Can clearml-serving do a helm install or upgrade? We have cases where the ML models do not come from ML experiments in ClearML, but we would like to tap into the ClearML queue to enable resource queuing.
Ok. Can I confirm that only the main script was stored in the task, but not the packages it depends on?
I guess the more correct way is to upload them to some repo that the remote task can still pull from?
@<1523701205467926528:profile|AgitatedDove14> I am still trying to figure out how to do so, because when I add a task to the queue, the ClearML agent basically creates a pod with the container. How can I make a task that does a helm install or a kubectl create -f deployment.yaml?
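To illustrate the idea, the task script could just shell out to helm (a sketch; it assumes the helm binary and cluster credentials are available inside the task's container, and the release/chart names are placeholders):
```
import subprocess

from clearml import Task

# Register the deployment job as a ClearML task so it can be queued
task = Task.init(project_name="deployments", task_name="deploy-inference-service")

# Shell out to helm from inside the running task
subprocess.run(
    ["helm", "upgrade", "--install", "my-release", "./chart"],
    check=True,
)
```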
I figured out that it may be possible to do this:
```
from clearml import OutputModel, Task

experiment_task = Task.current_task()
OutputModel(task=experiment_task).update_weights('model.pt')
```
to attach it to the ClearML experiment task.
SuccessfulKoala55 I tried commenting out the fileserver; the ClearML dockers started, but it doesn't seem to start up well. When I access ClearML via web browser, the site cannot be reached.
Just to confirm, I commented these out in docker-compose.yaml.
```
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
...
```
ClearML 1.1.1. Yes, I have boto3 installed too.
Thanks AgitatedDove14 and TimelyMouse69. The intention was to have some traceability between the two setups. I think the best way is to enforce some naming convention (for project and name) so we know how they are related? Any better suggestions?
I was browsing the clearml-agent GitHub repo and saw this. Isn't this for spinning up a clearml-agent in Docker to run as a daemon?
Hi CostlyOstrich36, I ran this task locally at first, and that attempt was successful.
When I use this task to run in a pipeline (task run remotely), it cannot find the external package. This seems logical, but I am not sure how to resolve it.
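One possible workaround I am looking at is declaring the dependency explicitly before Task.init, so it shows up in "installed packages" and the agent installs it on the remote run (a sketch; the package name is a placeholder and must be pip-installable):
```
from clearml import Task

# Must be called before Task.init for the requirement to be recorded
Task.add_requirements("my_external_package")

task = Task.init(project_name="examples", task_name="pipeline-step")
```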
Yup. But I happened to reinstall my server, so the data was lost, and the agent continued running.
I'm not very sure, tbh. Just want to see if this is useful...
I got an SSL error a few days back and solved it by adding the cert to /etc/ssl/certs and running update-ca-certificates.
export REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt
Add this. Note that verify might not work at sdk.aws.s3.verify, but under sdk.aws.s3.credentials. Please see the attached image.
Example:
```
aws {
    s3 {
        credentials: [{
            ...
```
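A fuller sketch of what that section could look like with verify under credentials (host, bucket, keys, and cert path are placeholders; worth double-checking against the storage docs):
```
aws {
    s3 {
        credentials: [{
            # non-AWS endpoint, placeholder values
            host: "my-s3-server:9000"
            bucket: "my-bucket"
            key: "<access_key>"
            secret: "<secret_key>"
            # CA cert used to verify the endpoint
            verify: "/etc/ssl/certs/ca-certificates.crt"
        }]
    }
}
```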
And just a suggestion, which maybe I can post as a GitHub issue too.
It is not very clear what the purposes of the project name and the name are, even after reading the --help. Perhaps this is something that could be made clearer when the docs are updated?