Hi ExasperatedCrab78, I managed to get it working. It was due to the IP address set in examples.env.
SDK, meaning I run the agent using clearml-agent daemon ....
Alternatively, I understand I can also run the agent using docker run allegroai/clearml-agent:latest.
But I cannot figure out how to add the --restart, --queue, and --gpus flags to the container.
Ok. Can I check that only the main script was stored in the task but not the dependent packages?
I guess the more correct way is to upload to some repo where the remote task can still pull from it?
ClearML 1.1.1. Yes, I have boto3 installed too.
Thanks, I just realised I didn't add --docker.
May I know which env variable I should set the cert in?
@<1523701205467926528:profile|AgitatedDove14> when my code gets the ClearML datasets, it stores them in the cache, e.g. /$HOME/.clearml/cache....
I wanted the cache to be in a mounted PV instead, so other pods (on the same node) that need the same datasets can use them without pulling again.
It gets rerouted to http://app.clearml.home.ai/dashboard, with the same network error.
When I run it as a regular remote task, it works. But when I run it as a step in a pipeline, it cannot access the same folder on my local machine.
Just to add, when I run the pipeline locally it works as well.
Hi CostlyOstrich36, I ran this task locally at first. This attempt was successful.
When I use this task in a pipeline (the task was run remotely), it cannot find the external package. This seems logical, but I'm not sure how to resolve it.
SuccessfulKoala55 Nope. I didn't even get to enter my name. I suspect there is some mistake in mapping the data folder.
Was using the template in https://github.com/allegroai/clearml-helm-charts to deploy.
SuccessfulKoala55 I tried commenting out fileserver; the ClearML containers started, but they don't seem to start up properly. When I access ClearML via a web browser, the site cannot be reached.
Just to confirm, I commented out these in docker-compose.yaml.
apiserver:
  command:
  - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
  - /opt/clearml/logs:/var/log/clearml
...
@<1523701070390366208:profile|CostlyOstrich36> Yes. I'm running on k8s
Yea. Added an issue. We can follow up from there. Really hope that clearml-serving can work; it's a nice project.
Not exactly sure yet, but I would think a user tag for deployed makes sense, as it should be a deliberate user action. An additional system state is required too, since a deployed state should have some prerequisite system state.
I would also like to ask if ClearML has different states for a task, a model, or even different task types? Right now I don't see differences; is this a deliberate design?
@<1523701070390366208:profile|CostlyOstrich36> Is this output_uri, or where do I put this URL?
Hi TimelyPenguin76, nope. I don't see any errors. That's why I'm not sure what went wrong.
I figured out that it may be possible to do this:
from clearml import OutputModel, Task
experiment_task = Task.current_task()
OutputModel(experiment_task).update_weights('model.pt')
to attach it to the ClearML experiment task.
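A slightly fuller sketch of the same idea, assuming the weights file already exists on disk and a task has been started with Task.init(). The helper name attach_weights and the model name are made up for illustration. I pass auto_delete_file=False because, as far as I know, update_weights otherwise deletes the local file after uploading.

```python
from pathlib import Path


def attach_weights(weights_path: str, model_name: str = "model"):
    """Register a local weights file as an output model of the current task.

    attach_weights and model_name are hypothetical names for this sketch.
    """
    # Validate the path first so a typo fails fast, before touching ClearML.
    if not Path(weights_path).is_file():
        raise FileNotFoundError(weights_path)

    # Imported lazily so the helper can be defined without a configured client.
    from clearml import OutputModel, Task

    task = Task.current_task()
    if task is None:
        raise RuntimeError("No ClearML task is running; call Task.init() first")

    output_model = OutputModel(task=task, name=model_name)
    # Uploads the file to the task's output destination and links it to the task.
    # auto_delete_file=False keeps the local copy (assumption: the default removes it).
    output_model.update_weights(weights_filename=weights_path, auto_delete_file=False)
    return output_model
```

Usage would be something like attach_weights("model.pt") at the end of the training script.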
By the way, will downloading still happen if the dataset is already available in the cache folder? Are there any specific settings to add to Dataset.get_local_copy()?
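For context, this is roughly how I am fetching the dataset, as a minimal sketch. The helper name get_dataset_folder is made up, and the project and dataset names are whatever the dataset was registered under.

```python
def get_dataset_folder(dataset_project: str, dataset_name: str) -> str:
    """Return a local, read-only folder holding the dataset's files.

    get_dataset_folder is a hypothetical helper name for this sketch.
    """
    if not dataset_project or not dataset_name:
        raise ValueError("dataset_project and dataset_name are both required")

    # Imported lazily so the helper can be defined without a configured client.
    from clearml import Dataset

    dataset = Dataset.get(dataset_project=dataset_project, dataset_name=dataset_name)
    # get_local_copy() places the files under the ClearML cache folder
    # and returns the path to that folder.
    return dataset.get_local_copy()
```

My understanding is that a second call for the same dataset version should return the same cached folder instead of downloading again, but I have not confirmed this.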
Cool thanks guys. I am clearer now. Was confused by the obsolete info. Thanks for the clarification.
Hi SuccessfulKoala55, thanks for pointing me to this repo. I was using this repo.
I couldn't find in this repo whether we still need to label the node app=clearml, as was mentioned in the deprecated repo, although in values.yaml the node selector is empty. Would you be able to advise?
How is the clearml data handled now then? Thanks
Yes. But I'm not sure which agent is running. I only know how to stop it if I have the agent ID.
Nice. That should work. Thanks
Thanks AgitatedDove14. Specifically, I wanted to use my own ClearML server and Triton. Thus, I attempted to use --engine-container-args during launch, but got an error saying there is no such flag. I looked into --help, but I guess it is not updated yet.
Do you want to share your clearml.conf here?