
SuccessfulKoala55 I tried commenting out the fileserver; the ClearML Docker containers started, but they don't seem to be running properly. When I access ClearML via web browser, the site cannot be reached.
Just to confirm, I commented these out in docker-compose.yaml:
```
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
  ...
```
@<1523701205467926528:profile|AgitatedDove14> do you mean not using Helm, but filling in the values and installing with the YAML files directly? E.g. kubectl apply ...
May I know which environment variable the cert should be set in?
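In case it helps, a minimal sketch of one common approach, assuming the ClearML SDK makes its HTTPS calls through Python's requests library; the bundle path is a placeholder:
```python
import os

# Point requests (used by the ClearML SDK for HTTPS calls) at a custom CA
# bundle before importing clearml. The path below is a placeholder.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/my-ca-bundle.pem"

from clearml import Task

task = Task.init(project_name="examples", task_name="ssl-check")
```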
Hi @<1523701070390366208:profile|CostlyOstrich36> , basically:
- I uploaded a dataset using ClearML Datasets. The output_uri is pointed to my S3, so the dataset is stored in S3. My S3 is set up with HTTP only.
- When I retrieve the dataset for training using Dataset.get() , I encounter an SSL cert error because the URL used to retrieve the data is https://<s3url>/... instead of s3://<s3url>/... , which is HTTP. This is weird, as the dataset URL is without HTTPS (see the sketch below).
- I am not too sure why and I susp...
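For reference, a minimal sketch of the flow described above; project/dataset names, bucket, and endpoint are placeholders, not the actual values:
```python
from clearml import Dataset

# Create the dataset and upload it to an HTTP-only S3 endpoint.
# Names, bucket, and endpoint below are placeholders.
ds = Dataset.create(dataset_project="my-project", dataset_name="my-dataset")
ds.add_files("data/")
ds.upload(output_url="s3://my-s3-host:9000/my-bucket/datasets")
ds.finalize()

# Later, in the training code, fetch a local copy of the dataset.
local_path = Dataset.get(
    dataset_project="my-project", dataset_name="my-dataset"
).get_local_copy()
```
If I recall correctly, for an HTTP-only endpoint the per-host credentials entry in clearml.conf (sdk.aws.s3.credentials) also has a secure: false setting, which may be relevant here.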
It returns False. Just to share a bit more: I have the requirements.txt in GitLab together with my code, organized in folders. Do I need to provide a GitLab path?
I see. I was wondering if there is any advantage to doing it either way.
Yup. But I happened to reinstall my server, so the data is lost. And the agent continued running.
OK. Can I check whether only the main script was stored in the task, and not the dependent packages?
I guess the more correct way is to upload it to some repo that the remote task can still pull from?
And just a suggestion, which maybe I can post as a GitHub issue too: it is not very clear what the purposes of the project name and the name are, even after reading the --help. Perhaps this is something that can be made clearer when updating the docs?
Seems like it was broken with numpy 1.24.1. Tried with numpy 1.23.5 and it works.
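For future readers, a minimal sketch of pinning the working version via the SDK (project/task names are placeholders); note that Task.add_requirements must be called before Task.init:
```python
from clearml import Task

# Pin the numpy version that works, so a remote run installs 1.23.5
# instead of the broken 1.24.1. Must run before Task.init().
Task.add_requirements("numpy", "1.23.5")
task = Task.init(project_name="examples", task_name="numpy-pin")
```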
Do you have an example of how I can define the packages to be installed for every step of the pipeline?
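In case an example helps, a minimal sketch using the decorator-based pipeline API, where each component declares its own packages; names and version pins below are placeholders:
```python
from clearml import PipelineDecorator

# Each component lists the packages the agent should install for that step.
@PipelineDecorator.component(packages=["pandas==2.0.3", "scikit-learn"])
def preprocess(raw_path: str) -> str:
    import pandas as pd  # imported inside the step so it runs standalone
    df = pd.read_csv(raw_path)
    df.to_csv("clean.csv", index=False)
    return "clean.csv"

@PipelineDecorator.pipeline(name="demo-pipeline", project="examples", version="0.1")
def run(raw_path: str = "data.csv"):
    preprocess(raw_path)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally; drop this to enqueue remotely
    run()
```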
@<1523701070390366208:profile|CostlyOstrich36> Yes. I'm running on k8s
Hi ExasperatedCrab78 I managed to get it. It was due to the IP address set in examples.env.
Hi CostlyOstrich36 I ran this task locally at first, and that attempt was successful.
When I use this task in a pipeline (the task is run remotely), it cannot find the external package. This seems logical, but I'm not sure how to resolve it.
By the way, how can I start up the ClearML agent using the clearml-agent image instead of the SDK? Do you have an example of the docker run command that includes the queue, GPUs, etc.?
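For anyone searching later, a hedged sketch of what such a command could look like, assuming the allegroai/clearml-agent image and the standard agent environment variables; hosts, keys, and queue name are placeholders, and the exact entrypoint/arguments may differ between image versions:
```bash
docker run -d --gpus all \
  -e CLEARML_API_HOST=http://<api-server>:8008 \
  -e CLEARML_WEB_HOST=http://<web-server>:8080 \
  -e CLEARML_FILES_HOST=http://<files-server>:8081 \
  -e CLEARML_API_ACCESS_KEY=<access-key> \
  -e CLEARML_API_SECRET_KEY=<secret-key> \
  allegroai/clearml-agent \
  clearml-agent daemon --queue default --docker
```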
This is what I got, and that's when I see the HTTP 400 error in the console.
Nice. That should work. Thanks
Hi Bart, yes. Running with inference container.
@<1526734383564722176:profile|BoredBat47> Just to check: do you need to run update-ca-certificates or its equivalent?
Do you want to share your clearml.conf here?
Ah, I think I was not very clear on my requirement. I was looking at porting at the project level, not moving the entire ClearML data over. Is that possible instead?
Thanks AgitatedDove14. Specifically, I wanted to use my own ClearML server and Triton, so I attempted to use --engine-container-args during launch, but got an error saying there is no such flag. I looked into --help, but I guess it is not updated yet.
SuccessfulKoala55 Nope. I didn't even get to enter my name. I suspect there is some mistake in mapping the data folder.
Was using the template in https://github.com/allegroai/clearml-helm-charts to deploy.
Can clearml-serving do a helm install or upgrade? We have cases where the ML models do not come from ML experiments in ClearML, but we would like to tap into the ClearML queue to enable resource queuing.
Nice. It is actually dataset.id.
A more advanced case would be to decide how long this job should run and terminate it after that. This is to improve GPU usage.