It returns false. Just to share a bit more, I have the requirements.txt in GitLab together with my code, inside folders. Do I need to provide a GitLab path?
Ah, I think I was not very clear on my requirement. I was looking at porting at the project level, not moving the entire ClearML data over. Is that possible instead?
Hello CostlyOstrich36, I am facing an issue now. Basically, I installed all the necessary Python packages in my Docker image, but somehow the clearml-agent does not seem to detect these global packages. I don't see them in the "installed packages". Any advice?
For example, I built my Docker image from an image on Docker Hub. In this image, I installed the torch and cupy packages. But when I run my experiment in this image, the packages are not found.
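For context, this is roughly how I attach the experiment to that image (the image name is a placeholder, and I am assuming set_base_docker is the right call for this):

```python
from clearml import Task

task = Task.init(project_name="my_project", task_name="torch-cupy experiment")
# Placeholder image name; this is the image in which torch and cupy were
# pre-installed. Assuming set_base_docker is how the image gets attached.
task.set_base_docker("mydockerhubuser/torch-cupy:latest")
```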
Yes, I ran the experiment inside.
Do you have an example of how I can define the packages to be installed for every step of the pipeline?
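Something along these lines is what I am after; a rough sketch only, and the packages argument plus the step and package names are just my assumption of how it might look:

```python
from clearml import PipelineDecorator

# Sketch: per-step package lists (names below are placeholders).
@PipelineDecorator.component(return_values=["data"], packages=["pandas", "numpy"])
def prepare_data():
    import pandas as pd
    return pd.DataFrame({"x": [1, 2, 3]})

@PipelineDecorator.component(return_values=["result"], packages=["torch", "cupy"])
def train(data):
    # training code would go here
    return "done"

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def run_pipeline():
    data = prepare_data()
    return train(data)
```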
SuccessfulKoala55 I tried commenting out the fileserver; the ClearML Docker containers started, but the service does not seem to come up properly. When I access ClearML via the web browser, the site cannot be reached.
Just to confirm, I commented these out in docker-compose.yaml.
OK, let me try adding it to the volume mount.
Hi Bart, yes. Running with inference container.
Not exactly sure yet, but I would think a user tag for "deployed" makes sense, as it should be a deliberate user action. An additional system state is required too, since a deployed state should have some prerequisite system state.
I would also like to ask whether ClearML has different states for a task, a model, or even different task types? Right now I don't see any differences; is this a deliberate design?
I guess we need to understand the purpose of the various states. So far I only see "archive, draft, publish". Did I miss any?
Thanks. The examples use upload_artifact, which stores the files in the output_uri. What if I do not want to save the files but simply pass them to the next step; is there a way to do so?
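For reference, this is roughly the pattern from the examples I am referring to (the artifact name and payload are placeholders; just a sketch):

```python
from clearml import Task

# The step grabs its own task and uploads the intermediate result as an
# artifact, which ends up stored under the configured output_uri.
task = Task.current_task()
intermediate = {"rows": 1000, "path": "/tmp/processed.csv"}  # placeholder payload
task.upload_artifact(name="processed_data", artifact_object=intermediate)
```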
Hi @<1523701070390366208:profile|CostlyOstrich36> , basically:
- I uploaded a dataset using ClearML Datasets. The output_uri points to my S3, so the dataset is stored in S3. My S3 is set up with HTTP only.
- When I retrieve the dataset for training using Dataset.get(), I encountered an SSL cert error, as the URL used to retrieve the data was s3://<s3url>/... which is HTTP. This is weird, as the dataset URL is without HTTPS.
- I am not too sure why and I susp...
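Roughly what I am doing, for reference; the dataset names, bucket URL, and local paths below are placeholders, so treat it as a sketch of the flow rather than my exact code:

```python
from clearml import Dataset

# Upload side: output URL points at my HTTP-only S3 endpoint (placeholder).
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="local_data/")
ds.upload(output_url="s3://my-s3-endpoint/bucket")
ds.finalize()

# Training side: this is where the SSL cert error shows up.
train_ds = Dataset.get(dataset_name="my_dataset", dataset_project="my_project")
local_path = train_ds.get_local_copy()
```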
May I know which environment variable I should set the cert in?
Try adding the <path to your cert> for s3.credentials.verify.
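Something along these lines in clearml.conf; the endpoint, keys, and cert path are placeholders, so take it as a rough sketch of where the setting sits rather than a confirmed config:

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # placeholders; adjust to your own endpoint and keys
                    host: "my-s3-endpoint:9000"
                    key: "ACCESS_KEY"
                    secret: "SECRET_KEY"
                    secure: false
                    verify: "/path/to/your/cert.pem"
                }
            ]
        }
    }
}
```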
Do you want to share your clearml.conf here?
I'm not very sure, to be honest. Just want to see if this is useful...
I got an SSL error a few days back and I solved it by adding the cert to /etc/ssl/certs and performing
Add this. Note that verify might not work with sdk.aws.s3.credentials. Please see the attached image.
When I run it as a regular remote task, it works. But when I run it as a step in a pipeline, it cannot access the same folder on my local machine.
Just to add, when I run the pipeline locally it works as well.
Yes. The training works well with CUDA.
I have yet to figure out how to do so; I would appreciate it if you could give some guidance.
@<1523701205467926528:profile|AgitatedDove14> when my code gets the ClearML datasets, it stores them in the cache, e.g. $HOME/.clearml/cache....
I wanted it to be in a mounted PV instead, so other pods (on the same node) that need the same datasets can use them without pulling again.
By the way, will downloading still happen if the dataset is already available in the cache folder? Any specific settings to add to Dataset.get_local_copy()?
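For reference, this is roughly how I pull the dataset today; the cache-related bits are my assumption of what could be tweaked, not something I have confirmed:

```python
import os

# Assumption: point the ClearML cache at the mounted PV before clearml loads
# its config (I believe CLEARML_CACHE_DIR, or sdk.storage.cache.default_base_dir
# in clearml.conf, controls this; not confirmed).
os.environ.setdefault("CLEARML_CACHE_DIR", "/mnt/shared-pv/clearml-cache")

from clearml import Dataset

ds = Dataset.get(dataset_name="my_dataset", dataset_project="my_project")
# As far as I can tell, get_local_copy() reuses the cached copy when this
# dataset version is already present and only downloads when it is missing.
local_path = ds.get_local_copy()
```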
Hi SuccessfulKoala55, thanks for pointing me to this repo. I was using this repo.
I didn't manage to find in this repo whether we still need to label the node app=clearml, as was mentioned in the deprecated repo, although from the values.yaml the node selector is empty. Would you be able to advise?
How is the ClearML data handled now, then? Thanks
Cool thanks guys. I am clearer now. Was confused by the obsolete info. Thanks for the clarification.
Hi CostlyOstrich36, I ran this task locally at first. This attempt was successful.
When I use this task as a step in a pipeline (the task runs remotely), it cannot find the external package. This seems logical, but I am not sure how to resolve it.
@<1526734383564722176:profile|BoredBat47> Just to check: do you need to run update-ca-certificates or an equivalent?
OK. Can I check whether only the main script is stored in the task, but not the dependent packages?
I guess the more correct way is to upload them to some repo that the remote task can still pull from?