I'm doing all of this because there isn't (or I'm not aware of) any good way to understand what datasets are on workers.
It looks like I'm moving forward.
Setting the URL in clearml.conf without "s3" as suggested works (but I don't add a port there; not sure if that breaks something, we don't have a port):
host: " our-host.com "
Then in test_task.py:
task: clearml.Task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri="None",
)
I think the connection is created.
What I'm getting now is a bucket error; I suppose I have to specify it, so...
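Something like this per-bucket block is what I mean by specifying it, a sketch with a placeholder bucket name:

    sdk {
        aws {
            s3 {
                credentials: [
                    {
                        bucket: "my-bucket"     # placeholder
                        host: "our-host.com"
                        key: "ACCESS_KEY"       # placeholder
                        secret: "SECRET_KEY"    # placeholder
                        secure: true
                        # we have no region, so I leave it out entirely
                    }
                ]
            }
        }
    }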
We don't need a port.
"s3" is part of the URL that is configured on our routers; without it we cannot connect.
Do I need clearml.conf on my ClearML server (in the config folder which is mounted in docker-compose), on the user's PC, or both?
It's self-hosted S3, that's all I know; I don't think it's MinIO.
We use a Ceph Storage Cluster; its interface is the same as S3.
I don't get what I have misconfigured.
The only thing I have not added is the "region" field in clearml.conf, because we literally don't have one; it's a self-hosted cluster.
You can try to replicate the S3 config I posted earlier.
Bump, still waiting; we're closing in on a month of being unable to deploy. We have a team of 10+ people.
No, I specify where to upload.
I see the data is being uploaded to the S3 bucket; it's just that the log messages are really confusing.
I tried it with the port, but I'm still having the same issue.
Tried it with/without secure and multipart.
But it seems like the data is gone; I'm not sure how to get it back.
Here is what the client-side clearml.conf looks like, together with the script I'm using to create the tasks. Uploads seem to work and are fixed, thanks to you guys 🙌
Yes, but does add_external_files make chunked zips as add_files does?
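To make the question concrete, this is the kind of flow I mean, a sketch with placeholder project, path, and bucket names:

    import clearml

    ds = clearml.Dataset.create(dataset_project="project", dataset_name="dataset")
    # add_files copies local files into the dataset and uploads them
    # to the storage target as chunked zip archives
    ds.add_files(path="local_data/")
    # add_external_files only registers the links; as far as I can tell,
    # nothing is re-uploaded or zipped
    ds.add_external_files(source_url="s3://our-host.com/my-bucket/data/")
    ds.upload()
    ds.finalize()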
Is it supposed to be like that? How do I fix it?
I hope that's all the experiments.
Maybe someone on your end can try to parse such a config and see if they have the same problem too.
@<1523701435869433856:profile|SmugDolphin23> Any ideas how to fix this?
@<1523701070390366208:profile|CostlyOstrich36> Hello, I'm still unable to understand how to fix this.
OK, is the dataset path stored in Mongo?
I'm unable to find it in Elasticsearch (the debug images were there).
OK, I found it.
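For anyone else looking, this is roughly how I poked at it, a sketch assuming the default docker-compose Mongo; the "backend"/"task" names and the artifact field path are my guesses, so check your own server:

    from pymongo import MongoClient

    # default ClearML server Mongo, assuming the standard docker-compose ports
    client = MongoClient("mongodb://localhost:27017")
    tasks = client["backend"]["task"]  # assumption: task documents live here

    # placeholder id of the dataset task; the dataset's file references
    # should show up among its artifacts
    doc = tasks.find_one({"_id": "DATASET_TASK_ID"})
    if doc is not None:
        print(doc.get("execution", {}).get("artifacts"))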
Are S3 links supported?
We had a similar problem. ClearML doesn't support data migration (not that I know of).
So you have two ways to fix this:
- Recreate the dataset once it's already in Azure
- Edit each Elasticsearch entry to point to the new destination (we did this; there's a sketch of the query below)
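This is roughly what we ran, a sketch in elasticsearch-py 7.x style; the index pattern and both URLs are placeholders, and I'd test it on a copy of the index first:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")
    # assumption: debug image events live in indices matching this pattern
    es.update_by_query(
        index="events-training_debug_image-*",
        body={
            "script": {
                "source": "ctx._source.url = ctx._source.url.replace(params.old, params.new)",
                "params": {
                    "old": "https://old-files-server.com",  # placeholder
                    "new": "s3://our-host.com/my-bucket",   # placeholder
                },
            },
            # assumption: the url field is indexed so that a prefix query works
            "query": {"prefix": {"url": "https://old-files-server.com"}},
        },
    )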
OK, slight update. It seems like artifacts are now uploading to the bucket. Maybe my file explorer used an old cache or something.
However, reported images are uploaded to the fileserver instead of S3.
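In case it helps, this is what I'm going to try next, a sketch assuming my bucket URI is right; set_default_upload_destination should redirect the reported images:

    import clearml

    task = clearml.Task.init(project_name="project", task_name="task")
    # point reported (debug) images at S3 instead of the default fileserver
    task.get_logger().set_default_upload_destination("s3://our-host.com/my-bucket")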
Here is the script I'm using to test things. Thanks.
Good morning, I tried the script you provided and I'm getting somewhere.
I solved the problem.
I had to add a TensorBoard logger and pass it to the pytorch_lightning trainer as logger=logger.
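Roughly like this, a sketch (exact import paths depend on your lightning version, and the model/datamodule are from my training script):

    import pytorch_lightning as pl
    from pytorch_lightning.loggers import TensorBoardLogger

    # ClearML picks up scalars reported through TensorBoard automatically
    logger = TensorBoardLogger(save_dir="lightning_logs")
    trainer = pl.Trainer(logger=logger, max_epochs=1)
    # trainer.fit(model, datamodule=dm)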
Is that normal?
How can I do that?
I need to keep the original hash; otherwise I lose all traceability for about 2k experiments.