I'm doing all of this because there isn't (or I'm not aware of) any good way to understand what datasets are on workers.
How can I do that?
I need to save the original hash, otherwise I lose all traceability for about 2k experiments.
@<1523701070390366208:profile|CostlyOstrich36> Hello John, we are still unable to use ClearML with our self-hosted S3 CEPH instances. Is there any update on the hotfix for 1.14?
We use a Ceph Storage Cluster; the interface to it is the same as S3.
I don't get what I have misconfigured.
The only thing I have not added is the "region" field in clearml.conf, because we literally don't have one; it's a self-hosted cluster.
You can try to replicate the S3 config I posted earlier.
In which UI? Because there are two ways to do it. When clicking on the artifact URL there is a popup (but it has no way to change the host URL).
Our S3 host doesn't have a port (I didn't specify a port anywhere in clearml.conf and upload works).
![image](https://clearml-web-assets.s3.amazonaws.com/scoold/images/TT9A...
There is a typo in the clearml.conf I sent you, on line 87: it should be "key", not "ey". I'm aware of it.
py file:
import clearml

task: clearml.Task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri=" None ",
)
clearml.conf:
{
    # This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
    host: " our-host.com "
    key: "xxx"
    secret: "xxx"
    multipart: false
    ...
- Here is what the client-side clearml.conf looks like, together with the script I'm using to create the tasks. Uploads seem to work and are fixed thanks to you guys 🙌
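For reference, a hedged sketch of the same Task.init call with output_uri pointing at a bucket on the self-hosted endpoint (the host and bucket here are placeholders, not the redacted value above):

import clearml

# Placeholder host/bucket for illustration only; the real URL is redacted above.
# With this set, artifacts and models are uploaded to the CEPH S3 bucket instead of the fileserver.
task: clearml.Task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri="s3://our-host.com/my-bucket",
)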
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
But be careful, upgrading is extremely dangerous
It has 8 cores, so nothing fancy.
@<1523701601770934272:profile|GiganticMole91> That's rookie numbers. We are at 228 GB for Elasticsearch now.
7 out of 30 GB is currently used and is quite stable
@<1709740168430227456:profile|HomelyBluewhale47> We have the same problem: millions of files, stored on CEPH. I would not recommend doing it this way. Everything gets insanely slow (dataset.list_files, downloading the dataset, removing files).
The way I use ClearML Datasets for a large number of samples now is to save a JSON which stores all paths to samples in the Dataset metadata:
clearml_dataset.set_metadata(metadata, metadata_name=metadata_key)
However, this then means that you need wrappe...
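Roughly, the flow looks like the sketch below (the dataset/bucket names and the sample_paths list are made up; the samples themselves stay on S3/CEPH and only the path list goes into the metadata, and I'm assuming get_metadata is the counterpart call for reading it back):

import pandas as pd
from clearml import Dataset

# Hypothetical list of sample paths collected from storage beforehand
sample_paths = ["s3://my-bucket/data/sample_0001.png", "s3://my-bucket/data/sample_0002.png"]

# Store only the path list as dataset metadata, not the files themselves
clearml_dataset = Dataset.create(dataset_name="my-dataset", dataset_project="project")
metadata = pd.DataFrame({"path": sample_paths})
clearml_dataset.set_metadata(metadata, metadata_name="sample_paths")
clearml_dataset.upload()
clearml_dataset.finalize()

# Consumers read the path list from the metadata instead of listing millions of files
ds = Dataset.get(dataset_name="my-dataset", dataset_project="project")
paths = ds.get_metadata(metadata_name="sample_paths")["path"].tolist()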
You can check out the boto3 Python client (this is what we use to download/upload all S3 stuff), but the MinIO client probably already uses it under the hood.
We also use the AWS CLI to do some downloading; it is way faster than Python.
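As a rough illustration of the boto3 route (endpoint, keys and object names below are placeholders for a self-hosted S3/CEPH setup):

import boto3

# Placeholder endpoint and credentials for a self-hosted S3 (CEPH) cluster
s3 = boto3.client(
    "s3",
    endpoint_url="https://our-host.com",
    aws_access_key_id="xxx",
    aws_secret_access_key="xxx",
)

# Download a single object to a local file
s3.download_file("my-bucket", "datasets/sample_0001.png", "/tmp/sample_0001.png")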
Regarding PDFs, yes, you have no choice but to preprocess them.
Hi, thanks for reaching out. Getting desperate here.
Yes, it's self-hosted.
No, only currently running experiments are deleted (the task itself is gone, but debug images and models are still present in the fileserver folder).
What I do see is some random Elasticsearch errors popping up from time to time:
[2024-01-05 09:16:47,707] [9] [WARNING] [elasticsearch] POST
None ` [status:N/A requ...
- Is 50 GB of Elasticsearch data normal? Have you seen it elsewhere, or are we doing something wrong? One thing I suspect is that we are probably logging too frequently.
- Is it possible to somehow clean this up?
I have also noticed that this incident usually happens in the morning, at around 6-7 AM.
Are there maybe some cleanup tasks or backups running on the ClearML server at those times?
Is that supposed to be so? How can I fix it?
Maybe someone on your end can try to parse such a config and see if they also get the same problem.
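If it helps, this is roughly how I'd sanity-check that the file parses at all; clearml.conf is HOCON, and I believe pyhocon is what the SDK uses under the hood (the file path and section name here are assumptions):

from pyhocon import ConfigFactory

# Parse the client-side clearml.conf and dump the S3 section,
# just to confirm the HOCON itself is well-formed.
conf = ConfigFactory.parse_file("clearml.conf")
print(conf.get("sdk.aws.s3", None))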
will it be appended in clearml?
"s3" is part of domain to the host
Our datasets are more than 1 TB in size and will keep growing (probably to 4 TB and up). This means we also need 4 TB of local storage just to upload the dataset back in zipped format, which is not a good solution.
What we could do, I guess, is do the downloading locally in chunks of files?
Download 100 files locally, add them to the ClearML dataset, repeat.
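Something like the sketch below is what I have in mind (the endpoint, bucket, keys, and dataset names are made up; it also assumes multiple upload() calls per dataset version are fine, which is my understanding):

import os
import shutil
import boto3
from clearml import Dataset

CHUNK_SIZE = 100
s3 = boto3.client("s3", endpoint_url="https://our-host.com")  # placeholder endpoint
keys = ["datasets/sample_0001.png", "datasets/sample_0002.png"]  # full key list gathered elsewhere

dataset = Dataset.create(dataset_name="my-dataset", dataset_project="project")
for start in range(0, len(keys), CHUNK_SIZE):
    chunk_dir = "/tmp/chunk"
    os.makedirs(chunk_dir, exist_ok=True)
    # Download one chunk of objects locally
    for key in keys[start:start + CHUNK_SIZE]:
        s3.download_file("my-bucket", key, os.path.join(chunk_dir, os.path.basename(key)))
    dataset.add_files(chunk_dir)
    dataset.upload()          # push this chunk's compressed archive
    shutil.rmtree(chunk_dir)  # free local disk before the next chunk
dataset.finalize()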
I was on version 1.7 and now I'm on the latest, 1.11.
Can't get a screenshot yet (copying data), will add it later.
What worries me is that the config and agent folders are empty. I can reconfigure all agents, no problem.
But where is the info about projects stored?
This is what I see on a fresh ClearML,
where all my mounts are on /mnt/data/clearml-server instead of /opt/clearml.
I get the same when I copy the /opt/clearml/data folder into /mnt/data/clearml/data.
I hope that it's all the experiments.