We fixed the issue, thanks, we had to update everything to the latest version.
- Here is how the client-side clearml.conf looks, together with the script I'm using to create the tasks. Uploads seem to work and are fixed thanks to you guys 🙌
Is it even known whether the bug is fixed in that version?
@<1709740168430227456:profile|HomelyBluewhale47> We have the same problem: millions of files, stored on Ceph. I would not recommend doing it this way. Everything gets insanely slow (dataset.list_files, downloading the dataset, removing files).
The way I use ClearML Datasets for a large number of samples now is to save a JSON that stores all the sample paths in the dataset metadata:
clearml_dataset.set_metadata(metadata, metadata_name=metadata_key)
However, this then means that you need wrappe...
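The approach above can be sketched as follows: build a JSON manifest of all sample paths, then attach it as dataset metadata via `set_metadata` (shown commented out, since it needs a live ClearML server; the directory name and metadata key are assumptions for illustration):

```python
import json
from pathlib import Path

def build_sample_manifest(samples_dir: str) -> str:
    """Collect every file path under samples_dir into a JSON manifest string."""
    paths = sorted(str(p) for p in Path(samples_dir).rglob("*") if p.is_file())
    return json.dumps({"samples": paths})

# Hypothetical usage against a live ClearML server:
# metadata = build_sample_manifest("data/samples")
# clearml_dataset.set_metadata(metadata, metadata_name="sample_paths")
```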
You can check out the boto3 Python client (this is what we use to download/upload all S3 stuff), but the MinIO client probably already uses it under the hood.
We also use the AWS CLI for some downloads; it is way faster than Python.
Regarding PDFs: yes, you have no choice but to preprocess them.
No, I specify where to upload.
I see the data in the S3 bucket is being uploaded; just the log messages are really confusing.
What do you mean by reusing the task for a ClearML Dataset? Do you have a code example?
We have multiple different projects with multiple people working on each project.
This is our most-used code for dataset uploading:
I have also noticed that this incident usually happens in the morning, around 6-7 AM.
Are there maybe some cleanup tasks or backups running on the ClearML server at those times?
@<1523701087100473344:profile|SuccessfulKoala55> Anything on this?
Our datasets are more than 1 TB in size and will keep growing (probably to 4 TB and up), which means we would also need 4 TB of local storage just to upload the dataset back in zipped format. This is not a good solution.
What we could do, I guess, is download the files locally in chunks?
Download 100 files locally, add them to the ClearML dataset, repeat.
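A minimal sketch of that chunked loop, assuming a plain list of remote file keys (the download/add/upload calls in the usage comment are placeholders, not real API guarantees):

```python
from typing import Iterable, Iterator, List

def chunked(items: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive fixed-size batches from an iterable of file keys."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

# Hypothetical usage with a ClearML dataset (placeholder helper names):
# for batch in chunked(all_remote_keys, size=100):
#     local_paths = download_locally(batch)   # e.g. via boto3 or the AWS CLI
#     clearml_dataset.add_files(local_paths)
#     clearml_dataset.upload()                # upload this batch, then free local disk
```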
When I look at the LinkEntry object, the link property is correct, with no duplicates. It's the relative_path that is duplicated, and also the key name in _dataset_link_entries.
Yes, but does add_external_files create chunked zips like add_files does?
@<1523701601770934272:profile|GiganticMole91> Those are rookie numbers. We are at 228 GB for Elasticsearch now.
Is the fileserver folder needed for a successful backup?
Is there any way to see if I even have the data in MongoDB?
I purged all Docker images and it still doesn't seem right.
I see no side panel and it doesn't ask for a login name.
I solved the problem.
I had to add a TensorBoard logger and pass it to the PyTorch Lightning trainer with logger=logger.
Is that normal?
I guess I messed something up when moving the files.
I hope that's all the experiments.
The incident happened last Friday (5 January).
I'm giving you logs from around that time.
I get the sidebar and login prompt on my local PC,
but the data isn't loaded.
I tried not editing anything in docker-compose and just pasting my data in there. It didn't help.
I'm also batch uploading, maybe that's the problem?
- The dataset is about 1 TB, containing 1 million files
- I don't have the SSD space locally to do the upload in one go
- So I download a part of the dataset, use add_files() on that batch, and then upload()
- Then upload the dataset
I noticed that each batch gets slower and slower.
How do I get rid of this auto-appended line?
@<1523701435869433856:profile|SmugDolphin23> Setting it without http:// is not possible, as it auto-fills it back in.
Here is also another bit of magic: