No, I specify where to upload
I see the data on the S3 bucket is being uploaded. It's just the log messages that are really confusing.
What do you mean by reusing the task for the ClearML Dataset? Got a code example?
We have multiple projects, with multiple people working on each one.
This is the code we use most often for dataset uploading:
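(The actual snippet isn't included in this excerpt; below is a minimal sketch of what such an upload usually looks like with the ClearML SDK, with all project, dataset and bucket names as placeholders.)
```
from clearml import Dataset

# Sketch only: project, dataset and bucket names below are placeholders,
# not the actual values used in our code.
dataset = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="my_project",
    output_uri="s3://my-bucket/datasets",  # where the dataset archives are uploaded
)
dataset.add_files(path="/data/my_dataset")  # register the local files
dataset.upload()                            # compress and push to output_uri
dataset.finalize()                          # lock this dataset version
```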
I have also noticed that this incident usually happens in the morning, at around 6-7 AM.
Are there maybe some cleanup tasks or backups running on the ClearML server at those times?
@<1523701087100473344:profile|SuccessfulKoala55> Anything on this?
Our datasets are more than 1 TB in size and will keep growing (probably to 4 TB and up), which means we would also need 4 TB of local storage just to re-upload the dataset in zipped format. This is not a good solution.
What we could do, I guess, is do the downloading locally in chunks of files?
Download 100 files locally, add them to the ClearML dataset, repeat (roughly as in the sketch below).
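A minimal sketch of that chunked loop, assuming a hypothetical download_batch() helper that fetches the next chunk of files into a temporary local folder:
```
import shutil
from clearml import Dataset

dataset = Dataset.create(
    dataset_name="big_dataset",    # placeholder names
    dataset_project="my_project",
)

while True:
    # download_batch() is a hypothetical helper: it fetches the next ~100 files
    # into a temporary local folder and returns the folder path, or None when done.
    local_batch = download_batch(batch_size=100)
    if local_batch is None:
        break
    dataset.add_files(path=local_batch)
    dataset.upload()               # push what was added so far
    shutil.rmtree(local_batch)     # free local disk before the next chunk

dataset.finalize()
```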
When I look at the LinkEntry object, the link property is correct, no duplicates. It's the relative_path that's duplicated, and also the key name in _dataset_link_entries.
Yes, but does add_external_files make chunked zips like add_files does?
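For context, a hedged sketch of calling add_external_files; as far as I can tell it only registers links to the remote objects rather than zipping and re-uploading them, but that behaviour is an assumption worth verifying:
```
from clearml import Dataset

ds = Dataset.create(
    dataset_name="external_links_demo",   # placeholder names
    dataset_project="my_project",
)
# Register the remote objects by link only; nothing is copied locally here.
ds.add_external_files(source_url="s3://my-bucket/raw-data/")
ds.upload()      # with only external links there should be no zip chunks to push
ds.finalize()
```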
@<1523701601770934272:profile|GiganticMole91> That's rookie numbers. We are at 228 GB for Elastic now.
Is the fileserver folder needed for a successful backup?
Is there any way to see if I even have the data in MongoDB?
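One rough way to check, sketched below; this assumes the MongoDB port of the clearml-mongo container is reachable (by default the ClearML server compose file does not expose it), and the database/collection names are assumptions that may differ per server version:
```
from pymongo import MongoClient

# Assumes port 27017 of the clearml-mongo container is forwarded to localhost.
# "backend" / "task" are assumed database and collection names.
client = MongoClient("mongodb://localhost:27017")
print("databases:", client.list_database_names())
print("task documents:", client["backend"]["task"].count_documents({}))
```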
I purged all the Docker images and it still doesn't seem right.
I see no side panel and it doesn't ask for a login name.
I guess I fucked up something when moving files
I hope that it's all the experiments.
The incident happened last Friday (5 January).
I'm giving you logs from around that time.
I get the sidebars and the login prompt on my local PC.
But the data isn't loaded.
I tried not editing anything in docker-compose and just pasting my data in there. Didn't help.
I'm also batch uploading, maybe that's the problem?
- The dataset is about 1 TB, containing 1 million files
- I don't have the SSD space locally to do the upload
- So I download a part of the dataset, use add_files() and then upload() for that batch
- Upload the dataset
I noticed that each batch gets slower and slower.
How do I get rid of this auto-appended line?
@<1523701435869433856:profile|SmugDolphin23> Setting it without http is not possible, as it auto-fills them back in.
Here is some more magic stuff:
I have tried:
- Airflow - A pain to set up, old UI, and other problems.
- Prefect - I literally just tried to set up a simple distributed system; it took me a week. I do not recommend this tool at all: horrible documentation, and no one helps on Slack.
- Dagster - An absolute beauty: nice UI, easy to set up (as a pip package or just Docker + Postgres); I highly recommend this tool. It takes a bit to get used to. In the coming week I will try the combo of Dagster + ClearML, where I periodically check some things and if...
@<1523701482157772800:profile|AnxiousSeal95> I see a lot of people here migrating data from one data source to another.
For us it was that we experimented with ClearML to get a feel for it, and we used ClearML's built-in file storage to save debug images and all other artifacts.
Then we grew rapidly and had to migrate to S3 storage.
I had to write a script that goes through Elasticsearch and MongoDB to point to the new S3 links where the data was migrated to (roughly like the sketch after this message).
I do however understand that migration...
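A rough sketch of that kind of re-pointing script; the connection strings, database/collection/index names and field paths are all assumptions that vary between ClearML server versions, so treat it purely as illustration:
```
from pymongo import MongoClient
from elasticsearch import Elasticsearch

OLD = "http://old-fileserver:8081"   # assumed old fileserver prefix
NEW = "s3://my-bucket/clearml"       # assumed new S3 prefix

# MongoDB: rewrite artifact URIs stored on tasks (db/collection/field path assumed).
mongo = MongoClient("mongodb://localhost:27017")
tasks = mongo["backend"]["task"]
for doc in tasks.find({"execution.artifacts.uri": {"$regex": "^" + OLD}}):
    artifacts = doc["execution"]["artifacts"]
    for art in artifacts:
        if art.get("uri", "").startswith(OLD):
            art["uri"] = art["uri"].replace(OLD, NEW, 1)
    tasks.update_one({"_id": doc["_id"]}, {"$set": {"execution.artifacts": artifacts}})

# Elasticsearch: rewrite debug-image URLs in the events indices
# (index pattern and field name assumed; elasticsearch-py 8.x style call).
es = Elasticsearch("http://localhost:9200")
es.update_by_query(
    index="events-training_debug_image-*",
    script={
        "source": "ctx._source.url = ctx._source.url.replace(params.old, params.new)",
        "params": {"old": OLD, "new": NEW},
    },
    query={"prefix": {"url": OLD}},
)
```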
Is that supposed to be like that? How do I fix it?
It has 8 cores, so nothing fancy, really.
Where can I override this so that it uses uv instead of trying to install Python with apt?
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
But be careful, upgrading is extremely dangerous