WebApp: 1.14.1-451 • Server: 1.14.1-451 • API: 2.28
I solved the problem.
I had to add a TensorBoard logger and pass it to the pytorch_lightning Trainer as logger=logger
Is that normal?
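For anyone hitting the same thing, this is roughly what the fix looked like (a minimal sketch; the save dir and experiment name are placeholders):
```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger

# ClearML auto-captures TensorBoard output, so routing Lightning metrics
# through a TensorBoardLogger is what makes them show up in the experiment.
logger = TensorBoardLogger(save_dir="tb_logs", name="my_experiment")  # placeholders

trainer = pl.Trainer(max_epochs=10, logger=logger)
# trainer.fit(model, datamodule=dm)  # model / datamodule as in your own code
```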
@<1523701070390366208:profile|CostlyOstrich36> Any news on this? We are currently stuck without this fix and can't finish the ClearML setup
@<1523701070390366208:profile|CostlyOstrich36> 👀
Yes, the credentials seem to work
I'm trying to figure out why I don't see the uploaded files/folders
- I checked whether the ClearML task uses the fileserver instead, but I don't see any files in the fileserver folder
- Nothing is uploaded to the bucket (I will ask the IT guy to check the logs for whether I'm uploading any files)
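For reference, my understanding is that artifacts only land in the bucket if the task's output URI points there; a minimal sketch of that (bucket name and prefix are placeholders, and the S3 credentials/endpoint still have to be configured in clearml.conf):
```python
from clearml import Task

# Point the task's default output location at the bucket instead of the
# ClearML fileserver (bucket / prefix are placeholders).
task = Task.init(
    project_name="my_project",
    task_name="upload_debug",
    output_uri="s3://my-bucket/clearml",
)

# Anything uploaded through the task (artifacts, model checkpoints, debug
# samples) should then land under that URI rather than on the fileserver.
task.upload_artifact(name="sample", artifact_object={"hello": "world"})
```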
I have tried:
Airflow - a pain to set up, old UI and other problems
Prefect - I literally just tried to set up a simple distributed system and it took me a week. I do not recommend this tool at all: horrible documentation, and no one helps on Slack.
Dagster - an absolute beauty: nice UI, easy to set up (as a pip package or just Docker + Postgres), I highly recommend this tool. It takes a bit to get used to. In the coming week I will try the combo of Dagster + ClearML, where I periodically check some things and if...
I hope that's all the experiments
I can add "source /workspace/.venv/bin/activate" to docker_init_bash_script in clearml.conf
However, it then tries to access pip, but I don't need pip at all. How do I disable it? I already have my packages, and uv doesn't even require pip
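For context, this is the shape of what I have in clearml.conf; the environment-variable part is my reading of the agent docs for skipping the pip/venv step, so treat it as an assumption and verify the exact variable for your agent version:
```
agent {
    # commands run inside the container before the task starts
    docker_init_bash_script: [
        "source /workspace/.venv/bin/activate",
    ]
}

# Supposedly, pointing the agent at an existing interpreter makes it skip the
# pip/venv creation step entirely (assumption -- please verify):
#   export CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/workspace/.venv/bin/python
```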
Is there any way to see if I even have the data in MongoDB?
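In case it's useful, this is a quick way to poke at it (a sketch, assuming the default docker-compose setup where Mongo listens on 27017 and ClearML keeps its metadata in a database named backend; names may differ between versions):
```python
from pymongo import MongoClient

# Connect to the MongoDB container of the self-hosted server (default port).
client = MongoClient("mongodb://localhost:27017")

db = client["backend"]  # assumed database name for ClearML metadata
for name in db.list_collection_names():
    # Print each collection with a rough document count.
    print(name, db[name].estimated_document_count())
```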
Is it possible to split the large Elasticsearch indices? I know Elasticsearch has something called rollover, but I'm not sure ClearML supports it
I'm doing all of this because there isn't (or I'm not aware of) any good way to understand which datasets are on the workers
7 GB out of the 30 GB is currently used, and it's quite stable
The machine has 8 cores, so nothing fancy
You can check out the boto3 Python client (this is what we use to download/upload all the S3 stuff), but the MinIO client probably already uses it under the hood.
We also use the AWS CLI for some downloads; it is way faster than Python.
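Something along these lines (a sketch; the endpoint, bucket and keys are placeholders for a self-hosted S3/MinIO setup):
```python
import boto3

# Self-hosted S3 / MinIO needs an explicit endpoint_url; keys and names
# below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.internal:9000",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Download an object, then upload a local file back.
s3.download_file("my-bucket", "datasets/sample.zip", "sample.zip")
s3.upload_file("sample.zip", "my-bucket", "datasets/sample_copy.zip")
```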
Regarding PDFs, yes, you have no choice but to preprocess them
@<1523701482157772800:profile|AnxiousSeal95> I see a lot of people here migrating data from one data source to another.
For us it was that we experimented with ClearML to get a feel for it, and we used the ClearML built-in file storage to save debug images and all other artifacts.
Then we grew rapidly and we had to migrate to S3 storage.
I had to write a script that goes through Elasticsearch and MongoDB to point to the new S3 links where the data was migrated to.
I do however understand that migration...
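Roughly the idea of the Mongo half of that script, heavily simplified; the real collection names and URL fields depend on the server version, so everything below is a placeholder that only illustrates the rewrite:
```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
tasks = client["backend"]["task"]  # placeholder database / collection

OLD_PREFIX = "https://files.old-server.internal"  # old fileserver base (placeholder)
NEW_PREFIX = "s3://migrated-bucket/clearml"       # new S3 location (placeholder)

for doc in tasks.find({}):
    updated = False
    # Hypothetical field layout: artifacts with a 'uri' pointing at the old server.
    for artifact in doc.get("execution", {}).get("artifacts", []):
        uri = artifact.get("uri", "")
        if uri.startswith(OLD_PREFIX):
            artifact["uri"] = NEW_PREFIX + uri[len(OLD_PREFIX):]
            updated = True
    if updated:
        tasks.replace_one({"_id": doc["_id"]}, doc)
```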
I was on version 1.7 and now I'm on the latest, 1.11
Can't get a screenshot yet (copying data), will add it later.
What worries me is that the config and agent folders are empty. I can reconfigure all the agents, no problem.
But where is the info about projects stored?
I also see that Elasticsearch and Mongo have some data
Our datasets are more than 1 TB in size and will keep growing (probably to 4 TB and up), which means we would also need 4 TB of local storage just to upload the dataset back in zipped format. This is not a good solution.
What we can do, I guess, is do the downloading locally in chunks of files?
Download 100 files locally, add them to the ClearML dataset, repeat
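Something like this loop is what I have in mind (a sketch; the batch listing and the download step are placeholders for whatever actually pulls ~100 files at a time):
```python
from clearml import Dataset

dataset = Dataset.create(dataset_project="my_project", dataset_name="big_dataset")

# Placeholder: however the remote file list gets split into ~100-file batches.
batches = [["file_0001", "..."], ["file_0101", "..."]]

for batch in batches:
    local_dir = "/tmp/current_batch"   # download this batch here (placeholder step)
    dataset.add_files(path=local_dir)  # register the chunk with the dataset
    dataset.upload()                   # push the chunk now, before the next download
    # delete local_dir contents here to free the SSD for the next batch

dataset.finalize()
```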
Yes, but does add_external_files make chunked zips like add_files does?
I'm also batch uploading, maybe that's the problem?
- The dataset is about 1 TB, containing 1 million files
- I don't have the SSD space locally to do the upload
- So I download a part of the dataset, use add_files() and then upload() for that batch
- Upload the dataset
I noticed that each batch gets slower and slower
Sounds similar to our issue? We have self-hosted S3
None
I need the zipping and chunking to manage millions of files
ok, I found it.
Are S3 links supported?
I already found the source code and modified it as needed.
How can I now get this info from the Task that is created when a Dataset is created?
Couldn't find anything like clearml.Dataset(id=id).get_size()
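In case anyone else looks for this, the closest I got is going through the backing task; a sketch under the assumption that the dataset ID doubles as the ID of its Task and that each uploaded chunk shows up as an artifact on that task:
```python
from clearml import Dataset, Task

ds = Dataset.get(dataset_id="<dataset_id>")

# Assumption: the dataset's ID is also the ID of the task that backs it.
task = Task.get_task(task_id=ds.id)

# Summing the artifact sizes gives a rough on-storage size for the dataset.
total_bytes = sum((artifact.size or 0) for artifact in task.artifacts.values())
print(f"{len(task.artifacts)} artifacts, ~{total_bytes / 1024**3:.2f} GiB")
```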
What you want is a service script that cleans up archived tasks; here is what we used: None
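Since the link above got stripped, here is roughly the shape of that kind of cleanup service (a sketch, not the exact script we run; it assumes archived experiments carry the 'archived' system tag):
```python
from datetime import datetime, timedelta
from clearml import Task

CUTOFF = datetime.utcnow() - timedelta(days=30)  # keep the last 30 days (placeholder)

# Assumption: archiving an experiment in the UI adds the 'archived' system tag.
archived = Task.get_tasks(task_filter={"system_tags": ["archived"]})

for task in archived:
    last_update = task.data.last_update
    if last_update and last_update.replace(tzinfo=None) < CUTOFF:
        # Also remove the artifacts and models the task produced.
        task.delete(delete_artifacts_and_models=True)
```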
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
But be careful, upgrading is extremely dangerous