OK then, I have a solution, but it still creates duplicate names:
- new_dataset._dataset_link_entries = {}  # clearing all raw/a.png entries
- resize a.png and save it in another location as a_resized.png
- add back the other files I need (excluding raw/a.png) by appending them to new_dataset._dataset_link_entries
- use add_external_files to include them in the dataset; I'm also passing dataset_path=[a list of relative paths] (see the sketch after this list)
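A minimal sketch of that flow, assuming a parent dataset and an s3:// source URL (the project/dataset names, paths and target size are placeholders; _dataset_link_entries is an internal attribute, so this may break between clearml versions):
```
import os
from clearml import Dataset
from PIL import Image

# Child dataset based on the existing one (names are placeholders)
parent = Dataset.get(dataset_project="my_project", dataset_name="images")
new_dataset = Dataset.create(
    dataset_name="images_resized",
    dataset_project="my_project",
    parent_datasets=[parent.id],
)

# Internal attribute: drop all inherited external link entries (raw/a.png etc.)
new_dataset._dataset_link_entries = {}

# Resize the original and store it under a new name
os.makedirs("resized", exist_ok=True)
Image.open("raw/a.png").resize((256, 256)).save("resized/a_resized.png")

# Re-register the external files we still need, under relative dataset paths
new_dataset.add_external_files(
    source_url="s3://my-bucket/resized/a_resized.png",  # placeholder URL
    dataset_path="resized/a_resized.png",
)

new_dataset.upload()
new_dataset.finalize()
```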
What I would expect:
100 Files removed (all a.png)
100 Files added (all a_resized.png)
...
I can't get the conf credentials to work.
Specifying it like this gives me:
Exception has occurred: ValueError
Could not get access credentials for ' None ' , check configuration file ~/clearml.conf
I get the sidebars and the login screen on my local PC,
but the data isn't loaded.
I tried not editing anything in docker-compose and just pasting my data in there. That didn't help.
But there are still some weird issues; I cannot see the files uploaded in the bucket.
We use a Ceph Storage Cluster; its interface is the same as S3.
I don't get what I have misconfigured.
The only thing I have not added is the "region" field in clearml.conf, because we literally don't have one; it's a self-hosted cluster.
You can try to replicate the S3 config I posted earlier.
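For reference, a per-bucket credentials block for a self-hosted S3-compatible endpoint in clearml.conf looks roughly like this (host, bucket and keys are placeholders; note there is no region set):
```
sdk {
    aws {
        s3 {
            # no region for a self-hosted Ceph/S3 endpoint
            credentials: [
                {
                    host: "ceph.example.com:443"   # placeholder endpoint
                    bucket: "my-bucket"            # placeholder bucket
                    key: "<access-key>"
                    secret: "<secret-key>"
                    secure: true
                }
            ]
        }
    }
}
```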
What do you mean by reusing the task for a clearml Dataset? Got a code example?
We have multiple different projects with multiple people working on each project.
This is the code we use most often for uploading datasets:
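A generic version of such an upload flow looks roughly like this (the project, dataset and bucket names are placeholders, not our actual snippet):
```
from clearml import Dataset

dataset = Dataset.create(
    dataset_name="my_dataset",      # placeholder
    dataset_project="my_project",   # placeholder
)
dataset.add_files(path="data/")     # local folder with the files to version
dataset.upload(output_url="s3://my-bucket/datasets")  # placeholder target storage
dataset.finalize()
```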
I have also noticed that this incident usually happens in the morning, at around 6-7 AM.
Are there maybe some cleanup tasks or backups running on the ClearML server at those times?
From docker inspect I can see that allegroai/clearml uses:
"CLEARML_SERVER_VERSION=1.11.0",
"CLEARML_SERVER_BUILD=373"
Image hash: ed05631045c4237f59ad48f477e06dd72274ab67e70d2f9adc489431d1ce75d7
Here are my clearml versions, and Elasticsearch is taking up 50 GB.
No, I specify where to upload.
I can see the data being uploaded to the S3 bucket; just the log messages are really confusing.
I already found the source code and modified it as needed.
How can I now get this info from the Task that is created when a Dataset is created?
I couldn't find anything like clearml.Dataset(id=id).get_size()
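A workaround sketch, assuming the backing task shares the dataset id and that file_entries_dict reports per-file sizes (both hold in recent clearml versions, but treat this as an approximation; the dataset id is a placeholder):
```
from clearml import Dataset, Task

ds = Dataset.get(dataset_id="<dataset-id>")  # placeholder id

# The dataset is backed by a regular Task with the same id
backing_task = Task.get_task(task_id=ds.id)

# Approximate total size from the per-file entries
# (size can be None for external links that were never sized)
total_bytes = sum(entry.size or 0 for entry in ds.file_entries_dict.values())
print(backing_task.name, total_bytes)
```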
The ClearML team should really write up a tutorial about this; I see this question weekly now. The short answer on what we did when we migrated servers: we wrote a Python script that takes data from the ClearML MongoDB (which stores tasks and datasets) and Elasticsearch (which stores debug image URLs, logs, and scalars) and migrates it into the other ClearML instance's databases.
It is also possible to just make a copy of all the database files and move them to another server.
Is it even known whether the bug is fixed in that version?
Is the fileserver folder needed for a successful backup?
You can check out the boto3 Python client (this is what we use to download/upload all the S3 stuff), but the MinIO client probably already uses it under the hood.
We also use the aws cli for some downloading; it is way faster than Python.
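For reference, a minimal boto3 sketch against an S3-compatible endpoint (the endpoint URL, bucket, keys and file paths are placeholders):
```
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph.example.com",   # placeholder endpoint
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# Upload a local file, then download it back
s3.upload_file("resized/a_resized.png", "my-bucket", "resized/a_resized.png")
s3.download_file("my-bucket", "resized/a_resized.png", "/tmp/a_resized.png")
```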
Regarding PDFs: yes, you have no choice but to preprocess them.
- Is 50 GB of Elasticsearch data normal? Have you seen it elsewhere, or are we doing something wrong? One thing I suspect is that we are probably logging too frequently.
- Is it possible to somehow clean this up?
We are cleaning, but there is a major problem:
when deleting a task from the web UI, nothing is deleted elsewhere.
Debug images are not deleted, models are not deleted, and I suspect that scalars and logs are not deleted either.
I'm not sure why that is.
@<1523701087100473344:profile|SuccessfulKoala55> Anything on this?
I see the debug images in the fileserver folder.
What you want is a service script that cleans up archived tasks; here is what we used: None
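A rough sketch of what such a cleanup script does (the project name and the task_filter fields are assumptions, not the exact script we used):
```
from clearml import Task

# Fetch archived tasks (filter fields are assumptions; adjust to your setup)
archived = Task.get_tasks(
    project_name="my_project",
    task_filter={"system_tags": ["archived"], "status": ["completed", "failed"]},
)

for t in archived:
    # Also removes the uploaded artifacts and models belonging to the task
    t.delete(delete_artifacts_and_models=True, raise_on_error=False)
```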
ok, I found it.
Are S3 links supported?
How can I do that?
I need to save the original hash, otherwise I lose all traceability for about 2k experiments.