Ok, then, I have a solution, but it still produces duplicate names:
- new_dataset._dataset_link_entries = {} # clearing all the raw/a.png link entries
- Resize a.png and save it in another location, named a_resized.png
- Add back the other files I need (excluding raw/a.png); I add them to new_dataset._dataset_link_entries
- Use add_external_files to include them in the dataset. I'm also using dataset_path=[a list of relative paths] (rough code sketch below)
What I would expect:
100 Files removed (all a.png)
100 Files added (all a_resized.png)
...
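To make the flow above concrete, here is a rough sketch of how it might look in code. The dataset names, bucket and paths are placeholders, and _dataset_link_entries is an internal attribute, so this may break between clearml versions:

import clearml

parent = clearml.Dataset.get(dataset_id="<parent-dataset-id>")
new_dataset = clearml.Dataset.create(
    dataset_name="images-resized",
    dataset_project="project",
    parent_datasets=[parent],
)

# 1) drop the existing external link entries (the raw/a.png files)
new_dataset._dataset_link_entries = {}

# 2) ...resize each a.png locally and upload it to S3 as a_resized.png...

# 3) re-register the external files that should stay, plus the resized ones
new_dataset.add_external_files(
    source_url=["s3://our-bucket/resized/a_resized.png"],  # placeholder URL
    dataset_path=["resized/a_resized.png"],                # relative path inside the dataset
)

new_dataset.upload()
new_dataset.finalize()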
In which UI? Because there are two ways to do it. When clicking on the artifact URL there is a popup (but it has no way to change the host URL)
Our S3 host doesn't have a port (I didn't specify a port anywhere in clearml.conf and uploads work)
![image](https://clearml-web-assets.s3.amazonaws.com/scoold/images/TT9A...
@<1523701435869433856:profile|SmugDolphin23> Setting it without http is not possible, as it auto-fills it back in
OK, slight update: it seems artifacts are now uploading to the bucket. Maybe my file explorer was using an old cache or something.
However, reported images are uploaded to the fileserver instead of S3
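For the reported images specifically, a minimal sketch of what I'd try (host and bucket are placeholders): debug samples seem to go to the fileserver regardless of output_uri, so the logger's default upload destination has to point at the bucket as well:

import numpy as np
import clearml

task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri="s3://our-host.com:443/our-bucket",  # placeholder bucket
)

# debug samples reported via report_image & co. follow this destination
task.get_logger().set_default_upload_destination("s3://our-host.com:443/our-bucket")

task.get_logger().report_image(
    title="debug",
    series="sample",
    iteration=0,
    image=np.random.randint(0, 255, size=(64, 64, 3), dtype=np.uint8),
)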
Here is the script I'm using to test things. Thanks
py file:
task: clearml.Task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri=" None ",
)
clearml.conf:
{
# This will apply to all buckets in this host (unless key/value is specifically provided for a given bucket)
host: " our-host.com "
key: "xxx"
secret: "xxx"
multipart: false
...
No, I specify where to upload
I see the data is being uploaded to the S3 bucket. It's just the log messages that are really confusing
- Here is how the client-side clearml.conf looks, together with the script I'm using to create the tasks. Uploads seem to work and are fixed thanks to you guys 🙌
@<1523703436166565888:profile|DeterminedCrab71> Thanks for responding
It was unclear to me that I also need to set 443 everywhere in clearml.conf
Setting the S3 host URLs with 443 in clearml.conf and also in the web UI made it work
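A quick way to sanity-check the credentials and port outside of a task is a direct StorageManager upload; a minimal sketch, assuming the bucket name below is a placeholder (non-AWS S3 endpoints are addressed as s3://<host>:<port>/<bucket>):

from clearml import StorageManager

StorageManager.upload_file(
    local_file="some_local_file.txt",
    remote_url="s3://our-host.com:443/our-bucket/some_local_file.txt",  # explicit 443, placeholder bucket
)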
I'm now almost at the finish line. The last thing that would be great to fix is archived task deletion.
For some reason I get errors about missing S3 keys in the ClearML docker compose logs, and the folders/files are not deleted in the S3 bucket.
You can see how storage_credentials.co...
Good morning, I tried the script you provided and I'm getting somewhere
@<1523701070390366208:profile|CostlyOstrich36> I'm still unable to understand what I'm doing wrong.
We have a self-hosted S3 Ceph storage server
Setting my config like this breaks Task.init
We had a similar problem. ClearML doesn't support data migration (not that I know of)
So you have two ways to fix this:
- Recreate the dataset once it's already in Azure
- Edit each Elasticsearch database entry to point to the new destination (we did this; rough sketch below)
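To make the second option concrete, a rough sketch of the idea (not our exact script), assuming the debug-image events live in indices matching events-training_debug_image-* and keep the link in a url keyword field; verify the index names and mapping on your own cluster, and back everything up before running anything like this:

from elasticsearch import Elasticsearch

OLD_PREFIX = "https://files.old-server.com"            # placeholder old fileserver prefix
NEW_PREFIX = "https://blobs.azure.example/container"   # placeholder new destination

es = Elasticsearch("http://localhost:9200")

# Rewrite every stored debug-image URL in place.
es.update_by_query(
    index="events-training_debug_image-*",  # index pattern is an assumption - check yours
    body={
        "query": {"prefix": {"url": OLD_PREFIX}},  # assumes url is a keyword field
        "script": {
            "source": "ctx._source.url = ctx._source.url.replace(params.old, params.new)",
            "params": {"old": OLD_PREFIX, "new": NEW_PREFIX},
        },
    },
)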
The problem is that the clearml.conf S3 config doesn't support an empty region field; even an empty string crashes it
Is it possible to split the large Elasticsearch indexes? I know Elasticsearch has something called rollover, but I'm not sure ClearML supports this
@<1523701070390366208:profile|CostlyOstrich36> Hello, I'm still unable to understand how to fix this
The ClearML team should really write up a tutorial about this; I see this question weekly now. The short answer on what we did when we migrated servers: we wrote a Python script that takes the data from the ClearML MongoDB (stores tasks and datasets) and Elasticsearch (stores debug image URLs, logs, scalars) and migrates it to the other ClearML instance's databases (a rough sketch of the MongoDB side is below).
It is also possible to just make a copy of all the database files and move them to another server
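On the MongoDB side the idea is similar; a rough sketch (not the exact script), assuming clearml-server keeps tasks in the backend database's task collection with the output destination under output.destination - verify the database/collection/field names on your own install and back up first:

import re
from pymongo import MongoClient

OLD = "https://files.old-server.com"    # placeholder old prefix
NEW = "s3://new-server.com:443/bucket"  # placeholder new prefix

client = MongoClient("mongodb://localhost:27017")
tasks = client["backend"]["task"]  # db/collection names are assumptions - check your install

# Repoint the task output destination; artifact/model URLs would need the same treatment.
for doc in tasks.find({"output.destination": {"$regex": f"^{re.escape(OLD)}"}}):
    tasks.update_one(
        {"_id": doc["_id"]},
        {"$set": {"output.destination": doc["output"]["destination"].replace(OLD, NEW)}},
    )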
@<1523701601770934272:profile|GiganticMole91> That's rookie numbers. We are at 228 GB for Elasticsearch now
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
But be careful, upgrading is extremely dangerous
I have tried:
Airflow - a pain to set up, old UI and other problems
Prefect - I literally just tried to set up a simple distributed system; it took me a week. I do not recommend this tool at all: horrible documentation, and no one helps on Slack.
Dagster - an absolute beauty: nice UI, easy to set up (as a pip package or just Docker + Postgres), I highly recommend this tool. It takes a bit to get used to. In the coming week I will try this combo of Dagster + ClearML, where I periodically check some things and if...
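The Dagster + ClearML combo from the last point, as a hypothetical sketch (all names here are made up): a small op that polls ClearML for recently completed tasks, run on a schedule, which is where the periodic checks would hook in:

from dagster import Definitions, ScheduleDefinition, job, op
from clearml import Task

@op
def check_recent_tasks():
    # look for completed tasks in a project; react to them here (retrain, alert, ...)
    tasks = Task.get_tasks(
        project_name="project",
        task_filter={"status": ["completed"]},
    )
    return [t.id for t in tasks]

@job
def clearml_watchdog():
    check_recent_tasks()

defs = Definitions(
    jobs=[clearml_watchdog],
    schedules=[ScheduleDefinition(job=clearml_watchdog, cron_schedule="0 * * * *")],  # hourly
)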
@<1523701482157772800:profile|AnxiousSeal95> I see a lot of people here migrating data from one data source to another.
For us, it was that we experimented with ClearML to get a feel for it, and we used the ClearML built-in file storage to save debug images and all other artifacts.
Then we grew rapidly and had to migrate to S3 storage.
I had to write a script that goes through Elasticsearch and MongoDB to point to the new S3 links where the data was migrated to.
I do however understand that migration...
It looks like I'm moving forward
Setting the URL in clearml.conf without "s3" as suggested works (but I don't add a port there, not sure if that breaks something; we don't have a port)
host: "our-host.com"
Then in test_task.py
task: clearml.Task = clearml.Task.init(
    project_name="project",
    task_name="task",
    output_uri=" None ",
)
I think the connection is created
What I'm getting now is a bucket error; I suppose I have to specify it so...
What do you mean by reusing the task for a ClearML Dataset? Got a code example?
We have multiple different projects with multiple people working on each project.
This is our most-used code for dataset uploading
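Roughly speaking (a sketch with placeholder names, not the exact snippet), it's the standard create/add/upload/finalize flow pointed at the S3-compatible bucket:

import clearml

dataset = clearml.Dataset.create(
    dataset_name="my-dataset",
    dataset_project="project",
)
dataset.add_files(path="data/")  # local folder to version
dataset.upload(output_url="s3://our-host.com:443/our-bucket")  # Ceph bucket, S3 interface
dataset.finalize()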
We use a Ceph storage cluster; the interface to it is the same as S3
I don't get what I have misconfigured.
The only thing I have not added is the "region" field in clearml.conf, because we literally don't have one; it's a self-hosted cluster.
You can try to replicate the S3 config I posted earlier.
How can I do that?
I need to save the original hash, otherwise I lose all traceability for about 2k experiments
@<1523701070390366208:profile|CostlyOstrich36> Any news on this? We are currently stuck without this fix and can't finish up the ClearML setup