I solved the problem.
I had to add tensorboard loggger and pass it to pytorch_lightning trainer logger=logger
Is that normal?
ok, then, I have a solution, but it still makes duplicate names
- new_dataset._dataset_link_entries = {} # Cleaning all raw/a.png files
- resize a.png and save it in another location named a_resized.png
- Add back other files i need (excluding raw/a.png), I add them to new_dataset._ dataset_link_entries
- Use add_external_files to include it in dataset. Im also using dataset_path=[a list of relative paths]
What I would expect:
100 Files removed (all a.png)
100 Files added (all a_resized.png)
...
When I look at LinkEntry object, link property is correct, no duplicates. Its relative_path thats duped and also key name in _dataset_link_entries
@<1523701070390366208:profile|CostlyOstrich36> Hello John, we are still unable to use clearml with our self hosted s3 CEPH instances, is there any update on the hotfix for 1.14?
7 out of 30 GB is currently used and is quite stable
@<1523701070390366208:profile|CostlyOstrich36> 👀
We had a similar problem. Clearml doesnt support data migration (not that I know of)
So you have two ways to fix this:
- Recreate the dataset when its already in Azure
- Edit each elasticsearch database file entry to point to new destination (we did this)
ClearML team should really write up some tutorial about this. I see this question weekly now. The short answer on what we did when we migrated servers was to wite a python script that takes data from clearml mongodb(stores tasks and datasets) and elastic (stores debug image urls, logs, scalars) and migrate them to other clearml instance databases
In which ui? Because there are two ways to do it. When clicking on artifacti url there is a popup (but has no way to change host url)
Our s3 host doesnt have port (didnt specify port in clearml.conf anywhere and upload works)
![image](https://clearml-web-assets.s3.amazonaws.com/scoold/images/TT9A...
WebApp: 1.14.1-451 • Server: 1.14.1-451 • API: 2.28
Hey, i see that 1.14.2 dropped
I tried it but the issue is still there, maybe the hotfix is in next patch?
Here is the setup so you can reproduce it (we dont have region field)
clearml.conf:s3 {
use_credentials_chain: false
credentials: [
{
host: "
s3.somehost.com "
key: "XXXXXXXXXXXXXXXXXXXX"
` secret: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX...
@<1523701070390366208:profile|CostlyOstrich36> It it still needed since Eugene thinks there is a bug?
It is also possible to just make a copy of all the database files and move them to another server
also, when uploading artifacts, I see where they are stored on the s3 bucket, but I cant find where the debug images are stored at
@<1523701070390366208:profile|CostlyOstrich36> Hello, im still unable to understand how to fix this
there is a typing in clearm.conf i sent you on like 87, there should be "key" not "ey" im aware of it
This is what I see on fresh clearml
Where all my mounts are on /mnt/data/clearml-server instead of /opt/clearml
@<1523701070390366208:profile|CostlyOstrich36> Still unable to understand what im doing wrong.
We have self hosted S3 Ceph storage server
Setting my config like this breaks task.init
i need clearml.conf on my clearml server (in config folder which is mounted in docker-compose) or user PC? Or Both?
Its self hosted S3 thats all I know, i dont think it s Minio
Adding bucket in clearml.conf causes the same error: clearml.storage - ERROR - Failed uploading: Could not connect to the endpoint URL: " None "
![image](https://clearml-web-assets.s3.amazonaws.com/scoold/images/TT...
Bump, still waiting, closing in on a month since we are unable to deploy. We have team of 10+ people
I tried it with port, but still having the same issue
Tried it with/without secure and multipart
@<1523701070390366208:profile|CostlyOstrich36> Updated webserver and the problem still persists
This is the new stack:
WebApp: 1.15.1-478 • Server: 1.14.1-451 • API: 2.28
notice, we didnt update API (we had running experiments)
good morning, I tried the script you provided and Im getting somewhere
So from our IT guys i now know that
"s3" part of url is subdomain, we use it in all other libs like boto3 and cloudpathlib, never had any problems
This is where the crash happens inside the clearml Task