By the way, if I create the task locally, reset it, and enqueue it, it works. That is the workaround I'm using right now.
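Concretely, the workaround is roughly this (a minimal sketch with the Python SDK; the project, task, and queue names are the ones from my clearml-task command below):

```
from clearml import Task

# train.py was already run once locally, so the task exists on the server:
# grab it, reset it back to draft, and enqueue it for the agent
task = Task.get_task(
    project_name="mornag-plain-dense",
    task_name="mornag-plain-dense-training",
)
task.reset()  # may need force=True depending on the task state
Task.enqueue(task, queue_name="tesla-t4")
```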
I tried it like this: `clearml-task --script train.py --args overrides="log.clearml=True train.epochs=200 clearml.save=True" --project mornag-plain-dense --name mornag-plain-dense-training --queue tesla-t4 --skip-task-init`
This is the full output of print(cfg):
`{'dataset': {'name': '<my-dataset-name>', 'path': '', 'target_shape': [128, 128]}, 'model': 'unet', 'models': {'unet': {'dim': 8, 'dim_mults': [1, 2, 4], 'num_blocks_per_stage': [2, 2, 2], 'num_self_attn_per_stage': [0, 0, 1], 'nested_unet_depths': [0, 0, 0], 'nested_unet_dim': 16, 'use_convnext': False, 'resnet_groups': 2, 'consolidate_upsample_fmaps': True, 'weight_standardize': False, 'attn_heads': 2, 'attn_dim_head': 16}}, 'train': {'accelerator': 'auto...`
Yes. It seems to be a bug in the UI, but it's weird that it went unnoticed.
The file has been uploaded correctly to the bucket.
OK, I did some investigation and the bug appears starting from version 1.8.0. In version 1.7.0 it is not there. I'll open an issue on GitHub.
No problem! Thank you for your amazing work!
1.9.1-312 is also affected.
If I print config_list in def from_config(cls, s3_configuration), in the file bucket_config.py, line 121, I get `{'key': '', 'secret': '', 'region': '', 'multipart': True, 'use_credentials_chain': False, 'bucket': 'clearml', 'host': 's3.myhost.tld:443', 'token': '', 'extra_args': ConfigTree()}`
But no arguments are passed to the script. Here I am printing sys.argv
When enqueued, the configuration tab still shows the correct arguments.
I specified the upload destination in the logger: `Logger.current_logger().set_default_upload_destination(cfg.clearml.media_uri)`
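For context, this is roughly how the pieces fit together (a sketch, not the real training code: in the actual script the destination comes from cfg.clearml.media_uri, and here I hardcode the host/bucket from my config and use a throwaway task name):

```
import numpy as np
from clearml import Logger, Task

# a task has to exist in the process before Logger.current_logger() is used
task = Task.init(project_name="mornag-plain-dense", task_name="debug-samples-test")

# send debug samples to the MinIO bucket instead of the default files server
logger = Logger.current_logger()
logger.set_default_upload_destination("s3://s3.myhost.tld:443/clearml")

# report a dummy debug image (a random RGB array) just to exercise the upload path
logger.report_image(
    title="debug",
    series="sample",
    iteration=0,
    image=np.random.randint(0, 255, size=(128, 128, 3), dtype=np.uint8),
)
```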
Yes, MinIO with no special config. The S3 config is in the clearml.conf
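For reference, the S3 section of my clearml.conf is structured roughly like this (keys and secrets redacted; the host and bucket are the ones from the config dump above):

```
sdk {
    aws {
        s3 {
            # per-bucket credentials for the MinIO endpoint
            credentials: [
                {
                    host: "s3.myhost.tld:443"
                    bucket: "clearml"
                    key: "<access-key>"
                    secret: "<secret-key>"
                    multipart: true
                    secure: true
                }
            ]
        }
    }
}
```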
The error is:
2022-11-28 14:40:17,099 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access ()
Everything works fine if I force those values using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables.
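i.e. something like this at the very top of the script, before any ClearML storage call (placeholder values):

```
import os

# brute-force workaround: inject the MinIO credentials through the standard
# AWS environment variables before clearml's storage layer is initialized
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-key>"
```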
Yes, the task is running on a remote agent with the --docker flag.
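i.e. the agent daemon was started with something along these lines (a sketch; the actual options may differ):

```
clearml-agent daemon --queue tesla-t4 --docker
```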
This is the config on the machine(s) running the agent:
```
agent {
venvs_cache: {
max_entries: 50
free_space_threshold_gb: -1
path: ~/.clearml/venvs-cache
}
extra_docker_arguments: [
"--network", "host",
"-v", "/home/ubuntu/.ssh:/root/.ssh:ro",
"-v", "/home/ubuntu/.cache:/root/.cache",
]
docker_internal_mounts {
sdk_cache: "/clearml_agent_cache"
    ...
```
Yes, it uses Hydra and everything works fine without ClearML. The script is similar to this one: https://github.com/galatolofederico/lightning-template/blob/main/train.py
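Roughly this shape, as a heavily trimmed sketch (the config path/name and the body are placeholders; only the log.clearml toggle and the project/task names match what I actually pass):

```
import hydra
from omegaconf import DictConfig

from clearml import Task


@hydra.main(config_path="conf", config_name="config")
def train(cfg: DictConfig) -> None:
    # ClearML is optional and toggled from the Hydra config
    # (this is what the log.clearml=True override switches on)
    if cfg.log.clearml:
        Task.init(
            project_name="mornag-plain-dense",
            task_name="mornag-plain-dense-training",
        )

    print(cfg)
    # ... build the Lightning model/datamodule and run the Trainer here ...


if __name__ == "__main__":
    train()
```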
The debug samples are correctly uploaded to the bucket (it's a MinIO bucket); I can see them from the MinIO web app. I used logger.report_image
I installed clearml from source and printed the internal S3 configurations; basically key and secret are empty.
Never mind, my AWS config was not under the sdk section.
:face_palm:
This is the configuration in the webapp
I opened the issue on GitHub: https://github.com/allegroai/clearml-web/issues/46
Just tried 1.9.1 and it is affected.