By the way, if I create the task locally, reset it, and enqueue it, it works. This is the workaround I'm using right now.
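A minimal sketch of that workaround, assuming the task was already created by a local run of the script (the project, task and queue names below are just the ones from this thread):
```python
from clearml import Task

# Grab the task that the local run created (names are placeholders).
task = Task.get_task(project_name="mornag-plain-dense",
                     task_name="mornag-plain-dense-training")

# Reset it so it can be executed again, then push it to the agent queue.
task.reset()
Task.enqueue(task, queue_name="tesla-t4")
```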
Yes, it uses Hydra and everything works fine without ClearML. The script is similar to this one: https://github.com/galatolofederico/lightning-template/blob/main/train.py
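Rough sketch of the script structure being described: a Hydra-driven train.py that also initializes a ClearML task. The config fields (log.clearml, clearml.output_uri) are assumptions based on the overrides mentioned later in this thread:
```python
import hydra
from omegaconf import DictConfig
from clearml import Task


@hydra.main(config_path="conf", config_name="config", version_base=None)
def train(cfg: DictConfig):
    # Only attach to ClearML when the config asks for it (field names assumed).
    if cfg.log.clearml:
        Task.init(
            project_name="mornag-plain-dense",
            task_name="mornag-plain-dense-training",
            output_uri=cfg.clearml.output_uri,  # models go to the S3/MinIO bucket
        )
    # ... build the model / trainer from cfg and fit ...


if __name__ == "__main__":
    train()
```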
Everything works fine if I force those values using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env variables.
Never mind, my AWS config was not under sdk :face_palm:
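For reference, this is roughly where the credentials are expected to live in clearml.conf; the host and placeholder key/secret below are assumptions matching the MinIO endpoint from this thread:
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "s3.myhost.tld:443"
                    key: "<my-key>"
                    secret: "<my-secret>"
                    multipart: false
                    secure: true
                }
            ]
        }
    }
}
```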
When enqueued, the configuration tab still shows the correct arguments.
I installed clearml from source and printed the internal S3 configuration; basically, key and secret are empty.
I printed cfg in the script and the config has not been overwritten 😢
I opened the issue on GitHub: https://github.com/allegroai/clearml-web/issues/46
Also 1.9.1-312 is affected
The error is: 2022-11-28 14:40:17,099 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key and secret for S3 storage access ( )
This is the full print(cfg):
```
{'dataset': {'name': '<my-dataset-name>', 'path': '', 'target_shape': [128, 128]},
 'model': 'unet',
 'models': {'unet': {'dim': 8, 'dim_mults': [1, 2, 4], 'num_blocks_per_stage': [2, 2, 2],
                     'num_self_attn_per_stage': [0, 0, 1], 'nested_unet_depths': [0, 0, 0],
                     'nested_unet_dim': 16, 'use_convnext': False, 'resnet_groups': 2,
                     'consolidate_upsample_fmaps': True, 'weight_standardize': False,
                     'attn_heads': 2, 'attn_dim_head': 16}},
 'train': {'accelerator': 'auto...
```
The file has been uploaded correctly to the bucket.
OK, I did some investigation and the bug appears starting from version 1.8.0. In version 1.7.0 it is not there. I'll open an issue on GitHub.
1.9.0 is still affected
No problem! Thank you for your amazing work!
I used an env variable to avoid creating an endless loop of init/enqueue (an argument like clearml.queue would have been captured and forwarded to the agent).
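A sketch of how such an env-variable guard could look; CLEARML_ENQUEUE is a hypothetical name (the actual variable used here may differ), and the queue name is the one from this thread:
```python
import os
from clearml import Task

task = Task.init(project_name="mornag-plain-dense",
                 task_name="mornag-plain-dense-training")

# Hypothetical guard: only the local run sets CLEARML_ENQUEUE, so the remote
# execution (where the variable is absent) never re-enqueues itself.
if os.environ.get("CLEARML_ENQUEUE"):
    task.execute_remotely(queue_name="tesla-t4")
```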
Just tried 1.9.1 and it is affected.
But no argument is passed to the script. Here I am printing sys.argv
If I print config_list from def from_config(cls, s3_configuration): in bucket_config.py, line 121, I get {'key': '', 'secret': '', 'region': '', 'multipart': True, 'use_credentials_chain': False, 'bucket': 'clearml', 'host': 's3.myhost.tld:443', 'token': '', 'extra_args': ConfigTree()}
I tried it like this: clearml-task --script train.py --args overrides="log.clearml=True train.epochs=200 clearml.save=True" --project mornag-plain-dense --name mornag-plain-dense-training --queue tesla-t4 --skip-task-init
I also save the models to the S3 bucket using output_uri=cfg.clearml.output_uri in the Task.init
Related GitHub issue https://github.com/allegroai/clearml/issues/847
Yes, the task is running on a remote agent with the --docker flag.
This is the config on the machine(s) running the agent:
```
agent {
    venvs_cache: {
        max_entries: 50
        free_space_threshold_gb: -1
        path: ~/.clearml/venvs-cache
    }
    extra_docker_arguments: [
        "--network", "host",
        "-v", "/home/ubuntu/.ssh:/root/.ssh:ro",
        "-v", "/home/ubuntu/.cache:/root/.cache",
    ]
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        ...
```
The debug samples are correctly uploaded to the bucket (it is a MinIO bucket); I can see them from the MinIO web app. I have used logger.report_image
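For context, a minimal sketch of how such debug samples get reported; the title/series names and the random image are just placeholders:
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="mornag-plain-dense", task_name="debug-samples")
logger = task.get_logger()

# Report a single image as a debug sample; it is uploaded to the configured
# files destination (the MinIO bucket in this setup).
image = np.random.randint(0, 255, size=(128, 128, 3), dtype=np.uint8)
logger.report_image(title="samples", series="prediction", iteration=0, image=image)
```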

