
I printed cfg in the script and the config has not been overwritten 😢
The file has been uploaded correctly to the bucket
I tried it like this: `clearml-task --script train.py --args overrides="log.clearml=True train.epochs=200 clearml.save=True" --project mornag-plain-dense --name mornag-plain-dense-training --queue tesla-t4 --skip-task-init`
I used an env variable to avoid creating an endless loop of init/enqueue (using an argument like clearml.queue would be captured and forwarded to the agent)
Yes, it uses Hydra and everything works fine without ClearML. The script is similar to this one: https://github.com/galatolofederico/lightning-template/blob/main/train.py
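For context, a minimal sketch of the setup described in the two messages above, assuming a Hydra entry point like the linked template and an illustrative CLEARML_QUEUE environment variable for the enqueue guard; the project, queue, and config names are placeholders, not the original script:

```python
import os

import hydra
from omegaconf import DictConfig

from clearml import Task


@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    task = Task.init(
        project_name="mornag-plain-dense",
        task_name="mornag-plain-dense-training",
    )
    queue = os.environ.get("CLEARML_QUEUE")  # set only on the local machine
    if queue and Task.running_locally():
        # Only the local run enqueues itself; the agent run never sees the
        # variable, so there is no endless init/enqueue loop.
        task.execute_remotely(queue_name=queue)
    print(cfg)  # the composed Hydra config, as printed above
    # ... training code ...


if __name__ == "__main__":
    main()
```

The local run would then be started with something like `CLEARML_QUEUE=tesla-t4 python train.py ...`, while the run executed by the agent has no such variable set.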
This is the full print(cfg):
`{'dataset': {'name': '<my-dataset-name>', 'path': '', 'target_shape': [128, 128]}, 'model': 'unet', 'models': {'unet': {'dim': 8, 'dim_mults': [1, 2, 4], 'num_blocks_per_stage': [2, 2, 2], 'num_self_attn_per_stage': [0, 0, 1], 'nested_unet_depths': [0, 0, 0], 'nested_unet_dim': 16, 'use_convnext': False, 'resnet_groups': 2, 'consolidate_upsample_fmaps': True, 'weight_standardize': False, 'attn_heads': 2, 'attn_dim_head': 16}}, 'train': {'accelerator': 'auto...`
When enqueued, the configuration tab still shows the correct arguments
But no argument is passed to the script. Here I am printing sys.argv
By the way, if I create the task locally, reset it, and enqueue it, it works. This is the workaround I'm using right now
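For reference, the same workaround can be scripted; a rough sketch, assuming the ID of the task created by the local run (placeholder below) and the queue mentioned earlier:

```python
from clearml import Task

# Placeholder ID of the task created by the local run
task = Task.get_task(task_id="<local-task-id>")
task.reset()                               # clear the local execution state
Task.enqueue(task, queue_name="tesla-t4")  # push it to the remote agent queue
```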
Related GitHub issue https://github.com/allegroai/clearml/issues/847
The error is:
2022-11-28 14:40:17,099 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
Yes, the task is running on a remote agent with the --docker flag
This is the config on the machine(s) running the agent:
```
agent {
    venvs_cache: {
        max_entries: 50
        free_space_threshold_gb: -1
        path: ~/.clearml/venvs-cache
    }
    extra_docker_arguments: [
        "--network", "host",
        "-v", "/home/ubuntu/.ssh:/root/.ssh:ro",
        "-v", "/home/ubuntu/.cache:/root/.cache",
    ]
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        ...
```
If I print config_list from `def from_config(cls, s3_configuration):` in file bucket_config.py, line 121, I get:
`{'key': '', 'secret': '', 'region': '', 'multipart': True, 'use_credentials_chain': False, 'bucket': 'clearml', 'host': 's3.myhost.tld:443', 'token': '', 'extra_args': ConfigTree()}`
Ok, I did some investigation and the bug appears starting from version 1.8.0. In version 1.7.0 it is not there. I'll open an issue on GitHub
1.9.0 is still affected
This is the configuration in the webapp
The debug samples are correctly uploaded to the bucket (it is a MinIO bucket); I can see them from the MinIO web app. I have used `logger.report_image`
I also save the models in the S3 bucket using `output_uri=cfg.clearml.output_uri` in the Task.init
I specified the upload destination in the logger: `Logger.current_logger().set_default_upload_destination(cfg.clearml.media_uri)`
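Putting those pieces together, a rough sketch of that reporting setup, assuming cfg.clearml.output_uri / cfg.clearml.media_uri resolve to something like s3://s3.myhost.tld:443/clearml (the MinIO bucket) and using a dummy image:

```python
import numpy as np

from clearml import Logger, Task

task = Task.init(
    project_name="mornag-plain-dense",
    task_name="reporting-sketch",
    output_uri="s3://s3.myhost.tld:443/clearml",  # models/checkpoints go to the bucket
)
Logger.current_logger().set_default_upload_destination("s3://s3.myhost.tld:443/clearml")
Logger.current_logger().report_image(
    title="debug",
    series="sample",
    iteration=0,
    image=np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8),
)
```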
Yes, MinIO with no special config. The S3 config is in the clearml.conf
I opened the issue on GitHub: https://github.com/allegroai/clearml-web/issues/46