
Just tried 1.9.1 and it is affected.
I tried it like this: `clearml-task --script train.py --args overrides="log.clearml=True train.epochs=200 clearml.save=True" --project mornag-plain-dense --name mornag-plain-dense-training --queue tesla-t4 --skip-task-init`
Yes, the task is running on a remote agent with the `--docker` flag.
This is the config on the machine(s) running the agent:

```
agent {
    venvs_cache: {
        max_entries: 50
        free_space_threshold_gb: -1
        path: ~/.clearml/venvs-cache
    }
    extra_docker_arguments: [
        "--network", "host",
        "-v", "/home/ubuntu/.ssh:/root/.ssh:ro",
        "-v", "/home/ubuntu/.cache:/root/.cache",
    ]
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        ...
```
I printed `cfg` in the script and the config has not been overwritten 😢
I used an env variable to avoid creating an endless loop of init/enqueue (an argument like `clearml.queue` would be captured and forwarded to the agent).
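For context, this is the kind of guard I mean (a minimal sketch; the variable name is hypothetical, and getting it set for the agent-side run is assumed to be handled separately):

```python
import os

def should_auto_enqueue() -> bool:
    # MYPROJ_ALREADY_ENQUEUED is a made-up name for this sketch.
    # The local run does not see it, so it enqueues the task once;
    # the remote run sees it set and skips enqueueing, breaking the loop.
    return os.environ.get("MYPROJ_ALREADY_ENQUEUED") != "1"
```

On the local run the variable is unset, so the task gets enqueued exactly once; the agent-side run just trains.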
By the way, if I create the task locally, reset it, and enqueue it, it works. This is the workaround I'm using right now.
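Roughly, the workaround looks like this (a sketch, not my exact code; it assumes a reachable ClearML server and the `tesla-t4` queue from the command above):

```python
def reset_and_enqueue(task, queue_name="tesla-t4"):
    # Lazy import so the sketch can be loaded without a ClearML setup.
    from clearml import Task

    # Clear the local run's state so the agent starts from scratch.
    task.reset(set_started_on_success=False, force=True)
    # Put the same task back on the agent's queue.
    Task.enqueue(task=task, queue_name=queue_name)
```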
The file has been uploaded correctly to the bucket.
Related GitHub issue https://github.com/allegroai/clearml/issues/847
1.9.0 is still affected
The debug samples are correctly uploaded to the bucket (it's a MinIO bucket); I can see them from the MinIO web app. I used `logger.report_image`.
I also save the models to the S3 bucket using `output_uri=cfg.clearml.output_uri` in `Task.init`.
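i.e. something like this in the training script (a sketch; the project/task names are the ones from my `clearml-task` command, and `cfg` is my config object):

```python
def init_task(cfg):
    # Lazy import: actually running this needs a configured ClearML server.
    from clearml import Task

    return Task.init(
        project_name="mornag-plain-dense",
        task_name="mornag-plain-dense-training",
        # Models (and other output artifacts) go to the S3/MinIO bucket
        output_uri=cfg.clearml.output_uri,
    )
```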
I opened the issue on GitHub: https://github.com/allegroai/clearml-web/issues/46
OK, I did some investigation and the bug appears starting from version 1.8.0; version 1.7.0 is not affected. I'll open an issue on GitHub.
No problem! Thank you for your amazing work!
Also 1.9.1-312 is affected
I specified the upload destination in the logger: `Logger.current_logger().set_default_upload_destination(cfg.clearml.media_uri)`
Yes, MinIO with no special config. The S3 config is in clearml.conf.
Yes. It seems to be a bug in the UI, but it is weird that it went unnoticed.
This is the full `print(cfg)`:

```
{'dataset': {'name': '<my-dataset-name>', 'path': '', 'target_shape': [128, 128]}, 'model': 'unet', 'models': {'unet': {'dim': 8, 'dim_mults': [1, 2, 4], 'num_blocks_per_stage': [2, 2, 2], 'num_self_attn_per_stage': [0, 0, 1], 'nested_unet_depths': [0, 0, 0], 'nested_unet_dim': 16, 'use_convnext': False, 'resnet_groups': 2, 'consolidate_upsample_fmaps': True, 'weight_standardize': False, 'attn_heads': 2, 'attn_dim_head': 16}}, 'train': {'accelerator': 'auto...
```
But no arguments are passed to the script. Here I am printing `sys.argv`.
When enqueued, the Configuration tab still shows the correct arguments.
I installed clearml from source and printed the internal S3 configuration; basically, key and secret are empty.
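In other words, the effective credentials the SDK resolved looked empty. A tiny sketch of the check (hypothetical helper; `s3_cfg` stands in for clearml's internal S3 config values):

```python
def s3_credentials_missing(s3_cfg: dict) -> bool:
    # True when either the access key or the secret resolved to an empty
    # value, which is what I observed after installing clearml from source.
    return not s3_cfg.get("key") or not s3_cfg.get("secret")
```

For example, `s3_credentials_missing({"key": "", "secret": ""})` returns True.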