OK, hours of debugging later, I realized that the auto_scaler example initializes a default dict ( https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L68 ), and that the task is initialized on the remote side.
Apparently, the line at https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L103 doesn’t populate that dict with any keys that don’t already exist in it.
Since all this is happening around task initialization on the border between local and remote, it was a terrible issue to debug.
A bad waste of time: first realizing that the instance profile was not getting passed, then finding where, among the 15 places it could have been dropped.
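To make the behavior I'm describing concrete, here is a minimal plain-Python sketch of it. It does not call ClearML at all; `populate_existing_only` and the key names (`max_idle_time_min`, `instance_profile`) are hypothetical and only mimic what I observed, not how the library is actually implemented.

```python
def populate_existing_only(defaults, stored):
    """Copy stored values into a copy of defaults, only for keys defaults already has."""
    result = dict(defaults)
    for key, value in stored.items():
        if key in result:  # keys that were never declared in defaults are skipped
            result[key] = value
    return result


# Hypothetical default dict declared in the script (no instance profile key).
defaults = {"max_idle_time_min": 15}

# Hypothetical values stored on the task, including the extra field.
stored_on_task = {"max_idle_time_min": 30, "instance_profile": "my-profile"}

remote_config = populate_existing_only(defaults, stored_on_task)
print(remote_config)
```

In this sketch the declared key is overridden by the stored value, while the undeclared `instance_profile` never makes it into the resulting dict, which matches what I saw on the remote run.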
RoughTiger69
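Apparently, https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L103 doesn’t populate that dict with any keys that don’t already exist in it.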
Are you saying new entries are not added to the dict even if they are on the Task (i.e. only entries that already exist in the dict are populated)?
But you already have all the entries defined here:
https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L22
Since all this is happening around task initialization on the border between local and remote, it was a terrible issue to debug.
Notice that this line writes the full configuration that is actually used back to the Task (this is one-way, meaning the code will always update the Task, never the other way around).
it does appear on the task in the UI, just somehow not repopulated in the remote run if it’s not a part of the default empty dict…
Hmm, that is the odd thing... what's the missing field? Could it be that it is failing to cast to a specific type because the default value is missing?
(Also, is the issue present in the latest clearml RC? It seems like a task.connect issue.)
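To illustrate the casting hypothesis, here is a rough plain-Python sketch. It assumes stored values come back as strings and that the default value is the only source of type information; `cast_like`, `restore`, and the key names are hypothetical and are not taken from the clearml code base.

```python
def cast_like(default, raw):
    """Cast a stored string back to the type of the default value."""
    if default is None:
        return raw  # no type information available, keep the raw string
    if isinstance(default, bool):
        return raw.strip().lower() in ("1", "true", "yes")
    return type(default)(raw)


def restore(defaults, stored):
    """Rebuild the config from stored strings, using defaults for type hints."""
    restored = {}
    for key, raw in stored.items():
        if key in defaults:
            restored[key] = cast_like(defaults[key], raw)
        # A key without a default has no type to cast to; this sketch skips it.
    return restored


# Hypothetical keys and values, for illustration only.
defaults = {"max_idle_time_min": 15, "use_spot": False}
stored = {"max_idle_time_min": "30", "use_spot": "true", "instance_profile": "my-profile"}
restored = restore(defaults, stored)
print(restored)
```

If something along these lines is happening, a field with no default would be dropped simply because there is nothing to cast it against.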
Trust me, I had to add this field to this default dict just so that clearml doesn’t delete it for me
it does appear on the task in the UI, just somehow not repopulated in the remote run if it’s not a part of the default empty dict…
But you already have all the entries defined here:
Yes, but it’s missing a field that is actually found and parsed from my local autoscaler.yaml.
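Until this is sorted out, a small local guard along these lines would at least surface the problem before the remote run. This is only a sketch: `find_undeclared_keys` is a hypothetical helper, the key names are made up, and `local_conf` stands in for whatever gets parsed from the local autoscaler.yaml.

```python
def find_undeclared_keys(defaults, local_conf):
    """Return keys present in the local config but absent from the default dict."""
    return sorted(set(local_conf) - set(defaults))


# Hypothetical default dict from the script and values parsed from the local YAML.
defaults = {"max_idle_time_min": 15}
local_conf = {"max_idle_time_min": 30, "instance_profile": "my-profile"}

undeclared = find_undeclared_keys(defaults, local_conf)
if undeclared:
    print(f"Add these keys to the default dict before connecting it: {undeclared}")
```

Running a check like this locally flags every field that would need an entry in the default dict, which is the same fix I ended up applying by hand.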