That will come at a later stage
and I don't think it's in the docs - we'll add that
Very welcome update, please use some highlighting for it too, it's so important for a complete understanding of how the remote execution works
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
Seemed to work fine again in detached mode, what went wrong there :shocked_face_with_exploding_head:
Of course I'm using report_table in the above; it seems the support for Pandas DataFrame does not include support for MultiIndex other than by concatenating the indices together
That's fine (as in, it works), but it looks a bit weird and defeats the purpose of a MultiIndex
Was wondering if there are plans to add better support for it
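In the meantime, one workaround is to flatten the levels yourself with a separator of your choice before reporting. A minimal sketch, assuming the labels arrive as tuples (which is what iterating a pandas MultiIndex yields); `flatten_labels` and the separator are made-up names for illustration, not part of ClearML:

```python
from typing import Hashable, Iterable, List, Tuple


def flatten_labels(labels: Iterable[Tuple[Hashable, ...]], sep: str = " / ") -> List[str]:
    """Join each multi-level label into one readable string."""
    return [sep.join(str(level) for level in label) for label in labels]


# Illustrative labels; iterating a pandas MultiIndex yields tuples like these.
labels = [("train", "loss"), ("train", "acc"), ("val", "loss")]
flat = flatten_labels(labels)
print(flat)
```

With a DataFrame this would be something like `df.columns = flatten_labels(df.columns)` before passing it to report_table. The hierarchy is still lost, it just reads a bit better.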
Actually SuccessfulKoala55, there is something like that happening behind the scenes.
I have an AWS Autoscaler running on a services queue, so the autoscaler inherits the configuration used by the services agent, right?
Now, when my autoscaler launched new EC2 instances, they used the same fileserver as the one that was defined in the services agent too
Nope, no other config files
Okay so the only missing piece of the puzzle I think is that it would be nice if this propagates to the autoscaler as well; that then also allows hiding some of the credentials etc
Holy crap this was a light-bulb moment, is this listed somewhere in the docs?
It solves so many of my issues xD
Actually TimelyPenguin76 I get only the following as a "preview" -- I thought the preview for an image would be... the image itself..?
Couldn't the agent just come with the toml library? Kinda easy to load up and check if poetry is present then...
But yes it indeed used poetry correctly, though it would fail in other circumstances
From the log you shared, the task is picked up by the worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1 worker
I can try and target the default one if it helps..?
Hah. Now it worked.
It's pulled from the remote repository, my best guess is that the uncommitted changes apply only after the environment is set up?
Sorry, I misspoke, yes of course, the agent's config file, not the queue's
I mean, it makes sense to have it in a time-series plot when one is logging iterations and such. But that's not always the case... Anyway I opened an issue about that too!
A follow-up question (instead of opening a new thread), is there a way I could signal some files/directories to be copied to the execute_remotely task?
The idea is that the features would be copied/accessed by the server, so we can transition slowly and not use the available storage manager for data monitoring
It's a bit hard to read when they're all clustered together:
I guess following the example https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py , it's not clear to me how the server has access to the data loader's location when it hits execute_remotely
Exactly; the cloud instances (that are run with clearml-agent) should have that clearml.conf + any changes specified in extra_clearml_configuration for the scaler
The only thing I could think of is that the output of pip freeze would be a URL?
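A quick way to check that guess would be to scan the pip freeze output for entries that are not plain name==version pins. A minimal sketch; `url_requirements` is a made-up helper and the sample lines are invented for illustration, not taken from the actual log:

```python
from typing import Iterable, List


def url_requirements(freeze_lines: Iterable[str]) -> List[str]:
    """Return the freeze lines that point at a URL or path instead of a pinned version."""
    found = []
    for raw in freeze_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # "name @ <url>" is a direct reference; "-e ..." is an editable install.
        if " @ " in line or line.startswith("-e "):
            found.append(line)
    return found


# Invented sample, only to show the three shapes a freeze line can take.
sample = [
    "numpy==1.24.0",
    "mypkg @ file:///home/user/mypkg",
    "-e git+https://example.com/org/repo.git@abc123#egg=repo",
]
print(url_requirements(sample))
```

Any line this returns is one that would need the URL or local path to be reachable from the remote machine as well.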
Full log:
command: /usr/sbin/helm --version=4.1.2 upgrade -i --reset-values --wait -f=/tmp/tmp77d9ecye.yml clearml clearml/clearml
msg: |-
Failure when executing Helm command. Exited 1.
stdout:
stderr: W0728 09:23:47.076465 2345 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0728 09:23:47.126364 2345 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unava...
Could you provide a more complete set of instructions, for the less inclined?
How would I back up the data in the future, etc.?
Okay, I'll test it out by trying to downgrade to 4.0.0 and then upgrade to 4.1.2
Just to make sure, the chart_ref is allegroai/clearml, right? (for some reason we had clearml/clearml and it seems like it previously worked?)
But to be fair, I've also tried with python3.X -m pip install poetry etc. I get the same error.