And yes, our flow would break anyway with the internal references within the yaml file. It would be much simpler if we could specify the additional files.
The new task is not running inside a new subprocess. Our platform trains several models, and we'd like each of them to be tracked in their own Task. When running locally, this is "out of the box", as we can init and close before and after each model.
When running remotely, one cannot close the main task (since it is what orchestrates everything), and so this workaround was needed.
I am; it seems like maybe a couple of hours?
But since this has come up a lot recently, any updates on #340?
@SuccessfulKoala55 could you provide some instructions?
Just a side note - the 1.1.1 notice keeps popping up even though the server is at 1.1.1 (and I've cleared browser cache etc)
Opened a matching feature request issue for this -> https://github.com/allegroai/clearml/issues/418
I'll kill the agent and try again but with the detached mode
It seems that the agent uses the remote repository's lock file. We've removed and renamed the file locally (caught under local changes), but it still installs from the remote lock file
Here's an example where poetry.lock is removed, and still the console reads:
url: ....
branch: HEAD
commit: 22fffaf8d5f377b7f10140e642a7f6f26b72ffaa
root: /.../.clearml/venvs-builds/3.10/task_repository/...
Applying uncommitted changes
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv ds-platform in /.../.clearml/venvs-builds/3.10/task_repository/.../.venv
Updating dependencies
Resolving dependencies...
That will come at a later stage
and I don't think it's in the docs - we'll add that
Very welcome update, please use some highlighting for it too, it's so important for a complete understanding of how the remote execution works
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
Seemed to work fine again in detached mode, what went wrong there :shocked_face_with_exploding_head:
Of course I'm using report_table in the above; it seems the support for Pandas DataFrame does not include support for MultiIndex other than by concatenating the indices together
That's fine (as in, it works), but it looks a bit weird and defies the purpose of a MultiIndex
Was wondering if there are plans to add better support for it
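For anyone hitting the same thing, here is a rough sketch of the two workarounds on the pandas side, before the frame is handed to report_table. The helper names and the sample scores are made up for illustration; nothing here is part of the ClearML API. The first helper is the "concatenate the indices" approach described above, the second moves the index levels into ordinary columns so each level stays readable.

```python
import pandas as pd


def flatten_multiindex(df: pd.DataFrame, sep: str = " / ") -> pd.DataFrame:
    """Return a copy whose MultiIndex rows/columns are joined into single strings."""
    flat = df.copy()
    if isinstance(flat.index, pd.MultiIndex):
        flat.index = [sep.join(map(str, key)) for key in flat.index]
    if isinstance(flat.columns, pd.MultiIndex):
        flat.columns = [sep.join(map(str, key)) for key in flat.columns]
    return flat


def levels_to_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Alternative: turn the row index levels into regular columns."""
    return df.reset_index()


# Hypothetical data, only to show the shape of the result.
index = pd.MultiIndex.from_product(
    [["model_a", "model_b"], ["train", "test"]], names=["model", "split"]
)
scores = pd.DataFrame({"mae": [0.10, 0.12, 0.20, 0.25]}, index=index)

flat = flatten_multiindex(scores)       # single string index per row
as_columns = levels_to_columns(scores)  # "model" and "split" become columns
```

The second form tends to look less odd in a rendered table, at the cost of no longer being a MultiIndex at all.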
Actually SuccessfulKoala55 , there is something like that happening behind the scenes.
I have an AWS Autoscaler running on a services queue, so the autoscaler inherits the configuration used by the services agent, right?
Now, when my autoscaler launched new EC2 instances, they used the same fileserver as the one that was defined in the services agent too
Nope, no other config files
Okay so the only missing piece of the puzzle I think is that it would be nice if this propagates to the autoscaler as well; that then also allows hiding some of the credentials etc
Holy crap this was a light-bulb moment, is this listed somewhere in the docs?
It solves so many of my issues xD
Actually TimelyPenguin76 I get only the following as a "preview" -- I thought the preview for an image would be... the image itself..?
Couldn't the agent just come with the toml library? Kinda easy to load up and check if poetry is present then...
But yes it indeed used poetry correctly, though it would fail in other circumstances
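To make the suggestion concrete, this is roughly what such a check could look like. It is only a sketch of the idea, not how clearml-agent actually detects poetry; the function name uses_poetry is made up, and on Python older than 3.11 it assumes the tomli backport is installed.

```python
try:
    import tomllib  # standard library from Python 3.11 onwards
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli backport available on older Pythons


def uses_poetry(pyproject_text: str) -> bool:
    """Return True if the given pyproject.toml content has a [tool.poetry] table."""
    try:
        data = tomllib.loads(pyproject_text)
    except tomllib.TOMLDecodeError:
        # Unparseable file: treat it as "not a poetry project".
        return False
    tool = data.get("tool", {})
    return isinstance(tool, dict) and "poetry" in tool


# Hypothetical pyproject.toml contents, for illustration only.
poetry_project = '[tool.poetry]\nname = "ds-platform"\n'
other_project = "[tool.black]\nline-length = 88\n"

print(uses_poetry(poetry_project))
print(uses_poetry(other_project))
```

Checking the pyproject.toml table rather than the presence of poetry.lock would also cover the case above where the lock file was removed locally.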
From the log you shared, the task is picked up by the worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1 worker
I can try and target the default one if it helps..?
One more UI question TimelyPenguin76 , if I may -- it seems one cannot simply report single integers. The report_scalar feature creates a plot of a single data point (or single iteration).
For example if I want to report a scalar "final MAE" for easier comparison, it's kinda impossible
Hah. Now it worked.
It's pulled from the remote repository, my best guess is that the uncommitted changes apply only after the environment is set up?
Sorry, I misspoke, yes of course, the agent's config file, not the queue's