Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In local execution we don't have this initializer task, so we use `Task.init()` before starting to work on a model, and `task.close()` when we're done.
It does not 🙂
We started discussing it here - https://clearml.slack.com/archives/CTK20V944/p1640955599257500?thread_ts=1640867211.238900&cid=CTK20V944
You suggested this solution - https://clearml.slack.com/archives/CTK20V944/p1640973263261400?thread_ts=1640867211.238900&cid=CTK20V944
And I eventually found this solution to work - https://clearml.slack.com/archives/CTK20V944/p1641034236266500?thread_ts=1640867211.238900&cid=CTK20V944
Thanks CostlyOstrich36!
Hm, I did not specify any specific versions previously. What was the previous default?
And yes, our flow would break anyway with the internal references within the yaml file. It would be much simpler if we could specify the additional files
The new task is not running inside a new subprocess. Our platform trains several models, and we'd like each of them to be tracked in its own `Task`. When running locally, this works "out of the box", as we can init and close before and after each model.
When running remotely, one cannot close the main task (since it is what orchestrates everything), and so this workaround was needed.
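The local init-and-close-per-model pattern described above could be sketched roughly as follows. This is only an illustration under assumptions: `tracked_task`, `train_all`, `train_fn`, and the `"examples"` project name are hypothetical names, and the task factory is passed in so the sketch does not depend on a configured ClearML server.

```python
from contextlib import contextmanager


@contextmanager
def tracked_task(task_factory, **init_kwargs):
    """Open one task for the duration of a block and always close it afterwards."""
    task = task_factory(**init_kwargs)
    try:
        yield task
    finally:
        # Closing lets the next model start a fresh task in the same process.
        task.close()


def train_all(model_names, task_factory, train_fn, project_name="examples"):
    """Train each model inside its own task and return the per-model results."""
    results = {}
    for name in model_names:
        with tracked_task(
            task_factory, project_name=project_name, task_name=name
        ) as task:
            # train_fn is a hypothetical callable: (model name, task) -> result
            results[name] = train_fn(name, task)
    return results
```

Locally, one would presumably pass `Task.init` (from `clearml`) as `task_factory`, e.g. `train_all(["model_a", "model_b"], Task.init, train_fn)`. When running remotely the main task cannot be closed this way, which is why the workaround mentioned above was needed.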
I am; it seems like maybe a couple of hours?
But since this has come up a lot recently, any updates on #340? 😍
SuccessfulKoala55 could you provide some instructions?
I'll give it a shot. Honestly, the SDK documentation for both InputModel and OutputModel is (sorry) horrible ...
Can't wait for the documentation revamping.
Just a side note - the 1.1.1 notice keeps popping up even though the server is at 1.1.1 (and I've cleared browser cache etc)
Opened a matching feature request issue for this -> https://github.com/allegroai/clearml/issues/418
I'll kill the agent and try again but with the detached mode 🤔
It seems that the agent uses the remote repository's lock file. We've removed and renamed the file locally (caught under local changes), but it still installs from the remote lock file 🤔
For now we've monkey-patched it to our usecase:
```
from clearml import Dataset

# Override the (name-mangled) private tag used for hidden datasets
Dataset._Dataset__hidden_tag = "active"

def foo(cls, dataset_project, dataset_name):
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]

Dataset._build_hidden_project_name = foo
```
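To make clear what the replacement function in the monkey patch above returns, here is a standalone copy of it that runs without `clearml`. The project names are made up for illustration, and `None` stands in for the `cls` argument, which the function does not use.

```python
def foo(cls, dataset_project, dataset_name):
    # Fall back to a default project when none is given
    dataset_project = dataset_project or "Datasets"
    # Second element: everything before the last "/" (empty if there is none)
    return dataset_project, dataset_project.rpartition("/")[0]


nested = foo(None, "research/vision", "cats")   # ("research/vision", "research")
default = foo(None, None, "cats")               # ("Datasets", "")
```

So the dataset stays in the project it was given instead of being moved under a hidden sub-project, and the second value is simply the parent path.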
Here's an example where `poetry.lock` is removed, and still the console reads:
url: ....
branch: HEAD
commit: 22fffaf8d5f377b7f10140e642a7f6f26b72ffaa
root: /.../.clearml/venvs-builds/3.10/task_repository/...
Applying uncommitted changes
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv ds-platform in /.../.clearml/venvs-builds/3.10/task_repository/.../.venv
Updating dependencies
Resolving dependencies...
That will come at a later stage
I'm not sure what you mean by "entity", but honestly anything works. We're already monkey-patching our way 😄
and I don't think it's in the docs - we'll add that
Very welcome update, please use some highlighting for it too, it's so important for a complete understanding of how the remote execution works
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
Seemed to work fine again in detached mode, what went wrong there? 🤯
Of course I'm using `report_table` in the above; it seems the support for Pandas DataFrame does not include support for MultiIndex other than by concatenating the indices together.
That's fine (as in, it works), but it looks a bit weird and defeats the purpose of a MultiIndex
🤔 Was wondering if there are plans to add better support for it
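Until there is better support, one option is to flatten the MultiIndex yourself so that at least the separator and label format are under your control. The following is a minimal sketch, assuming `pandas` is installed; `flatten_multiindex` is a hypothetical helper and the numbers are made-up illustration data.

```python
import pandas as pd


def flatten_multiindex(df, sep="/"):
    """Return a copy of df whose MultiIndex rows/columns are joined into strings."""
    flat = df.copy()
    if isinstance(flat.index, pd.MultiIndex):
        flat.index = [sep.join(str(level) for level in key) for key in flat.index]
    if isinstance(flat.columns, pd.MultiIndex):
        flat.columns = [sep.join(str(level) for level in key) for key in flat.columns]
    return flat


# Made-up example data with a two-level row index
index = pd.MultiIndex.from_tuples(
    [("train", "loss"), ("train", "acc"), ("val", "loss")],
    names=["split", "metric"],
)
df = pd.DataFrame({"value": [0.3, 0.9, 0.4]}, index=index)
flat = flatten_multiindex(df)
```

The flattened frame could then be passed as the `table_plot` argument of `report_table`. This is still concatenation, so it does not restore the grouped look of a real MultiIndex; it only makes the result predictable.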
AgitatedDove14 Basically the fact that this happens without user control is very frustrating - https://github.com/allegroai/clearml/blob/447714eaa4ac09b4d44a41bfa31da3b1a23c52fe/clearml/datasets/dataset.py#L191
Actually SuccessfulKoala55, there is something like that happening behind the scenes.
I have an AWS Autoscaler running on a `services` queue, so the autoscaler inherits the configuration used by the `services` agent, right?
Now, when my autoscaler launched new EC2 instances, they used the same `fileserver` as the one that was defined in the `services` agent too 🤔
Nope, no other config files
Okay, so I think the only missing piece of the puzzle is that it would be nice if this propagated to the autoscaler as well; that would then also allow hiding some of the credentials etc 😮