Also something we are very much interested in (including the logger-based scatter plots etc)
Here's how it failed for us 😅 `poetry` stores git-related data in `poetry.lock`, so when you `pip list`, you get an internal package of ours with its version but no git reference, i.e. `internal_module==1.2.3` instead of `internal_module @ git+https://....@commit`.
Then `pip` actually fails (our internal module is not on PyPI), but `poetry` succeeds
Maybe it's better to approach this the other way: if one uses `Task.force_requirements_env_freeze()`, then the locally updated packages aren't reflected in `poetry` 🤔
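For reference, a minimal sketch of the freeze call (project/task names are placeholders):

```python
from clearml import Task

# Freeze the entire local environment (pip-freeze style) instead of
# letting ClearML detect only the imported packages.
# Must be called before Task.init().
Task.force_requirements_env_freeze(force=True)

task = Task.init(project_name="examples", task_name="frozen-env-run")
```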
Or some users that update their `poetry.lock` and some that update manually, as they prefer to resolve on their own.
Haha, I've opened so many issues these past few days... Sure, np!
Right so it uses whatever version is available on the agent.
Yeah it would be nice to have either a `poetry_version` (a-la https://github.com/allegroai/clearml-agent/blob/5afb604e3d53d3f09dd6de81fe0a494dacb2e94d/docs/clearml.conf#L62 ), rename the latter to `manager_version`, or just install from the captured environment, etc? 🤔
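i.e. something along these lines in `clearml.conf` (`poetry_version` is purely hypothetical here, mirroring the existing `pip_version` setting):

```
agent {
    package_manager {
        type: poetry
        # hypothetical, by analogy with the existing pip_version setting
        poetry_version: "<1.5"
    }
}
```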
Fair enough 😄
Could be nice to be able to define the fallbacks under `type` maybe? `type: [ poetry, pip ]` (current way under the hood) vs `type: [ pip, poetry ]`
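In `clearml.conf` terms, something like this (hypothetical — `type` currently accepts a single value):

```
agent {
    package_manager {
        # hypothetical ordered fallback: try pip first, then poetry
        type: [ pip, poetry ]
    }
}
```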
The tl;dr is that some of our users like `poetry` and others prefer `pip`. Since `pip install git+....` stores the git data, it seems trivial to first try to install based on `pip`, and only later on `poetry`, since `pip` would crash with `poetry` as it stores the git data elsewhere (in `poetry.lock`)
Would be nice if the second one was a toggle-able feature (either per user or in the server settings) maybe?
Yeah I figured (2) would be the way to go actually 😄
Local changes are applied before installing requirements, right?
Ah it already exists https://github.com/allegroai/clearml-server/issues/134 , so I commented on it
Ah right, I missed that in the codebase. It just adds the `.dataset` convention to the dataset task.
SmugDolphin23 we've been working with this for 2 weeks now, and it creates a lot of junk in our UI. Is there any way to have better control over this?
Let me test it out real quick.
Those are for specific packages, I'm wondering about the package managers as a whole
No task, no dataset, just an empty container with no reference to the task it's attached to.
It seems to me that it should not move the task if `use_current_task=True`?
Seems like `Task.create` is the correct use case then, since again this is about testing flows using e.g. pytest, so the task is not the current process.
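Roughly what I have in mind (`run_training` is a placeholder for the flow under test):

```python
from clearml import Task

def test_training_flow():
    # Task.create() builds a standalone task without binding it to the
    # current process, unlike Task.init()
    task = Task.create(project_name="tests", task_name="pytest-flow")
    try:
        run_training(task)  # hypothetical function under test
        assert task.id is not None
    finally:
        task.close()
```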
I've at least seen references in `dataset.py`'s code that seem to apply to offline mode (e.g. in `Dataset.create` there is `if output_uri and not Task._offline_mode:`, so someone did consider datasets in offline mode)
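For reference, this is the kind of flow I'm testing (a sketch only — no idea yet whether datasets are fully supported offline):

```python
from clearml import Task, Dataset

# Offline mode has to be enabled before any other ClearML call
Task.set_offline(offline_mode=True)

# Dataset.create() does check Task._offline_mode internally,
# so presumably this was at least considered
dataset = Dataset.create(dataset_name="test-ds", dataset_project="tests")
dataset.add_files(path="./data")  # placeholder local folder
dataset.upload()
dataset.finalize()
```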
I think the environment variables path might work for you then? You'd set your config with `use_credentials_chain: ${CREDENTIALS_CHAIN}`. Then in Python you could set `os.environ['CREDENTIALS_CHAIN']` to `'true'` or `'false'` (environment values must be strings) before you make any calls to ClearML?
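Something like this, for example (project/task names are placeholders):

```python
import os

# Must be set before the first ClearML import/call, since the config
# file is parsed once when ClearML first loads it; values must be strings
os.environ["CREDENTIALS_CHAIN"] = "true"

from clearml import Task

task = Task.init(project_name="examples", task_name="env-config-run")
```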
StaleButterfly40 what use case are you looking for? I've used environment variables in the config file, and then I can overwrite them in `os.environ` before ClearML loads the config
I thought this follows from our previous discussion SuccessfulKoala55, where this is a built-in feature of pyhocon?
There's not much (or anything) in the log to provide...
```
(.venv) 15:42 [0:user@server$~] CLEARML_CONFIG_FILE=~/agent_clearml.conf clearml-agent daemon --queue default on_prem --detached --order-fairness
Environment variables set from configuration: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION']
...
```
Sorry for the late reply Jake -- I was away on holidays -- it works perfectly now, thanks!
The agent also uses a different `clearml.conf`, so it should not matter?
Most of these are configurations (specific for an execution, but one such configuration defines multiple tasks). Some models might be uploaded if the user does not use our built-in link to ClearML model fetching 😄
The documentation is messy, I've complained about it in the past too 🙈
I didn't mention code in #340 nor did I mention data here 😄 The idea was to package non git-specific files for remote execution