I mean, I know I could `connect_configuration({k: os.environ.get(k) for k in [...]})`, but then those environment variables would be exposed in the ClearML UI, which is not ideal (the environment variables in question hold usernames and passwords, required for DB access)
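For context, a minimal sketch of the pattern in question (project/task names and variable keys are hypothetical): everything passed to `connect_configuration` is stored on the Task and shown in the web UI, which is exactly the exposure problem.
```python
import os
from clearml import Task

task = Task.init(project_name="examples", task_name="env-demo")  # hypothetical names

# Every key/value here ends up in the task's CONFIGURATION section in the
# web UI -- fine for plain settings, bad for DB usernames/passwords.
env_snapshot = {k: os.environ.get(k) for k in ["DB_USER", "DB_PASSWORD"]}  # hypothetical keys
task.connect_configuration(env_snapshot, name="environment")
```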
Great, thanks! Any idea about environment variables and/or other files (CSV)? I suppose I could use `task.upload_artifact` for the CSVs, but I'm still unsure about the environment variables
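For the CSV side, a minimal sketch of the artifact route (names and paths are assumptions):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="csv-demo")  # hypothetical names

# The artifact is stored with the task and downloadable from the UI's
# ARTIFACTS tab; note it is not treated as a secret.
task.upload_artifact(name="training_data", artifact_object="data/train.csv")  # hypothetical path
```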
I should maybe mention that the security requirements here are low, since this is all behind a private VPN anyway; I'm mostly interested in having the credentials recorded for backtracking purposes
You mean at the container level, or at the ClearML level?
Yes, the container level (when these docker shell scripts run).
The per-user ID would be nice, except I upload the `.env` file before the `Task` is created (it's only available really early in the code).
Also, I appreciate the time you're taking to answer, AgitatedDove14 and CostlyOstrich36 , I know Fridays are not working days in Israel, so thank you 🙂
I guess the big question is how I can transfer local environment variables to a new `Task`
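One hedged possibility, if the agent runs in docker mode: forward selected variables as container arguments. This is a sketch (image, queue, and variable names are assumptions), and the same UI-exposure caveat applies, since docker arguments are visible on the task:
```python
import os
from clearml import Task

task = Task.init(project_name="examples", task_name="remote-env")  # hypothetical names

# Forward a local variable into the container the agent will spin up.
# Note: docker arguments show up in the task's execution details, so
# this does not hide the secret either.
task.set_base_docker(
    docker_image="python:3.10",
    docker_arguments=["-e", "DB_USER={}".format(os.environ.get("DB_USER", ""))],  # hypothetical key
)
task.execute_remotely(queue_name="default")  # hypothetical queue
```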
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per `python train.py` run, so that the environment variable files don't get overwritten if two users execute almost simultaneously
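If it helps, my understanding (worth verifying on your agent version) is that the agent exports the running task's ID into the container as `CLEARML_TASK_ID`, which would give you a stable per-run prefix instead of a per-user one; the bucket name below is hypothetical:
```python
import os
from clearml import Task

# Inside the container the agent spins up, the task ID should be available
# as an environment variable; Task.current_task() also returns the live
# task once ClearML has been initialized.
task_id = os.environ.get("CLEARML_TASK_ID") or Task.current_task().id

# A per-run prefix avoids two near-simultaneous users overwriting each
# other's environment files (bucket name is hypothetical).
remote_prefix = "s3://my-bucket/env-files/{}/".format(task_id)
```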
Maybe this is part of the paid version, but it would be cool if each user (in the web UI) could define their own secrets, and a task could then be assigned to a user and use those secrets during boot?
Each user creates a `.env` file for their needs, or exports them in the shell running the Python code. Currently I copy the environment variables to an S3 bucket and download them from there.
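That round-trip could look roughly like this (bucket and paths are hypothetical), a sketch of the current copy-to-S3 approach:
```python
from clearml import StorageManager

# On the submitting machine: push the local .env up to the bucket.
StorageManager.upload_file(
    local_file=".env",
    remote_url="s3://my-bucket/env-files/some-run/.env",  # hypothetical path
)

# On the remote side: pull a local copy back down before reading it.
local_env = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/env-files/some-run/.env"
)
```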
I'm not sure I follow; what would that solution look like?
The overall flow I currently have is e.g.:
1. Start an internal task (not a ClearML `Task`; MLOps not initialized yet)
2. Call some `pre_init` function with `args` so I can upload the environment file via `StorageManager` to S3
3. Call some `start_run` function with the configuration dictionary loaded, so I can upload the relevant CSV files and configuration file
4. Finally initialize the MLOps (ClearML), start a task, execute remotely
I can play around with 3/4 (so e.g. upload CSVs and configuratio...
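To make the flow above concrete, here's a loose sketch of how steps 2-4 might be wired up; `pre_init` and `start_run` are the names from my code, everything else (project, queue, paths) is illustrative:
```python
import os
from clearml import Task, StorageManager

def pre_init(env_file):
    # Step 2: no ClearML state exists yet, so the .env goes up via plain
    # StorageManager (bucket path is hypothetical).
    StorageManager.upload_file(local_file=env_file,
                               remote_url="s3://my-bucket/env-files/run/.env")

def start_run(config, csv_files):
    # Steps 3-4: initialize ClearML, attach the config and CSVs, go remote.
    task = Task.init(project_name="examples", task_name="train")  # hypothetical names
    task.connect_configuration(config, name="run-config")
    for path in csv_files:
        task.upload_artifact(name=os.path.basename(path), artifact_object=path)
    task.execute_remotely(queue_name="default")  # hypothetical queue
```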
I will! (once our infra guy comes back from holiday and updates the install, for some reason they set up server 1.1.1???)
Meanwhile I'm wondering where I got a random worker from
That's probably in the newer ClearML server pages then; I'll still have to wait 😅
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up autoscaler and more 😅
Can I query where the worker is running (IP)?
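One way I'd try to query that from the server API (a sketch; whether an `ip` field is populated depends on the server version):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # Field availability varies by server version; ip may be empty.
    print(worker.id, getattr(worker, "ip", "n/a"))
```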
Here's an example where `poetry.lock` is removed, and still the console reads:
```
url: ....
branch: HEAD
commit: 22fffaf8d5f377b7f10140e642a7f6f26b72ffaa
root: /.../.clearml/venvs-builds/3.10/task_repository/...
Applying uncommitted changes
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv ds-platform in /.../.clearml/venvs-builds/3.10/task_repository/.../.venv
Updating dependencies
Resolving dependencies...
```
It's pulled from the remote repository; my best guess is that the uncommitted changes are applied only after the environment is set up?
Uhhh, but `pyproject.toml` does not necessarily entail Poetry... it's a new Python packaging standard
It seems that the agent uses the remote repository's lock file. We've removed and renamed the file locally (caught under local changes), but it still installs from the remote lock file 🤔
AgitatedDove14 for future reference, this is indeed a PEP-610 related bug, fixed in https://python-poetry.org/blog/announcing-poetry-1.2.0a1/ . I see we can choose the `pip` version in the config; can we also set the `poetry` version used? Or is it updated from the lock file itself, or...?
My current workaround is to use `poetry` and tell users to delete `poetry.lock` if they want their environment copied verbatim
Hmmm, maybe 🤔 I thought that was expected behavior on Poetry's side, actually
Those are for specific packages; I'm wondering about the package managers as a whole
Should this be under the `clearml` or `clearml-agent` repo?
Here's how it failed for us 😅 `poetry` stores git-related data in `poetry.lock`, so when you `pip list`, you get an internal package of ours with its version but no git reference, i.e. `internal_module==1.2.3` instead of `internal_module @ git+https://....@commit`. Then `pip` actually fails (our internal module is not on PyPI), but `poetry` succeeds.
Right, so it uses whatever version is available on the agent.
Yeah, it would be nice to have either a `poetry_version` (a-la https://github.com/allegroai/clearml-agent/blob/5afb604e3d53d3f09dd6de81fe0a494dacb2e94d/docs/clearml.conf#L62 ), or rename the latter to `manager_version`, or just install from the captured environment, etc.? 🤔
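For reference, the linked `clearml.conf` already pins the pip version; a hypothetical `poetry_version` key (which does not exist today, as far as I know) might sit next to it like this:
```
agent {
    package_manager {
        # existing setting from the linked clearml.conf: pins the pip
        # version the agent installs (default value may differ by version)
        pip_version: "<20.2",

        # hypothetical: an analogous pin for poetry (not a real key)
        # poetry_version: "1.2.0a1",
    }
}
```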
Maybe it's better to approach this the other way: if one uses `Task.force_requirements_env_freeze()`, then the locally updated packages aren't reflected in `poetry` 🤔
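For completeness, the freeze call has to happen before `Task.init`; a minimal sketch (project/task names are assumptions):
```python
from clearml import Task

# Must be called before Task.init: the task then records a pip freeze of
# the local environment instead of the analyzed/declared requirements.
Task.force_requirements_env_freeze(force=True)
task = Task.init(project_name="examples", task_name="frozen-env")  # hypothetical names
```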