Sorry, been away for a while!
I have no additional information, since it was a bug in my model that I have since eliminated...
Maybe it was just a matplotlib error and can be dropped for now. I'll let you know if it pops up again!
Each user creates a `.env` file for their needs, or exports the variables in the shell running the Python code. Currently I copy the environment variables to an S3 bucket and download them from there.
Thanks AgitatedDove14, I'll first have to prove viability with the free version :)
These are per-user. Essentially we log user DB access as well (for various backtracking afterwards), so it's beneficial for us to pass the user DB secrets to the task rather than have them configured once on the agent.
I mean, I know I could `connect_configuration({k: os.environ.get(k) for k in [...]})`, but then those environment variables would be exposed in the ClearML UI, which is not ideal (the environment variables in question hold usernames and passwords, required for DB access)
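Roughly what I mean (just a sketch; the variable names and project/task names are made up):
```python
import os
from clearml import Task

task = Task.init(project_name="example", task_name="db-access")

# Collect the DB-related environment variables (placeholder names)...
db_env = {k: os.environ.get(k) for k in ["DB_USER", "DB_PASS", "DB_HOST"]}

# ...and attach them to the task -- but this shows the values in the
# task's configuration in the UI, which is exactly what I want to avoid
task.connect_configuration(db_env, name="db_env")
```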
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per `python train.py` run, so that the environment variables file doesn't get overwritten if two users execute almost simultaneously.
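For reference, this is roughly my current workaround (sketch; the unique-id scheme, bucket name and prefix are placeholders):
```python
import uuid
from clearml import StorageManager

# Unique prefix per `python train.py` invocation, since no Task ID exists yet
run_id = uuid.uuid4().hex
remote_env = f"s3://my-bucket/env-files/{run_id}/.env"

# Upload the local .env so the remote run can fetch and apply it later
StorageManager.upload_file(local_file=".env", remote_url=remote_env)
```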
You mean at the container level, or in ClearML?
Yes, the container level (when these docker shell scripts run).
The per-user ID would be nice, except I upload the `.env` file before the `Task` is created (it's only available really early in the code).
Thanks for the reply CostlyOstrich36 !
Does the task read/use the `cache_dir` directly? It's fine for it to be a cache and then removed from the fileserver; if users want the data to stay, they will use the ClearML Dataset 🙂
The S3 solution is bad for us since we have to create a folder for each task (before the task is created), and hope it doesn't get overwritten by the time it executes.
Argument augmentation - say I run my code with `python train.py my_config.yaml -e admin.env` ...
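For context, the entry point parses roughly these arguments (simplified sketch, not the actual code):
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("config", help="YAML configuration file, e.g. my_config.yaml")
parser.add_argument("-e", "--env-file", help="extra .env file to load, e.g. admin.env")
args = parser.parse_args()
```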
The S3 bucket credentials are defined on the agent, as the bucket is also running locally on the same machine - but I would love for the code to download and apply the file automatically!
One way to circumvent this btw would be to also add/use the `--python` flag for `virtualenv`
Thanks Alon. In the full/official documentation the `clearml-data` CLI is not mentioned anywhere, so perhaps it should be refreshed 😉
I think we're referring to different things here.
I won't be using the UI (and neither will my team).
But as mentioned, we've used DVC before and it adds a lot of junk metadata files to each GitHub PR (many `dvc.yaml`, `dvc.lock` and `.gitignore` files). We're trying to avoid that as much as possible, hence my question about GitHub pull...
The overall flow I currently have is e.g.:
1. Start an internal task (not a ClearML Task; MLOps not initialized yet)
2. Call some `pre_init` function with `args`, so I can upload the environment file via StorageManager to S3
3. Call some `start_run` function with the configuration dictionary loaded, so I can upload the relevant CSV files and configuration file
4. Finally initialize the MLOps (ClearML), start a task, execute remotely
I can play around with 3/4 (so e.g. upload CSVs and configuratio...
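To make the flow above concrete, a trimmed-down sketch (`pre_init` / `start_run` are my own wrappers, not ClearML API; bucket names, paths and config keys are placeholders):
```python
from clearml import StorageManager, Task

def pre_init(args):
    # Step 2: push the local .env to S3 before any ClearML Task exists
    StorageManager.upload_file(local_file=args.env_file,
                               remote_url="s3://my-bucket/env-files/.env")

def start_run(config):
    # Step 3: upload the relevant CSVs and the configuration file
    for csv_path in config["csv_files"]:
        StorageManager.upload_file(local_file=csv_path,
                                   remote_url=f"s3://my-bucket/data/{csv_path}")

def main(args, config):
    pre_init(args)
    start_run(config)
    # Step 4: only now initialize ClearML, create the task and go remote
    task = Task.init(project_name="example", task_name="train")
    task.execute_remotely(queue_name="default")
```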
I'm not sure I follow, how would that solution look like?
Great, thanks! Any idea about environment variables and/or other files (CSV)? I suppose I could use `task.upload_artifact` for the CSVs, but I'm still unsure about the environment variables
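For the CSVs I was thinking along these lines (sketch; file and project names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="example", task_name="train")

# Attach the CSVs as artifacts so the remote run can pull them back
for csv_path in ["train.csv", "val.csv"]:
    task.upload_artifact(name=csv_path, artifact_object=csv_path)
```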
It's missing the repository information of course, but the 'configuration/Args' were logged. So something weird is going on in identifying the repository
I guess the big question is how I can transfer local environment variables to a new Task
Thanks CostlyOstrich36 !
And can I make sure the same budget applies to two different queues?
So that, for example, an autoscaler would have a resource budget of 6 instances and would listen to `aws` and `default` as needed?
Maybe this is part of the paid version, but it would be cool if each user (in the web UI) could define their own secrets, and a task could then be assigned to some user and use those secrets during boot?
I'll have some reports tomorrow I hope TimelyPenguin76 SuccessfulKoala55 !
Not really, I've only been able to somewhat understand the scope of where it happens, and I'm not sure it's even a ClearML issue (maybe matplotlib)
Trying now with 1.4.1, but I believe the changes you're referring to SuccessfulKoala55 were also introduced in 1.4.0, right?
Thanks AgitatedDove14, I'll give it a try. Perhaps additional documentation is needed for that `extra_layout`
I can scroll sideways, but if I open any of the comparison items, I can pretty much only see one experiment's values
I guess it's mixed. If #340 is resolved, then this initializer task will be a no-op: detach, and init-close new tasks as needed.
The instance that took a while to terminate (or has taken a while to disappear from the idle workers)
SuccessfulKoala55, could this be related to the monkey patching for the logging platform? We have our own logging handlers that we use in this case