So a normal config file with environment variables.
My suspicion is that this relates to https://clearml.slack.com/archives/CTK20V944/p1643277475287779 , where the config file is loaded prematurely (upon import), so our dotenv.load_dotenv() call has not yet registered.
I think you are correct: the env variable is not resolved in "time". It might be that it's resolved at import rather than at Task.init
Then perhaps mac treats missing environment variables as empty and linux just crashes? Anyway, the config loading should be deferred, shouldn't it?
Actually it cannot be deferred. Long story short, when the agent is running the same code we have to verify and pass arguments at import time. I have to wonder: I'm expecting the env variables to be preset (i.e. previously set for the entire environment), so how come they are manually set inside the code (and wouldn't that break when running with an agent)?
They are set with a .env file - it's a common practice. The .env file is, at the moment, uploaded to a temporary cache (if you remember the discussion regarding the StorageManager), so it's also available remotely (related to issue #395)
The agent also uses a different clearml.conf, so it should not matter?
That makes total sense. The question was about the Mac users and OS environment in the configuration file and having that os environment set in code (this is my assumption as it seems that at import time it does not exist). What am I missing here?
I'm not sure; the setup is not unique to Mac.
Each user has their own .env file which is given to the code entry point, and at some point will be loaded with dotenv.load_dotenv().
The environment variables are not set in code anywhere, but the clearml.conf uses them directly.
AFAIU, something like this happens (oversimplified):
` from clearml import Task  # <--- Crash already happens here
import argparse
import dotenv

if __name__ == "__main__":
    # set up argparse with optional flag for a dotenv file
    dotenv.load_dotenv(args.env_file)
    # more stuff `
A quick fix will be:
` import os
import dotenv
dotenv.load_dotenv(os.path.expanduser('~/.env'))  # expand '~' explicitly before loading
from clearml import Task  # Now we can load it.
import argparse

if __name__ == "__main__":
    ...  # do stuff `
wdyt?
BTW: which clearml version are you using ?
(I remember there was a change in the last one, or the one before, making the config loading deferred until it is accessed)
We're using 1.1.5 at the moment -- I'll make sure everyone updates to 1.1.6 on Monday.
That solution does not work for us unfortunately -- the .env is an argument from argparse, and because we cannot attach non-git files to a remote task (again issue #395), we have to first download CLI arguments for remote execution and ensure they exist on the remote agent.
Hmm UnevenDolphin73 I just checked with v.1.1.6, the first time the configuration file is loaded is when calling Task.init (if not running with an agent, which is your case).
But the main point I just realized I missed 🤯
"http://"${CLEARML_ENDPOINT}":8080"
The code does not try to resolve OS environments there!
Which, well, is a nice feature to add
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L576
should probably look something like:
os.path.expandvars(ENV_HOST.get(default=(config.get("api.api_server", None) or config.get("api.host", None) or cls.default_host))).rstrip('/')
Then you can do:
api_server = "http://"${CLEARML_ENDPOINT}":8008"
WDYT?
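To show what the os.path.expandvars call in the proposal would do on its own, here is a minimal standard-library sketch. The endpoint value and the second variable name are made up for illustration, and it only exercises expandvars on a plain string, not the actual clearml session code.

```python
import os

# Hypothetical value, for illustration only; not a real server.
os.environ["CLEARML_ENDPOINT"] = "clearml.example.com"

# A host string as it might arrive from the config with no substitution applied.
raw_host = "http://${CLEARML_ENDPOINT}:8008/"

# expandvars substitutes ${NAME} from os.environ;
# rstrip('/') then drops the trailing slash, as in the proposed change.
resolved = os.path.expandvars(raw_host).rstrip("/")
print(resolved)

# A variable that is not set is left in place rather than raising an error.
os.environ.pop("CLEARML_UNDEFINED_EXAMPLE", None)
unresolved = os.path.expandvars("http://${CLEARML_UNDEFINED_EXAMPLE}:8008")
print(unresolved)
```

Note the second case: a missing variable would pass through silently as a literal ${...} in the URL, so a fix along these lines might also want to check for leftover placeholders.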
I'll have a look at 1.1.6 then!
And that sounds great - environment variables should be supported everywhere in the config, or then the docs should probably mention where they are and are not supported 🙂
I'll be happy to test it out if there's any commit available?
The thing I don't understand is how come this DOES work on our linux setups 🤔
The thing I don't understand is how come this DOES work on our linux setups
I do not think it actually works... I could not find any code that converts the ENV in the config string ...
I'll be happy to test it out if there's any commit available?
Please do, and feel free to PR it 😍
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L576
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L586
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L613
Apply the same example code at these three locations and you are good to go:
import os
...
os.path.expandvars(ENV_HOST.get(...)).rstrip('/')
But it does work on linux 🤔 I'm using it right now and the environment variables are not defined in the terminal, only in the .env
🤔
Are they expanded in the "api_server" ? (I verified on a linux machine, same error, the env in the api_server is not being resolved)
Yeah 🤔 🤔 they did. I'll give your suggested fix a go on Monday!
Hm, just a small update - I just verified and it does indeed work on linux:
` import clearml
import dotenv

if __name__ == "__main__":
    dotenv.load_dotenv()
    config = clearml.backend_api.Config.load()  # Success, parsed with environment variables `
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
I believe that happens natively thanks to pyhocon? No idea why it fails on mac
That's the only explanation ...
But the weird thing is, it did not work on my linux box?!
Sounds good, let's work on it after the weekend 🙂
UnevenDolphin73 AgitatedDove14 just a reminder that this is a feature built into pyhocon (i.e. env var resolution at load time). UnevenDolphin73 I do think the "http://"${CLEARML_ENDPOINT}":8080" string might be confusing and should probably be either "http://${CLEARML_ENDPOINT}:8080" or simply http://${CLEARML_ENDPOINT}:8080
SuccessfulKoala55 That string was autogenerated by pyhocon and matches their documentation too - https://github.com/lightbend/config/blob/master/HOCON.md#substitutions
The first example won't work (it will treat ${...} as a string literal and won't replace it). The second does work, but as mentioned anyway, these were not hand typed, but rather generated from pyhocon, so I don't think that's the issue 🤔
Interesting - how did you auto-generate it using pyhocon?
We have a mini default config (if you remember from a previous discussion we had) that actually uses the second form you suggested.
I wrote a small "fixup" script that combines this default with the one generated by clearml-init, and it simply does:
def_config = ConfigFactory.parse_file(DEF_CLEARML_CONF, resolve=False)
new_config = ConfigFactory.parse_file(new_config_file, resolve=False)
updated_new_config = ConfigTree.merge_configs(new_config, def_config)
But you've added the http://"${CLEARML_ENDPOINT}":8080 yourself to your mini config, didn't you? I don't see it here in the code
It's given as the second form you suggested in the mini config ( http://${...}:8080 ). The quotation marks are added later by pyhocon.
This was a long time running since I could not access the macbook in question to debug this.
It is now resolved and indeed a user error - they had implicitly defined CLEARML_CONFIG_FILE to e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf as is expected on Mac.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist). Thanks for the support! ❤
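For anyone hitting the same thing, a quick pre-flight check along these lines would have surfaced the problem immediately. This is only a sketch of the kind of message suggested above: check_config_file_env is a hypothetical helper, not part of clearml, and it uses nothing beyond the standard library.

```python
import os


def check_config_file_env(environ=None):
    """Return an error message if CLEARML_CONFIG_FILE points to a missing file, else None."""
    environ = os.environ if environ is None else environ
    path = environ.get("CLEARML_CONFIG_FILE")
    if not path:
        # Variable not set: nothing to check here.
        return None
    # Expand '~' and $VARS before testing, since the raw value may contain either.
    expanded = os.path.expanduser(os.path.expandvars(path))
    if not os.path.isfile(expanded):
        return "CLEARML_CONFIG_FILE={!r} file does not exist".format(path)
    return None


if __name__ == "__main__":
    message = check_config_file_env()
    if message:
        print(message)
```

Run on the Mac in question with the Linux-style path set, this would print the message with the offending /home/... path instead of failing later during config parsing.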