
Could you please point me to the relevant component? I am not familiar with TypeScript unfortunately 😞
SuccessfulKoala55 I tried to set up the clearml-agent on a different machine, and now I get a different error message in the logs: Warning: could not locate requested Python version 3.6, reverting to version 3.6 clearml_agent: ERROR: Python executable with version '3.6' defined in configuration file, key 'agent.default_python', not found in path, tried: ('python3.6', 'python3', 'python')
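To check this on the machine itself, here is a minimal sketch that looks up the same three candidate names from the error message on PATH. The helper name find_python is hypothetical and this is not the agent's own lookup code, only an approximation of what the message describes:

```python
import shutil


def find_python(candidates=("python3.6", "python3", "python")):
    """Return the full path of the first candidate found on PATH, or None.

    The default names are copied from the error message above; how the
    agent really resolves the interpreter is an assumption here.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the name is not on PATH
        if path is not None:
            return path
    return None
```

If find_python() returns None in the same shell and user account the agent runs under, none of the names in the error are visible on that PATH.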
CostlyOstrich36 I don’t see such a number, can you please share a screenshot of where to look?
AgitatedDove14 So in the https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping class I see that some information is logged (in the __call__ function), and I would like to have this information logged by clearml
AgitatedDove14 WOW, thanks a lot! I will dig into that 🚀
but not as much as the ELB reports
but if you do that and the package is already installed it will not install using the git repo, this is an issue with pip
Exactly, that’s my problem: I want to remove it to make sure it is reinstalled (because the version can change)
I think that since the agent installs everything from scratch it should work for you. Wdyt?
With env caching enabled, it won’t reinstall this private dependency, right?
So probably only the main process (rank=0) should attach the ClearMLLogger?
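A rough sketch of that idea, assuming the processes are started by a launcher that sets the RANK environment variable (torchrun does this). The helper names is_main_process and maybe_attach_logger are hypothetical, and attach stands in for whatever call creates and attaches the logger:

```python
import os


def is_main_process(env=None):
    """True for rank 0, or when RANK is not set (single-process run)."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0")) == 0


def maybe_attach_logger(attach, env=None):
    """Call attach() only on the main process; other ranks get None."""
    if is_main_process(env):
        return attach()
    return None
```

With this, every rank can run the same script, and only rank 0 ends up with a logger attached.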
If I remove security_group_ids and just leave subnet_id in the configuration, it is not taken into account (the instances are created in the default subnet)
Unfortunately this is difficult to reproduce... Nevertheless it is important for me to be robust against it, because if this error happens in a task in the middle of my pipeline, the whole process fails.
This ties into a wider topic I think: how to "skip" tasks if they have already run (a mechanism similar to what https://luigi.readthedocs.io/en/stable/ offers). That would allow restarting the pipeline and skipping tasks up to the point where the task failed
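To make the idea concrete, here is a minimal sketch of such a skip mechanism in plain Python, using a marker file per step. The function run_step and the marker file layout are hypothetical, and this is neither a ClearML nor a Luigi API:

```python
import json
from pathlib import Path


def run_step(name, func, state_dir):
    """Run func() unless a completion marker for `name` already exists.

    Returns (result, skipped). The result must be JSON-serializable,
    because it is stored in the marker file and reloaded on a skip.
    """
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    marker = state_dir / f"{name}.done.json"
    if marker.exists():
        # The step finished in an earlier run: reuse its stored result.
        return json.loads(marker.read_text())["result"], True
    result = func()
    # The marker is written only after func() returns, so a step that
    # raised leaves no marker and is re-run on the next attempt.
    marker.write_text(json.dumps({"result": result}))
    return result, False
```

Restarting the pipeline with the same state_dir then skips every step that completed and resumes at the one that failed.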
I get the same error when trying to run the task using clearml-agent services-mode with docker, so weird
Is there one?
No, I rather wanted to understand how it worked behind the scene 🙂
The latest RC (0.17.5rc6) moved all logs into a separate subprocess to improve speed with pytorch dataloaders
That’s awesome!
AgitatedDove14 I see in https://github.com/allegroai/clearml-session/blob/main/clearml_session/interactive_session_task.py#L21= that a key pair is hardcoded in the repo. Is it being used to ssh to the instance?
/opt/clearml/data/fileserver does not appear anywhere, sorry for the confusion - It’s the actual location where the files are stored
That's why I suspected trains was installing a different version than the one I expected
The host is accessible, I can ping it and even run curl "http://internal-aws-host-name:9200/_cat/shards" and get results from the local machine
AgitatedDove14 How can I filter out archived tasks? I don't see this option
trains==0.16.4
wow if this works that’s amazing
I am looking for a way to gracefully stop the task (clean up artifacts, shutdown backend service) on the agent
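What I have in mind is roughly the sketch below. It assumes the process receives SIGTERM or SIGINT when the task is stopped, which I have not verified for the agent. GracefulStop and run are hypothetical names, and cleanup stands in for the artifact cleanup and backend shutdown:

```python
import signal


class GracefulStop:
    """Records that a stop was requested instead of exiting immediately."""

    def __init__(self):
        self.requested = False

    def install(self, signals=(signal.SIGTERM, signal.SIGINT)):
        # signal.signal() must be called from the main thread.
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        self.requested = True


def run(steps, stop, cleanup):
    """Run steps until a stop is requested; always call cleanup()."""
    done = 0
    try:
        for step in steps:
            if stop.requested:
                break
            step()
            done += 1
    finally:
        cleanup()  # e.g. remove artifacts, shut down the backend service
    return done
```

The handler only sets a flag, so the step that is currently running finishes before cleanup happens.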
Without the envs, I got the error: ValueError: Could not get access credentials for 's3://my-bucket', check configuration file ~/trains.conf
After using the envs, I got the error: ImportError: cannot import name 'IPV6_ADDRZ_RE' from 'urllib3.util.url'
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)