Hi NutritiousBear41, asking questions here is exactly the reason we opened the Slack channel :)
Regarding the error, it might be that you stumbled on a bug. Do you get the git repo on the UI?
Hmm can you run the agent in debug mode, and check the specific console log?
```
clearml-agent --debug daemon --foreground ...
```
Actually it is better to leave it as is; it will just automatically mount the .ssh folder into the container. I will make sure the docs point to this option first.
Hi GrotesqueOctopus42
In theory it can be built; the main hurdle is getting elk/mongo/redis containers for arm64...
Hi GrotesqueOctopus42 ,
BTW: is it better to post the long error message in a reply, to avoid polluting the channel?
Yes, that is appreciated
Basically logs in the thread of the initial message.
To fix this I had to spin up the agent using the --cpu-only flag (--docker --cpu-only)
Yes, if you do not specify --cpu-only it will default to trying to access the GPUs
Nice!
They inherit from one another, so it does make sense. Also the add_tags is on the "main" Task and not the backend parent
LudicrousParrot69 there is already
Task.add_tags
https://github.com/allegroai/clearml/blob/2d561bf4b3598b61525511a1a5f72a9dba74953e/clearml/task.py#L964
Notice that if you pass a string it will split it based on spaces
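For reference, a minimal sketch of how this could look (project/task names are placeholders):

```python
from clearml import Task

# attach to (or create) a task; names here are just placeholders
task = Task.init(project_name="examples", task_name="tagging demo")

# passing a list adds each entry as its own tag
task.add_tags(["baseline", "cpu-run"])

# passing a single string is split on spaces, so this also adds two tags
task.add_tags("baseline cpu-run")
```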
Pycharm does get confused sometimes
Hi QuaintPelican38
Can you ssh to {instance_public_ip_address}:10022 (something like ssh -p 10022 user@IP_HERE)?
Basically just getting the password prompt means you are okay.
I suspect that you have some AWS security definition (firewall) that prevents a direct access to the instance, could that be?
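If ssh itself is not handy, a quick sanity check from Python (a sketch; the IP below is a placeholder) to see whether port 10022 is reachable at all:

```python
import socket

# placeholder public IP of the instance
instance_ip = "1.2.3.4"

try:
    # if the TCP connection opens, the port is reachable and a firewall is likely not the issue
    with socket.create_connection((instance_ip, 10022), timeout=5):
        print("port 10022 is reachable")
except OSError as err:
    print(f"cannot reach port 10022: {err}")
```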
LOL, thanks!
ColossalAnt7 I would do the following:
1. Configure trains-server user/pass by mounting the API server configuration file, as described in the trains-server documentation (intermediate, temporary step).
2. Start by providing the ML guys with VPN access that lets them reach the trains-server api/web/file ports directly (caveat: the IP/sub-domain needs to be solved).
3. Configure a ConfigMap to do the routing/ingress (this solves the IP/sub-domain issue) and allow the VPN to access the single entrypoint...
Hi ColossalAnt7
Following up on SuccessfulKoala55 's answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires:
- mounting the configuration file (the one holding the user/pass) into the pod from a persistent volume.
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
WackyRabbit7 basically, starting v1.1, if you are running code without any configuration file you will get an error (in contrast to previous versions, where it defaulted to the demo-server)
Does the clearml module parse the python packages?
Yes, it analyzes the installed packages based on the actual imports you have in the code.
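If the import analysis ever misses a package (e.g. something imported dynamically), you can also record it explicitly. A small sketch, assuming Task.add_requirements is called before Task.init (package name/version are placeholders):

```python
from clearml import Task

# explicitly record a requirement the import scan might miss
# (assumed to be called before Task.init so it lands in the installed-packages list)
Task.add_requirements("pandas", "1.5.3")

task = Task.init(project_name="examples", task_name="requirements demo")
```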
If I'm using a private pypi artifact server, would I set the PIP_INDEX_URL on the workers so they could retrieve those packages when that experiment is cloned and re-run?
Correct, the agent basically calls pip install on those packages, so if you configure it with PIP_INDEX_URL it should just work like any other pip install
Thank you @<1689446563463565312:profile|SmallTurkey79> !!!
With pipe.start(queue='services'), it still tries to run some docker for some reason
The services agent is always running with --docker:
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docker/services/entrypoint.sh#L16
Actually I think we should have it as an argument, so it is easier to control from docker-compose
I'll be waiting for the full log to check the "git clone" issue
VirtuousFish83
could it be that "inplace-abn" needs torch while the package is being installed?
Thanks @<1527459125401751552:profile|CloudyArcticwolf80> ! let me see if we can reproduce it
Hi JitteryCoyote63 , I have to admit, we have not thought of this scenario... what's the exact use case to clone a Task and change the type?
Obviously you can always change the task type, a bit of a hack but should work: task._edit(type='testing')
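For the full flow, a rough sketch using placeholder IDs/names; Task.get_task and Task.clone are the public calls, while task._edit is the internal one mentioned above:

```python
from clearml import Task

# fetch the task you want to duplicate (placeholder ID)
original = Task.get_task(task_id="<source_task_id>")

# clone it, then flip the type on the clone via the internal _edit call
cloned = Task.clone(source_task=original, name="cloned for testing")
cloned._edit(type="testing")
```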
DeliciousBluewhale87 could you restart the pod, ssh to the host, and make sure the folder /opt/clearml/agent exists and there is no *.conf file in it?
DeliciousBluewhale87 and is it working?
Ohh okay, something seems to half-work in terms of configuration: the agent has enough configuration to register itself, but fails to pass it to the task.
Can you test with the latest agent RC: 0.17.2rc4
Hi DeliciousBluewhale87
clearml-agent 0.17.2 was just released with the fix, let me know if it works
DeliciousBluewhale87
Upon ssh-ing into the folders in both the physical node (/opt/clearml/agent) and the pod (/root/.clearml), it seems there are some files there...
Hmm that means it is working...
Do you see any *.conf files there? What do they contain? (Do they point to the correct clearml-server config?)