Ok, I won't have time to check the different database components, so the first option (shutting down the server) sounds like the easiest one for me. I would then run the script manually once a month or so
That would be awesome, yes, only from my side I have 0 knowledge of the pip codebase 😄
Yes, in the Task being executed in the agents, I have:
from trains import Task
task = Task.init(...)
task.get_logger().report_text(str(task.get_parameters()))
I can also access these files directly if I enter the url in the browser
Note: I can verify that post_packages is correctly picked up by the trains-agent, since in the experiment log I see:
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.post_packages.0 = PyJWT==1.7.1
Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion
Yes AnxiousSeal95, a stopped instance means you don’t pay for the instance itself, only for its storage (see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html). So AgitatedDove14, increasing the IDLE timeout would still make me pay for the instances while they are idle.
Do you get stopped instances instantly when you ask for them?
Well that’s a good question, that’s what I observed some time ago, but according to their https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/...
Interesting! Something like that would be cool, yes! I just realized that custom plugins in Mattermost are written in Go, so it could be a good hackday for me 😄 to learn Go
can it be that the merge op takes so much filesystem cache that the rest of the system becomes unresponsive?
Yes, super thanks AgitatedDove14 !
I’ll definitely check that out! 🤩
No worries! I asked mostly out of curiosity, I don't have a real use case behind it. This means that you guys internally catch the argparse parser object somehow, right? Because you could also simply use sys.argv to find the parameters, right?
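To make the question concrete, here is a minimal sketch of what "catching the parser" could look like: it wraps argparse.ArgumentParser.parse_args so the parser and the parsed values get recorded. This is only an illustration with made-up names (captured, _recording_parse_args), not how the library actually does it. It does show one difference from reading sys.argv: the parser also knows defaults and types for arguments that were never passed on the command line.

```python
import argparse

# Hypothetical store for whatever the script's parser produces.
captured = {}

# Keep a reference to the original method so the wrapper can delegate to it.
_original_parse_args = argparse.ArgumentParser.parse_args


def _recording_parse_args(self, args=None, namespace=None):
    """Call the real parse_args, then record the parser and the parsed values."""
    result = _original_parse_args(self, args, namespace)
    captured["parser"] = self
    captured["values"] = vars(result).copy()
    return result


# Install the wrapper; every ArgumentParser created afterwards is affected.
argparse.ArgumentParser.parse_args = _recording_parse_args

# An ordinary user script, unaware of the wrapper.
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args(["--lr", "0.1"])

# The recorded values include the typed --lr and the default for --epochs,
# which a raw scan of the argument list would not reveal.
print(captured["values"])
```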
but the post_packages setting does not reinstall version 1.7.1
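One way to check this from inside the running task is to compare the pin against what is actually installed in the environment. The snippet below is a rough sketch using only the standard library (importlib.metadata); the helper names parse_pin and pin_is_satisfied are made up for illustration, and it only handles simple `name==version` pins.

```python
from importlib import metadata


def parse_pin(spec):
    """Split a 'name==version' requirement into (name, version)."""
    name, sep, version = spec.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"not a simple '==' pin: {spec!r}")
    return name.strip(), version.strip()


def pin_is_satisfied(spec, lookup=metadata.version):
    """Return True if the installed version equals the pinned one.

    `lookup` defaults to importlib.metadata.version and can be replaced
    in tests; a missing package counts as not satisfied.
    """
    name, wanted = parse_pin(spec)
    try:
        installed = lookup(name)
    except metadata.PackageNotFoundError:
        return False
    return installed == wanted


# Example: check the pin from the agent configuration in this environment.
print(pin_is_satisfied("PyJWT==1.7.1"))
```

Logging this result at the start of the task would show whether the environment ended up with the pinned version or with something else.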
ha sorry it’s actually the number of shards that increased
I just checked if something changed in https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_config.html#web-login-authentication
AgitatedDove14 I cannot confirm this 100%, since the context is different (see previous messages), but it could be the same bug behind the scenes...
Sure! Here are the relevant parts:
` ...
Current configuration (clearml_agent v1.2.3, location: /tmp/.clearml_agent.3m6hdm1_.cfg):
...
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = ==20.2.3
agent.package_manager.system_site_packages = false
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 ...
This is not the case, I downloaded it and I got a cuda error at runtime
I made sure before deleting the old index that the number of docs matched
It could be: I am running the ClearML AWS autoscaler on an EC2 instance with IAM roles that allow creating/deleting instances, but I get: Warning! exception occurred: An error occurred (UnauthorizedOperation) when calling the RunInstances operation: You are not authorized to perform this operation. Encoded authorization failure message: ...
I suspect that since the agent is running in docker mode, the boto3 lib doesn’t automatically get the right permissions from the ec2-instance. To...
it also happens without hitting F5 after some time (~hours)