Reputation
Badges 1
981 × Eureka!Ho nice, thanks for pointing this out!
Thanks AgitatedDove14 ! I created a project with a default output destination to a s3 bucket but I don't have local access to this bucket (only agents have access to it for security reasons). Because of that, I cannot create a task in this project programmatically locally because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the sdk)
/data/shared/miniconda3/bin/python /data/shared/miniconda3/bin/clearml-agent daemon --services-mode --detached --queue services --create-queue --docker ubuntu:18.04 --cpu-only
I would probably leave it to the ClearML team to answer you, I am not using the UI app and for me it worked just well with different regions. Maybe check permissions of the key/secrets?
but if you do that and the package is already installed it will not install using the git repo, this is an issue with pip
Exactly, that’s my problem: I want to remove it to make sure it is reinstalled (because the version can change)
I think that since the agent installs everything from scratch it should work for you. Wdyt?
With env caching enabled, it won’t reinstall this private dependency, right?
I have two controller tasks running in parallel in the trains-agent services queue
If I don’t start clearml-session , I can easily connect to the agent, so clearml-session is doing something that messes up the ssh config and prevent me from ssh into the agent afterwards
Interesting! Something like that would be cool yes! I just realized that custom plugins in Mattermost are written in Go, could be a good hackday for me 😄 to learn go
Should I try to disable dynamic mapping before doing the reindex operation?
Adding back clearml logging with matplotlib.use('agg') , uses more ram but not that suspicious
So I created a symlink in /opt/train/data -> /data
Or even better: would it be possible to have a support for HTML files as artifacts?
AgitatedDove14 WOW, thanks a lot! I will dig into that 🚀
AgitatedDove14 Yes exactly, I tried the fix suggested in the github issue urllib3>=1.25.4 and the ImportError disappeared 🙂
And I am wondering if only the main process (rank=0) should attach the ClearMLLogger or if all the processes within the node should do that
I’ve set dynamic: “strict” in the template of the logs index and I was able to keep the same mapping after doing the reindex
Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I’d maintain a client python package that takes care of the pre/post processing of each request, so that I only send the raw data to the inference service and I post process the raw output of the model returned by the inference service. But I understand why it might be desirable for the users to have these steps happening on the server. What is challenging in this context? Defining how t...
I am already trying with latest of pip 😞
Ok, now I would like to copy from one machine to another via scp, so I copied the whole /opt/trains/data folder, but I got the following errors:
Could be, but not sure -> from 0.16.2 to 0.16.3
now I can do nvcc --version and I getCuda compilation tools, release 10.1, V10.1.243
Thanks! I will investigate further, I am thinking that the AWS instance might have been stuck for an unknown reason (becoming unhealthy)
Thanks a lot, I will play with that!
No worries! I asked more to be informed, I don't have a real use-case behind. This means that you guys internally catch the argparser object somehow right? Because you could also simply use sys argv to find the parameters, right?
Hi AgitatedDove14 , How should we proceed to fix this bug? Should I open an issue in github? Should I try to make a minimal reproducible example? It’s blocking me atm
Note: I can verify that post_packages is well picked up by the trains-agent, since in the experiment log I see:agent.package_manager.type = pip agent.package_manager.pip_version = \=\=20.2.3 agent.package_manager.system_site_packages = true agent.package_manager.force_upgrade = false agent.package_manager.post_packages.0 = PyJWT\=\=1.7.1