PompousParrot44 I see what you mean, yes, multiple context switching might cause a bit of a decline in performance, not sure how much though... The alternative of course is to set CPU affinity... Anyhow, if you do get there we can try to come up with something that makes sense, but at the end of the day there is no magic there 🙂
I think that just backing up /opt/clearml and moving it should be just fine 🤔
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders, and ClearML takes care of cache cleanup.
That said, notice that get_mutable_local_copy takes a target folder you specify; in this case you should definitely delete it after usage. Wdyt?
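To make that concrete, here is a minimal sketch of fetching a mutable dataset copy and cleaning it up afterwards (the project/dataset names and the target path are made up for illustration; this assumes the standard clearml Dataset API and a configured server):

```python
# Hedged sketch: a mutable dataset copy lands in a folder you choose,
# so the ClearML cache cleaner will not manage it for you.
import shutil
from clearml import Dataset

# dataset_project / dataset_name are placeholders
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# unlike the read-only cached copy, this target folder is yours to manage
target = ds.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")

try:
    ...  # work with the files under `target`
finally:
    # delete after usage -- nothing else will clean this folder up
    shutil.rmtree(target, ignore_errors=True)
```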
Hi SubstantialElk6
try: --docker "<image_name> --privileged"
Notice the quotes
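For context, this is how the quoted image-plus-arguments string looks in a full agent invocation (the queue name and image are placeholders):

```shell
# the quotes keep --privileged attached to the image specification
clearml-agent daemon --queue default --docker "nvidia/cuda:11.8.0-runtime-ubuntu22.04 --privileged"
```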
EmbarrassedSpider34 I can update that an RC should be out later today with a fix 🙂
That makes total sense, this is exactly an OOM scenario for signal 9 🙂
It seems something is wrong with the server itself...
Hi RotundHedgehog76
we have issues with clearml-agent when using standalone mode. ...
What is the use case for standalone mode? is this venv or docker mode?
Hi WickedStarfish97
As a result, I don't want the Agent to parse what imports are being used / install dependencies whatsoever
Nothing to worry about here; even if the agent detects the python packages, they are installed on top of the preexisting packages inside the docker. That said, if you want to override it, you can also pass packages=[]
Funny enough I'm running into a new issue now.
Sorry, my bad, I should have known 🙂 yes, it probably should be packages=["clearml==1.1.6"]
BTW: do you have any imports inside the pipeline function itself? If you do not, then there is no need to pass "packages" at all; it will just add clearml
If this is the case, there is nothing you need to change, just provide the docker image (no need to pass packages)
Could it be these packages (i.e. numpy etc) are not installed as system packages in the docker (i.e. inside a venv, inside the docker) ?
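To make the packages override concrete, here is a minimal sketch of a pipeline step that relies only on what the docker image already ships (the component function, image tag, and imports are illustrative; this assumes the clearml PipelineDecorator API):

```python
# Hedged sketch: packages=[] tells the agent not to install detected
# packages on top of the docker image's preinstalled ones
# (clearml itself is still added so the step can run).
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(packages=[], docker="python:3.10")
def preprocess(data_path: str):
    # assumed to be preinstalled as a system package inside the image,
    # not hidden away inside a venv within the docker
    import numpy as np
    ...
```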
So what is the difference ? both running from the same machine ?
Regarding the artifact, yes, that makes sense. I guess this is why there is an "input" type for an artifact; the actual use case was never found (I guess until now?! what is your point there?)
Regarding the configuration
It's very useful for us to be able to see the contents of the configuration and understand
Wouldn't that just do exactly what you are looking for:
local_config_file_that_i_can_always_open = task.connect_configuration("important", "/path/to/config/I/only/have/on/my/machi...
Yep it should :)
I assume you add the previous iteration somewhere else, and this is the cause for the issue?
and sometimes there are hanging containers or containers that consume too much RAM.
Hmmm yes, but can't you see it in the ClearML dashboard?
unless I explicitly add container name in container arguments, it will have a random name,
it would be great if we could set default container name for each experiment (e.g., experiment id)
Sounds like a great feature! with little implementation work 🙂 Can you add a GitHub issue on clearml-agent?
Yep that will fix it, nice one!!
BTW I think we should add the ability to continue aborted datasets, wdyt?
Actually it hasn't changed ...
Yes, hopefully they have a different exception type so we could differentiate ... :) I'll check
You actually have to login/ssh under said user, have another dedicated mountpoint and spin the agent from that user.
Hi @<1684010629741940736:profile|NonsensicalSparrow35>
So sorry I missed this thread 🙂
Basically your issue is the load balancer that blocks the POST method. You can change that; just add the following line to any clearml.conf:
api.http.default_method: "put"
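The same setting in the nested clearml.conf (HOCON) form, in case your file uses structured sections rather than dotted keys:

```
api {
  http {
    default_method: "put"
  }
}
```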
Hi DilapidatedDucks58 ,
Are you running in docker or venv mode?
Do the workers share a folder on the host machine?
It might be a syncing issue (not directly related to the trains-agent, but to the fact that you have 4 processes trying to simultaneously access the same resource)
BTW: the next trains-agent RC will have a flag (default off) for torch-nightly repository support 🙂
First let's try to test if everything works as expected. Since 405 really feels odd to me here. Can I suggest following one of the examples start to end to test the setup, before adding your model?