
This is the path:
/Remote/moshe/Experiments/trains_bs_pipe_new/ypi/OKAY/Try_That/baseline/evaluation_validation/results/images/bottom_scores/0.0_slot02_extracted_23_01__1035__1.png
Hmm, I've pointed my trains-server at a config in a different location and successfully set up the trains-agent on the second server, but I don't see any new worker created. Why is that?
I bet it has something to do with the server or DB, any clue?
So I'd guess they would inherit my user as well
I've investigated it some more. It isn't path related as far as I can tell: these same paths worked 2 weeks ago, and a normal path doesn't work now
I'm confused. Why would it matter what my local code is when I'm trying to replicate an experiment that has already run?
Also, between which files is the git diff performed? (I've seen the line
diff --git a/.../run.py b/.../run.py
but I'm not sure what's a and what's b in this context)
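For reference on the a/b question: in standard git output the a/ prefix marks the old side of the comparison and b/ the new side, so for uncommitted changes that is usually the committed version versus the working copy. Here is a minimal sketch that splits such a header line. The path src/run.py is a made-up placeholder (not the real path from my experiment), and this says nothing about how trains itself stores or applies the diff.

```python
import re

# Header line git prints once per changed file: "diff --git a/<old> b/<new>"
DIFF_HEADER = re.compile(r"^diff --git a/(?P<old>.+) b/(?P<new>.+)$")


def parse_diff_header(line):
    """Return the old (a/) and new (b/) paths of a diff header, or None."""
    match = DIFF_HEADER.match(line.strip())
    if match is None:
        return None
    return {"old": match.group("old"), "new": match.group("new")}


# Hypothetical header; the path is a placeholder for illustration only.
header = "diff --git a/src/run.py b/src/run.py"
sides = parse_diff_header(header)
print("a/ (old side):", sides["old"])
print("b/ (new side):", sides["new"])
```

When the file was only modified (not renamed), both sides carry the same path, which is why a and b look identical in the header.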
Since my servers have a shared file system, the init process tells me that the configuration file already exists. Can I tell it to place it in another location? GrumpyPenguin23
Something else: if I want to designate only some of the GPUs of a worker, how can I do that?
Furthermore, let's say I have 6 GPUs on a machine, and I'd like trains to treat this machine as 2 workers (GPUs 0-2, 3-5), is there a way to do that?
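To make the 6-GPU case concrete, here is a rough sketch of what I have in mind: one daemon per GPU group, each started with its own GPU list. It only builds and prints the command lines. It assumes the --gpus and --queue options of trains-agent daemon, and the queue name "default" is a placeholder; I haven't verified this against my setup.

```python
def daemon_command(gpu_ids, queue="default"):
    """Build the argv for one trains-agent daemon limited to the given GPUs."""
    gpus = ",".join(str(gpu) for gpu in gpu_ids)
    return ["trains-agent", "daemon", "--gpus", gpus, "--queue", queue]


# Two workers on one 6-GPU machine: GPUs 0-2 and GPUs 3-5.
commands = [daemon_command([0, 1, 2]), daemon_command([3, 4, 5])]

for command in commands:
    # Printing only; pass each list to subprocess.Popen to actually start it.
    print(" ".join(command))
```

Designating only some of the GPUs for a single worker would be the same call with a shorter list, e.g. daemon_command([0, 1]).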
But that still doesn't answer one thing: why did it fail on git diff when I cloned a previously successful experiment?
To be exact, that's a trains-agent task that creates another trains-agent task in a new subprocess
Hmm, is there a way to do this via code? I wish to do that before running the Pipeline so each task it contains would be updated to latest branch
I am aware this is the current behavior, but could it be changed to something more intelligent? 😇
It's important to say that this happens when I have more than like 4 workers but when I run the trains-agent daemon --stop
With less than 4 workers it works well
it uses the api credentials generated by the trains dashboard
Is there a way to set this via a config file, like the docker compose yml?
I will send it to you privately, if that's okay
how could I configure this in the docker compose?
Edit: the trains-agent points to a different trains.conf config as I wished. I want the dev environment to point to a trains.conf in a different location as well
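One thing I might try for the dev environment is setting the config path through an environment variable before launching my script. This is only a sketch: it assumes the trains package honours a TRAINS_CONFIG_FILE variable, and both the config path and the script name below are hypothetical placeholders.

```python
import os


def env_with_trains_config(conf_path, base_env=None):
    """Return a copy of the environment with TRAINS_CONFIG_FILE set.

    Assumption: trains reads its config location from TRAINS_CONFIG_FILE.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TRAINS_CONFIG_FILE"] = os.path.expanduser(conf_path)
    return env


# Hypothetical location of the alternative config file.
dev_env = env_with_trains_config("/tmp/alt_config/trains.conf", base_env={})
print(dev_env["TRAINS_CONFIG_FILE"])

# To launch a dev script with it (script name is a placeholder):
# import subprocess
# subprocess.run(["python", "my_experiment.py"], env=env_with_trains_config("/tmp/alt_config/trains.conf"))
```

Building a copy of the environment keeps the change scoped to the launched process, so other shells on the shared file system still see their own config.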
Or do you mean it tries to apply the uncommitted changes of the experiment that already ran? If that's the case, why did the new experiment fail if the previous experiment ran successfully?
So: run a remote task with trains-agent that creates another task inside it, which would again run remotely as well, and then check the permissions of the file created by the second task?
I understand how this is problematic. This might require more thinking if you guys wish to support this.