The weirdest thing is that the execution is "completed" but it actually failed
Okay so regarding the version - we are using 1.1.1
The thing with this error is that it only happens sometimes, and when it happens it never goes away...
I don't know what causes it. We have one host where it works okay, then someone else checks out the repo, tries it, and it fails with this error, while another guy can do the same and it works for him
So could you re-explain, assuming my pipeline object is created by pipeline = PipelineController(...)?
Is this already available, or only on GitHub?
You should try trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground
and if it doesn't work, try quoting: trains-agent daemon --gpus '"device=0,1"' --queue dual_gpu --docker --foreground
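To see what the extra quoting changes, here is a minimal Python sketch that uses the standard-library shlex module to show which argument a POSIX shell would hand to the program in each case. It only models shell word splitting; how trains-agent forwards the value to Docker is not shown and is an assumption here, and gpus_value is a helper name made up for this illustration.

```python
import shlex

# The two command lines suggested above, as they would be typed in a POSIX shell.
plain = "trains-agent daemon --gpus device=0,1 --queue dual_gpu --docker --foreground"
quoted = "trains-agent daemon --gpus '\"device=0,1\"' --queue dual_gpu --docker --foreground"


def gpus_value(command_line: str) -> str:
    """Return the argument that follows --gpus after shell-style splitting.

    Hypothetical helper for illustration; not part of trains-agent.
    """
    args = shlex.split(command_line)
    return args[args.index("--gpus") + 1]


plain_value = gpus_value(plain)    # no quotes reach the program
quoted_value = gpus_value(quoted)  # the inner double quotes are kept

print(repr(plain_value))
print(repr(quoted_value))
```

In the second form the inner double quotes survive as part of the value, so whatever receives it downstream sees the comma-separated device list as one quoted unit rather than something it might split on the comma.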
Example code? I didn't see anywhere an example of filtering using project name
I mean the code in whatever form it is - I'm working with git specifically, but if I have diffs I'd like to see the code with the diffs applied
Eventually I think it should display the contents of the executed script in the most straightforward manner, regardless of version control
That's a fix, but I think it is a basic feature, and it's very useful to see the actual code in the UI
Especially coming from the standpoint of a team leader or other kind of supervision (or anyone who wants to view the experiment which is not the code author), when looking at an experiment you want to see the actual code
I showed you this phenomenon in the UI photos in the other thread
I don't think I can. This is private IP, and creating a dummy example of a pipeline and execution would take more time than I can dedicate to this
I'd rather we debug on my machine (tell me what you want to check) than create a snippet
Will that require restarting the agent again?
okay, that's acceptable
I think it would be a good idea, when the clearml agent fails due to an import error, to add to the error message a suggestion to try again with pip freeze
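Roughly what I have in mind, as a minimal sketch: wrap the import, and when it fails, re-raise the same kind of error with a pip freeze hint appended. import_with_hint is a made-up helper for illustration and is not part of clearml-agent; where the agent would actually hook this in is an assumption.

```python
import importlib


def import_with_hint(module_name: str):
    """Import a module; on failure, re-raise with a pip freeze suggestion.

    Hypothetical helper for illustration only, not part of clearml-agent.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original message and append the suggestion to it.
        hint = (
            f"{exc}. The task environment may be missing a package; "
            "consider running `pip freeze` on a machine where the code works "
            "and comparing it with the packages installed for this task."
        )
        raise ImportError(hint) from exc
```

The original exception stays chained via `from exc`, so the full traceback is still available and the hint is just extra text in the message.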
I am noticing that the files are saved locally. Is there any chance that the files are overwritten during the run, or get deleted at some point and then replaced?
Yes they are local - I don't think there is a possibility they are getting overwritten... But that depends on how clearml names them. I showed you the code that saves the artifacts, but this code runs multiple times from a given template with different values - essentially it creates like 10 times the same task with different param...
It's working!
:face_palm: :man-tipping-hand:
But does it disable the agent? Or will the tasks still wait for the agent to dequeue them?