To be exact, I would like to add "commit id" to the override arguments when adding a task as a step to the pipeline
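Roughly what I have in mind, as a sketch (the task_overrides argument and the field path are my guess at how this could look, not something the version discussed here supports; names are placeholders):

from clearml import PipelineController  # sketch against the clearml SDK; trains-era import paths may differ

pipe = PipelineController(name="my-pipeline", project="my-project", version="1.0")

# Hypothetical: override the commit id of the cloned step the same way
# parameter_override works today, e.g. via the task's script.version_num field.
pipe.add_step(
    name="train",
    base_task_project="my-project",
    base_task_name="train template",
    parameter_override={"General/lr": "0.01"},
    task_overrides={"script.version_num": "<commit id>"},  # the part I'd like to add
)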
Or do you mean it tries to apply the already-run experiment's uncommitted changes? If that's the case, why did the new experiment fail if the previous experiment ran successfully?
If I change the file at the entry point (let's say, I delete all of its content), how will trains behave when I try to clone and execute such a task?
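For context, the clone-and-execute flow I mean is roughly this (a minimal sketch; the task id is a placeholder):

from trains import Task

# Fetch the original experiment (the one whose entry-point file I changed)
original = Task.get_task(task_id="<original task id>")

# Clone it and send the clone to an agent queue
cloned = Task.clone(source_task=original, name="clone of " + original.name)
Task.enqueue(cloned, queue_name="important_cpu_queue")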
But it still doesn't answer one thing: why, when I cloned a previously successful experiment, did it fail on git diff?
I've investigated it some more. It isn't path-related as far as I can tell, since these same paths worked 2 weeks ago and a normal path doesn't work now
I bet it has something to do with the server or DB, any clue?
Fixed. The issue was the project name containing "/"
I've run this 8 times:
trains-agent --config-file /opt/trains/trains.conf daemon --detached --cpu-only --queue important_cpu_queue cpu_queue
The version is 0.16.2rc0 (a version Mushik gave me that supports local conda env)
AgitatedDove14 Quite hard for me to try this right now, but I've validated that the relevant code segments are untouched between the versions (at least the current master branch of the ClearML repo)
I am aware this is the current behavior, but could it be changed to something more intelligent? 😇
When you say I can still get race/starvation cases, do you mean in the enterprise or the regular version?
I see, will keep that in mind. Thanks Martin!
You can try copying all the contents of requirements.txt to the installed packages tab in the trains dashboard of your experiment (in the UI)
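If editing the UI is awkward, a code-side alternative is declaring the packages before Task.init (a sketch; package, project, and task names are placeholders):

from trains import Task

# Add packages to the experiment's "installed packages" before the task is created,
# instead of pasting requirements.txt into the UI.
Task.add_requirements("numpy", "1.19.2")
Task.add_requirements("pandas")

task = Task.init(project_name="my-project", task_name="my-experiment")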
I think it should be treated as failed. I am truly not convinced why aborting a task should be anything besides a user terminating unwanted behavior of the task (be it a bug, running with the wrong config, the task getting stuck, etc.)
If this could be made possible, I'd be happy to have it as a feature; this really impacts my pipeline flow.
I aborted the task because of a bug on my side
But maybe only one step in the DAG is flawed and I want to continue the rest of the pipeline as usual (aside from the branch of the flawed task).
I am not sure what you mean by automatic stopping flows, could you give an example?
Otherwise, if you empty the installed packages and the requirements.txt is in one of the parent folders of the file that ran, trains should detect it automatically
For example, HPO early stopping. It would mark the Task as aborted
Why? The task should have completed successfully, how is this aborting?
Including this?
auto_connect_frameworks={"matplotlib": False}
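i.e. passing it to Task.init like this (a minimal sketch; project and task names are placeholders):

from trains import Task

# Disable automatic matplotlib capture; other framework hooks keep their defaults
task = Task.init(
    project_name="my-project",
    task_name="my-experiment",
    auto_connect_frameworks={"matplotlib": False},
)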
I think Mushon told me otherwise a while ago
Oof, what if all I have is a project name to set? (Which could be a non-existing project as well)