SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works,
the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once).
The DAG nodes themselves are executed on the trains-agent, which means they reproduce the code / env for every cloned Task in the DAG (not on the original Tasks).
Yeah, I understand that. But since overriding parameters of pre-executed Tasks is possible, I was wondering if I could change the commit id to the current one as well.
What do you mean by execute remotely? (I didn't really understand this one from the docs)
I was wondering if I could change the commit id to the current one as well.
Actually that would be possible, but will need a bit of code to support controlling Task properties (not just configuration parameters)
How can I do that without running this Task on its own?
Assuming you have committed code that already supports it, you can clone the executed Task and then change the commit ID to "latest on branch" (see the drop-down when editing).
Would that help?
The simplest example of the use case I'm describing: I want to run the full pipeline, but in this experiment I want to try Batch Norm, which I haven't used in the pre-executed Task. How can I do that without running this Task on its own? (Which is quite problematic for me, since it runs as part of a pipeline and therefore uses the DAG.)
Hmm, is there a way to do this via code? I wish to do that before running the Pipeline, so each task it contains would be updated to the latest commit on the branch.
To be exact, I would like to add "commit id" to the override arguments when adding a task as a step to the pipeline.
I've seen that the file location of a task is saved
What do you mean by that? Is it the execution section "entry point"?
On another topic, I've just now copied a Task that ran successfully yesterday and tried to run it. It failed to run and I got:
ERROR! Failed applying git diff, see diff above. Why is that?
If I change the file at the entry point (let's say I delete all of its content), how will trains behave when I try to clone and execute such a task?
That is exactly it: the trains-agent replicates the code from the git repo and tries to apply the git diff (see the uncommitted changes section). Obviously it failed 🙂
I'm confused. Why would it matter what my local code is when trying to replicate an experiment that has already run?
Also, between which files is the git diff performed? (I've seen the line
diff --git a/.../run.py b/.../run.py
but I'm not sure what's a and what's b in this context)
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init, a new experiment is created on the server. It automatically stores the git repo link, the commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (treat this as always the case, even if it is actually the same machine).
The trains-agent will create a new virtual environment for every experiment. In the new venv it will install the packages listed in the "installed packages" section under the experiment's execution section. Then it will clone the git repository (based on the definition stored on the experiment). Once the cloning is done, it will apply the "uncommitted changes" to the newly cloned code. This process reproduces the state of the code from the original machine on a new remote machine.
Once everything is done, it will run the Python script based on the "working directory" and "entry point" as written on the experiment.
Does that make more sense?
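To make the sequence described above concrete, here is a minimal Python sketch that only lists the commands in order, without running anything. It is not the trains-agent implementation: the function name, the "venv" and "code" directory names, and the requirements file standing in for the "installed packages" section are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    cwd: str          # directory the command would run in
    argv: List[str]   # command and its arguments


def plan_agent_steps(repo_url: str, commit_id: str, working_dir: str,
                     entry_point: str, diff_path: Optional[str] = None,
                     requirements_path: str = "requirements.txt") -> List[Step]:
    """List, in order, the commands that mirror the steps described in the thread.

    Illustrative only: directory names and the requirements file are assumptions.
    """
    steps = [
        # 1. fresh virtual environment for this experiment
        Step(".", ["python", "-m", "venv", "venv"]),
        # 2. install what the experiment recorded under "installed packages"
        Step(".", ["venv/bin/pip", "install", "-r", requirements_path]),
        # 3. clone the repository stored on the experiment
        Step(".", ["git", "clone", repo_url, "code"]),
        # 4. check out the stored commit ID
        Step("code", ["git", "checkout", commit_id]),
    ]
    if diff_path:
        # 5. re-apply the stored "uncommitted changes" on top of that commit
        steps.append(Step("code", ["git", "apply", diff_path]))
    # 6. run the entry point from the working directory (with the venv's interpreter)
    steps.append(Step(f"code/{working_dir}", ["python", entry_point]))
    return steps
```

The point of the sketch is step 5: the diff is applied to a fresh clone at the stored commit, so the local working copy plays no role in whether it succeeds.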
Or do you mean it tries to apply the uncommitted changes of the experiment that already ran? If that's the case, why did the new experiment fail if the previous experiment ran successfully?
But it still doesn't answer one thing: why did it fail on the git diff when I cloned a previously successful experiment?
Can you do it manually, i.e. check out the same commit id, then take the uncommitted changes (you can copy-paste them into diff.txt), then call git apply diff.txt?
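The manual check suggested above could be scripted roughly as follows. This is a sketch under assumptions: the function name and the injectable run parameter are hypothetical, and it uses git apply --check, which only reports whether the patch would apply and does not modify any files.

```python
import subprocess


def diff_applies_cleanly(repo_dir: str, commit_id: str,
                         diff_file: str = "diff.txt", run=subprocess.run) -> bool:
    """Check out commit_id in repo_dir and test whether diff_file would apply.

    diff_file is resolved relative to repo_dir unless it is an absolute path.
    """
    # Move the clone to the commit stored on the experiment; raises if that fails.
    run(["git", "checkout", commit_id], cwd=repo_dir, check=True)
    # --check is a dry run: a non-zero exit code means the patch does not apply.
    result = run(["git", "apply", "--check", diff_file], cwd=repo_dir)
    return result.returncode == 0
```

If this returns False on a clean clone, the stored diff does not match the stored commit, which would point at the experiment's recorded state and not at the local working copy.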
Nope, I didn't change anything
Could you send the full log?
That is odd...
Sure, but before that: it seems that the script path parameter (which I think you refer to as entry_point) is not relative to the base of the repo, as I expected it to be. Could that interfere?
I will send it to you privately, if that's okay
sure no prob
Did you change the commit ID?
I will try that.
In addition, I've seen that the file location of a task is saved, does it mean that when rerunning said task (for example clone it and enqueue it) trains will search for the file in the stored location? Or will it clone the repo with the given commit id and use the relative path to find this file?
Hmm, is there a way to do this via code?
Yes, clone the Task, then call
data = task.export_task() and edit the data object (see the execution section)
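As a rough illustration of the "edit the data object" step, the sketch below works on a plain dictionary. The key names ("script", "branch", "version_num", "diff") are assumptions about the layout of the exported data and should be checked against a real export; the helper name is hypothetical, and treating an empty commit ID as "latest on branch" is likewise an assumption.

```python
import copy
from typing import Any, Dict


def point_to_latest_on_branch(data: Dict[str, Any], branch: str,
                              clear_diff: bool = True) -> Dict[str, Any]:
    """Return a copy of the exported task data that targets the tip of `branch`.

    Assumed layout: data["script"] holds "branch", "version_num" (the commit ID)
    and "diff" (the stored uncommitted changes).
    """
    updated = copy.deepcopy(data)          # leave the caller's dictionary untouched
    script = updated.setdefault("script", {})
    script["branch"] = branch
    script["version_num"] = ""             # assumption: empty commit ID means latest on branch
    if clear_diff:
        # A diff recorded against the old commit may not apply to a newer one.
        script["diff"] = ""
    return updated
```

In this sketch, data would be the object returned by task.export_task() on the cloned Task, as mentioned above.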
Then update back with