Answered

One more thing, I'm trying to take full advantage of the controller, but I run into a problem in my use case.
The controller is super useful for creating a DAG of tasks, which is the behaviour I'm interested in. But issues arise when the tasks are changing, not only parameter-wise but code-wise. For example, a task in the DAG might point to run.py. The Task created to represent this file might be at commit a while I'm on commit b (changing the same file), or another team member might want to run the same DAG but with his own run.py (in his local repo, at a different file location than the one described by the Task) and his own commit id.
Running the Task just so it generates a draft misses the whole point of exploiting the DAG capabilities, to my understanding. So how would you suggest we do that? (If there is a way to feed the Task a commit id as a parameter that would be great, but I don't know if that's possible dynamically through code, without using the UI.)

  
  
Posted 4 years ago

Answers 27


SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works,
the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once).
The DAG nodes themselves are executed on the trains-agent, which means the agent reproduces the code / env for every cloned Task in the DAG (not for the original Tasks).
WDYT?

  
  
Posted 4 years ago

Could you send the full log ?

  
  
Posted 4 years ago

I will send it to you privately, if that's okay

  
  
Posted 4 years ago

Hmm, is there a way to do this via code?

Yes, clone the Task with Task.clone
Then call data = task.export_task() on the cloned Task and edit the data object (see the execution section)
Then write it back with task.update_task(data)
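A minimal sketch of those three steps, under some assumptions: the exported dictionary is assumed to keep the commit id under data["script"]["version_num"] and the uncommitted changes under data["script"]["diff"] (inspect the output of export_task() on your own server to confirm), and the helper names point_to_commit and clone_at_commit are made up for illustration.

```python
import copy


def point_to_commit(task_data, commit_id, drop_uncommitted=True):
    """Return a copy of an exported task dict that targets commit_id.

    ASSUMPTION: the commit id lives under task_data["script"]["version_num"]
    and the stored uncommitted changes under task_data["script"]["diff"].
    """
    data = copy.deepcopy(task_data)
    script = data.setdefault("script", {})
    script["version_num"] = commit_id
    if drop_uncommitted:
        # A diff recorded against the old commit may not apply to the new one.
        script["diff"] = ""
    return data


def clone_at_commit(template_task_id, commit_id):
    """Clone a pre-executed Task and point the clone at another commit."""
    # Imported lazily so point_to_commit can be used without the SDK installed.
    from trains import Task

    template = Task.get_task(task_id=template_task_id)
    cloned = Task.clone(source_task=template, name=template.name + " (updated commit)")
    data = cloned.export_task()
    cloned.update_task(point_to_commit(data, commit_id))
    return cloned
```

Only the clone is modified, so the original pre-executed Task stays untouched and can still serve as the template for other runs.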

  
  
Posted 4 years ago

Yes

  
  
Posted 4 years ago

On another topic, I've just now copied a Task that ran successfully yesterday and tried to run it. It failed to run and I got a
ERROR! Failed applying git diff, see diff above.
Why is that?

  
  
Posted 4 years ago

Yeah I understand that. But since overriding parameters of pre executed Tasks is possible, I was wondering if I could change the commit id to the current one as well.
What do you mean by execute remotely? (I didn't really understand this one from the docs)

  
  
Posted 4 years ago

If I change the file at the entry point (let's say, I delete all of its content), how will trains behave when I try to clone and execute such task?

  
  
Posted 4 years ago

That is odd...

  
  
Posted 4 years ago

That is exactly it: the trains-agent replicates the code from the git repo and then tries to apply the git diff (see the uncommitted changes section). Obviously it failed 🙂

  
  
Posted 4 years ago

Sure

  
  
Posted 4 years ago

Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init,
a new experiment is created on the server. It automatically stores the git repo link, the commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (treat this as always the case, even if it is actually the same machine).
The trains-agent will create a new virtual-environment for every experiment created, in the new venv it will install the packages based on what is written in "installed packages" section under experiment execution. Then it will clone the git repository (based on the definition written on the experiment), once the cloning is done, it will apply the "uncommitted changes" on the newly cloned code. This process will reproduce the state of the code in the original machine on a new remote machine.
Once everything is done, it will run the python script based on the "working directory" and "entry point" as written on the experiment.
Make more sense ?
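The flow described above can be sketched as an ordered list of commands. This is only an illustration of the sequence, not the trains-agent's actual implementation: the dictionary keys, the /tmp paths, and the function name reproduction_steps are all hypothetical.

```python
import posixpath


def reproduction_steps(experiment, venv_dir="/tmp/venv", code_dir="/tmp/code"):
    """Return (working_directory, command) pairs mirroring the described flow.

    ASSUMPTION: `experiment` is a plain dict with illustrative keys; the real
    agent reads these values from the experiment stored on the server.
    """
    venv_python = posixpath.join(venv_dir, "bin", "python")
    run_dir = posixpath.normpath(posixpath.join(code_dir, experiment["working_dir"]))

    steps = [
        # 1. Fresh virtual environment for this experiment.
        (".", ["python3", "-m", "venv", venv_dir]),
        # 2. Install what the "installed packages" section lists.
        (".", [venv_python, "-m", "pip", "install", *experiment["installed_packages"]]),
        # 3. Clone the repository and check out the recorded commit.
        (".", ["git", "clone", experiment["repository"], code_dir]),
        (code_dir, ["git", "checkout", experiment["commit_id"]]),
    ]
    if experiment["uncommitted_changes"]:
        # 4. Re-apply the stored diff (assumed to be saved to this file first).
        steps.append((code_dir, ["git", "apply", "/tmp/uncommitted.diff"]))
    # 5. Run the entry point from the recorded working directory.
    steps.append((run_dir, [venv_python, experiment["entry_point"]]))
    return steps
```

Step 4 is the one that fails with "Failed applying git diff" when the stored diff does not match the code that was checked out in step 3.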

  
  
Posted 4 years ago

AgitatedDove14
The easiest example of the use case I'm describing: I want to run the full pipeline, but in this experiment I wish to try Batch Norm, which I didn't use in the pre-executed Task. How can I do that without running this Task on its own? (Which is quite problematic for me, since it runs as part of a pipeline and therefore uses the DAG.)

  
  
Posted 4 years ago

sure no prob

  
  
Posted 4 years ago

Can you do it manually? I.e. check out the same commit id, then take the uncommitted changes (you can copy-paste them into diff.txt), then call git apply diff.txt
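If you want to script that manual check, here is a rough sketch. It uses git apply --check, which only tests whether the patch would apply and leaves the files alone; the function name diff_applies_cleanly and the injectable run parameter are made up for illustration. Note that the checkout step does change the state of the repository you point it at, so use a scratch clone.

```python
import subprocess
import tempfile
from pathlib import Path


def diff_applies_cleanly(repo_dir, commit_id, diff_text, run=subprocess.run):
    """Check out commit_id in repo_dir and test whether diff_text would apply.

    `run` defaults to subprocess.run and exists only so the command
    execution can be swapped out (for example in a dry run).
    """
    # Same starting point the agent uses: the recorded commit.
    run(["git", "checkout", commit_id], cwd=repo_dir, check=True)

    # Save the pasted "uncommitted changes" text to a temporary diff file.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(diff_text)
        diff_path = handle.name
    try:
        # --check reports whether the patch applies without modifying files.
        result = run(["git", "apply", "--check", diff_path], cwd=repo_dir, check=False)
        return result.returncode == 0
    finally:
        Path(diff_path).unlink()
```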

  
  
Posted 4 years ago

I've seen that the file location of a task is saved

What do you mean by that? is it the execution section "entry point" ?

  
  
Posted 4 years ago

Yes!

  
  
Posted 4 years ago

To be exact, I would like to add "commit id" to the override arguments when adding a task as a step to the pipeline

  
  
Posted 4 years ago

I will try that.
In addition, I've seen that the file location of a task is saved, does it mean that when rerunning said task (for example clone it and enqueue it) trains will search for the file in the stored location? Or will it clone the repo with the given commit id and use the relative path to find this file?

  
  
Posted 4 years ago

I'm confused. Why would it matter what my local code is when trying to replicate an experiment that has already run?
Also, between which files is the git diff performed? (I've seen the line
diff --git a/.../run.py b/.../run.py but I'm not sure what's a and what's b in this context)

  
  
Posted 4 years ago

Hmm, is there a way to do this via code? I wish to do that before running the Pipeline so each task it contains would be updated to latest branch

  
  
Posted 4 years ago

But it still doesn't answer one thing, why when I cloned a previously successful experiment, it failed on git diff?

  
  
Posted 4 years ago

Sure, but before that, it seems that the script path parameter (which I think you refer to as entry_point) is not relative to the base of the repo, as I expected it to be, could that interfere?

  
  
Posted 4 years ago

Or do you mean it tries to apply the uncommitted changes of the experiment that already ran? If that's the case, why did the new experiment fail if the previous experiment ran successfully?

  
  
Posted 4 years ago

Did you change the commit ID ?

  
  
Posted 4 years ago

Nope, I didn't change anything

  
  
Posted 4 years ago

Hi SmarmySeaurchin8

I was wondering if I could change the commit id to the current one as well.

Actually that would be possible, but it will need a bit of code to support controlling Task properties (not just configuration parameters)

How can I do that without running this Task on its own?

Assuming you have committed code that already supports it, you can clone the executed Task and then change the commit ID to "latest on branch" (see the drop-down when editing)

Would that help ?

  
  
Posted 4 years ago