You might also be able to find out exactly what needs to be pickled using the f_code
of the function (but that's limited to the C implementation of Python).
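A minimal sketch of that idea, not something trains does. On function objects the code object is exposed as `__code__` (`f_code` is the matching attribute on frame objects). The helper names `referenced_values` and `unpicklable_names`, and the sample `train` function, are hypothetical and only illustrate how the names a function uses could be listed and checked for picklability.

```python
import pickle
import threading


def referenced_values(func):
    """Map each global or closure name used by func to its current value."""
    code = func.__code__
    found = {}
    # co_names lists the global and attribute names used in the body;
    # keep only those that resolve in the function's own globals.
    for name in code.co_names:
        if name in func.__globals__:
            found[name] = func.__globals__[name]
    # co_freevars lists names captured from an enclosing function.
    for name, cell in zip(code.co_freevars, func.__closure__ or ()):
        found[name] = cell.cell_contents
    return found


def unpicklable_names(values):
    """Return the sorted names whose values cannot be pickled."""
    bad = []
    for name, value in values.items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(name)
    return sorted(bad)


SCALE = 3
LOCK = threading.Lock()  # locks cannot be pickled


def train(x):
    with LOCK:
        return x * SCALE


needed = referenced_values(train)
problems = unpicklable_names(needed)  # names to warn the user about
```

This only looks at the function's own code object. Functions it calls, and nested functions stored in `co_consts`, would need the same treatment.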
AgitatedDove14
how would you specify the main python script entry point?
If you want to use a script, then the entry point should be the trivial one (either __main__ or main()).
wouldn't that make more sense rather than a function call?
From what I can tell, W&B provides both options: either specify a script path/module (that will be run as usual) or specify a function as an entry point.
Analysis of the actual repository (i.e. it will actually look for imports); this way you get the exact versions you have, but not the clutter of the entire virtual environment
You could also do that here instead of what I suggested if it's easier. In what I suggested you also get the exact versions.
OddAlligator72 I like this idea.
The single thing I'm not sure about is the "function entry point"
Why would one do that? Meaning why wouldn't you have a proper python entry-point.
The reason I'm reluctant is that you might have calls/functions/variables in the global scope of the file storing the function, and then users will not know why something broke, and it will be very cumbersome to debug.
A simple script entry point seems trivial to launch and debug locally.
What do you think? What would be your specific use case for that?
AgitatedDove14 It is not ideal, but might suffice. If I decide to pursue this path, I'll get back to you with this. Thanks!
I like the idea of using the timeit interface, and I think we could actually hack it to do most of the heavy lifting for us 🙂
Could you please elaborate on how to use Task.create to achieve this?
Any chance you could create an issue on GitHub with this feature suggestion? If we have some support, we could accelerate the implementation.
Sure.
I do this:
` import os
import subprocess

from trains import Task

base_task = Task.create(project_name=self.regression_project_name,
                        task_name=BASE_TASKS[block_type][engine],
                        task_type=task_type)
params = base_task.export_task()
# Git repo
params['script']['repository'] = subprocess.check_output(
    ['git', 'config', '--get', 'remote.origin.url'], cwd=REPO_NAME).decode().strip()
# Git commit
params['script']['version_num'] = subprocess.check_output(
    ['git', 'rev-parse', 'HEAD'], cwd=REPO_NAME).decode().strip()
# Git branch
params['script']['branch'] = subprocess.check_output(
    ['git', 'rev-parse', '--abbrev-ref', 'HEAD'], cwd=REPO_NAME).decode().strip()
# Git diff
params['script']['diff'] = subprocess.check_output(
    ['git', 'diff'], cwd=REPO_NAME).decode()
# Dir to execute code from - '.' corresponds to base git repo
params['script']['working_dir'] = '.'
# Code execution script path
params['script']['entry_point'] = os.path.relpath(EXECUTABLES[(engine, block_type)], REPO_NAME)
# Conda env path to run from. TODO: change this to the env on host to interpreter once trains stabilizes
params['execution']['docker_cmd'] = '/home/egarbin/.conda/envs/regression'
# So it wouldn't try to install packages from cache (as it takes the given conda env)
params['script']['requirements']['pip'] = '\n \n'
params['script']['requirements']['conda'] = '\n \n'
base_task.update_task(params) `
OddAlligator72 quick question:
suggest that you implement a simple entry-point API
How would the system get the correct packages / git repo / arguments if you are only passing a single function entry point?
Thanks to you both. When I get back to this, I'll have a deeper look and see if it fits my needs. I do, however, suggest that you implement a simple entry-point API like W&B does. Maybe something along the lines of Task.create("Name", function=start_task_func)
Implementing the whole thing Emanuel wrote above, or using raw JSON seems very tedious.
Of course you can edit which parameters you like
I think it's nicer when you want to wrap some execution path, and not just use it. If you could also provide the aforementioned pickled extra parameters, then this will be extremely useful.
The reason I'm reluctant is that you might have calls/functions/variables in the global scope of the file storing the function, and then users will not know why something broke, and it will be very cumbersome to debug.
The global scope for that function is the local scope of the current function. You could always pickle locals() (and warn regarding unpicklable parameters), or just warn that all used parameters must be passed explicitly (as arguments, or like in timeit.timeit(..., globals=...)).
If you want the "magic" property. Otherwise, you could also allow specifying a globals argument like in timeit.
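For reference, this is roughly how the `globals` argument of `timeit.timeit` behaves: the statement sees only the names in the dictionary it is given, so every dependency is declared up front. The `namespace` dictionary and the statements below are made up for illustration.

```python
import timeit

# Everything the statement may touch is listed explicitly;
# nothing is picked up from the caller's scope.
namespace = {"n": 10, "results": []}
elapsed = timeit.timeit(
    "results.append(sum(range(n)))",
    globals=namespace,
    number=1,  # run the statement exactly once
)

# A name that was not declared fails loudly instead of being
# resolved from wherever the caller happens to be.
try:
    timeit.timeit("undeclared + 1", globals={}, number=1)
except NameError as exc:
    error = str(exc)
```

A function entry point with a similar explicit `globals` argument would trade the "magic" for errors that are easy to trace back to a missing name.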
OddAlligator72 just so I'm sure I understand your suggestion:
pickle the entire locals() on the current machine.
On the remote machine, create a mock Python entry point, restore the "locals()" and execute the function?
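A rough sketch of that flow, under the assumption that the function lives in a module that can be imported on both machines. `pack_call` and `run_packed` are hypothetical names, not trains APIs, and only explicitly passed keyword arguments are pickled here rather than the whole of `locals()`.

```python
import importlib
import json
import pickle


def pack_call(func, **kwargs):
    """Serialize a reference to a module-level function plus its keyword arguments."""
    return pickle.dumps({
        "module": func.__module__,
        "qualname": func.__qualname__,
        "kwargs": kwargs,
    })


def run_packed(payload):
    """What a generated entry point might do on the remote side."""
    spec = pickle.loads(payload)
    target = importlib.import_module(spec["module"])
    # Walk the qualified name so nested attributes also resolve.
    for part in spec["qualname"].split("."):
        target = getattr(target, part)
    return target(**spec["kwargs"])


# json.dumps stands in for a user-defined start_task_func.
payload = pack_call(json.dumps, obj={"lr": 0.1}, sort_keys=True)
result = run_packed(payload)
```

The function itself is never pickled, only its location, so the remote machine still needs the same repository and packages installed for the import to succeed.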
BTW:
Making this actually work regardless of the machine is some major magic in motion ... 😉
OddAlligator72 FYI you can also import / export an entire Task (basically allowing you to create it from scratch/JSON, even without calling Task.create): Task.import_task(...) and Task.export_task(...)
That's actually very easy. The correct packages and repo are the same as now - the loaded ones (if you pass a function as an argument, you already loaded its module and related packages from the relevant git repo & commit).
For the arguments, you could extract them using task.get_parameters_as_dict(). You could also allow passing additional arguments that will be pickled (but that's unnecessary): Task.create("Name", function=start_task_func, arg1, arg2, arg3=arg3)
The W&B interface is very intuitive and simple at that point:
` import time
import wandb

def train():
    run = wandb.init()
    print("config:", dict(run.config))
    for epoch in range(35):
        print("running", epoch)
        wandb.log({"metric": run.config.param1, "epoch": epoch})
        time.sleep(1)

wandb.agent(sweep_id, function=train) `
I was hoping that you had something similar.
GrumpyPenguin23 Actually, no. I wish to create an experiment from scratch, starting at a well-defined entry point (either a script or a function).
I wish to do this in order to wrap my existing framework with a new entry-point such that, at least for the time being, I will not need to modify the innards of the framework in order to deploy it well. I would also like to do this dynamically, such that the wrapped entry point could be configured externally.
OddAlligator72 what you are saying is, take the repository / packages from the runtime, aka the Python code calling the "Task.create(start_task_func)"?
Is that correct?
BTW: notice that the execution itself will be launched on other remote machines, not on this local machine
OddAlligator72 okay, that is possible, how would you specify the main python script entry point? (wouldn't that make more sense rather than a function call?)
How do you determine which packages to require now?
Analysis of the actual repository (i.e. it will actually look for imports 🙂); this way you get the exact versions you have, but not the clutter of the entire virtual environment
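To give a feel for what import-based analysis can look like, here is a small sketch. It is not the actual trains implementation: the helper names are hypothetical, and it assumes the import name matches the installed distribution name, which is not always true.

```python
import ast
from importlib import metadata


def imported_top_level_modules(source):
    """Collect the top-level module names imported by a piece of source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which point inside the repo
            names.add(node.module.split(".")[0])
    return names


def pinned_requirements(source):
    """Pin every imported module that maps to an installed distribution."""
    lines = []
    for name in sorted(imported_top_level_modules(source)):
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Standard library module, or import name differs from package name
            continue
    return lines


example = "import os.path\nfrom json import dumps\nfrom . import sibling\n"
modules = imported_top_level_modules(example)
```

Only packages that the code actually imports end up in the list, each pinned to the version installed locally.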
OddAlligator72 can you link to the wandb docs? Looks like you want a custom entry point, I'm thinking "maybe" but probably the answer is that we do it a little differently here.
Hmm interesting ...
Any chance you could create an issue on GitHub with this feature suggestion?
If we have some support, we could accelerate the implementation.
AgitatedDove14 Possibly. You could also specify additional packages to require just like you do now (in params['script']['requirements']['pip']).
How do you determine which packages to require now?
You might also be able to find out exactly what needs to be pickled using the f_code of the function (but that's limited to the C implementation of Python).
Nice!
GrumpyPenguin23 Might be. Like I wrote before - I put that path on hold for now. Thanks.
OddAlligator72 FYI, in your current code you can always do: if use_trains: from trains import Task; Task.init()
Might be easier 😉
OddAlligator72 I think you got sidetracked into the wrong corner here. Let's decompose what you are asking for; please tell me if I am getting somewhere near what you mean:
you have an experiment you already ran; you want to change the parameters in it and run it again; if possible, you only want to run a single function in the file attached to that experiment
OddAlligator72 so if I get you correctly, it is equivalent to creating a file called driver.py with all your entry points with an argparser and using it instead of train.py?
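A sketch of what such a driver.py could look like. The entry-point functions, the `ENTRY_POINTS` table and the `--epochs` flag are placeholders for whatever the wrapped framework actually exposes.

```python
import argparse


def train(epochs):
    """Placeholder for the framework's real training entry point."""
    return f"trained for {epochs} epochs"


def evaluate(epochs):
    """Placeholder for the framework's real evaluation entry point."""
    return f"evaluated after {epochs} epochs"


# Every function that may be launched is registered by name.
ENTRY_POINTS = {"train": train, "evaluate": evaluate}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Single script entry point")
    parser.add_argument("entry", choices=sorted(ENTRY_POINTS))
    parser.add_argument("--epochs", type=int, default=1)
    args = parser.parse_args(argv)
    # Dispatch to the selected function with the parsed arguments.
    return ENTRY_POINTS[args.entry](args.epochs)


outcome = main(["train", "--epochs", "3"])
```

In a real driver.py, `main()` would be called under `if __name__ == "__main__":` so that the script path can serve as the task's entry point, with the function selected through the command-line arguments.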