VivaciousWalrus99 any chance the original Task was executed with python2 ?
what do you have for:
```
ls -la /cs/usr/gal.hyams/.trains/venvs-builds/3.7/bin/
```
LudicrousParrot69 this is an implementation issue; this entire page is based on "task comparison". A single Task means a totally different interface for querying the data 🙂
You can run this code from anywhere. The 'base_task_id' is actually the pipeline controller Task ID.
BTW: Next version will have a nicer interface to query it, but this code will work on the current version
Working on it as we speak 🙂 Hopefully in the next release (probably next week)
Hi LudicrousParrot69
A bit of background:
A Task is a job executed in the system (sometimes it is a training experiment, sometimes a controller like the pipeline). Basically every process can be a Task.
Specifically the pipeline controller itself (i.e. the process running the Bayesian optimization) is a Task in the system (i.e. a job running). What it does (using the HyperParameterOptimizer) is clone previously executed Tasks (e.g. training experiments), change their parameters and moni...
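For reference, registering a process as a Task is a one-liner; a minimal sketch, where the project/task names are just placeholders:
```
from clearml import Task

# Any Python process can register itself as a Task in the system;
# names below are placeholders for illustration
task = Task.init(project_name='examples', task_name='my training experiment')
```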
Bad news, there isn't a nice interface to get the table from the Optimizer object (I will make sure we add it, no reason not to).
But you can very easily get all the information you need and more:
```
all_the_tasks = an_optimizer.get_top_experiments(top_k=100)
```
Then for every task in the list you can get all the information:
```
for task in all_the_tasks:
    task_params_as_dict = task.get_parameters()
    task_scalars = task.get_last_scalar_metrics()
```
Basically the Task object enables you to que...
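As a rough sketch of that kind of querying (project/task names are placeholders):
```
from clearml import Task

# Fetch previously executed Tasks by project/name and inspect their
# parameters and reported scalars
tasks = Task.get_tasks(project_name='examples', task_name='my training experiment')
for t in tasks:
    print(t.id, t.get_parameters(), t.get_last_scalar_metrics())
```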
I see what you mean.
```
an_optimizer = HyperParameterOptimizer(
    base_task_id='39d2c27baa8145929b2e21f686a17046',
    hyper_parameters=[],
    objective_metric_title='epoch_accuracy',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=aSearchStrategy,
    max_iteration_per_job=0,
    total_max_jobs=0,
    auto_connect_task=False,
)
print(an_optimizer.get_top_experiments(top_k=5))
```
Is the team open to PRs from external people?
Yes please do! PRs are welcome! I thought we fixed the GitHub readme to reflect it, anyhow I'll make sure we do 🙂
LudicrousParrot69 you mean post execution or while you are executing the hyperparameter optimizer ?
I see, give me a minute to check what would be the easiest
I see now, give me a minute I'll check
LudicrousParrot69 ,
Are you trying to parse the attached table post execution, then put it into a CSV on the HPO Task?
Essentially the example provided just prints out IDs to the log file.
What do you mean?
The easiest would be as an artifact (I think).
Let's assume you put it into a csv file (with pandas or manually)
To upload (from the pipeline Task itself):
```
task.upload_artifact(name='summary', artifact_object='~/my/summary.csv')
```
Then if you want to grab it from anywhere else:
```
task = Task.get_task(task_id='HPO controller Task id here')
my_csv = task.artifacts['summary'].get_local_copy()
```
If you want to store it as a dict it might be even easier:
task.upload_artifact(name='summary', artifa...
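Something along these lines (just a sketch, the keys and values are made-up placeholders):
```
# Store the summary directly as a dict artifact; contents are hypothetical
summary = {'best_task_id': '<task_id>', 'best_epoch_accuracy': 0.93}
task.upload_artifact(name='summary', artifact_object=summary)
```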
They inherit from one another, so it does make sense. Also, add_tags is on the "main" Task and not the backend parent.
BTW: the new documentation should contain a full search over the docstrings
Just fixed, will be merged later. Basically some fields you are not supposed to change post execution (but system tags should be exempt from that). The SDK checks before the backend does, so you get a nice error 🙂 anyhow the backend will obviously allow it
PyCharm does get confused sometimes
LudicrousParrot69 there is already
Task.add_tags
https://github.com/allegroai/clearml/blob/2d561bf4b3598b61525511a1a5f72a9dba74953e/clearml/task.py#L964
Hi LudicrousParrot69
I guess you are right, this is not a trivial distinction:
min: means we are looking for the minimum value of a specific scalar. Meaning for 1.0, 0.5, 1.3 -> the optimizer will get these direct values and will optimize based on them
global min: means the optimizer is getting the running minimum of the specific scalar. With the same example: 1.0, 0.5, 1.3 -> the HPO optimizer gets 1.0, 0.5, 0.5
The same holds for max/global_max, makes sense?
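To illustrate with plain Python (just a sketch, not the optimizer's actual code):
```
# Values reported for the scalar over iterations
scalar_values = [1.0, 0.5, 1.3]

# "min": the optimizer sees every reported value as-is
min_view = list(scalar_values)           # [1.0, 0.5, 1.3]

# "global min": the optimizer sees the running minimum so far
global_min_view = []
current_min = float('inf')
for v in scalar_values:
    current_min = min(current_min, v)
    global_min_view.append(current_min)  # ends up [1.0, 0.5, 0.5]
```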
Bugs, definitely GitHub, this is the easiest to track.
Documentation, if these are small issues, Slack is fine, otherwise, GitHub issue.
Regarding the documentation, we are working on another iteration of improvement, but if you find inaccuracies/broken links please report 🙂
Found it, definitely a bug in the callback; it has no effect on the HPO process itself
Hi LudicrousParrot69
Not sure I follow, is this pyfunc running remotely ?
Or are you looking for interfacing with previously executed Tasks ?
LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it kind of generic "serving" solution?
FYI:
Model artifact is, usually, a weights/model file. The idea is that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) that there is usually a specific piece of code tied to that model that can use it (a.k.a. pyfunc)
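For example, a rough sketch of grabbing a stored model from a previous Task (the task ID is a placeholder, and this assumes the Task.models property of recent clearml versions):
```
from clearml import Task

# Fetch the training Task (placeholder ID) and download the weights
# file of its last output model
train_task = Task.get_task(task_id='<training_task_id>')
model = train_task.models['output'][-1]
weights_path = model.get_local_copy()
```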
A few ideas:
These days everyone is trying to build their models with a generic interface, so that scik...
Doesn't solve the issue if an HPO run is going to take a few days
The HPO Task has a table of the top performing experiments, so when you go to the "Plot" tab you get a summary of all the runs, with the Task ID of the top performing one.
No need to run through the details of the entire experiments, just look at the summary on the HPO Task.
LudicrousParrot69 we are working on adding nested projects, which should help with the humongous mass of experiments the HPO can create. This is a more generic solution for the nesting issue (since nesting inside a table is probably not the best UX solution 🙂)
Are tagging / archiving available in the API for a task?
Everything that the UI can do you can do programmatically 🙂
Tags:
task.add_tags / set_tags / get_tags
Archive:
task.set_system_tags(task.get_system_tags() + ['archived'])
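Putting the two together (the task ID is a placeholder):
```
from clearml import Task

task = Task.get_task(task_id='<your_task_id>')

# Tagging
task.add_tags(['reviewed'])

# Archiving: the 'archived' system tag is what the UI Archive action sets
task.set_system_tags(task.get_system_tags() + ['archived'])
```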
Thanks DefeatedOstrich93
Let me check if I can reproduce it.
DefeatedOstrich93 what do you mean by "I am wondering why do I need to create files before applying diff ?"
`git diff` will not list files unless they are added (they are marked as "untracked"), think temp files, logs, etc. Until you add a file to git it will basically ignore that file. Make sense?