SmarmySeaurchin8
args = parse.parse()
task = Task.init(project_name=args.project or None, task_name=args.task or None)
You should probably look at the docstring :)
:param str project_name: The name of the project in which the experiment will be created. If the project does not exist, it is created. If project_name is None, the repository name is used. (Optional)
:param str task_name: The name of Task (experiment). If task_name is None, the Python experiment
...
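A minimal sketch of the pattern in the snippet above, assuming the script exposes `--project` and `--task` flags (the flag names and both helper functions are hypothetical, only `Task.init(project_name=..., task_name=...)` comes from the thread). Empty or missing values are turned into None so that the defaults described in the docstring kick in:

```python
import argparse


def parse_task_names(argv=None):
    """Parse optional --project/--task flags (hypothetical flag names)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--project", default=None)
    parser.add_argument("--task", default=None)
    args = parser.parse_args(argv)
    # `value or None` maps an empty string to None, so the library defaults apply.
    return {
        "project_name": args.project or None,
        "task_name": args.task or None,
    }


def start_task(argv=None):
    """Create the Task; clearml is imported lazily so parsing works without it."""
    from clearml import Task

    return Task.init(**parse_task_names(argv))
```

Calling `start_task()` with no flags leaves both names as None, which is the case the docstring describes.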
DistressedGoat23
We are running a hyperparameter tuning (using some cv) which might take a long time and might be even aborted unexpectedly due to machine resources.
We therefore want to see the progress.
On the HPO Task itself (not the individual experiments, but the one controlling them all) there is the global progress of the optimization metric. Is this what you are looking for? Am I missing something?
I still can't get it to work... I couldn't figure out how can I change the clearml version in the runtime of the Cleanup Service as I'm not in control of the agent that executes it
Let's take a step back. Let's remove the clearml-services from the docker compose for a second, and run it manually (then you can control everything). Once you have it running manually, let's try to replicate the setup back to the docker compose, make sense ?
Train Data Params/a = {} Train Data Params/b = ...
Then maybe we could "hack" it so that if you edit it in the UI like so:
Train Data Params/a = {'new': 'value'} Train Data Params/b = ...
You end up with:
param = {'a': {'new': 'value'}, 'b' : ... }
What do you think?
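The idea above could be sketched roughly like this. It is only an illustration of the proposal, not how ClearML stores parameters: `to_ui_fields` and `from_ui_fields` are hypothetical helpers, and it assumes each value is shown in the UI as a Python literal string.

```python
import ast


def to_ui_fields(section, param):
    """Flatten one level: {'a': {}} -> {'Train Data Params/a': '{}'}."""
    return {f"{section}/{key}": repr(value) for key, value in param.items()}


def from_ui_fields(section, fields):
    """Rebuild the dict, parsing each (possibly edited) string as a literal."""
    prefix = section + "/"
    return {
        name[len(prefix):]: ast.literal_eval(text)
        for name, text in fields.items()
        if name.startswith(prefix)
    }
```

Editing the string under `Train Data Params/a` to `{'new': 'value'}` and calling `from_ui_fields` gives back a nested dict with the new value under `'a'`.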
Hi LudicrousParrot69
I guess you are right, this is not a trivial distinction:
min: means we are looking for the minimum value of a specific scalar, meaning for 1.0, 0.5, 1.3 -> the optimizer will get these direct values and will optimize based on that
global min: means the optimizer is getting the running minimum of the specific scalar. With the same example: 1.0, 0.5, 1.3 -> the HPO optimizer gets 1.0, 0.5, 0.5
The same holds for max/global_max , make sense ?
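To make the difference concrete, here is a small sketch in plain Python. The function name and the mode labels are just for illustration (they are not taken from the ClearML API); it only reproduces the numbers from the example above.

```python
from itertools import accumulate


def values_seen_by_optimizer(reported, mode):
    """Return what the optimizer would get for each reported scalar value."""
    if mode in ("min", "max"):
        # Direct values, passed through unchanged.
        return list(reported)
    if mode == "global_min":
        # Running minimum over everything reported so far.
        return list(accumulate(reported, min))
    if mode == "global_max":
        # Running maximum over everything reported so far.
        return list(accumulate(reported, max))
    raise ValueError(f"unknown mode: {mode}")


reported = [1.0, 0.5, 1.3]
direct = values_seen_by_optimizer(reported, "min")          # [1.0, 0.5, 1.3]
running = values_seen_by_optimizer(reported, "global_min")  # [1.0, 0.5, 0.5]
```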
By default SSH server is not running in a lot of scenarios (k8s for example, Windows, MacOS)...
IntriguedRat44 how do I reproduce it ?
Can you confirm that marking out the Task.init(..) call will fix it ?
Is this some sort of polling ?
yes
End of the day, we are just worried whether this will hog resources compared to a web-hook ? Any ideas?
No need to worry, it polls every 30 sec, and this is negligible (as a comparison any task will at least send a write request every 30 sec, if not more)
Actually webhooks might be more taxing on the server, as you need to always have a webhook up (i.e. wasting a socket ...)
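For reference, a polling loop of this kind can be sketched as below. This is a generic illustration, not the actual ClearML code: `poll`, `check` and `handle` are hypothetical names, and the 30 second default just mirrors the interval mentioned above.

```python
import time


def poll(check, handle, interval_sec=30.0, max_polls=None, sleep=time.sleep):
    """Call check() every interval_sec and pass each returned item to handle()."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for item in check():
            handle(item)
        polls += 1
        # Sleep between polls, but not after the last one.
        if max_polls is None or polls < max_polls:
            sleep(interval_sec)
    return polls
```

Between polls the process just sleeps, so the cost is one small request per interval, and no socket has to stay open waiting for a callback.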
Hmm CourageousLizard33 seems you stumbled on a weird bug,
This piece of code only tries to get the username of the current UID, but since you are running inside a docker and probably set the UID through the environment, there is no "actual" user with that UID in /etc/passwd, and so it cannot be resolved.
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
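To illustrate the failure mode (this is not the attached fix, just a sketch of the general idea): on Unix, `pwd.getpwuid` raises `KeyError` when the UID has no entry in the password database, so a lookup can fall back to a placeholder name. `resolve_username` and the `uid-` prefix are made up for the example.

```python
import os
import pwd  # Unix only


def resolve_username(uid=None, fallback_prefix="uid-"):
    """Return the username for uid, or a placeholder if it cannot be resolved."""
    uid = os.getuid() if uid is None else uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No entry for this UID (common inside a docker started with an arbitrary UID).
        return f"{fallback_prefix}{uid}"
```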
SmugOx94
after having installed numpy==1.16 in the first case or numpy==1.19 in the second case. Is it correct?
Correct
the reason is simply that I'd like to setup an MLOps system where
I see the rationale here (obviously one would have to maintain their requirements.txt)
The current way trains-agent works is that if there is a list of "installed packages" it will use it, and if it is empty it will default to the requirements.txt
We cou...
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting ? which clearml version are you using ?!
also tried manually adding leap==0.4.1 in the task UI which didn't work.
That has to work, if it did not, can you send the log for the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd...
JitteryCoyote63
are the calls from the agents made asynchronously/in a non blocking separate thread?
You mean like request processing on the apiserver are multi-threaded / multi-processed ?
Hi SquareFish25
Sure, here are a few:
HPO
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Pipeline
https://github.com/allegroai/trains/blob/master/examples/pipeline/pipeline_controller.py
Automation:
https://github.com/allegroai/trains/blob/master/examples/automation/task_piping_example.py
(basically python abusing types/casting where the value can be both str/bool on the same argparser argument)
FiercePenguin76
So running the Task.init from the jupyter-lab works, but running the Task.init from the VSCode notebook does not work?
My question is, which version of docker compose do you need?
Ohh sorry, there is no real restriction, we just wanted easy copy-paste for the installation process.
Hi DeliciousBluewhale87
When you say "workflow orchestration", do you mean like a pipeline automation ?
In that case, no the helm chart does not spin a default agent (You should however spin a service mode agent for running pipelines logic)
GrievingTurkey78 I'm not sure I follow, are you asking how to add additional scalars ?
Hmm you either need to run with SUDO or make sure the running user has docker run permissions
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value in a section name called "new section"
If I install using pip install -r ./requirements.txt then pip installs the packages in the order of the requirements file.
Actually this is not how it works, pip will install in any way it sees fit, and it is not consistent between versions (it has to do with dependency resolving)
However, during the installation process from ClearML, it installs the packages in order UNLESS there's a custom path provided, then it's saved for last
Correct because the custom (I...
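The ordering described above (keep the listed order, but push custom-path entries to the end) can be sketched as a stable partition. This is only an illustration and not the agent's code: `order_for_install` is a hypothetical helper, and the rule used to detect a "custom path" entry is an assumption.

```python
def order_for_install(requirements):
    """Keep the original order, but move custom-path entries to the end."""

    def is_custom_path(line):
        spec = line.strip()
        # Assumed heuristic: local paths and "name @ file://..." style entries.
        return "@ file://" in spec or spec.startswith(("./", "/", "file://"))

    regular = [r for r in requirements if not is_custom_path(r)]
    custom = [r for r in requirements if is_custom_path(r)]
    return regular + custom
```

For example, `["numpy==1.19", "./my_pkg", "scipy"]` becomes `["numpy==1.19", "scipy", "./my_pkg"]`.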
The agent is installing the "Installed Packages" section of the Task (think of it as requirements.txt)
And again, what do you have there? Is it the outcome of the Task.init auto populating it?
Hi JuicyFox94 ,
Actually we just added that :) (still on GitHub, RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
for example train.py & eval.py under the same repo
BTW, VexedKangaroo32 are you using torch launch ?
DisgustedBear75 I think this was a UI bug, they are just releasing a new version that fixes that (i.e. server version), are you running a self-hosted server?