Hi IrateBee40
What do you have in your ~/clearml.conf ?
Is it pointing to your clearml-server ?
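For reference, a minimal sketch of the relevant section in ~/clearml.conf (server URLs and credentials below are placeholders):

    api {
        web_server: http://your-clearml-server:8080
        api_server: http://your-clearml-server:8008
        files_server: http://your-clearml-server:8081
        credentials {
            access_key: "YOUR_ACCESS_KEY"
            secret_key: "YOUR_SECRET_KEY"
        }
    }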
No worries 🙂 glad it worked
or creating a dedicated function, I would suggest also including the actual sampled point in the HP space.
Could you expand ?
This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of metrics with respect to hyper-parameters.
Does this relate to:
https://github.com/allegroai/clearml/issues/430
manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of t...
Hi LovelyHamster1
That is a good point: since the Pipeline kind of assumes the tasks are already in the system, it clones them (leaving you with the original Draft Task).
I think we should add a flag to the pipeline so that if the Task is in draft mode it will use it (instead of cloning it). Since your pipeline seems quite straightforward, I'm not sure you actually need the pipeline controller class; you can perform the entire thing manually, see example here: https://github.com/allegroai/clea...
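If it helps, a rough sketch of that manual approach (project/task names and the queue are placeholders, assuming the tasks already exist in the system):

    from clearml import Task

    # run step 1 on an agent and wait for it to finish
    step1 = Task.get_task(project_name="examples", task_name="step 1")
    Task.enqueue(step1, queue_name="default")
    step1.wait_for_status()

    # then run step 2 the same way
    step2 = Task.get_task(project_name="examples", task_name="step 2")
    Task.enqueue(step2, queue_name="default")
    step2.wait_for_status()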
MysteriousBee56 I would do Task.create()
you can get the full Task internal representation with task.data
Then call task._edit(script={'repo': ...}) to edit/update all the Task entries.
You can check the full details of the task object here: https://github.com/allegroai/trains/blob/master/trains/backend_api/services/v2_8/tasks.py#L954
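Putting it together, a minimal sketch (the repo URL is a placeholder, and _edit is the private call mentioned above):

    from clearml import Task

    # create a new Task entry in the system
    task = Task.create(project_name="examples", task_name="created task")

    # full internal representation of the Task
    print(task.data)

    # update the stored script section (key name as suggested above)
    task._edit(script={"repo": "https://github.com/user/repo.git"})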
BTW: when you have a sample script working, consider PR-ing it, I'm sure it will be useful for others 🙂 (also a great way to get us involved with debuggin...
Hi ConvolutedSealion94
You can archive / delete the SERVING-CONTROL-PLANE Task from the DevOps project in the UI.
Do notice you will need to make sure the clearml-serving is updated with a new session ID, or remove it (i.e. take down the pods / docker-compose).
Make sense ?
Were you able to interact with the service that was spun up? (how was it spun up?)
If i were to push the private package to, say artifactory, is it possible to use that do the install?
Yes, that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65
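For example, something along these lines in the agent's clearml.conf (the Artifactory URL is a placeholder):

    agent {
        package_manager {
            # additional PyPI-compatible index for the agent to resolve packages from
            extra_index_url: ["https://artifactory.example.com/api/pypi/my-pypi/simple"]
        }
    }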
are you referring to extra_docker_shell_script ?
Correct
the thing is that this runs before you create the virtual environment, so then in the new environment those settings are no longer there
Actually that is better, because this is what we need in order to set up pip before it is used. So instead of passing --trusted-host, just do:
extra_docker_shell_script: ["echo '[global] \n trusted-host = pypi.python.org pypi.org files.pythonhosted.org YOUR_S...
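Spelled out in full, a sketch of what that could look like (the Artifactory host is a placeholder, and I'm assuming the echoed text is redirected into pip's config file, with echo -e so the \n is expanded):

    agent {
        extra_docker_shell_script: [
            "echo -e '[global]\ntrusted-host = pypi.python.org pypi.org files.pythonhosted.org artifactory.example.com' > /etc/pip.conf"
        ]
    }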
Hi LovelyHamster1
As you noted, passing overrides in Args/overrides, for example ['training.max_epochs=1000'], should work when running with the agent.
Could you verify with the latest RC? There was a fix to support the latest hydra version:
pip install clearml==0.17.5rc5
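A sketch of passing that override when cloning programmatically, assuming the parameter lives under Args/overrides as above (names and queue are placeholders):

    from clearml import Task

    # clone a template task and override the hydra arguments
    template = Task.get_task(project_name="examples", task_name="hydra training")
    cloned = Task.clone(source_task=template, name="hydra training - 1000 epochs")
    cloned.set_parameter("Args/overrides", "['training.max_epochs=1000']")

    # send the clone to an agent queue
    Task.enqueue(cloned, queue_name="default")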
However, are you thinking of including this callbacks feature in the new pipelines as well?
Can you see a good use case ? (I mean the infrastructure supports it, but sometimes too many arguments is just confusing, no?!)
Hi WackyRabbit7 ,
Running in Docker mode provides you greater flexibility in terms of environment control, from switching cuda versions to pre-compiled packages that are needed (think apt-get), etc. Specifically for DL, if you are using multiple tensorflow versions, they are notorious for compiling against a specific CUDA version, and the only easy way to switch between them would be different dockers. If you are a PyTorch user, then you are in luck, they have all the pytorch ver...
WackyRabbit7 if this is a single script running without a git repo, you will actually get the entire code in the uncommitted changes section.
Do you mean get the code from the git repo itself ?
So when the agent fires up it gets the hostname, which you can then get from the API.
I think it does something like "getlocalhost", a python function that is OS agnostic
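For example, a sketch of pulling the registered workers (and their reported ids, which usually contain the hostname) from the API:

    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    # list all workers currently registered with the server
    for worker in client.workers.get_all():
        print(worker.id)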
I think I found something, let me test my theory
The --template-yaml option allows you to use a full k8s YAML template (the overrides option is just overrides, which do not include most of the configuration options; we should probably deprecate it).
Hmm, I think you should use --template-yaml
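A minimal sketch of such a pod template (image and resource values are placeholders):

    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
          resources:
            limits:
              nvidia.com/gpu: 1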
ExcitedFish86 this is a general "dummy agent" that pulls tasks and executes them (no env created, no code cloned, as you suggested)
how does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple-machine support; to overcome that, I posted a dummy agent that just runs the Tasks.
(Notice...
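For reference, a rough sketch of what such a dummy agent could look like (the queue name is a placeholder, and it assumes the Task's code is already available locally):

    import os
    import subprocess
    from clearml import Task
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    queue_id = client.queues.get_all(name="hpo_queue")[0].id

    while True:
        # pull the next task from the queue
        result = client.queues.get_next_task(queue=queue_id)
        entry = getattr(result, "entry", None)
        if not entry:
            break  # queue is empty
        task = Task.get_task(task_id=entry.task)
        # attach the spawned process to the Task (agent-style) and run its entry point
        env = dict(os.environ, CLEARML_TASK_ID=task.id)
        subprocess.call(["python", task.data.script.entry_point],
                        cwd=task.data.script.working_dir, env=env)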
Hi @<1689808977149300736:profile|CharmingKoala14> , let me double check that
Assuming the git repo looks something like:
.git
readme.txt
module
 +---- script.py
The working directory should be: "."
The script path should be: "-m module.script"
And under Configuration/Args, you should have:
args1 = value
args2 = another_value
Make sense?
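If you prefer setting it up from code, a rough sketch with Task.create (repo URL and names are placeholders; I'm assuming the -m form is accepted for the script entry here just as it is in the UI):

    from clearml import Task

    # register the task to match the repo layout above
    task = Task.create(
        project_name="examples",
        task_name="module entry point",
        repo="https://github.com/user/repo.git",
        script="-m module.script",
        working_directory=".",
    )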
Any idea why the Pipeline Controller is Running despite the task passing?
What do you mean by "the task passing"?
I'm not sure on the frequency it updates though
Hi @<1532532498972545024:profile|LittleReindeer37>
Yes you are correct it should capture the entire jupyter notebook in sagemaker studio.
Just verifying this is the use case, correct ?
Hmm so is the problem having the gituser inside the code? or the k8s_glue print ?
But I think this error has only appeared since I upgraded to version 1.1.4rc0
Hmm let me check something
right click on the experiment, select Reset, now you can edit it.
In terms of creating dynamic pipelines and cyclic graphs, the decorator approach seems the most powerful to me.
Yes that is correct, the decorator approach is the most powerful one, I agree.
Hi SarcasticSquirrel56
But if I then clone the task, and execute it by sending it to a queue, the experiment succeeds,
I'm assuming that on the remote machine the "files_server" is not configured the same way as in the local execution, for example it points to an S3 bucket but the credentials for the bucket are missing.
(in your specific example I'm assuming that the plot is non-interactive which means this is actually a PNG stored somewhere, usually the file-server configuration). Does tha...
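For example, a sketch of the relevant sections in the remote machine's clearml.conf (bucket and credentials are placeholders):

    api {
        files_server: "s3://my-bucket/clearml"
    }
    sdk {
        aws {
            s3 {
                key: "ACCESS_KEY"
                secret: "SECRET_KEY"
            }
        }
    }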