maybe you can check also --version, that returns the help menu
What do you mean? --version on clearml-task?
Hmm CLEARML_CUSTOM_BUILD_OUTPUT
This might be an enterprise feature, I'm not aware of anything in the open source version
Hmm, good point, it should probably return the clearml python version. Is this what you mean?
Hi @<1566959357147484160:profile|LazyCat94>
So it seems the arg parser is detecting the configuration YAML
The first thing I would suggest is changing it to a relative path (so that when launched on remote machines it will find the YAML file)
Regardless, how are you launching the HPO? Are you spinning up a new agent?
(As background, argparse arguments are injected in real time by the agent or the HPO when running as subprocesses.)
This is odd, can you send the full log of the failed Task and, if possible, the code?
What are you seeing in the Task that was cloned (i.e. the one the HPO created not the original training task)?
By that I mean the configuration section: do you have the Args there? (Seems like the pic you attached, but I just want to make sure.)
Also, in the train.py file, do you have Task.init()?
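To make that concrete, here is a rough sketch of what a train.py that plays nicely with the agent/HPO could look like (untested; the project name, task name, argument names and defaults are placeholders, not from this thread):
```python
# Minimal sketch: Task.init() plus a plain argparse parser is enough for the
# agent / HPO to detect the arguments and inject new values at runtime.
# All names and defaults below are placeholders.
import argparse
from clearml import Task


def main():
    task = Task.init(project_name="examples", task_name="training")

    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default="configs/train.yaml")  # relative path
    parser.add_argument("--lr", type=float, default=0.001)
    args = parser.parse_args()

    # ... load the YAML from the relative path and run the training loop ...


if __name__ == "__main__":
    main()
```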
Is there a way to do this all elegantly?
Oh yes there is, this is how the TaskB code will look:
```python
import torch
from clearml import Task

task = Task.init(..., 'task b')
param = {'TaskA': 'TaskA ID HERE'}
task.connect(param)
# grab the last output model registered by TaskA and load a local copy of it
taska_model = Task.get_task(param['TaskA']).models['output'][-1]
model = torch.load(taska_model.get_local_copy())
# ... train ...
torch.save(model, 'model_b.pt')
```
I might have missed something there, but generally speaking this will let you:
Select TaskA as a parameter of the TaskB training process, and automagically register Task A's...
SmallAnt76
See https://clear.ml/pricing/, under "What plan should I choose?"
What you are looking for is the first column, "open-source". Make sense?
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?
I had the same problem again, but within a remote pipeline setup.
Are you saying the issue is not fixed? Can you verify the pipeline & pipeline components are using at least the 1.104rc0 version?
Makes sense
We need to figure out what would be the easiest way to have an "opt-in" for the demo server that will still make it a breeze to quickly test code integration ...
Any suggestions are welcome 🙂
Hi MuddySquid7
You can only add reports (scalars, plots, etc.), though not to a published Task.
If you want to add an artifact, this should work:
```python
prev_task = Task.get_task(task_id='112233')
prev_task.mark_started(force=True)
prev_task.reload()
prev_task.upload_artifact(..., wait_for_upload=True)
prev_task.mark_stopped(force=True)
```
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility 🙂
Please do 🙂
Hi GreasyPenguin14
It looks like you are trying to delete a Task that does not exist
Any chance the cleanup service is misconfigured (i.e. accessing the incorrect server) ?
Hi CleanWhale17 let me see if I can address them all
Email Alert for finished Job (I'm not sure if it's already there).
Slack integration will be public by the end of the weekend 🙂
It is fully customizable / extendable, I'll be happy to help.
DVC
Full dataset tracking is supported using artifacts and the ability to integrate with any central storage (shared folders / S3 / GS / Azure, etc.)
From my experience, it is easier to work with artifacts from Data-Processing Tasks...
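For example, a rough sketch of that pattern (untested; the project/task/artifact names and paths are placeholders):
```python
# Hedged sketch: a data-processing Task uploads the dataset as an artifact,
# and a separate training Task pulls it back. Names/paths are placeholders.
from clearml import Task

# --- data-processing script ---
data_task = Task.init(project_name="examples", task_name="prepare dataset")
data_task.upload_artifact(name="dataset", artifact_object="./data/train.csv")

# --- training script (run separately) ---
train_task = Task.init(project_name="examples", task_name="train")
prep = Task.get_task(project_name="examples", task_name="prepare dataset")
local_csv = prep.artifacts["dataset"].get_local_copy()  # cached local copy
```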
The odd thing is that it was able to authenticate, but then it could not find the Task to delete.
Could it be someone already deleted the Task ?
(BTW: a new version of the cleanup service is in the works 🙂)
That makes total sense, this is exactly an OS scenario for signal 9 🙂
ImmensePenguin78
I think the latest RC adds it, should be released later today 🙂
data is going to S3 as well as EBS. Why so? It should only go to S3
This sounds odd; if this is mounted then it goes to S3 (the link will point to the files server, but it will be stored on the mounted drive, i.e. S3)
wdyt?
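As an aside, if the intent is for uploads to land directly on S3 rather than going through the files server, one option is to set the task's output_uri; a rough sketch (the bucket/prefix below are placeholders, not from this thread):
```python
# Hedged sketch: make S3 the default destination for uploaded models/artifacts.
# Bucket name and prefix are placeholders.
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="train",
    output_uri="s3://my-bucket/clearml",
)
```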
I have it deployed successfully with Istio.
Nice!
The only thing we had to do to get it to work was to modify the nginx.conf in the webserver pod to allow HTTP 1.1
I was under the impression we fixed that, let me check
Hmm SuccessfulKoala55 any chance the nginx http was pushed to v1.1 on the latest cloud helm chart?
EnviousStarfish54 generally speaking the hyper parameters are flat key/value pairs. you can have as many sections as you like, but inside each section, key/value pairs. If you pass a nested dict, it will be stored as path/to/key:value (as you witnessed).
If you need to store a more complicated configuration dict (nesting, lists, etc.), use connect_configuration; it will convert your dict to text (in HOCON format) and store that.
In both cases you can edit the configuration and then when ru...
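Roughly, the two options look like this (untested sketch; the parameter and configuration values are made up):
```python
# Hedged sketch contrasting the two approaches described above.
# All values below are placeholders, not from the original thread.
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# Flat key/value pairs -> hyperparameters; nested keys get flattened to "path/to/key".
params = {"lr": 0.001, "batch_size": 32}
task.connect(params)

# Nested / complex dict -> stored as an editable text configuration object.
config = {"model": {"layers": [64, 64], "dropout": 0.1}}
config = task.connect_configuration(config, name="model config")
```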
the unclear part is how do I sample another point in the optimization space from the optimizer
Just so I'm clear on the issue: do you want multiple machines to access the internals of the optimizer class? Or do you just want a way to understand what the optimizer's sampling space is (i.e. the parameters and options per parameter)?
the optimizer such that the study object of the optimizer keeps track of the results and the next sample will be aware of all previous studies
This is done from the optimizer side, by sampling the scalars reported by any experiment the optimizer created.
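For reference, the scalar the optimizer samples is just whatever the executed task reports, something like this sketch (title/series/values are placeholders):
```python
# Hedged sketch: the training task reports the objective scalar that the
# optimizer later samples. Title, series and values are placeholders.
from clearml import Task

task = Task.init(project_name="examples", task_name="train")
task.get_logger().report_scalar(
    title="validation", series="loss", value=0.123, iteration=10
)
```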
I am looking for a way to manually sample and report from and to the optimizer...
.. I can avoid running unnecessary common heavy setup for a lightweight experiment
Maybe it makes sense to inherit from the Optimizer and add ...
But it does make me think: what if, instead of changing the optimizer, I launch a few workers that "pull" enqueued tasks and then report values for them in such a way that the optimizer is triggered to collect the results? Would that be possible?
But this is exactly how the optimizer works.
Regardless of the optimizer (OptimizerOptuna or OptimizerBOHB), both set the next step based on the scalars reported by the tasks executed by agents (on remote machines), then decide on the next set of para...
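For completeness, a rough sketch of driving it from a controller task (untested; the base task id, queue name, metric names, parameter names and ranges are placeholders):
```python
# Hedged sketch of the controller side: the optimizer clones the base task,
# enqueues the clones for agents, and samples the scalars they report.
# All ids, names and ranges below are placeholders.
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

controller = Task.init(project_name="examples", task_name="HPO controller")

optimizer = HyperParameterOptimizer(
    base_task_id="BASE_TRAINING_TASK_ID",     # the training task to clone
    hyper_parameters=[
        UniformParameterRange("Args/lr", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",                # agents pull the cloned tasks here
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```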
Hi ScatteredClams84
Is there any parameter that adjusts the "number of files that can be stored in the cache"? I am using clearml python version 1.0.3 to upload artifacts and get the artifacts back from a task.
Yes you are correct, the default value is 100 entries.
You can configure it in the clearml.conf, just add:
```
sdk.storage.cache.default_cache_manager_size = 1000
```
or from code:
```python
from clearml.storage.cache import CacheManager
CacheManager.get_cache_manager(cache_file_...
```
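A possible completion of that truncated snippet (I'm assuming the keyword argument is cache_file_limit; please verify it against your clearml version):
```python
# Hedged completion of the truncated snippet above. The keyword argument name
# (cache_file_limit) is an assumption -- check it for your clearml version.
from clearml.storage.cache import CacheManager

CacheManager.get_cache_manager(cache_file_limit=1000)
```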
The difference is that I want a single persistent machine, with a single persistent python script that can pull execute and report multiple tasks
So basically, instead of using the agent, simply spin up a sub-process?
I have the same offset (that appears after each failure in my scalars).
Hmm, I actually would think this is the "correct" behavior, but I see your point:
Any chance you can open a GH issue ?