CrookedWalrus33 can you send the entire log? (you can DM it to me)
```
from clearml.automation.parameters import LogUniformParameterRange
sampler = LogUniformParameterRange(name='test', min_value=-3.0, max_value=1.0, step_size=0.5)
sampler.to_list()
Out[2]:
[{'test': 1.0},
 {'test': 3.1622776601683795},
 {'test': 10.0},
 {'test': 31.622776601683793},
 {'test': 100.0},
 {'test': 316.22776601683796},
 {'test': 1000.0},
 {'test': 3162.2776601683795}]
```
For the on-prem you can check the k8s helm charts, it can spin agents for you (static agents).
For the GKE the best solution is the k8s glue:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
WackyRabbit7 this is funny, it is not ClearML providing this offering
some generic company grabbed the open-source and put it there, which they should not 🙂
MuddySquid7
are you saying that for some reason the models pick the artifacts ? Is that reproducible ? (they are two different things)
Can you see the df.pkl on the Models section of the Task (in the UI) ?
This should have worked with the latest clearml RC.
And you verified it is not working?
If I call explicitly `task.get_logger().report_scalar("test", str(parse_args.local_rank), 1., 0)`, this will log as expected one value per process, so reporting works
JitteryCoyote63 and do prints get logged as well (from all processes) ?
Apologies on the typo ;)
There is also a global "running_remotely" but it's not on the task
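For reference, a minimal sketch of checking it; the exact import path is from memory, so treat it as an assumption:
```python
# assumption: the global helper is importable from clearml.config
from clearml.config import running_remotely

if running_remotely():
    print("executing inside an agent")
else:
    print("running locally")
```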
okay that makes sense, if this is the case I would just use `clearml-agent execute --id <task_id here>` to continue the training Task.
Do notice you have to reload your last checkpoint from the Task's models/artifacts to continue 🙂 (see the sketch below)
Last question, what is the HPO optimization algorithm, is it just grid/random search or bohb/optuna? If it is the latter, how do you make it "continue"?
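For the checkpoint reload, something along these lines should work (a rough sketch; the task id, the framework-specific loading call, and the assumption that the checkpoint was registered as an output model are placeholders):
```python
# rough sketch: fetch the continued Task's latest output model and download it
from clearml import Task

task = Task.get_task(task_id="<task_id here>")
last_model = task.models["output"][-1]          # most recently registered checkpoint
checkpoint_path = last_model.get_local_copy()   # downloads (or returns the cached copy)
# then load it with your framework, e.g. torch.load(checkpoint_path)
```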
I see, actually what you should do is a fully custom endpoint,
- preprocessing -> download video
- processing -> extract frames and send them to Triton with gRPC (see below how)
- post processing, return a human readable answer
Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal `_process...
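As a rough illustration of the processing step (frame extraction plus a Triton gRPC call via `tritonclient`), assuming a plain image model; the model name, input/output names and input shape are placeholders:
```python
import cv2
import numpy as np
import tritonclient.grpc as grpcclient

def process_video(video_path, triton_url="localhost:8001"):
    client = grpcclient.InferenceServerClient(url=triton_url)
    cap = cv2.VideoCapture(video_path)
    outputs = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # assume the model expects a single NCHW float32 image
        blob = cv2.resize(frame, (224, 224)).astype(np.float32)
        blob = np.transpose(blob, (2, 0, 1))[None, ...]
        infer_input = grpcclient.InferInput("input", list(blob.shape), "FP32")
        infer_input.set_data_from_numpy(blob)
        response = client.infer(model_name="my_model", inputs=[infer_input])
        outputs.append(response.as_numpy("output"))
    cap.release()
    return outputs
```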
GrievingTurkey78 Actually it is in progress, see the GitHub issue for details:
https://github.com/allegroai/trains/issues/219
Hi CheerfulGorilla72
I think more details are needed here :)
but I believe it should have worked with 0.14.1 as well
Correct
Eg, I'm creating a task using `clearml.Task.create`, often it doesn't get the git diff correctly,
ShakyJellyfish91 Task.create does not store any "git diff" automatically, is there a reason not to use Task.init?
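For comparison, a minimal sketch: Task.init auto-detects the repository, branch, commit and uncommitted changes of the calling script, which Task.create does not do (project/task names here are just examples):
```python
from clearml import Task

# auto-logs repo, branch, commit and the "git diff" of the current script
task = Task.init(project_name="examples", task_name="my training run")
```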
This only talks about bugs reporting and enhancement suggestions
I'll make sure this is fixed 🙂
LOL yes 🙂
just make sure it won't be part of the uncommitted changes of the AWS autoscaler 🙂
This is definitely a bug, in the superclass it should have the same condition (the issue is checking if you are trying to change the "main" task)
Thanks ApprehensiveFox95
I'll make sure we push a fix 🙂
Hi JitteryCoyote63
If you want to stop the Task, click Abort (Reset will not stop the task or restart it, it will just clear the outputs and let you edit the Task itself).
I think we witnessed something like that due to DataLoader multiprocessing issues, and I think the solution was to add `multiprocessing_context='forkserver'` to the DataLoader:
https://github.com/allegroai/clearml/issues/207#issuecomment-702422291
Could you verify?
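Something like this, assuming a standard PyTorch DataLoader (the dataset and worker count are placeholders):
```python
from torch.utils.data import DataLoader

loader = DataLoader(
    my_dataset,            # your torch Dataset instance
    batch_size=32,
    num_workers=4,
    multiprocessing_context="forkserver",  # avoid fork-related issues with the reporting threads
)
```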
Hi MinuteStork43
`Failed uploading: cannot schedule new futures after interpreter shutdown`
This is odd where / when exactly are you trying to upload it?
I see.
You can get the offline folder programmatically then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):
`task.get_offline_mode_folder()`
You can also have a soft link of the offline folder (if you are working on a linux machine):
`ln -s myoffline_folder ~/.trains/cache/offline`
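A rough sketch of the full offline round-trip (project/task names are just examples):
```python
from clearml import Task

# record everything locally instead of sending it to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
offline_folder = task.get_offline_mode_folder()  # the folder holding the session
task.close()

# later, on a connected machine, import the folder (or its zipped copy)
Task.set_offline(offline_mode=False)
Task.import_offline_session(str(offline_folder))
```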
My question was about the automatically uploaded models, those that were uploaded by the clearml client.
So there is a way to add a callback, would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
```
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
```
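If I remember correctly it is registered on the WeightsFileHandler; the exact registration call is from memory, so treat it as an assumption:
```python
# assumption: add_pre_callback hooks the callback in before each auto-logged model is stored
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"
    return model_info

WeightsFileHandler.add_pre_callback(callback)
```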
Could it be someone deleted the file? this is inside the temp venv folder but it should not get there
Yes 🙂 documentation is being worked on ... Anyhow we will be uploading a new documentation site soon (hopefully in a week or so), putting it all on GitHub so it will be easier for the community to edit and add more
Hi RoundMosquito25
The main problem here is that there is no way to know, before running the Task, how much memory it would need ... And without that parameter, maximizing GPU utilization is quite challenging. wdyt?
By default SSH server is not running in a lot of scenarios (k8s for example, Windows, MacOS)...