Sorry for pinging you on this old thread.
...
And what was the learning strategy? ADAM? RMSProp?
Sorry, missed it...
I would actually use the HPO to test various setups (it uses Optuna under the hood, so really SOTA Hyperband Bayesian optimization on top of them)
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Hmm, can you try with an additional configuration? Next to "secure: true" in your clearml.conf, can you add "verify: false"?
Hi TrickyRaccoon92 , TB is automatically collected and converted into data stored on the system. The UI uses plotly to display the data itself (on your web browser).
You still have the original TB protobuf file, if you want to dive deeper and debug the data (it is not automatically uploaded, but some users do upload it as an additional artifact on the experiment)
Make sense ?
Hi MelancholyChicken65
I'm assuming you need the ssh protocol, not https user/token; set this one to true: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/76c533a2e8e8e3403bfd25c94ba8000ae98857c1/docs/clearml.conf#L39
Hi JumpyDragonfly13
- Is "10.19.20.15" accessible from your machine (i.e. can you ping it)?
- Can you manually SSH to 10.19.20.15 on port 10022 ?
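A quick way to script the second check is a plain TCP connect. This is only a minimal sketch: `can_connect` is a hypothetical helper name, it only tells you whether the port accepts connections (it does not do ping/ICMP and does not test SSH authentication).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unreachable hosts
        return False
```

For the case above you would call `can_connect("10.19.20.15", 10022)`; a `False` result suggests a network/firewall issue rather than an SSH credentials issue.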
Hi TightElk12
One option would be to call task.close() at the end of each step and Task.init at the beginning of the next.
Will that do?
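Roughly something like this, as a sketch only. `run_steps`, the `steps` list and the `init_task` parameter are made up for illustration (the parameter is there so the loop can be tried without a server); by default it falls back to `Task.init`, which needs the clearml package and a configured server.

```python
def run_steps(steps, project_name, init_task=None):
    """Run every (name, callable) pair in `steps` under its own Task.

    `init_task` is a hypothetical hook for testing; by default Task.init is used.
    """
    if init_task is None:
        from clearml import Task  # requires clearml to be installed and configured
        init_task = Task.init

    for name, step in steps:
        # open a new Task at the beginning of the step
        task = init_task(project_name=project_name, task_name=name)
        try:
            step()
        finally:
            # close the Task at the end of the step, even if the step failed
            task.close()
```

Usage would be along the lines of `run_steps([("prepare", prepare), ("train", train)], "my_project")`, where `prepare` and `train` are your own functions.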
@<1558624430622511104:profile|PanickyBee11> how are you launching the code on multiple machines ?
are they all reporting to the same Task?
hmm... try to run the trains-agent from the ml environment with "system_site_packages: true", it might do the trick. Anyhow please let me know if it worked
TroubledHedgehog16 generally speaking you can expect about 10 API calls per minute if you have many reports, and about 3 per minute with few reports. We just optimized the SDK so that in cases where there are lots of consecutive reports they are better batched; I would recommend the latest RC
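To put those rough per-minute numbers in perspective, here is the arithmetic for a task that runs around the clock. The 10 and 3 are just the ballpark figures from above, not measured values, and `api_calls_per_day` is a hypothetical helper.

```python
MINUTES_PER_DAY = 60 * 24  # 1440


def api_calls_per_day(calls_per_minute):
    """Scale a per-minute API call rate to a full day of continuous running."""
    return calls_per_minute * MINUTES_PER_DAY


heavy = api_calls_per_day(10)  # many reports: 14400 calls per day
light = api_calls_per_day(3)   # few reports: 4320 calls per day
print(f"heavy reporting: {heavy} calls/day, light reporting: {light} calls/day")
```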
Yey! okay let me make sure we add this feature to the Task.init arguments so one can control it from code
It seems like the web server doesn't log the call to AWS, I just see this:
This points to the browser actually sending the AWS delete command. Let me check with FE tomorrow
@<1657918706052763648:profile|SillyRobin38> out of curiosity did you compare performance of tensorrt-llm vs vllm ?
(the jury is still out on that, just wondered if you had a chance)
Hi SarcasticSparrow10
Is it better to post such questions on Stackoverflow so they benefit everybody?
Yes, I think you are correct, it would. Please do
Try to do "reuse_last_task_id='task_id_here'", to specify the exact Task to continue (click on the ID button next to the task name in the UI)
If this value is true it will try to continue the last task on the current machine (based on the project/name combination); if the task was executed on another machine, it will just start a ...
Hi SucculentBeetle7
Sure, check the latest implementation, it now has "start" and "start_remotely"
SteepDeer88
Try the following:
` Task.add_requirements("pycocotools-windows", '; platform_system == "Windows"')
Task.add_requirements("pycocotools", '; platform_system != "Windows"')
Task.init(...) `
You should see in your "installed packages" something like:
` pycocotools-windows ; platform_system == "Windows"
pycocotools ; platform_system != "Windows" `
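For reference, the `platform_system` marker is the value of Python's `platform.system()` on the machine doing the install, so the two lines resolve roughly like the sketch below. `pycocotools_requirement` is a hypothetical helper just to show the selection; pip does the real evaluation.

```python
import platform


def pycocotools_requirement(system=None):
    """Return the package the environment markers above would select.

    `system` defaults to platform.system(), which is what the
    `platform_system` marker is compared against.
    """
    system = platform.system() if system is None else system
    return "pycocotools-windows" if system == "Windows" else "pycocotools"


print(pycocotools_requirement())
```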
why are all defined components shown in the UI Results/Plots/PipelineDetails/ExecutionDetails section? Shouldn't it make more sense to show only the ones that are used in that pipeline?
They are listed there (because of the decorator, you basically "say" these are steps so they are listed), the actual resolving (i.e. which steps are actually being called) is done in "real-time"
Make sense ?
@<1541954607595393024:profile|BattyCrocodile47> first let me say I ❤ the dark theme you have going on there, we should definitely add that
When I run
python set_triggers.py; python basic_task.py
, they seem to execute, b
Seems like you forgot to start the trigger, i.e.
None
(this will cause the entire script of the trigger inc...
Hi @<1697056701116583936:profile|JealousArcticwolf24>
Awesome deployment 🤩
Yes if you need another scalable model serving you can just run another instance of the clearml-serving-inference
https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/docker/docker-compose.yml#L77
So you end up with two of them, one per models environ...
Hi PerplexedCow66
I'm assuming an extension for this:
https://github.com/allegroai/clearml-serving/issues/32
Basically JWT can be used as a general access/block for all endpoints, which is most efficiently handled by the k8s loadbalancer (nginx/envoy),
but if you want a per-endpoint check (or maybe do something based on the JWT values),
see adding JWT to FastAPI here:
https://fastapi.tiangolo.com/tutorial/security/oauth2-jwt/?h=jwt#oauth2-with-password-and-hashing-bearer-with-jwt-tokens
T...
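To give a feel for what a per-endpoint check based on the JWT values could look like, here is a stdlib-only sketch of HS256 verification. This is not the clearml-serving implementation and not the FastAPI tutorial code (that one uses a JWT library); every function name and the `endpoints` claim are assumptions for illustration, and in a real deployment a maintained JWT library is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop the base64 padding, so restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build an HS256 token (only used here to produce a token to check)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url_encode(json.dumps(claims).encode("utf-8"))
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_b64url_encode(_signature(signing_input, secret))}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims of a valid token, raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError unless 3 parts
    header = json.loads(_b64url_decode(header_b64))
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if not isinstance(claims, dict):
        raise ValueError("claims must be a JSON object")
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if exp is not None and (not isinstance(exp, (int, float)) or current >= exp):
        raise ValueError("token expired or invalid exp")
    return claims


def endpoint_allowed(token: str, secret: bytes, endpoint: str,
                     now: Optional[float] = None) -> bool:
    """Per-endpoint check: the hypothetical `endpoints` claim must list the endpoint."""
    try:
        claims = verify_hs256(token, secret, now=now)
    except ValueError:
        return False
    allowed = claims.get("endpoints")
    return isinstance(allowed, list) and endpoint in allowed
```

The idea would be to call something like `endpoint_allowed(token, secret, "model_a")` before serving the request for that endpoint, and reject the call when it returns `False`.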
Okay, make sure that in your trains.conf on all the trains-agent machines you add the following: agent.extra_docker_arguments: ["-v", "/etc/hosts:/etc/hosts",]
actually no it is not, alpine is Not a good baseline, it is very very slim, missing a ton of stuff.
I would use bullseye or slim (depending how many aux things you need on the container)
https://hub.docker.com/_/python/tags?page=1&name=bullseye
https://hub.docker.com/_/python/tags?page=1&name=slim-bullseye
Hi JitteryCoyote63
Or even better: would it be possible to have a support for HTML files as artifacts?
If you report html files as debug media they will be previewed, as long as the link is accessible.
You can check this example:
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
In the artifacts, I think html files are also supported (maybe not previewed as nicely, but clickable).
Regarding the s3 link, I think you are supposed to get a popup window as...
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think that what you are looking for is multi-machine cache (which is fully supported). Basically mount an NFS/SMB folder from a NAS to any of those machines, configure the cache folder to point to it, and now you do not need to worry about affinity,
no?
Is there a way to group A and B into a sub-pipeline, h...
C will be submitted to a different queue and I don't care as much
Is there a way to define "task affinity" in this way?
Hi RoughTiger69 ,
when you say Task affinity, you mean, I want C to be executed next to A/B ? Affinity as a concept doesn't really exist, it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one of the queues (in theory you might be able to programmatically control the Queue of C), wdyt?
DistressedGoat23 you are correct, since at the end this becomes a plotly object the extra_layout is for general purpose layout, but this specific entry is next to the data. Bottom line, can you open a github issue, so we do not forget to fix? In the meantime you can use the general plotly reporting as SweetBadger76 suggested
Hmm, I really like this one:
https://chart-studio.plotly.com/~empet/14632/plotly-joyplotridgelines/#plot
What I'm thinking is a global setting basically telling the TB binding layer to always do ridgeline instead of 3d surface.
wdyt?
Hi @<1523711619815706624:profile|StrangePelican34>
Hmm, I think this is missing from the docs, let me ping the guys about that
orchestration module
When you previously mentioned cloning the Task in the UI and then running it, how do you actually run it?
regarding the exception stack
It's pointing to a stdout that was closed?! How could that be? Any chance you can provide a toy example for us to debug?