ElegantKangaroo44 definitely a bug, will be fixed in 0.15.1 (release in a week or so)
https://github.com/allegroai/trains/issues/140
Are tagging / archiving available in the API for a task?
Everything that the UI can do, you can do programmatically 🙂
Tags:
task.add_tags / set_tags / get_tags
Archive:
task.set_system_tags(task.get_system_tags() + ['archived'])
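For example, a minimal sketch of both, using the clearml Task API (the project/task names here are placeholders):

from clearml import Task

# Fetch an existing task (names are placeholders)
task = Task.get_task(project_name='examples', task_name='my experiment')

# Tags
task.add_tags(['baseline'])        # append tags
task.set_tags(['baseline', 'v2'])  # overwrite the tag list
print(task.get_tags())

# Archive: 'archived' is a system tag, so append it to the existing ones
task.set_system_tags(task.get_system_tags() + ['archived'])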
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm so maybe a "glob"-like parameter for get_local_copy(select_filter='subfolder/*')?
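To illustrate the proposal (note that select_filter is hypothetical, it does not exist in the current API):

from clearml import Dataset

# Hypothetical: fetch only the files matching a glob pattern,
# instead of materializing the entire dataset locally
dataset = Dataset.get(dataset_project='examples', dataset_name='my dataset')
local_subset = dataset.get_local_copy(select_filter='subfolder/*')  # proposed, not yet implemented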
ElegantKangaroo44 my bad 🙂 I missed the nuance in the description
There seems to be an issue in the web UI -> viewing plots in "view in experiment table" doesn't respect the "scalars to display" one sets when viewing in "view in fullscreen".
Yes, the info-panel does not respect the full-view selection. It's on the to-do list to add this ability, but it is still not implemented...
ElegantKangaroo44 it seems to work here?!
https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/48907bb6e870479f8b230e6b564cd52e/output/metrics/plots
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
Yes, there was a bug where it was always cached; just upgrade clearml: pip install git+
ContemplativeCockroach39 unfortunately not directly as part of clearml 🙂
I can recommend the Nvidia Triton inference server (I'm hoping we will have an out-of-the-box integration soon).
Meanwhile you can run it manually, see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
Docker container here:
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
I am running from a notebook and the cell has returned
Well the Task will close when you shut down the notebook 🙂
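If you want the Task closed without shutting the notebook down, you can also close it explicitly (a minimal sketch, names are placeholders):

from clearml import Task

task = Task.init(project_name='examples', task_name='notebook run')
# ... notebook cells run the experiment ...

# Flush all reports and close the task explicitly,
# instead of waiting for the notebook kernel to shut down
task.close()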
What I want is to manually provide a name to each series, equal to the subject name (Subject 1, Subject 2, etc.)
They appear as they are reported to TB. I think this is a PyTorch Lightning thing... If you look at the TB output it produces, you will get the same naming scheme, no?!
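That said, if you want full control over the series names, you can report the scalars yourself with the clearml Logger instead of going through TB (a sketch with dummy values):

from clearml import Task

task = Task.init(project_name='examples', task_name='per-subject metrics')
logger = task.get_logger()

# One scalar graph titled 'score', with an explicitly named series per subject
for iteration in range(10):
    for subject_idx in range(1, 4):
        logger.report_scalar(
            title='score',
            series='Subject {}'.format(subject_idx),
            value=0.1 * iteration * subject_idx,  # dummy value
            iteration=iteration,
        )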
oh, if this is the case, why not use the "main" server?
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? Do they need their own environment?
What would be the easiest pipeline interface to run them locally? (I would love it if we could support this workflow; it seems you are not alone in this approach, and of course you can always run them remotely, i.e. clone the pipeline and launch it on an agent)
Is this a common case? Maybe we should change the default of the run_pipeline_steps_locally argument to False?
(The idea of run_pipeline_steps_locally=True is that it makes it easier to debug the entire pipeline on the same machine)
I started running it again and it seems to have passed the phase where it failed last time
Yey!
Yes it is a common case....
I have the feeling ShinyLobster84 WackyRabbit7 you are not alone in this one 🙂 let me make sure we change the default value of run_pipeline_steps_locally to False, so the code looks cleaner
Ohh, sorry 🙂
:param run_pipeline_steps_locally: (default False) If True, run the pipeline steps themselves locally as a subprocess (use for debugging the pipeline locally, notice the pipeline code is expected to be available on the local machine)
WackyRabbit7
we did execute locally
Sure, instead of pipe.start() use pipe.start_locally(run_pipeline_steps_locally=False), this is it 🙂
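A minimal sketch of that flow (the step is assumed to be an existing task; all names are placeholders):

from clearml import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0.0')
pipe.add_step(
    name='stage_data',
    base_task_project='examples',
    base_task_name='data preprocessing',  # existing task cloned as the step
)

# Run the controller logic on this machine; with
# run_pipeline_steps_locally=False the steps are still sent to their queues,
# with True they run here as subprocesses (easier to debug)
pipe.start_locally(run_pipeline_steps_locally=False)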
(It would be nice to have all the PyPI releases tagged in GitHub btw)
I wanted to say, we listen ... and point to the tag, but for some reason it was not pushed LOL.
To be honest, I'm not sure I have a good explanation on why ... (unless on some scenarios an exception was thrown and caught silently and caused it)
The experiment finished completely this time again
With the RC version or the latest?
BTW:
Just making sure, 74 was not supposed to be the last checkpoint (in other words it is not stuck on leaving the training process, but actually in the middle)
I was unable to reproduce, but I added a few safety checks. I'll make sure they are available on the master branch in a few minutes; could you maybe rerun after?
Hmmm that sounds like a good direction to follow, I'll see if I can come up with something as well. Let me know if you have a better handle on the issue...
ElegantKangaroo44 I tried to reproduce the "services mode" issue with no success. If it happens again let me know, maybe we will better understand how it happened (i.e. the "master" trains-agent gets stuck for some reason)
but I'm pretty confident it was the size of the machine that caused it (as I mentioned, it was a 1 CPU / 1.5 GB RAM machine)
I have the feeling you are right 🙂
Hi ElegantKangaroo44,
This is basically the average number of experiments running, the number of projects, and the number of users. I think that's about it, nothing like Google Analytics stuff. It is mainly aimed at giving some idea of how large the usage is. Sounds reasonable?