RoundMosquito25 actually you can:
    # check the state every minute
    while not an_optimizer.wait(timeout=1.0):
        running_tasks = an_optimizer.get_active_experiments()
        for task in running_tasks:
            task.get_last_scalar_metrics()
            # do something here
Baseline reference:
https://github.com/allegroai/clearml/blob/f5700728837188d7d6005726c581c9d74fd91164/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L127
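The polling loop above could be wrapped in a small helper, roughly like this. It is only a sketch: it assumes the optimizer behaves like clearml's HyperParameterOptimizer, where wait(timeout=...) takes minutes and returns True once the optimization has finished and False on timeout. The names poll_optimizer, on_metrics and max_rounds are made up for the example.

```python
def poll_optimizer(optimizer, on_metrics, timeout_min=1.0, max_rounds=None):
    """Poll running experiments until the optimizer reports it has finished.

    Assumption: optimizer.wait(timeout=...) returns True when the optimization
    is done and False when the timeout expired (as HyperParameterOptimizer does).
    `on_metrics` and `max_rounds` are hypothetical names for this sketch.
    """
    rounds = 0
    while not optimizer.wait(timeout=timeout_min):
        # Still running: look at every active experiment
        for task in optimizer.get_active_experiments():
            on_metrics(task, task.get_last_scalar_metrics())
        rounds += 1
        # Optional safety stop so the loop cannot run forever
        if max_rounds is not None and rounds >= max_rounds:
            break
    return rounds
```

You would call it with the optimizer from the example script and a callback that does whatever you need with the metrics, e.g. poll_optimizer(an_optimizer, lambda task, metrics: print(task.id, metrics)).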
HugeArcticwolf77 you can add --services-mode to the agent, and it will basically keep on spinning Tasks in parallel (unfortunately the open source version does not include a way to limit it to a maximum of concurrent Tasks)
you should have a gpu argument there, set it to true
BTW:
I have very small text files that make up a dataset and compression seems to take most of the upload time
How long does it take? and how come it is not smaller in size ?
Now I'm just wondering if I could remove the PIP install at the very beginning, so it starts straightaway
AbruptCow41 CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 does exactly that. BTW, I would just set the venv cache, which means it will be able to restore the entire thing (even if you have changed the requirements)
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L115
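If you launch the agent from a script, setting that variable could look roughly like this. A sketch only: agent_env is a made-up helper name, and the queue name in the usage note is an assumption.

```python
import os


def agent_env(skip_env_install=True, base_env=None):
    """Return a copy of the environment for launching clearml-agent.

    `agent_env` is a hypothetical helper for this sketch. It only sets
    CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, the variable mentioned above.
    """
    env = dict(os.environ if base_env is None else base_env)
    if skip_env_install:
        env["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"
    return env
```

Then something like subprocess.run(["clearml-agent", "daemon", "--queue", "default"], env=agent_env()) would start the agent with the install step skipped ("default" is just a placeholder queue).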
yes you are correct, OS environment: TRAINS_PROC_MASTER_ID=1:task_id_here
CourageousLizard33 so you have a Linux server running Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8GB is a good recommendation for a VM, it's reasonable enough to start with. I'll make sure we add it to the docs
yep, that's the reason it is failing, how did you train the model itself ?
Is there a way to move existing pipelines between projects?
You should be able to, go to your settings page and turn on "show hidden folders"
Then go to your project, you should see a ".pipeline" sub project there, right click it and move it to another folder.
CourageousLizard33 if the two series are on the same graph, just click on the series in the legend, you can enable/disable it, and the scale will adjust automatically.
Regarding grouping, this is a feature that can be turned off. The idea is that we split the tag into title/series, so if you have the same prefix the TF scalars are grouped on the same graph, otherwise they will be on a graph with a different title. That said, you can force it to have a series per graph like in TB. Makes sense?
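To make the grouping idea concrete, here is a rough sketch. The exact rule ClearML applies is not spelled out above, so this assumes the tag is split at the first "/" and that a tag without a separator becomes its own title; split_tag and group_by_title are made-up names.

```python
from collections import defaultdict


def split_tag(tag, sep="/"):
    """Split 'prefix/name' into (title, series).

    Assumption for this sketch: split at the first separator; a tag with
    no separator is used as both title and series.
    """
    title, found, series = tag.partition(sep)
    return (title, series) if found else (tag, tag)


def group_by_title(tags):
    """Group series names under their graph title."""
    graphs = defaultdict(list)
    for tag in tags:
        title, series = split_tag(tag)
        graphs[title].append(series)
    return dict(graphs)
```

With tags like "loss/train" and "loss/val", both series end up under the "loss" title, while a tag such as "accuracy" gets a graph of its own.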
Hi TenseOstrich47 what's the matplotlib version and clearml version you are using ?
TenseOstrich47 notice:
    task.logger.report_matplotlib_figure(
        title=f"Performance Heatmap - {name}",
        series="Device Brand Predictions",
        iteration=0,
        figure=figure,
        report_image=True,
    )
report_image=True means it will be uploaded as an image not a plot (like imshow), the default is False , which would put it under Plots section
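A small wrapper makes the image/plot switch explicit. This is a sketch, assuming `logger` is a clearml Logger (or anything exposing report_matplotlib_figure with the same keyword arguments); report_heatmap and as_image are made-up names.

```python
def report_heatmap(logger, figure, name, as_image=False, iteration=0):
    """Report a matplotlib figure either as a plot or as an image.

    `report_heatmap` and `as_image` are hypothetical names for this sketch.
    as_image=False (the default) sends the figure to the Plots section,
    as_image=True uploads it as an image instead.
    """
    logger.report_matplotlib_figure(
        title=f"Performance Heatmap - {name}",
        series="Device Brand Predictions",
        iteration=iteration,
        figure=figure,
        report_image=as_image,
    )
```

So report_heatmap(task.get_logger(), figure, name) would land under Plots, and passing as_image=True would reproduce the call shown above.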
Could you add a few prints, and see where it hangs ? there's no reason for it to hang (even the plot upload is done ...
- try with the latest RC 1.8.1rc2
it feels like after git clone, it spends minutes without outputting anything
yeah that is odd, can you run the agent with --debug (add before the daemon command), and then at the end of the command add --foreground
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
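For reference, the flag order described above could be assembled like this. A minimal sketch: debug_daemon_command is a made-up helper and the queue name is whatever queue you are debugging.

```python
def debug_daemon_command(queue):
    """Build the argv for a foreground agent with debug logging.

    Hypothetical helper for this sketch. Note the order: --debug goes
    before the `daemon` command, --foreground goes at the very end.
    """
    return ["clearml-agent", "--debug", "daemon", "--queue", queue, "--foreground"]
```

You can pass the result to subprocess.run, or just join it with spaces and paste it into a terminal.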
I'm assuming those errors are from the triton containers? were you able to run the simple pytorch mnist example serving from the repo?
Hi NuttyCamel41
How are you creating the model? specifically what do you have in "config.pbtxt"
specifically any python code should be in the pre/post processing code (actually not running on the GPU instance)
TartSeal39 please let me know if it works, conda is a strange beast and we do our best to tame it.
Specifically when you execute manually on a conda env we collect (separately) the conda packages & the python packages (so later we can replicate on both conda & pip, or at least do our best)
Are you running both development env and agent with conda ?
Hi CluelessFlamingo93
What do you mean? what's the difference between ClearML server and self hosted? both are self hosted no?
UnevenDolphin73 go to the profile page, I think at the bottom right corner you should see it
(Also ctrl-F5 to reload the web application, if you upgraded the server)
Seems like it is working (including seaborn)
RoundMosquito25 good news, no need to open any ports.
Basically the B_i agents are always polling the server for "jobs": each poll is an HTTP/S request from them to the server, so all connections are outbound connections. Firewall is intact.
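The pull model can be pictured with a tiny loop like the one below. This is not the agent's actual implementation, just an illustration: fetch_job stands in for the outbound HTTP/S request, and all names here are made up.

```python
import time


def poll_for_jobs(fetch_job, run_job, max_polls, sleep_sec=0.0):
    """Worker-side polling loop (illustrative only, not clearml-agent code).

    Every request is initiated by the worker, so the worker never needs
    to accept an inbound connection. All names here are hypothetical.
    """
    done = 0
    for _ in range(max_polls):
        job = fetch_job()  # stands in for an outbound HTTP/S request to the server
        if job is None:
            # Nothing queued right now: wait a bit and ask again
            time.sleep(sleep_sec)
            continue
        run_job(job)
        done += 1
    return done
```

Because the worker is always the one opening the connection, the server never has to reach into the worker's network.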
Hi NuttyCamel41
so sorry I just realized I have not answered it!
I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name
okay that is odd, are you using the exact same containers / docker-compose? what is the difference ?
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
does that mean that even though there is a warning there you can curl to ...
i hope can run in same day too.
Fix should be in the next RC
Any specific use case for the required "draft" mode?
okay that's good, that means the agent could run it.
Now it is a matter of matching the TF with cuda (and there is no easy solution for that). Basically I think that what you need is "nvidia/cuda:10.2-cudnn7-runtime-ubuntu16.04"
Hi QuaintJellyfish58
Based on the docs
I think this should have worked, are you running the actual task_scheduler on your machine? on the services queue ? what's the console output you see there ?
Hi UnsightlyBeetle11
Is it possible to report the model's architecture (PyTorch model) automatically on ClearML, as we do it via Netron or other neural network visualisation tools?
You mean like the actual network layout? Unfortunately, there is currently no option to do that, you can however manually store a plot/image that represents it
BTW: I think that at the beginning Netron was somehow integrated, but it was rarely used and support for it was not trivial so it was phased out. You can ho...
Hi TenseOstrich47
You can check the new clearml-serving, and the new python interfaces added to the "Model" class.
https://github.com/allegroai/clearml/blob/22d795f68f0175ba9511cabd444ea4dba464f3cd/clearml/model.py#L444
okay this seems like a broken pip install python3.6
Can you verify it fails on another folder (maybe it's a permissions thing, for example if you run in docker mode, then the permissions will be root, as the docker is creating those folders)