I'm assuming you mean for the clients, right?
WickedGoat98 what's the clearml version you are using?
WickedGoat98 the agent itself can be executed on bare metal; there is no need to set up a docker for it (although docker is fully supported)
Specifically, the docker compose runs the agent in services mode, i.e. for lightweight CPU tasks such as running pipelines.
If the agent is running on a GPU machine, the easiest way is to run it on bare metal.
BoredHedgehog47 you need to configure the clearml k8s glue to spin up pods dynamically (instead of statically allocating an agent per pod). Does that make sense?
Good, so we narrowed it down. Now the question is how come it is empty?
HandsomeCrow5 OMG the guys already added it to the debug samples as well, check out the demo app (drop-down "test html sample"):
https://demoapp.trains.allegro.ai/projects/4e7fef090aa849b1acc37d92b59b3360/experiments/83c9ed509f0e421eaadc1ef56b3af5b4/info-output/debugImages
BeefyCow3 if you are trying to optimize a specific metric (i.e. a scalar on a graph), the template Task should report it with the same title/series combination, which should be easy enough to verify in the UI 🙂
You can report with either TensorBoard or the Trains Logger; either way will work.
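A minimal sketch of why the title/series pair has to match, assuming the optimizer looks up its objective by that exact pair among the scalars the template Task reported. The `objective_value` helper and the metric names are hypothetical; this is not the Trains optimizer's code.

```python
from typing import Dict, List, Tuple

# Reported scalars keyed by (title, series), the same pair shown on the UI graphs
Scalars = Dict[Tuple[str, str], List[float]]


def objective_value(scalars: Scalars, title: str, series: str, maximize: bool = True) -> float:
    """Return the best reported value of the objective metric (hypothetical helper)."""
    values = scalars.get((title, series))
    if not values:
        # A mismatched title or series means there is nothing to optimize
        raise KeyError(f"no scalar reported for title={title!r}, series={series!r}")
    return max(values) if maximize else min(values)


reported = {("validation", "accuracy"): [0.71, 0.78, 0.76]}
print(objective_value(reported, "validation", "accuracy"))  # 0.78
```

Asking for `("validation", "acc")` instead would raise a `KeyError`, which is the sketch's stand-in for an optimizer that never sees the metric.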
GiganticTurtle0 quick update: a fix will be pushed so that casting is based on the actual value passed, not on the type hints 🙂
(this is only in case there is no default value; otherwise the default value's type is used for casting)
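The casting rule above can be sketched with `inspect.signature`: use the default's type when there is a default, otherwise the type of the value actually passed, ignoring hints. `resolve_cast_type` and `step` are hypothetical names for illustration, not ClearML internals.

```python
import inspect
from typing import Any, Callable


def resolve_cast_type(func: Callable, param_name: str, passed_value: Any) -> type:
    """Pick the type used to cast a parameter (hypothetical helper)."""
    param = inspect.signature(func).parameters[param_name]
    if param.default is not inspect.Parameter.empty:
        # A default exists: its type wins
        return type(param.default)
    # No default: use the actual value passed, not the type hint
    return type(passed_value)


def step(count: str, ratio=0.5):
    # 'count' is hinted as str but has no default, so the hint is ignored
    return count, ratio


print(resolve_cast_type(step, "count", 3))      # <class 'int'>
print(resolve_cast_type(step, "ratio", "0.7"))  # <class 'float'>
```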
Is this per Task, or for all Tasks always?
Yes, the agent's mode is global, i.e. all tasks run either inside docker or in a venv. In theory you can have two agents on the same machine, one venv and one docker, listening to two different queues.
Okay, so my thinking is, on the PipelineController / decorator we will have: `abort_all_running_steps_on_failure=False` (if True, when a step fails, all running steps are aborted and the pipeline exits)
Then per step / component decorator we will have: `continue_pipeline_on_failure=False` (if True, when the step fails, the rest of the pipeline DAG will continue)
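A toy simulation of the two proposed flags, to make the semantics concrete. It is a sketch under stated assumptions, not ClearML code: each batch of ready steps counts as "running in parallel", and (an assumption the proposal does not spell out) steps depending on a failed step are skipped even when the pipeline continues. `run_pipeline` and the step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_pipeline(
    steps: Dict[str, Callable[[], None]],
    parents: Dict[str, Set[str]],
    continue_pipeline_on_failure: Set[str],
    abort_all_running_steps_on_failure: bool = False,
) -> Dict[str, str]:
    """Simulate the proposed failure flags (hypothetical, not ClearML code)."""
    sorter = TopologicalSorter({name: parents.get(name, set()) for name in steps})
    sorter.prepare()
    status: Dict[str, str] = {}
    while sorter.is_active():
        stop = False
        for name in sorter.get_ready():
            if stop and abort_all_running_steps_on_failure:
                status[name] = "aborted"  # running sibling aborted after the failure
            elif any(status.get(p) in ("failed", "skipped") for p in parents.get(name, ())):
                status[name] = "skipped"  # assumption: dependents of a failed step never run
            else:
                try:
                    steps[name]()
                    status[name] = "completed"
                except Exception:
                    status[name] = "failed"
                    # Without the per-step flag, a failure stops the whole pipeline
                    stop = stop or name not in continue_pipeline_on_failure
            sorter.done(name)
        if stop:
            break  # leave: steps not yet launched are never started
    return {name: status.get(name, "not started") for name in steps}


def ok() -> None:
    pass


def boom() -> None:
    raise RuntimeError("step failed")


steps = {"prepare": ok, "train": boom, "report": ok, "evaluate": ok}
parents = {"train": {"prepare"}, "report": {"prepare"}, "evaluate": {"train"}}
print(run_pipeline(steps, parents, continue_pipeline_on_failure={"train"}))
# {'prepare': 'completed', 'train': 'failed', 'report': 'completed', 'evaluate': 'skipped'}
```

With both flags left at False, `train` failing lets the already-running `report` finish but `evaluate` is never started; with `abort_all_running_steps_on_failure=True`, a still-running sibling is marked aborted instead.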
GiganticTurtle0 wdyt?
Well, that depends on you: what did you write there to know it is the best one? The file name? Some added metric?
Hi CluelessFlamingo93
from your log:
ImportError: cannot import name 'packaging' from 'pkg_resources' (/home/bat/.clearml/venvs-builds/3.9/lib/python3.9/site-packages/pkg_resources/__init__.py)
I'm guessing yolox/setuptools
Try adding the following to the "Installed packages" section:
setuptools==69.5.1
(Something about the `setup...
VexedCat68 are you manually creating the OutputModel object?
Hey WickedGoat98
I found the bug: the numpy array passed to plotly contains both datetime and NaN values, and plotly.js does not handle that. I'll make sure this gets fixed; in the meantime you can just remove the first row (it contains the NaN):
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
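Why the first row is the one holding the NaN, sketched in plain Python and assuming `Close_pcent` is a day-over-day percent change: the first day has no previous close to compare against. The data and the `percent_change` helper are hypothetical.

```python
import math
from datetime import date
from typing import List


def percent_change(closes: List[float]) -> List[float]:
    """Day-over-day percent change; the first day has no previous close, so it is NaN."""
    changes = [math.nan]
    for prev, cur in zip(closes, closes[1:]):
        changes.append((cur - prev) / prev * 100.0)
    return changes


dates = [date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 6)]
closes = [100.0, 102.0, 99.96]
rows = list(zip(dates, closes, percent_change(closes)))

# Same idea as df = df[1:]: drop the first row, the only one holding a NaN
clean_rows = rows[1:]
print(any(math.isnan(pct) for _, _, pct in rows))        # True
print(any(math.isnan(pct) for _, _, pct in clean_rows))  # False
```

In pandas, `df.dropna()` drops every row containing a NaN, which would also cover gaps that are not in the first row.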
MysteriousBee56 and please send this one as well: "when you run the trains-agent with --foreground, before it starts the docker it prints the full command line"
I think this is due to the label map including some keys with a "." in them.
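If the "." in the keys is indeed the culprit, a quick way to find and work around it is sketched below. The helpers and the label names are hypothetical, and replacing "." with "_" is an assumed workaround, not a confirmed fix.

```python
from typing import Dict, List


def keys_with_dots(label_map: Dict[str, int]) -> List[str]:
    """List label-map keys that contain a '.' character."""
    return [key for key in label_map if "." in key]


def sanitize_label_map(label_map: Dict[str, int], replacement: str = "_") -> Dict[str, int]:
    """Return a copy with '.' replaced in every key (hypothetical workaround)."""
    sanitized = {key.replace(".", replacement): value for key, value in label_map.items()}
    if len(sanitized) != len(label_map):
        # e.g. "a.b" and "a_b" would both become "a_b"
        raise ValueError("sanitizing keys produced a collision")
    return sanitized


labels = {"background": 0, "traffic.light": 1, "car": 2}
print(keys_with_dots(labels))      # ['traffic.light']
print(sanitize_label_map(labels))  # {'background': 0, 'traffic_light': 1, 'car': 2}
```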
Hi TenseOstrich47, what do you mean by "label"?
Hmm, that sounds like the agent needs to access a vault with per-user credentials; unfortunately this is not covered in the open-source version 😞 I "think" this is supported in the enterprise version as part of the permission management
Another (minor) issue is that all the packages that are installed using git+https are cloned and installed twice, immediately one after the other
Yes, this is so that we can better log the installed package name. Not a major issue, but we just fixed a bug with derivative packages from git packages.
https://github.com/allegroai/trains/issues/196
With `pipe.start(queue='services')`, it still tries to run some docker for some reason
The services agent is always running with --docker:
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docker/services/entrypoint.sh#L16
Actually I think we should have it as an argument, so it is easier to control from docker-compose
I'll be waiting for the full log to check the "git clone" issue
Hi AdventurousRabbit79
Try:
"extra_clearml_conf" : "aws { s3 {key: A, secret : B, region: C, }} ",
Generally speaking, there is no need for quotes around the secret/key.
You also need the commas to separate the keys.
You can test whether it works by adding the same string to your local clearml.conf and importing the clearml package.
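A sketch for generating that snippet, assuming the values are otherwise plain strings. Quotes are usually unnecessary, but the HOCON spec forbids some characters in unquoted strings (for example `+`, `:` and `,`, which can appear in secrets), so this hypothetical helper quotes only such values. `build_extra_clearml_conf` and `hocon_value` are illustrative names, not part of ClearML.

```python
import json

# Characters the HOCON spec forbids in unquoted strings (whitespace added to be safe)
_FORBIDDEN_UNQUOTED = set('$"{}[]:=,+#`^?!@*&\\') | {" ", "\t"}


def hocon_value(value: str) -> str:
    """Leave plain values unquoted; quote values HOCON could not parse unquoted."""
    if any(ch in _FORBIDDEN_UNQUOTED for ch in value) or "//" in value:
        return json.dumps(value)  # a JSON string is also a valid HOCON quoted string
    return value


def build_extra_clearml_conf(key: str, secret: str, region: str) -> str:
    """Build the aws.s3 snippet for extra_clearml_conf (hypothetical helper)."""
    fields = {"key": key, "secret": secret, "region": region}
    # Commas separate the keys
    body = ", ".join(f"{name}: {hocon_value(val)}" for name, val in fields.items())
    return "aws { s3 { " + body + " } }"


print(build_extra_clearml_conf("A", "B", "C"))
# aws { s3 { key: A, secret: B, region: C } }
```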
Hi ColossalDeer61 ,
the next trains-agent RC (solving the #196 issue) will also solve the double install issue 🙂
Hi GiddyTurkey39
Glad to see that you are already diving into the controllers
(the stable release will be out early next week)
A bit of background on how the pipeline controller is designed:
All steps in the pipeline are experiments already registered in the system (i.e. you can see them in the UI). Regardless of how you created those experiments, they have to be there prior to the pipeline launch. The pipeline itself can be executed on any machine (it does very little, and...
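The "steps must already exist before launch" requirement can be sketched as a pre-flight check followed by a dependency-ordered launch plan. `plan_pipeline` is hypothetical, and the `registered` set merely stands in for the experiments visible in the UI; a real controller would query the server.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def plan_pipeline(parents: Dict[str, Set[str]], registered: Set[str]) -> List[str]:
    """Check every step is registered, then return a valid launch order (hypothetical)."""
    steps = set(parents).union(*parents.values())
    missing = sorted(steps - registered)
    if missing:
        raise ValueError(f"steps not registered before launch: {missing}")
    # Parents always come before their children
    return list(TopologicalSorter(parents).static_order())


order = plan_pipeline(
    {"train": {"prepare_data"}, "evaluate": {"train"}},
    registered={"prepare_data", "train", "evaluate"},
)
print(order)  # ['prepare_data', 'train', 'evaluate']
```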
Hmm, so there is a way to add callbacks (somewhat cumbersome, and we would love feedback) so you can filter them out.
What do you think, would that work?
Hi CloudySwallow27
This error occurs randomly during training (in other words training does successfully start).
What's the clearml-agent version you are using, and the clearml version?
Is this example working for you?
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_config.py
UnevenDolphin73 it seems this is a UI browser limit, which means we will need to move it into the server...
See here: https://clearml.slack.com/archives/CTK20V944/p1640247879153700?thread_ts=1640135359.125200&cid=CTK20V944
Hi PunyKangaroo87
What do you mean by store data locally?
Like clearml-data? i.e. Dataset?
You can always use file:///root/path/folder as the destination; this will store everything in the local folder. Is that it?
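A tiny sketch of turning an absolute local folder into that `file://` form. Note the three slashes: `file://` plus the leading `/` of the path. `local_destination` is a hypothetical helper that assumes a POSIX path.

```python
import posixpath


def local_destination(folder: str) -> str:
    """Turn an absolute POSIX folder path into a file:// destination (hypothetical helper)."""
    if not posixpath.isabs(folder):
        raise ValueError(f"expected an absolute path, got {folder!r}")
    # normpath drops trailing slashes and redundant separators
    return "file://" + posixpath.normpath(folder)


print(local_destination("/root/path/folder"))  # file:///root/path/folder
```

The resulting string could then be used wherever a destination URI is expected, for example as `output_uri` in `Task.init`.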
BTW:
This is very odd: "~/.clearml/venvs-builds.3/3.6/bin/python" thinks it is using python 3.6, but it is linked with python 2.7...
No idea how that could happen
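A small diagnostic sketch, assuming you can run the venv's interpreter directly: it resolves symlinks with `os.path.realpath` and asks the interpreter itself for its version. `describe_interpreter` is a hypothetical helper. The `-c` snippet sticks to syntax that also runs under Python 2.7.

```python
import os
import subprocess
import sys


def describe_interpreter(python_path: str) -> str:
    """Report where a python binary really points and which version it runs."""
    real_path = os.path.realpath(os.path.expanduser(python_path))
    result = subprocess.run(
        [real_path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        capture_output=True,
        text=True,
        check=True,
    )
    return f"{real_path} -> Python {result.stdout.strip()}"


# e.g. describe_interpreter("~/.clearml/venvs-builds.3/3.6/bin/python")
print(describe_interpreter(sys.executable))
```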