Are you asking regarding the k8s integration?
(This is not a must, you can run the clearml-agent
bare-metal on any OS)
EnviousStarfish54 following up on this issue, the root cause is that dictConfig will remove all existing handlers unless it is passed "incremental": True
conf_logging = { "incremental": True, ... }
Since you pointed out that Kedro is internally calling logging.config.dictConfig(conf_logging), this seems like an issue with Kedro, as that call will remove all existing logging handlers, which seems problematic. wdyt?
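A minimal sketch of the workaround, with "incremental": True (the logger name below is just for illustration):
import logging.config

# ClearML attaches its own reporting handler to the root logger at Task.init() time.
# Without "incremental": True, dictConfig() wipes all existing handlers, including that one.
conf_logging = {
    "version": 1,
    "incremental": True,  # keep existing handlers, only update logger levels
    "loggers": {
        "kedro": {"level": "INFO"},  # illustrative logger name
    },
}
logging.config.dictConfig(conf_logging)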
It might be that the worker was killed before it unregistered; you will see it there, but the last update will be stuck (after 10 min it will be removed automatically)
DeliciousBluewhale87 could you restart the pod, SSH to the host, and make sure the folder /opt/clearml/agent exists and that there is no *.conf file in it?
Oh task_id is the Task ID of step 2.
Basically the idea is: you run your code once (let's call it debugging / programming); that run creates a Task in the system, and the Task stores the environment definition and the arguments used. Then you can clone that Task and launch it on another machine using the Agent (which will set up the environment based on the Task definition and run your code with the new arguments). The Pipeline is basically doing that for you (i.e. cloning a task chan...
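A rough sketch of that flow from code (project, queue, and parameter names are placeholders):
from clearml import Task

# The first manual run of your script creates the "template" Task:
#     task = Task.init(project_name="examples", task_name="step 2")
# Later, from any machine, clone that Task and hand the clone to an agent:
template = Task.get_task(project_name="examples", task_name="step 2")
cloned = Task.clone(source_task=template, name="step 2 (new args)")
cloned.set_parameter("Args/batch_size", 64)  # override an argument
Task.enqueue(cloned, queue_name="default")   # an agent listening on "default" picks it up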
DefeatedOstrich93 many thanks, I was able to reproduce it (basically newly added files caused git apply to fail).
The fix will be part of the next clearml-agent RC
Hi RipeGoose2
when I'm using the set_credentials approach does it mean the trains.conf is redundant? if
Yes, this means there is no need for trains.conf; all the important stuff (i.e. server + credentials) you provide from code.
BTW: when you execute the same code (i.e. code with the set_credentials call), the agent's configuration will override what you have there, so you will be able to run the Task later either on-prem or in the cloud without needing to change the code itself 🙂
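Something along these lines (hosts and keys are placeholders; with the older trains package the import is from trains import Task):
from clearml import Task

# Must be called before Task.init(); replaces the need for a trains.conf / clearml.conf
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)
task = Task.init(project_name="examples", task_name="no conf file needed")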
GrievingTurkey78 please feel free to send me code snippets to test 🙂
LOL totally 🙂
It seems to follow a structure specific to clearml,
Actually plotly.js 🙂
Nice!!!
Are you aware of a limitation of "/events.get_task_events" preventing it from fetching some of the images stored on the server
Are you saying you see them in the UI, but cannot access them via the API ?
(this would be strange as the UI is firing the same API requests to the back end)
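If you want to verify from code, here is a sketch using the Python APIClient (the task ID is a placeholder, and the parameters follow the REST reference, so adjust as needed):
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Same endpoint the UI calls; page through with scroll_id if there are many events
res = client.events.get_task_events(task="<task-id>", batch_size=100)
for event in res.events:
    print(event)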
Hi EnviousStarfish54
Color coding on the entire UI is stored per user (I think in your local cookies, but I might be wrong). Anyhow, any title/series combination will have the selected color regardless of the project.
This way you can configure once that loss is red and accuracy is green, etc.
Thanks!
In the conf file, I guess, as this is where people will look for it.
is there a way to visualize the pipeline such that this step is "stuck" in executing?
Yes there is, the pipeline plot (see the Plots section on the Pipeline Task) will show the current state of the pipeline.
But I have a feeling you have something else in mind?
Maybe add a Tag on the pipeline Task itself (then remove it when it continues)?
I'm assuming you need something that is quite prominent in the UI, so someone knows?
(BTW I would think of integrating it with the slack monitor, to p...
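For example, a rough sketch of the tag idea, assuming the Task tag helpers (add_tags / get_tags / set_tags); the tag name is arbitrary:
from clearml import Task

pipeline_task = Task.current_task()           # inside the pipeline controller
pipeline_task.add_tags(["waiting-approval"])  # make the blocked state visible in the UI
# ... the blocking step runs here ...
remaining = [t for t in pipeline_task.get_tags() if t != "waiting-approval"]
pipeline_task.set_tags(remaining)             # remove the tag once the pipeline continues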
Hi JitteryCoyote63
Yes, I think you are correct: since torch is installed automatically as a pip requirement, the agent is not aware of it, so it cannot download the correct one.
I think the easiest is just to add torch as an additional package:
# call this before Task.init()
Task.add_requirements(package_name="torch", package_version="==1.7.1")
it is just a local copy, so you can rerun and reconfigure
The other way will not work: if you start with "pip" you cannot fail ... (if you fail, it is at runtime, which is too late)
I look forward to your response on GitHub.
Great, I would like to make this discussion a bit more open and accessible, so GitHub is probably better
I'd like to start contributing to the project...
That will be awesome!
At the end of the day it's just another env var.
It should work: GIT_SSH_COMMAND is used by pip.
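A minimal sketch, assuming you set it in the environment of the process that ends up running git/pip (the key path is a placeholder):
import os

# git (and therefore pip, when installing git+ssh packages) honors GIT_SSH_COMMAND,
# so any subprocess spawned after this point will use the custom ssh invocation
os.environ["GIT_SSH_COMMAND"] = "ssh -i /path/to/deploy_key -o StrictHostKeyChecking=no"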
You need to mount it to ~/clearml.conf
(i.e. /root/clearml.conf)
diff line by line is probably not useful for my data config
You could request a better configuration diff feature 🙂 Feel free to add it on GitHub.
But this also means I have to load all the configuration into a dictionary first.
Yes 🙂
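A minimal sketch of what I mean, assuming a YAML data config (file and section names are placeholders):
import yaml
from clearml import Task

task = Task.init(project_name="examples", task_name="data config")

# Load the configuration into a dict first, then connect it so the server
# stores (and can diff) it as a structured configuration object
with open("data_config.yaml") as f:
    config = yaml.safe_load(f)
config = task.connect_configuration(config, name="data_config")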
maybe you can also check --version that returns the helm menu
What do you mean? --version on clearml-task?
wouldn't it be possible to store this information in the clearml server so that it can be implicitly added to the requirements?
I think you are correct, and if we detect that we are using pandas to upload an artifact, we should try and make sure it is listed in the requirements
(obviously this is easier said than done)
And if instead I want to force "get()" to return me the path (e.g. I want to read the csv with a library that is not pandas) do we have an option for that?
Yes, c...
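Roughly, the difference looks like this (task ID and artifact name are placeholders):
from clearml import Task

task = Task.get_task(task_id="<task-id>")

# get() returns the deserialized object (e.g. a pandas DataFrame for a csv artifact)
df = task.artifacts["my_dataframe"].get()

# get_local_copy() downloads the file and returns the local path,
# so you can read it with any library you like
csv_path = task.artifacts["my_dataframe"].get_local_copy()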
Container environment setup overhead?
Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Hmm, so maybe a glob-like parameter for get_local_copy(select_filter='subfolder/*')?
I have one agent running on the machine. I also have only one task running. This only happens to us when we use pipelines
@<1724960468822396928:profile|CumbersomeSealion22> notice that when you launch a pipeline you are actually running two Tasks: one is the "pipeline" itself (i.e. the logic) and one is the component in the pipeline (i.e. the step)
If you have one agent, I'm assuming what happens is the pipeline itself (the one that you launch on your machine)...
Hi @<1724960468822396928:profile|CumbersomeSealion22>
It starts the pipeline, logs that the first step is started, and then...does nothing anymore.
How many agents do you have running? By default an agent will run one Task at a time (unless executed with --services-mode, which allows it to run an unlimited number of parallel tasks)