
Reputation
Badges 1
25 × Eureka!I cannot test it at the moment, hence my question.
JuicyFox94 any chance you can blindly approve ?
Merged, is it working for you now?
Yes π https://discuss.pytorch.org/t/shm-error-in-docker/22755
add either "--ipc=host" or "--shm-size= 8g " to the docker args (on the Task or globally in the clearml.conf extra_docker_args)
notice the 8g depends on the GPU
Hmm good question, I'm actually not sure if you can pass 24GB (this is not a limit on the GPU memory, this affects the memblock size, I think)
MagnificentSeaurchin79 you can delay it with:task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
I did change the
instead of 8080?
So this is the issue
link to the line please π
I want each remote task to execute one instance of the hydra multirun, but I suspect the remote will try to run the full multirun by itself
if config.clearml.remote and task.running_locally(): task.execute_remotely( queue_name=config.clearml.queue_name, clone=True, exit_process=False ) return
I think this ensures the local execution actually triggers the remote one, so it should be as you expect, no?
When is clearml-deploy coming to the open source release?
Currently available under clearml-serving (more features are being worked on, i.e. additional stats and backends)
https://github.com/allegroai/clearml-serving
agree, but setting the agentβs env variable TMPDIR
I think this needs to be passed to the docker with -e TMPDIR=/new/tmp
as additional container args:
see example
None
wdyt?
HarebrainedBear62 this is what I have.
clearml-data will store all the files for you, and version the entire thing, make is a breeze to abstract the dataset from the code. Querying data is available using Apache Drill (though currently it is still not built into the platform, but we are planning to get there soon) Since this is Image based data/meta-data, I know the paid tier of ClearML, has n additional dedicated data management solution specifically for images, with full ability to query m...
DefiantHippopotamus88 you can create a custom endpoint and do that, but it will be running I the same instance , is this what you are after? Notice that Triton actually supports it already, you can check the pytorch example
When you say status, what do you mean? Is it active? Running a task?
Actually with
base-task-id
it uses the cached venv, thanks for this suggestion! Seems like this is equivalent to cloning via UI.
exactly !
But βcloningβ via UI runs an exact copy of the code/config, not a variant,
You can override the commit/branch and get the latest ...
run exp tweak code/configs in IDE, or tweak configs via CLI have it re-rerun in exact same venv (with no install overhead etc)So you can actually launch it remotely directly from the code:
...
So for this...
Sorry, what is exactly "this" ?
Hi EagerOtter28
Let's say we query another time and get 60k images. Now it is not trivial to create a new dataset B but only upload the diff: ...
Use Dataset.sync (or clearml-data sync) to check which files where changed/added.
All files are already hashed, right? I wonder whyΒ
clearml-data
Β does not keep files in a semi-flat hierarchy and groups them together to datasets?
It kind of does, it has a full listing of all the files with their hash (SHA2) values, ...
I'm not able to compare the tables of two experiments, is it a known issue ?
How so? they should appear one next to the other, the content of the two tables is not "really" compared, the standard is too complicated π (apparently this is far from trivial)
Hi @<1729309120315527168:profile|ShallowLion60>
Clearml in our case installed on k8s using helm chart (version: 7.11.0)
It should be done "automatically", I think there is a configuration var in the helm chart to configure that.
What urls are you urls seeing now, and what should be there?
SoreDragonfly16 as SmallDeer34 mentioned, you can iterate over the Tasks, pull metrics (with either task.get_last_scalar_metrics
or task.get_reported_scalar
) then report them on the Task that runs the Loop itself with the Logger.
wdyt?
SoreDragonfly16 could you reproduce the issue?
What's your OS? trains versions?
Seems like pip package install issue of a sort
but this would be still part of the clearml.conf right?
You can pass it per Task , also you can configure the agent to always pass it add this env.
https://github.com/allegroai/clearml-agent/blob/5a080798cb4292e198948fbe16cba70136cb6bdf/docs/clearml.conf#L137
Hi ClumsyElephant70
extra_docker_shell_script: ["export SECRET=SECRET", ]
I think ${SECRET}
will not get resolved you have to specifically have text value there.
That said it is a good idea to resolve it if possible, wdyt?
I like this approach more but it still requires resolved environment variables inside the clearml.conf
Yes π maybe this is a feature request ?
so that one app I am using inside the Task can use the python packages installed by the agent and I can control the packages using clearml easily
That's the missing part for me, You have all the requiremnts on the Task (that you can fully control), the agent is setting a brand new venv for each Task inside a container (the venv is cahced, and you can also make the agent just use the default python without installing anything). The part where I'm lost is why would you need the path to t...
JitteryCoyote63 you mean in runtime where the agent is installing? I'm not sure I fully understand the use case?!
in my repo I maintain a bash script to setup a separate python env.
Hmm interesting, now I have to wonder what is the difference ? meaning why doesn't the agent build a similar one based on the requirements ?