Reputation
Badges 1
25 × Eureka!yeah I tend to agree... keep me posted hen you find the root cause π€
Hi @<1691620877822595072:profile|FlutteringMouse14>
Yes, feast has been integrated by at least a couple if I remember correctly.
Basically there are two ways offline and online feature transformation. For offline your pipeline is exactly what would be recommended. The main difference is online transformation where I think feast is a great start
Yes I was thinking a separate branch.
The main issue with telling git to skip submodules is that it will be easily forgotten and will break stuff. BTW the git repo itself is cached so the second time there is no actual pull. Lastly it's not clear on where one could pass a git argument per task. Wdyt?
Hi JumpyPig73
import data from old experiments into the dashboard.
what do you mean by "old experiments" ?
Because submodules inside a git are basically a requirement for a git repo to run. Skipping over a few or selecting manually will break the agent. That said maybe shallow clone might be easier or faster. Regardless it should be an environment passed per Task. Feel free to add a GH issue request, if this is not a unique edge case we will add it
Hi @<1562973083189383168:profile|GrievingDuck15>
Thanks for noticing, yes the api is always versioned, we should make it clear in the docs. Also if you need the latest one use version 999 , it will default to the latest one it can support
Hi @<1694157594333024256:profile|DisturbedParrot38>
You mean how to tell the agent to pull only some submodules of your git?
If this is the case you can actually remove them on your git branch, submodule is a file with a soft link. Wdyt?
- yes they will! This is exactly the idea :)
- yes it will store it as text file (as is raw text) notice the return value is the file you should open. This is because when running via agent the return file will contain the conf file from the UI. Make sense?
Omg that's a lot of submodules!
It has nothing with what the tasks sees if you are inside a git repo you will have to cone it on the remote machine. Let me check in the code maybe you have a workaround
I have a process that cleans theΒ
/tmp
Β each day,
WackyRabbit7 the files (configuration etc.) that are mapped into the containers are stored there.
They should clean themselves, that said, we have noticed that the services-mode skips this cleanup, and it will be solved on the next RC of clearml-agent.
Make sense ?
Suppose that I have three models and these models can't be loaded simultaneously on GPU memory(
Oh!!!
For now, this is the behavior I observe: Suppose I have two models, A and B. ....
Correct
Yes this is a current limitation of the Triton backend BUT!
we are working on a new version that does Exactly what you mentioned (because it is such a common case where in some cases models are not being used that frequently)
The main caveat is the loading time, re-loading models from dist...
Okay we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which was also 30 seconds and confusing us.
Nice!
btw:
in the clearml.conf we put this:
for future reference, you are missing the sdk section:
sdk.http.timeout: 300
.
notation works as well as {}
Hi @<1658281093108862976:profile|EncouragingPenguin15>
Should work, I'm assuming multiple nodes are running agents ? or are you saying Ray spins the jobs and clearml logs them ?
Hi CloudySwallow27
Is there a way to still use the auto_connect but limit the amount of debug imgs?
Basically you can set the number of image it will store for you (per title/series combination)m the way it works it rotates the image names so essentially overriding old images (the UI is ware and will only show the last X of them)
See here on setting it:
https://github.com/allegroai/clearml/blob/81de18dbce08229834d9bb0676446a151046e6a7/docs/clearml.conf#L32
Hi MotionlessCoral18
You can set all mount points here:
https://github.com/allegroai/clearml-agent/blob/6e31171d314a6e9b276c36d45314025783956b00/docs/clearml.conf#L241
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
Switching to process Pool might be a bit of an overkill here (I think)
wdyt?
What sort of data would be stored in the
venvs-build
folder?
ClumsyElephant70 temporary (lifetime of the task execution) virtual environment, including the code etc. It is deleted and recreated for every new task launched (or restored from cache, if venvs_cache is enabled)
Hi @<1610083503607648256:profile|DiminutiveToad80>
Yes, it does. They are also cached by default (on the machine with the agent)
None
Sounds good to me, adding it to the to do list, probably should not be very complicated to add π
Hi ScantChimpanzee51
In order to get it to work:conf_file = "options.yml" conf_file = task.connect_configuration(conf_file, "Yaml options") with open(conf_file, "r") as f: ...
The reason is it will not overwrite the local file but return a temp file for you to read.
And come to think about it, maybe we should add an argument saying, it should allow it to overwrite the local file, wdyt?
Hi SteepCockroach81CLEARML_CONFIG_FILE
point to the configuration file being used
See here:
https://clear.ml/docs/latest/docs/configs/env_vars#server-connection
Lol yeah Hydra is great. Notice you still have the ability to override Hydra from the UI so you really have the best of the two worlds
I double checked the code it's always being passed π
Oh that is odd. Is this reproducible? @<1533620191232004096:profile|NuttyLobster9> what was the flow that required another task.init?
We are working on 1.3.0 so this is right in time
This is assuming you can just run two copies of your code, and they will become aware of one another.
, when I am running the pipeline remotely is there a way the remote machine can access it?
Well for the dataset to be accessible, you need to upload it with Dataset class, then the remote machine can do Dataset.get(...).get_local_copy() to get the actual data on the remote machine
Hmm, so currently you can provide help, so users know what they can choose from, but there is no way to limit it.
I know the Enterprise version has something similar that allows users to create a custom "application" from a Task, there you can define a drop and as such, but that might be an overkill here, wdyt?
In the UI you can see all the agents and their IDs
Then you can so
clearml-agent daemon --stop <agent id>