Disable automatic model uploads
Disable the auto upload: task = Task.init(..., auto_connect_frameworks={'pytorch': False})
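A minimal sketch of what that looks like (project/task names are placeholders):

```python
from clearml import Task

# disable only the PyTorch binding so model checkpoints are not auto-uploaded;
# other frameworks keep their default auto-logging
task = Task.init(
    project_name="examples",           # placeholder
    task_name="no-auto-model-upload",  # placeholder
    auto_connect_frameworks={'pytorch': False},
)
```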
Do you mean to spin up a pod with the agent inside it (daemon in services mode)?
Or connect the services queue to the k8s cluster (i.e. define a pod template that uses CPU and not a lot of RAM)?
Hi ColossalAnt7
Following up on SuccessfulKoala55's answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires
- mounting the configuration file (the one holding the user/pass) into the pod from a persistent volume.
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
Hi OddAlligator72
for instance - remove all the metrics from some step onward?
(I think that as long as the Task is not published you could do such a thing directly with the REST API (a.k.a. APIClient from Python))
What's the use case?
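For context, a rough sketch of what going through the APIClient could look like (the task id is a placeholder, and note that events.delete_for_task removes all reported events for the task, so the "from some step onward" case would need more targeted calls):

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
task_id = "<task-id>"  # placeholder

# the task must not be published for this kind of edit to go through
task = client.tasks.get_by_id(task=task_id)

# removes *all* events (scalars/plots/logs) reported for the task
client.events.delete_for_task(task=task_id)
```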
I'm not sure how the helm chart is built, but do we have a "services queue" in the helm chart?
Hi FiercePenguin76
should return all datasets from all projects?
Correct 🙂
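For reference, a minimal sketch (assuming a clearml version where Dataset.list_datasets is available):

```python
from clearml import Dataset

# with no filters this should list datasets across all projects
for ds in Dataset.list_datasets():
    print(ds["id"], ds.get("name"), ds.get("project"))
```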
TenseOstrich47 it's based on a free "index", so the first index not in use will be captured, but if you remove agents, then the order will change, e.g. if you take down worker #1, the next worker you spin up will be #1 because that index is no longer taken.
BTW: this is probably more efficient than pickling
https://pandas.pydata.org/pandas-docs/version/1.1.5/reference/api/pandas.DataFrame.to_parquet.html
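For example, a minimal sketch (requires pyarrow or fastparquet; the file name is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# parquet is usually smaller on disk and faster to load than a pickled DataFrame
df.to_parquet("data.parquet")
restored = pd.read_parquet("data.parquet")
```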
NastyOtter17
Usually the first report will happen after 30 seconds, could that be the difference ?
For example, ServerA stores files at /opt/clearml but ServerB stores them at /some_path/clearml
As long as you adjust your docker-compose yaml file, should be just fine
BurlyRaccoon64 by default, if .ssh exists in the host user folder it should mount it into the container (actually mount a copy of it). Do you have logs of two tasks from two different machines, one failing and one passing? Because this is quite odd (assuming the setup itself is identical).
Do you think this is better ? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633
So the original looks good, could it be you tried to clone a Task that was executed with an agent with pip, and then pushed into an agent running conda?
ClumsyElephant70 yes there is 🙂 clearml-agent build --id <task id> --target <folder>
(I might have a typo there, but you can basically check the full help with clearml-agent build --help)
Yeah we should definitely have get_requirements 🙂
it is a pickle issue
‘package model doesn’t exist’
Sounds like it, why do you think clearml has anything to do with it?
BTW: import_bind.__patched_import3 is just there so that packages clearml auto-connects with are still patched even if they are imported after Task.init was called.
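In other words (a minimal sketch, using PyTorch as an arbitrary example of an auto-connected framework; names are placeholders):

```python
from clearml import Task

# the import hook (import_bind / __patched_import3) is installed here
task = Task.init(project_name="examples", task_name="late framework import")

# even though the framework is imported *after* Task.init,
# the hook patches it on import, so auto-logging still works
import torch  # noqa: E402
```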
TrickySheep9 is this a conda package or a wheel you are installing manually ?
This will set more time before the timeout right?
Correct.
task.freeze_monitor()
download()
task.defrost_monitor()
Currently there isn't, but that's a good idea.
What would be the argument of using it vs increasing the timeout ?
btw: setting the resource timeout to 99999 will basically mean that it will wait until the first reported iteration, not that it will just sleep for 99999 sec 🙂
@<1571308003204796416:profile|HollowPeacock58> seems like an internal issue when copying this object: config.model
This is a complex object, and it seems that for some reason
None
As a workaround, just do not connect this object. It seems you cannot pickle it / copy it (see the GH issue).
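A rough sketch of that workaround (the config object and its model attribute come from the thread; everything else is an assumption):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect-without-model")  # placeholders

# connect only the picklable parts of the configuration,
# skipping the problematic `config.model` object (config comes from the user's code)
safe_config = {k: v for k, v in vars(config).items() if k != "model"}
task.connect(safe_config, name="config")
```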
Lambdas are designed to be short-lived, I don't think it's a good idea to run one in a loop TBH.
Yeah, you are right, but maybe it would be fine to launch it, have the lambda run for 30-60 sec (i.e. checking idle time for 1 min, stateless, only keeping track inside the execution context), then take it down.
What I'm trying to solve here is (1) a quick way to understand if the agent is actually idling or just between Tasks, and (2) still avoid having the "idle watchdog" be short-lived, so that it can...
but I think they did it for a reason, no?
Not a very good one, they just installed everything under the user and used --user for the pip.
It really does not matter inside a docker, the only reason one might want to do that is if you are mounting other drives and you want to make sure they are not accessed with "root" user, but with 1000 user id.
Thanks DilapidatedDucks58 ! We ❤ suggestions for improvements 🙂
Did you try to print the page using the browser? (I think they can all save it as a PDF these days)
Yes I agree, it would 🙂 We have some thoughts on creating plugins for the system, I think this could be a good use-case. Wait a week or two ;)
So the agent installed okay. It's the specific Task that the agent is failing to create the environment for, correct?
if this is the case, what do you have in the "Installed Packages" section of the Task (see under the Execution tab)?
how to make sure it will traverse only current package?
Just making sure there is no bug in the process: if you call Task.init anywhere in your repo (serve/train), do you end up with an "Installed Packages" section that contains all the required packages for both use cases?
I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.
Hmm, it cannot "know" which is which, because it doesn't really trace all the import logs (this w...
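One possible direction (a sketch, assuming you keep a separate requirements file for the serving package; paths and names are placeholders) is to bypass the auto-detection and pin the packages explicitly:

```python
from clearml import Task

# use an explicit requirements file instead of the auto-detected imports
# (must be called *before* Task.init)
Task.force_requirements_env_freeze(
    force=True,
    requirements_file="serving/requirements.txt",  # placeholder path
)

task = Task.init(project_name="examples", task_name="serving-only-requirements")
```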