Hi @<1649221394904387584:profile|RattySparrow90>
: Are the models I defined to be served e.g. via the CLI downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
As long as you import clearml in the main script, it should work. Regarding the Nvidia container, it should not interfere with any running processes; the only issue is the memory limit. BTW, any reason not to spin an agent on a dedicated machine? What is the GPU used for on the clearml server machine?
PleasantGiraffe85 can you send examples of the different git repo links (one internal, one public)?
JitteryCoyote63
Yes, this is extremely annoying. I think it was updated on the community server, let me check if we deployed a new docker with a fix ...
RobustGoldfish9 I see.
So in theory spinning an experiment on an agent would be clone code -> build docker -> mount code -> execute code inside docker?
(no need for requirements etc.?)
can someone show me an example of how
PipelineController.create_draft
I think the idea is to store a draft version of the pipeline (not the decorator type, I think, but the one launching pre-executed Tasks).
GiganticTurtle0 I'm not sure I fully understand how / why you are using it, can you expand?
EDIT:
However, my intention is ONLY to create it to be executed later on.
Hmm, so maybe like enqueue it?
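For reference, a minimal sketch of that flow (a hedged example; project/Task names are placeholders, not taken from this thread): build the controller from pre-executed Tasks, then call create_draft() instead of start():
from clearml import PipelineController

# placeholder names throughout
pipe = PipelineController(name="my_pipeline_draft", project="examples", version="0.0.1")

# each step clones an existing (pre-executed) Task at run time
pipe.add_step(name="stage_one", base_task_project="examples", base_task_name="my_base_task")

# store the pipeline as a draft Task so it can be enqueued / executed later
pipe.create_draft()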
GiganticTurtle0 fix was just pushed to GitHub 🙂
pip install git+
Ex: Expecting value: line 1 column 1 (char 0)
K8S Glue pods monitor: Failed parsing kubectl output:
Run with --debug as the first parameter
Are you running the latest from the git repo ?
Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (just reset it beforehand)
Yes, you are too quick for the resource monitoring 🙂
Ohh sorry you will also need to fix the def _patched_task_function
The parameter order is important as the partial call relies on it.
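To illustrate the ordering point (a generic sketch, not the actual clearml internals):
from functools import partial

def patched(first, second, third):
    # partial binds positional arguments left to right
    return f"first={first}, second={second}, third={third}"

bound = partial(patched, "A", "B")  # binds first="A", second="B"
print(bound("C"))                   # -> first=A, second=B, third=C
# reordering the parameters of `patched` would silently shift which value lands where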
This is odd, how are you spinning clearml-serving ?
You can also do it synchronously:
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
I'm assuming some package imports absl (the TF define package) and that's the reason you see the TF defines. Does that make sense?
Hi UnsightlyHorse88
Hmm, try adding to your clearml.conf file:
agent.cpu_only = true
if that does not work, try adding to the OS environment:
export CLEARML_CPU_ONLY=1
As a result, I need to do something which copies the files (e.g. cp -r or StorageManager.upload_folder('b', 'a'))
but this is expensive
You are saying the copy is just wasteful (but you do have the files locally)?
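For completeness, a hedged sketch of the two copy options mentioned above (paths and the remote destination are placeholders):
import shutil
from clearml import StorageManager

# option 1: plain local copy, equivalent to `cp -r b a` (Python 3.8+ for dirs_exist_ok)
shutil.copytree("b", "a", dirs_exist_ok=True)

# option 2: upload the folder through the SDK (remote URL is a placeholder)
StorageManager.upload_folder("b", "s3://my-bucket/a")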
TenseOstrich47 FYI:
This might be what you are looking for 🙂
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L61
Hmm this is odd indeed, let me verify (thanks! @<1643060801088524288:profile|HarebrainedOstrich43> )
Hi @<1556450111259676672:profile|PlainSeaurchin97>
You mean instead of the parallel coordinates ?
None
Hi @<1571308003204796416:profile|HollowPeacock58>
parameters = task.connect(config, name='config_params')
It seems that your DotDict does not support the python copy operator?
i.e.
from copy import copy
copy(DotDict())
fails ?
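If it helps, a minimal sketch of the pattern, assuming DotDict is a custom dict subclass (its real implementation isn't shown in this thread), where defining __copy__ makes copy() behave:
from copy import copy

class DotDict(dict):
    # toy dict subclass with attribute access, illustrative only
    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError as e:
            raise AttributeError(item) from e

    def __setattr__(self, key, value):
        self[key] = value

    def __copy__(self):
        # return a DotDict instead of failing / degrading to a plain dict
        return DotDict(self)

d = DotDict(lr=0.1)
print(copy(d).lr)  # -> 0.1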
, but what I really want to achieve is to share this code:
You mean to share the code between them? Unless this is a "preinstalled" package in the container, each endpoint has its own separate set of modules / files
(this is on purpose, so you could actually change them, just imagine diff versions of the same common.py file)
one of them has been named incorrectly and now I'm trying to remove it and it's not running anywhere,
Oh I see, meaning until it "times out".
You could search for it in the UI (based on the session ID) and abort/archive it
Hi @<1661542579272945664:profile|SaltySpider22>
Basically you need to put all of these files into a repository, which is always a good practice.
The reason is that the pipeline (and for that matter any Task on the system) can store either a single script or a git reference, but not multiple scripts.
BTW: you can always set a different config file with an environment variable:
CLEARML_CONFIG_FILE="path/to/config/file"
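For example (a hedged sketch; the path and names are placeholders), the same variable can also be set from Python before the SDK is imported:
import os

# must happen before `clearml` is imported for the first time
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/alternate/clearml.conf"

from clearml import Task
task = Task.init(project_name="examples", task_name="alt-config run")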
Ok.. so I should generally avoid connecting complex objects? I guess I would create a 'mini dictionary' with a subset of params, and connect this instead.
In theory it should always work, but this specific one fails on a very pythonic paradigm (see below)
from copy import copy
an_object = copy(object)
A good rule of thumb is to connect any object/dict that you want to track or change later
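Along those lines, a minimal sketch of the 'mini dictionary' approach (parameter names and values are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="connect params")

# connect only the subset of parameters you want tracked / overridable from the UI
params = {"learning_rate": 0.001, "batch_size": 32, "epochs": 10}
params = task.connect(params, name="config_params")

print(params["learning_rate"])  # may reflect a UI override when executed remotely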
Hi @<1571308003204796416:profile|HollowPeacock58>
could you share the full log ?
@<1571308003204796416:profile|HollowPeacock58> seems like an internal issue copying this object config.model
This is a complex object, and it seems that for some reason
None
As a workaround just do not connect this object. It seems you cannot pickle it / copy it (see GH issue)
When is clearml-deploy coming to the open source release?
Currently available under clearml-serving (more features are being worked on, e.g. additional stats and backends)
https://github.com/allegroai/clearml-serving