SubstantialElk6 when you say "Triton does not support deployment strategies" what exactly do you mean?
BTW: updated documentation already up here:
https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving
Hi ReassuredOwl55
a dialogue box that opens with a "deleting" bar swishing across, but then it just hangs, and becomes completely unresponsive
I believe this issue was fixed in the latest server version, seems like you are running 1.7 but the latest is 1.9.2. May I suggest an upgrade ?
Hi @<1541954607595393024:profile|BattyCrocodile47>
It seems to me that instead of implementing webhooks to react to things like adding a tag to a model
Did you look at this example ?
None
Can we straightforwardly stream ALL ClearML events to another system?
what would you consider an event?
The "basic" object type is Task, a state in task is changed via an api call, would that be an e...
Does what you suggested here >
Yes, it is basically the same underlying mechanism, only instead of 1-to-1 it's 1-to-many
what is the best approach to update the package if we have frequent updates to this common code?
since this package has an indirect effect on the model endpoint, I would package it with the preprocessing code of the endpoint.
Each server updates its own local copy, and it will make sure it can take it and deploy it hand over hand without breaking its ability to serve these endpoints.
The "wastefulness" of holding multiple copies is negligible compared to a situation where everyone ...
But I believe it would be harder for our team to detect and respond to failures in the event handler functions if they were placed there because it seems unclear how we could use our existing systems and practices to do that.
Okay, I think this is the issue: handler functions
are not "supposed" to fail, they are supposed to trigger Tasks, and those can fail.
e.g.:
Model Tag Trigger -> handler function creates a Task -> Task does something, like build container, trigger CI/CD etc -> Task...
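The chain above can be sketched with clearml's TriggerScheduler. This is a minimal sketch, not a definitive recipe: the project name, queue name, and tag are made-up placeholders, and the exact payload passed to the handler (assumed here to be the model id) should be checked against the SDK docs.

```python
def build_task_name(model_id: str) -> str:
    # Deterministic name for the CI task the trigger launches
    return "build-container-{}".format(model_id)

def on_model_tagged(model_id: str) -> None:
    # The handler only *launches* a Task; the real work (and any real
    # failure) happens inside that Task, where monitoring already looks.
    from clearml import Task  # lazy import: needs a configured clearml setup
    task = Task.create(project_name="model-ci",  # placeholder project
                       task_name=build_task_name(model_id))
    Task.enqueue(task, queue_name="services")    # placeholder queue

def register_trigger() -> None:
    from clearml.automation import TriggerScheduler  # lazy import
    scheduler = TriggerScheduler(pooling_frequency_minutes=3)
    scheduler.add_model_trigger(
        schedule_function=on_model_tagged,
        trigger_on_tags=["released"],  # placeholder tag
    )
    scheduler.start()  # blocking; usually runs as a service task
```

The point of the split: `register_trigger()` stays trivial, while the container build / CI step lives in the enqueued Task, which can fail visibly and be retried.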
Makes total sense!
Interesting, you are defining the sub-component inside the function, I like that, this makes the code closer to how this is executed!
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
I called task.wait_for_status() to make sure the task is done
This is the issue, I will make sure wait_for_status() calls reload at the ends, so when the function returns you have the updated object
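As a workaround until that fix lands, you can call reload() yourself. A small sketch, assuming a standard clearml Task object (the task id is a placeholder):

```python
# Statuses after which wait_for_status() normally returns
TERMINAL_STATUSES = {"completed", "failed", "stopped"}

def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES

def get_finished_task(task_id: str):
    from clearml import Task  # lazy import: needs a configured clearml setup
    task = Task.get_task(task_id=task_id)
    task.wait_for_status()  # blocks until the task reaches a terminal status
    task.reload()           # refresh the local copy so fields reflect the final state
    return task
```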
is it normal that it's slower than my device even though the agent is much more powerful than my device? or is it because it is just simple code
Could be the agent is not using the GPU for some reason?
So clearml server already contains an authentication layer (JWT Token), and you do have a full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately this is not part of the open source, but I know they have it in the scale/ent...
I want the model to be stored in a way that clearml-serving can recognise it as a model
Then OutputModel or task.update_output_model(...)
You have to serialize it, in a way that later your code will be able to load it.
With XGBoost, when you call model.save, clearml automatically picks it up and uploads it for you,
assuming you called Task.init(..., output_uri=True)
You can also manually upload the model with task.update_output_model or equivalent with OutputModel class.
if you want to dis...
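The manual route can be sketched like this. It is only a sketch: the project/task names and weights path are placeholders, and the framework-guessing helper is an illustrative assumption, not a clearml API.

```python
def framework_from_extension(weights_path: str) -> str:
    # Illustrative helper only: guess a framework tag from the file extension
    ext = weights_path.rsplit(".", 1)[-1].lower()
    return {"json": "xgboost", "pkl": "scikit-learn", "pt": "pytorch"}.get(ext, "custom")

def publish_model(weights_path: str, model_name: str = "my-model") -> None:
    from clearml import Task, OutputModel  # lazy import: needs a clearml setup
    task = Task.init(project_name="serving-demo", task_name="train",
                     output_uri=True)  # True -> upload to the clearml file server
    model = OutputModel(task=task, name=model_name,
                        framework=framework_from_extension(weights_path))
    # Register and upload the serialized weights; your serving/inference code
    # must be able to deserialize this exact format later.
    model.update_weights(weights_filename=weights_path)
```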
A single query will return if the agent is running anything, and for how long, but I do not think you can get the idle time ...
It is http BTW, I don't know why it logged https://
This is odd could it be it automatically forwards to https ?
I would try the certificate check thing first
can i run a random task from a queue? like this
clearml-agent execute --id <TASK_ID>
or
ChubbyLouse32 This will just work out of the box
No need to enqueue the Task, just reset it (in the UI)
Hi SubstantialElk6
you can do:
from clearml.config import config_obj
config_obj.get('sdk')
You will get the entire configuration tree of the SDK section (if you need sub-sections, you can access them with '.' notation, e.g. 'sdk.storage').
Hi JitteryCoyote63 , I have to admit, we have not thought of this scenario... what's the exact use case to clone a Task and change the type?
Obviously you can always change the task type; a bit of a hack, but this should work:
task._edit(type='testing')
and do you have import tensorflow in your code?
EnviousStarfish54 something is also off in the git detection, it has no remote address, it just says "origin"
Any chance you have no git server ?
Regarding the installed packages, any chance you can send a sample code for me to debug?
So this should be easier to implement, and would probably be safer.
You can basically query all the workers (i.e. agents) and check if they are running a Task, then if they are not (for a while) remove the "protection flag"
wdyt?
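A sketch of that worker query, under stated assumptions: the fetch uses clearml's APIClient, while the idle check is plain Python over worker records shaped like dicts with "id", "task" (empty when idle) and "last_activity_time" (epoch seconds) fields; those exact field shapes are an assumption to verify against the API response.

```python
import time

def idle_worker_ids(workers, idle_threshold_sec=1800, now=None):
    # Return ids of workers with no running task that have been
    # inactive for at least idle_threshold_sec seconds.
    now = time.time() if now is None else now
    return [
        w["id"]
        for w in workers
        if not w.get("task")
        and now - w.get("last_activity_time", now) >= idle_threshold_sec
    ]

def fetch_workers():
    # The actual query; requires clearml credentials and a server
    from clearml.backend_api.session.client import APIClient  # lazy import
    return APIClient().workers.get_all()
```

Run the check periodically and clear the "protection flag" for whatever `idle_worker_ids(...)` returns.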
So when the agent fires up it gets the hostname, which you can then get from the API.
I think it does something like "getlocalhost", a python function that is OS agnostic
Hi ExuberantParrot61, the odd thing is this message:
No repository found, storing script code instead
when you are actually running from inside the repo...
is it saying that on a specific step, or is it on the pipeline logic itself?
Also any chance you can share the full console output ?
BTW:
you can manually specify a repo branch for a step:
https://github.com/allegroai/clearml/blob/a492ee50fbf78d5ae07b603445f4983feb9da8df/clearml/automation/controller.py#L2841
Example:
https:/...
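A sketch of a per-step branch override, assuming PipelineController.add_function_step and its repo/repo_branch parameters; the pipeline name, project, and repo URL are placeholders.

```python
def double(x: int = 1) -> int:
    # Plain function used as the pipeline step body
    return x * 2

def build_pipeline():
    from clearml.automation.controller import PipelineController  # lazy import
    pipe = PipelineController(name="demo-pipeline",  # placeholder names
                              project="pipelines", version="1.0.0")
    pipe.add_function_step(
        name="step_one",
        function=double,
        function_kwargs={"x": 3},
        function_return=["result"],
        repo="https://github.com/your-org/your-repo.git",  # placeholder URL
        repo_branch="dev",  # the per-step branch override
    )
    return pipe
```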
We are always looking for additional talented people. DM me...
@<1523701083040387072:profile|UnevenDolphin73> it's looking for any of the files:
None
Still I wonder if it is normal behavior that clearml exits the experiments with status "completed" and not with failure
Well that depends on the process exit code, if for some reason (not sure why) the process exits with return code 0, it means everything was okay.
I assume this "Detected an exited process, so exiting main" is an internal print of your code; I guess it just leaves the process with exit code 0
Hi MagnificentSeaurchin79
Unfortunately there is currently no way to reorder the plots, but you have a valid point. May I suggest opening a GitHub UX issue?
Regarding the debug samples, the difference is that the confusion matrix report is actually metadata; you can get these numbers via the API or the download, but the debug samples are static images ...
BTW: you can try to produce an interactive side-by-side confusion matrix with plotly, and use Logger.report_plotly
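A sketch of that side-by-side report. The row normalization is plain Python; the figure/report part assumes plotly's make_subplots and clearml's Logger.report_plotly, and the titles/labels are placeholders.

```python
def normalize_rows(cm):
    # Convert raw counts to per-row rates so two matrices compare cleanly
    out = []
    for row in cm:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

def report_side_by_side(cm_a, cm_b, labels, iteration=0):
    # Build a 1x2 plotly figure and send it to the experiment's plots
    import plotly.graph_objects as go           # lazy imports: plotly and
    from plotly.subplots import make_subplots   # clearml must be installed
    from clearml import Task
    fig = make_subplots(rows=1, cols=2, subplot_titles=("model A", "model B"))
    fig.add_trace(go.Heatmap(z=normalize_rows(cm_a), x=labels, y=labels), row=1, col=1)
    fig.add_trace(go.Heatmap(z=normalize_rows(cm_b), x=labels, y=labels), row=1, col=2)
    Task.current_task().get_logger().report_plotly(
        title="confusion matrix", series="side by side",
        iteration=iteration, figure=fig)
```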
JitteryCoyote63
IAM role to the web app could access
you mean the web client key/secret to access S3 data ?
TroubledHedgehog16
but doesn't run when I deploy it using clearml. Here's the log of the error:
...
My guess is that clearml is reimporting keras somewhere, leading to circular dependencies.
It might not be circular, but I would guess it does have something to do with order of imports. I'm trying to figure out what would be the difference between local run and using an agent
Is it the exact same TF version?