In the UI you can see all the agents and their IDs
Then you can so
clearml-agent daemon --stop <agent id>
exactly! it is very cool to see it in action, and it really works very well, kudos for these guys
Something like the TYPE_STRING that Triton accepts.
I saw the github issue, this is so odd , look at the triton python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py
Nice! So out of curiosity why didn't it work this time and you had to do it manually?
Hi ContemplativeCockroach39
Assuming you wrap your model with a flask app (or using any other serving solution), usually you need:
Get the model Add some metrics on runtime performance package in a dockerGetting a pretrained model is straight forward one you know either the creating Task or the Model ID
` from clearml import Task, Model
model_file_from_task = Task.get_task(task_id).models['output'][-1].get_local_copy()
or
model_file_from_model = Model(model_id=<moedl_id>).get_local_copy()...
Hi @<1657918706052763648:profile|SillyRobin38>
I have included some print statements
you should see those under the Task of the inference instance.
You can also do:
import clearml
...
def preprocess(...):
clearml.Logger.current_logger().report_text(...)
clearml.Logger.current_logger().report_scalar(...)
, specifically within the containers where the inferencing occurs.
it might be that fastapi is capturing the prints...
[None](https://github.com/tiangolo/uvicor...
then when we triggered a inference deploy it failed
How would you control it? Is it based on a Task ? like a property "match python version" ?
Hi @<1569496075083976704:profile|SweetShells3>
These environment variable are injected into the new process, are you passing them on the vault?
None
But how do you specify the data hyperparameter input and output models to use when the agent runs the experiment
They are autodetected if you are using Argparse / Hydra / python-fire / etc.
The first time you are running the code (either locally or with an agent), it will add the hyper parameter section for you.
That said you can also provide it as part of the clearml-task command with --args
(btw: clearml-task --help will list all the options, https://clear.ml/docs/...
Do people generally update the same model “entry”? That feels so wrong to me…how do you reproduce a older model version or do a rollback etc?
Correct, they do not 🙂 On the Task itself the output models will reflect the diff filenames you saved, usually ppl just add a running number.
And you have the exact same folder structure / content, and server A/B give a different set of experiments ?
(is serverB empty, meaning no experiments at all?)
Could it be in a python at_exit event ?
btw: what's the OS and python version?
task.mark_completed()
You have that at the bottom of the script, never call it on yourself, it will kill the actual process.
So what is going on you are marking your own process for termination, then it terminates itself leaving the interpreter and this is the reason for the errors you are seeing
The idea of mark_* is to mark an external Task, forcefully.
By just completing your process with exit code (0) (i.e. no error) the Task will be marked as completed anyhow, no need to call...
HealthyStarfish45 if I understand correctly the trains-agent is running as daemon (i.e. automatically pulling jobs and executes them), the only point might be cancelling a daemon will cause the Task executed by that daemon to be canceled as well.
Other than that, sounds great!
I was thinking mainly about AWS.
Meaning S3?
Hi SharpHedgehog60
Task type is another way to declare the type of processing the Task performs.
Later you can filter based on the Task type (like you would with a Tag).
For example Datasets are always of a Type "data processing"
Thanks ElegantCoyote26 I'll look into it. Seems like someone liked our automagical approach 🙂
Interesting use case, do you already have the connect_configuration in the code? or do we need to somehow create it ?
is it normal that it's slower than my device even though the agent is much more powerful than my device? or because it is just a simple code
Could be the agent is not using the GPU for some reason?
WackyRabbit7 basically starting v1.1 if you are running code without any configuration file, you will get an error (in contrast to previous versions where it defaulted to the demo-server)
Hmm maybe this is the issue, :
Conda error: UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (cudatoolkit):
- pytorch~=1.8.0 -> cudatoolkit[version='>=10.1,<10.2|>=10.2,<10.3']
This makes no sense, conda is saying pytorch=1.8 needs cudatoolkit <10.2/10.3 but actually it needs cudatoolkit 11.1
Hi GrotesqueOctopus42
In theory it can be built, the main hurdle is getting elk/mongo/redis containers for arm64 ...
Hi WittyOwl57
I think what happens is it auto-logs the joblib load/save calls, these calls track models used/created by the code, and attach them to the model repository representing these model.
I'm assuming there are multiple load/save , and there are multiple model instances pointing to the same local file "file:///tmp/..." . The earning basically says it is re-registering existing models.
Make sense ?
Hi @<1533620191232004096:profile|NuttyLobster9>base_task_factory is a function that gets the node definition and returns a Task to be enqueued ,
pseudo code looks like:
def my_node_task_factory(node: PipelineController.Node) -> Task:
task = Task.create(...)
return task
Make sense ?
task.connect is two way, it does everything for you:base_params = dict(param1=123, param2='text') task.connect(base_params) print(base_params)If you run this code manually, then print is exactly what you initialized base_params with. But when the agent is running it, it will take the values from the UI (including casting to the correct type), so print will result in values/types from the UI.
Make sense ?
Hi ContemplativeCockroach39
Seems like you are running the exact code as in the git repo:
Basically it points you to the exact repository https://github.com/allegroai/clearml and the script examples/reporting/pandas_reporting.py
Specifically:
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/reporting/pandas_reporting.py