Hi SubstantialElk6
Generically, we would 'export' the preprocessing steps, set up an inference server, and then pipe data through the above to get results. How should we achieve this with ClearML?
We are working on integrating the OpenVINO serving and Nvidia Triton serving engines into ClearML (they will both be available soon)
Automated retraining
In cases of data drift, retraining of models would be necessary. Generically, we pass newly labelled data to fine...
Using agent v1.01r1 in k8s glue.
I think a fix was recently committed, let me check it
Hi FierceHamster54
I'm sure this is solvable, get in touch with them either via the contact form on the website or by email at support@clear.ml, it should not be complicated to fix 🙂
Hope you don't mind linking to that repo
LOL 🙂
Hi WickedGoat98
Will I need to wrap their execution in Python via system calls?
That would probably be the easiest solution 🙂
Then you can plug it into your pipeline as a preprocessing Task:
You can check this example:
https://github.com/allegroai/trains/tree/master/examples/pipeline
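As a rough sketch of that approach (the tool name, file paths, and project/task names below are placeholders, and it assumes a reasonably recent clearml version), the system call can live in its own Task script, which the pipeline from the example above can then reference as a step:

import subprocess
from clearml import Task

# preprocess_step.py - wraps an external (non-Python) tool in a ClearML Task
task = Task.init(project_name="examples", task_name="preprocessing step")

# connect the paths as parameters so the pipeline (or the UI) can override them
params = {"input": "data/raw.csv", "output": "data/clean.csv"}
task.connect(params)

# placeholder command: replace with the actual binary and arguments you need to wrap
subprocess.check_call(["my_preprocess_tool", "--in", params["input"], "--out", params["output"]])

The pipeline controller (as in the linked example) can then clone and enqueue this Task as its preprocessing step.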
Hi AbruptHedgehog21
can you send the two models' info pages (i.e. the original and the updated one)?
do you see the two endpoints?
BTW: --version would add a version to the model (i.e. create a new endpoint with version "endpoint/{version}")
Yes, it's a bit confusing, the gist of it is that we wanted to have the ability to have different configurations for different buckets
Is it possible to substitute these steps using containers instead?
I'm not sure I follow, could you expand ?
DilapidatedDucks58 by default if you continue the execution, it will automatically continue reporting from the last iteration. I think this is what you are seeing
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
No worries I totally feel you.
As a quick hack in the actual code of the Task itself, is it reasonable to have:
task = Task.init(....)
task.set_initial_iteration(0)
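For completeness, a minimal sketch of that hack (assuming the run is resumed with continue_last_task; project/task names are placeholders):

from clearml import Task

# resume reporting into the previous task, but reset the iteration counter
task = Task.init(
    project_name="examples",
    task_name="training",
    continue_last_task=True,  # continue the previously executed task
)
task.set_initial_iteration(0)  # report scalars starting from iteration 0 instead of the last one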
SmallDeer34 I have to admit this reference is relatively old, maybe we should update it to http://clearml.ml (would that make sense?)
Hi @<1541954607595393024:profile|BattyCrocodile47>
Does ClearML have a good story for offline/batch inference in production?
Not sure I follow, you mean like a case study ?
Triggering:
We'd want to be able to trigger a batch inference:
- (rarely) on a schedule
- (often) via a trigger in an event-based system, like maybe from an AWS Lambda function
(2) Yes, there is a great API for that, check out the GitHub Actions example, it is essentially the same idea (a REST API is also available) ...
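As a hedged sketch of that trigger flow (the handler name, template task, project, and queue names below are hypothetical): an external event handler, e.g. an AWS Lambda function, can use the Python SDK to clone a template batch-inference Task and enqueue it for an agent to run:

from clearml import Task

def trigger_batch_inference(event=None, context=None):
    # hypothetical Lambda-style handler; names below are placeholders
    template = Task.get_task(project_name="production", task_name="batch-inference-template")
    job = Task.clone(source_task=template, name="batch-inference-run")
    # optionally override parameters from the triggering event, e.g.:
    # job.set_parameters({"General/input_uri": event["data_uri"]})
    Task.enqueue(job, queue_name="inference")
    return job.id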
Hi GrievingTurkey78
First, I would look at the CLI clearml-data
as a baseline for implementing such a tool:
Docs:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Implementation:
https://github.com/allegroai/clearml/blob/master/clearml/cli/data/main.py
Regarding your questions:
(1) No, a new dataset version will only store the diff from the parent (if files are removed it stores the metadata that says the file was removed)
(2) Yes any get operation will downl...
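To illustrate (1) and (2) with the Python Dataset API (a sketch assuming a recent clearml version; dataset/project names are placeholders):

from clearml import Dataset

# (1) a child version only stores the diff from its parent;
#     removed files are recorded as metadata, not re-uploaded
parent = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
child = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    parent_datasets=[parent.id],
)
child.add_files(path="data/new_or_modified/")  # only the delta gets stored
child.upload()
child.finalize()

# (2) any "get" pulls the parent chain as needed and returns a merged local copy
local_path = Dataset.get(dataset_id=child.id).get_local_copy()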
Hi FiercePenguin76
It seems it fails to detect the notebook server and thinks this is a "script running".
What exactly is your setup?
docker image ?
jupyter-lab version ?
clearml version?
Also are you getting any warning when calling Task.init ?
Is there a helper function option at all that means you can flush the clearml-agent working space automatically, or by command?
On every Task execution the agent clears the venv (packages are cached locally, but the actual venv is cleared). If you want, you can turn on the venv cache, but there is no need to manually clear the agent's cache.
Thanks VivaciousPenguin66 !
BTW: if you are running the local code with conda, you can set the agent to use conda as well (notice that if you are running locally with pip, the agent's conda env will use pip to install the packages to avoid version mismatch)
FiercePenguin76
So running the Task.init from the jupyter-lab works, but running the Task.init from the VSCode notebook does not work?
EnviousStarfish54 generally speaking the hyperparameters are flat key/value pairs. You can have as many sections as you like, but inside each section, key/value pairs. If you pass a nested dict, it will be stored as path/to/key:value (as you witnessed).
If you need to store a more complicated configuration dict (nesting, lists etc), use the connect_configuration, it will convert your dict to text (in HOCON format) and store that.
In both cases you can edit the configuration and then when ru...
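A small sketch of the two approaches (assuming a recent clearml version; names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# flat hyperparameters: nested keys get flattened to "path/to/key"
hparams = {"lr": 0.001, "optimizer": {"name": "adam", "beta1": 0.9}}
task.connect(hparams, name="training")

# complex nested configuration: stored as an editable text blob (HOCON)
model_cfg = {"backbone": "resnet50", "heads": [{"name": "cls", "classes": 10}]}
model_cfg = task.connect_configuration(model_cfg, name="model")
# when executed remotely, the returned dict reflects any edits made in the UI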
Hmm so VSCode running locally connected to the remote machine over the SSH?
(I'm trying to figure out how to replicate the setup for testing)
okay, let me check it, but I suspect the issue is running over SSH; to overcome these issues with PyCharm we have a specific plugin to pass the git info to the remote machine. Let me check what we can do here.
FiercePenguin76 BTW, you can do the following to add / update packages on the remote session:
clearml-session --packages "newpackage>x.y" "jupyterlab>6"
diff line by line is probably not useful for my data config
You could request a better configuration diff feature 🙂 Feel free to add it on GitHub
But this also mean I have to first load all the configuration to a dictionary first.
Yes 🙂
Hi EnviousStarfish54
I think this is what you are after
task.connect_configuration(my_dict_here, name='my_section_name')
BTW:
if you do task.connect(a_flat_dict, name='new section') you will have the key/value pairs in a section called "new section"
Notice that the StorageManager has default configuration here:
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L76
Then a per-bucket credentials list, with details:
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L81
Thanks EnviousStarfish54 we are working on moving them there!
BTW, in the meantime, please feel free to open a GitHub issue under trains, at least until they are moved (hopefully end of Sept).
that machine will be able to pull and report multiple trials without restarting
What do you mean by "pull and report multiple trials"? Spawn multiple processes with different parameters?
If this is the case: the internals of the optimizer could be synced to the Task so you can access them, but this is basically the internal representation, which is optimizer dependent. Which one did you have in mind?
Another option is to pull Tasks from a dedicated queue and use the LocalClearMLJob ...
Or use python:3.9 when starting the agent
This is probably the best solution 🙂
Hi DefeatedCrab47
Not really sure, and it smells bad ...
Basically you can always use the TB logger, and call Task.init.
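For instance, a minimal sketch (assuming PyTorch's SummaryWriter is the TB logger in use; project/task names are placeholders), where Task.init picks up whatever goes through TensorBoard:

from clearml import Task
from torch.utils.tensorboard import SummaryWriter  # assumes PyTorch is installed

# Task.init hooks the TensorBoard logger, so TB scalars are also reported to ClearML
task = Task.init(project_name="examples", task_name="tb logging demo")
writer = SummaryWriter(log_dir="./tb_logs")

for step in range(100):
    loss = 1.0 / (step + 1)  # dummy metric
    writer.add_scalar("train/loss", loss, global_step=step)

writer.close()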