Sorry @<1657918706052763648:profile|SillyRobin38> I missed this reply
Is ClearML-Serving using either System or CUDA shared memory?
This needs to be set on the docker-compose:
and I think this line actually includes `ipc: host`, which means there is no need to set the `shm_size`, but you can play around with it and let me know if you see a difference
[None](https://github.com/allegroai/clearml-serving/blob/7ba356efc97a6ae2159283d198d981b3c1ab85e6/docker/docker-compose-triton-gpu.yml#L1...
Thanks DilapidatedDucks58 ! We ❤ suggestions for improvements
Did you try to print the page using the browser? (I think they can all store it as PDF these days.) Yes I agree, it would. We have some thoughts on creating plugins for the system, and I think this could be a good use-case. Wait a week or two ;)
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if you are running with docker compose you can pass an argument to the clearml triton container and use shared mem. You can do the same with the helm chart
Hi @<1547028116780617728:profile|TimelyRabbit96>
Trying to do model inference on a video, so first step in `Preprocess` class is to extract frames.
Basically this depends on the RestAPI: usually you would be sending a link to the data to be processed, and the result is returned synchronously.
What you should have is a custom endpoint doing the extraction, which sends the raw data into another endpoint doing the model inference. Basically think "pipeline" endpoints:
[None](https://github.com/allegro...
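To make the "pipeline" endpoints idea concrete, here is a minimal sketch in plain Python. It does not use the clearml-serving API: `extract_frames`, `run_inference`, `video_endpoint` and the `send` callable are all hypothetical names, standing in for two separately deployed endpoints and for whatever request connects them.

```python
from typing import Callable, Dict, List


def extract_frames(video: List[bytes], every_nth: int = 1) -> List[bytes]:
    """Stage 1 (hypothetical): keep every n-th frame of an already decoded video."""
    return video[::every_nth]


def run_inference(frame: bytes) -> Dict[str, int]:
    """Stage 2 (hypothetical): stand-in for the model endpoint, reports the frame size."""
    return {"num_bytes": len(frame)}


def video_endpoint(
    video: List[bytes],
    send: Callable[[bytes], Dict[str, int]],
    every_nth: int = 1,
) -> List[Dict[str, int]]:
    """Extraction endpoint: pull the frames, then forward each one to the inference stage."""
    frames = extract_frames(video, every_nth)
    # `send` is assumed to wrap the request to the second endpoint
    return [send(frame) for frame in frames]


# Usage: here the second stage is called in-process instead of over the network
results = video_endpoint([b"aa", b"bbb", b"c"], send=run_inference, every_nth=2)
```

Keeping the forwarding step behind a callable means the extraction stage can be tested without a running inference endpoint.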
can we use a currently setup virtualenv by any chance?
You mean, if the clearml-agent needs to set up a new venv each time? Are you running in docker mode?
(by default it is caching the venv so the second time it is using a precached full venv, installing nothing)
So actually while we're at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model? or the pre/post processing code?
One issue that I see is that the Dockerfile inside the agent container
Not sure I follow, these are settings for the default container to be used when the agent spins a Task for you.
How are you running the agent itself ?
Hi WackyRabbit7
First always check the functions on the Task object, they are the most straightforward access to the system.
Then if you need general purpose API calls, currently they are only documented in the doc-string of the API schema (that said, it should be quite well documented)
You can check all the endpoints https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
And finally if you want to easily use the RestAPI :
` from trains.backend_api.session.client impo...
DistressedGoat23
We are running a hyperparameter tuning (using some cv) which might take a long time and might be even aborted unexpectedly due to machine resources.
We therefore want to see the progress
On the HPO Task itself (not the individual experiments, but the one controlling it all) there is the global progress of the optimization metric, is this what you are looking for? Am I missing something?
but can it NOT use /tmp for this? I'm merging about 100GB
You mean to configure your Temp folder for when squashing ?
You can hack it with the following:
` import tempfile
tempfile.tempdir = "/my/new/temp"
# ... run the Dataset squash here ...
tempfile.tempdir = None `
But regardless I think this is worth a GitHub issue with a feature request, to set the temp folder.
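The hack above can be wrapped in a small context manager so the override is always undone, even if the squash fails. This is only a sketch using the standard library: `temporary_tempdir` is a hypothetical helper name, and unlike the snippet above it restores the previous value rather than resetting it to `None`.

```python
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_tempdir(path):
    """Point Python's tempfile module at `path` for the duration of the block."""
    previous = tempfile.tempdir
    tempfile.tempdir = path
    try:
        yield path
    finally:
        # Restore whatever was configured before, also when an exception is raised
        tempfile.tempdir = previous
```

Usage would be `with temporary_tempdir("/my/new/temp"):` around the squash call. The target folder has to exist already, since `tempfile` does not create it.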
Thanks GrievingTurkey78
Sure just PR (should work with any Python/Hydra version):
` kwargs['config'] = config
kwargs['task_function'] = partial(PatchHydra._patched_task_function, task_function,)
result = PatchHydra._original_run_job(*args, **kwargs) `
ExcitedFish86 this is a general "dummy agent" that takes Tasks and executes them (no env created, no code cloned, as you suggested)
how does this work with HPO?
The HPO clones Tasks, changes arguments, pushes them into a queue, and monitors the metrics in real time. The missing part (from my understanding) was that the execution of the Tasks themselves required setup, and that you wanted multiple machine support. In order to overcome it, I posted a dummy agent that just runs the Tasks.
(Notice...
Hey GiganticTurtle0 ,
So basically the issue is that the pipeline function (`prediction_service`) is getting a dict as input, and it is expecting to get basic types... if you were to do the following, it would have worked as expected:
` prediction_service(**default_config) `
I will make sure we flatten any dictionary so that we end up with `config/start`, instead of a serialized version of the dict.
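As an illustration of what flattening a dictionary into `config/start`-style keys means, here is a minimal sketch. It is not the ClearML implementation; `flatten_dict` and its parameters are hypothetical names used only for this example.

```python
from typing import Any, Dict


def flatten_dict(nested: Dict[str, Any], parent_key: str = "", sep: str = "/") -> Dict[str, Any]:
    """Turn {"config": {"start": 1}} into {"config/start": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the path so far as the prefix
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


flat_config = flatten_dict({"config": {"start": 1, "end": 2}, "name": "x"})
```

Note that in this sketch an empty nested dictionary produces no keys at all.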
wdyt?
Hi WickedGoat98
I try to write an article on medium about ClearML and face a problem with plotly figures.
This is awesome !
I ran the plotly_reporting.py example locally and the uploaded plot was ok.
So are you saying the same example code from the repository worked okay on your server but showed nothing on the hosted server ?
Yeah the hack would work but I'm trying to use it from the command line to put in airflow. I'll post on GH
Oh, then set TMP/TMPDIR environment variable, it should have the same effect
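For reference, Python's `tempfile` module looks at the `TMPDIR`, `TEMP` and `TMP` environment variables (in that order) the first time a temp folder is needed, and caches the result. The sketch below shows that lookup in isolation; `tempdir_with_env` is a hypothetical helper, and from the command line you would simply export the variable before running the CLI.

```python
import os
import tempfile


def tempdir_with_env(tmpdir):
    """Return the folder tempfile resolves to when TMPDIR points at `tmpdir`."""
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    os.environ["TMPDIR"] = tmpdir
    tempfile.tempdir = None  # drop the cached value so the environment is read again
    try:
        return tempfile.gettempdir()
    finally:
        # Put the environment and the cache back the way they were
        if old_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cached
```

The folder must exist and be writable, otherwise `tempfile` silently falls back to the next candidate.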
Yes, found the issue :) I'll see to it there is a fix in the next RC. ETA early next week
GrittyStarfish67
I do not wish for data duplication. Any Idea how to do this with clearml-data CLI/GUI/python?
At least in theory creating a new version with parents from multiple Datasets should just work out of the box.
wdyt?
JitteryCoyote63 I meant to store the parent ID as another "hyper-parameter" (under its own section name) not the data itself.
Makes sense ?
PompousBeetle71 so basically exclude parameters that are considered "local" only, so that other people will not accidentally use them?
Makes sense. BTW: you can manually add data visualization to a Dataset with `dataset.get_logger().report_table(...)`
basically the default_output_uri will cause all models to be uploaded to this server (with specific subfolder per project/task)
You can have the same value there as the files_server.
The files_server is where you have all your artifacts / debug samples
There was an issue in some versions where seaborn plots were blank. Is that the case?
Ohh so even easier: `print(client.workers.get_all())`
Hi @<1619867994005966848:profile|HungryTurtle13>
I'm using Python's joblib library and the Parallel class to run an experiment in multiple parallel threads.
I believe joblib creates subprocesses not threads, but yes you are correct.
Basically once Task.init is called, every forked/spawned process will be automatically logged to the main process Task (you can, and probably should, call either Task.init or Task.current_task() from the forked processes, but this is just a detail)
The mai...
Hi JitteryCoyote63 `report_frequency_sec=30.` controls how frequently monitoring events are sent to the server; the default is every 30 seconds (you can change the UI display to wall-time to review). You can change it to 180 so it will only send an event every 3 minutes (for example).
sample_frequency_per_sec is the sampling frequency it uses internally, then it will average the results over the course of the report_frequency_sec time window, and send the averaged result on the repo...
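The relationship between the two settings can be illustrated with a small standalone sketch. This is not ClearML's monitoring code: `averaged_reports` is a hypothetical function that only mimics the described behaviour, sampling at one rate and reporting one averaged value per window.

```python
from typing import List


def averaged_reports(
    samples: List[float],
    sample_frequency_per_sec: float,
    report_frequency_sec: float,
) -> List[float]:
    """Average consecutive samples into one value per reporting window."""
    # Number of samples collected during one reporting window
    window = max(1, int(round(sample_frequency_per_sec * report_frequency_sec)))
    reports = []
    # Only complete windows are reported; a trailing partial window is dropped
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        reports.append(sum(chunk) / len(chunk))
    return reports


# One sample per second, one report every 3 seconds
reports = averaged_reports([1, 2, 3, 4, 5, 6, 7], sample_frequency_per_sec=1.0, report_frequency_sec=3.0)
```

With these inputs the first two windows are averaged and the seventh sample is left waiting for its window to fill.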
do I still need to specify a OutputModel
No need, only if you want to upload a local model file (but I assume in this case, no new model is created)
Okay this is more complicated but possible.
The idea is to write a glue layer (service) that pulls from the (i.e. UI) queue,
sets the slurm job,
and puts it in a pending queue (so you know the job is waiting in the slurm scheduler)
There is a template here:
https://github.com/allegroai/trains-agent/blob/master/trains_agent/glue/k8s.py
I would love to help set up a slurm glue in a similar manner
what do you think?
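The three steps of the glue layer described above can be sketched roughly as follows. This is an in-memory illustration only: the queues are plain `deque` objects rather than ClearML queues, and `drain_queue` and `submit_job` are hypothetical names. In a real glue service `submit_job` would presumably wrap the slurm submission command and return the scheduler's job ID.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain_queue(
    ui_queue: Deque[str],
    pending_queue: Deque[str],
    submit_job: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pull every task from the UI queue, submit it, and mark it as pending."""
    submitted = []
    while ui_queue:
        task_id = ui_queue.popleft()      # 1. pull from the (UI) queue
        job_id = submit_job(task_id)      # 2. set up the scheduler job
        pending_queue.append(task_id)     # 3. record that it waits in the scheduler
        submitted.append((task_id, job_id))
    return submitted


# Usage with a fake submit function standing in for the scheduler
ui = deque(["task-1", "task-2"])
pending = deque()
jobs = drain_queue(ui, pending, submit_job=lambda task_id: "job-for-" + task_id)
```

Passing the submit function in as an argument keeps the polling loop independent of the scheduler, which is roughly how a k8s template could be adapted to slurm.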