Hi @<1547028116780617728:profile|TimelyRabbit96>
You are absolutely correct, we need to allow overriding the configuration
The code you want to change is here:
None
You can try:
channel = self._ext_grpc.aio.insecure_channel(
    triton_server_address,
    options=[
        ('grpc.max_send_message_length', 512 * 1024 * 1024),
        ('grpc.max_receive_message_length', 512 * 1024 * 1024),
    ],
)
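(512 * 1024 * 1024 raises both message-size limits to 512MB; set it to whatever your largest payload needs.)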
btw, regarding the "# in another process" comment:
How do you spin up the subprocess, is it with Popen?
Also, what's the OS and Python version you are using?
Hi @<1547028116780617728:profile|TimelyRabbit96>
Start with the simple scikit learn example
https://github.com/allegroai/clearml-serving/tree/main/examples/sklearn
The pipeline example is more complicated, it needs the base endpoints, start simple 😃
And the agent continues running.
oh just kill all the clearml-agent processes
in the cmd line
pkill -9 -f clearml-agent
In the UI you can see all the agents and their IDs
Then you can do:
clearml-agent daemon --stop <agent id>
Could I use "register artifact"
I think this is somewhat deprecated and we should probably replace it with something similar to what you mentioned (i.e. watch a file change).
Right now the easiest way would be to manually upload the trainer_state.json every checkpoint:
Task.current_task().upload_artifact(name='state', artifact_object='trainer_state.json')
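If you are using the HuggingFace Trainer, a minimal sketch of automating that with a callback (the callback class and checkpoint path layout are my assumptions, not something clearml ships):
import os
from clearml import Task
from transformers import TrainerCallback

class UploadTrainerState(TrainerCallback):
    # illustrative: upload the freshly written trainer_state.json after every checkpoint
    def on_save(self, args, state, control, **kwargs):
        path = os.path.join(args.output_dir, f'checkpoint-{state.global_step}', 'trainer_state.json')
        if os.path.exists(path):
            Task.current_task().upload_artifact(name='state', artifact_object=path)

# then: trainer = Trainer(..., callbacks=[UploadTrainerState()])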
training loop is within line 469, I think.
I think the model state is just post training loop (not inside the loop), no?
Basic setup:
glue service per "job template" (e.g. k8s resources, for example CPU requirement or GPU requirement)
queue per glue service, e.g. a cpu_machine queue and a 1xGPU queue
wdyt?
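From the user side, a minimal sketch of routing work to those queues (queue and task names are illustrative):
from clearml import Task

# clone a template task and enqueue it on the queue matching its resource needs;
# the glue service listening on that queue applies the corresponding k8s job template
template = Task.get_task(project_name='examples', task_name='train')
cpu_job = Task.clone(source_task=template, name='train (cpu)')
Task.enqueue(cpu_job, queue_name='cpu_machine')
gpu_job = Task.clone(source_task=template, name='train (gpu)')
Task.enqueue(gpu_job, queue_name='1xGPU')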
MortifiedDove27 did you update to the latest clearml python package?
Hi SubstantialElk6
noted that clearml-serving does not support Spacy models out of the box and
So this is a good point.
To add any missing package to the preprocessing docker you can just add them via the following environment variable here: https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/docker/docker-compose.yml#L83
EXTRA_PYTHON_PACKAGES="spacy>1"
Regarding a custom engine, basically this is supported with --engine custom
you c...
Can I assume that if we have two agents spinning the same experiment, your code will take it from there?
Is this true?
time-based, dataset creation, model publish (tag),
Anything you think is missing?
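For reference, a minimal sketch of the dataset/model triggers with clearml's TriggerScheduler (IDs and project names are placeholders; time-based scheduling lives in the separate TaskScheduler class):
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
# model publish (tag) trigger
trigger.add_model_trigger(
    schedule_task_id='<task_to_launch>', schedule_queue='default',
    trigger_project='examples', trigger_on_publish=True)
# dataset creation trigger
trigger.add_dataset_trigger(
    schedule_task_id='<task_to_launch>', schedule_queue='default',
    trigger_project='datasets')
trigger.start()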
This line 🙂
None
Notice that Triton (and therefore clearml-serving) needs the PyTorch model to be converted into TorchScript, so that the Triton backend can load it
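For example, a minimal sketch of the conversion (model and input shape are placeholders):
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
example_input = torch.rand(1, 3, 224, 224)  # example input used for tracing
# trace the model into TorchScript so the Triton pytorch backend can load it
traced = torch.jit.trace(model, example_input)
traced.save('model.pt')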
Could you manually configure the ~/trains.conf ?
(Just copy paste the section from the UI)
then try to run:
trains-agent list
None of them is problematic, this is what I'm trying to say 🙂
I think the minio browser gets confused.
If you want to test the upload time on the client you can try:
from time import time
task.flush(wait_for_uploads=True)
tic = time()
task.upload_artifact('test', '/tmp/localfile')
task.flush(wait_for_uploads=True)
print(time() - tic)
There is some overhead, but it should be negligible.
Thanks MuddyCrab47 !!!
I found it!
It turns out the artifact upload will always upload from stream (i.e. no multipart upload). I will make sure we fix it in the next RC (I think the plan is to have it out this weekend)
It should work 🙂 as long as the versions match. If they don't, the venv will install the version you need (which is great; the only penalty is the install time, download-wise it will be cached)
What's the OS running the server?
DAG which gets scheduled at a given interval and
Yes, exactly. That will be part of the next iteration of the controller/service
an example achieving what i propose would be greatly helpful
Would this help?
from trains.automation import TrainsJob
job = TrainsJob(base_task_id='step1_task_id_here')
job.launch(queue_name='default')
job.wait()
job2 = TrainsJob(base_task_id='step2_task_id_here')
job2.launch(queue_name='default')
job2.wait()
Hi MinuteCamel2
Can I disable it from automatically uploading model checkpoints to ClearML servers?
Maybe this one can help :)
https://www.youtube.com/watch?v=etGjxOKG9lo
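In short you can; a minimal sketch with Task.init's auto_connect_frameworks argument (the framework key shown is just an example):
from clearml import Task

# disable automatic model checkpoint capture for pytorch only;
# pass auto_connect_frameworks=False to turn off all framework auto-logging
task = Task.init(
    project_name='examples', task_name='no auto models',
    auto_connect_frameworks={'pytorch': False})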
deleted all of the models from my ClearML project but I still receive this message. Do you know why?
It might take it a few hours to update... 😞
Hi @<1575656665519230976:profile|SkinnyBat30>
Streamlit apps are backend run (i.e. the python code drives the actual web app)
This means running your Task's code and exposing the streamlit web app (i.e. over http).
This is fully supported with ClearML, but unfortunately only in the paid tiers 😞
You can however run your Task with an agent, make sure the agent's machine is accessible and report the full IP+URL as a hyper-parameter or property, and then use that to access your streaml...
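For example, a minimal sketch of reporting that address (the port and parameter name are my assumptions):
import socket
from clearml import Task

task = Task.current_task()
# 8501 is streamlit's default port; adjust to your deployment
app_url = f'http://{socket.gethostname()}:8501'
# store the URL as a parameter so it is visible in the UI
task.set_parameter('General/streamlit_url', app_url)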
FierceFly22 wow, that is a cool hack! Trains will capture any torch.save, so I think the actual driver here is the 'model.summary'. You can also upload any artifact with task.upload_artifact('name', 'modelsummary.txt')
Touching a file will not trigger Trains, as it does not monitor the files themselves. Make sense?
BTW, how will you get the file when running with the agent? If you are using the connect_configuration it will be downloaded from the trains-server for you. Otherwise you can alw...