I am providing a helper to run a task in a queue after running it locally in the notebook
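Something along these lines is what I had in mind (a minimal sketch; project/queue names are illustrative, and it assumes the standard `Task.execute_remotely` behavior):
```python
from clearml import Task

# run the task body locally in the notebook first
task = Task.init(project_name="notebook-experiments", task_name="demo")
# ... notebook cells execute as usual ...

# then clone the task and push the clone into an execution queue;
# exit_process=False keeps the local notebook kernel alive
task.execute_remotely(queue_name="default", clone=True, exit_process=False)
```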
Is this part of a pipeline process or just part of the workflow?
(reason for asking is that if this is a pipeline thing we might be able to support it in v2)
The dynamic GPU option is only available with the Enterprise version, right?
Correct 🙂
My clearml-server crashed for some reason
😞 No worries
"regular" worker will run one job at a time, services worker will spin multiple tasks at the same time But their setup (i.e. before running the actual task) is one at a time..
Hi @<1567321739677929472:profile|StoutGorilla30>
Is it necessary to serve a Keras model using the Triton engine?
It is not, but it is the most efficient way to serve Keras models, which is why clearml-serving uses Nvidia Triton by default (we are talking 10x factors)
I would start with the Keras example, see that it works, and then work your way into your example (notice you always need to provide the in/out layers of the model)
https://github.com/allegroai/clearml-s...
I think perhaps it came across as way more passive-aggressive than I was intending.
Dude, you are awesome for saying that! no worries 🙂 we try to assume people have the best intention at heart (the other option is quite depressing 😉 )
I've been working on an Azure load balancer example, ...
This sounds exciting, let me know if we can help in any way
For example, could you test if this one works:
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
Hi LovelyHamster1
Could you think of a toy code that reproduces this issue?
Even before we had a chance to properly notify everyone 🙂
Thank you! All the details will follow in a dedicated post; for the time being, I can say that pushing a model with pre/post-processing Python code and a fully scalable inference solution has never been easier
https://github.com/allegroai/clearml-serving/tree/main/examples/sklearn
but when I run the same task again it does not map the keys..
SparklingElephant70 what do you mean by "map the keys" ?
It analyses the script code itself, going over all imports and adding only the directly imported packages
Actually this should be a flag
Hmm, interesting, why would you want that? Is this because some of the packages will fail?
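In the meantime, you can already override the analysis from code (a minimal sketch, assuming the documented SDK calls; note these must be called before `Task.init`):
```python
from clearml import Task

# skip the import analysis entirely and use an explicit requirements file
# (or pass no file to freeze the full local environment instead)
Task.force_requirements_env_freeze(requirements_file="requirements.txt")

# or keep the analysis but pin / add a specific package on top of it:
# Task.add_requirements("tensorflow", "2.11.0")

task = Task.init(project_name="debug", task_name="custom requirements")
```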
Hmm I think everything is generated inside the C++ library code, and Python is just an external interface. That means there is no way to collect the metrics as they are created (i.e. inside the C++ code), which means the only way to collect them is to actively analyze/read the tfrecord created by catboost 😞
Is there a python code that does that (reads the tfrecords it creates) ?
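If there is, something like this could re-report them into clearml (a rough sketch; it assumes catboost's tfevents output is a standard TensorBoard event record, and the file path is illustrative):
```python
import tensorflow as tf
from clearml import Task

# hypothetical location of the tfevents file catboost writes
events_file = "catboost_info/train/events.out.tfevents"

task = Task.init(project_name="debug", task_name="catboost scalars")
logger = task.get_logger()

# iterate the raw event records and re-report every scalar to clearml
for event in tf.compat.v1.train.summary_iterator(events_file):
    for value in event.summary.value:
        if value.HasField("simple_value"):
            logger.report_scalar(
                title=value.tag,
                series="train",
                value=value.simple_value,
                iteration=event.step,
            )
```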
Hi @<1533982060639686656:profile|AdorableSeaurchin58>
Notice the scalars and console are stored in the Elasticsearch DB, usually under /opt/clearml/data/elastic_7
For visibility: after close inspection of the API calls it turns out there was no work against the SaaS server, hence no data
Oh I see, what you need is to pass '--script script.py' as the entry point and '--cwd folder' as the working dir
I see TrickyFox41, try the following: `--args overrides="param=value"`
Notice this will change the Args/overrides argument that will be parsed by hydra to override its params
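If you'd rather do it from code, something like this should be equivalent (a sketch only; the exact value format depends on how hydra serializes the overrides, and the task id is a placeholder):
```python
from clearml import Task

task = Task.get_task(task_id="<TASK_ID>")
# inspect the current value first to see the expected format
print(task.get_parameter("Args/overrides"))
# then set the override that hydra will pick up on the next run
task.set_parameter("Args/overrides", "['param=value']")
```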
Okay here is a standalone code that should be close enough? (if I missed anything let me know)
```python
import tempfile
from datetime import datetime
from pathlib import Path

import tensorflow as tf
import tensorflow_datasets as tfds

from clearml import Task

task = Task.init(project_name="debug", task_name="test")

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)


def normalize_img(image, label):
    # normalize images: uint8 -> float32 in [0, 1]
    return tf.cast(image, tf.float32) / 255., label

# ...
```
```python
callbacks.append(
    tensorflow.keras.callbacks.TensorBoard(
        log_dir=str(log_dir),
        update_freq=tensorboard_config.get("update_freq", "epoch"),
    )
)
```
Might be! What's the actual value you are passing there?
Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
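Roughly along these lines (a sketch only; URLs, credentials and file names are placeholders, and it assumes the documented `StorageManager` / `OutputModel` behavior):
```python
from clearml import Task, OutputModel, StorageManager

# 1) with the SDK still configured against the local server,
#    fetch a local copy of the stored model weights
local_weights = StorageManager.get_local_copy(
    remote_url="http://files.local-server:8081/path/to/model.pkl"  # placeholder
)

# 2) switch the SDK credentials to the hosted server (placeholders)
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="<KEY>",
    secret="<SECRET>",
)

# 3) register the weights on the hosted server under a new task
task = Task.init(project_name="migrated", task_name="re-upload model")
OutputModel(task=task).update_weights(weights_filename=local_weights)
```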
Yes please, just to verify my hunch.
I think that somehow the docker mounts the agent is creating are (for some reason) messing it up.
Basically you can just run the following and it will do everything automatically (replace <TASK_ID_HERE> with the actual task id):
```bash
docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig ...
```
WickedGoat98 Forever 🙂
The limitation is on the storage size
And is there an easy way to get all the metrics associated with a project?
Metrics are per Task, but you can get the min/max/last of all the tasks in a project. Is that it?
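For example (a minimal sketch; the project name is a placeholder):
```python
from clearml import Task

# collect the min/max/last scalar values of every task in a project
for task in Task.get_tasks(project_name="my-project"):
    # returns a nested dict: {title: {series: {"last": ..., "min": ..., "max": ...}}}
    print(task.name, task.get_last_scalar_metrics())
```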
Hi @<1541954607595393024:profile|BattyCrocodile47>
I *do* have the SSH key placed at /root/.ssh/id_rsa on the machine,
Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container,
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'
This is odd, why is it mounting it to /.ssh and not /root/.ssh ?
LazyTurkey38 I think this is caused by new versions of pip reporting the wrong link:
https://github.com/bwoodsend/pip/commit/f533671b0ca9689855b7bdda67f44108387fe2a9