
Check here:
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L78
You can configure credentials based on the bucket name. Should work for Azure as well
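For reference, a rough sketch of the relevant clearml.conf section (bucket/container names and all keys below are placeholders, adjust to your setup):
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"
                    key: "my_access_key"
                    secret: "my_secret_key"
                }
            ]
        }
    }
    azure.storage {
        containers: [
            {
                account_name: "my_account"
                account_key: "my_account_key"
                container_name: "my_container"
            }
        ]
    }
}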
Notice that we are using the same version:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2
The reason was that the previous version did not support TorchScript (similar to the error you reported)
My question is, why don't you use the "allegroai/clearml-serving-triton:latest" container ?
Would you have an example of this in your code blogs to demonstrate this utilisation?
Yes! I definitely think this is important, and hopefully we will see something there 🙂 (or at least in the docs)
GiganticTurtle0 fix was just pushed to GitHub 🙂
pip install git+
(apologies I just got to it now)
First of all, kudos on the video, this is so nice!!!
And thanks to you I think I found it:
None
We have to call serialize before calling execute_remotely
(the reason why sometimes it works is that it syncs in the background, so sometimes it's just fast enough and you get the config object)
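Roughly this ordering (a minimal sketch; config and make_config are stand-ins for whatever object exposes serialize() in your code):
from clearml import Task

task = Task.init(project_name="demo", task_name="remote run")
config = make_config()      # hypothetical helper building the config object
config.serialize()          # serialize explicitly, before going remote
task.execute_remotely(queue_name="default")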
Let me check if we can push an RC with a ...
Hi SubstantialElk6
We will be running some GUI applications so is it possible to forward the GUI to the clearml-session?
If you can directly access the machine running the agent, yes you could. If not, a reverse proxy feature is in the works 😉
We have a rather locked down environment, so I would need a clear view of the network layout and the associated ports.
Basically all connections are outgoing only, with the exception of the clearml-server (listening on ports 8008, 8080, 8081)
os.environ['CLEARML_PROC_MASTER_ID'] = ''
Nice catch! (I'm assuming you also called Task.init somewhere before, otherwise I do not think this was necessary)
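For context, a minimal sketch of the pattern in question (clearing the variable before Task.init() so this process starts its own task; project/task names are placeholders):
import os

# clear the master-process marker before initializing clearml in this process
os.environ['CLEARML_PROC_MASTER_ID'] = ''

from clearml import Task
task = Task.init(project_name="examples", task_name="subprocess task")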
I think I solved it by deleting the project and running the base_task once before the hyperparameter optimization
So is it working now? Is everything there?
Hi SubstantialElk6, I believe you just need to use clearml 1.0.5, and make sure you are passing the correct OS environment to the agent
Hi @<1523701066867150848:profile|JitteryCoyote63>
Updating redis from version 6.2 to 6.2.11 fixed it, but I have new issues now
Was the docker tag incorrect in the docker compose ?
Could it be the model storing? Could it be that the peak is at the end of the epoch?
@<1595587997728772096:profile|MuddyRobin9> are you sure it was able to spin up the EC2 instance? Which clearml autoscaler version are you running?
No worries, condatoolkit is not part of it. "trains-agent" will create a new clean venv for every experiment, and by default it will not inherit the system packages.
So basically I think you are "stuck" with the cuda drivers you have on the system
one of them has been named incorrectly and now I'm trying to remove it and it's not running anywhere,
Oh I see, meaning until it "times out".
You could search for it in the UI (based on the session ID) and abort/archive it
The data I'm syncing comes from a data provider which supports only an FTP connection....
Right ... that makes sense :)
No worries WickedGoat98 , feel free to post questions when they arise. BTW: we are now improving the k8s glue, so by the time you get there the integration will be even easier 🙂
However, SNPE performs quantization with a precompiled CLI binary instead of a python library (which also needs to be installed). What would be the pipeline in this case?
I would imagine a container with preinstalled SNPE compiler / quantizer, and a python script triggering the process ?
One more question: in the case of triggering the quantization process, will it be considered a separate task?
I think this makes sense, since you probably want a container with the SNPE environment, m...
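Something along these lines, for example (a sketch only; the snpe-dlc-quantize binary name and its flags are illustrative assumptions, swap in the actual CLI you use):
import subprocess
from clearml import Task

task = Task.init(project_name="snpe", task_name="quantize model")

# call the precompiled CLI quantizer available inside the container
result = subprocess.run(
    ["snpe-dlc-quantize", "--input_dlc", "model.dlc", "--output_dlc", "model_quant.dlc"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # printed output ends up in the ClearML console log

# register the quantized model file as an artifact of the task
task.upload_artifact(name="quantized_model", artifact_object="model_quant.dlc")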
And does the Executor actually run something, or is it IO?
Out of curiosity, what ended up being the issue?
ReassuredTiger98 you mean when calling clearml-init? Or the default value?
the storage configuration appears to have changed quite a bit.
Yes, I think this is part of the cloud-ready effort.
I think you can find the definitions here:
https://artifacthub.io/packages/helm/allegroai/clearml
Wait, @<1686547375457308672:profile|VastLobster56> per your config the fileserver host is clearml-fileserver.
Who sets this domain name? Could it be that it only resolves on your host machine? You can quickly test by running any docker container on your machine and running ping clearml-fileserver from inside the container.
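For example (busybox here is just an arbitrary small image):
docker run --rm busybox ping -c 3 clearml-fileserver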
Also, your log showed "could not download None ..." while I would expect it to be None ..., no?
OddAlligator72 FYI, in your current code you can always do:
if use_trains:
    from trains import Task
    Task.init()
Might be easier 😉
2021-07-11 19:17:32,822 - clearml.Task - INFO - Waiting to finish uploads
I'm assuming a very large uncommitted changes 🙂
Sorry @<1689446563463565312:profile|SmallTurkey79>, I just noticed your reply
Hmm, so I know the enterprise version has built-in support for SLURM, which would remove the need to deploy agents on the SLURM cluster.
What you can do is: on the SLURM login server (i.e. a machine that can run sbatch), write a simple script that pulls the Task ID from the queue and calls sbatch with clearml-agent execute --id <task_id_here>. Would this be a good solution?
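A rough sketch of such a glue script (assuming the APIClient exposes queues.get_all / queues.get_next_task the way the REST API does; the queue name and polling interval are placeholders):
import subprocess
import time

from clearml.backend_api.session.client import APIClient

client = APIClient()
# "slurm" is a hypothetical queue name, use the queue your tasks are pushed to
queue_id = client.queues.get_all(name="slurm")[0].id

while True:
    response = client.queues.get_next_task(queue=queue_id)
    entry = getattr(response, "entry", None)
    if not entry:
        time.sleep(10)   # nothing queued, poll again later
        continue
    # hand the Task over to SLURM; sbatch just wraps clearml-agent execute
    subprocess.run(
        ["sbatch", "--wrap", "clearml-agent execute --id {}".format(entry.task)],
        check=True,
    )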