Hi SkinnyPanda43
I realized that the params are not being saved anymore
Could you test with clearml==1.0.4 ?
we will try to use Triton, but it's a bit hard with transformer models.
Yes ...
All extra packages we add in serving.
So it should work. You can also run your preprocess class manually from your own machine (for debugging): if you pass it a local file (basically the model file downloaded from the UI), it should work.
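For example, a minimal sketch of exercising it locally (this assumes a clearml-serving style preprocess.py with a Preprocess class exposing load() and preprocess(); the names, signatures, file path and sample body are placeholders, so adjust them to your actual implementation):

    # Hypothetical local debugging of your own serving preprocess code.
    from preprocess import Preprocess  # your own serving module

    p = Preprocess()
    p.load("/path/to/model_file_downloaded_from_ui")   # local copy of the model file
    features = p.preprocess({"text": "hello world"})   # sample request body
    print(features)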
it. But it's maybe not the best solution
Yes... it is not. Separating the pre/post processing to a CPU instance and letting Triton do the GPU serving is a lot more efficient.
Hi ReassuredTiger98
When clearml is running inside the docker, the installed packages shown in the WebUI get updated.
Yes, this is by design, so the agent can always reproduce the exact python environment.
(Internally the original requirements are also stored, but not available in the UI.)
What exactly is the use case here? Wouldn't it make sense to reproduce the entire working environment when you clone the executed Task?
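i.e. something along these lines (a quick sketch; the task ID and queue name are placeholders):

    from clearml import Task

    # Clone the executed Task; an agent running the clone reproduces the
    # recorded python environment.
    cloned = Task.clone(source_task="<executed-task-id>", name="clone for re-run")
    Task.enqueue(cloned, queue_name="default")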
Hi AstonishingSwan80 , what do you mean by "ec2 API"?
After it finishes the 1st Optimization task, what's the next job that will be pulled?
The one in the highest-priority queue (if you have multiple queues).
If you use fairness it will pull in round-robin from all queues (obviously inside every queue it is based on the order of jobs).
FYI, you can reorder the jobs inside the queue from the UI 🙂
DeliciousBluewhale87 wdyt?
Hi ApprehensiveFox95
You mean removing the argparse arguments from code?
Or post-execution, in the UI?
BTW: seems like conda doesn't support git+git:// packages
How about switching to pip? You can still run the entire thing from a conda env; it will just use pip & venv to install everything. Other than that it should work as expected.
I'm assuming you are looking for the AWS autoscaler, spinning EC2 instances up/down and running daemons on them.
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
https://clear.ml/docs/latest/docs/guides/services/aws_autoscaler
GiganticTurtle0 quick update: a fix will be pushed so that casting is based on the actual value passed, not on the type hints 🙂
(this is only in case there is no default value, otherwise the default value type is used for casting)
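To illustrate the rule with a hypothetical step function (just an example of the parameters being inspected, not the actual ClearML API):

    # workers has no default value, so casting follows the actual value passed
    # at runtime (4 stays an int, "auto" stays a str), not any type hint.
    # batch_size has a default (32), so incoming values are cast to int,
    # the type of that default.
    def train_step(workers, batch_size=32):
        print(type(workers), type(batch_size))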
So the clearml server already contains an authentication layer (JWT token), and you do have full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication, and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately this is not part of the open source, but I know they have it in the scale/ent...
Python 3.8? I can quickly check, give me a minute.
Is this caused by running the script with the arguments?
Yep 🙂
- try with the latest RC, 1.8.1rc2
... it feels like after git clone, it spends minutes without outputting anything
Yeah, that is odd. Can you run the agent with --debug (add it before the daemon command), and then add --foreground at the end of the command?
Now launch the same task on that queue; you will have a verbose log in the console.
Let us know what you see
Yes RipeGoose2, you are totally correct 🙂 If you want the models to be auto-uploaded in the offline session, you have to pass output_uri (or default_output_uri).
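Something along these lines (a minimal sketch; the project/task names and the URI are placeholders):

    from clearml import Task

    # Run in offline mode, and pass output_uri so models get uploaded once the
    # offline session is imported (the bucket path is a placeholder).
    Task.set_offline(offline_mode=True)
    task = Task.init(
        project_name="examples",
        task_name="offline run",
        output_uri="s3://my-bucket/models",
    )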
Thanks ScantChimpanzee51 !
Let me see what I can find, should be easy enough to fix now 🙂
Hi LazyLeopard18
I suggest removing the trains.conf and running trains-init. At the end of the wizard it verifies the credentials, so you should be good to go.
I would also recommend using the machine IP and not localhost, as on some setups (Windows / VM etc.) localhost will not be bridged to the VM/Docker but the machine IP will be.
Bottom line: the driver version on the host machine does not support the CUDA version you have in the docker container.
JitteryCoyote63 nice hack 🙂
How come it is not automatically logged as console output?
Hi OutrageousSheep60
AS-IS - without compressing or breaking it up into chunks.
So for that I would suggest manually archiving it and uploading it as an external link?
Or are you saying you want to control the compression used by Dataset class ?
https://github.com/allegroai/clearml/blob/72d9b22e0d27f317a364acfeacbcf5c70f852e8c/clearml/datasets/dataset.py#L603
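A rough sketch of the external-link route (bucket names and paths are placeholders):

    from clearml import Dataset

    # Register the pre-archived file as an external link, so the Dataset does
    # not re-compress or chunk it; alternatively, upload(compression=...) lets
    # you control the zip compression used for regular files.
    ds = Dataset.create(dataset_project="examples", dataset_name="raw-archive")
    ds.add_external_files(source_url="s3://my-bucket/archives/data.tar.gz")
    ds.upload()
    ds.finalize()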
Ohh I see, so the force SSH did not replace the user in the SSH link (it only does if the original was http), right?
What's the exact error you are getting?
(Maybe this is a privilege error on the cache folder; you can see which folders it is using in the configuration as well.)
Hi @<1545216070686609408:profile|EnthusiasticCow4>
is there a way to get the date from the InputModel?
You should be able to with model._get_model_data()
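For example (hedged: _get_model_data() is an internal call, and the exact field name may differ between versions; "created" is my assumption for where the creation date lives):

    from clearml import InputModel

    # Internal call, so treat the returned fields as version-dependent.
    model = InputModel(model_id="<model-id>")
    data = model._get_model_data()
    print(getattr(data, "created", None))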
But I think we should have it all exposed, wdyt?
Hi TenseOstrich47, what's the matplotlib version and clearml version you are using?
Hi SkinnyPanda43
Can you attache the full log?
The clearml agent is installed before your requirements.txt; at least in theory it should not collide.
Hi CloudySwallow27
This error occurs randomly during training (in other words training does successfully start).
What's the clearml-agent version you are using, and the clearml version?
DrabSwan66
Did you set "docker_install_opencv_libs: true" in your clearml.conf on the host machine?
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docs/clearml.conf#L150
Just making sure, you are running clearml-agent in docker mode, correct?
What's the container you are using?