oh that makes sense.
I would add to your Task's docker startup script the following:
ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa
Let's see what you get
Maybe different API version...
What's the trains-server version?
Can I run it on an agent that doesn't have a GPU?
Sure, this is fully supported
When I run clearml-serving it throws me an error: "please provide specific config.pbtxt definition"
Yes, this is a small file that tells the Triton server how to load the model:
Here is an example:
https://github.com/triton-inference-server/server/blob/main/docs/examples/model_repository/inception_graphdef/config.pbtxt
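Roughly, a minimal config.pbtxt looks like this (the names/dims below are illustrative placeholders, not from your model):
platform: "tensorflow_graphdef"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 299, 299, 3 ]
  }
]
output [
  {
    name: "probabilities"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]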
Nice SoreHorse95!
BTW: you can edit the entire OmegaConf YAML externally with the set/get configuration object (name = OmegaConf), just notice you will need to change Hydra/allow_omegaconf_edit to true
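Something along these lines (a sketch; the task id and edits are placeholders):
from clearml import Task

# grab the (draft/cloned) task whose OmegaConf you want to edit -- id is a placeholder
task = Task.get_task(task_id="<task_id>")
# fetch the full OmegaConf YAML as text
omegaconf_yaml = task.get_configuration_object(name="OmegaConf")
# ... edit the YAML text externally ...
# push it back; remember Hydra/allow_omegaconf_edit must be set to true for it to apply
task.set_configuration_object(name="OmegaConf", config_text=omegaconf_yaml)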
task.set_script(working_dir=dir, entry_point="my_script.py")
Why do you have this part? Isn't it the same code? The script entry point is auto-detected.
... or when I run my_script.py locally (in order to create and enqueue the task)?
The latter, when the script is running locally.
So something like
os.path.join(os.path.dirname(__file__), "requirements.txt")
is the right way?
Sure this will work 🙂
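For example (a sketch, assuming the path is being handed to Task.add_requirements before Task.init; project/task names are illustrative):
import os
from clearml import Task

# point ClearML at the requirements.txt sitting next to this script
Task.add_requirements(os.path.join(os.path.dirname(__file__), "requirements.txt"))
task = Task.init(project_name="examples", task_name="my_script")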
Sorry ScaryLeopard77 I missed the reply,
the tutorial in the readme of clearml-serving repo doesn't mention it though. Where should I set it?
oh dear ... you are right (I think it was there in previous versions)
clearml-serving --help
https://github.com/allegroai/clearml-serving/blob/ce6ec847b1e01c6f5bf35d638e6ceb8148db8a7a/clearml_serving/main.py#L142
This is the equivalent of what is created here in the example:
https://github.com/allegroai/clearml-serving/blob/ce6ec847b...
oh dear ...
ScrawnyLion96 let me check with front-end guys 😞
or creating a dedicated function
I would suggest also including the actual sampled point in the HP space.
Could you expand?
This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of metrics with respect to hyper-parameters.
Does this relate to:
https://github.com/allegroai/clearml/issues/430
manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of t...
How is this different from argparse btw?
Not different, just a dedicated section 🙂 Maybe we should do that automatically; the only "downside" is you will have to name the Dataset when getting it (so it will have an entry name in the Dataset section). Wdyt?
What is the recommended way of providing S3 credentials to cleanup task?
clearml.conf or OS environment variables (AWS_ACCESS_KEY_ID ...)
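For reference, the relevant clearml.conf section looks roughly like this (values are placeholders):
sdk {
    aws {
        s3 {
            key: "<AWS_ACCESS_KEY_ID>"
            secret: "<AWS_SECRET_ACCESS_KEY>"
            region: "<region>"
        }
    }
}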
... I thought there would be some hooks for deploying, where the integration with k8s was also taken care of automatically.
Hi ObedientToad56
Yes, you are correct. Basically now you have a docker-compose spinning everything (even though, for example, you can also spin a standalone container, mostly for debugging).
We are working on a k8s helm chart so the deployment is easier; it will be based on this docker-compose:
https://github.com/allegroai/clearml-serving/blob/main/docker/docker-comp...
Hmm okay let me check that, I think I understand the issue
I'll make sure they get back to you
and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
Oh ...
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
So the thing is, if a User spins the k8s job, the user needs to pass their credentials (so the system knows who it is)... You could just pass the user's key/secret (not nice, but probably not a big issue, as everyone is an Admin anyhow,...
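If you do go the per-user secret route, a pod-spec fragment could look roughly like this (secret/key names here are hypothetical):
env:
  - name: CLEARML_API_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: clearml-credentials-<user>
        key: access-key
  - name: CLEARML_API_SECRET_KEY
    valueFrom:
      secretKeyRef:
        name: clearml-credentials-<user>
        key: secret-key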
The pipeline itself is also a task, so this line works in a pipeline. Task.current_task is a class method that returns the running task (the pipeline in our case), and from there it's the usual interface. BTW what do you have in the conf file?
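i.e. something like (a sketch; the configuration name "my_conf" is a placeholder):
from clearml import Task

# inside the running pipeline: class method returns the current (pipeline) task
task = Task.current_task()
# then the usual interface, e.g. reading an attached configuration object
conf_text = task.get_configuration_object(name="my_conf")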
Hi RotundSquirrel78
How did you end up with this command line?
/home/sigalr/.clearml/venvs-builds/3.8/code/unet_sindiff_1_level_2_resblk --dataset humanml --device 0 --arch unet --channel_mult 1 --num_res_blocks 2 --use_scale_shift_norm --use_checkpoint --num_steps 300000
the arguments passed are odd (there should be none, they are passed inside the execution) and I suspect this is the issue
Only as "default docker + argument". If you need the "extra_docker_arguments" (for which I think a mount point is a good example), then you have to add it in the conf file
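i.e. in clearml.conf, something like (the mount path is just an example):
agent {
    # extra arguments passed straight to "docker run", e.g. a host mount
    extra_docker_arguments: ["-v", "/mnt/host/data:/data"]
}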
This will fix it, the issue is the "no default value" that breaks the casting:
@PipelineDecorator.component(cache=False)
def step_one(my_arg=""):
Hi ElegantCoyote26 , yes I did 🙂
It seems cometml registers their default callback logger for you, that's it.
Hi @<1557899668485050368:profile|FantasticSquid9>
There is some backwards compatibility issue with 1.2 (I think).
Basically what you need is to spin a new one on a new session ID and re-register the endpoints
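i.e. roughly (a sketch based on the clearml-serving CLI; ids and flags here are placeholders, check clearml-serving --help):
# create a new serving session (prints the new session ID)
clearml-serving create --name "my serving service"
# re-register an endpoint against the new session
clearml-serving --id <session_id> model add --engine triton --endpoint "my_model" --model-id <model_id>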
while I want to upload converted .onnx weights with custom tags to my custom project
Oh I see, sure, see this one?
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py
Or:
output_model.update_weights(weights_filename="/path/to/file.onnx")
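And for the custom tags / custom project part, a sketch (project, model name, and tags are placeholders):
from clearml import Task, OutputModel

task = Task.init(project_name="my_custom_project", task_name="onnx_upload")
# attach a new output model with custom tags, then upload the converted weights
output_model = OutputModel(task=task, name="my_model", framework="ONNX", tags=["converted", "onnx"])
output_model.update_weights(weights_filename="/path/to/file.onnx")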
Hi AttractiveCockroach17
Many of these experiments appear with status running on ClearML even though they have finished running,
Could it be their process just terminated (i.e. not properly shut down)?
How are you running these multiple experiments?
BTW: if the server does not see any change in a Task for a while (I think the default is 2 hours), it will automatically mark the Task as aborted
Hi JuicyFox94 ,
Actually we just added that 🙂 (still on GitHub , RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
Hi StoutElephant16
You mean like cron Job?
(Unfortunately if this is the case, then currently no CLI for that, but it is a great idea, maybe open a github issue to make sure we do not forget to add it 😄 )
I believe a process is still running in the background. Is it expected? (v0.17.4)
Yes it is expected.
Basically it reports that the resource monitoring did not detect any "iterations"/"steps" reporting, so instead of reporting resources based on iterations it reports based on time. Make sense?
why not let the user start with an empty comparison page and add them from the "Add Experiment" button as well?
Apologies, I was not clear. Yes I'm with you, this is a great idea 🙂