Hi @<1600299043865497600:profile|MagnificentSeaurchin90>
Any chance you can provide more info on the error?
if I want to compare two experiments the scalar plots do not load ( loading forever ).
I'm assuming the issue is in the Plots tab? Or is it the Scalars? What do you have in the Plots? Can you send an image of the single experiment?
UnsightlyShark53 Awesome, the RC is still not available on pip, but we should have it in a few days.
I'll keep you posted here :)
SlipperyDove40 following up on the missing section name, this seems like a backwards compatibility issue. Try calling with backwards_compatibility=False
my_params = Task.get_parameters(backwards_compatibility=False)
This should always add the section name prefix.
Lol yeah Hydra is great. Notice you still have the ability to override Hydra from the UI, so you really have the best of both worlds
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
Hmm how do you launch the autoscaler, code?
Hi @<1547028074090991616:profile|ShaggySwan64>
If I have a local repo cloned with ssh, the agent will attempt to replace the repo url with https,
Yes, if you provide git user/pass (or user / app-pass) the agent will automatically replace an ssh:// repo link with the equivalent https:// and use the user/pass for authentication
but it seems that it doesn't remove the 2222 port in my case. That leads to
Hmm,,, what's the clearml-agent version? if this is not the latest 2.0.0r...
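To make the expected rewrite concrete, here is a minimal sketch of an ssh:// to https:// conversion that also drops the ssh port. This is only an illustration, not the agent's actual code: the function name ssh_to_https and the example host are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def ssh_to_https(repo_url: str) -> str:
    """Convert an ssh:// git URL to https://, dropping the user and the ssh port.

    Hypothetical helper for illustration only, not the clearml-agent implementation.
    """
    parts = urlsplit(repo_url)
    if parts.scheme != "ssh":
        # Leave anything that is not an ssh:// link untouched
        return repo_url
    # parts.hostname excludes both the "git@" user and the ":2222" port
    return urlunsplit(("https", parts.hostname, parts.path, "", ""))


print(ssh_to_https("ssh://git@git.example.com:2222/team/repo.git"))
```

If the port survives the conversion, the resulting https link points at the ssh port, which would explain a failed clone.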
Hi DrabCockroach54
... and no logs for python script.
what do you mean by "no logs" , is it clearml logs? or k8s pod logs ?
Hi @<1631826770770530304:profile|GracefulHamster67>
if you want your current task:
task = Task.current_task()
if you need the pipeline Task from the pipeline component
pipeline = Task.get_task(Task.current_task().parent)
where are you trying to get the pipelines from? I'm not sure I understand the use case?
Notice you have to configure the shared driver for the docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
Hi @<1636175432829112320:profile|PlainSealion45>
I am trying to automatically generate an online endpoint for inference when manually adding tag released to a model.
So the "automatic" here means that the model endpoint will be updated with the latest model, but not that a new endpoint will be created.
Does that make sense ?
To add a new endpoint on tagging a model, you should combine it with ModelTrigger
and have a function that calls the clearml-serving to cr...
Hi SarcasticSparrow10
The plots in the UI allow you to control the colors of the graphs interactively (click on the color in the legend), and they also allow you to toggle the legend on/off. This is on purpose so you can later adjust according to your taste 🙂
Is the layout okay (it was hard for me to understand from the screen-grab)?
I'll make sure to reply on the GitHub issue as well
So the thing is, regardless of the link you should end with: helper <clearml.storage.helper.StorageHelper object at 0x....>
But the code that failed seemed to return None, which makes me suspect the url itself is somehow broken.
Any chance you have a space before the "s3://" ?
BTW : what's the clearml version you are using ?
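A quick way to rule out the stray-space theory is to clean the url before passing it on. A minimal sketch, assuming a hypothetical helper check_storage_url (not part of clearml) and a made-up bucket path:

```python
from urllib.parse import urlsplit


def check_storage_url(url: str) -> str:
    """Strip surrounding whitespace and make sure the URL still has a scheme.

    Hypothetical helper for illustration only, not a clearml API.
    """
    cleaned = url.strip()
    if not urlsplit(cleaned).scheme:
        raise ValueError(f"not a valid storage URL: {url!r}")
    if cleaned != url:
        # Surface the problem instead of silently fixing it
        print(f"warning: removed stray whitespace from {url!r}")
    return cleaned


print(check_storage_url(" s3://my-bucket/models/model.pkl"))
```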
Hi @<1636175432829112320:profile|PlainSealion45>
- I used this initial model to create the endpoint with model add command.
I think that the initial model needs to be added with model auto-update
Not with model add
Basically, do not call model add - this is static, always using the model ID specified (you can deploy new models by manually calling model add on the same endpoint and specifying a different model ID, but again this is manual)
To Automatically have the m...
Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached means the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly
No, I just want to register a new model in the storage.
If the model file is already uploaded, you can register it without a Task: InputModel.import_model(...)
https://github.com/allegroai/clearml/blob/b3a2b3425c5098ebfc0598c9dfb3e670d4a87706/clearml/model.py#L521
I need to create a separate task for this right?
If you want the model to be uploaded, then yes you have to create a Task.
DilapidatedDucks58 by default, if you continue the execution, it will automatically continue reporting from the last iteration. I think this is what you are seeing
I am not sure this is related to the fact the model is not correctly converted to TorchScript
Because Triton Only supports TorchScript (Not torch models) 🙂
GreasyLeopard35 I think you are on to something, I think UniformParameterRange just misses a min value:
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/parameters.py#L168
Should be: [self.min_value + v*step_size for v in range(0, int(steps))]
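To see the effect of the missing offset, here is a minimal standalone sketch of the value generation with the fix applied. The function name uniform_range_values and the include_max handling are assumptions for illustration, not a copy of the clearml code:

```python
def uniform_range_values(min_value, max_value, step_size, include_max=True):
    """Sample values from min_value up to max_value in steps of step_size.

    Hypothetical standalone version; include_max is an assumed option.
    """
    steps = (max_value - min_value) / step_size
    # The fix: offset every step by min_value instead of starting at 0
    values = [min_value + v * step_size for v in range(0, int(steps))]
    if include_max and (not values or values[-1] < max_value):
        values.append(max_value)
    return values


print(uniform_range_values(10, 50, 10))
```

Without the min_value offset the same call would start sampling at 0 rather than at 10.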
ValueError: Missing key and secret for S3 storage access
Yes that makes sense, I think we should make sure we do not suppress this warning, it is too important.
Bottom line: a configuration section is missing in your clearml.conf
Hmm I suspect the 'set_initial_iteration' does not change/store the state on the Task, so when it is launched, the value is not overwritten. Could you maybe open a GitHub issue on it?
Also what's the additional p doing at the last line of the screenshot?
WickedBee96 the return value of dataset.get_local_copy is the Folder where all your files are located, Not the filename itself 🙂
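Since you get a folder back, the usual next step is to walk it and pick the files you need. A minimal sketch, assuming a hypothetical helper list_dataset_files; the Dataset.get(...) call in the comment uses placeholder project and dataset names:

```python
from pathlib import Path


def list_dataset_files(dataset_folder):
    """Return the sorted relative paths of all files under the dataset folder.

    Hypothetical helper for illustration only.
    """
    root = Path(dataset_folder)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Assumed usage (placeholder names, requires clearml and a configured server):
# from clearml import Dataset
# folder = Dataset.get(dataset_project="my_project", dataset_name="my_dataset").get_local_copy()
# print(list_dataset_files(folder))
```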
Hi ReassuredTiger98
Good point, since the user actually "running" the code is the agent, all the api calls are registered under its name, including the Model creation.
This is a good point, though ...
I know the enterprise tiers add "impersonate" as part of the security layer, meaning that the agent is Not actually running the code but the creating "user" is, which solves this problem. I'm not sure what actually can be done without this feature... thoughts?
Okay this is indeed reported in the UI, but the trains-agent is running the experiment, and seems to be failing to clone the repository in question.
Seems like a "https" error, git is actually failing to clone the repository error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.
Can you manually run the clone command on that machine ? I would guess there is some kind of firewall sitting in the middle of the https connection, and that is causing the git to ...
Yep it is the scale 🙂 and yes it should appear once you upgrade
However, this one should be a feature to work on, and should be fairly easy to implement.
Feel free to add as GitHub issue 🙂
Main challenge is understanding what needs to be added as "uncommitted changes"