MoodyCentipede68, from your log:
clearml-serving-triton | E0620 03:08:27.822945 41 model_repository_manager.cc:1234] failed to load 'test_model_lstm2' version 1: Invalid argument: unexpected inference output 'dense', allowed outputs are: time_distributed
This seems to be the main issue causing Triton to fail to load.
Does that make sense to you? How did you configure the endpoint model?
Well, from the error it seems there is no output layer called "dense", hence Triton failing to find the layer that returns the result. Does that make sense?
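For reference, the output name is set when the model is registered on the serving endpoint. A hedged sketch with the clearml-serving CLI, where the service ID, input name, shapes, and types are placeholders rather than values from this thread:
```
clearml-serving --id <service_id> model add \
  --engine triton \
  --endpoint "test_model_lstm2" \
  --input-name "lstm_input" --input-type float32 --input-size 1 128 \
  --output-name "time_distributed" --output-type float32 --output-size -1 10
```
The point being that --output-name has to match the model's actual output layer ("time_distributed" per the error above), not "dense".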
Just to check: does the k8s glue install torch by default?
SubstantialElk6 what do you mean, the glue installs torch?
The glue will take a Task from the queue and create a k8s job (basically use the same docker, and inside the docker have the agent execute the requested Task). Where would "torch" come into play?
DeliciousBluewhale87 great, we have progress. This looks like it is inheriting from the system packages:
For example, you can see in the log: Requirement already satisfied: future>=0.16.0 in /usr/local/lib/python3.6/dist-packages
Now the question is which docker it is running, because as you can see at the bottom of the log, tensorflow is not listed as installed, but other packages installed inside the docker are listed.
wdyt?
DeliciousBluewhale87 this is exactly how it works:
The glue creates a k8s job with the requested docker image (the one on the Task); the k8s job starts the agent inside the requested docker, and then the agent inside the docker installs all the required packages.
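If it helps, this is roughly how the glue itself is launched; a minimal sketch loosely based on clearml-agent's examples/k8s_glue_example.py (class and argument names may vary between versions):
```python
# Minimal k8s glue launcher sketch (queue/namespace names are illustrative)
from clearml_agent.glue.k8s import K8sIntegration

k8s = K8sIntegration(namespace="clearml")
# Poll the "default" queue; for every pulled Task, create a k8s job that runs
# the agent inside the Task's requested docker image and executes the Task
k8s.k8s_daemon("default")
```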
When I look at the model artifact details in the ClearML UI, it's been saved the usual way, and none of the tags I added in the OutputModel constructor are there.
Did you disable the autologging? Are you saying the tags not appearing is a bug (it might be)?
Also, I don't mind the auto logging either, as long as I have control over publishing the model or not directly from that script, and adding tags etc., like with OutputModel.
Sure, you can publish models / add tags etc., either from the UI or pr...
VexedCat68 are you manually creating the OutputModel object?
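If so, a minimal sketch of that flow with tags and publishing (project, file, and tag names here are illustrative):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model registration")

# Register a model manually, attaching tags in the constructor
output_model = OutputModel(task=task, name="my_model", tags=["lstm", "production"])
output_model.update_weights(weights_filename="model.pt")  # log/upload the weights file
output_model.publish()  # mark the model as published
```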
Once a model is saved and published, it should be downloadable, right?
Well, that depends on whether you configured ClearML to auto-upload it (by default it will just log the "local location").
To auto-upload, add output_uri=True to Task.init (or specify a destination with output_uri="s3://bucket/").
You can also configure it as default here:
https://github.com/allegroai/clearml/blob/65f1c0baa124efb05fb7894a5386f0dd52c0536b/docs/clearml.conf#L163
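For example, a minimal sketch (project and bucket names are illustrative):
```python
from clearml import Task

# Auto-upload models/checkpoints instead of only logging their local path
task = Task.init(
    project_name="examples",
    task_name="auto upload example",
    output_uri=True,  # default files server; or e.g. "s3://my-bucket/models"
)
```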
Is it a shared network mount? Could you just delete the entire ~/.clearml on the host machine?
Like this... But when I am cloning the pipeline and changing the parameters, it runs with the default parameters given when the pipeline was first run.
Just making sure, you are running the cloned pipeline with an agent, correct?
What is the clearml version you are using?
Is this reproducible with the pipeline example?
If you have the checkpoint (see output_uri for automatically uploading it) then you can always load it. Do you mean whether you can change the iteration/step counter? Or do you mean with trains-agent?
Or you can do:
param = {'key': 123}
task.connect(param)
Hi PompousBeetle71, Trains will log all the torch.save calls; I'm assuming they do not actually use it for the rest of the files in that folder.
If you'd like to share a code snippet, we could see if we could auto-magically log it. You could also use artifacts and store the entire folder: it will zip it and upload it, and then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
task.upload_artifact('transformer', './my_...
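A fuller, runnable sketch of the same call (folder and names are illustrative; the API is the same in today's clearml package as in trains):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact example")

# Uploading a folder as an artifact zips it and uploads the archive;
# it can later be retrieved from other experiments via task.artifacts
task.upload_artifact(name="transformer", artifact_object="./my_transformer/")
```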
Hi NutritiousBear41, asking questions here is exactly the reason we opened the Slack channel :)
Regarding the error, it might be that you stumbled on a bug. Do you get the git repo on the UI?
Hmm, yes this fits the message, which basically says that it gave up on analyzing the code because it ran out of time. Is the execution very short? Or the repo very large?
Hi RipeGoose2 all PR's are welcome, feel free to submit :)
Hi LivelyLion31
Yes, the reason we designed Trains with an automagic integration is exactly that: so users do not need to learn another package, and with almost no effort you get most of the benefits.
Regarding the TB files, from experience most users will use the TB files shortly after they executed the experiment, usually for debugging and in-depth capabilities (like network debugging, profiling, etc.); the metric view is something that is much easier to do on a centralized server (like on...
Also, finally the columns will be movable and resizable. I can't wait for the next version ;)
Hi FriendlyKoala70, you can edit the installed packages section and add the missing package. See more details on how trains-agent works here (although it's about conda, the same rules apply for pip): https://github.com/allegroai/trains-agent/issues/8
Hi UnsightlyShark53 I think you are absolutely right, there is no reason for the trains.errors.UsageError: ArgumentParser.parse_args() ... error.
As you mentioned, if auto_connect_arg_parser=False, it should just ignore what it picked up automatically.
I will make sure the error is resolved, and I will also make sure you will still be able to connect the argparse manually with task.connect(parser) after the Task has been created. Thanks for the reference! I took a look o...
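Something like this minimal sketch (project and argument names are illustrative; the package is imported as clearml today, trains at the time):
```python
from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)

# Disable the automatic argparse detection...
task = Task.init(project_name="examples", task_name="manual argparse",
                 auto_connect_arg_parser=False)
# ...then connect the parser manually after the Task has been created
task.connect(parser)
args = parser.parse_args()
```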
Hi UnsightlyShark53, just a quick FYI, you can also log the entire config file config.json. This will be stored as the model configuration, and you can see it in the input/output models under the artifacts tab.
See the example here; you can pass either the path to the configuration file, or the dictionary itself after you have loaded the json, whatever is more convenient :)
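A minimal sketch of both options, assuming the connect_configuration API (file and project names are illustrative):
```python
import json
from clearml import Task

task = Task.init(project_name="examples", task_name="config example")

# Either pass the path to the configuration file directly...
config = task.connect_configuration("config.json")

# ...or load the json first and connect the dictionary itself:
# with open("config.json") as f:
#     config = task.connect_configuration(json.load(f))
```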
Hi SteadyFox10, the way it works is that Trains limits the debug image history by reusing the same file names, so the UI will only present the iterations for which the debug images are relevant. With your sample code it looks like it exposes a bug: the generated link should contain the iteration number, but it does not, and so it overwrites the debug images every iteration. Here is the image link: https://demofiles.trains.allegro.ai/Test/test_images.6ed32a2b5a094f2da47e6967bba1ebd0/metrics/Test/te...
WickedElephant66 is this issue the same as this one?
https://clearml.slack.com/archives/CTK20V944/p1656537337804619?thread_ts=1656446563.854059&cid=CTK20V944
Hmmm, I'm not sure that you can disable it, but I think you are correct that it should be possible. We will add it as another argument to Task.init. That said, FriendlyKoala70 what's the use case for disabling the code detection? You don't have to use it later, but it is always nice to know :)
I see... We could definitely add an argument to control it. I'll update here once there is an RC
You can always access the entire experiment data from python:
Task.get_task(task_id).data
It should all be there.
What's the exact use case you had in mind?
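For instance, a quick sketch (the task ID below is illustrative):
```python
from clearml import Task

# Fetch any experiment by its ID and inspect its full data structure
task = Task.get_task(task_id="d1bd92a3b039400cbafc60a7a5b1e52b")
print(task.data)              # the full task object: execution, script, params, ...
print(task.get_parameters())  # flattened hyperparameters dict
```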
So the way it will work is that you will also need to have a Task.init in the main process (the one using the launch function) and the same Task.init in main_func. What it does is signal the sub-processes to use the main process's task; this way they all report to the same task. Obviously, to test it you will need to wait for the RC (after the weekend :)
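To illustrate the pattern (a sketch only, using torch.multiprocessing.spawn as a stand-in for the launch function; behavior is as described for the upcoming RC, so details may differ):
```python
import torch.multiprocessing as mp
from clearml import Task


def main_func(rank):
    # The same Task.init inside the subprocess: per the RC described above,
    # it reports into the main process's task instead of creating a new one
    Task.init(project_name="examples", task_name="distributed example")
    # ... per-worker training code ...


if __name__ == "__main__":
    # Task.init in the main process (the one calling the launch/spawn function)
    Task.init(project_name="examples", task_name="distributed example")
    mp.spawn(main_func, nprocs=2)
```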
Thanks VexedKangaroo32 , this is great news :)