ShallowGoldfish8 how did you get this error?
self.Node(**eager_node_def)
TypeError: __init__() got an unexpected keyword argument 'job_id'
Thanks CooperativeFox72! I'll test and keep you posted.
Also, don't be shy, we love questions.
YummyWhale40 you mean like continue training?
https://github.com/allegroai/trains/issues/160
What would be the best way to get all the models trained using a certain Task, I know we can use query_models to filter models based on Project and Task, but is it the best way?
On the Task object itself you have all the models:
Task.get_task(task_id='aabb').models['output']
It should all be logged at the end, as I understand it
Hmm let me check the code for a minute
Although it's still really weird how it was failing silently
Totally agree. I think the main issue was that the agent had the correct configuration, but the container / env the agent was spinning up was missing it.
I'll double check how come it did not print anything
The bug was fixed
Can you test with the credentials also in the global section?
key: "************"
secret: "********************"
Also what's the clearml python package version
Hi RipeGoose2
You could also use report_table for them, what do you think?
https://github.com/allegroai/clearml/blob/master/examples/reporting/pandas_reporting.py
https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/clearml/logger.py#L277
Hi GiddyTurkey39
Glad to see that you are already diving into the controllers (the stable release will be out early next week)
A bit of background on how the pipeline controllers are designed:
All steps in the pipeline are experiments already registered in the system (i.e. you can see them in the UI). Regardless of how you created those experiments, they have to be there prior to the pipeline launch. The pipeline itself can be executed on any machine (it does very little, and...
In my understanding requests still go through clearml-server, which configuration I left
DefiantHippopotamus88 actually this is not correct.
clearml-server only acts as a control plane: no actual requests are routed to it. It is used to sync model state, stats, etc., and is not part of the request processing flow itself.
curl: (56) Recv failure: Connection reset by peer
This actually indicates 9090 port is not being listened to...
What's the final docker-compose you are usi...
Task.add_requirements does not handle it (traceback in the thread). Any suggestions?
Hmm that is a good point, maybe we should fix that
I'm assuming someone already created this module? Or is it part of the repository?
(if it is, then I assume this is executed from the git root)
I just tested the master with https://github.com/jkhenning/ignite/blob/fix_trains_checkpoint_n_saved/examples/contrib/mnist/mnist_with_trains_logger.py on the latest ignite master and Trains, it passed, but so did the previous commit...
basically the default_output_uri will cause all models to be uploaded to this server (with specific subfolder per project/task)
You can have the same value there as the files_server.
The files_server is where you have all your artifacts / debug samples
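For example, in clearml.conf both settings can point at the same server (the URL below is a placeholder, not from the thread):

```
# clearml.conf -- illustrative values, adjust to your deployment
api {
    files_server: "http://localhost:8081"
}
sdk {
    development {
        # all output models / artifacts get uploaded here,
        # under a per-project/per-task subfolder
        default_output_uri: "http://localhost:8081"
    }
}
```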
RipeGoose2 yes, the UI cannot embed the html yet, but if you go click on the link itself it will open the html in a new tab.
Could you verify it works ?
So actually while we're at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model? or the pre/post processing code?
Thank you so much!!
how would I get an agent to launch in the same instance of my clearml server
Actually that is my point, you do not have to spin the agent on the clearml-server instance. We added the services agent as part of the docker-compose for easier deployment, that said you can always manually SSH to the server, or run on any other machine, like you would spin any other clearml-agent
.
Does that make sense ?
- This then looks for a module called foo, even though it's just a namespace.
I think this is the issue, are you using Python namespace packages?
(this is a PEP feature that is really rarely used, and I have seen it break too many times)
Assuming you have: from foo.mod import
what are you seeing in pip freeze ? I'd like to see if we can fix this, and better support namespaces
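To illustrate the point about namespace packages, here is a small stdlib-only sketch (not from the thread; the foo/mod names are just examples) showing that a PEP 420 implicit namespace package is importable yet has no module file of its own, which is exactly what trips up file-based dependency detection:

```python
# Demonstrate a PEP 420 implicit namespace package: "foo" has NO
# __init__.py, so it imports fine but has no __file__ of its own.
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "foo"))  # note: no __init__.py created
with open(os.path.join(tmp, "foo", "mod.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, tmp)
mod = importlib.import_module("foo.mod")
foo = importlib.import_module("foo")

print(mod.VALUE)                       # the submodule works normally
print(getattr(foo, "__file__", None))  # None: a namespace, not a module file
```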
Hi @<1671689437261598720:profile|FranticWhale40>
You mean the download just fails on the remote serving node because it takes too long to download the model?
(basically not a serving issue per-se but a download issue)
Should pass only_published:
https://github.com/allegroai/clearml/blob/071caf53026330f3bb8019ee5db3d039562072f3/clearml/model.py#L444
Actually unless you specifically detached the matplotlib automagic, any plt.show() will be automatically reported.
Okay so my thinking is, on the pipeline controller / decorator we will have:
abort_all_running_steps_on_failure=False
(if True, on a step failing it will abort all running steps and leave)
Then per step / component decorator we will have:
continue_pipeline_on_failure=False
(if True, on a step failing, the rest of the pipeline DAG will continue)
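The intended semantics of these two proposed flags could be sketched as a toy simulation (this is not ClearML code; only the flag names mirror the proposal, the rest is illustrative):

```python
# Toy DAG runner showing how the two proposed failure-handling flags interact.
def run_pipeline(steps, abort_all_running_steps_on_failure=False):
    """steps: ordered list of (name, callable, continue_pipeline_on_failure)."""
    executed = []
    for name, func, continue_pipeline_on_failure in steps:
        try:
            func()
            executed.append(name)
        except Exception:
            executed.append(name + " (failed)")
            if abort_all_running_steps_on_failure:
                break  # abort everything still pending and leave
            if not continue_pipeline_on_failure:
                break  # this step is fatal for the rest of the DAG
    return executed

steps = [
    ("prepare", lambda: None, False),
    ("train", lambda: 1 / 0, True),  # fails, but marked non-fatal
    ("report", lambda: None, False),
]
print(run_pipeline(steps))  # ['prepare', 'train (failed)', 'report']
```

With abort_all_running_steps_on_failure=True the same failure would stop after the "train" step instead of continuing to "report".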
GiganticTurtle0 wdyt?
BTW: which clearml version are you using ?
(I remember there was a change in the last one, or the one before, making the config loading deferred until accessed)
I think there is a bug on the UI that causes series with "." to only use the first part of the series name for the color selection. This means "epsilon 0" and "epsilon 0.1" will always get the same color, and this will explain why it works on other graphs
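The suspected bug could be illustrated like this (a purely hypothetical sketch of the described behavior, not the actual UI code):

```python
# Hypothetical illustration of the suspected UI bug: if the color is
# derived only from the part of the series name before the first ".",
# then "epsilon 0" and "epsilon 0.1" land in the same color bucket.
def buggy_color_key(series_name: str) -> str:
    return series_name.split(".")[0]

print(buggy_color_key("epsilon 0"))    # 'epsilon 0'
print(buggy_color_key("epsilon 0.1"))  # 'epsilon 0'  -> same color
```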
ElegantKangaroo44 my bad, I missed the nuance in the description
There seems to be an issue in the web UI -> viewing plots in "view in experiment table" doesn't respect the "scalars to display" one sets when viewing in "view in fullscreen".
Yes, the info-panel does not respect the full view selection. It's on the to-do list to add this ability, but it is still not implemented...