Hi guys! For some reason I can't get Trains to log images to…

Hi guys!

For some reason I can't get Trains to log images to the Debug Samples tab using TensorBoard's SummaryWriter.add_image. I also tried explicit reporting, PyTorch tensors, NumPy arrays, and Pillow images, all to no avail. What could be the problem?

Posted 3 years ago

Answers 29


Hi ProudMosquito87, when you say you can't get Trains to log images, what exactly do you see? Do you get an error?

No, it's as if nothing happens at all.

Yes, there are a lot of DataCloneErrors:
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
sendRemoveListener on closed conduit simple-translate@sienori.2199023255865 ConduitsChild.jsm:108
InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable PictureInPictureChild.jsm:143
sendRemoveListener on closed conduit simple-translate@sienori.2199023255875 ConduitsChild.jsm:108
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
sendRemoveListener on closed conduit simple-translate@sienori.2199023255888 ConduitsChild.jsm:108
sendRemoveListener on closed conduit simple-translate@sienori.2199023255905 ConduitsChild.jsm:108
DataCloneError: The object could not be cloned. 3 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 6 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 6 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 6 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 5 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 2 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 3 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 6 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 6 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 5 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 3 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 4 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 4 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 2 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 15 ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. 3 ExtensionChild.jsm:813
Use of nsIFile in content process is deprecated. 13 NetUtil.jsm:253:8
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
Relative positioning of table rows and row groups is now supported. This site may need to be updated because it may depend on this feature having no effect. RequestListHeader.js:545:20
sendRemoveListener on closed conduit simple-translate@sienori.4123168604366 ConduitsChild.jsm:108
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
sendRemoveListener on closed conduit simple-translate@sienori.4123168604447 ConduitsChild.jsm:108
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813
DataCloneError: The object could not be cloned. ExtensionChild.jsm:813

Where are debug samples stored? Maybe I can check whether they were uploaded to the server; I have access to it.

Do you see any errors in the browser's developer console, either in the console log or in the REST calls in the Network tab?

Also, the projects.get_all_ex call seems to be blocked.

They are uploaded to the Trains File Server component

Do you see any errors in the experiment log?

No errors, no warnings; the images just don't show up.

You should see a call sent to the server to get the debug images when you switch to the Debug Samples tab. Can you see if it returns anything?

events.get_task_metrics is the relevant call

What does it look like?

Also event.get_all_ex

Here's the payload:

{
  "meta": {
    "id": "0fbbcd4fd5cf4b40a47997396521bae6",
    "trx": "0fbbcd4fd5cf4b40a47997396521bae6",
    "endpoint": {
      "name": "events.get_task_metrics",
      "requested_version": "2.9",
      "actual_version": "1.0"
    },
    "result_code": 200,
    "result_subcode": 0,
    "result_msg": "OK",
    "error_stack": ""
  },
  "data": {
    "metrics": [
      {"task": "9757604caf0342b6907d5eb672fe53cd", "metrics": []}
    ]
  }
}
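A response like this can also be checked from a script rather than by eye. The following is a minimal Python sketch, assuming only the response shape shown in this thread (meta.result_code, data.metrics[].task, data.metrics[].metrics); image_metrics_by_task is a hypothetical helper, not part of the Trains SDK. An empty list for a task means the server has no debug-image metrics recorded for it.

```python
import json


def image_metrics_by_task(response_text: str) -> dict:
    """Map each task ID to the metrics listed in an events.get_task_metrics response.

    Assumes the response shape seen in this thread:
    {"meta": {"result_code": ...}, "data": {"metrics": [{"task": ..., "metrics": [...]}]}}
    """
    response = json.loads(response_text)
    meta = response.get("meta", {})
    if meta.get("result_code") != 200:
        # A non-200 code would point to a server-side failure, not missing data
        raise RuntimeError(f"API call failed: {meta.get('result_msg')!r}")
    entries = response.get("data", {}).get("metrics", [])
    # An empty list for a task means the server holds no debug-image metrics for it
    return {entry["task"]: entry.get("metrics", []) for entry in entries}


# Trimmed version of the response above
sample = '{"meta": {"result_code": 200}, "data": {"metrics": [{"task": "9757604caf0342b6907d5eb672fe53cd", "metrics": []}]}}'
print(image_metrics_by_task(sample))  # {'9757604caf0342b6907d5eb672fe53cd': []}
```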

Here's the events.scalar_metrics_iter_histogram payload:

{
  "meta": {
    "id": "08de8b5099dd4484a0aad19af2a140c2",
    "trx": "08de8b5099dd4484a0aad19af2a140c2",
    "endpoint": {
      "name": "events.scalar_metrics_iter_histogram",
      "requested_version": "2.9",
      "actual_version": "1.0"
    },
    "result_code": 200,
    "result_subcode": 0,
    "result_msg": "OK",
    "error_stack": ""
  },
  "data": {}
}
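A 200/OK result with an empty data object, as here, suggests the call itself worked but the server has nothing stored, which is a different situation from a failed call. A minimal Python sketch of that distinction, assuming only the meta.result_code and data fields seen in these responses; classify_response is a hypothetical helper:

```python
import json


def classify_response(response_text: str) -> str:
    """Return "error", "empty", or "has-data" for an API response.

    Only meta.result_code and the top level of "data" are inspected.
    """
    response = json.loads(response_text)
    if response.get("meta", {}).get("result_code") != 200:
        return "error"
    # An empty "data" object on a 200/OK reply means "nothing stored", not "call failed"
    return "has-data" if response.get("data") else "empty"


print(classify_response('{"meta": {"result_code": 200, "result_msg": "OK"}, "data": {}}'))  # empty
```

Because it only looks at the top level, a response whose data lists a task with an empty nested metrics array would still count as "has-data"; that case needs a look at the nested list.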

So there were no debug images reported as far as the server is concerned...

That's what I'm thinking, but I'm not sure what could be wrong

Yes, looks like it

OK guys, we fixed it: the issue was an unavailable files_server. After I changed the iptables rules, everything works fine.
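Since the root cause here was a file server blocked by firewall rules, a quick TCP check from the machine running the experiment can rule this out early. A minimal Python sketch using only the standard library; files_server_reachable is a hypothetical helper, and it only shows that the port accepts connections, not that the Trains file server is the service listening behind it:

```python
import socket
from urllib.parse import urlparse


def files_server_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the host/port in `url` succeeds.

    Only checks that the port is open (e.g. not dropped by iptables);
    it does not verify which service is answering.
    """
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when the URL has none
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, etc.
        return False
```

For example, files_server_reachable("http://localhost:8081"), assuming 8081 is the file server port of a default Trains Server setup; in practice, pass whatever files_server URL your trains.conf points at.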

I think you should first start with the simple Trains example for reporting images, and get it to work

You can PM it to me if you'd like (after redacting sensitive info).

I'll remove sensitive info and send it here

What are the relevant parts?

Perhaps your trains.conf is not configured correctly for the file server?
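One quick check is to see which files_server value the configuration actually contains. A rough Python sketch, assuming the usual layout where trains.conf has an api section with a files_server key (e.g. files_server: http://localhost:8081). It uses a simple regex rather than a real HOCON parser, so includes, substitutions, and overrides elsewhere are not handled; find_files_server is a hypothetical name:

```python
import re
from typing import Optional

# Rough pattern for lines like `files_server: http://localhost:8081` or
# `files_server = "http://host:8081"`; commented-out lines (starting with #) are skipped.
# This is NOT a HOCON parser: nesting, includes, and substitutions are ignored.
_FILES_SERVER_RE = re.compile(
    r'^[ \t]*files_server[ \t]*[:=][ \t]*"?([^"\s]+)"?', re.MULTILINE
)


def find_files_server(conf_text: str) -> Optional[str]:
    """Return the first files_server value found in trains.conf text, or None."""
    match = _FILES_SERVER_RE.search(conf_text)
    return match.group(1) if match else None


example_conf = """
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
}
"""
print(find_files_server(example_conf))  # http://localhost:8081
```

The returned URL should be one the training machine can actually reach; a value pointing at a host or port that is firewalled off would leave uploads silently failing.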

I see the events.get_task_metrics call.

Another thing, by the way: it seems that when I try to upload images, the scalars stop updating too.

No, nothing

And events.get_task_plots