It's just the print (the `__repr__`) not showing the data:
`for w in client.workers.get_all(): print(w.data)`
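For context, a minimal sketch of that loop, assuming the `APIClient` from `clearml.backend_api.session.client` is how the workers endpoint is reached here:
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
for w in client.workers.get_all():
    # each worker entry keeps its fields in .data; printing the object itself
    # only shows the default repr
    print(w.data)
```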
server-->agent is fast, but agent-->server is slow.
Then multiple connections will not help; this is the upload-speed bottleneck of your machine, regardless of what the target is (file-server, S3, etc.)
How did you define the decorator of "train_image_classifier_component" ?
Did you define:
`@PipelineDecorator.component(return_values=['run_model_path', 'run_tb_path'], ...)`
Notice the two return values.
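For reference, a hedged sketch of what such a decorator could look like (the function body and the `dataset_path` argument are purely illustrative):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['run_model_path', 'run_tb_path'])
def train_image_classifier_component(dataset_path):
    # hypothetical outputs, just to show the two return values being produced
    run_model_path = dataset_path + '/model'
    run_tb_path = dataset_path + '/tensorboard'
    return run_model_path, run_tb_path
```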
It seems like you are correct, everything should just work. Are you still getting the error? What's the clearml agent version?
in the docker-compose file. Still strange...
hmm yes it is... If you have an idea of what went wrong, let me know; we would love to fix it
the other repos I have are constantly worked on and changing too
Not only will it be cloned automatically, the git diff of the sub-modules is stored as well 🙂
I think so (you can also comment out the Task.init() just to verify this is not a clearml issue)
Sure LazyTurkey38 here's a nice hack for that:
```
# code here
task.execute_remotely(queue_name=None, clone=False, exit_process=False)

# patch the Task and actually send it for execution
if Task.running_locally():
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
```
You can also clear the git diff by passing `"diff": ""`
wdyt?
SmarmySeaurchin8 regrading (2)
I'm not sure the current visualization supports it. I mean we could put "{}", but that would imply you can edit it, which we would then have to support (possible, but weird), and this is why:
`task.connect({'a': {}, 'b': {'nested': 'value'}})`
will become
`'a' = '{}'`
`'b/nested' = 'value'`
But then if you edit it to:
`'a' = '{"nested": "value"}'`
`'b/nested' = 'value'`
you have two different ways of presenting the same type of structure...
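A small sketch of the flattening described above (project/task names are placeholders):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='nested-connect')
task.connect({'a': {}, 'b': {'nested': 'value'}})
# in the UI this is flattened roughly to:
#   a        = '{}'
#   b/nested = 'value'
```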
I failed to update the "STARTED AT" and the "COMPLETED AT" attributes in the "INFO" tab.
I'm not sure this can actually be overridden...
Do you have any advice for this step, (monitoring)? I feel like it's not very well documented.
Yeah I think it is complicated.
I would start with the example here: None
Basically what it does is create a histogram over time of the values the REST API gets. Then Grafana visualizes those values.
Notice that the request latency / frequency are automatically logged ...
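As a rough illustration of the idea, a hedged sketch using the `prometheus_client` package (the metric name and the hook function are made up):
```
from prometheus_client import Histogram, start_http_server

# histogram over time of the values the REST endpoint returns
prediction_value = Histogram('prediction_value', 'Model prediction values')

def record_prediction(result):
    # record each returned value; Grafana then visualizes the histogram over time
    prediction_value.observe(result)

if __name__ == '__main__':
    # expose /metrics for Prometheus to scrape
    start_http_server(8000)
```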
I can but that is not a configuration we would want to run with in production
Agreed, I just want to isolate the issue. I think this is the underlying Python interface missing some configuration or environment variables
WackyRabbit7 This is a JSON representation of the entire plot (basically how Plotly sees it).
What you are after is:
`full_json[0]['cells']['values']`
which is a list of lists (row order) in the table.
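A self-contained sketch of pulling the rows out, with `full_json` standing in for the parsed plot JSON already fetched from the task (the actual retrieval is omitted):
```
# dummy stand-in for the parsed plot JSON
full_json = [
    {'cells': {'values': [['r1c1', 'r1c2'], ['r2c1', 'r2c2']]}},
]

values = full_json[0]['cells']['values']  # the table content as a list of lists
for row in values:
    print(row)
```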
If I edit the OmegaConf directly in the UI then the port changes correctly
This will only work if you change the Hydra/allow_omegaconf_edit to True in the UI. Did you?
or shall I call the Task.init even from the agent
WorriedParrot51 I think something is lost here.
Task.init() is always called, even when the agent is executing the code. The difference is in what happens inside the Task.init() call. When the codebase itself is executed by the trains-agent, it signals through OS environment variables to the Task.init() call that, instead of creating a new task, it should use the already created one. From this point all data flows from the trains-server back into the c...
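A minimal sketch of that idea: the very same `Task.init()` call behaves differently depending on who is executing the code, and `Task.running_locally()` tells you which side you are on (project/task names are placeholders):
```
from clearml import Task

task = Task.init(project_name='examples', task_name='agent-aware')
if Task.running_locally():
    print('fresh task created, data flows to the server')
else:
    print('executed by the agent, configuration flows back from the server')
```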
Hi LazyTurkey38
Is it possible to have the agents keep a local version and only download the diff of the job commit to speed things up?
This is what it does: it keeps a local cached copy and only pulls the latest changes
Hi TrickySheep9
Could you post the pipeline code here?
Also, which clearml version are you using?
Yes, that sounds like the issue. Is the file actually there?
LazyTurkey38 , ohh I think you are correct 😞
it should be:
```
# patch the Task and actually send it for execution
if Task.running_locally():
    # this will verify all auto repo detection and python is done.
    task.close()
    # so that we can edit the task
    task.reset()
    # update the repo
    task.update_task(task_data={'script': {'branch': 'new_branch', 'repository': 'new_repo'}})
    # now to actually enqueue the Task
    Task.enqueue(task, queue_name='default')
```
wdyt?
Can you fix it locally, just to verify?
SmugLizard25 are you saying that with the latest version it does not work?
But this is not a copy, this is a mount; your log showed cp failing
Hi @<1610083503607648256:profile|DiminutiveToad80>
do you have a full log? can you share the code you are trying to run?
Expected behaviour is that it reads the last iteration correctly. At least that is what is stated in the docs.
This is exactly what should happen, are you saying that for some reason it fails?
Also, how do pipelines compare here?
Pipelines are a type of Task, so like Tasks you can clone and enqueue them, or set them as the target of the trigger.
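For example, a hedged sketch of cloning and enqueueing an existing pipeline task (project/task/queue names are placeholders):
```
from clearml import Task

pipeline_task = Task.get_task(project_name='examples', task_name='my pipeline')
cloned = Task.clone(source_task=pipeline_task)
Task.enqueue(cloned, queue_name='default')
```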
the most flexible solution would be to have some way of triggering the execution of a script in the parent task environment,
This is the exact idea of the TriggerScheduler None
What am I missing here?
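Roughly, the usage looks like this sketch (argument names follow my reading of the clearml.automation docs and may need adjusting; the task ID, project, and queue are placeholders):
```
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
# when a task in the watched project completes, enqueue a predefined task (your script)
trigger.add_task_trigger(
    name='run-on-complete',
    schedule_task_id='aabbccdd11223344',
    schedule_queue='default',
    trigger_project='examples',
    trigger_on_status=['completed'],
)
trigger.start()
```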