I don't think I can, this is private IP, and creating a dummy example of a pipeline and execution would take more time than I can dedicate to this
Well done to you!
Manual model registration?
But does it disable the agent? Or will the tasks still wait for the agent to dequeue them?
This is part of a bigger process which takes quite some time and resources, I hope I can try this soon if it will help get to the bottom of this
What do you mean by submodules?
She did not push; I told her she does not have to push before executing, as trains figures out the diffs.
When she pushes - it works
👍
Searched for "custom plotly" and "log plotly" in the search, didn't think about "report plotly"
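For reference, this is roughly the call I was looking for, just a sketch (project/task/plot names are placeholders):

import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="examples", task_name="plotly report")  # placeholder names

# a regular plotly figure
fig = go.Figure(data=go.Scatter(x=[1, 2, 3], y=[4, 1, 7]))

# attach the figure to the task via the logger
task.get_logger().report_plotly(title="my plot", series="scatter", iteration=0, figure=fig)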
Worth mentioning: nothing changed before we executed this; it worked before, and now after the update it breaks
So regarding 1, I'm not really sure what the difference is
When running in docker mode, what is different from the regular mode? Nowhere in the instructions is nvidia-docker a prerequisite, so how exactly will tasks on GPU get executed?
I feel I don't understand enough of the mechanism to (1) understand the difference between docker mode and not and (2) what the use case for each is
We try to break everything up into independent tasks and group them using a pipeline. The dependency on an agent caused unnecessary overhead since we just want to execute locally. It became a burden once new data scientists joined the project: instead of just telling them "yeah, just execute this script", you now have to teach them about clearml, the role of agents, how to launch them, how they behave, how to remove them and so on... things you want to avoid with data scientists
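To illustrate what I mean, a minimal sketch of the kind of pipeline we want them to just run as a script (step/project names are made up, our real steps are obviously more involved):

from clearml import PipelineController

def prepare_data():
    # placeholder step logic
    return [1, 2, 3]

# a tiny pipeline with a single function step
pipe = PipelineController(name="demo pipeline", project="examples", version="0.1")
pipe.add_function_step(name="prepare_data", function=prepare_data, function_return=["data"])

# everything runs in the local process, no agent involved
pipe.start_locally(run_pipeline_steps_locally=True)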
now I get this error in my Auto Scaler task:
Warning! exception occurred: An error occurred (AuthFailure) when calling the RunInstances operation: AWS was not able to validate the provided access credentials Retry in 15 seconds
AgitatedDove14 sorry for the late reply,
It's right after executing all the steps. So we have the following block which determines whether we run locally or remotely:
if not arguments.enqueue:
    pipe.start_locally(run_pipeline_steps_locally=True)
else:
    pipe.start(queue=arguments.enqueue)
And right after we have a method that calls Task.current_task() which returns None
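i.e. schematically (the method name here is just a placeholder):

from clearml import Task

def collect_results():  # placeholder name for the method we call right after the block above
    task = Task.current_task()
    # at this point task is None in our case, even though the pipeline just finished running
    if task is None:
        raise ValueError("Task.current_task() returned None")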
I don't even know where trains is coming from... While using the same environment I can't even import trains, see
we are running the agent on the same machine AgitatedDove14, it worked before upgrading clearml... we never set these credentials
Thanks a lot, that clarifies things
the output above is what the agent has, it seems... obviously on my machine I have it installed
yeah I guessed so
I have them in two different places, once under Hyperparameters -> General
There are many other packages in my environment which are not listed
btw my site packages is false - should it be true? You pasted that, but I'm not sure what it should be; in the paste it's false but you are asking about true
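Just to make sure we're talking about the same thing, I'm assuming it's this setting in the agent section of clearml.conf (this is how it looks on my side):

agent {
    package_manager {
        # when true, the virtualenv created by the agent inherits the system python packages
        system_site_packages: false
    }
}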
When I said not the expected behavior, I meant that following the instructions in the docs should lead to downloading the latest version