why does it deplete so fast?
We try to break everything up into independent tasks and group them using a pipeline. The dependency on an agent caused unnecessary overhead, since we just want to execute locally. It became a burden once new data scientists joined the project: instead of just telling them "yeah, just execute this script", you now have to teach them about clearml, the role of agents, how to launch them, how they behave, how to remove them and so on... things you want to avoid with data scientists
what should I paste here to diagnose it?
now I get this error in my Auto Scaler task: Warning! exception occurred: An error occurred (AuthFailure) when calling the RunInstances operation: AWS was not able to validate the provided access credentials. Retry in 15 seconds
Makes sense
So I assume trains expects nvidia-docker to be installed on the agent machine?
Moreover, since I'm going to use Task.execute_remotely (and not through the UI) is there any code way to specify the docker image to be used?
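In case it helps anyone later: a minimal sketch of doing this in code, assuming a recent clearml version where `Task.set_base_docker` is available (the image name and queue name below are placeholders, not anything from this thread):

```python
from clearml import Task

# Create the task locally as usual
task = Task.init(project_name="examples", task_name="remote-run")

# Tell the agent which docker image to run this task in
# (placeholder image; any image the agent machine can pull should work)
task.set_base_docker(docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04")

# Stop local execution here and enqueue the task for a remote agent
task.execute_remotely(queue_name="default", exit_process=True)
```

This keeps everything in code, so nothing has to be set through the UI.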
Now I remind you that using the same credentials exactly, the auto scaler task could launch instances before
I assume it has nothing to do with my client version
no need to do it again, I have all the settings in place, I'm sure it's not a settings thing
So just to correct myself and sum up, the credentials for AWS are only in the cloud_credentials_*
and when looking at the running task, I still see the credentials
When I ran clearml-task --name ... --project ... --script ... it failed, saying no requirements were found
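For reference, the shape of the command that worked for me (note the double dashes on every flag; clearml-task also has a --requirements flag you can point at a requirements file when it can't auto-detect one — all the names and paths below are placeholders):

```shell
clearml-task \
  --project my_project \
  --name my_task \
  --script path/to/script.py \
  --requirements path/to/requirements.txt \
  --queue default
```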
let me repay you with a nice trick
Nope, quite some time has passed since 🙂 I might be able to replicate it later... still battling with the pipeline to make it work
👍
Searched for "custom plotly" and "log plotly" in search, didn't think about "report plotly"
Sorry I meant this link
https://azuremarketplace.microsoft.com/en-us/marketplace/apps/apps-4-rent.clearml-on-centos8
I'm using iteration = 0 at the moment, and I "choose" the max and it shows as a column... But the column is not the scalar name (because it truncates it and adds the > sign to signal max).
For the sake of comparing and sorting, it makes sense to log a scalar with a given name without the iteration dimension
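A sketch of what I mean — hedged, since `report_single_value` only exists in newer clearml versions; on older ones, pinning a scalar to iteration 0 is the workaround (project, task, and metric names below are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar-demo")
logger = task.get_logger()

# Newer clearml: a single value with no iteration axis,
# which shows up as a clean, sortable column
logger.report_single_value(name="best_accuracy", value=0.93)

# Older workaround: a regular scalar pinned to iteration 0
logger.report_scalar(title="summary", series="best_accuracy",
                     value=0.93, iteration=0)
```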
I have them in two different places, once under Hyperparameters -> General
is this already available or only on github?
The only way to change it is to convert apiserver_conf to a dictionary object ( as_plain_ordered_dict() ) and edit it
