SubstantialElk6 it seems the auto-resolve of the PyTorch CUDA version failed,
What do you have in the "installed packages" section?
SubstantialElk6 could you post the "Installed Packages" section under Execution of this specific Task?
SubstantialElk6 could you try with the latest (just released)?
pip install clearml-agent==0.17.2
Then if possible, could you attach the full log of the agent's execution (Task->results->Console)
Hi CleanPigeon16
You need to be able to access the machine running the agent; usually the default port will be 10022.
If you need further debug messages, add --debug at the beginning of the clearml-session command:
clearml-session --debug ...
To get all the debug prints, please upgrade to clearml-session==0.3.3
Oh if this is the case, then by all means push it into your Task's docker_setup_bash_script
It does not have to be done after the git clone; the only part I can see that is needed is setting the PYTHONPATH to the additional repo you are pulling, and that should work.
The main hurdle might be passing credentials to git, but if you are using SSH it should be transparent
wdyt?
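For example, a rough sketch of pushing the extra setup into the Task's docker_setup_bash_script via set_base_docker (the docker image, repo URL and paths here are just placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="multi repo task")

# sketch only: run extra setup inside the container before the Task's environment is built;
# the repo URL and target path below are hypothetical
task.set_base_docker(
    docker_image="python:3.9",
    docker_setup_bash_script=[
        "git clone https://github.com/org/extra_repo.git /opt/extra_repo",
        "export PYTHONPATH=/opt/extra_repo:$PYTHONPATH",
    ],
)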
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have an ingestion service on the pods (i.e. as part of the yaml template used by the k8s glue), you can specify to the glue which ports are exposed, so it knows (1) the maximum number of instances it can spin up, e.g. one per port, and (2) it will set the external port number on the Task, so that the running agent/code will be aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
And as far as I can see there is no mechanism in place to load objects other than the model file inside the Preprocess class, right?
Well actually this is possible, let's assume you have another Model that is part of the preprocessing, then you could have:
something like that should work
def preprocess(...):
    if not getattr(self, "_preprocess_model", None):
        self._preprocess_model = joblib.load(Model(model_id).get_weights())
Yes! I checked, it should work (it checks if you have a load(...) function on the preprocess class, and if you do, it will use it):
def load(self, local_file_name):
    self._model = joblib.load(local_file_name)
    self._preprocess_model = joblib.load(Model(hard_coded_model_id).get_weights())
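Putting the pieces together, a rough sketch of such a Preprocess class (the second model id and the body fields are placeholders, and the transform() call just stands in for whatever the helper model does):

import joblib
from clearml import Model


class Preprocess(object):
    def __init__(self):
        self._model = None
        self._preprocess_model = None

    def load(self, local_file_name):
        # weights of the model registered on this endpoint
        self._model = joblib.load(local_file_name)
        # additional model pulled from the ClearML model registry (placeholder id)
        self._preprocess_model = joblib.load(Model("SECOND_MODEL_ID").get_weights())

    def preprocess(self, body, *args, **kwargs):
        # run the helper model first, then hand the result on to the main model
        return self._preprocess_model.transform(body["features"])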
Hi HollowFish37
I think I have good news for you, the clearml-agent is only communicating with the api endpoint, so as long as this is secure, you should be fine. Do notice that the default files server endpoint should be secure as well, as by default it will allow any upload/download
Hi FranticLobster21
hey, how do I use local files as dependencies?
You mean like a repository ?
Can I specify in the task which local files I use that should be packaged?
In a git repo?
Basically the agent can do two things, either replicate a single script or clone a git repo + uncommitted changes
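If it is in a git repo, a minimal sketch of pointing a Task at it so the agent clones it (repo URL, branch and script name are placeholders):

from clearml import Task

# sketch: create a Task the agent will execute by cloning the repo and running the script
task = Task.create(
    project_name="examples",
    task_name="remote repo task",
    repo="https://github.com/org/my_repo.git",
    branch="main",
    script="train.py",
)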
I reached over 1M API calls in about one week using clearml-serving
Oh that makes sense now 🙂
If I remember correctly, adding an additional model to a single clearml-serving instance should not actually change the number of API calls; they are mostly affected by the number of clearml-serving instances / containers and not by the number of models.
ReassuredTiger98 what are you getting with:
nvidia-smi
And here:
ls -la /usr/local/
I might have an idea, could you test with:
from clearml import Task
Task._report_subprocess_enabled = False
...
# real code here
Well, in that case, just changing the order should solve it (I'll make sure we have that as the default):
conda_channels: ["pytorch", "conda-forge", "defaults", ]
It should solve the issue 🙂
ReassuredTiger98 what do you have in the clearml.conf under "conda_channels" ?
Is this it ?
based on this:
https://clear.ml/docs/latest/docs/references/api/endpoints#post-debugping
" http://localhost:8080/debug.ping ”
btw: what's the usage scenario?
These paths are pathlib.Path. Would that be a problem?
No need to worry, it should work (I'm assuming "/src/clearml_evaluation/" actually exists on the remote machine, otherwise it's useless 🙂)
You mean the job with the exact same arguments ?
do you have other arguments you are passing ?
Are you using Optuna / BOHB?
I started running it again and it seems to have passed the phase where it failed last time
Yey!
Yes, it is a common case....
I have the feeling ShinyLobster84 WackyRabbit7 you are not alone in this one 🙂 let me make sure we change the default value to False, so the code looks cleaner
I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply with the PyPI / artifactory API.
Make sense ?
Can you reproduce this behavior outside of Lightning? Or in a toy example (because I could not)?
'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
This basically means there is no configuration on how to serve the model, i.e. size/type of the lower (input) layer and the output layer.
You can either store the configuration on the creating Task, as is done here:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/examples/keras/keras_mnist.py#L51
Or you can provide it as a standalone file when registering the mo...
It reflects what is stored by Keras, so if Keras stores the best model this is what you get. BTW if you pass output_uri=True it will automatically upload the models
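For reference, a minimal sketch of passing it at initialization (project/task names are placeholders):

from clearml import Task

# output_uri=True uploads model snapshots automatically (to the default files server)
task = Task.init(
    project_name="examples",
    task_name="keras training",
    output_uri=True,
)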
Task.add_requirements('.')
Should work
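A short sketch of how that would look (assuming add_requirements is called before Task.init so it takes effect):

from clearml import Task

# call add_requirements before Task.init so the agent picks it up
Task.add_requirements('.')
task = Task.init(project_name="examples", task_name="local requirements example")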