@<1535793988726951936:profile|YummyElephant76>
Whenever I create any task the "uncommitted changes" are the contents of ipykernel_launcher.py, is there a way to make ClearML recognize that I'm running inside a venv?
This sounds like a bug, it should have the entire notebook there, no?
Yes, I mean trains-agent. Actually I am using 0.15.2rc0, but from local files: I clone the trains and trains-agent repos and install them. Their versions are 0.15.2rc0.
I see, that's why we get the git ref, not the package version.
I'm assuming some package imports absl (the TF defines package), and that's the reason you see the TF defines. Does that make sense?
Hi MortifiedCrow63
I have to admit this is very strange, I think the fact it works for the artifacts and not for the model is kind of a fluke ...
If you use the "wait_on_upload" argument in upload_artifact you end up with the same behavior. Even when uploading in the background, the issue is still there; for me it was revealed the minute I limited the upload bandwidth to under 300kbps. It seems the internal GS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk...
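For reference, a minimal sketch of the blocking call (the artifact name and object are placeholders):
# block until the artifact upload completes instead of returning immediately
task.upload_artifact(name="data", artifact_object="/path/to/file.csv", wait_on_upload=True)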
Hi LudicrousParrot69
I guess you are right, this is not a trivial distinction:
min: means we are looking for the minimum value of a specific scalar. Meaning, given 1.0, 0.5, 1.3 -> the optimizer will get these direct values and optimize based on them
global min: means the optimizer is getting the running minimum of the specific scalar. With the same example: 1.0, 0.5, 1.3 -> the HPO optimizer gets 1.0, 0.5, 0.5
The same holds for max/global_max. Makes sense?
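To illustrate the distinction with plain Python (just a sketch of the semantics, not ClearML code):

values = [1.0, 0.5, 1.3]  # scalar reports from the experiment

# "min": the optimizer sees every reported value as-is
direct = list(values)  # [1.0, 0.5, 1.3]

# "global min": the optimizer sees the running minimum up to each report
running_min = []
best = float("inf")
for v in values:
    best = min(best, v)
    running_min.append(best)  # ends up as [1.0, 0.5, 0.5]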
Then the dynamic gpu allocation is exactly what you need, I suggest talking to the sales ppl, I'm sure they can help. https://clear.ml/contact-us/
well, it's only when adding a "- name" to the template
Nonetheless it should not break it 🙂
Okay, progress.
What are you getting when running the following from the git repo folder:
git ls-remote --get-url origin
No, I mean actually compare using the UI, maybe the arguments are different, or the "installed packages" differ.
Out of curiosity, if Task flush worked, when did you get the error? At the end of the process?
what if for some old tasks I get WARNING:root:Could not delete Task ID=a0908784a2a942c3812f947ec1f32c9f, 'Task' object has no attribute 'delete'? What's the best way of cleaning them?
This seems like an old SDK, no?
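With a recent SDK the Task object does expose delete(); a minimal sketch, using the ID from your warning:

from clearml import Task

task = Task.get_task(task_id="a0908784a2a942c3812f947ec1f32c9f")
# removes the task itself and, optionally, its artifacts and models
task.delete(delete_artifacts_and_models=True)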
EcstaticGoat95 I can see the experiment but I cannot access the notebook (I get "Binder inaccessible")
Is this the exact script as here? https://clearml.slack.com/archives/CTK20V944/p1636536308385700?thread_ts=1634910855.059900&cid=CTK20V944
EcstaticGoat95 any chance you have an idea on how to reproduce? (even 1 out of 6 is a good start)
SolidSealion72 EcstaticGoat95 I'm hoping the issue is now resolved 🤞
can you verify with:
pip install git+
Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
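i.e. something along these lines in the agent's clearml.conf (the bucket is a placeholder):

sdk {
    development {
        # default destination for models and artifacts
        default_output_uri: "s3://my-bucket/clearml"
    }
}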
@<1541954607595393024:profile|BattyCrocodile47> not restarting the container, restarting the Docker service (on Mac it's an app; I think there is an option in the Docker app to do that)
Hi @<1688721797135994880:profile|ThoughtfulPeacock83>
the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company at execution time.
What would be the value you need to override? And what is the use case?
OH I see. I think you should use the environment variable to override it:
so add to the docker args something like
-e CLEARML_AGENT__AGENT__PACKAGE_MANAGER__POETRY_INSTALL_EXTRA_ARGS=
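For example, if the agent runs in docker mode, one way is via extra_docker_arguments in clearml.conf (the "--no-root" value is just an illustration):

agent {
    # arguments appended to every docker run the agent launches
    extra_docker_arguments: ["-e", "CLEARML_AGENT__AGENT__PACKAGE_MANAGER__POETRY_INSTALL_EXTRA_ARGS=--no-root"]
}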
Hi @<1547028031053238272:profile|MassiveGoldfish6>
Is there a way for ClearML to simply save the model once training is done and to ignore the model checkpoints?
Yes, you can simply disable the auto logging of the model and manually save the checkpoint:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
...
task.update_output_model("/my/model.pt", ...)
Or, for example, just "white-label" the final model:
task = Task.init(..., auto_connect_frameworks={'pyt...
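A fuller sketch of the first approach (project/task names and the model path are placeholders):

from clearml import Task

# disable the automatic PyTorch model logging
task = Task.init(
    project_name="examples",
    task_name="manual model upload",
    auto_connect_frameworks={"pytorch": False},
)

# ... training loop, checkpoints saved locally ...

# manually register only the final model on the task
task.update_output_model(model_path="/my/model.pt", name="final model")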
I wonder if the try/except approach would work for XGBoost load, could we just try a few classes one after the other?
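Rough sketch of what I have in mind (the candidate class list is just an assumption):

import xgboost as xgb

def try_load(model_path):
    # try each candidate class until one loads the file successfully
    for cls in (xgb.XGBClassifier, xgb.XGBRegressor, xgb.Booster):
        try:
            model = cls()
            model.load_model(model_path)
            return model
        except Exception:
            continue
    raise ValueError("could not load {}".format(model_path))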
SubstantialElk6 when you say "Triton does not support deployment strategies" what exactly do you mean?
BTW: the updated documentation is already up here:
https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving
btw:
# in another process
How do you spin the subprocess, is it with Popen?
Also, what's the OS and Python version you are using?
SolidSealion72 I'm able to reproduce, hurrah!
(and a fix is already being tested, I will keep you guys updated)
if you have an automation process, then you should have the Task object, no?
then you have task.id
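i.e. a minimal sketch (project/task names are placeholders, assuming you fetch the task by name):

from clearml import Task

# in an automation script you usually already hold the Task object
task = Task.get_task(project_name="examples", task_name="my experiment")
print(task.id)  # the unique Task ID string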
What am I missing here?
It does work about 50% of the time
EcstaticGoat95 what do you mean by "works about 50%"? Do you mean the other 50% of the time it hangs?