SubstantialElk6 I know they have full permission control in the enterprise edition; if this is something you need, I suggest you contact http://allegro.ai
Ohh, if this is the case you might also consider using offline mode, so there is no need for a backend
https://clear.ml/docs/latest/docs/guides/set_offline#setting-task-to-offline-mode
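A minimal way to enable this (per the offline-mode docs linked above) is the environment variable, so no backend/server is ever contacted:

```shell
# Enable ClearML offline mode via environment variable;
# session data is stored locally instead of being sent to a server.
export CLEARML_OFFLINE_MODE=1
```

The same can be done in code with `Task.set_offline(offline_mode=True)` before `Task.init`.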
I see. If this is the case, try to set
'output_uri="file:///full/path/to/dir"'
Notice it has to be the full path, with the file:// prefix
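If you want this as the default rather than passing it per task, a clearml.conf fragment along these lines should also work (assuming the standard `sdk.development.default_output_uri` key; the path is a placeholder):

```
sdk {
    development {
        # Must be a full path, including the file:// prefix
        default_output_uri: "file:///full/path/to/dir"
    }
}
```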
I assume it is reported into TB, right ?
Hi GreasyPenguin14
- Did using auto_connect_frameworks={'pytorch': False} solve the issue? (I imagine it did)
- Maybe we should have the option to have wildcard support, so it will only auto-log based on filename. Basically, using auto_connect_frameworks={'pytorch': "model*.pt"} would only auto-log the matching model files saved/logged, wdyt?
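To illustrate what the proposed wildcard would do, here is the matching logic in plain Python (using stdlib fnmatch; the file names and the helper are just illustrations, not the clearml implementation):

```python
from fnmatch import fnmatch

def should_auto_log(filename: str, pattern: str) -> bool:
    """Return True if a saved model file matches the wildcard pattern."""
    return fnmatch(filename, pattern)

# Hypothetical files a training run might save
saved_files = ["model_best.pt", "model_last.pt", "optimizer_state.pt"]

# With auto_connect_frameworks={'pytorch': "model*.pt"}, only these
# would be auto-logged:
logged = [f for f in saved_files if should_auto_log(f, "model*.pt")]
print(logged)  # ['model_best.pt', 'model_last.pt']
```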
Hi DrabCockroach54
... and no logs for python script.
what do you mean by "no logs" , is it clearml logs? or k8s pod logs ?
PompousParrot44 the fundamental difference is that artifacts are uploaded manually (i.e. a user will specifically "ask" to upload an artifact), models are logged automatically and a user might not want them uploaded (imagine debugging sessions, or testing).
By adding the 'upload_uri' argument, you can specify to trains that you want all models to be automatically uploaded (not just logged).
Now here is the nice thing, when running using the trains-agent, you can have:
Always upload the mod...
I think you are onto a good flow: quick iterations / discussions here, then if we need more support or an action-item we can switch to GitHub. For example with feature requests we usually wait to see if different people find them useful, then we bump their priority internally; this is best done using GitHub Issues
Only the dictionary keys are returned as the raw nested dictionary, but the values remain cast.
Using which function ? task.get_parameters_as_dict does not cast the values (the values themselves are stored as strings on the backend), only task.connect will cast the values automatically
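To make the distinction concrete, here is a plain-Python sketch of the behavior (hypothetical helper names; the point is that the backend stores everything as strings, and only the connect-style path casts values back using the original types):

```python
def as_stored(params: dict) -> dict:
    """Backend-style view: every value is kept as a string."""
    return {k: str(v) for k, v in params.items()}

def cast_back(stored: dict, original: dict) -> dict:
    """connect-style view: cast strings back using the original value types."""
    return {k: type(original[k])(v) for k, v in stored.items()}

original = {"lr": 0.1, "epochs": 10, "optimizer": "sgd"}
stored = as_stored(original)            # what get_parameters_as_dict-style access sees
restored = cast_back(stored, original)  # what a connect-style round trip gives back
print(stored["lr"], restored["lr"])     # '0.1' (str) vs 0.1 (float)
```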
some dependencies will sometimes require different pip versions.
none, maybe setuptools, but not the pip version
(pip is just a utility to install packages, it will not be a dependency of one)
I... did not, ashamed to admit.
UnevenDolphin73 I actually think you are correct, meaning I "think" that what you are asking is for the low-level logging (for example debug messages that are usually not printed to console) to also be logged? Is that correct?
Hi GrievingTurkey78
I'm assuming similar to https://github.com/pallets/click/
?
Auto connect and store/override all the parameters?
Or can it also be right after
Task.init()
?
That would work as well
Thanks GrievingTurkey78 , this is exactly what I was looking for!
Any chance you can open a GitHub issue (jsonargparse + lightning support)?
I really want to make sure this issue is addressed
BTW: this is only if jsonargparse is installed:
https://github.com/PyTorchLightning/pytorch-lightning/blob/368ac1c62276dbeb9d8ec0458f98309bdf47ef41/pytorch_lightning/utilities/cli.py#L33
DistressedGoat23 check this example:
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
aSearchStrategy = RandomSearch
It will collect everything on the main Task
This is a crucial point for using clearml HPO, since comparing dozens of experiments in the UI and searching for the best is just not manageable.
You can of course do that (notice you can actually order them by scalars they report, and even do ...
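The idea behind a RandomSearch strategy, sketched in plain Python (clearml's real implementation lives in clearml.automation; the objective function and parameter ranges here are made up):

```python
import random

def objective(params):
    """Toy objective to minimize; stands in for a real training run's scalar."""
    return (params["lr"] - 0.01) ** 2 + 0.1 * params["batch_size"] / 256

def random_search(n_trials, seed=0):
    """Sample random parameter sets and keep the best trial seen so far."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            "lr": rng.uniform(0.001, 0.1),
            "batch_size": rng.choice([32, 64, 128, 256]),
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search(20)
print(best, score)
```

In clearml the per-trial results are reported back to the main (optimizer) Task, which is what makes the comparison manageable in the UI.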
Are you running the agent in docker mode? or venv mode ?
Can you manually ssh on port 10022 to the remote agent's machine?
ssh -p 10022 root@agent_ip_here
Let me check if we can reproduce it
ProudMosquito87 Just a few pointers on how we convert the TB histograms to awesome (but less accurate) 3D surfaces.
First I have to admit, I almost never use these histograms, maybe to detect a plateau or if something goes really wrong...
The 3D surface is basically grouping all the histograms and then bucketing them (I think the default is 50 buckets) so that you get a general feel of what's going on, not necessarily a detailed view. Bottom line, you are correct, the TB is the source of truth...
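Conceptually the bucketing looks something like this (a plain-Python sketch of the grouping idea, not the actual clearml conversion code; 50 buckets is the default mentioned above, a smaller number is used here for readability):

```python
def bucket_histograms(histograms, n_buckets=50):
    """Merge many per-step histogram samples into a fixed number of buckets,
    trading accuracy for a readable overview (the 3D-surface view)."""
    lo = min(v for h in histograms for v in h)
    hi = max(v for h in histograms for v in h)
    width = (hi - lo) / n_buckets or 1.0  # avoid zero width for flat data
    buckets = [0] * n_buckets
    for h in histograms:
        for value in h:
            idx = min(int((value - lo) / width), n_buckets - 1)
            buckets[idx] += 1
    return buckets

# Two "steps" worth of histogram samples merged into one coarse row
surface_row = bucket_histograms([[0.1, 0.2, 0.9], [0.15, 0.8, 0.95]], n_buckets=5)
print(surface_row)  # [3, 0, 0, 0, 3]
```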
It's only on this specific local machine that we're facing this truncated download.
Yes, that's what the log says, makes sense
Seems like this still doesn't solve the problem, how can we verify this setting has been applied correctly?
hmm exec into the container? what did you put in clearml.conf?
I think the clearml-session CLI is missing the ability to add a custom port to the external address, does that make sense?
I do expect it to pip install though, which doesn't require root access I think
Correct, it is installed on a venv (exactly for that).
It will not fail if the apt-get fails (only warnings)
Let me know if it worked
but out of curiosity, what's the point of doing a hyperparam search on the value of the loss at the last epoch of the experiment?
The problem is that you might end up with global min that is really nice, but it was 3 epochs ago, and you have the last checkpoint ...
BTW, the global min and the last min should not be very different if the model converges, wdyt?
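The point about the global vs. last minimum, illustrated on a toy loss curve (made-up numbers):

```python
losses = [0.9, 0.5, 0.30, 0.28, 0.31, 0.33]  # per-epoch validation loss

global_min_epoch = min(range(len(losses)), key=losses.__getitem__)
last_epoch = len(losses) - 1

# The best loss was reached 2 epochs before the end, but only the
# last checkpoint may still be on disk.
print(global_min_epoch, losses[global_min_epoch])  # 3 0.28
print(last_epoch, losses[last_epoch])              # 5 0.33
```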
based on this:
https://clear.ml/docs/latest/docs/references/api/endpoints#post-debugping
"http://localhost:8080/debug.ping"
btw: What's the usage scenario?
You can check the keras example, run it twice, on the second time it will continue from the previous checkpoint and you will have input and output model.
https://github.com/allegroai/clearml/blob/master/examples/frameworks/keras/keras_tensorboard.py
how to make sure it will traverse only current package?
Just making sure there is no bug in the process: if you call Task.init in your entire repo (serve/train), you end up with an "installed packages" section that contains all the required packages for both use cases?
I have separate packages for serving and training in a single repo. I don't want serving requirements to be installed.
Hmm, it cannot "know" which is which, because it doesn't really trace all the import logs (this w...
Regarding the first direction, this was just pushed
https://github.com/allegroai/clearml/commit/597a7ed05e2376ec48604465cf5ebd752cebae9c
Regarding the opposite direction:
That is a good question, I really like the idea of just adding another section named Datasets
SucculentBeetle7 should we do that automatically?
ElegantKangaroo44 I tried to reproduce the "services mode" issue with no success. If it happens again let me know maybe will better understand how it happened (i.e. the "master" trains-agent gets stuck for some reason)