
If so, is there any doc/examples about this?
Good point, passing to docs.
https://github.com/allegroai/clearml/blob/51af6e833ddc5a8ba1efaaf75980f58616b25e85/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L123
I mean it is mentioned, but we should highlight it better
How and where should I put this configuration?
In your clearml.conf on the machine with the agent, just add at the bottom of the file: agent.venvs_cache.path=~/.clearml/venvs-cache
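For reference, the equivalent section-style entry in clearml.conf might look like this (the extra limit keys are assumptions based on the default agent config; tune or drop them as needed):
```
# clearml.conf on the agent's machine
agent {
    venvs_cache: {
        # cache directory (same value as the one-liner above)
        path: ~/.clearml/venvs-cache
        # assumed defaults -- tune to your disk budget
        max_entries: 10
        free_space_threshold_gb: 2.0
    }
}
```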
Ohh, so you are saying you can store it properly, but only editing in the UI is limited? (Maybe this is just a UI thing)
And your ~/clearml.conf?
GiganticTurtle0
What do you mean by "reuse_last_task_id"? Each component always generates a new Task (unless it is cached, in which case it reuses the previously executed one).
What am I missing here?
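For context, reuse_last_task_id is an argument of Task.init, not of pipeline components; a minimal sketch (names illustrative):
```python
from clearml import Task

# When True (the default), re-running the same unfinished experiment may
# reuse the previous Task instead of creating a new one.
task = Task.init(
    project_name="examples",    # illustrative
    task_name="my-experiment",  # illustrative
    reuse_last_task_id=False,   # force a brand-new Task on every run
)
```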
Okay this seems correct...
Can you share both YAML files (server & serving) and the env file?
We do upload the final model manually.
Wait, you said "upload manually", and now you are saying "saved automatically"; I'm confused.
Would it suffice to provide the git credentials ...
That should be enough, basically this is where they should be:
https://github.com/allegroai/clearml-agent/blob/0462af6a3d3ef6f2bc54fd08f0eb88f53a70724c/docs/clearml.conf#L18
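Concretely, in that clearml.conf this is roughly the shape (placeholder values, never commit real credentials):
```
# clearml.conf on the agent machine
agent {
    # used by the agent to clone private git repositories
    git_user: "my-git-user"
    git_pass: "my-personal-access-token"
}
```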
Oh sorry, from the docstring, this will work:
```
:param bool continue_last_task: Continue the execution of a previously executed Task (experiment)

    .. note::
        When continuing the execution of a previously executed Task,
        all previous artifacts / models / logs are intact.
        New logs will continue iteration/step based on the previous-execution maximum iteration value.
        For example:
        The last train/loss scalar reported was iteration 100, the next report will b...
```
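So, a minimal sketch based on that docstring (project/task names illustrative):
```python
from clearml import Task

# Resume the previously executed Task: artifacts/models/logs stay intact,
# and new scalars continue from the last reported iteration.
task = Task.init(
    project_name="examples",         # illustrative
    task_name="resumable-training",  # illustrative
    continue_last_task=True,
)
```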
UnevenDolphin73 you mean the clearml-server Helm chart?
If I understand the workings correctly, the default cleanup service should work with S3, given a correctly configured ClearML services agent.
Yes I think you are correct
I am referring to the UI.
In that case, no. This is actually a backend server change (the UI side should be relatively simple). Is this somehow a showstopper?
Hi GrittyCormorant73
At the end everything goes through session.send; you could add a print there.
BTW: why would you print all the requests? What are we debugging here?
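If you do want to try it anyway, here is a rough monkey-patch sketch; Session is an internal API (clearml.backend_api), so the exact signature may differ between versions:
```python
# Internal API -- may change between clearml versions.
from clearml.backend_api import Session

_orig_send = Session.send

def _printing_send(self, req, *args, **kwargs):
    print(f"ClearML API request: {req}")  # dump every outgoing request
    return _orig_send(self, req, *args, **kwargs)

Session.send = _printing_send
```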
Hi ContemplativeCockroach39
Seems like you are running the exact code as in the git repo:
Basically it points you to the exact repository https://github.com/allegroai/clearml and the script examples/reporting/pandas_reporting.py
Specifically:
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/reporting/pandas_reporting.py
That said, you might have accessed the artifacts before any of them were registered
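A minimal sketch of checking that before access (task/artifact names are illustrative):
```python
from clearml import Task

# Fetch the producing task and verify the artifact is registered before use.
producer = Task.get_task(project_name="examples", task_name="pandas reporting")
if "data frame" in producer.artifacts:           # artifact name is illustrative
    df = producer.artifacts["data frame"].get()  # downloads + deserializes
else:
    print("artifact not registered yet")
```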
My question was about the automatically uploaded models, i.e., those uploaded by the ClearML client.
So there is a way to add a callback; would that work?
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/binding/frameworks/__init__.py#L137
```python
def callback(_, model_info):
    model_info.name = "my new name"
    return model_info
```
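And a hedged sketch of registering it, assuming the WeightsFileHandler helper from the file linked above (an internal binding API that may change between versions):
```python
from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.name = "my new name"
    return model_info

# register so every auto-logged model passes through the callback
WeightsFileHandler.add_pre_callback(callback)
```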
ResponsiveCamel97
could you attach the full log?
Hi SharpDove45
what … suggested about how it fails on bad/missing credentials
Yes, this is correct; since you specifically set the hosts, worst case you will end up with wrong credentials.
Okay, this is very close to what the agent is building. Could you start a new conda env, then install cudatoolkit=11.1, and then run:
conda env update -p <conda_env_path_here> --file the_env_yaml.yml
(FYI: once we have a solid idea here, please open a GitHub issue for the feature request; I'll try to see if we can push it forward for the next RC.)
Hi CluelessElephant89
Hi guys, if I spot issues with the documentation, where should I post them?
The best way from our perspective is to PR the fix; this is why we put it on GitHub.
It runs into the above error when I clone the task or reset it.
from here:
AssertionError: ERROR: --resume checkpoint does not exist
I assume the "internal" code state changed, and now it is looking for a file that does not exist. How would your code state change; in other words, why would it be looking for the file only when cloning? Could it be you put the state on the Task, then cloned it (i.e., cloned the exact same dict), and now the newly cloned Task "thinks" it is resuming?!
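Purely as an illustration of that suspicion (names and the resume value are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="yolov5-train")  # illustrative

# If the resume path is connected to the Task, a clone inherits the exact
# same dict -- and then looks for a checkpoint that was never created there.
opt = {"resume": "runs/train/exp/weights/last.pt"}  # hypothetical value
task.connect(opt)
```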
Verified, and already fixed with 1.0.6rc2
My bad, I wrote "refresh" and then edited it to the correct "reload".
Just making sure: the pip package is installed in your conda env, correct?
however when I clone or reset said task after completion and then enqueue it again, I get the above error.
This part is somewhat confusing... There is no magic happening behind the scenes; cloning a Task and creating one are basically the same. Do you have a reference to the YOLOv5 code base itself? Maybe I can figure out what the issue is.