Then the dynamic GPU allocation is exactly what you need. I suggest talking to the sales people, I'm sure they can help. https://clear.ml/contact-us/
BTW: make sure to install the agent into the system Python packages and not into any venv.
Hi LazyTurkey38
Is it possible to have the agents keep a local version and only download the diff of the job commit to speed things up?
This is what it does: it keeps a local cached copy and only pulls the latest changes.
Hi PanickyMoth78
` torch.save(net.state_dict(), PATH)  # auto-uploads to GCS
# get all the models from the Task
output_models = Task.current_task().models["output"]
# get the last one
last_model = output_models[-1]
# set meta-data
last_model.set_metadata(key="my key", value="my value", v_type="str") `
I do expect it to `pip` install though, which doesn't need root access I think
Correct, it is installed on a venv (exactly for that).
It will not fail if the apt-get fails (only warnings)
Let me know if it worked
Seems the apiserver is out of connections, this is odd...
SuccessfulKoala55 do you have an idea ?
Is there a way to force clearml not to upload these models?
DistressedGoat23 is it uploading models or registering them? To disable both, set auto_connect_frameworks: https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk#automatic-logging
Their names only contain the task name and some unique id, so how can I know to which exact training
You mean the models or the experiments being created ?
Hi OutrageousGiraffe8
Does anybody know why this is happening and is there any workaround, e.g. how to manually report a model?
What exactly is the error you are getting, and which clearml version are you using?
Regarding manual Model reporting:
https://clear.ml/docs/latest/docs/fundamentals/artifacts#manual-model-logging
trained model class...
You mean the pytorch model object?
Hi AdventurousButterfly15
I am running cross_validation, training a bunch of models in a loop like this:
Use the wildcard or disable it altogether:
task = Task.init(..., auto_connect_frameworks={"joblib": False})
You can also do
task = Task.init(..., auto_connect_frameworks={"joblib": ["realmodelonly.pkl", ]})
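The two options above can be sketched roughly as follows. The helper names `joblib_autolog_setting` and `preview_matches` and the file names are hypothetical, and the `fnmatch` preview only approximates how a wildcard would select file names; it is not ClearML's own matching code. The returned dict is meant to be passed as `Task.init(..., auto_connect_frameworks=...)`.

```python
from fnmatch import fnmatch


def joblib_autolog_setting(patterns=None):
    """Build a value for Task.init(auto_connect_frameworks=...).

    patterns=None  -> disable joblib auto-logging entirely.
    patterns=[...] -> only log files whose names match one of the patterns.
    """
    if patterns is None:
        return {"joblib": False}
    return {"joblib": list(patterns)}


def preview_matches(filenames, patterns):
    """Approximate which file names the wildcard patterns would select."""
    return [name for name in filenames if any(fnmatch(name, p) for p in patterns)]


# Hypothetical file names written during a cross-validation loop.
saved = ["fold_0.pkl", "fold_1.pkl", "realmodelonly.pkl"]

disable_all = joblib_autolog_setting()
only_final = joblib_autolog_setting(["realmodel*.pkl"])
print(disable_all, only_final, preview_matches(saved, only_final["joblib"]))
```

Previewing the pattern locally first is a cheap way to check that the per-fold models are excluded before running the full loop.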
BTW: I suspect this is the main issue:
https://github.com/python-poetry/poetry/issues/2179
DistressedGoat23 you are correct. Since in the end this becomes a plotly object, extra_layout is for general-purpose layout, but this specific entry sits next to the data. Bottom line, can you open a GitHub issue so we do not forget to fix it? In the meantime you can use the general plotly reporting as SweetBadger76 suggested.
Thanks OutrageousGiraffe8
Any chance you can expand the example code into a fully reproducible toy example? (I would really like to make sure we fix it)
Hi MiniatureRobin9
I'm not sure I understand the difference between a worker and an agent.
Hmm, we should probably make that clearer
agent = the clearml-agent instance running on the machine
worker = the system's term for a running instance of the agent
You can have one machine with multiple agents (i.e. multiple workers) running on it.
Does that make sense ?
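To make the "one machine, multiple agents" case concrete, here is a small sketch that only builds the command lines for two agents, each pinned to its own GPU. The queue name `default`, the GPU indices, and the helper name `agent_command` are assumptions for illustration; each command started this way would show up as a separate worker.

```python
import shlex


def agent_command(queue, gpu_index):
    """Command line for one clearml-agent daemon pinned to one GPU."""
    return [
        "clearml-agent", "daemon",
        "--queue", queue,
        "--gpus", str(gpu_index),
        "--detached",
    ]


# Two agents on the same machine -> two workers listed by the server.
commands = [agent_command("default", gpu) for gpu in (0, 1)]
for cmd in commands:
    print(shlex.join(cmd))
```

The sketch prints the commands rather than running them, so it is safe to try on a machine without an agent installed.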
LOL AlertBlackbird30 had a PR and pulled it
A major release is due next week; after that we will put a roadmap on the main GitHub page.
Anything specific you have in mind ?
Hey JoyousKoala59 , it seems the helm chart for the clearml server is due to be released tomorrow. My apologies for the confusion :(
I think this is the issue, it was searched and replaced. The thing is, I'm not sure the helm chart is updated to clearml. Let me check
Is task.parent something that could help?
Exactly, something like:
# my step is running here
the_pipeline_task = Task.get_task(task_id=task.parent)
So General would have created a General instead of Args?
yes,
This is a must, you have to specify the hyperparameters section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually:
task.connect(args, name='General')
Basically there is no "random_state" only "General/random_state"
Make sense ?
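As a rough illustration of the naming rule, the sketch below prefixes every parameter with its section, which is the form used when referencing or overriding a step's hyperparameters. The helper `qualify` and the sample values are hypothetical; only the `"General/random_state"` form comes from the discussion above.

```python
def qualify(section, params):
    """Prefix every parameter name with its hyperparameter section."""
    return {f"{section}/{name}": value for name, value in params.items()}


# Hypothetical values; "General" matches task.connect(args, name='General').
args = {"random_state": 42, "test_size": 0.2}
overrides = qualify("General", args)
print(overrides)
```

A dict shaped like `overrides` is what you would hand to a pipeline step's parameter override, so a bare `"random_state"` key would not match anything.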
Off the top of my head, the self-hosted version is missing the autoscalers (there is an AWS CLI, but no UI or other providers), and it is also missing the HPO UI feature,
but you should just check the detailed table here: None
while I'm looking to upload local weights
Oh, so this is not "importing an uploaded (existing) model" but manually creating a Model.
The easiest way to do that is actually to create a Task for Model uploading, because the model itself will be uploaded to a unique destination path, and this is built on top of the Task.
Does that make sense ?
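A minimal sketch of that approach, assuming the `clearml` package is installed and configured. The function name `upload_local_weights` and its arguments are hypothetical, and where the file ends up depends on the Task's output destination settings.

```python
def upload_local_weights(weights_path, project, task_name, model_name):
    """Create a Task and attach a local weights file as its output model."""
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import Task, OutputModel

    task = Task.init(project_name=project, task_name=task_name)
    model = OutputModel(task=task, name=model_name)
    # auto_delete_file=False keeps the local file after the upload.
    model.update_weights(weights_filename=weights_path, auto_delete_file=False)
    task.close()
    return model.id
```

The returned model id can then be used to look the model up later, which is the "built on top of the Task" part in practice.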
We might need to change the default base docker image, but I remember it was there... Let me check again
it's saved in a `lightning_logs` folder where I started the script instead.
It should be saved there + it should upload it to your file server
Can you send the Task log? (this is odd)
agentservice...
Not related, the agent-services' job is to run control jobs, such as pipelines and HPO control processes.
Hi TartLeopard58
Yes, this is the default. It is designed to serve multiple models and scale horizontally.
Hmm, that is odd, it should have detected it. Can you verify the issue still exists with the latest RC?
pip3 install clearml-agent==1.2.4rc3
Hi JumpyPig73
import data from old experiments into the dashboard.
what do you mean by "old experiments" ?
if the file is untracked by git, it is not saved by clearml
Yep
Does clearml-agent install the repo with `pip install -e .`
It is supported, but the path to the repo cannot be absolute (as it will probably be something else in the agent env)
You can add "git+https://github.com ...." to the "installed packages". The root path of your repository is always added to the PYTHONPATH when the agent executes it, so in theory there is no need to install it wi...
Yes, documentation is being worked on ... Anyhow, we will be uploading a new documentation site soon (hopefully in a week or so), putting it all on GitHub so it will be easier for the community to edit and add more