Hey @<1661904968040321024:profile|SpotlessOwl43> that's a great question!
how the metric should be saved, via report_single_value?
That's correct
what should I enter into the title and series fields in Project Dashboard?
The title should be "Summary" and series is the name of the single value you reported
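For example, something along these lines (a minimal sketch; the project/task names and the value are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="single value demo")  # placeholder names

# Reported single values appear in the UI under the "Summary" title,
# with the series being the name passed here ("test_accuracy")
task.get_logger().report_single_value(name="test_accuracy", value=0.93)
```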
Hey @<1644147961996775424:profile|HurtStarfish47>, you can use S3 for debug images specifically, see here: https://clear.ml/docs/latest/docs/references/sdk/logger/#set_default_upload_destination but the metrics (everything you report, like scalars, single values, histograms, and other plots) are stored in the backend. The fact that you are almost running out of storage could be because of either t...
And the quota is not cumulative, otherwise we'd run out of storage with the oldest accounts 😃
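As a rough sketch of setting the debug-image destination (bucket path and names are placeholders, not your actual setup):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="debug images to S3")  # placeholder names
logger = task.get_logger()

# Debug samples (images, audio, etc.) reported after this call are uploaded
# to your own bucket instead of the ClearML file server
logger.set_default_upload_destination("s3://my-bucket/clearml-debug-samples")
```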
Hey @<1678212417663799296:profile|JitteryOwl13> , just to make sure I understand, you want to make your imports inside the pipeline step function, and you're asking whether this will work correctly?
If so, then the answer is yes, it will work fine if you move the imports inside the pipeline step function
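Roughly like this (a sketch using the decorator syntax; names and the CSV path are placeholders):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["dataframe"], cache=False)
def load_data(csv_path):
    # Imports live inside the step, so they are resolved when the step
    # runs as a standalone task on an agent
    import pandas as pd
    return pd.read_csv(csv_path)

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.1")
def run_pipeline(csv_path="data.csv"):
    df = load_data(csv_path)  # placeholder step wiring

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run everything locally for quick testing
    run_pipeline()
```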
Which gives me an idea. Could you please remove the entrypoint from the docker image altogether and try again?
Overriding the entrypoint in the image can lead to docker run/docker exec failing to work properly, because instead of a shell it will use your entrypoint to run everything.
Hey @<1654294828365647872:profile|GorgeousShrimp11> can you abort all pending experiments that are waiting to be fetched from this queue and try again? Off the top of my head, it could be that the clearml-agent can't pull the custom docker image. In general, you should treat docker images not as step definitions but only as the environment, hence setting the entrypoint is not necessary.
For on-premise deployment with premium features we have the enterprise plan 😉
You can create a new dataset and specify the parent datasets as all the previous ones. Is that something that would work for you?
Hello @<1604647689662763008:profile|PerfectSwan93>, I tend to agree with you, option one is the best given your use case. If you keep the same name and project, it will result in a version bump on the combined dataset, but it will not point to the previous combined dataset as a parent.
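Something like this, assuming you have the IDs of the previous dataset versions (all names/IDs are placeholders):
```python
from clearml import Dataset

combined = Dataset.create(
    dataset_name="my_dataset",            # same name + project -> version bump
    dataset_project="examples",
    parent_datasets=["<dataset_id_a>", "<dataset_id_b>"],  # all previous datasets
)
combined.add_files("new_data/")           # optionally add new files on top of the parents
combined.upload()
combined.finalize()
```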
Yes, metrics can be saved in both steps and pipelines. As for project dashboards, I think as of now we don't support them in UI for pipelines. But what you can do instead is to run a special "reporting" Task that will query all the pipeline runs from a specific project, and with it you can then manually plot all the important information yourself.
To get the pipeline runs, please see the documentation here: https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelineco...
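A hedged sketch of what such a reporting Task could do (the project path and the filter are assumptions, adjust to wherever your pipeline runs actually live):
```python
from clearml import Task

# Pipeline runs are regular tasks, so Task.get_tasks can list them.
# The ".pipelines" sub-project path below is an assumption about your setup.
runs = Task.get_tasks(
    project_name="examples/.pipelines/my pipeline",
    task_filter={"order_by": ["-last_update"]},
)
for run in runs:
    print(run.id, run.name, run.get_last_scalar_metrics())
```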
Hey @<1639074542859063296:profile|StunningSwallow12> what exactly do you mean by "training in production"? Maybe you can also elaborate on what kind of models you mean.
ClearML in general assigns a unique Model ID to each model, but if you need some other way of versioning, we have support for custom tags, and you can apply those programmatically on the model
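For example (a sketch; the tag values and weights file name are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="tagged model")  # placeholder names

# Attach custom version tags to the model when it is created
output_model = OutputModel(task=task, tags=["v1.2", "candidate"])
output_model.update_weights(weights_filename="model.pt")  # placeholder weights file
```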
Can you paste here the code of the pipeline that you're trying to run?
What happens if you set the new project name to `f"{config.project_id}"` (notice, no `.pipelines`)?
I can't quite reproduce your issue. From the traceback it seems it has something to do with `torch.load`. I tried both your code snippet and creating a PyTorch model and then loading it, and neither led to this error.
Could you provide a code snippet that is closer to the code that is causing the issue? Also, can you please tell me what clearml version you are using, and what the Model URL in the UI is? You can use the same filters in the UI as the ones you used for `Model.query_models` to find th...
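Just in case it helps, this is roughly how the programmatic query looks (project/model names are placeholders):
```python
from clearml import Model

# Same filters as in the UI; prints the stored weights URL for each match
models = Model.query_models(project_name="examples", model_name="my_model")
for m in models:
    print(m.id, m.name, m.url)
```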
If your git credentials are stored in the agent's `clearml.conf`, it means they are an HTTPS username/password pair. But you specified that the package should be downloaded via git ssh, for which I assume you don't have credentials in the agent's environment. So it can't authenticate with SSH, and pip doesn't know how to switch from git+ssh to git+https, because the downloading of the package is done by pip, not by clearml.
And there probably are auth errors if you scroll through the entire log ...
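For reference, the relevant part of the agent's clearml.conf looks roughly like this (values are placeholders):
```
agent {
    # HTTPS credentials the agent uses for the repositories it clones itself
    # (note: pip-installed git+ssh packages are fetched by pip, not by the agent)
    git_user: "my-git-user"
    git_pass: "my-personal-access-token"
    # if true, the agent converts https repo links to ssh; leave false to use the credentials above
    force_git_ssh_protocol: false
}
```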
Hey @<1523705721235968000:profile|GrittyStarfish67> , we have just released 1.12.1 with a fix for this issue
Could you please run the misbehaving example, try to add a breakpoint in `clearml/backend_interface/task/task.py` in `Task.update_output_model`, on the line with `url = output_model.update_weights(`, and tell me what the value of `model_path` is? In case you're using virtual environments, the clearml library should be installed somewhere in `<virtual env directory>/lib/python3.10/site-packages/clearml/`.
It happens due to an internal use of `Dataset.get`; the larger the dataset, the more verbose it will be. We’ll fix this in the upcoming releases.
Do you mean that you want your published experiments to be either "approved" or "not approved" based on the presence of the attachments you mentioned?
Hey @<1574207113163444224:profile|ShallowCoyote86>, what exactly do you mean by "depends on `private_repo_b`"? Another question: after you push the changes, do you re-run `script_a.py`?
Hey @<1564422650187485184:profile|ScaryDeer25>, we just released `clearml==1.11.1rc2`, which should solve the compatibility issues for lightning >= 2.0. Can you install it and check whether it solves your problem?
The issue may be related to the fact that right now we have some edge cases when working with lightning >= 2.0, we should have better support in the upcoming release
Can you please attach the full traceback here?
You can try to add the `force_download=True` flag to `.get()` to ignore the locally cached content. Let me know if it helps.
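If the `.get()` in question is a task artifact, a minimal sketch would be (task ID and artifact name are placeholders):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")
# force_download=True bypasses the locally cached copy of the artifact
obj = task.artifacts["my_artifact"].get(force_download=True)
```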
Glad I could be of help
Hey Sana, yes you can. When you open the link, check the Task's menu bar on the upper-right side, and you will notice that you can clone the shared task.
Hey @<1535069219354316800:profile|PerplexedRaccoon19> , yes it does. Take a look at this example, and let me know if there are any more questions: None
Hey @<1546303293918023680:profile|MiniatureRobin9>, to help narrow down the problem, could you try to manually download None and open it with pickle?
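i.e. after downloading the file manually, something like (the file name is a placeholder):
```python
import pickle

with open("downloaded_file.pkl", "rb") as f:  # path of the manually downloaded file
    obj = pickle.load(f)
print(type(obj))
```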
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...
Can you try checking if you have access to the model in the shared experiment?