Hey @<1644147961996775424:profile|HurtStarfish47>, you can use S3 for debug images specifically, see here: https://clear.ml/docs/latest/docs/references/sdk/logger/#set_default_upload_destination. The metrics (everything you report, like scalars, single values, histograms, and other plots) are stored in the backend. The fact that you are almost running out of storage could be because of either t...
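A minimal sketch of redirecting debug images to S3 (the bucket name and prefix below are placeholders, not from this thread):

```python
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="debug-images-to-s3")

# Upload debug images (and other media) to your own S3 bucket instead of
# the default files server; bucket/prefix here are assumptions.
Logger.current_logger().set_default_upload_destination("s3://my-bucket/debug-samples/")
```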
Hey @<1681836303299121152:profile|RoundElk14>, it seems you are using a self-hosted ClearML server. The error you're getting happens because your email is not configured on the server. Ask your admin to perform the following steps:
- [The admin] Go to Settings > Users & Groups > Users and click "+ Add User", where they will be prompted to specify the user's email
- [The user] Once the admin confirms that they completed step 1, the user should sign in to the server with that email
- [The...
Yes, you can do that, but it may make it harder to identify the task later on.
Hey @<1681836314334334976:profile|GrotesqueSeaturtle83>, yes, it is possible to do so, but you must configure the docker `--entrypoint` argument (as part of the `docker_arguments`) and the docker image for said task. In general this isn't a recommended approach. Rather, prefer a setup where your task code invokes the functionality defined in other scripts that are pre-baked into the image.
See docker args here: [None](https://clear.ml/docs/latest/docs/references/sdk/task/...
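If you do go that route, a hedged sketch of configuring the container per task (image name and entrypoint path are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="custom-entrypoint")

# Ask the agent to run this task inside a specific image with a custom
# entrypoint; both values below are assumptions for illustration.
task.set_base_docker(
    docker_image="my-registry/my-image:latest",
    docker_arguments="--entrypoint /opt/scripts/run.sh",
)
```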
Hey @<1661904968040321024:profile|SpotlessOwl43>, that's a great question!

> how the metric should be saved, via report_single_value?

That's correct.

> what should I enter into the title and series fields in Project Dashboard?

The title should be "Summary" and the series is the name of the single value you reported.
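A minimal sketch of the reporting side (the metric name and value are just examples):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="single-value-report")

# Single values reported this way appear in the UI under the "Summary" plot;
# the name passed here is what goes in the dashboard's series field.
task.get_logger().report_single_value(name="accuracy", value=0.93)
```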
And the quota is not cumulative, otherwise we'd run out of storage with the oldest accounts 😃
About the first question - yes, it will use the destination URI you set.
About the second point - did you archive or properly delete the experiments?
Ah, I think I understand. To execute a pipeline remotely you need to use `pipe.start()`, not `task.execute_remotely()`. Do note that you can run tasks remotely without exiting the current process/closing the notebook (see the `exit_process` argument), but you won't be able to return any values from this task....
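A short sketch of the difference (queue names and pipeline setup are placeholders):

```python
from clearml.automation import PipelineController

# Pipelines: launch the controller remotely with start(), not execute_remotely().
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
# ... pipe.add_step(...) calls would go here ...
pipe.start(queue="services")  # assumed queue name

# Plain (non-pipeline) tasks, in a separate script, can be sent to an agent
# without killing the current process/notebook kernel:
#   task = Task.init(project_name="examples", task_name="remote-task")
#   task.execute_remotely(queue_name="default", exit_process=False)
```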
Hey @<1639799308809146368:profile|TritePigeon86>, given that you want to retry on connection error, wouldn't it be easier to use `retry_on_failure` from `PipelineController` / `PipelineDecorator.pipeline`?
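A hedged sketch of what that could look like (the retry count and callback logic are assumptions):

```python
from clearml.automation import PipelineController

# retry_on_failure accepts either a plain retry count or a callback that
# decides per failure; the callback below is illustrative only.
def retry_three_times(pipeline, node, retries):
    # e.g. inspect the node/status here and only retry on connection errors
    return retries < 3

pipe = PipelineController(
    name="my-pipeline",
    project="examples",
    retry_on_failure=retry_three_times,
)
```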
Yes, metrics can be saved in both steps and pipelines. As for project dashboards, I think we don't currently support them in the UI for pipelines. What you can do instead is run a special "reporting" task that queries all the pipeline runs from a specific project, and with it you can manually plot all the important information yourself.
To get the pipeline runs, please see documentation here: [None](https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelineco...
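A rough sketch of such a reporting task (the project path and filter are assumptions based on the default ".pipelines" project layout):

```python
from clearml import Task

# Fetch all pipeline controller runs from a project; the project path below
# is a placeholder.
runs = Task.get_tasks(
    project_name="my-project/.pipelines/my-pipeline",
    task_filter={"type": ["controller"]},
)

for run in runs:
    scalars = run.get_reported_scalars()  # dict of all reported metrics
    print(run.name, list(scalars.keys()))
```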
Hey @<1523701949617147904:profile|PricklyRaven28>, about the S3 loading issue: the path to the model in the Artifacts tab, is it an S3 bucket or a local path?
Hey @<1545216070686609408:profile|EnthusiasticCow4>, for requirements pointing to packages in git repositories you need to make sure that the environment the agent runs in has valid credentials to access the repo. In your case (`git+ssh`) it means you need a pair of SSH keys, and the public key should be registered with the repo.
Hey @<1569858449813016576:profile|JumpyRaven4>, about your first point, what exactly is the question?
About your second point: you can try to manually save the final model and give it a proper file name, that way we will show it in the UI with the name you provided. Make sure to use `xgboost.save_model` and not raw pickle.
For your final question, given that your models have customized code, I can suggest trying to use `clearml.OutputModel`, which will register the file you provide ...
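Putting the two suggestions together, a hedged sketch (file names and the model name are placeholders):

```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="register-xgboost-model")

# ... train your booster, then save it with xgboost's own serializer,
# e.g. model.save_model("my_model.json"), rather than raw pickle ...

# Manually register the saved file so it shows up in the UI under your
# chosen name (assumes "my_model.json" exists from the step above).
output_model = OutputModel(task=task, name="my-xgboost-model", framework="xgboost")
output_model.update_weights(weights_filename="my_model.json")
```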
If your git credentials are stored in the agent's `clearml.conf`, it means these are an HTTPS username/password pair. But you specified that the package should be downloaded via git+ssh, for which I assume you don't have credentials in the agent's environment. So it can't authenticate with SSH, and pip doesn't know how to switch from git+ssh to git+https, because the downloading of the package is done by pip, not by clearml.
And there probably are auth errors if you scroll through the entire log ...
Then change from `git+ssh` to `git+https`.
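For example, the line in your requirements would change roughly like this (repository URL is a placeholder):

```
# before
git+ssh://git@github.com/acme/my-package.git
# after
git+https://github.com/acme/my-package.git
```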
Hello @<1604647689662763008:profile|PerfectSwan93>, I tend to agree with you, option one is the best given your use case. If you keep the same name and project, it will result in a version bump on the combined dataset, but it will not point to the previous combined dataset as a parent.
Thanks for pointing this out, we will need to update our documentation. Still, if you manually inspect the `~/clearml.conf` file, you will see the available configurations.
For on-premise deployment with premium features we have the enterprise plan 😉
The issue may be related to the fact that right now we have some edge cases when working with lightning >= 2.0, we should have better support in the upcoming release
That seems strange. Could you provide a short code snippet that reproduces your issue?
Hey @<1546303293918023680:profile|MiniatureRobin9>, to help narrow down the problem, could you try to manually download None and open it with `pickle`?
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...
Hey @<1529271085315395584:profile|AmusedCat74>, I may be wrong, but I think you can't attach a GPU to an e2 instance; it should be at least an n1, no?
Are you referring to the `clearml-serving` project?
Also, make sure you use `Task.init` instead of `task.init`.
Ah, I see now. There are a couple of ways to achieve this.
- You can enforce that the pipeline steps execute within a predefined docker image that has all these submodules - this is not very flexible, but doesn't require your clearml-agents to have access to your Git repository
- You can enforce that the pipeline steps execute within a predefined git repository, where you have all the code for these submodules - this is more flexible than option 1, but will require clearml-agents to have acce...
Do you know whether the agent VM/image has Python 3.9 installed? Also, you emphasised that this happens when setting the package manager to poetry; does that mean the issue doesn't happen when leaving the package manager settings at their default values?
Hey @<1523704757024198656:profile|MysteriousWalrus11>, given your use case, did you consider passing the path to the dataset, like an address to an S3 bucket?
Can you also tell us what OS you are using? And when you mentioned the clearml version: 1.5.1, did you mean the `clearml` package or the `clearml-agent` package? They are different packages.
This is fine-tuning. Training a multi-billion-parameter model from scratch would be economically unfeasible for most existing enterprises.
I think you can set the CUDA version in the `clearml.conf`; alternatively, you can have the agent use a docker image with your required version of CUDA instead of setting the environment directly on the machine.
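If you go the config route, a sketch of the relevant `clearml.conf` keys (the version numbers below are examples):

```
agent {
    # Pin the CUDA/cuDNN versions the agent assumes when resolving packages
    cuda_version: "11.8"
    cudnn_version: "8.6"
}
```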