This sounds like a use case for the enterprise version of ClearML, where you can set read/write permissions. Publishing is considered a "write", so you can limit who can do it. Another thing that might be useful in your scenario is "Reports": connect the "approved" experiments' info to a report and then publish it. Here's a short video introducing reports.
By the way, please note that if the experiment/report/whatever is publis...
Hello @<1604647689662763008:profile|PerfectSwan93>, I tend to agree with you: option one is the best given your use case. If you keep the same name and project, it will result in a version bump on the combined dataset, but it will not point to the previous combined dataset as a parent.
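If you do want the new version to point at the previous dataset, you can pass it explicitly as a parent. A hedged sketch (names and IDs are hypothetical):

```python
from clearml import Dataset

# Same name + project -> version bump on the combined dataset;
# parent_datasets makes the new version point at a previous dataset explicitly.
combined = Dataset.create(
    dataset_name="combined-dataset",
    dataset_project="my-project",
    parent_datasets=["<previous_combined_dataset_id>"],  # optional explicit parent
)
combined.add_files("data/")
combined.upload()
combined.finalize()
```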
For on-premise deployment with premium features we have the enterprise plan 😉
I see you want to use the `services` queue for both the pipeline controller and the pipeline steps, but you have only one worker/agent listening to this queue. In this case you need at least 2 agents listening to the services queue. Try spawning an additional agent that listens to this queue and let me know how it goes.
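For reference, a hedged sketch of spawning that extra agent (exact flags depend on your setup):

```bash
# Start an additional agent listening on the services queue, in the background.
# Alternatively, a single agent launched with --services-mode can run multiple
# service tasks concurrently.
clearml-agent daemon --queue services --detached
```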
Could you please run the misbehaving example, try to add a breakpoint in `clearml/backend_interface/task/task.py` in `Task.update_output_model`, on the line with `url = output_model.update_weights(`, and tell me what the value of `model_path` is? In case you're using virtual environments, the clearml library should be installed somewhere in `<virtual env directory>/lib/python3.10/site-packages/clearml/`
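In case it helps, a rough sketch of where that breakpoint would go (abbreviated, not the actual library source):

```python
# Inside <virtual env directory>/lib/python3.10/site-packages/clearml/
# backend_interface/task/task.py, in Task.update_output_model:
import pdb; pdb.set_trace()  # break here and inspect model_path
url = output_model.update_weights(
    ...  # arguments omitted; this is the line mentioned above
)
```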
I'm afraid serializing an entire class won't be possible, but `create_function_task` will send the entire environment for remote execution, so you can still access your code.
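A minimal sketch of using it (project and queue names are hypothetical):

```python
from clearml import Task

def process(a, b):
    # a regular module-level function; the surrounding environment
    # (repo, uncommitted changes, packages) is captured for remote execution
    return a + b

task = Task.init(project_name="examples", task_name="launcher")
func_task = task.create_function_task(
    func=process, func_name="process", task_name="process remotely", a=1, b=2
)
Task.enqueue(func_task, queue_name="default")
```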
It won't, for that you need full support from Ultralytics
Hello @<1523710243865890816:profile|QuaintPelican38>, could you try `Dataset.get`-ing an existing dataset and tell me whether there are any errors?
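Something along these lines (dataset name and project are hypothetical):

```python
from clearml import Dataset

# Fetch an existing dataset and pull a local copy, watching for errors
ds = Dataset.get(dataset_project="my-project", dataset_name="my-dataset")
print(ds.id)
local_path = ds.get_local_copy()
print(local_path)
```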
Yes, works with GCP too
That's not that much. You can use the AWS autoscaler and provision a spot g4dn GPU instance with a bit more disk. This should cost you less than 50 cents an hour
Can you update clearml to the latest version (1.11.1) and see whether the issue is fixed?
Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue. The path to the model in the artifact tab, is it an S3 bucket or a local path?
Yes, that is correct. Btw, now it looks more like my `clearml.conf`
Hey @<1523701083040387072:profile|UnevenDolphin73> what you're building here sounds like a useful tool. Let me make sure I understand what you're trying to achieve; please correct me if I'm wrong:
- You want to create a set of `Step` classes with which you can define pipelines that will be executed either locally or remotely.
- The pipeline execution is triggered from a notebook.
- The `steps` are predefined transformations; the user normally won't have to create their own steps.
Did I get all...
Ah, I see now. There are a couple of ways to achieve this.
- You can enforce that the pipeline steps execute within a predefined docker image that has all these submodules - this is not very flexible, but doesn't require your clearml-agents to have access to your Git repository (both options are sketched after this list)
- You can enforce that the pipeline steps execute within a predefined git repository, where you have all the code for these submodules - this is more flexible than option 1, but will require clearml-agents to have acce...
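A hedged sketch of both options using a pipeline component decorator (the image, repo URL, and function are hypothetical):

```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    docker="my-registry/my-image:latest",       # option 1: predefined docker image
    repo="https://github.com/me/my-repo.git",   # option 2: predefined git repository
    repo_branch="main",
)
def preprocess(data_path: str) -> str:
    # the step can now import the submodules baked into the image / present in the repo
    from my_submodule import transform  # hypothetical import
    return transform(data_path)
```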
Hey @<1523701066867150848:profile|JitteryCoyote63> , could you please open a GH issue on our repo too, so that we can more effectively track this issue. We are working on it now btw
Hey @<1547390438648844288:profile|ScaryJellyfish75>, can you provide the whole code for the pipeline, and also mention which clearml version you're using?
Can you paste here the code of the pipeline that you're trying to run?
You can try adding the `force_download=True` flag to `.get()` to ignore the locally cached content. Let me know if it helps.
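For example, assuming this refers to an artifact's `.get()` (the task ID and artifact name are hypothetical):

```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")
# Force a fresh download instead of reusing the locally cached copy
obj = task.artifacts["my_artifact"].get(force_download=True)
```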
Hey @<1535069219354316800:profile|PerplexedRaccoon19> , yes it does. Take a look at this example, and let me know if there are any more questions: None
Do you know whether the agent VM/image has Python 3.9 installed? Also, you emphasised that this happens when setting the package manager to poetry; does it mean this issue doesn't happen when leaving the package manager settings at their default values?
Hey @<1644147961996775424:profile|HurtStarfish47>, you can use S3 for debug images specifically, see here: https://clear.ml/docs/latest/docs/references/sdk/logger/#set_default_upload_destination but the metrics (everything you report, like scalars, single values, histograms, and other plots) are stored in the backend. The fact that you are almost running out of storage could be because of either t...
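A minimal sketch of that setting (the bucket name is hypothetical):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="debug-images")
# Upload debug images (and other media) to S3 instead of the default file server
task.get_logger().set_default_upload_destination("s3://my-bucket/debug-images/")
```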
This is doing fine-tuning. Training a multi-billion-parameter model from scratch would be economically unfeasible for most existing enterprises.
I can't quite reproduce your issue. From the traceback it seems it has something to do with `torch.load`. I tried both your code snippet and creating a PyTorch model and then loading it; neither led to this error.
Could you provide a code snippet that is closer to the code that is causing the issue? Also, can you please tell me what clearml version you're using, and what the Model URL in the UI is? You can use the same filters in the UI as the ones you used for `Model.query_models` to find th...
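For reference, a hedged sketch of such a query (the filter values are hypothetical):

```python
from clearml import Model

# Query with the same filters as in the UI, then inspect each model's URL
models = Model.query_models(project_name="my-project", model_name="my-model")
for m in models:
    print(m.id, m.name, m.url)
```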
Hey @<1523701083040387072:profile|UnevenDolphin73>, sorry for the late reply. I'm now investigating the issue you mentioned, where running a remote task with `create_function_task` fails. I can't quite reproduce it; can you please provide a complete, runnable code snippet that fails the way you just described?
Wait, my config looks a bit different, what clearml package version are you using?
To copy the artifacts please refer to docs here: None
This is the method you're looking for: None. But make sure you have a model saved on disk before using it. And if you don't want the model to be deleted from disk after the upload, make sure to set `auto_delete_file=False`
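A minimal sketch, assuming the method in question is `Task.update_output_model` (the link above is redacted):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="upload-model")
# ... training code that saves the weights to disk, e.g. model.pt ...

# Upload the weights as the task's output model; keep the local file afterwards
task.update_output_model(model_path="model.pt", auto_delete_file=False)
```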
Hey @<1546303293918023680:profile|MiniatureRobin9>, to help narrow down the problem, could you try to manually download None and open it with `pickle`?
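Something like this (the file name is hypothetical, since the download link above is redacted):

```python
import pickle

# Open the manually downloaded artifact and check that it deserializes cleanly
with open("downloaded_artifact.pkl", "rb") as f:
    obj = pickle.load(f)
print(type(obj))
```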
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...