Hey @<1523701083040387072:profile|UnevenDolphin73> what you're building sounds like a useful tool. Let me make sure I understand what you're trying to achieve; please correct me if I'm wrong:
- You want to create a set of `Step` classes with which you can define pipelines that will be executed either locally or remotely.
- The pipeline execution is triggered from a notebook.
- The `steps` are predefined transformations; the user normally won't have to create their own steps.
Did I get all...
Hey @<1523701083040387072:profile|UnevenDolphin73>, sorry for the late reply. I'm now investigating the issue you mentioned, where running a remote task with `create_function_task` fails. I can't quite reproduce it; can you please provide a complete runnable code snippet that fails the way you described?
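For reference, a minimal snippet of the kind that would help (all names here are illustrative):

```python
from clearml import Task

def add(a, b):
    # this function will run inside the newly created function task
    return a + b

task = Task.init(project_name="examples", task_name="parent task")

# create_function_task() wraps `add` in a new draft task;
# the kwargs are passed to the function as its arguments
remote_task = task.create_function_task(
    add, func_name="add", task_name="add step", a=1, b=2
)
```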
Can you paste here the code of the pipeline that you're trying to run?
Hey @<1582542029752111104:profile|GorgeousWoodpecker69> can you please tell me whether you're running this Jupyter notebook as part of a repo or as a standalone file, and what command you ran to launch your clearml-agent?
Do you mean that you want your published experiments to be either "approved" or "not approved" based on the presence of the attachments you mentioned?
I can't quite reproduce your issue. From the traceback it seems to have something to do with `torch.load`. I tried both your code snippet and creating a PyTorch model and then loading it; neither led to this error.
Could you provide a code snippet that is closer to the code causing the issue? Also, can you please tell me which clearml version you are using, and what the Model URL is in the UI? You can use the same filters in the UI as the ones you used for `Model.query_models` to find th...
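For example, something along these lines (the filter values are hypothetical placeholders):

```python
from clearml import Model

# Use the same filters you passed to Model.query_models before
models = Model.query_models(project_name="my_project", model_name="my_model")
for m in models:
    # m.url is the Model URL also shown in the UI's model details
    print(m.id, m.url)
```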
You can try adding the `force_download=True` flag to `.get()` to ignore the locally cached content. Let me know if it helps.
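Assuming you're fetching a task artifact, it would look roughly like this (task ID and artifact name are placeholders):

```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")

# force_download=True bypasses the local cache and fetches a fresh copy
local_path = task.artifacts["my_artifact"].get(force_download=True)
```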
Hey @<1661904968040321024:profile|SpotlessOwl43> that's a great question!
> how the metric should be saved, via `report_single_value`?

That's correct.
> what should I enter into the title and series fields in Project Dashboard?

The title should be "Summary" and the series is the name of the single value you reported.
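A minimal sketch, with "accuracy" as a stand-in for your metric name:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="single value demo")

# The name passed here is what goes into the dashboard's "series" field;
# the "title" field should be "Summary"
task.get_logger().report_single_value(name="accuracy", value=0.93)
```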
Hey @<1564422650187485184:profile|ScaryDeer25>, we just released `clearml==1.11.1rc2`, which should solve the compatibility issues for lightning >= 2.0. Can you install it and check whether it solves your problem?
Hey @<1577468626967990272:profile|PerplexedDolphin99>, yes, this method call will help you limit the number of files you have in your cache, but not the total size of your cache. To be able to control the size, I'd recommend checking the `~/clearml.conf` file, specifically the `sdk.storage.cache` section.
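A sketch of that section (key names are based on recent clearml versions, so please double-check them against your own config reference):

```
sdk {
    storage {
        cache {
            # where cached files are stored
            default_base_dir: "~/.clearml/cache"
            size {
                # hard limit on the total cache size
                max_used_bytes: "10GB"
                # keep caching only while at least this much disk space is free
                min_free_bytes: "1GB"
            }
        }
    }
}
```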
Could you please run the misbehaving example, add a breakpoint in `clearml/backend_interface/task/task.py` in `Task.update_output_model` on the line with `url = output_model.update_weights(`, and tell me what the value of `model_path` is? In case you're using virtual environments, the clearml library should be installed somewhere in `<virtual env directory>/lib/python3.10/site-packages/clearml/`.
To my knowledge, no. You'd have to create your own front-end and use the model served with clearml-serving via an API
Are you referring to the `clearml-serving` project?
I think you can set the CUDA version in `clearml.conf`. Alternatively, you can have the agent use a docker image with your required version of CUDA instead of setting the environment directly on the machine.
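For the docker route, a minimal sketch (the image name is just an example):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="cuda pinned")

# When an agent running in docker mode picks this task up, it will use
# this image (with the CUDA version you need) as the execution environment
task.set_base_docker(docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04")
```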
That is not specific enough. Can you show the code? Ideally, also include the console log of the pipeline.
Hey Pawel, thanks for opening the PR on Ultralytics' side. The full support should come from them, so if it's missing for YOLOv8 it means they didn't enable it. Still, you can try `clearml-task` for auto-logging support in case of remote execution.
Also, I'd say you could easily use a ClearML dataset ID as input to YOLOv8 with a few lines of code, basically by downloading/`get`ting the dataset by ID yourself and passing its path as input to the ultralytics...
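Something along these lines (the dataset ID is a placeholder, and I'm assuming the dataset contains a YOLO-style data.yaml):

```python
from clearml import Dataset
from ultralytics import YOLO

# Download the dataset by ID and get its local path
dataset_path = Dataset.get(dataset_id="<your_dataset_id>").get_local_copy()

# Point ultralytics at the downloaded data
model = YOLO("yolov8n.pt")
model.train(data=f"{dataset_path}/data.yaml", epochs=10)
```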
Hey @<1535069219354316800:profile|PerplexedRaccoon19> , yes it does. Take a look at this example, and let me know if there are any more questions: None
This sounds like you don't have clearml installed in the Ubuntu container. Either that, or the `clearml.conf` in the container is not pointing to the server, as a result of which all information is missing.
I'd rather suggest you change the approach: run a `clearml-agent` set up with docker, and when you want to run YOLOv5 training, execute it remotely on the queue that the agent is listening to.
Hey @<1654294828365647872:profile|GorgeousShrimp11> can you abort all pending experiments waiting to be fetched from this queue and try again? Off the top of my head, it could be that the clearml-agent can't pull the custom docker image. In general, you should treat docker images not as step definitions but only as the environment, so setting the entrypoint is not necessary.
This is fine-tuning. Training a multi-billion-parameter model from scratch would be economically unfeasible for most existing enterprises.
Hey @<1639799308809146368:profile|TritePigeon86>, given that you want to retry on connection errors, wouldn't it be easier to use `retry_on_failure` from `PipelineController` / `PipelineDecorator.pipeline`? None
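A sketch of what I mean, using the decorator syntax (the retry policy shown is a hypothetical example):

```python
from clearml import PipelineDecorator

# retry_on_failure accepts an int (max retries) or a callback
# (pipeline, node, retries) -> bool; returning True retries the failed step
def retry_on_connection_error(pipeline, node, retries):
    # hypothetical policy: retry any failure up to 3 times
    return retries < 3

@PipelineDecorator.component(retry_on_failure=retry_on_connection_error)
def fetch(url: str) -> str:
    import requests  # imported inside so the step is self-contained when run remotely
    return requests.get(url).text
```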
Hey @<1671689458606411776:profile|StormySeaturtle98> we do support something called "Model Design" previews, basically an architecture description of the model, a la Caffe protobufs. None For example, we store this info automatically with Keras.
Hey @<1639074542859063296:profile|StunningSwallow12> what exactly do you mean by "training in production"? Maybe you can also elaborate on what kind of models.
ClearML in general assigns a unique Model ID to each model, but if you need some other way of versioning, we have support for custom tags, which you can apply programmatically on the model.
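For example (the model ID is a placeholder, and I'm assuming the tags setter is available in your clearml version):

```python
from clearml import Model

model = Model(model_id="<your_model_id>")

# Apply your own versioning scheme as a tag
model.tags = (model.tags or []) + ["version-1.2.0"]
```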
About the first question - yes, it will use the destination URI you set.
About the second point - did you archive or properly delete the experiments?
Hello @<1533257278776414208:profile|SuperiorCockroach75> , thanks for asking. It’s actually unsupervised, because modern LLMs are all trained to predict next/missing words, which is an unsupervised method
The issue may be related to the fact that right now we have some edge cases when working with lightning >= 2.0, we should have better support in the upcoming release
Hey @<1681836303299121152:profile|RoundElk14> , it seems you are using a self-hosted ClearML server. This error you're getting happens because your email is not configured in the server. Ask your admin to perform the following steps:
- [The admin] Go to Settings > Users & Groups > Users and click on "+ Add User" where they will be prompted to specify the user's email
- [The user] Once the admin confirms that they did step 1, the user should first Sign In with their email to the server
- [The...
What happens if you comment out or remove the `pipe.set_default_execution_queue('default')` and use `run_locally` instead of `start_locally`?
Because in the current setup, you are basically asking to run the pipeline controller task locally, while the rest of the steps need to run on an agent machine. If you make the changes I suggested above, you will be able to run everything on your local machine.
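To illustrate with a decorator-based pipeline (the steps here are toy placeholders):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component()
def add_one(x: int) -> int:
    return x + 1

@PipelineDecorator.pipeline(name="demo", project="examples", version="1.0")
def pipeline_logic(start: int):
    return add_one(start)

if __name__ == "__main__":
    # run_locally() executes the controller and every step in the local
    # process, so no execution queue or agent is required
    PipelineDecorator.run_locally()
    pipeline_logic(start=1)
```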