Hey @<1523701083040387072:profile|UnevenDolphin73> , sorry for the late reply. I’m now investigating the issue you mentioned, where running a remote task with create_function_task fails. I can’t quite reproduce it; can you please provide a complete runnable code snippet that fails the way you described?
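In the meantime, here’s roughly the flow we tried on our side (project/queue names are placeholders), in case it helps spot the difference:
```
from clearml import Task

def some_function(x, y):
    # the body that should run as its own task
    print(x + y)

task = Task.init(project_name="examples", task_name="main task")

# turn the function into a standalone task and enqueue it for an agent
func_task = task.create_function_task(
    some_function,
    func_name="some_function",
    task_name="remote function task",
    x=1,
    y=2,
)
Task.enqueue(func_task, queue_name="default")
```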
To copy the artifacts, please refer to the docs here: None
This is fine-tuning. Training a multi-billion parameter model from scratch would be economically unfeasible for most existing enterprises
Is this a Jupyter notebook or something? Can you download it properly, as either a .ipynb or a .py file?
Do you mean that you want your published experiments to be either “approved” or “not approved” based on the presence of the attachments you mentioned?
Hey @<1547390444877385728:profile|ThickSnake12> , how exactly do you access the artifact next time? Can you provide a code sample?
You can create a new dataset and specify the parent datasets as all the previous ones. Is that something that would work for you ?
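Something like this (dataset names and parent IDs are placeholders):
```
from clearml import Dataset

merged = Dataset.create(
    dataset_name="merged-dataset",
    dataset_project="datasets",
    parent_datasets=["<previous_dataset_id_1>", "<previous_dataset_id_2>"],
)
merged.add_files("new_data/")  # only the files that are new on top of the parents
merged.upload()
merged.finalize()
```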
Hey @<1564422650187485184:profile|ScaryDeer25> , we just released clearml==1.11.1rc2, which should solve the compatibility issues with lightning >= 2.0. Can you install it and check whether it solves your problem?
The issue may be related to some edge cases we currently have when working with lightning >= 2.0; we should have better support in the upcoming release.
Ah, I think I understand. To execute a pipeline remotely you need to use pipe.start() ( None ), not task.execute_remotely. Do note that you can run tasks remotely without exiting the current process/closing the notebook (see the exit_process argument here: None ), but you won't be able to return any values from that task....
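As a rough sketch (queue and task names are placeholders; the clone=True / exit_process=False combination is what keeps the local process alive):
```
from clearml import PipelineController, Task

# pipelines: launch the controller with start()
pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) ...
pipe.start(queue="services")

# plain tasks: run remotely without exiting the current process
task = Task.init(project_name="examples", task_name="my task")
task.execute_remotely(queue_name="default", clone=True, exit_process=False)
```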
What happens if you comment out or remove the pipe.set_default_execution_queue('default') and use run_locally instead of start_locally?
Because in the current setup you are basically asking to run the pipeline controller task locally while the rest of the steps need to run on an agent machine. If you make the changes I suggested above, you will be able to run everything on your local machine.
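Assuming you're on the PipelineDecorator interface (which is where run_locally lives), a minimal local-only setup would look like this (names are placeholders):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component()
def step_one():
    return 42

@PipelineDecorator.pipeline(name="my pipeline", project="examples", version="1.0")
def my_pipeline():
    print(step_one())

if __name__ == "__main__":
    # controller and steps all run in the local process, no queue needed
    PipelineDecorator.run_locally()
    my_pipeline()
```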
@<1637624992084529152:profile|GlamorousChimpanzee22> using localhost I'm assuming it's MinIO; is the s3 path you're trying to access something like this: None <some file or dir>?
That is not specific enough. Can you show the code? And ideally also the console log of the pipeline?
The line before the last in your code snippet above: pipe.start_locally.
Hey @<1639074542859063296:profile|StunningSwallow12> , what exactly do you mean by "training in production"? Maybe you can also elaborate on what kind of models.
ClearML in general assigns a unique Model ID to each model, but if you need some other way of versioning, we have support for custom tags, which you can apply programmatically on the model.
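For example, tagging the model a task outputs (tag values are placeholders):
```
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="train")

# attach custom version tags to the model this task will output
model = OutputModel(task=task, tags=["v1.2.0", "release-candidate"])
```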
Yes, you can do that, but it may make it harder to identify the task later on.
Also, make sure you use Task.init instead of task.init.
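i.e.:
```
from clearml import Task

# capital T: Task.init is a class method, there is no task.init
task = Task.init(project_name="examples", task_name="my experiment")
```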
For on-premise deployment with premium features we have the enterprise plan 😉
Hello @<1533257278776414208:profile|SuperiorCockroach75> , thanks for asking. It’s actually unsupervised, because modern LLMs are all trained to predict next/missing words, which is an unsupervised method
Hey @<1639799308809146368:profile|TritePigeon86> , given that you want to retry on connection error, wouldn't it be easier to use retry_on_failure from PipelineController / PipelineDecorator.pipeline ( None )?
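A sketch of what I mean (the retry policy here is a placeholder; inside the callback you can inspect the node/task to retry only on connection errors):
```
from clearml import PipelineController

def retry_if_connection_error(pipeline, node, retries):
    # placeholder policy: retry a failed step up to 3 times
    return retries < 3

pipe = PipelineController(
    name="my pipeline",
    project="examples",
    version="1.0",
    retry_on_failure=retry_if_connection_error,  # a plain int also works, e.g. 3
)
```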
Hey @<1554275802437128192:profile|CumbersomeBee33> , aborted usually means that someone manually stopped the pipeline or one of its experiments. Can you provide us with the code you used to run it?
That seems strange. Could you provide a short code snippet that reproduces your issue?
I can't quite reproduce your issue. From the traceback it seems it has something to do with torch.load. I tried both your code snippet and creating a PyTorch model and then loading it; neither led to this error. Could you provide a code snippet that is closer to the code causing the issue? Also, can you please tell us what ClearML version you are using, and what the Model URL is in the UI? You can use the same filters in the UI as the ones you used for Model.query_models to find th...
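For reference, a minimal Model.query_models call looks like this (the filters are placeholders; use the same ones as in the UI):
```
from clearml import Model

models = Model.query_models(
    project_name="examples",
    model_name="my-model",
    tags=["production"],
)
for m in models:
    print(m.id, m.name, m.url)
```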
Hey @<1523704757024198656:profile|MysteriousWalrus11> , given your use case, did you consider passing the path to the dataset instead, like the address of an S3 bucket?
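For example, connecting the address as a parameter so it's editable per run (the bucket path is a placeholder):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="train")

# pass the dataset location as a parameter instead of the data itself
params = task.connect({"dataset_path": "s3://my-bucket/datasets/train/"})
print(params["dataset_path"])
```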
To my knowledge, no. You'd have to create your own front-end and use the model served with clearml-serving via an API
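Roughly, your front-end (or any client) would call the serving endpoint over REST; the host, endpoint name, and payload below are hypothetical and depend on your deployment and preprocessing:
```
import requests

response = requests.post(
    "http://<serving-host>:8080/serve/my_model_endpoint",
    json={"x": [[1.0, 2.0, 3.0]]},
)
print(response.json())
```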
Hey @<1535069219354316800:profile|PerplexedRaccoon19> , yes it does. Take a look at this example, and let me know if there are any more questions: None
It happens due to an internal use of Dataset.get: the larger the dataset, the more verbose it will be. We’ll fix this in the upcoming releases.
Can you please attach the full traceback here?