I see you want to use the services queue for both the pipeline controller and the pipeline steps, but you only have one worker/agent listening to this queue. In this case you need at least 2 agents listening to the services queue. Try spawning an additional agent that listens to this queue and let me know how it goes.
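For example, spawning a second agent on the same machine could look like this (assuming a standard setup; `--detached` just keeps it running in the background): `clearml-agent daemon --queue services --detached` . That way one agent can pick up the controller while the other runs the steps.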
Can you update the clearml version to latest (1.11.1) and see whether the issue is fixed?
Can you please attach the code for the pipeline?
`clearml-data` also supports glob patterns, so if your dataset files are in the same directory as the experiment code, you can do something like `clearml-data add --files *.csv` to add only the CSV files.
There's no .gitignore-like functionality because `clearml-data` is not meant to track everything; you need to be deliberate about what exactly you're adding. Hope this clarifies things.
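The same pattern is available from the SDK, by the way. A minimal sketch, assuming a local dataset directory (project/name/paths are illustrative):

```python
from clearml import Dataset

# create a new dataset version (placeholder project and name)
ds = Dataset.create(dataset_project="examples", dataset_name="csv-only")

# add_files accepts a wildcard, mirroring `clearml-data add --files *.csv`
ds.add_files(path=".", wildcard="*.csv")

ds.upload()
ds.finalize()
```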
What happens if you comment out or remove the `pipe.set_default_execution_queue('default')` line and use `run_locally` instead of `start_locally` ?
Because in the current setup, you are basically asking to run the pipeline controller task locally, while the rest of the steps need to run on an agent machine. If you make the changes I suggested above, you will be able to run everything on your local machine.
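For reference, a fully local run with the controller interface looks roughly like the sketch below (illustrative names, step definitions omitted; `run_pipeline_steps_locally=True` keeps the steps on your machine as well):

```python
from clearml import PipelineController

pipe = PipelineController(name="my_pipeline", project="examples")
# ... pipe.add_function_step(...) calls go here ...

# no pipe.set_default_execution_queue('default'), so steps are not
# routed to an agent queue; run controller + steps as local subprocesses
pipe.start_locally(run_pipeline_steps_locally=True)
```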
Ok, then launch an agent using `clearml-agent daemon --queue default` . That way your steps will be sent to the agent for execution. Note that in this case, you shouldn't change your code snippet in any way.
The line before the last in your code snippet above: `pipe.start_locally` .
Hey @<1678212417663799296:profile|JitteryOwl13> , just to make sure I understand: you want to put your imports inside the pipeline step function, and you're asking whether this will work correctly?
If so, the answer is yes, it will work fine if you move the imports inside the pipeline step function.
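Something like this (a sketch using the decorator interface; names are illustrative):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["df"])
def load_data(csv_path):
    # imports placed inside the step function are resolved when the
    # step runs as a standalone task, so this pattern is safe
    import pandas as pd
    return pd.read_csv(csv_path)
```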
Hello @<1604647689662763008:profile|PerfectSwan93> , I tend to agree with you; option one is the best given your use-case. If you keep the same name and project, it will result in a version bump on the combined dataset, but it will not point to the previous combined dataset as a parent.
Hey @<1661904968040321024:profile|SpotlessOwl43> that's a great question!
> how should the metric be saved, via report_single_value?
That's correct.
> what should I enter into the title and series fields in Project Dashboard?
The title should be "Summary" and the series is the name of the single value you reported.
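In case it helps, reporting such a value looks like this (a minimal sketch; project, task, and metric names are placeholders):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="metrics-demo")

# single values appear in the task's "Summary" table, so the
# dashboard series is the `name` you pass here
task.get_logger().report_single_value(name="accuracy", value=0.91)
```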
Yes, metrics can be saved in both steps and pipelines. As for project dashboards, I think as of now we don't support them in the UI for pipelines. But what you can do instead is run a special "reporting" task that queries all the pipeline runs from a specific project, and with it you can then manually plot all the important information yourself.
To get the pipeline runs, please see documentation here: [None](https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelineco...
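A rough sketch of such a reporting task (heavily simplified; the hidden `.pipelines` project path, the task filter, and the metric handling are assumptions you'd need to adapt):

```python
from clearml import Task

# pipeline runs live in a hidden sub-project,
# e.g. "<project>/.pipelines/<pipeline name>"
runs = Task.get_tasks(
    project_name="my_project/.pipelines/my_pipeline",
    task_filter={"type": ["controller"]},
)

for run in runs:
    # last reported scalars per run; aggregate/plot these as you see fit
    print(run.name, run.get_last_scalar_metrics())
```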
Hey @<1639799308809146368:profile|TritePigeon86> , given that you want to retry on connection error, wouldn't it be easier to use `retry_on_failure` from `PipelineController` / `PipelineDecorator.pipeline` ? None
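A sketch of the callback variant (assuming the callable signature from the docs linked above; the retry condition here is illustrative, you'd check for your connection error instead):

```python
from clearml import PipelineController

def retry_on_connection_error(pipeline, node, retries):
    # return True to retry the failed node, False to give up;
    # here we simply allow up to 3 retries per node
    return retries < 3

pipe = PipelineController(
    name="my_pipeline",
    project="examples",
    retry_on_failure=retry_on_connection_error,
)
```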
Hey @<1547390444877385728:profile|ThickSnake12> , how exactly do you access the artifact next time? Can you provide a code sample?
Also, make sure you use `Task.init` instead of `task.init` .
Glad I could be of help
That's not that much. You can use the AWS autoscaler and provision a spot g4dn GPU instance with a bit more disk. This should cost you less than 50 cents an hour
Yes, works with GCP too
Hey Yasir, to use tensorflow prefetch, your data needs to be (1) chunked and (2) stored on some server/bucket/network-attached FS. Unless both conditions are satisfied, TF prefetch won't help you.
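For illustration, the usual pattern looks like this (a sketch; the bucket path and file format are assumptions):

```python
import tensorflow as tf

# chunked files stored remotely, e.g. on a bucket
files = tf.data.Dataset.list_files("gs://my-bucket/data/*.tfrecord")

ds = tf.data.TFRecordDataset(files)
# overlap fetching of the next batch with work on the current one
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)
```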
How large is the dataset we're talking about?
It happens due to an internal use of `Dataset.get` ; the larger the dataset, the more verbose it will be. We'll fix this in the upcoming releases.
Hey @<1681836303299121152:profile|RoundElk14> , it seems you are using a self-hosted ClearML server. The error you're getting happens because your email is not configured on the server. Ask your admin to perform the following steps:
- [The admin] Go to Settings > Users & Groups > Users and click on "+ Add User" where they will be prompted to specify the user's email
- [The user] Once the admin confirms that they did step 1, the user should first Sign In with their email to the server
- [The...
Hey @<1523705721235968000:profile|GrittyStarfish67> , we have just released 1.12.1 with a fix for this issue
Wait, my config looks a bit different. What clearml package version are you using?
Hey @<1546303293918023680:profile|MiniatureRobin9> , to help narrow down the problem, could you try to manually download None and open it with `pickle` ?
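Something along these lines (a sketch; the local file name is a placeholder for whatever you downloaded):

```python
import pickle

# path to the file you downloaded manually
with open("downloaded_artifact.pkl", "rb") as f:
    obj = pickle.load(f)

print(type(obj))
```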
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...
What happens if you set the new project name to `f"{config.project_id}"` (notice: no `.pipelines` )?
Hey @<1523701066867150848:profile|JitteryCoyote63> , could you please open a GH issue on our repo too, so that we can track this issue more effectively? We are working on it now, btw.
Do you know whether the agent VM/image has Python 3.9 installed? Also, you emphasised that this happens when setting the package manager to poetry; does that mean the issue doesn't happen when leaving the package manager settings at their default values?
Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue: the path to the model in the artifact tab, is it an S3 path or a local path?
Is this a Jupyter notebook or something? Can you download it properly as either a .ipynb or .py file?
That is not specific enough. Can you show the code? And ideally also the console log of the pipeline.