SuccessfulKoala55, thanks for letting us know. I actually had that issue starting this morning.
Thanks a lot, Natan! Really useful!
I know I dropped a very general question, so sorry for the confusion.
However, I was reading online about how other companies handle data pipelines for both training and inference in a standardized, monitorable way.
I saw people refer to something called a feature store; the first was probably Uber (source: https://eng.uber.com/michelangelo-machine-learning-platform/) ... I know Tecton is one provider of it (source: https://www.tecton.ai/), but there might ...
Thanks CostlyOstrich36, maybe my question was a bit too broad. Imagine I have a set of tasks that share the same name, for instance `task_name = "Preprocess data"`.
This task runs in every ML project I have. As an MLOps engineer, I would like some insights, for example how long "Preprocess data" takes to run in each project.
I can use the API to fetch them, loop over them, and get when each task finished (with `Task.get_last_update`), but how do I get when it started? ...
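A minimal sketch of the per-project aggregation, assuming the start and completion timestamps of each same-named task have already been fetched (where ClearML stores them is an assumption here, for instance the task's backend record). Records are plain `(project, started, completed)` tuples so the logic runs without a ClearML server; `mean_duration_per_project` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

# (project, started, completed); the timestamps are assumed to come from each task's
# backend record, but here they are plain datetimes so the logic can run offline.
Record = Tuple[str, Optional[datetime], Optional[datetime]]


def mean_duration_per_project(records: Iterable[Record]) -> Dict[str, timedelta]:
    """Average run time per project, skipping tasks without both timestamps."""
    durations = defaultdict(list)
    for project, started, completed in records:
        if started is None or completed is None:
            continue  # never started, still running, or aborted before completing
        durations[project].append((completed - started).total_seconds())
    return {project: timedelta(seconds=mean(secs)) for project, secs in durations.items()}


if __name__ == "__main__":
    t0 = datetime(2021, 6, 1, 9, 0, 0)
    records = [
        ("project_a", t0, t0 + timedelta(minutes=10)),
        ("project_a", t0, t0 + timedelta(minutes=20)),
        ("project_b", t0, t0 + timedelta(minutes=5)),
        ("project_b", t0, None),  # not finished yet, ignored
    ]
    for project, avg in mean_duration_per_project(records).items():
        print(project, avg)  # project_a 0:15:00, then project_b 0:05:00
```

Skipping unfinished tasks keeps running or failed runs from dragging the average toward zero; reporting them separately may be more useful in practice.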
Is this something available in the most recent version of clearml? I am using `clearml==1.0.5` and it does not seem to work ... so I guess this could be one reason to start thinking about upgrading ...
Thanks a lot, Natan 🙂
Is there a file or piece of documentation where I can see the fields and arguments I can use with `task_filter`?
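As far as I know, `task_filter` in `Task.get_tasks` is passed through to the server's `tasks.get_all` call, so its keys follow that endpoint's request fields (for example `status`, `order_by`, `type`, `tags`), which are listed in the ClearML REST API reference. This sketch only assembles such a dict; `build_task_filter` and `fetch_tasks` are hypothetical helpers, and the clearml import sits inside the function so the filter-building part runs without a server.

```python
from typing import Any, Dict, List, Optional


def build_task_filter(
    statuses: Optional[List[str]] = None,
    order_by: Optional[List[str]] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a task_filter dict whose keys mirror tasks.get_all request fields."""
    task_filter: Dict[str, Any] = {}
    if statuses:
        task_filter["status"] = statuses  # e.g. ["completed"] or ["failed"]
    if order_by:
        task_filter["order_by"] = order_by  # a "-" prefix sorts in descending order
    if extra:
        task_filter.update(extra)  # any other field accepted by tasks.get_all
    return task_filter


def fetch_tasks(project_name: str, task_name: str, task_filter: Dict[str, Any]):
    """Query the ClearML server (requires a configured clearml installation)."""
    from clearml import Task  # imported lazily so build_task_filter works offline

    return Task.get_tasks(project_name=project_name, task_name=task_name, task_filter=task_filter)


if __name__ == "__main__":
    print(build_task_filter(statuses=["completed"], order_by=["-last_update"]))
    # {'status': ['completed'], 'order_by': ['-last_update']}
```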
OK, this makes sense, but how do I filter these tasks using their parameters, CostlyOstrich36?
Imagine I create this task with the following parameters:
```
from clearml import Task

task = Task.init(project_name='examples', task_name='Hyper-parameters example')
parameters = {"customer_id": 100}
parameters = task.connect(parameters)
```
When it is time to filter all tasks with `customer_id = 100`, what can I use?
I tried this, but it is not working ...
```
from clearml.backend_api.session.client import APIClient
client = APIClien...
```
Thank you very much 🙂 I guess I've used up my free tips!
Not sure why this is not working, but I will give it a try! Thanks anyway if you can't help 🙂
```
response = client.tasks.get_all(
    order_by=["-last_update"],
    _all_={"pattern": "100", "fields": ["hyperparams.custom_id.value"]},
)
```
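One plausible reason the `get_all` call above returns nothing, stated as an assumption rather than a confirmed diagnosis: the parameter was connected as `customer_id` (not `custom_id`), and a dict passed to `task.connect()` without a `name` is, I believe, stored under the `General` section, so the field path would be `hyperparams.General.customer_id.value`. Hyperparameter values are also assumed to be stored as strings, with `pattern` treated as a regular expression, so the sketch anchors it to keep `100` from also matching `1000`. `hyperparam_filter` and `find_tasks_by_param` are hypothetical helpers.

```python
import re
from typing import Any, Dict


def hyperparam_filter(name: str, value: Any, section: str = "General") -> Dict[str, Any]:
    """Build an `_all_` filter that matches one hyperparameter value exactly.

    Assumes connect()-ed dicts land in the "General" section and that "pattern"
    is a regex applied to the string form of the value.
    """
    return {
        "_all_": {
            "fields": [f"hyperparams.{section}.{name}.value"],
            "pattern": f"^{re.escape(str(value))}$",  # anchored: 100 must not match 1000
        }
    }


def find_tasks_by_param(name: str, value: Any):
    """Query the server with the filter above (requires a configured clearml setup)."""
    from clearml.backend_api.session.client import APIClient  # lazy: keeps the helper offline-testable

    client = APIClient()
    return client.tasks.get_all(order_by=["-last_update"], **hyperparam_filter(name, value))


if __name__ == "__main__":
    print(hyperparam_filter("customer_id", 100))
    # {'_all_': {'fields': ['hyperparams.General.customer_id.value'], 'pattern': '^100$'}}
```

The same dict should also work as `task_filter` in `Task.get_tasks`, since that filter is forwarded to the same endpoint.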
Very nice 🙂
CostlyOstrich36, I was trying to filter tasks using some parameters, but I was not able to filter them before fetching a given task... Can you send some syntax or examples I can look at?
Nothing against clearml, it is more a general practice of mine: stable is sometimes better than newer!
I remember we had one or two issues when we first upgraded (when you released 1.0.0 and then 1.0.1, 1.0.2, 1.0.3) ... anyway, I will see how to do that!
Hey CostlyOstrich36, thank you very much for the answer!
You want the ability to create pipeline steps in the controller simply by specifying the source control parameters + packages, correct?
So I guess yes, but there are two cases:
Case A: my running application (whose source code is in `./git/repo_1/`) wants to launch a simple clearml task (a Python script with `Task.init()`) whose source code is in `./git/repo_2`, and specify the branch, script, and requirments_...
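For case A, recent clearml releases include `Task.create()`, which builds a draft task from a repository, branch, and script (plus an optional requirements file), and `Task.enqueue()`, which hands it to an agent listening on a queue. I have not verified which of these arguments exist in `clearml==1.0.5`. The repository URL, branch, script, requirements file, and queue name below are hypothetical, and `build_create_kwargs`/`launch` are illustrative helpers.

```python
from typing import Any, Dict, Optional


def build_create_kwargs(
    repo: str,
    branch: str,
    script: str,
    requirements_file: Optional[str] = None,
    project_name: str = "examples",
    task_name: str = "remote step",
) -> Dict[str, Any]:
    """Collect Task.create() arguments for a script that lives in another repository."""
    kwargs: Dict[str, Any] = {
        "project_name": project_name,
        "task_name": task_name,
        "repo": repo,
        "branch": branch,
        "script": script,
    }
    if requirements_file:
        kwargs["requirements_file"] = requirements_file
    return kwargs


def launch(kwargs: Dict[str, Any], queue_name: str = "default"):
    """Create the task on the ClearML server and enqueue it (needs a configured server)."""
    from clearml import Task  # imported lazily so build_create_kwargs runs without clearml

    task = Task.create(**kwargs)  # assumed to create a draft task without running it locally
    Task.enqueue(task, queue_name=queue_name)  # an agent listening on this queue executes it
    return task


if __name__ == "__main__":
    kw = build_create_kwargs(
        repo="https://github.com/example/repo_2.git",  # hypothetical repository
        branch="main",
        script="train.py",
        requirements_file="requirements.txt",
    )
    print(kw)
    # launch(kw, queue_name="default")  # uncomment with a ClearML server and agent available
```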
I usually use these two approaches to get the list of tasks:
Option 1:
```
from clearml import Task

custom_task_filter = {...}
tasks_list = Task.get_tasks(task_filter=custom_task_filter, task_name=name_custom_tasks)
```
Option 2:
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
tasks_list_via_api = client.tasks.get_all(...)
```
In both cases, when I take an element from the list, I cannot get when the task started. Where is that info stored?
As I can see from your code, `clearml.task.Task.started()` does not return the datetime. Could I get the start time from some hidden attribute?
```
def started(self, ignore_errors=True, force=False):
    # type: (bool, bool) -> ()
    """ The signal that this Task started. """
    return self.send(tasks.StartedRequest(self.id, force=force), ignore_errors=ignore_errors)
```
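The `started()` method quoted above is a signal: it sends a `StartedRequest` to the server and returns that call's result, not a timestamp, so it cannot serve as a getter. The timestamps themselves should live on the task's backend record; this sketch assumes they are exposed as `task.data.started` and `task.data.completed` (entries returned by `client.tasks.get_all` are assumed to carry the same `started`/`completed` fields). `get_started` and `get_duration` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import Optional


def get_started(task) -> Optional[datetime]:
    """Start time from the task's backend record (assumed to be task.data.started)."""
    return getattr(task.data, "started", None)


def get_duration(task) -> Optional[timedelta]:
    """Completed minus started, or None if either timestamp is missing."""
    started = get_started(task)
    completed = getattr(task.data, "completed", None)
    if started is None or completed is None:
        return None  # not started yet, still running, or never completed
    return completed - started


# Usage (requires a ClearML server; "<task-id>" is a placeholder):
# from clearml import Task
# task = Task.get_task(task_id="<task-id>")
# print(get_started(task), get_duration(task))
```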
Yeah, I will do that!! Anyway, as usual, thanks a lot Martin!! Maybe it would be nice in a future release to add a duration attribute to the API response, as shown in the UI 🙂
Thanks a lot AgitatedDove14 !!!
I have a question: is there a way to filter tasks based on their start time?
I guess I cannot do it directly from the `task_filter` (via `Task.get_tasks` or via `POST tasks.get_all`) ... so I can simply use your suggestion to get that!
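If the server-side filter does not cover the start time (I have not confirmed either way), filtering on the client after fetching is a simple fallback. This sketch assumes each fetched task exposes its start time as `task.data.started`; `started_between` is a hypothetical helper. If the server returns timezone-aware datetimes, `start` and `end` must be timezone-aware too, because Python raises `TypeError` when comparing naive and aware datetimes.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def started_between(tasks: Iterable, start: datetime, end: Optional[datetime] = None) -> List:
    """Keep tasks whose start time (assumed task.data.started) is in [start, end)."""
    selected = []
    for task in tasks:
        started = getattr(task.data, "started", None)
        if started is None:
            continue  # never started, e.g. still a draft or waiting in a queue
        if started >= start and (end is None or started < end):
            selected.append(task)
    return selected


# Usage with tasks fetched earlier (requires a ClearML server):
# from datetime import timezone
# recent = started_between(tasks_list, datetime(2021, 6, 1, tzinfo=timezone.utc))
```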
To be completely honest, I am only starting to get familiar with feature stores in general.
My use case is specific to my data pipeline, but I think it can be easily generalized. I could start by looking into Feast and see from there what we need!
I think it would be amazing, Martin 😃. You guys are providing an amazing service at clearml!
However, if I could give some feedback to improve it even further, I would suggest starting to think about implementing a feature store API. You know better than me that data pipelines are critical for productionizing ML.
And I really hope I don't sound arrogant by giving this feedback 🙂
Actually AgitatedDove14, I think Feast is great, but it is missing the part that actually transforms the data with feature engineering ... so you may want to either develop that part on your side or look at other providers ...
Tecton is a feature-store-as-a-service. A big difference between Feast and Tecton is that Tecton supports transformations, so feature pipelines can be managed end-to-end within Tecton. Tecton is a managed offering, and a great feature store choice if you need production SLA...