@<1523701083040387072:profile|UnevenDolphin73> : If I do, what should I configure, and how?
@<1523701205467926528:profile|AgitatedDove14> , you wrote
- Components anyway need to be available when you define the pipeline controller/decorator, i.e. same codebase
No, you can specify a different code base, see here:
Is the code in this "other" repo downloaded to the agent's machine? Or is the component's code pushed to the machine on which the repository lies?
If the second case is true: how is the other machine (the one the other repo lies on) turned into an agent?
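For reference, a minimal sketch of what "specifying a different code base" for a step can look like; the repo URL, branch, and step function here are placeholders, and `repo`/`repo_branch` are arguments of `PipelineController.add_function_step`:
```python
from clearml import PipelineController

def my_step_function(x: int) -> int:
    # placeholder step body; the step runs with the referenced repo
    # available on the machine that executes it
    return x + 1

pipe = PipelineController(name="example pipeline", project="examples")
pipe.add_function_step(
    name="step_from_other_repo",
    function=my_step_function,
    function_kwargs=dict(x=1),
    repo="https://github.com/org/other-repo.git",  # placeholder repo URL
    repo_branch="main",
)
```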
@<1523701205467926528:profile|AgitatedDove14> Is it true that, when using the "pipeline from tasks" approach, the Python environment in which the pipeline is programmed does not need to know any of the code with which the tasks were programmed, and the respective pipeline would still be executed just fine?
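To illustrate the "pipeline from tasks" approach: the controller script only references previously executed tasks by project/name, so it contains none of their code. A minimal sketch (project and task names are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="task-based pipeline", project="examples")
pipe.add_step(
    name="stage_data",
    base_task_project="examples",        # placeholder project
    base_task_name="data loading task",  # a previously executed task
)
pipe.add_step(
    name="stage_train",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="training task",
)
pipe.start()
```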
GrittyStarfish67 : Thanks! But how do those compare for ClearML vs. MLRun? Granted, ClearML has about 5 times more GitHub stars than MLRun, but besides that: both date from mid-2019, according to the releases on git. I have not been in their Slack and I know nothing about community adoption. (Btw, Kedro has twice as many stars as ClearML; even if it has far fewer features, those that it does have seem pretty well done.)
How would you compare those to ClearML?
@<1523701205467926528:profile|AgitatedDove14> : I am writing quite a bit of documentation on the topic of pipelines. I am happy to share the article here, once my questions are answered and we can make a pull request for the official documentation out of it.
@<1523701070390366208:profile|CostlyOstrich36>
My training outputs a model as a zip file. The way I save and load the zip file to make up my model is custom-made (no library is used directly), because we invented the entire modelling ourselves. What I did so far:
```python
output_model = OutputModel(task=..., config_dict={...}, name=f"...")
output_model.update_weights(r"C:\io__path\...", is_package=True)  # raw string, so the backslashes are not treated as escapes
```
and I am trying to load the model in a different Python process with
```python
mymodel = ...
```
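For illustration, a minimal sketch of one way to fetch such a registered model in another process, assuming it was stored with `OutputModel` as above (the model id is a placeholder):
```python
from clearml import InputModel

mymodel = InputModel(model_id="<model-id-from-the-web-ui>")  # placeholder id
local_path = mymodel.get_local_copy()  # downloads the stored package to a local cache
# ... unpack local_path with the custom loading code ...
```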
@<1523701083040387072:profile|UnevenDolphin73> : A big point for me is to reuse/cache those artifacts/datasets/models that need to be passed between the steps but were produced by colleagues' executions at some earlier point. So, for example, let the pipeline be A(a) -> B(b) -> C(c), where A, B, C are the steps and their code, excluding configurations/parameters, and a, b, c are the configurations/parameters. Then I might have the situation that my colleague ran the pipeline A(a) -> B(b) -> C(c...
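Step-level caching in that spirit can be switched on per step; a sketch under these assumptions (names are placeholders; `cache_executed_step` is the relevant `add_function_step` argument): if a step with the same code and the same inputs was already executed, its cached outputs are reused instead of rerunning it.
```python
from clearml import PipelineController

def step_a(config: dict) -> dict:
    # placeholder for step A's actual code
    return {"result": config}

pipe = PipelineController(name="cached pipeline", project="examples")
pipe.add_function_step(
    name="A",
    function=step_a,
    function_kwargs=dict(config={"param": 1}),  # this is "a" in A(a)
    cache_executed_step=True,  # reuse the outputs of an identical earlier run
)
```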
KindChimpanzee37 , this time I was away for a week 🙂. I do not think that I made the mistake you suggested. At the top of the script I wrote
```python
project_name = 'RL/Urbansounds'
```
and then later
```python
self.original_dataset = Dataset.get(dataset_project=project_name, dataset_name='UrbanSounds example')
# This will return the pandas dataframe we added in the previous task
self.metadata = Task.get_task(task_id=self.original_dataset.id).artifacts['metadata'].get()
```
CostlyOstrich36 , any ideas?
What does
- The component code still needs to be self-composed (or, function component can also be quite complex)

Well, it can address the additional repo (it will be automatically added to the PYTHONPATH), and you can add auxiliary functions (as long as they are part of the initial pipeline script) by passing them to `helper_functions`

mean? Is it not possible that I call code that is somewhere else on my local computer and/or in my code base? That makes thi...
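A sketch of the `helper_functions` mechanism mentioned above (all names are placeholders; `helper_functions` is the actual `add_function_step` argument):
```python
from clearml import PipelineController

def normalize(x: float) -> float:
    # auxiliary function defined in the pipeline script
    return x / 255.0

def preprocess(value: float) -> float:
    # normalize() is available inside the standalone step because it was
    # passed along via helper_functions
    return normalize(value)

pipe = PipelineController(name="helper example", project="examples")
pipe.add_function_step(
    name="preprocess",
    function=preprocess,
    function_kwargs=dict(value=128.0),
    helper_functions=[normalize],
)
```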
KindChimpanzee37 , any idea 🙂?
No, these are 3 different ways of building pipelines.
That is what I meant to say 🙂, sorry for the confusion, @<1523701205467926528:profile|AgitatedDove14> .
@<1523701083040387072:profile|UnevenDolphin73> , your point is a strong one. What are clear situations in which pipelines can only be built from tasks, and not in one of the other ways? An idea would be if the tasks are created from all kinds of - kind of - unrelated projects, where the code that describes the pipeline does not ...
I think now that here the documentation means the usage of `connect`.
Ah... if I run the same script not from PyCharm but from the terminal, then it gets completed... phew...
@<1523701205467926528:profile|AgitatedDove14> In the documentation it warns about .close()
"Only call Task.close if you are certain the Task is not needed."
What does the documentation refer to? My understanding would be that if I close the task within a program, I am not able to use the task object as before anymore, and I need to retrieve it via `query_tasks` to get it again. Is that correct?
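For example (a sketch; project and task names are placeholders), a fresh handle to the task can be obtained afterwards with `Task.get_task` or `Task.query_tasks`:
```python
from clearml import Task

# re-fetch the task object after it was closed
task = Task.get_task(project_name="examples", task_name="my experiment")
# or search by name first; query_tasks returns matching task ids
task_ids = Task.query_tasks(project_name="examples", task_name="my experiment")
```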
@<1523701083040387072:profile|UnevenDolphin73> : I am not sure whom you mean by "user"? I am not aware that we are building an app... 🙂 Do you mean a person who reruns the entire pipeline, but with different parameters, from the Web UI? But here we are not able to let the "user" configure all those things.
Is there some other way - one that does not require any coding - to build pipelines (that I am not aware of)?
Also, when I build pipelines via tasks, the (same) imports have to be done in each...
But still, in the web app the task is considered to be "running". I am not sure what to do so that the task is considered to be "completed".
Secondly, I do not understand this:
The docs for `Task.mark_completed` say
Manually mark a Task as completed. This will close the running process and will change the Task's status to Completed (Use this function to close and change status of remotely executed tasks). To simply change the Task's status to completed, use task.close()
The docs for `Task.close` say
Closes the current Task and cha...
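A sketch contrasting the two calls as the quoted docs describe them (exact behavior may differ between ClearML versions):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="status demo")
# ... do the actual work ...

# Closes the Task object in this process; afterwards it cannot be used
# as before within this program.
task.close()

# Alternatively: explicitly mark the task as completed; per the docs this
# also terminates the running process and is meant for remotely executed
# tasks.
# task.mark_completed()
```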
Ok, I checked: A is terminated. This is not what I thought would happen and not what I intended with my documentation. I should clarify that.
AgitatedDove14 : Not sure: They also have the feature store (data management), as mentioned, which is pretty MLOps-y 🙂. Also, they do have workflows ( https://docs.mlrun.org/en/latest/concepts/multi-stage-workflows.html ) and artifacts/model management ( https://docs.mlrun.org/en/latest/store/artifacts.html ) and serving ( https://docs.mlrun.org/en/latest/serving/serving-graph.html ).
It means "The syntax for the file name, folder name or volume label/disk is wrong", something along those lines. The [...] is the directory path to my project, which I opened in PyCharm and from which I run the commands in the Python Console.
GrittyStarfish67 : By "has a good name", do you literally mean the name, or do you mean they have a good reputation 🙂?
@<1537605940121964544:profile|EnthusiasticShrimp49> : The biggest advantage I see in splitting your code into pipeline components is caching. It also structures your code a little bit, but I was told by the staff that this should not be one's main aim with ClearML components. What is your main takeaway for splitting your code into components?
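For completeness, the same kind of caching for a function component built via the decorator interface (a sketch; the function body is a placeholder, and `cache=True` is the `PipelineDecorator.component` argument):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=True, return_values=["total"])
def expensive_step(n: int) -> int:
    # rerunning the pipeline with unchanged code and the same n reuses the
    # cached output instead of executing this step again
    return sum(range(n))
```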
My HPO on top of the pipeline is already working 🙂 I am currently experimenting with using the HPO in a(nother) pipeline that creates two HPO steps (from the same funct...