KindChimpanzee37 : Thank you so much! I asked follow-up questions 🙂.
I now think that the documentation here refers to the usage of connect.
I mean those that you see in the screenshot. The difference in code, at least for me, is to write either of the following (a minimal sketch follows below):
- parameters_data = {'custom1': 'no', 'custom2': False}; parameters_data = task.connect(parameters_data, name='data')
- task.set_user_properties(custom1='no', custom2=False)
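For reference, a minimal side-by-side sketch of the two variants (project and task names below are just placeholders): connect() logs the dict as a parameter section named "data", while set_user_properties() stores plain key/value user properties on the task.
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params-vs-user-properties")  # placeholder names

# Variant 1: connect() registers the dict as a parameter/configuration section called "data"
parameters_data = {'custom1': 'no', 'custom2': False}
parameters_data = task.connect(parameters_data, name='data')

# Variant 2: set_user_properties() stores the values as editable user properties of the task
task.set_user_properties(custom1='no', custom2=False)
```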

@<1523701083040387072:profile|UnevenDolphin73> : I am not sure who you mean by "user". I am not aware that we are building an app... 😄 Do you mean a person who reruns the entire pipeline but with different parameters from the Web UI? But here, we are not able to let the "user" configure all those things.
Is there some other way to build pipelines that does not require any coding (I am not aware of one)?
Also, when I built pipelines via tasks, the (same) imports had to be done in each...
CostlyOstrich36 sure:
[..]\urbansounds8k\venv\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
warnings.warn("No audio backend is available.")
ClearML Task: overwriting (reusing) task id=[..]
2022-09-14 14:40:16,484 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
Traceback (most recent call last):
File "[..]\urbansounds8k\preprocessing.py", line 145, in <module>
datasetbuilder = DataSe...
But, I guess @<1523701070390366208:profile|CostlyOstrich36> wrote that in a different chat, right?
@<1537605940121964544:profile|EnthusiasticShrimp49> : The biggest advantage I see in splitting your code into pipeline components is caching. It also structures your code a bit, but I was told by the staff that this should not be one's main aim with ClearML components. What is your main takeaway for splitting your code into components?
My HPO on top of the pipeline is already working 🙂 I am currently experimenting with using the HPO in an(other) pipeline that creates two HPO steps (from the same funct...
KindChimpanzee37 , I ensured that the dataset_name is the same in get_data.py and preprocessing.py and that seemed to help. Then I got the error RuntimeError: No audio I/O backend is available, because of which I installed PySoundFile with pip; that helped. Weirdly enough, the old id error then came back. So I re-ran get_data.py and then preprocessing.py - this time the id error was gone again. Instead, I got `raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError:...
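As a side note, a quick way to check (assuming torchaudio is the library in play here) whether an audio I/O backend is actually available after installing PySoundFile:
```python
import torchaudio

# Should list at least one backend (e.g. ['soundfile']) once PySoundFile/soundfile is installed;
# an empty list is what produces the "No audio backend is available" warning above.
print(torchaudio.list_audio_backends())
```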
No, these are 3 different ways of building pipelines.
That is what I meant to say 🙂 , sorry for the confusion, @<1523701205467926528:profile|AgitatedDove14> .
@<1523701083040387072:profile|UnevenDolphin73> , your point is a strong one. What are clear situations in which pipelines can only be built from tasks, and not in one of the other ways? An idea would be if the tasks are created from all kinds of more or less unrelated projects, where the code that describes the pipeline does not ...
@<1523701083040387072:profile|UnevenDolphin73> : How do you figure? In the past, my colleagues and I just shared the .zip file via email / MS Teams and it worked. So I don't think so.
What does
- The component code still needs to be self-composed (or, a function component can also be quite complex)
Well it can address the additional repo (it will be automatically added to the PYTHONPATH), and you can add auxiliary functions (as long as they are part of the initial pipeline script) by passing them to helper_functions
mean? Is it not possible that I call code that is somewhere else on my local computer and/or in my code base? That makes thi...
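For what it's worth, a minimal sketch of what passing a helper via helper_functions could look like with the decorator approach (function and variable names are illustrative):
```python
from clearml.automation.controller import PipelineDecorator

def normalize(values):
    # Helper defined in the same pipeline script, so it can be shipped with the component
    m = max(values)
    return [v / m for v in values]

@PipelineDecorator.component(
    return_values=["normalized"],
    helper_functions=[normalize],  # makes `normalize` available inside the standalone component
)
def preprocess(values):
    return normalize(values)
```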
KindChimpanzee37 : Ok, will do. (More questions from my side though. :-D) But I need to have a pretty good idea before presenting our concept to the bosses.
Last point on component caching, what I suggest is actually providing users the ability to control the cache "function". Right now (a bit simplified but probably accurate), this is equivalent to hashing of the following dict:
{"code": "code here", "container": "docker image", "container args": "docker args", "hyper-parameters": "key/value"}
We could allow users to add a function that gets this dict and returns a new dict that will be used for hashing. This way we will e...
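If I understand the suggestion correctly, it would amount to something like the following (purely illustrative sketch of the proposal, not an existing ClearML API; the function names are made up):
```python
import hashlib
import json

def my_cache_key(fields: dict) -> dict:
    # User-supplied hook: e.g. drop the container arguments so that changing
    # docker args does not invalidate the cache for an otherwise identical step.
    fields = dict(fields)
    fields.pop("container args", None)
    return fields

def compute_cache_hash(fields: dict, key_fn=my_cache_key) -> str:
    # Hash whatever dict the user's hook returns instead of the raw fields
    reduced = key_fn(fields)
    return hashlib.sha256(json.dumps(reduced, sort_keys=True).encode()).hexdigest()

print(compute_cache_hash({
    "code": "code here",
    "container": "docker image",
    "container args": "docker args",
    "hyper-parameters": "key/value",
}))
```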
@<1523701083040387072:profile|UnevenDolphin73> : A big point for me is to reuse/cache those artifacts/datasets/models that need to be passed between the steps but have been produced by colleagues' executions at some earlier point. So for example, let the pipeline be A(a) -> B(b) -> C(c), where A, B, C are the steps and their code, excluding configurations/parameters, and a, b, c are the configurations/parameters. Then I might have the situation that my colleague ran the pipeline A(a) -> B(b) -> C(c...
The first scenario is your standard "the code stays the same, the configuration changes" for the second step. Here, I want
The second and third scenarios are "the configuration stays the same, the code changes"; this is the case, e.g., if code is refactored but effectively does the same as before.
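For the caching that exists today, enabling it per component is a one-liner; a minimal sketch (decorator approach, illustrative names):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["b_out"], cache=True)
def step_b(a_out, b_config):
    # With cache=True, the step is reused only when both its code and its inputs
    # (here a_out and b_config) are unchanged; if b_config changes, B re-executes.
    return {"processed": a_out, "config": b_config}
```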
@<1523701083040387072:profile|UnevenDolphin73> , you wrote
About the third scenario I'm not sure. If the configuration has changed, shouldn't the relevant steps (the ones where the configuration...
@<1523701205467926528:profile|AgitatedDove14> Is it true that, when using the "pipeline from tasks" approach, the Python environment in which the pipeline is programmed does not need to know any of the code with which the tasks have been programmed, and the respective pipeline would still be executed just fine?
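For reference, that is roughly how the tasks-based approach looks: the controller only references existing tasks by project/name and never imports their code (project, task, and queue names below are placeholders):
```python
from clearml.automation.controller import PipelineController

# The controller script only needs clearml installed; the steps' code and requirements
# live in the referenced tasks and are resolved by the agents that execute them.
pipe = PipelineController(name="urbansounds-pipeline", project="examples", version="1.0.0")
pipe.add_step(name="get_data", base_task_project="examples", base_task_name="get_data")
pipe.add_step(
    name="preprocessing",
    parents=["get_data"],
    base_task_project="examples",
    base_task_name="preprocessing",
)
pipe.start(queue="default")  # or pipe.start_locally() for a local debug run
```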
GrittyStarfish67 : Thanks! But how are those for ClearML vs MLRun? Granted, ClearML has ~5 times more GitHub stars than MLRun, but besides that: both are from mid 2019 according to the releases on git. I have not been in their Slack and I know nothing about community adoption. (Btw, Kedro has twice as many stars as ClearML - even if it has far fewer features, those that it does have seem pretty well done.)
@<1523701205467926528:profile|AgitatedDove14> , you wrote
- Components anyway need to be available when you define the pipeline controller/decorator, i.e. same codebase
No, you can specify a different code base, see here:
Is the code in this "other" repo downloaded to the agent's machine? Or is the component's code pushed to the machine on which the repository is?
If the second case is true: how is the other machine (on which the other repo lies) turned into an agent?
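Regarding "it can address the additional repo": as far as I understand, you point the component at that repo and the agent that executes the step clones it (i.e. the first case); a sketch with a placeholder repo URL and a hypothetical module name:
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(
    return_values=["features"],
    repo="https://github.com/my-org/other-repo.git",  # placeholder URL
    repo_branch="main",
)
def extract_features(dataset_id):
    # The agent running this step clones the specified repo and adds it to the PYTHONPATH,
    # so modules from that code base can be imported here.
    from other_repo.features import build_features  # hypothetical module from the cloned repo
    return build_features(dataset_id)
```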
KindChimpanzee37 , any idea 🙂 ?
I have already been trying to contribute (I have three pull requests), but honestly I feel it is a bit weird that I need to update documentation about something I do not understand, while I am actually trying to evaluate whether ClearML is the right tool for our company...
@<1523701087100473344:profile|SuccessfulKoala55> I think I might have made a mistake earlier - but not in the code I posted before. Now, I have the following situation:
- In my training Python process on my notebook I train the custom-made model and put it on my hard drive as a zip file. Then I run the code
output_model = OutputModel(task=task, config_dict={...}, name=f"...")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
- I delete the "...
@<1523701087100473344:profile|SuccessfulKoala55> : I referenced this conversation in the issue None
How would you compare those to ClearML?
It is documented at None ... super deep in the code. If you don't know that output_uri in TASK's (!) init is relevant, you would never know...
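To make the output_uri point concrete, a minimal sketch of uploading the zipped weights (the destination, project, and model names are placeholders):
```python
from clearml import Task, OutputModel

# output_uri on Task.init (not on OutputModel) decides where update_weights() uploads to;
# use True for the default files server, or an explicit destination as below.
task = Task.init(
    project_name="examples",
    task_name="train-custom-model",
    output_uri="s3://my-bucket/models",  # placeholder destination
)

output_model = OutputModel(task=task, config_dict={"input_size": 128}, name="my-model")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
```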