 
@<1523701205467926528:profile|AgitatedDove14> : "does that make sense?" Not really.
"you do not need to  automatically  Add/Log/Track things into the Task in the current process." - I do not  need  to  automatically  do [...]? You mean I can do it automatically, but alternatively I can do it manually? Do you mean I use  close  within a process to prevent automatic logging/adding/tracking? But, as far as I know, after I used  close  I am not able to log etc. manually either. So...
"Mark...
It is documented at None ... super deep in the code. If you don't know that output_uri in TASK's (!) init is relevant, you would never know...
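For reference, a minimal sketch of that buried option (project/task names and the bucket URI are placeholders; that output_uri=True would use the default ClearML fileserver instead is my understanding, not quoted from this thread):
from clearml import Task

# Upload models/artifacts produced by this task to a custom destination;
# output_uri=True would (as I understand it) use the default ClearML fileserver instead
task = Task.init(
    project_name="demo",
    task_name="train",
    output_uri="s3://my-bucket/models",  # placeholder bucket
)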
@<1523701083040387072:profile|UnevenDolphin73>
@<1523701070390366208:profile|CostlyOstrich36> : After more playing around, it seems that ClearML Server does not store the models or artifacts itself. These are stored somewhere else (e.g., an AWS S3 bucket) or on my local machine, and ClearML Server only stores configuration parameters and previews (e.g., when the artifact is a pandas DataFrame). Is that right? Is there a way to save the models completely on the ClearML Server?
I have already been trying to contribute (I have three pull requests), but honestly it feels a bit weird that I need to update documentation about something I do not understand, while I am actually trying to evaluate whether ClearML is the right tool for our company...
Do you mean "exactly" as in "you finally got it" or in the sense of "yes, that was easy to miss"?
@<1523701070390366208:profile|CostlyOstrich36>
My training outputs a model as a zip file. The way I save and load the zip file to make up my model is custom-made (no library is directly used), because we invented the entire modelling ourselves. What I have done so far:
output_model = OutputModel(task=..., config_dict={...}, name=f"...")
output_model.update_weights("C:\io__path\...", is_package=True)
and I am trying to load the model in a different Python process with
mymodel = ...
AgitatedDove14 : Not sure: They also have the feature store (data management), as mentioned, which is pretty MLOps-y 🙂 . Also, they do have workflows ( https://docs.mlrun.org/en/latest/concepts/multi-stage-workflows.html ) and artifacts/model management ( https://docs.mlrun.org/en/latest/store/artifacts.html ) and serving ( https://docs.mlrun.org/en/latest/serving/serving-graph.html ).
I just see the website that I linked to. I am not sure what is meant by "python environment". I cannot make a screenshot, because I do not know where to look for this in the first place.
Last point on component caching: what I suggest is actually providing users the ability to control the cache "function". Right now (a bit simplified but probably accurate), this is equivalent to hashing the following dict:
{"code": "code here", "container": "docker image", "container args": "docker args", "hyper-parameters": "key/value"}
We could allow users to add a function that gets this dict and returns a new dict that will be used for hashing. This way we will e...
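A hypothetical sketch of that idea (this hook does not exist in ClearML today; all names are illustrative):
import hashlib
import json

def default_cache_key(spec: dict) -> str:
    # Hash the canonical JSON form of the component spec
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()

def my_cache_transform(spec: dict) -> dict:
    # User-supplied hook: e.g., drop the container args so cosmetic changes
    # to docker flags do not invalidate the cache
    spec = dict(spec)
    spec.pop("container args", None)
    return spec

spec = {"code": "code here", "container": "docker image", "container args": "docker args", "hyper-parameters": "key/value"}
print(default_cache_key(my_cache_transform(spec)))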
@<1523701205467926528:profile|AgitatedDove14> : I am writing quite a bit of documentation on the topic of pipelines. I am happy to share the article here, once my questions are answered and we can make a pull request for the official documentation out of it.
@<1523701205467926528:profile|AgitatedDove14> : In general: If I do not build a package out of my local repository/project, I cannot reference anything from the local project/repository directly, right? I must make a package out of it, or I must reference it with the repo argument, or I must reference the respective functions using the helper_functions argument. Did I get this right?
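To make sure I understand the helper_functions route, a sketch of what I think it looks like (function and step names are placeholders, assuming the decorator-based pipeline API):
from clearml import PipelineDecorator

def normalize(values):
    # Helper defined in the same pipeline script; passed explicitly so the
    # standalone component task can still call it when it runs on an agent
    lo, hi = min(values), max(values)
    return [(v - lo) / ((hi - lo) or 1) for v in values]

@PipelineDecorator.component(return_values=["scaled"], helper_functions=[normalize])
def scale_step(values):
    # Inside the component, normalize is available because it was shipped along
    return normalize(values)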
@<1523701083040387072:profile|UnevenDolphin73> : A big point for me is to reuse/cache those artifacts/datasets/models that need to be passed between the steps but have been produced by colleagues' executions at some earlier point. So, for example, let the pipeline be A(a) -> B(b) -> C(c), where A, B, C are the steps (their code, excluding configurations/parameters) and a, b, c are the configurations/parameters. Then I might be in the situation that my colleague ran the pipeline A(a) -> B(b) -> C(c...
@<1523701205467926528:profile|AgitatedDove14> Is it true that, when using the "pipeline from tasks" approach, the Python environment in which the pipeline is programmed does not need to know any of the code with which the tasks have been programmed, and the respective pipeline would still be executed just fine?
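I.e., something like this sketch, where the controller only references existing Tasks by project/name (all names here are placeholders):
from clearml import PipelineController

# Sketch: the controller references already-existing Tasks by project/name,
# so this script never imports the code of the steps themselves.
pipe = PipelineController(name="demo-pipeline", project="demo", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="demo",         # placeholder project holding the template Task
    base_task_name="train-template",  # placeholder template Task name
)
pipe.add_step(
    name="evaluate",
    parents=["train"],
    base_task_project="demo",
    base_task_name="eval-template",
)
pipe.start()  # by default this enqueues the controller (services queue)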
@<1523701083040387072:profile|UnevenDolphin73> : No, I love it ❤ . Now, I just have to read everything 😄 .
@<1523701205467926528:profile|AgitatedDove14> In the documentation it warns about .close() : "Only call Task.close if you are certain the Task is not needed."
What does the documentation refer to? My understanding would be that if I close the task within a program, I am no longer able to use the task object as before, and I need to retrieve it via query_tasks to get it again. Is that correct?
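To illustrate my understanding (a sketch; project/task names are placeholders, and re-attaching via Task.get_task rather than query_tasks is my guess at the intended pattern):
from clearml import Task

task = Task.init(project_name="demo", task_name="close-demo")
task_id = task.id
task.close()  # flushes and detaches; afterwards the local object should not be reused

# Later, possibly in another process: re-attach to the stored task by id
same_task = Task.get_task(task_id=task_id)
print(same_task.get_status())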
What does
"- The component code still needs to be self-composed (or, function component can also be quite complex)
Well it can address the additional repo (it will be automatically added to the PYTHONPATH), and you can add auxiliary functions (as long as they are part of the initial pipeline script), by passing them to helper_functions"
mean? Is it not possible that I call code that is somewhere else on my local computer and/or in my code base? That makes thi...
Also, I could not find any larger examples on GitHub about Model, InputModel, or OutputModel. It's kind of difficult to build a PoC this way... 😅
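In the meantime, here is the minimal round trip I pieced together (a sketch under assumptions: project/model names and the model id are placeholders, and loading by id via InputModel is what I inferred from the docstrings):
from clearml import Task, OutputModel, InputModel

# Process A: register the custom zip with the task (names are placeholders)
task = Task.init(project_name="demo", task_name="train")
out = OutputModel(task=task, name="my-custom-model", config_dict={"param": 1})
out.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
print(out.id)  # note the id so another process can load the model

# Process B: fetch the stored weights by model id
loaded = InputModel(model_id="<model-id-from-process-A>")
local_zip = loaded.get_local_copy()  # downloads the zip to a local cache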
I am running it in the Python Console in PyCharm with Task.init. I get the following in the log:
ClearML Task: overwriting (reusing) task id=dfa2dff538d54c18ad97ea1593cbd357
2023-02-14 13:06:44,336 - clearml.Task - WARNING - Failed auto-detecting task repository: [WinError 123] The filename, directory name, or volume label syntax is incorrect: '[...]\<input>'
ClearML results page:  [None](https://app.clear.ml/projects/9acc061c880344a881790461a4baa837/experiments/dfa2dff538d54c1...
@<1523701205467926528:profile|AgitatedDove14> : Wait, so, if a task is initialized in process A and I call  mark_completed  in a process B, which process is terminated? A or B?
@<1523701070390366208:profile|CostlyOstrich36> : Thanks, where can I find more information on ClearML's model repository? I can hardly find any in the documentation.
Also, that leaves open the question of what Model is for. I described how I understand the workflow should look, but my question remains open...
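As far as I can tell, Model is mainly a read/query handle for the registry; a sketch (the filters are placeholders):
from clearml import Model

# Query the model registry; project and model name filters are placeholders
models = Model.query_models(project_name="demo", model_name="my-custom-model")
for m in models:
    print(m.id, m.name)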
@<1523701070390366208:profile|CostlyOstrich36> , I am building a PoC, evaluating whether we should use ClearML for our entire ML team and go for Scale or Enterprise pricing. For that I need to know all/most capabilities and concepts of ClearML to see if ClearML is future-proof.
TL;DR: difficult to narrow it down, but, amongst other things, we need a model store
No, these are 3 different ways of building pipelines.
That is what I meant to say 🙂 , sorry for the confusion, @<1523701205467926528:profile|AgitatedDove14> .
@<1523701083040387072:profile|UnevenDolphin73> , your point is a strong one. What are clear situations in which pipelines can only be built from tasks, and not in one of the other ways? An idea would be if the tasks are created from all kinds of - kind of - unrelated projects where the code that describes the pipeline does not ...
>pip show clearml
WARNING: Ignoring invalid distribution -upyterlab (c:\users\...\lib\site-packages)
WARNING: Ignoring invalid distribution -illow (c:\users\...\lib\site-packages)
Name: clearml
Version: 1.6.4
Summary: ClearML - Auto-Magical Experiment Manager, Version Control, and MLOps for AI
Home-page: None
`Auth...
@<1523701087100473344:profile|SuccessfulKoala55> I think I might have made a mistake earlier - but not in the code I posted before. Now, I have the following situation:
- In my training Python process on my notebook I train the custom-made model and put it on my hard drive as a zip file. Then I run the code
output_model = OutputModel(task=task, config_dict={...}, name=f"...")
output_model.update_weights(weights_filename=r"C:\path\to\mymodel.zip", is_package=True)
- I delete the "...
"using your method you may not reach the best set of hyperparameters."
Of course you are right. It is an efficiency trade-off of speed vs effectiveness. Whether this is worth it or not depends on the use-case. Here it is worth it, because the performance of the modelling is not sensitive to the parameter we search for first. Being in the ball-park is enough. And, for the second set of parameters, we need to do a full grid search (the parameters are booleans and strings); thus, this wo...
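A toy sketch of that two-stage search (evaluate, the parameter names, and the candidate values are all placeholders, not our real modelling):
from itertools import product

def evaluate(lr, flag, mode):
    # Placeholder objective; the real code trains and validates a model
    return abs(lr - 1e-3) + (0.0 if flag else 0.1) + {"a": 0.0, "b": 0.2, "c": 0.3}[mode]

# Stage 1: coarse search over the insensitive numeric parameter alone
best_lr = min([1e-4, 1e-3, 1e-2], key=lambda lr: evaluate(lr, flag=True, mode="a"))

# Stage 2: full grid over the boolean/string parameters, with lr held fixed
best_flag, best_mode = min(product([True, False], ["a", "b", "c"]), key=lambda p: evaluate(best_lr, *p))
print(best_lr, best_flag, best_mode)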