
I think now that the documentation here means the usage of connect.
KindChimpanzee37 : Ok, will do. (More questions from my side though. :-D) But I need to have a pretty good idea before presenting our concept to the bosses.
@<1523701205467926528:profile|AgitatedDove14> : Wait, so, if a task is initialized in process A and I call mark_completed in a process B, which process is terminated? A or B?
Secondly, I do not understand this:
None says
Manually mark a Task as completed. This will close the running process and will change the Task’s status to Completed (Use this function to close and change status of remotely executed tasks). To simply change the Task’s status to completed, use task.close()
None says
Closes the current Task and cha...
: What does
- The component code still needs to be self-composed (or, function component can also be quite complex)
Well it can address the additional repo (it will be automatically added to the PYTHONPATH), and you can add auxiliary functions (as long as they are part of the initial pipeline script), by passing them to helper_functions
mean? Is it not possible that I call code that is somewhere else on my local computer and/or in my code base? That makes thi...
Also, I could not find any larger examples on github about Model, InputModel, or OutputModel. It's kind of difficult to build a PoC this way... 😅
No these are 3 different ways of building pipelines.
That is what I meant to say 🙂 , sorry for the confusion, @<1523701205467926528:profile|AgitatedDove14> .
@<1523701083040387072:profile|UnevenDolphin73> , your point is a strong one. What are clear situations in which pipelines can only be built from tasks, and not one of the other ways? An idea would be if the tasks are created from all kinds of - kind of - unrelated projects where the code that describes the pipeline does not ...
As far as I understand, the workflow is like this. I define some model. Then I register it as an OutputModel. Then I train it. During training I save snapshots (no idea how, though) and then I save the final model when training is finished. This way the Model is a) connected to the task and b) available in the model store of ClearML.
Later, in a different task, I can load an already trained model with InputModel. This InputModel is read-only (regarding the ClearML model store), but I can ma...
@<1523701083040387072:profile|UnevenDolphin73> : A big point for me is to reuse/cache those artifacts/datasets/models that need to be passed between the steps, but have been produced by colleagues' executions at some earlier point. So for example, let the pipeline be A(a) -> B(b) -> C(c), where A,B,C are steps and their code, excluding configurations/parameters, and a,b,c are the configurations/parameters. Then I might have the situation, that my colleague ran the pipeline A(a) -> B(b) -> C(c...
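A minimal sketch of the kind of reuse described above, assuming a cache shared between colleagues. It is not ClearML's mechanism. Each step's key is derived from the step name, its parameters, and the key of the step before it, so a run that differs only in c can reuse the stored results of A(a) and B(b). The names `step_key`, `run_pipeline`, `shared_cache`, and the toy step functions are hypothetical.

```python
import hashlib
import json


def step_key(name, params, upstream_key):
    """Hash the step name, its parameters, and the upstream step's key."""
    payload = json.dumps(
        {"step": name, "params": params, "upstream": upstream_key},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(steps, shared_cache, executed):
    """Run steps in order, reusing any result already in shared_cache.

    steps is a list of (name, function, params); each function takes
    (previous_result, params). Names of steps that really ran are
    appended to executed.
    """
    result, key = None, ""
    for name, func, params in steps:
        key = step_key(name, params, key)
        if key not in shared_cache:
            shared_cache[key] = func(result, params)
            executed.append(name)
        result = shared_cache[key]
    return result


# Toy steps standing in for A, B, and C (assumptions for illustration).
def step_a(_, p):
    return p["a"]


def step_b(prev, p):
    return prev + p["b"]


def step_c(prev, p):
    return prev * p["c"]


shared_cache = {}

# A colleague runs A(a) -> B(b) -> C(c).
colleague_ran = []
colleague_result = run_pipeline(
    [("A", step_a, {"a": 1}), ("B", step_b, {"b": 2}), ("C", step_c, {"c": 3})],
    shared_cache,
    colleague_ran,
)

# Later I run A(a) -> B(b) -> C(c'), so only C has to execute again.
i_ran = []
my_result = run_pipeline(
    [("A", step_a, {"a": 1}), ("B", step_b, {"b": 2}), ("C", step_c, {"c": 10})],
    shared_cache,
    i_ran,
)
```

Because the upstream key is part of every hash, changing a or b would also invalidate everything downstream of that step.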
Last point on component caching, what I suggest is actually providing users the ability to control the cache "function". Right now (a bit simplified but probably accurate), this is equivalent to hashing of the following dict:
{"code": "code here", "container": "docker image", "container args": "docker args", "hyper-parameters": "key/value"}
We could allow users to add a function that gets this dict and returns a new dict that will be used for hashing. This way we will e...
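A rough sketch of that suggestion, assuming the cache key is a hash of the dict shown above. The dict keys are taken from the message; `cache_key` and `ignore_container_args` are hypothetical names, not part of the ClearML API.

```python
import hashlib
import json


def cache_key(component, transform=None):
    """Hash a component description, optionally after a user-supplied transform."""
    if transform is not None:
        # Work on a copy so the caller's dict is left untouched.
        component = transform(dict(component))
    payload = json.dumps(component, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ignore_container_args(component):
    """Hypothetical user function: docker args should not invalidate the cache."""
    component.pop("container args", None)
    return component


run_1 = {
    "code": "code here",
    "container": "docker image",
    "container args": "--shm-size=1g",
    "hyper-parameters": {"lr": 0.1},
}
# Same component, only the docker args differ.
run_2 = {**run_1, "container args": "--shm-size=8g"}

default_hit = cache_key(run_1) == cache_key(run_2)
custom_hit = (
    cache_key(run_1, ignore_container_args)
    == cache_key(run_2, ignore_container_args)
)
```

With the default hashing the two runs get different keys; with the user function they share one, so the second run would count as a cache hit.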
@<1523701070390366208:profile|CostlyOstrich36> : After more playing around, it seems that ClearML Server does not store the models or artifacts itself. These are stored somewhere else (e.g., AWS S3-bucket) or on my local machine and ClearML Server is only storing configuration parameters and previews (e.g., when the artifact is a pandas dataframe). Is that right? Is there a way to save the models completely on the ClearML server?
@<1523701083040387072:profile|UnevenDolphin73> : I am not sure who you mean by "user"? I am not aware that we are building an app... 😄 Do you mean a person that reruns the entire pipeline but with different parameters from the Web UI? But here, we are not able to let the "user" configure all those things.
Is there some other way - that does not require any coding - to build pipelines (I am not aware)?
Also, when I build pipelines via tasks, the (same) imports had to be done in each...
"using your method you may not reach the best set of hyperparameters."
Of course you are right. It is an efficiency trade-off of speed vs effectiveness. Whether this is worth it or not depends on the use-case. Here it is worth it, because the performance of the modelling is not sensitive to the parameter we search for first. Being in the ball-park is enough. And, for the second set of parameters, we need to do a full grid search (the parameters are booleans and strings); thus, this wo...
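To illustrate the trade-off, here is a small sketch of such a staged search next to a full grid. The objective `score`, the parameter names, and the value lists are all made up for the example; the only point is the number of evaluations.

```python
from itertools import product


def score(depth, use_scaling, kernel):
    """Toy objective (an assumption): only mildly sensitive to depth."""
    value = -0.01 * (depth - 6) ** 2
    value += 1.0 if use_scaling else 0.0
    value += {"linear": 0.0, "rbf": 0.5}[kernel]
    return value


depths = [2, 4, 6, 8]
flags = [True, False]
kernels = ["linear", "rbf"]

# Stage 1: search the first parameter with the others held at defaults.
best_depth = max(depths, key=lambda d: score(d, False, "linear"))

# Stage 2: full grid over the boolean and string parameters only.
best_flag, best_kernel = max(
    product(flags, kernels),
    key=lambda fk: score(best_depth, fk[0], fk[1]),
)

staged_evaluations = len(depths) + len(flags) * len(kernels)
full_grid_evaluations = len(depths) * len(flags) * len(kernels)
```

The staged search costs the sum of the two stage sizes instead of their product, at the risk of missing interactions between the first parameter and the rest.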
Thank you, I found the error. myPar = task.connect(myPar, name='from TaskParameters') is required.
@<1523701205467926528:profile|AgitatedDove14> : "does that make sense ?" Not really.
"you do not need to automatically Add/Log/Track things into the Task in the current process." - I do not need to automatically do [...]? You mean I can do it automatically, but alternatively I can do it manually? Do you mean I use close within a process to prevent automatic logging/adding/tracking? But, as far as I know, after I used close I am not able to log etc. manually either. So...
"Mark...
@<1523701070390366208:profile|CostlyOstrich36> , I am building a PoC, evaluating if we should use ClearML for our entire ML team and go Scale or Enterprise pricing. For that I need to know all/most capabilities and concepts of ClearML to see if ClearML is future-proof.
TL;DR: difficult to narrow it down, but (amongst other things) we need a model store
Here is my code:
from clearml import Task, TaskTypes
from clearml.task_parameters import TaskParameters, param, percent_param

class MyParams(TaskParameters):
    iterations = param(
        type=int,
        desc="Number of iterations to run",
        range=(0, 100000),
    )
    target_accuracy = percent_param(
        desc="The target accuracy of the model",
    )

myPar = MyParams(iterations=1000, target_accuracy=0.95)
parameters_to_track1 = {'var1': 'a', 'hyper_par': 1}
parameter...
I expect either 'var1' to be 'b' or - better - there to be log of the change, so that I would be able to see how the value changed over time.
KindChimpanzee37 , I ensured that the dataset_name is the same in get_data.py and preprocessing.py and that seemed to help. Then, I got the error `RuntimeError: No audio I/O backend is available.`, because of which I installed PySoundFile with pip; that helped. Weirdly enough then, the old id error came back. So, I re-ran get_data.py and then preprocessing.py - this time the id error was gone again. Instead, I got `raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError:...
CostlyOstrich36 sure:
[..]\urbansounds8k\venv\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
ClearML Task: overwriting (reusing) task id=[..]
2022-09-14 14:40:16,484 - clearml.Task - INFO - No repository found, storing script code instead
ClearML results page:
`
Traceback (most recent call last):
File "[..]\urbansounds8k\preprocessing.py", line 145, in <module>
datasetbuilder = DataSe...
KindChimpanzee37 : First I went to the dataset and clicked on "Task information ->" in the bottom right corner of the "VERSION INFO". I suppose that is the same as what you meant with "right click on more information"? Because I did not find any option to "right click on more information". The "Task information ->" leads me to a view in the experiment manager. I posted the two screenshots.
PS: It is weird to me that the datamanager leads me to the experiment manager, specifically an experi...
KindChimpanzee37 , this time, I was away for a week 🙂 . I do not think that I made the mistake you suggested. At the top of the script I wrote project_name = 'RL/Urbansounds' and then later
` self.original_dataset = Dataset.get(dataset_project=project_name, dataset_name='UrbanSounds example')
# This will return the pandas dataframe we added in the previous task
self.metadata = Task.get_task(task_id=self.original_dataset.id).artifacts['metadata'].get() `
I have already been trying to contribute (have three pull requests), but honestly I feel it is a bit weird that I need to update documentation about something I do not understand, while I actually try to evaluate if ClearML is the right tool for our company...
@<1523701087100473344:profile|SuccessfulKoala55> : That is the link I posted as well. But this should also be mentioned at places where it is about the external or non-external storage. Also it should be mentioned everywhere we talk about models or artifacts etc. Not necessarily in detail, but at least with a sentence and a link.
It is documented at None ... super deep in the code. If you don't know that output_uri in TASK's (!) init is relevant, you would never know...
@<1523701083040387072:profile|UnevenDolphin73> : Thanks, but it does not mention the File Storage of "ClearML Hosted Server".
@<1523701083040387072:profile|UnevenDolphin73> : I see. I did not make the connection that output_uri=True is what I was missing. I thought this was the default. But the default is actually "None", which is different than "True".