Hm, one of the issues I have with this change is that now every dataset that doesn't have a semantic version cannot be loaded anymore
Okay we definitely need to solve that.
Any chance I can ask to open a github issue (just so we do not forget).
I will pass it quickly along so that we can maybe offer a fix in the next RC
You mean parameters of the pipeline? Is this a pipeline from Tasks or from function decorator?
I don't have the compose file, or at least can't seem to find it in /opt
you can manually take down all the docker containers with: docker ps, then docker stop <container id> for each container id
Hi SkinnyPanda43
I realized that the params are not being saved anymore
Could you test with clearml==1.0.4 ?
Hi RoughTiger69
I like the direction this is taking, let me add some more complexity.
My thinking is that if we have "input datasets", I'd also like to be able to clone the Task and automagically change them (with the need to export the dataset_id as an argument), basically I'm thinking:
train = Dataset.get('aabbcc1', name='train')
valid = Dataset.get('aabbcc2', name='validation')
custom = Dataset.get('aabbcc3', name='custom')
Then you end up with HyperParameter Section: "Input Datas...
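A minimal sketch of that direction (the dataset ids, project/task names, and parameter names below are placeholders, and connecting them via Task.connect with a section name is just one possible way to expose them for cloning):

from clearml import Dataset, Task

task = Task.init(project_name="examples", task_name="train with input datasets")

# Expose the dataset ids as parameters so a cloned Task can override them from the UI
datasets = {"train": "aabbcc1", "validation": "aabbcc2", "custom": "aabbcc3"}
task.connect(datasets, name="Input Datasets")

# Fetch local copies of whichever datasets were selected for this run
train_folder = Dataset.get(dataset_id=datasets["train"]).get_local_copy()
valid_folder = Dataset.get(dataset_id=datasets["validation"]).get_local_copy()
custom_folder = Dataset.get(dataset_id=datasets["custom"]).get_local_copy()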
Hi UnevenDolphin73
Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?
You can add a link as an artifact, that is probably the easiest: task.upload_artifact(name="just link", artifact_object="...")
EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with .datasets. This creates several annoyances that I...
JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)
AstonishingSeaturtle47 How would the code run without the sub-modules? And what is the problem we are trying to solve? (Because unfortunately there is no switch to disable it)
MinuteGiraffe30 if you are running the following command while your current directory is where you code is, what are you getting?
$ git ls-remote --get-url origin
Hi MinuteGiraffe30
Are you saying that when you are running your code locally with a gitea repository, clearml incorrectly adds a link to gitlab?
So the issue is that you have two reference branches on the local git, one to gitlab and one to gitea, and it fails to understand which one is the correct remote ...
I wonder if "git ls-remote --get-url" will always work ?!
Hi CloudySwallow27
This error occurs randomly during training (in other words training does successfully start).
What's the clearml-agent version you are using, and the clearml version?
Thanks SubstantialElk6 !
I believe an initial fix was pushed. A full one (merging Task --env with the k8s template) will be added soon
I'd prefer to use config_dict, I think it's cleaner
I'm definitely with you
Good news: new best_model is saved, add a tag best,
Already supported, (you just can't see the tag, but it is there :))
My question is, what do you think would be the easiest interface to tell (post/pre) store, tag/mark this model as best so far (btw, obviously if we know it's not good, why do we bother to store it in the first place...)
ElegantKangaroo44 good question, that depends on where we store the score of the model itself. You can obviously parse the file name task.models['output'][-1].url and retrieve the score from it. You can also store it in the model name task.models['output'][-1].name, and you can put it as a general-purpose blob of text in what is currently model.config_text (for convenience you can have the model parse a JSON-like text and use model.config_dict).
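As a rough sketch of the above (this is not a built-in "best model" mechanism; the checkpoint file, score, and names are assumptions):

from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="store model score")
val_accuracy = 0.91  # assumed: produced by your validation loop

# Stash the score both in the model name and in its configuration (JSON-like)
model = OutputModel(task=task, name="epoch_03-acc_%.2f" % val_accuracy)
model.update_design(config_dict={"val_accuracy": val_accuracy})
model.update_weights(weights_filename="checkpoint_epoch_03.pt")  # assumed checkpoint file

# Later, read the score back from the last stored output model
stored = task.models["output"][-1]
score = (stored.config_dict or {}).get("val_accuracy")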
Could you download and send the entire log ?
Hmm... any idea on what's different with this one ?
I commented out the upload_artifact call at the end of the code and it finishes correctly now
upload_artifact caused the "failed" issue ?
Okay, could you try to run again with the latest clearml package from GitHub? pip install -U git+
Yes
Are you trying to upload_artifact to a Task that is already completed ?
Okay I found the issue (I think).
If the images are reported very quickly, it will "decide" you are about to override the previous one (i.e. 101 -> overwriting 0, which makes sense; the bug was it would disable the 101 from uploading and not the 0)
Test fix: in /backend_interface/metrics/events.py, line 292, change:
last_count = self._get_metric_count(self.metric, self.variant, next=False)
if abs(self._count - last_count) > int(self._file_history_size):
...
But they are all running inside the same pod, correct ?
Hi GiganticTurtle0
dataset_task = Task.get_task(task_id=dataset.id)
Hmmm I think that when it gets the Task, the "output_uri" is not updated from the predefined Task (you can obviously set it again).
This seems like a bug that is unrelated to Datasets.
Basically any Task that you retrieve will default to the default output_uri (not the stored one).
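A small workaround sketch, assuming you just want to point the retrieved Task back at your storage (the dataset id and destination URI are placeholders):

from clearml import Dataset, Task

dataset = Dataset.get(dataset_id="aabbcc1")        # placeholder dataset id
dataset_task = Task.get_task(task_id=dataset.id)
# Retrieved Tasks fall back to the default output_uri, so set it again explicitly
dataset_task.output_uri = "s3://my-bucket/models"  # placeholder destination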
BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version
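For example (call it before Task.init so the pinned version ends up in the Task's recorded requirements; project/task names are placeholders):

from clearml import Task

Task.add_requirements("tensorflow", "2.2")  # pin the version recorded for the Task
task = Task.init(project_name="examples", task_name="pinned tensorflow")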
But I think this error has only appeared since I upgraded to version 1.1.4rc0
Hmm let me check something
So why is it trying to upload to "//:8081/files_server:" ?
What do you have in the trains.conf on the machine running the experiment ?
I guess only if autoscaling is used (one worker one machine)?
yes, basically depending on how you set up autoscaling / k8s integration