
I double-checked with the team; this issue was fixed in 1.14 (of the ClearML server). It should be released tomorrow / over the weekend
YummyWhale40 from the code snippet, it seems like the argument is passed.
"reuse_last_task_id=True" is the default; it means that if the previous run of the task did not create any artifacts/models and was executed within the last 72 hours (configurable), the Task will be reset (i.e. all logs cleared) and reused in the current run.
self.task.upload_artifact('trend_step', self.trend_step + 1)
Out of curiosity, why would every request generate an artifact? Wouldn't it be better to report it as part of the statistics?
What would be the size / type of the matrix X
(i.e. np.size / np.dtype) ?
Can you tell me what the serving example is, in terms of the explanation above, and what the Triton serving engine is?
Great idea!
This line actually creates the control Task (2):
clearml-serving triton --project "serving" --name "serving example"
This line configures the control Task (the idea is that you can do that even when the control Task is already running, but in this case it is still in draft mode).
Notice the actual model serving configuration is already stored on the crea...
Hi HandsomeCrow5 hmm interesting use case,
we have seen html reports as artifacts, then you can press "download" and it should open in another tab, what would you expect on "debug samples" ?
ShaggyHare67 I'm just making sure I understand the setup:
First "manual" run of the base experiment. It creates an experiment in the system; you see all the hyperparameters under the General section.
trains-agent running on a machine.
The HPO example is executed with the above hyperparameters as optimization parameters.
HPO creates clones of the original experiment, with different configurations (verified in the UI).
trains-agent executes said experiments, and they are not completed.
But it seems the paramete...
All I see in the UI are 5 drafts
What's the status of these 5 experiments? draft ?
I am using the pipeline-from-tasks method and not pipeline-from-decorator.
Wait, I'm confused now. If this is a pipeline from Tasks, then the Tasks themselves should have clearml in the "installed packages", no? And if they do not, how were they created?
Hi @<1729309120315527168:profile|ShallowLion60>
Clearml in our case installed on k8s using helm chart (version: 7.11.0)
It should be done "automatically", I think there is a configuration var in the helm chart to configure that.
What URLs are you seeing now, and what should be there?
I think it would make sense to have one task per run to make the comparison on hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it? I want to make sure we solve this issue 🙂
It's more or less here:
https://github.com/allegroai/clearml-session/blob/0dc094c03dabc64b28dcc672b24644ec4151b64b/clearml_session/interactive_session_task.py#L431
I think that just replacing the package would be enough (I mean you could choose hub/lab, which makes sense to me)
Container environment setup overhead?
LOL, let me look into it. Could it be the calling file is somehow deleted?
BoredHedgehog47 you need to make sure "<path here>/train.py" also calls Task.init (again no need to worry about calling it twice with different project/name)
The Task.init call will make sure the auto-connect works.
BTW: if you do os.fork, then there is no need for the Task.init; the main difference is that Popen starts a whole new process, and we need to make sure the newly created process is auto-connected as well (i.e. calling Task.init)
pipe.start_locally() will run the DAG compute part on the same machine, whereas pipe.start() will start it on a remote worker (if it is not already running on a remote worker)
basically "pipe.start()" executed via an agent, will start the compute (no overhead)
does that help?
Sorry @<1524922424720625664:profile|TartLeopard58> 🙂 we probably missed it
clearml-session is still being developed 🙂
Which issue are you referring to?
Hi @<1653207659978952704:profile|LovelyStork78>
I have a docker container with all the dependencies.
Well, I think the main question is: are you using the clearml-agent to launch jobs/experiments? If you do, it makes sense to specify your docker as the "base docker image" (in the UI, look under the Execution tab, Container section).
This means the agent will use the pre-installed environment and will add anything that your Task needs on top of it, this of course includes pushing your codebase i...
Could it be these packages (i.e. numpy etc) are not installed as system packages in the docker (i.e. inside a venv, inside the docker) ?
Hi @<1729309131241689088:profile|MistyFly99>
notice that the files server needs to have an "address" that can be accessed from the browser; data is stored in a federated manner. This means your browser accesses the files server directly, not through the API server. I'm assuming the address is not valid?
MistakenBee55 how about a Task doing the Model quantization, then trigger it with TriggerScheduler ?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
Many thanks! I'll pass it on to the technical writers 🙂
GiganticTurtle0 we had this discussion in the wrong thread, I moved it here.
Moved from the wrong thread
Martin.B [1:55 PM]
GiganticTurtle0 the sample mock pipeline seems to be running perfectly on the latest code from GitHub, can you verify?
Martin.B [1:55 PM]
Spoke too soon, sorry 🙂 issue is reproducible, give me a minute here
Alejandro C [1:59 PM]
Oh, and which approach do you suggest to achieve the same goal (simultaneously running the same pipeline with differen...
I think there was an issue with the entire .ml domain name (at least for some dns providers)
You should have the metric :monitor:gpu with the variant gpu_0_utilization.
Since I see you have none of those, that points to no GPU driver...
Could that be it?
I'm not sure the files-server supports "continue" from last position...
I understand that it uses time in seconds when there is no report being logged... but it has already logged three times.
Hmm could it be the reporting started 3 min after the Task started ?
So are you saying the large file size download is the issue ? (i.e. network issues)
MagnificentSeaurchin79 you can delay it with:
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
It seems that I solved the problem by moving all of the local code (local repos) imports to after the Task.init
PunyPigeon71 I'm confused, how did that solve the issue on the remote machine?