Hi FierceHamster54
Are you saying the pipeline component is a standalone script?
If this is the case then you are correct, it should not need the repo; I think you can specify it in the decorator.
I think this might work 🤔
@PipelineDecorator.component(..., repo=False)
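A minimal sketch, assuming a clearml version where the component decorator accepts a repo argument:

from clearml.automation.controller import PipelineDecorator

# repo=False should mark the component as a standalone script,
# so the agent will not try to clone a repository for it
@PipelineDecorator.component(return_values=["result"], repo=False)
def standalone_step(value):
    return value * 2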
How can i find queue name
You can create as many as you like; the default one is called "default", but you can add new queues in the UI (go to the Workers & Queues page, then Queues, and click "+ New Queue").
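If you prefer doing it from code, a rough sketch using the APIClient (assuming valid credentials in clearml.conf; the queue name is arbitrary):

from clearml.backend_api.session.client import APIClient

client = APIClient()
# create a new execution queue, then list all queues to verify
client.queues.create(name="my_second_queue")
print([q.name for q in client.queues.get_all()])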
Hi CheekyAnt38
However now I would like to evaluate my machine learning model directly via api requests, directly over clearml. Is it possible?
This basically means serving the model, is this what you mean?
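If so, clearml-serving exposes a REST endpoint per model; a rough sketch of querying one (address, endpoint name and payload are hypothetical):

import requests

# hypothetical serving address and endpoint name
response = requests.post(
    "http://serving.example.com:8080/serve/my_model_endpoint",
    json={"x": [[1.0, 2.0, 3.0]]},
)
print(response.json())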
Thanks TrickyRaccoon92
I think it's about time we remove the survey link anyhow 🙂
I'll make sure it happens ...
"Updates a few seconds ago"
That just means that the process is not dead.
Yes that seemed to be stuck 🙂
Any chance you can verify with the RC version?
I'll try to dig into the commits, maybe I can come up with an explanation ...
I am using the pipeline from tasks method and not the pipeline from decorator method.
Wait, I'm confused now. If this is a pipeline from Tasks, then the Tasks themselves should have clearml in the "installed packages", no? And if they do not, how were they created?
Change add_missing_installed_packages to False here, and see if you end up with a git diff:
https://github.com/allegroai/clearml/blob/1f82b0c4010799be6157f5c845c7f6ac48e71c0c/clearml/backend_interface/task/populate.py#L158
Hi @<1524922424720625664:profile|TartLeopard58>
can't I embed scalars into Notion using the ClearML SDK?
I think that you need the hosted version for it (it needs some special CORS stuff on the server side to make it work)
Did you try in the clearml report? does that work?
Hi PompousBeetle71, Trains will log all the torch.save calls; I'm assuming they do not actually use it for the rest of the files in that folder.
If you'd like to share a code snippet, we could see if we could auto-magically log it. You could use artifacts and store the entire folder. It will zip it and upload it. Then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
` task.upload_artifact('transformer', './my_...
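Along these lines, a fuller sketch (project, task and folder names are illustrative):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload")
# passing a folder zips it and uploads the archive as a single artifact
task.upload_artifact("transformer", artifact_object="./my_transformer_folder")

# later, from another experiment, fetch and unpack a local copy
source_task = Task.get_task(project_name="examples", task_name="artifact upload")
local_path = source_task.artifacts["transformer"].get_local_copy()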
task.mark_completed()
You have that at the bottom of the script; never call it on your own process, as it will kill the actual process.
So what is going on: you are marking your own process for termination, then it terminates itself, leaving the interpreter, and this is the reason for the errors you are seeing.
The idea of mark_* is to mark an external Task, forcefully.
By just completing your process with exit code 0 (i.e. no error), the Task will be marked as completed anyhow; no need to call...
Oh I see
but now I'm confused: if this is from code, why aren't you copying the Pipeline ID from the UI?
regarding the query, it should be something like:
task_to_schedule = Task.get_task(project_name='MyProject/.pipelines/PipelineName', task_name='PipelineName')
does this work for multiple levels?
Yep 🙂
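For example, with nested project levels the same lookup pattern should work (names are illustrative):

from clearml import Task

# nested project levels are separated by "/"
task_to_schedule = Task.get_task(
    project_name="MyProject/SubProject/.pipelines/PipelineName",
    task_name="PipelineName",
)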
Hi JitteryCoyote63
Could it be a python mismatch ? can you send the full log?
BTW: when I do pip3.8 install pytorch3d== I get the following versions: pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0)
Hi PompousParrot44
Well, this kind of control is tricky. If you don't mind processes "fighting over CPU", you can just spin up two trains-agents in cpu-mode. It will work as long as they have different TRAINS_WORKER_NAMEs.
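For example, something along these lines (worker names are arbitrary, flags per the trains-agent CLI):

TRAINS_WORKER_NAME=cpu-worker-1 trains-agent daemon --queue default --cpu-only
TRAINS_WORKER_NAME=cpu-worker-2 trains-agent daemon --queue default --cpu-only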
The other option (might be a bit of an overkill) is to use K8s, which will set the CPU % for the entire agent.
What do you think?
Basically just change the helm yaml:
queue: my_second_queue_name_here
CheekyFox58 what do you have in the plots Tab?
hmmm I see...
It seems to miss the fact that your process does use the GPU.
Maybe it only happens later, that the GPU is used?
Does that make sense ?
CheerfulGorilla72 my guess is the Slack token does not have credentials for the private channel, could that be ?
Hello guys, I have 4 workers (2 in the default queue and 2 in the service queue, on the same machine)
Hi @<1526734437587357696:profile|ShaggySquirrel23>
I think what happens is that one agent is deleting its cfg file when it is done, but at least in theory each one should have its own cfg.
One last request: can you try with the agent's latest RC version, 1.5.3rc2?
By default the PyTorch Lightning Trainer will output everything to TensorBoard, which we automatically store. But verify that TensorBoard is installed.
RipeGoose2 you are not limited to the automagic
From anywhere in your code you can always do:
from trains import Logger
Logger.current_logger().report_plotly(...)
So you can add any manual reporting on top of the one generated by Lightning.
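For instance, a minimal sketch reporting a plotly figure manually (figure contents are illustrative):

import plotly.graph_objects as go
from trains import Logger

# build any plotly figure
fig = go.Figure(data=go.Scatter(y=[1, 4, 9, 16]))
# attach it to the experiment under a title/series at a given iteration
Logger.current_logger().report_plotly(title="manual plot", series="squares", iteration=0, figure=fig)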
Sounds good?
logger.report_scalar("loss-train", "train", iteration=0, value=100)
logger.report_scalar("loss-test", "test", iteration=0, value=200)
notice that the title of the graph is its unique id, so if you send scalars with the same "title" they will show on the same graph
I know that there is a possibility to set up some budget - for example, seconds of running after which optimization stops. But is there a possibility to specify a boolean condition for when work should stop?
RoundMosquito25 you mean when you reach a limit of loss<Threshold or something similar ?
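If that is the idea, there is no built-in boolean-condition hook that I know of, but a rough sketch of polling the optimizer from the controller and stopping it manually (metric names and threshold are illustrative):

import time
from clearml.automation import HyperParameterOptimizer

def stop_when_good_enough(an_optimizer: HyperParameterOptimizer, threshold: float = 0.05):
    # poll the best experiment and stop the optimization once the condition holds
    while True:
        top = an_optimizer.get_top_experiments(top_k=1)
        if top:
            metrics = top[0].get_last_scalar_metrics()
            loss = metrics.get("loss", {}).get("train", {}).get("last")
            if loss is not None and loss < threshold:
                an_optimizer.stop()
                break
        time.sleep(60)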
Hi @<1792364603552829440:profile|TestyBeetle31>
Yeah so sorry we finally changed the repository name:
None
Where is this broken link coming from? We will fix it (we are working on it, and some of the services do not auto-forward).
I notice that, in my Serving Service situated in the DevOps project, the "endpoints" section doesn't seem to get updated when I tag a new model with "released".
It takes it a few minutes (I think 5 min is the default) to update.
Notice that you need to add the model with
model auto-update --engine triton --endpoint "test_model_pytorch_auto" ...
Not with model add (if for some reason that does not work, please let me know).
No need to pass the model version (i.e. 1); you can ...
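For reference, a fuller (illustrative) form of the command; exact flags may vary by clearml-serving version:

clearml-serving --id <service_id> model auto-update --engine triton --endpoint "test_model_pytorch_auto" --name "train pytorch model" --project "serving examples" --max-versions 2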
GiddyTurkey39 what do you have in the Task itself
(i.e. git repo, uncommitted changes, installed packages)
JitteryCoyote63 no, you should not (unless you already have the Task.init call in your code); clearml-data adds the Task.init call at the beginning of the code in the entry point.
This means you should be able to get Task.current_task() and get back the object.
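i.e. something along these lines:

from clearml import Task

# inside the entry point executed by the agent, the initialized Task is available
task = Task.current_task()
print(task.id, task.name)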
What do you have under the "uncommitted changes" on the Task that was created?
UnevenDolphin73 clearml.config.get_remote_task_id() will return the Task ID, not the Task object. In order to get the automagic to work, one h...
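A minimal sketch of turning that ID into the Task object (get_remote_task_id is the function named above):

from clearml import Task
from clearml.config import get_remote_task_id

# get_remote_task_id() returns the ID string; fetch the Task object explicitly
task = Task.get_task(task_id=get_remote_task_id())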
Just to get the full picture, are we expecting to see the newly created step (aka eager execution) on the original pipeline (i.e. as part of the DAG visualization)?