UnevenDolphin73 are you positive, is this reproducible? What are you getting?
Hi DisgustedDove53
When you say "deployment" there are a lot of way to interpret that π what exactly are you looking for ?
BTW: if you feel like pushing forward with integration I'll be more than happy to help PRing new capabilities, even before the "official" release
Hi ContemplativeCockroach39
Assuming you wrap your model with a Flask app (or use any other serving solution), usually you need to:
1. Get the model
2. Add some metrics on runtime performance
3. Package it in a docker

Getting a pretrained model is straightforward once you know either the creating Task or the Model ID:
```python
from clearml import Task, Model

# get the model file from the Task's last output model
model_file_from_task = Task.get_task(task_id).models['output'][-1].get_local_copy()

# or get it directly by Model ID
model_file_from_model = Model(model_id=<model_id>).get_local_copy()
```
...
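Putting the pieces together, a minimal sketch of the Flask wrapper; the Model ID, input format, and port are placeholders, and it assumes the file was saved with torch.save(model, ...):

```python
from flask import Flask, request, jsonify
import torch
from clearml import Model

app = Flask(__name__)

# pull the pretrained model once, at startup
model = torch.load(Model(model_id='<model_id>').get_local_copy())
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    # expects {"inputs": [...]} in the request body
    data = torch.tensor(request.json['inputs'])
    with torch.no_grad():
        output = model(data)
    return jsonify(output.tolist())

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```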
RoundMosquito25 do notice the agent is pulling the code from the remote repo, so you do need to push your local commits, but ClearML will take care of the uncommitted changes for you. Make sense?
Do I need to have the repo that I am running on my account?
If it is a public repo, then no need; credentials are only needed for private repos 🙂
Am I missing something ?
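For private repos, a sketch of the relevant fields in the agent's clearml.conf (values are placeholders):

```
# ~/clearml.conf on the agent's machine -- only needed for private repos
agent {
    git_user: "my-git-username"  # placeholder
    git_pass: "my-git-token"     # placeholder; a personal access token works here
}
```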
What's the general pattern for running a pipeline - train a model, evaluate metrics, and publish the model if satisfactory (based on a threshold, for example)?
Basically I would do:

Parameters for the pipeline:
- TaskA = the training model Task (think of it as our template Task)
- Metric = title/series/sign we want to choose based on, where sign is max/min
- Project = the project to compare the performance in, so that we can decide to publish based on the best Metric

Pipeline:
- Clone TaskA
- Change TaskA argu...
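To make that concrete, here is a rough sketch of the pipeline logic; the task ID, queue name, metric title/series, and threshold are all placeholders:

```python
from clearml import Task

# clone the template training Task and enqueue it for an agent to run
template = Task.get_task(task_id='<TaskA_ID>')  # placeholder ID
cloned = Task.clone(source_task=template, name='pipeline training step')
Task.enqueue(cloned, queue_name='default')
cloned.wait_for_status()  # block until the training run finishes

# pull the metric we decide on (title/series as reported by the training Task)
scalars = cloned.get_last_scalar_metrics()
score = scalars['validation']['accuracy']['last']  # placeholder metric

# publish the model only if it beats the threshold
if score > 0.9:
    cloned.models['output'][-1].publish()
```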
Is there a way to do this all elegantly?
Oh yes there is, this is how TaskB's code will look:
```python
from clearml import Task
import torch

task = Task.init(..., 'task b')

# register TaskA's ID as a parameter, so it can be overridden when TaskB is cloned
param = {'TaskA': "TaskA's ID HERE"}
task.connect(param)

# grab TaskA's latest output model, load it, train, and save the new model
taska_model = Task.get_task(param['TaskA']).models['output'][-1]
model = torch.load(taska_model.get_local_copy())
# ... train ...
torch.save(model, 'modelb')
```
I might have missed something there, but generally speaking this will let you:
- Select TaskA as a parameter of the TaskB training process
- Will automagically register Task A's...
Hi MinuteCamel2
Can I disable it from automatically uploading model checkpoints to ClearML servers?
Maybe this one can help :)
https://www.youtube.com/watch?v=etGjxOKG9lo
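If you prefer to disable it in code, one option (assuming the checkpoints come from a framework ClearML auto-logs, PyTorch here) is to turn off that framework's auto-logging in Task.init - a minimal sketch:

```python
from clearml import Task

# disable automatic logging/upload of PyTorch checkpoints;
# other frameworks can be toggled the same way
task = Task.init(
    project_name='examples',     # placeholder
    task_name='no auto upload',  # placeholder
    auto_connect_frameworks={'pytorch': False},
)
```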
deleted all of the models from my ClearML project but I still receive this message. Do you know why?
It might take a few hours to update... 🙂
but it is still not able to run any task after I abort and rerun another task
When you "run" a task you are pushing it to a queue, so how come a queue is empty? what happens after you push your newly cloned task to the queue ?
BTW: you should probably update the server, you're missing out on a lot of cool features 🙂
I can see the shape is [136, 64, 80, 80]. Is that correct?
Yes that's correct. As for the name, just try input__0
Notice you also need to convert it to TorchScript
The image is allegroai/clearml:1.0.2-108
Yep, that makes sense, seems like a backwards compatibility issue
Hi @<1724235687256920064:profile|LonelyFly9>
So, I noticed that with the REST API, at least the /tasks.get_all endpoint appears to have an undocumented maximum page size of 500.
Yeah otherwise the request size might be too big, but you have pagination:
page (integer, optional): Page number, returns a specific page out of the resulting list of tasks. Minimum value: 0.
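For example, a sketch of paging through all tasks with the SDK's APIClient (500 per page, per the limit above):

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
page = 0
all_tasks = []
while True:
    # request one page at a time; 500 is the maximum page size
    tasks = client.tasks.get_all(page=page, page_size=500)
    if not tasks:
        break
    all_tasks.extend(tasks)
    page += 1
```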
I think the part that is missing for me is the context; in other words, how would one configure the execution_plan, and why would they configure it in a specific way?
My intuition, without fully understanding it, is that for some reason the internal DAG/decision is exposed to the user, and it feels like too much information. Basically I have a hunch that users should not need such a deep understanding to control the flow, and they should end up with an abstraction on top of it. ...
@<1560074028276781056:profile|HealthyDove84> if you want you can PR a fix, it should be very simple, basically:
```python
elif np_dtype == str:
    return "STRING"
elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
    return "BYTES"
return None
```
(also could you make sure all posts regarding the same question are put in the thread of the first post to the channel?)
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Hi RobustRat47
My guess is it's something from converting the PyTorch code to TorchScript. I'm getting this error when trying the...
I think you are correct see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
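For reference, a minimal sketch of such a conversion; the model and input shape here are placeholders:

```python
import torch
import torch.nn as nn

# placeholder model standing in for the trained network
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

# trace the model with a representative input and save it as TorchScript
example_input = torch.randn(1, 1, 28, 28)
traced = torch.jit.trace(model, example_input)
# Triton's PyTorch backend expects <model_repository>/<model_name>/1/model.pt
traced.save('model.pt')
```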
RobustRat47 what's the Triton container you are using?
BTW, the Triton error is:
```
model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.
```
https://github.com/triton-inference-server/server/issues/3877
RobustRat47 are you saying updating the nvidia drivers solved the issue?
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility 🙂
Please do 🙂
SubstantialElk6 if you call Task.init with continue_last_task=<task_id> it will automatically add the last_iteration of the previous run to any logging/report, so you never overwrite the previous reports 🙂
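A minimal sketch of that (the previous task's ID is a placeholder):

```python
from clearml import Task

# resume reporting on top of a previous run instead of starting fresh
task = Task.init(
    project_name='examples',    # placeholder
    task_name='continued run',  # placeholder
    continue_last_task='<previous_task_id>',
)
# new scalars/plots now continue from the previous run's last_iteration,
# rather than overwriting it
```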
hmm DeliciousKoala34
what are you getting if you put this at the top of your code (the one you are running in the remote docker)?
```python
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith("CLEARML_")])
```