Reputation
Hi RobustRat47
My guess is it's something from converting the PyTorch code to TorchScript.
I'm getting this error when trying the
I think you are correct, see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
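For reference, the conversion itself is only a couple of lines. The sketch below uses a stand-in network rather than the actual MNIST model from the linked example, and the output file name is an assumption:

```python
import torch
import torch.nn as nn

# Stand-in for the trained MNIST network from the example
# (the real architecture lives in train_pytorch_mnist.py).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))

model = TinyNet().eval()

# Convert to TorchScript via tracing: Triton's PyTorch backend serves a
# serialized TorchScript file, not a pickled nn.Module. torch.jit.script(model)
# is the alternative when forward() has data-dependent control flow.
example_input = torch.randn(1, 1, 28, 28)
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")  # file name assumed; place it in your Triton model repository
```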
Hi BattyLion34
I might have a solution, in order to make sure the two agents are not sharing the "temp" folder:
create two copies of ~/clearml.conf, let's call them:
~/clearml_service.conf
~/clearml_agent.conf
Then in each one select a different venvs_dir, see here:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L90
for example:
~/.clearml/venvs-builds1
~/.clearml/venvs-builds2
Now start the two agents with:
The service age...
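For illustration, the only difference between the two config files would be the venvs_dir setting (paths taken from the example above; the launch command itself is truncated in the original message, so this only shows the config side):

```
# ~/clearml_service.conf
agent {
    venvs_dir: ~/.clearml/venvs-builds1
}

# ~/clearml_agent.conf
agent {
    venvs_dir: ~/.clearml/venvs-builds2
}
```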
BattyLion34 let me see if I understand.
The same base_task_id, when cloned via the UI and enqueued on the same queue as the pipeline, works, but when the pipeline runs the same Task it fails?!
Could it be that you enqueue them on different queues ?
BattyLion34
Maybe something inside the task is different?!
Could you run these lines and send me the result:
from clearml import Task
print(Task.get_task(task_id='failing task id').export_task())
print(Task.get_task(task_id='working task id').export_task())
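If eyeballing the two export_task() dumps is tedious, a small recursive diff can surface the differing fields. This is just a sketch: it assumes export_task() returns plain nested dicts, and the task ids in the commented usage are placeholders:

```python
def diff_dicts(a, b, path=""):
    """Recursively yield (path, value_a, value_b) for keys that differ."""
    for key in sorted(set(a) | set(b)):
        sub_path = f"{path}.{key}" if path else str(key)
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from diff_dicts(va, vb, sub_path)
        elif va != vb:
            yield sub_path, va, vb

# Usage sketch (task ids are placeholders):
# from clearml import Task
# failing = Task.get_task(task_id='failing task id').export_task()
# working = Task.get_task(task_id='working task id').export_task()
# for p, va, vb in diff_dicts(failing, working):
#     print(f"{p}: {va!r} != {vb!r}")
```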
Hi @<1533257411639382016:profile|RobustRat47>
sorry for the delay,
Hi when we try and sign up a user with github.
wait, where are you getting this link?
That should spin up an instance, right? (it currently doesn't, and I'm not sure where to debug)
Do you see the AWS scaler Task running ?
(This is the code/process that actually spins a new EC2 instance)
Hi JuicyDog96
The easiest way is:
from trains.backend_api.session.client import APIClient
client = APIClient()
client.projects.get_all()
You can just run it from a python console and check what you are getting.
Full API is https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
BattyLion34 I have a theory, I think that any Task on the "default" queue will fail if a Task is running on the "service" queue.
Could you create a toy Task that just prints ".", sleeps for 5 seconds, and then prints again?
Then while that Task is running, from the UI launch the Task that passed on the "default" queue. If my theory holds it should fail, then we will be getting somewhere 🙂
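A minimal version of that toy Task could look like this. The Task.init call is commented out and the project/task names are just placeholders:

```python
import time

def toy_loop(iterations=5, delay=5):
    # Print a dot, sleep, and repeat - just enough work to keep
    # the Task "running" while we test the two queues.
    ticks = 0
    for _ in range(iterations):
        print(".")
        time.sleep(delay)
        ticks += 1
    return ticks

if __name__ == "__main__":
    # from clearml import Task
    # task = Task.init(project_name="debug", task_name="toy sleeper")
    toy_loop()
```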
Sorry, I meant the point where you select the interpreter for PyCharm.
Oh I see...
ResponsiveCamel97
could you attach the full log?
ElegantCoyote26 could you upgrade the docker-compose ?
First that is awesome to hear PanickyFish98 !
Can you send the full exception? You might be on to something...
2. Actually we thought of it, but could not find a use case, can you expand?
3. I'm not sure I follow, do you mean you expect the first execution to happen immediately?
That makes no sense to me?!
Are you absolutely sure the nntraining task is executed on the same queue? (Basically, could it be that nntraining is executed on a different queue in these two cases?)
This will fix it, the issue is the "no default value" that breaks the casting:
@PipelineDecorator.component(cache=False)
def step_one(my_arg=""):
Hmm that is odd, let me see if I can reproduce it.
What's the clearml version you are using ?
BattyLion34
if I simply clone nntraining stage and run it in default queue - everything goes fine.
When you compare the Task you clone manually and the Task created by the pipeline , what's the difference ?
No, I mean actually compare using the UI, maybe the arguments are different or the "installed packages"
Any updates on trigger and schedule docs
I think examples are already pushed, docs still in progress.
BTW: pipeline v2 examples are also out:
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
https://github.com/allegroai/clearml/blob/master/examples/pipeline/full_custom_pipeline.py
And the agent section on this machine is:
api_server:
web_server:
files_server:
Is that correct?
MelancholyBeetle72 thanks! I'll see if we could release an RC with a fix soon, for you to test :)
So does that mean "origin" solves the issue ?
@<1523715429694967808:profile|ThickCrow29> this is odd... how did you create the pipeline? can you provide code sample?
How can I make it such that any update to the upstream database
What do you mean "upstream database"?
Hi PricklyJellyfish35
My apologies this thread was forgotten 😞
What's the current status with OmegaConf? (I'm not sure I understand what you mean by resolve=False.)
I want to run only that sub-dag on all historical data in ad-hoc manner
But wouldn't that be covered by the caching mechanism ?
We are working hard on release 1.7 once that is out we will push an RC for review (I hope) 🙂
Are you inheriting from their docker file ?