Thanks @<1523702652678967296:profile|DeliciousKoala34> I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, which is why it is re-downloading (I'll make sure the agent can sort it out; because this is Nvidia's version, in reality it should be a perfect match)
Are you sure you added the pytorch channel in clearml.conf ?
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L64
Hi @<1545216070686609408:profile|EnthusiasticCow4>
is there a way to get the date from the InputModel?
You should be able to with model._get_model_data()
But I think we should have it all exposed, wdyt?
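A minimal sketch of that call (note `_get_model_data()` is an internal API; the model id is a placeholder and the exact field carrying the date, e.g. `created`, is an assumption here):
```
from clearml import InputModel

model = InputModel(model_id="<your-model-id>")  # placeholder id
data = model._get_model_data()   # internal backend model data object
print(data.created)              # creation date field (assumed name)
```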
Hi @<1523701066867150848:profile|JitteryCoyote63>
RC is out,
pip3 install clearml-agent==1.5.3rc3
Then set pytorch_resolve: "direct" in your clearml.conf
Let me know if it worked
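For reference, the relevant clearml.conf section would look roughly like this (a sketch; assuming the key lives under the agent's package_manager settings in the RC):
```
agent {
    package_manager {
        # resolve the exact torch wheel link directly instead of via pip lookup
        pytorch_resolve: "direct"
    }
}
```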
Hi JitteryCoyote63 ,
I remember seeing something similar on our GitHub...
The error itself is pip failing to run "git clone"; it seems like a weird network connection error (TLS is the HTTPS security layer)
Quite hard for me to try this right now
👍
How do I reproduce it ?
- At its simplest, this could just mean checking that all of the steps and the pipeline itself have completed successfully (by checking their "Task status"). If a pipeline step ends with a "failed" status, an exception will be raised in the pipeline execution function; if the exception is not caught, the pipeline itself will also fail
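A quick sketch of both checks (the task id and the step call are illustrative):
```
from clearml import Task

# After the run: check the controller's status from anywhere
pipeline_task = Task.get_task(task_id="<pipeline-task-id>")
print(pipeline_task.get_status())  # e.g. "completed" or "failed"

# Inside the pipeline logic: a failed step raises, so catch it to continue
try:
    result = my_step()  # a pipeline component call (illustrative name)
except Exception as ex:
    print(f"step failed: {ex}")
```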
run pipeline_script.py which contains the pipeline code as decorators.
So in theory the following should actually work.
Let's assume you ...
Hi JuicyDog96
The easiest way is:
from trains.backend_api.session.client import APIClient
client = APIClient()
client.projects.get_all()
You can just run it from a python console and check what you are getting.
Full API is https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
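For example, iterating over the returned project entries (a sketch; field names follow the backend projects service):
```
from trains.backend_api.session.client import APIClient

client = APIClient()
for project in client.projects.get_all():
    print(project.id, project.name)  # each entry exposes the backend fields
```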
Hi GracefulDog98
Are argument parameters to the script not passed on to the workers, or am I missing something?
The arguments are passed directly when the code is executed (i.e. the argparser parse_args is called).
If the code fails, I'm assuming the argparse is called before clearml is imported, could that be the case ?
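In other words, the safe ordering is to import clearml (and call Task.init) before parse_args, roughly like this (a minimal sketch, names illustrative):
```
from argparse import ArgumentParser
from clearml import Task  # import clearml before parsing arguments

task = Task.init(project_name="examples", task_name="args demo")

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()  # agent-side overrides are injected here
```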
Hi PanickyLion56
Yep savefig also works, you can also do:
from clearml import Logger
Logger.current_logger().report_matplotlib_figure(
    title="My Plot Title", series="My Plot Series", iteration=10, figure=plt)
https://github.com/allegroai/clearml/blob/0c5d12b830987aa9bb8d44d81e92ff9198008f29/examples/frameworks/matplotlib/matplotlib_example.py#L25
none of my pipeline tasks are reporting these graphs, regardless of runtime. I guess this line would also fix that?
Same issue, that said, good point, maybe with pipeline we should somehow make that a default ?
Hi ReassuredTiger98
I do not want to create extra queues for this since this will not be able to properly distribute tasks.
Queues are the way to abstract different resources into "compute capabilities". They create a simple interface for users on the one hand, and allow you to control the compute on the other. Agents can listen to multiple queues with priority. This means an RTX agent can pull from an RTX queue, and if it is empty, it will pull from the "default" queue. Would that work for ...
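For example, listing the queues in priority order when starting the agent (queue names are illustrative):
clearml-agent daemon --queue rtx default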
a bit sad that there is no working integration with one of the leading time series framework...
You mean a series Darts reports? If it does report it, where does it do so? Are you suggesting we have Darts integration (which sounds like a good idea)?
I cannot modify an autoscaler currently running
Yes this is a known limitation, and I know they are working on fixing it for the next version
We basically have flask commands allowing to trigger specific behaviors. ...
Oh I see now, I suspect the issue is that the flask command is not executed from within the git project?!
Hmm I wonder, can you try with this line before?
Task._report_subprocess_enabled = False
frameworks = {
    'tensorboard': True,
    'pytorch': False
}
Task.init(...)
task.update({'script': {'version_num': 'my_new_commit_id'}})
This will update to a specific commit id, you can pass empty string '' to make the agent pull the latest from the branch
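Roughly like so (a sketch, assuming you already hold the Task object; the task id is a placeholder):
```
from clearml import Task

task = Task.get_task(task_id="<your-task-id>")
# pin to a specific commit, or pass '' to pull the latest from the branch
task.update({'script': {'version_num': 'my_new_commit_id'}})
```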
How about this one:
potential sources of slow down in the training code
Is there one?
GiganticTurtle0 we had this discussion in the wrong thread, I moved it here.
Moved from the wrong thread
Martin.B  [1:55 PM]
GiganticTurtle0  the sample mock pipeline seems to be running perfectly on the latest code from GitHub, can you verify ?
Martin.B  [1:55 PM]
Spoke too soon, sorry 🙂  issue is reproducible, give me a minute here
Alejandro C  [1:59 PM]
Oh, and which approach do you suggest to achieve the same goal (simultaneously running the same pipeline with differen...
You can always specify different clearml.conf files with --config-file 🙂
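For example, pointing the agent at an alternate config (the path is illustrative):
clearml-agent --config-file ~/clearml_alt.conf daemon --queue default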
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana. I
It will not delete it from Grafana; it means it's no longer collected. Make sense?
What's the clearml-server version ?
So inside the pipeline logic you can do Task.current_task().id
Or inside a component Task.current_task().parent
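In code, that would look roughly like this (a sketch based on the calls above):
```
from clearml import Task

# inside the pipeline logic function:
pipeline_task_id = Task.current_task().id

# inside a component (steps are child tasks of the controller):
controller_task_id = Task.current_task().parent
```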
Hmm so the SaaS service ? and when you delete (not archive) a Task it does not ask for S3 credentials when you select delete artifacts ?
single task in the DAG is an entire ClearML pipeline.
just making sure details are not lost, "entire ClearML pipeline." : the pipeline logic is process A running on machine AA.
Every step of that pipeline can be (1) a subprocess, but that means the exact same environment is used for everything, or (2) the DEFAULT behavior, where each step B is running on a different machine BB.
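As a sketch of the two modes with the decorator interface (queue/project names are illustrative):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(execution_queue="default")  # mode (2): step runs on another machine via an agent
def step(x: int) -> int:
    return x * 2

@PipelineDecorator.pipeline(name="demo", project="examples", version="1.0")
def logic():
    print(step(21))

if __name__ == "__main__":
    # mode (1): run all steps as local subprocesses, sharing one environment
    PipelineDecorator.run_locally()
    logic()
```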
The non-ClearML steps would orchestrate putting messages into a queue, doing retry logic, and tr...
Hey SarcasticSparrow10 see here 🙂
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#upgrading
Whoa, are you saying there's an autoscaler that doesn't use EC2 instances?...
Just to be clear, the ClearML Autoscaler (AWS) will spin instances up/down based on jobs in the queue it is listening to (the type of EC2 instances and their configuration is fully configurable)