This is odd. I was running the example code from:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
It is stored inside a repo, but the steps that are created (i.e. when checking the Task created for each step) do not have any repo linked to them.
What's the difference ?
ElegantCoyote26 could be, if the Task run is under 30sec?!
BoredHedgehog47 you need to configure the clearml k8s glue to spin up pods (instead of statically allocating agents per pod). Does that make sense?
worker nodes are bare metal and they are not in k8s yet
By default the agent will use 10022 as an initial starting port for running the sshd that will be mapped into the container. This has nothing to do with the Host machine's sshd. (I'm assuming agent running in docker mode)
Can we somehow choose the pool of ports clearml-session will use?
Yes, I think you can.
How do you spin the worker nodes? Is it Kubernetes ?
EnviousPanda91 'connect' will log the object properties, the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values, do you see these as well?
EnviousPanda91 so which frameworks are being missed? Is it a request to support a new framework, or are you saying there is a bug somewhere?
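For reference, the automagic logging is controlled per framework at Task.init; a minimal sketch (the framework names and values below are just examples):
from clearml import Task

# disable automagic logging for selected frameworks, keep the rest enabled
task = Task.init(
    project_name='examples',
    task_name='framework logging demo',
    auto_connect_frameworks={'matplotlib': False, 'tensorboard': True},
)

# 'connect' explicitly logs/tracks an object's properties
params = {'lr': 0.001, 'batch_size': 32}
task.connect(params)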
oh that makes sense.
I would add to your Task's docker startup script the following:
ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa
Let's see what you get
FranticCormorant35
See here https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py#L42
I'm not sure this is configurable from the outside.
BoredHedgehog47 I tried changing the order of imports on the sample code I shared before, it worked in both cases ...
JitteryCoyote63 yes this is very odd, seems like a pypi flop ?!
On the website they do say there is 0.5.0 ... I do not get it
https://pypi.org/project/pytorch3d/#history
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
You should have a download button when you hover over the table, I guess that would be the easiest.
If needed I can send SDK code, but unfortunately there is no single call for that.
EnviousStarfish54
Oh, this is a bit different from my expectation. I thought I could use artifacts for dataset or model version control.
You totally can use artifacts as a way to version data (actually we will have it built in in the next versions)
Getting an artifact programmatically:
Task.get_task(task_id='aabb').artifacts['artifactname'].get()
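A slightly fuller sketch, where the task id, artifact name and file path are placeholders:
from clearml import Task

# on the producing task: upload an artifact
task = Task.init(project_name='examples', task_name='artifact demo')
task.upload_artifact(name='train_data', artifact_object='data/train.csv')

# from any other process: fetch it back
producer = Task.get_task(task_id='aabb')
local_path = producer.artifacts['train_data'].get_local_copy()  # download to the local cache
obj = producer.artifacts['train_data'].get()                    # or deserialize the object directly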
Models are logged automatically. No need to log manually
But this is not different from not using clearml-data?
ReassuredTiger98 just making sure we are on the same page: clearml-data is immutable by design, the user cannot change the content of the dataset (it is actually compressed and uploaded). If you want to change it, you create a new child version.
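For example, creating a new child version with the SDK looks roughly like this (project/dataset names and paths are placeholders):
from clearml import Dataset

# get the existing (immutable) dataset and create a child version of it
parent = Dataset.get(dataset_project='examples', dataset_name='my_dataset')
child = Dataset.create(
    dataset_project='examples',
    dataset_name='my_dataset',
    parent_datasets=[parent.id],
)
child.add_files('data/new_files/')  # only the diff vs the parent is stored
child.upload()
child.finalize()                    # the new version becomes immutable as well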
each of them gets pushed as a separate Model entity, right?
Correct
But there's only one unique model, with multiple different versions of it
Do you see multiple lines in the Model repository? (Every line is an entity.) Basically, if you store it under the same local file it will override the model entry (i.e. reuse it and update the file itself); otherwise you are creating a new model, and the "version" will be its progression in time?
Well, it should work. Make sure the Task "holds" all the information needed (under the execution tab): repo / uncommitted changes / python packages etc.
Then configure your agent (choose pip/conda/poetry as the package manager), and spin it up (by default in venv/conda mode, or in docker mode)
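For example, spinning it up in docker mode listening on the "default" queue would be something along these lines (the queue name is just an example):
clearml-agent daemon --queue default --docker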
Should work.
Hi @<1541954607595393024:profile|BattyCrocodile47>
Does clearML have a good story for offline/batch inference in production?
Not sure I follow, you mean like a case study ?
Triggering:
We'd want to be able to trigger a batch inference:
- (rarely) on a schedule
- (often) via a trigger in an event-based system, like maybe from an AWS Lambda function
(2) Yes, there is a great API for that, check out the GitHub Actions example, it is essentially the same idea (RestAPI also available) ...
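A hedged sketch of what such a trigger (e.g. inside the Lambda) could do with the SDK, cloning a pre-made template Task and enqueuing it (the project/task/queue names are placeholders):
from clearml import Task

# clone a pre-created "batch inference" template Task and push it to an execution queue
template = Task.get_task(project_name='inference', task_name='batch-inference-template')
job = Task.clone(source_task=template, name='batch inference run')
Task.enqueue(job, queue_name='inference')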
DeliciousSeal67
are we talking about the agent failing to install the package ?
You can set torch to be installed last:
post_packages: ["horovod", "torch"]
This will make sure the trains-agent installs the torch version (the one you specified in the "installed packages") last.
Hi RipeGoose2
You can also report_table them? what do you think?
https://github.com/allegroai/clearml/blob/master/examples/reporting/pandas_reporting.py
https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/clearml/logger.py#L277
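A minimal report_table sketch (project/task names are placeholders):
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='table reporting')
df = pd.DataFrame({'epoch': [1, 2, 3], 'loss': [0.9, 0.5, 0.3]})

# the dataframe shows up as a table under the task's PLOTS section
task.get_logger().report_table(title='training summary', series='losses', iteration=0, table_plot=df)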
JitteryCoyote63 so now everything works as expected ?
The RC you can see on the main readme (for some reason the Conda badge will show the RC and the PyPi one won't):
https://github.com/allegroai/clearml/
Hi DeliciousBluewhale87
You can achieve the same results programmatically with Task.create
https://github.com/allegroai/clearml/blob/d531b508cbe4f460fac71b4a9a1701086e7b6329/clearml/task.py#L619
Basically use the template, we will deprecate the override option soon.
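Something roughly like this (the repo/script/packages/queue values are placeholders):
from clearml import Task

# create a Task directly from a repository + script, then enqueue it for an agent to execute
task = Task.create(
    project_name='examples',
    task_name='created from repo',
    repo='https://github.com/allegroai/clearml.git',
    branch='master',
    script='examples/reporting/pandas_reporting.py',
    packages=['clearml', 'pandas'],
)
Task.enqueue(task, queue_name='default')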
One way to circumvent this btw would be to also add/use the --python flag for virtualenv
Notice that when creating the venv, the cmd that is used is basically pythonx.y -m virtualenv ...
By definition this will create a new venv based on the python interpreter that executes virtualenv.
With all that said, it might be that there is a bug in virtualenv, and in some cases it does not adhere to this restriction