Can you clone the git repository with the .ssh credentials on the host machine?
If so, can you do the same manually inside a docker container (i.e. spin up a container with the mount -v /home/hostuser/.ssh:/root/.ssh)?
Whoa, are you saying there's an autoscaler that doesn't use EC2 instances?...
Just to be clear, the ClearML Autoscaler (AWS) will spin instances up/down based on the jobs in the queue it is listening to (the EC2 instance type and configuration are fully configurable)
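A rough sketch of the scale up/down decision described above, for illustration only. The function `plan_instances` and its parameters are hypothetical and not part of the ClearML autoscaler; it assumes one job per instance and a configurable cap on the number of instances.

```python
def plan_instances(pending_jobs: int, idle_instances: int,
                   running_instances: int, max_instances: int) -> int:
    """Return how many instances to add (positive) or remove (negative).

    Hypothetical helper: assumes each instance handles one job at a time,
    and that running_instances counts idle instances as well.
    """
    if pending_jobs > idle_instances:
        # More queued jobs than idle workers: spin up, respecting the cap.
        wanted = pending_jobs - idle_instances
        room = max(0, max_instances - running_instances)
        return min(wanted, room)
    # Idle workers beyond what the queue needs can be spun down.
    return -(idle_instances - pending_jobs)
```

For example, three queued jobs with one idle worker out of two running (cap of four) would ask for two more instances, while an empty queue with two idle workers would release both.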
first try the current setup using pip, and if it fails, use poetry if poetry.lock exists
I guess the order here is not clear to me (the agent does the opposite). Why would you start with pip if you are using poetry?
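To make the two orderings concrete, here is a minimal sketch. The helper `installer_order` is hypothetical and is not the agent's actual code; `prefer_poetry=True` roughly corresponds to trying poetry first when a poetry.lock is present, and the fallback to pip is an assumption made for illustration.

```python
from pathlib import Path
from typing import List


def installer_order(repo_dir: str, prefer_poetry: bool = True) -> List[str]:
    """Return the package managers to try, in order (hypothetical helper)."""
    has_lock = (Path(repo_dir) / "poetry.lock").is_file()
    if not has_lock:
        # Without a lock file there is nothing for poetry to install from.
        return ["pip"]
    # With a lock file, the only question is which tool goes first.
    return ["poetry", "pip"] if prefer_poetry else ["pip", "poetry"]
```

Calling it with `prefer_poetry=False` gives the "pip first, then poetry" order from the quoted request.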
The other way around: "8011:8008"
This topic is about the issue with reporting a configuration that has a string with a backslash inside a tuple.
So the encoding itself is done YAML style, and based on your example \b has to be encoded to \\b because this is string encoding, like \n will become "new line"
Make sense ?
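A small illustration of the escaping point. This uses Python string literals and the standard `json` module as a stand-in for a double-quoted YAML-style string (an assumption; it is not ClearML's actual encoder), and the path value is made up.

```python
import json

raw = r"C:\build"          # raw literal: a backslash followed by the letter b
interpreted = "C:\build"   # here \b is interpreted as a backspace character

# Escape the backslash so it survives a round of string decoding.
escaped = raw.replace("\\", "\\\\")

# Decoding the unescaped text turns \b into a backspace ...
decoded_unescaped = json.loads('"' + raw + '"')
# ... while decoding the escaped text gives back the original characters.
decoded_escaped = json.loads('"' + escaped + '"')
```

Here `decoded_unescaped` equals `interpreted` (the backslash is gone), and `decoded_escaped` equals `raw`.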
So you mean 1.3.1 should fix this bug?
Yes, it should. See the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
So I'd like to use the command line argument in the first argparse, and then hide/delete/override it before running the second argparse.
Nice hack!
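One way to sketch this hack with the standard library is `parse_known_args`, which returns the arguments the first parser did not recognize so they can be handed to the second one. The argument names `--config-name` and `--lr` are hypothetical.

```python
import argparse


def parse_first(argv):
    """First parser: consumes --config-name and returns the leftover args."""
    first = argparse.ArgumentParser(add_help=False)
    first.add_argument("--config-name", default=None)
    known, remaining = first.parse_known_args(argv)
    return known, remaining


def parse_second(argv):
    """Second parser: never sees the argument consumed by the first one."""
    second = argparse.ArgumentParser()
    second.add_argument("--lr", type=float, default=0.1)
    return second.parse_args(argv)


known, remaining = parse_first(["--config-name", "exp1", "--lr", "0.01"])
args = parse_second(remaining)
```

If the second parser reads `sys.argv` directly (for example inside a third-party library), assigning `sys.argv[1:] = remaining` before it runs should have the same effect.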
Hi OddShrimp85
I think numpy 1.24.x is broken in a lot of places; we have noticed scikit breaks on it, as do TF and others 😞
I will make sure we fix this one
if you have an automation process, then you should have the Task object, no?
then you have task.id
What am I missing here?
Hi WhimsicalLion91
You can always explicitly send a value:
from trains import Logger
Logger.current_logger().report_scalar("title", "series", iteration=0, value=1337)
A full example can be found here:
https://github.com/allegroai/trains/blob/master/examples/reporting/scalar_reporting.py
Hi PompousParrot44
Could you send the "Installed Packages" list?
I think there is a bug in the current trains-agent (there is already a fix, but the RC is still not out),
where "package @ git+http" packages ignore the git+http link.
You can solve it manually by just editing the "Installed Packages" section (when the Task is in draft mode, the section becomes editable): remove the "package @" part and leave the "git+http" link.
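The manual edit amounts to the following string transformation. This is only a sketch: `strip_package_prefix` is a hypothetical helper, not part of trains-agent, and the repository URL in the usage note is made up.

```python
def strip_package_prefix(requirement: str) -> str:
    """Turn 'name @ git+https://...' into 'git+https://...'.

    Hypothetical helper; lines without a git link are returned unchanged.
    """
    name, sep, link = requirement.partition(" @ ")
    if sep and link.strip().startswith("git+"):
        return link.strip()
    return requirement
```

For instance, `mypkg @ git+https://github.com/user/mypkg.git` would become `git+https://github.com/user/mypkg.git`, while a plain `numpy==1.19.0` line is left as-is.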
SoggyBeetle95 the question is, where does clearml store these arguments, and the answer is on the Task object (from there the agent will take them and apply them to the docker execution). Now since all users see all the tasks, they also see these arguments. Wdyt?
CloudyHamster42
RC probably in a few days, but note that it will just remove the warnings, I still can't reproduce the double axis issue.
It would be helpful if you could send a small script to reproduce the problem.
Maybe this example code can help ? https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py
you can also specify additional packages on the decorator:
@PipelineDecorator.component(..., packages=["tqdm>=2.1", "scikit-learn"])
def step_one(...):
    # code here
Yes exactly like a Task (pipeline is a type of task)
'''
from clearml import Task
cloned_pipeline = Task.clone(source_task=pipeline_uid_here)
Task.enqueue(...)
'''
In my understanding requests still go through clearml-server which configuration I left
DefiantHippopotamus88 actually this is not correct.
clearml-server only acts as a control plane; no actual requests are routed to it. It is used to sync model state, stats etc., and is not part of the request processing flow itself.
curl: (56) Recv failure: Connection reset by peer
This actually indicates that port 9090 is not being listened on...
What's the final docker-compose you are usi...
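A quick way to check whether anything is listening on a port is a plain TCP connect, sketched below with the standard library. The helper name `is_port_open` is made up. One caveat, stated as an assumption: with docker port forwarding the host side may accept the connection even when nothing listens inside the container, so the check is most informative when run from inside the container itself.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False
```

For the case above, that would be something like `is_port_open("127.0.0.1", 9090)`.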
Right, if this is the case, then just use 'title/name 001'
it should be enough (I think this is how TB separates title/series or metric/variant)
They inherit from one another, so it does make sense. Also the add_tags is on the "main" Task and not the backend parent
StaleMole4 you are printing the values before Task.init had the chance to populate them.
Basically try moving the print after closing the Task (closing the Task waits for the async update)
Make sense ?
Let me know if I understand you correctly, the main goal is to control the model serving, and deploy to your K8s cluster, is that correct ?
BeefyHippopotamus73 are you saying that on a remote machine you cannot set AWS_PROFILE? Or is it that the clearml.conf is missing? (not sure I follow how / who spins up the remote machine)
BTW: I think it was fixed in the latest trains package as well as the clearml package