MiniatureRobin9 , not the pipeline itself. And that's the last part I'm looking for.
Good point, any chance you want to PR this code snippet?
def add_tags(self, tags):
    # type: (Union[Sequence[str], str]) -> None
    """
    Add Tags to this pipeline. Old tags are not deleted.
    When executing a Pipeline remotely (i.e. launching the pipeline from the UI/enqueuing it), this method has no effect.
    :param tags: A li...
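For context, a minimal usage sketch assuming the proposed add_tags method lands on PipelineController (project/pipeline names below are just placeholders):

from clearml import PipelineController

# build the controller as usual (names are placeholders)
pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")
# append tags to the pipeline itself; existing tags are kept
pipe.add_tags(["nightly", "experiment"])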
SmallTurkey79 could you attach the full log of the Task?
also I would recommend "export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" (i.e. 1, not "true")
Usually binary env vars are 0/1
(I can see that the docs never mention it, I'll ask them to add that)
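As a tiny illustration of the 0/1 convention, e.g. setting the variable from Python before spawning the agent process (purely a sketch):

import os

# binary ClearML env vars expect "1"/"0", not "true"/"false";
# this only affects child processes spawned afterwards
os.environ["CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL"] = "1"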
Is the agent idle? Is it running something else?
What exactly do you get automatically on the "Installed Packages" (meaning the "my_package" line)?
in clearml.conf we could have:
azure.storage {
    max_connections = 10
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}
Then in AzureContainerConfigurations:
class AzureContainerConfigurations(object):
    def __init__(self, container_configs=None, max_connections=None):
        ...

    @classmethod
    def from_config(cls, configuration):
        ...
as I also noticed that uploads are sometimes slow, and I see here max_connections=2
Makes sense to me, please go ahead and add that as well (basically the same thing on _AzureBlobServiceStorageDriver.upload_object, and an additional variable on the AzureContainerConfigurations class).
Could you PR a tested draft? We will be able to take it from there
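To illustrate the idea (this is not ClearML internals, just a sketch assuming the azure-storage-blob v12 API, where the parallelism knob is called max_concurrency; the function name and arguments are made up):

from azure.storage.blob import BlobServiceClient

def upload_object_sketch(conn_str, container, blob_name, local_path, max_connections=10):
    # max_connections would come from the azure.storage section of clearml.conf
    blob = BlobServiceClient.from_connection_string(conn_str).get_blob_client(
        container=container, blob=blob_name
    )
    with open(local_path, "rb") as f:
        # max_concurrency controls how many parallel connections the SDK uses for the upload
        blob.upload_blob(f, overwrite=True, max_concurrency=max_connections)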
EnviousStarfish54
Can you check with the latest clearml from github?
pip install git+
Hi PunyGoose16,
I think the website is probably the easiest 🙂
https://clear.ml/contact-us/
I think they get back to you quite quickly
okay, this seems like a broken pip installation for python3.6
Can you verify it fails on another folder? (maybe it's a permissions thing; for example, if you run in docker mode the permissions will be root, as docker is creating those folders)
BTW: the above error is a mismatch between TF and the docker image: TF is looking for CUDA 10, and the docker contains CUDA 11
Hi PompousParrot44
Well this kind of control is tricky. If you don't mind processes "fighting over cpu" you can just spin two trains-agents in cpu-mode. It will work as long as they have different TRAINS_WORKER_NAME values.
The other option (might be a bit of overkill) is to use K8s, which will set the CPU % for the entire agent.
What do you think?
PompousParrot44 now that I think about it, you might be able to limit the CPU affinity, would that help?
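For what it's worth, a rough Linux-only sketch of the affinity idea (the core ids are arbitrary examples, and this is something the workload itself would do, not the agent):

import os

# pin the current process (e.g. the training script) to cores 0-3 so it
# cannot compete for the remaining cores (Linux only)
os.sched_setaffinity(0, {0, 1, 2, 3})
print(os.sched_getaffinity(0))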
the use case I have is to allow people from my team to run their workloads on a set of servers without stepping over each other..
So does that mean CPU-only workloads?
Also, are we worried about fairness? (i.e. someone "taking" all the CPU for themselves)
TenseOstrich47 it's based on a free "index", so the first index not in use will be captured. But if you remove agents, the order will change (e.g. if you take down worker #1, the next worker you spin will be #1 because that index is no longer taken)
Maybe failed pipelines with zero steps count as completed
zero steps counts as successful.
That said, how could it have zero steps if one of the steps failed? no?
it fails but with COMPLETED status
Which Task is marked "completed", the pipeline Task or the Step?
It fails during the add_step stage for the very first step, because task_overrides contains invalid keys
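For reference, a hedged sketch of the kind of call being described (project/task names and the override key are made up; invalid task_overrides keys would raise while the pipeline is being built, before any step actually runs):

from clearml import PipelineController

pipe = PipelineController(name="demo", project="examples", version="1.0.0")
pipe.add_step(
    name="stage_train",
    base_task_project="examples",
    base_task_name="train model",
    # keys must map to real fields of the cloned Task, e.g. "container.image"
    task_overrides={"container.image": "python:3.9"},
)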
I see, yes I guess it makes sense to mark the pipeline as Failed 🙂
Could you add a GitHub issue on this behavior, so we do not miss it?
ClumsyElephant70
Could it be the virtualenv package is not installed on the host machine?
(From the log it seems you are running in venv mode, is that correct?)
PompousParrot44 with pleasure. If during your search for a solution you come across something that solves it and might integrate with the agent, do not hesitate to suggest it :)
ClumsyElephant70
Can you manually run the same command? ['python3.6', '-m', 'virtualenv', '/home/user/.clearml/venvs-builds/3.6']
Basically:
python3.6 -m virtualenv /home/user/.clearml/venvs-builds/3.6
SmarmySeaurchin8 could you test with the latest RC?
pip install clearml==0.17.5rc2
Regarding the missing packages, you might want to test with:
force_analyze_entire_repo: false
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L162
Or if you have a full venv you'd like to store instead:
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L169
BTW:
What is the missed package?
Thanks! A few thoughts below 🙂
- not true — you can specify the image you want for each step
My apologies, looking at the release notes, it was added a while back and I had not noticed 😞
- re: role-based access control - see Outerbounds Platform that provides a layer of security and auth features required by enterprises
Role-based access meaning limiting access in Metaflow, i.e. specific users/groups can only access specific projects etc. ...
An upload of 11GB took around 20 hours which cannot be right.
That is very, very slow: about 152 KB/s (11 GB over 20 hours) ...
JitteryCoyote63 I think there is a ClearML logger, no?