Thanks @CostlyOstrich36, but doesn't the agent create and cache an environment from requirements.txt when running? I'm reproducing an old project that used to work like that, and my clearml.conf is also set up to work that way.
It's working now, thanks, that was the problem.
Just upgraded to clearml-agent==1.5.1 and I still get this error.
Hi @CostlyOstrich36, it is part of the repository. Do pipelines run differently than normal tasks? What I mean is, when I run a task it has a working directory; do pipelines also have one, or is their working directory the root of the repo?
Yes it does, thank you @CostlyOstrich36.
Hi @CostlyOstrich36, I am using the community server. What happens if I change to a self-hosted server?
I'm using TensorBoard to report everything, nothing special besides that.
Hi @AgitatedDove14,
That solved my problem, thank you. From my deep dive, I found the cause: there was a package called install that changed its name to pip-install, and its requirement was setuptools.
Thanks again for the help.
@SuccessfulKoala55 did I do something wrong?
Hi @CostlyOstrich36,
but how do I configure this if I'm not hosting the ClearML server?
Where can I find the services.conf file?
Solved it by using clearml.Task.current_task().id, but thank you.
Btw, in pipelines is there a way to get the pipeline's main task ID? For example, <step_name>.id gets me the step's ID, but I need the main pipeline task that's running all the tasks.
Thanks John, I read the one about the pip timeout. The problem is that I assume ClearML runs the following command:
"pip install -r requirements.txt", and I want to know whether I can make ClearML add the timeout flag.
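If the agent does call pip roughly as assumed above, there are two standard pip mechanisms for a longer timeout: the --timeout option on the command line, or the PIP_DEFAULT_TIMEOUT environment variable, which pip reads on every invocation and which helps when you cannot edit the command another tool runs. A minimal sketch, assuming a plain "python -m pip install -r requirements.txt" call; build_pip_install is a hypothetical helper, not part of ClearML, and the agent's real command line may differ:

```python
from __future__ import annotations

import os
import sys


def build_pip_install(requirements: str, timeout_s: int) -> tuple[list[str], dict[str, str]]:
    """Build (but do not run) a pip command and environment with a raised timeout.

    Hypothetical helper for illustration; ClearML's agent builds its own command.
    """
    # Option 1: pass pip's socket timeout explicitly on the command line.
    cmd = [sys.executable, "-m", "pip", "install",
           "--timeout", str(timeout_s), "-r", requirements]
    # Option 2: pip also reads PIP_DEFAULT_TIMEOUT from the environment,
    # so any pip call made with this environment gets the same timeout.
    env = dict(os.environ, PIP_DEFAULT_TIMEOUT=str(timeout_s))
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_pip_install("requirements.txt", 120)
    print(" ".join(cmd[1:]))
    print(env["PIP_DEFAULT_TIMEOUT"])
    # To actually install: subprocess.run(cmd, env=env, check=True)
```

The environment-variable route is the one that does not depend on how the agent assembles its pip command, as long as the variable is set in the environment the agent runs in.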
Sadly, the teammate who had the problem re-ran the experiments, so I don't have the task IDs, but I do have the CPU and GPU usage of the agent that ran the experiment:
And I'm looking at None as an example of a clearml.conf file, and I can't seem to find the sdk.development.worker.console_cr_flush_period flag.
I reviewed this example, and sadly there isn't anything about how to upload just a path as a string.
@SuccessfulKoala55 yes, the working dir is set to the correct path, and yet it cannot import the train module.
Thank you @CostlyOstrich36 and @AgitatedDove14. After that bit of information, can you tell me where I can find the differences between the community server and a self-hosted server?
Are there any additional downsides to migrating to a self-hosted server?
Oh, so in that case I'll need to change every agent's pip config file.
@CostlyOstrich36 my repo is structured like this, and both files are located in the same directory, so it's weird that they cannot find train:
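A per-agent pip config file can hold the timeout as well, since pip maps its long option names (without the leading dashes) to keys in the [global] section. A minimal sketch that renders such a file with configparser, assuming the pip used on each agent machine reads its user or global config (for example ~/.config/pip/pip.conf or /etc/pip.conf on Linux); the timeout value and the extra index URL are placeholders, not values from this thread:

```python
import configparser
import io
from typing import Optional


def render_pip_conf(timeout_s: int, extra_index_url: Optional[str] = None) -> str:
    """Render a pip config file whose [global] section applies to every pip command."""
    config = configparser.ConfigParser()
    # Keys are pip's long option names without the leading "--".
    config["global"] = {"timeout": str(timeout_s)}
    if extra_index_url:
        # Placeholder for a private package index; replace with your own URL.
        config["global"]["extra-index-url"] = extra_index_url
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical private index URL, for illustration only.
    print(render_pip_conf(120, "https://pypi.example.com/simple"))
```

Writing the rendered text to the chosen path on each agent is left out on purpose, since the right location depends on how each machine and its environments are set up.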
.
├── pytorch
├── tensorflow
│ ├── Project A
│ │ └── src
│ ├── Project B
│ │ ├── data
│ │ ├── model
│ │ ├── reports
│ │ └── utils
│ ├── hand_validator_boxes
│ │ ├── src
│   │   ├── train.py (the module I need)
│   │   └── clearml_pipeline.py (where the pipeline is initialized)
└── utils
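One common workaround for this kind of import failure, assuming the pipeline step ends up running with something other than hand_validator_boxes as its entry point or working directory, is to put the pipeline script's own directory on sys.path before importing train. A minimal sketch; add_script_dir_to_path is a hypothetical helper, and this is a generic Python technique, not a ClearML setting:

```python
import os
import sys


def add_script_dir_to_path(script_file: str) -> str:
    """Prepend the directory containing script_file to sys.path (once) and return it."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    # sys.path[0] is only the script's folder when that file is the entry point;
    # code launched from another entry point does not get it, so add it explicitly.
    if script_dir not in sys.path:
        sys.path.insert(0, script_dir)
    return script_dir


if __name__ == "__main__":
    # In clearml_pipeline.py this would be called with __file__ before "import train".
    # The path below is hypothetical, for illustration only.
    print(add_script_dir_to_path("/tmp/repo/tensorflow/hand_validator_boxes/clearml_pipeline.py"))
```

This only helps if the code that does the import still has a meaningful __file__ pointing into the repo checkout; if a step is serialized into a separate generated script, the import may need to happen inside the step function instead.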
OK cool, I'll try that, thanks.
When I tried doing it with the decorators, it threw an error saying it cannot run Task.init inside a running task (the pipeline's task).
@SuccessfulKoala55 But when I use this setting, the packages are downloaded only from the torch repo and not from a local repo, correct? Or does it use the url-extra-link? And is there a way to turn off the automatic CUDA detection?
Thanks, I'll look into that, but in the end we decided to add a private repo with the PyTorch libraries that we need.
I've added extra_index_url to point to our HTTPS address, and we changed requirements.txt to look for that address; however, I'm getting the warning I've attached, and it's still trying to download the packages from somewhere other than my path.
How do I enable clearml-agent to look for private repos?
It's running an agent without Docker; we aren't using Docker.
@SuccessfulKoala55, in the example file here there is no reference to console_cr_flush_period.
Yep, just a string that is a path, without uploading the folder.
No, until now we used the default server hosted by ClearML, and we want to move to a self-hosted one.
Yes, here is the log file.
We had a few experiments that were stuck for a few hours until we noticed, and one that was stuck for 2 days (over the weekend), and they weren't auto-aborted.
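Automatically aborting stuck runs boils down to a threshold check on how long a running task has gone without reporting anything. A minimal sketch of that idea in plain Python, assuming each running task exposes a last-update timestamp; RunningTask, find_stale_tasks, the task IDs and the 2-hour threshold are all hypothetical, and this is not how the ClearML server implements it. Actually aborting the returned tasks would still have to go through ClearML itself:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunningTask:
    task_id: str
    last_update: datetime  # last time the task reported anything (timezone-aware)


def find_stale_tasks(tasks: list[RunningTask], threshold: timedelta,
                     now: datetime | None = None) -> list[str]:
    """Return IDs of tasks that have been silent for longer than threshold."""
    now = now or datetime.now(timezone.utc)
    return [t.task_id for t in tasks if now - t.last_update > threshold]


if __name__ == "__main__":
    now = datetime(2024, 1, 8, 9, 0, tzinfo=timezone.utc)
    tasks = [
        RunningTask("hypothetical-a", now - timedelta(minutes=10)),
        RunningTask("hypothetical-b", now - timedelta(days=2)),  # silent over a weekend
    ]
    print(find_stale_tasks(tasks, threshold=timedelta(hours=2), now=now))
    # → ['hypothetical-b']
```

A threshold like this only catches runs that stop reporting entirely; a run that keeps logging while making no progress would need a different signal.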