Hi @<1523701205467926528:profile|AgitatedDove14> ,
That solved my problem, thank you. From my deep dive I found the cause: there was a package called install that changed its name to pip-install, and its requirement was setuptools.
Thanks again for the help.
@<1523701087100473344:profile|SuccessfulKoala55> After going into the step's full details, I reset the step and enqueued it.
@<1523701087100473344:profile|SuccessfulKoala55> Did I do something wrong?
Hi @<1523701070390366208:profile|CostlyOstrich36> ,
But how do I configure this if I'm not hosting the ClearML server?
Where can I find the services.conf file?
We had a few experiments that were stuck for a few hours until we noticed, and one that was stuck for 2 days (over the weekend). None of them were auto-aborted.
Sadly, the teammate who had the problem re-ran the experiments, so I don't have the task IDs, but I do have the CPU and GPU usage of the agent that ran the experiment:
Hi @<1523701070390366208:profile|CostlyOstrich36> , here is a better explanation of my situation. In my IDE, the working directory is where my code starts, and I import my custom augmentations from common_utils. Locally the code works with the import I added in my previous message. However, when I run it from the ClearML agent, the import from point a to point b doesn't work, even though both are in the same git repo, and I don't want to copy the files into project_1 as to not have unne...
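A common workaround for this kind of import problem is to put the repository root on `sys.path` from the entry script itself, so the import no longer depends on the working directory. The sketch below assumes the agent clones the whole repository and that `common_utils` sits one level above the script's folder; the helper name `add_repo_root_to_path` is hypothetical and not part of ClearML.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script_file: str, levels_up: int = 1) -> Path:
    """Prepend the directory `levels_up` levels above the script's folder to sys.path.

    With levels_up=1 and a script at <repo>/project_1/train.py, this adds <repo>.
    """
    repo_root = Path(script_file).resolve().parents[levels_up]
    root_str = str(repo_root)
    if root_str not in sys.path:
        sys.path.insert(0, root_str)
    return repo_root


# Intended usage at the top of the entry script (assumed layout, adjust levels_up):
#   add_repo_root_to_path(__file__)
#   from common_utils import ...  # now resolvable regardless of working directory
```

This keeps a single copy of the shared code in the repo, at the cost of a small path tweak in each entry script.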
It's working now, thanks. That was the problem.
Yeah, I see it now in the requirements of the task. That's weird. I'll create a new environment and check it again, thanks.
Just upgraded to clearml-agent==1.5.1 and I still get this error.
@<1523701087100473344:profile|SuccessfulKoala55> But when I use this setting, are the packages downloaded only from the torch repo and not from a local repo? Or does it use the url-extra-link? And is there a way to cancel the auto CUDA detection?
It’s running an agent without Docker; we aren’t using Docker.
Yes, same one
Thanks I'll look into that, but in the end we decided to add a private repo with the pytorch libraries that we need.
I've added the extra_index_url to point to our https and we changed the requirements.txt to look for that https. However, I'm getting the warning I've attached, and it's still trying to download the packages from somewhere other than my path.
How do I enable clearml-agent to look for private repos?
Yes, sometimes I suffer from small network issues. Is there a way to make ClearML use a bigger timeout when installing packages?
And if not, is there a way to point it to a local package for installation, or to a local virtual environment?
Oh so in that case I'll need to change every agent's pip config file.
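As an alternative to editing each pip config file, pip also reads its options from environment variables: `PIP_DEFAULT_TIMEOUT` corresponds to `--timeout` and `PIP_RETRIES` to `--retries`. The sketch below only builds such an environment; whether the agent's pip subprocess inherits it depends on how the agent is launched, which is an assumption here. The helper name `build_pip_env` is hypothetical.

```python
import os
from typing import Dict


def build_pip_env(timeout_s: int = 120, retries: int = 10) -> Dict[str, str]:
    """Return a copy of the current environment with pip timeout/retry overrides."""
    env = dict(os.environ)
    env["PIP_DEFAULT_TIMEOUT"] = str(timeout_s)  # same effect as `pip --timeout`
    env["PIP_RETRIES"] = str(retries)            # same effect as `pip --retries`
    return env


# Example (not executed here): run pip with the longer timeout.
#   import subprocess, sys
#   subprocess.run(
#       [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
#       env=build_pip_env(),
#       check=True,
#   )
```

Exporting the same two variables in the shell that starts the agent would be the equivalent without any Python, under the same inheritance assumption.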
Thanks John, I read the one about the pip timeout. The problem is that I assume ClearML runs the following command:
"pip install -r requirements.txt" and I want to know if I can make ClearML add the timeout flag.
@<1523701087100473344:profile|SuccessfulKoala55> yes the working dir is set to the correct path and yet it cannot import the train module
@<1523701087100473344:profile|SuccessfulKoala55> and @<1523701070390366208:profile|CostlyOstrich36> , in the end I found the problem: I was running the pipeline locally, and when the pipeline runs locally it doesn't copy the whole directory, only the script that is running.
@<1523701070390366208:profile|CostlyOstrich36> my repo looks like this, and both files are located in the same dir, so it's weird that they cannot find train:
.
├── pytorch
├── tensorflow
│ ├── Project A
│ │ └── src
│ ├── Project B
│ │ ├── data
│ │ ├── model
│ │ ├── reports
│ │ └── utils
│ ├── hand_validator_boxes
│ │ ├── src
│ │ ├── train.py (the module i need)
│ │ └── clearml_pipeline.py (where the pipeline is initialized)
└── utils
The flow is: Training.py (which creates and runs a training task) -> conversion_task.py (converts the outputs of the models into a format of our choosing) -> testing.py (testing the model after conversion).
I tried using the decorators and functions, but they both threw errors saying that I cannot call Task.init inside a running task.
Ok cool, I'll try that, Thanks
@<1523701087100473344:profile|SuccessfulKoala55> What I'm trying to do is connect 3 different tasks into 1 pipeline while still being able to run each task individually when needed, without changing the tasks' code. For example, I have a training.py file that calls Task.init at the start and creates a task on the server for training a new model, but I also want to create a pipeline that runs that training.py together with other tasks. Is that clearer now?
When I tried doing it with the decorators, it threw an error that it cannot run Task.init inside a working task (the pipeline's task).
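One way this is often approached is to build the pipeline from tasks that already exist on the server rather than from decorated functions: each script is run once on its own so that its Task.init registers a task, and the controller then clones those tasks as steps, so the scripts stay unchanged. The sketch below is a minimal illustration under that assumption; the project name and the three task names are hypothetical placeholders, and `add_steps` is a hypothetical helper that only forwards keyword arguments to `PipelineController.add_step`.

```python
from typing import Any, Dict, List

PROJECT = "my_project"  # hypothetical project name on the ClearML server

# One entry per existing task; `parents` defines the execution order.
STEPS: List[Dict[str, Any]] = [
    {"name": "training", "base_task_project": PROJECT, "base_task_name": "training"},
    {"name": "conversion", "parents": ["training"],
     "base_task_project": PROJECT, "base_task_name": "conversion"},
    {"name": "testing", "parents": ["conversion"],
     "base_task_project": PROJECT, "base_task_name": "testing"},
]


def add_steps(pipe: Any) -> None:
    """Register every step on a controller object exposing add_step(**kwargs)."""
    for step in STEPS:
        pipe.add_step(**step)


# Usage against a real server (assumes clearml is installed and configured):
#   from clearml import PipelineController
#   pipe = PipelineController(name="train-convert-test", project=PROJECT, version="1.0.0")
#   add_steps(pipe)
#   pipe.start()  # or pipe.start_locally() to run the controller on this machine
```

Because the steps are clones of existing tasks, each script can still be launched on its own exactly as before.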
Not sure what that means, to be honest @<1523701070390366208:profile|CostlyOstrich36>
Thanks @<1523701070390366208:profile|CostlyOstrich36> , but doesn’t the agent create and cache an environment from the requirements.txt when running? I’m reproducing an old project that used to work like that, and my clearml.conf is also set to work that way.
Hi @<1523701070390366208:profile|CostlyOstrich36> , it is part of the repository. Do pipelines run differently than normal tasks? What I mean is, when I run a task it has a working directory. Do pipelines also have that, or is their working directory the root of the repo?
@<1523701070390366208:profile|CostlyOstrich36> After discussing with my TL, we think the plan we are subscribed to might not be right for us. Can you point me to someone we can meet with who can advise us on the best plan for my team?