no, it is everything on my local machine
So, as I understand it, we need to write our own custom HyperParameterOptimizer, am I right?
Actually, I am still struggling with the problem of the agent running in Docker (see the messages starting at 10:54)
The agent works when I run it from a virtual environment, but it always gets stuck in the same place when I use Docker
more or less
What is interesting is that it works with the virtual environment setup
But it gets stuck at the same point when using Docker
clearml-agent daemon --docker --foreground --debug
usage: clearml-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
{execute,build,list,daemon,config,init} ...
clearml-agent: error: unrecognized arguments: --debug
CostlyOstrich36, have you maybe seen a case like mine before?
Ubuntu 21.10, to be precise
WARNING: You are using pip version 20.1.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faf9da78400>: Failed to establish a ...
hmm, this might be a problem....
btw, why do I need to provide my git username/password to run it if I run the agent locally?
Ok, I noticed something that might have been causing that. I didn't add the "agent" section to the config file...
Do I need to push the needed code to github if it needs to be cloned?
AgitatedDove14 do I need to have the repo that I am running in my own account? Even if it is a public repo, like the repo with your (clearml) examples:
SOURCE CODE
REPOSITORY
https://github.com/allegroai/clearml.git
BRANCH NAME
Latest in branch master
SCRIPT PATH
pytorch_matplotlib.py
WORKING DIRECTORY
examples/frameworks/pytorch
?
So there is no way to use the Agent without a remote repo (just using a local server not connected to the Internet), am I right?
There is a git repo 🙂 my question was to clarify whether I understood correctly. Thank you for the response :)
AgitatedDove14 how does the Agent know which git repo from my account to clone for execution?
Yes, it is a good reason 🙂
Do you maybe know a tool that measures that during execution (to avoid watching nvidia-smi during the whole training)?
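One possible way to avoid watching nvidia-smi by hand is to poll it from a small script and keep the peak value. The sketch below is only an illustration: it assumes nvidia-smi is on the PATH and that its CSV query output contains plain integers (some GPUs report "[N/A]", which this parser does not handle). The function names are hypothetical, not part of ClearML.

```python
import subprocess

# nvidia-smi query: one CSV line per GPU, "utilization %, memory used MiB"
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_sample(csv_text):
    """Parse nvidia-smi CSV text into [(util_percent, mem_used_mib), ...], one tuple per GPU."""
    samples = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        samples.append((int(util), int(mem)))
    return samples


def read_gpu_sample():
    """Run nvidia-smi once and return the parsed per-GPU sample (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_sample(out)


def peak_utilization(samples_over_time, gpu_index=0):
    """Return the highest utilization seen for one GPU across a list of samples."""
    return max(sample[gpu_index][0] for sample in samples_over_time)
```

Calling read_gpu_sample() in a loop with time.sleep() during training and passing the collected list to peak_utilization() would give a rough upper bound on what the task uses.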
So, suppose a task T uses 27% of the GPU; that means we can spawn 3 agents on this GPU (assuming we give them only task T). Does that make sense?
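The arithmetic behind that estimate can be written out as a rough sketch. It assumes utilization adds up linearly across concurrent tasks, which is only an approximation, and it optionally caps the result by GPU memory, since memory often runs out before compute does. The function name and the memory figures are hypothetical.

```python
def max_concurrent_tasks(util_percent, mem_used_mib=None, mem_total_mib=None):
    """Estimate how many copies of one task fit on a single GPU.

    util_percent: GPU utilization of one task (e.g. 27 for 27%).
    mem_used_mib / mem_total_mib: optional memory figures; if both are given,
    the estimate is also limited by how many copies fit in GPU memory.
    """
    if util_percent <= 0:
        raise ValueError("util_percent must be positive")
    by_compute = int(100 // util_percent)
    if mem_used_mib and mem_total_mib:
        by_memory = int(mem_total_mib // mem_used_mib)
        return min(by_compute, by_memory)
    return by_compute


print(max_concurrent_tasks(27))  # 100 // 27 gives 3
```

With 27% utilization the compute bound is 3, but if the same task used, say, 6000 MiB on a 16000 MiB card, the memory bound of 2 would be the limiting factor.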
there is no such option
SuccessfulKoala55 So, we have two problems:
Probably a minor one, but strange. We run a number of workers using the compose file attached in the .zip. We can do:
docker compose -f docker-compose-worker.yaml build
docker compose -f docker-compose-worker.yaml up
and in theory there should be 10 agents running, but frequently fewer than 10 are shown in the UI (for example, on the last run we got 3 of them). When we run htop, we can see 10 agents in our system. What is even more strange, those...
because when I run it normally, it differentiates workers based on the GPU each one is using