Ah, sorry, I should have been more specific. I mean on the ClearML server.
AgitatedDove14 Yea, I also had this problem: https://github.com/allegroai/clearml-server/issues/87 I have a Samsung 970 Pro 2TB on all machines, but maybe something is misconfigured, like SuccessfulKoala55 suggested. I will take a look. Thank you for now!
Perfect, works! I was looking for "host", didn't come to my mind to search for "worker". Any idea about getting the user that created the task?
Specific step in the pipeline. The main step (the experiment) is currently just a file with a Task.init call followed by the experiment code. I am wondering how to modify this code so that it can be run either in the pipeline or standalone.
Thank you. I am still having the issue. I verified that output_uri of Task.init works and that clearml-data with MinIO storage works, but the logger still throws errors.
Installed packages:
` # Python 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
absl-py==0.12.0
aiostream==0.4.2
attrs==20.3.0
cached-property==1.5.2
cffi==1.14.5
chardet==4.0.0
clearml==0.17.5
cython==0.29.22
dm-control==0.0.364896371
dm-env==1.4
dm-tree==0.1.5
fasteners==0.16
furl==2.1.0
future==0.18.2
glfw==2.1.0
gym==0.18.0
h5py==3.2.1
humanfriendly==9.1
idna==2.10
imageio-ffmpeg==0.4.3
importlib-metadata==3.7.3
jsonschema==3.2.0
labmaze==1.0.4
lxml==4.6.3
moviepy==1.0.3
mujoco-py==...
Both, actually. So what I personally would find intuitive is something like this:
` class Task:
    def load_statedict(self, state_dict):
        pass

    async def synchronize(self):
        ...

    async def task_execute_remotely(self):
        await self.synchronize()
        ...

    def add_requirement(self, requirement):
        ...

    @classmethod
    async def init(cls, task_name):
        task = Task()
        task.load_statedict(await Task.load_or_create(task_name))
        await tas...
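To make the idea above concrete, here is a minimal runnable sketch of that async-style interface. It is not the real clearml API: `SketchTask`, its in-memory `_store`, and `load_or_create` are hypothetical stand-ins, and because the original snippet is cut off after loading the state, the call to `synchronize()` at the end of `init` is only an assumption.

```python
import asyncio


class SketchTask:
    """Hypothetical stand-in for the proposed interface, not clearml.Task."""

    _store = {}  # in-memory stand-in for the server-side task state

    def __init__(self):
        self.name = None
        self.requirements = []
        self.synced = False

    def load_statedict(self, state_dict):
        self.name = state_dict["name"]
        self.requirements = list(state_dict.get("requirements", []))

    def add_requirement(self, requirement):
        # Purely local change; nothing is sent until synchronize() is awaited.
        self.requirements.append(requirement)
        self.synced = False

    async def synchronize(self):
        await asyncio.sleep(0)  # placeholder for a network round trip
        SketchTask._store[self.name] = {
            "name": self.name,
            "requirements": list(self.requirements),
        }
        self.synced = True

    @classmethod
    async def load_or_create(cls, task_name):
        await asyncio.sleep(0)  # placeholder for a network round trip
        return cls._store.setdefault(
            task_name, {"name": task_name, "requirements": []}
        )

    @classmethod
    async def init(cls, task_name):
        task = cls()
        task.load_statedict(await cls.load_or_create(task_name))
        await task.synchronize()  # assumption: the original is truncated here
        return task


async def demo():
    task = await SketchTask.init("my_experiment")
    task.add_requirement("gym==0.18.0")
    await task.synchronize()
    return task
```

Running `asyncio.run(demo())` returns a task whose local changes have been pushed explicitly, which is the point of the proposal: every step that would talk to the server is a visible `await`.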
AlertBlackbird30 Thanks for asking. Just take everything with a grain of salt, I'd say, because I am also not sure whether I do machine learning the correct way 😄
I think you got the right idea. I actually do reinforcement learning (RL), so I have multiple RL-environments and RL-agents. However, while the code for the agents differs between the agents, the glue code is the same. So what I do is I call python run_experiment.py --agent myproject.agents.my_agent --environm...
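A rough sketch of how shared glue code can resolve the `--agent` argument from the command above into an importable module. The helper names `parse_agent_arg` and `load_agent_module` are hypothetical, and since the rest of the command is truncated, any other flags are simply passed over rather than guessed at.

```python
import argparse
import importlib


def parse_agent_arg(argv):
    """Return the dotted module path given via --agent."""
    parser = argparse.ArgumentParser(description="Shared glue code for all agents")
    parser.add_argument("--agent", required=True,
                        help="dotted module path, e.g. myproject.agents.my_agent")
    # parse_known_args ignores flags this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args.agent


def load_agent_module(dotted_path):
    """Import the module that contains the agent-specific code."""
    return importlib.import_module(dotted_path)
```

For example, `load_agent_module(parse_agent_arg(["--agent", "myproject.agents.my_agent"]))` would import that module, provided it is on the Python path; what the module has to expose is up to the glue code.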
I have no idea whether it is a user error or because of the clearml-server update...
I mean if I do CLEARML_DOCKER_IMAGE=my_image clearml-task something something it will not work, right?
My clearml-server crashed for some reason, so I won't be able to verify until tomorrow.
Oh, you are right. I did not think this through... To implement this properly it gets too enterprisey for me, so I'll just leave it for now :D
At least when you use docker containers the agent will reuse the existing python environment.
I was wondering whether some solution is built into clearml, so I do not have to configure each server manually. However, from your answer I take it that this is not the case.
Yea, something like this seems to be the best solution.
I have no idea myself, but what the serverfault thread says about man-in-the-middle attacks makes sense. However, this also rules out an automatic solution, except for a shared known_hosts file, I guess.
Latest version for everything. I will message you again if I encounter this problem again.
It is not explained there, but do you mean CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-} and CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}?
If you compare the two outputs I put at the top of this thread, one being the output when executed locally and the other the output when executed remotely, it seems like the command is different and wrong on remote.
` docker-compose ps
Name                     Command                          State        Ports
clearml-agent-services   /usr/agent/entrypoint.sh         Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap ...   Up           0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
AgitatedDove14 Thank you, that explains it.
I mean, could my hard drive not become full at some point? Can clearml-agent currently detect this?
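Independent of whether clearml-agent checks this itself, a manual guard is easy to sketch with the standard library. The function names and the 10% threshold below are arbitrary assumptions for illustration, not anything clearml provides.

```python
import shutil


def free_fraction(path="."):
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # guard against unusual filesystems reporting size 0
        return 0.0
    return usage.free / usage.total


def has_enough_space(path=".", min_free_fraction=0.1):
    """True if at least `min_free_fraction` of the disk is free."""
    return free_fraction(path) >= min_free_fraction
```

A wrapper script could call `has_enough_space("/path/to/cache")` before starting a new experiment and skip or alert when it returns False.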
For me this does not work (at least with nested tqdm bars, did not try single ones yet).
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.