I think sometimes there can be dependencies that require a newer pip version or something like that. I am not sure though. Why can we even change the pip version in the clearml.conf?
Tried to install cudatoolkit==11.1 manually in this environment and got:
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Package xz conflicts for:
python=3....
Ah, sorry, I should have been more specific. I mean on the ClearML server.
AgitatedDove14 Yea, I also had this problem: https://github.com/allegroai/clearml-server/issues/87 I have Samsung 970 Pro 2TB on all machines, but maybe something is misconfigured like SuccessfulKoala55 suggested. I will take a look. Thank you for now!
Perfect, works! I was looking for "host"; it didn't occur to me to search for "worker". Any idea about getting the user that created the task?
Specific step in the pipeline. The main step (the experiment) is currently just a file with a Task.init call and then the experiment code. I am wondering how to modify this code such that it can be run in the pipeline or as standalone.
Thank you. I am still having the issue. I verified that output_uri of Task.init works and also clearml-data with MinIO storage works, but the logger still throws errors
Installed packages:
` # Python 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
absl-py==0.12.0
aiostream==0.4.2
attrs==20.3.0
cached-property==1.5.2
cffi==1.14.5
chardet==4.0.0
clearml==0.17.5
cython==0.29.22
dm-control==0.0.364896371
dm-env==1.4
dm-tree==0.1.5
fasteners==0.16
furl==2.1.0
future==0.18.2
glfw==2.1.0
gym==0.18.0
h5py==3.2.1
humanfriendly==9.1
idna==2.10
imageio-ffmpeg==0.4.3
importlib-metadata==3.7.3
jsonschema==3.2.0
labmaze==1.0.4
lxml==4.6.3
moviepy==1.0.3
mujoco-py==...
Both, actually. So what I personally would find intuitive is something like this:
` class Task:
    def load_statedict(self, state_dict):
        pass

    async def synchronize(self):
        ...

    async def task_execute_remotely(self):
        await self.synchronize()
        ...

    def add_requirement(self, requirement):
        ...

    @classmethod
    async def init(cls, task_name):
        task = cls()
        task.load_statedict(await cls.load_or_create(task_name))
        await tas...
AlertBlackbird30 Thanks for asking. Just take everything with a grain of salt, I'd say, because I am also not sure whether I do machine learning the correct way 😄
I think you got the right idea. I actually do reinforcement learning (RL), so I have multiple RL-environments and RL-agents. However, while the code for the agents differs between the agents, the glue code is the same. So what I do is I call python run_experiment.py --agent myproject.agents.my_agent --environm...
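A minimal sketch of how such shared glue code could resolve the agent from the command line, assuming the value passed to --agent is a dotted Python path. The helper names `load_agent`, `parse_args` and `main` are hypothetical and not taken from the original script, and only the --agent flag from the post is modelled:

```python
import argparse
import importlib


def load_agent(dotted_path):
    """Resolve a dotted path to either a module or an attribute of a module."""
    try:
        # Case 1: the whole path names an importable module.
        return importlib.import_module(dotted_path)
    except ModuleNotFoundError:
        # Case 2: the last component is a class/function inside a module.
        module_name, _, attr_name = dotted_path.rpartition(".")
        if not module_name:
            raise
        return getattr(importlib.import_module(module_name), attr_name)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Glue code shared by all agents")
    parser.add_argument("--agent", required=True,
                        help="dotted path, e.g. myproject.agents.my_agent")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    agent = load_agent(args.agent)
    # The experiment loop would go here; it is the same for every agent.
    return agent
```

Calling `main(["--agent", "myproject.agents.my_agent"])` would then hand the same experiment loop whichever agent was named on the command line. Note that the fallback also triggers when an import inside the target module fails, so a real script may want a stricter check.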
I have no idea whether it is a user error or because of the clearml-server update...
I mean if I do CLEARML_DOCKER_IMAGE=my_image clearml-task something something it will not work, right?
My clearml-server crashed for some reason, so I won't be able to verify until tomorrow.
It is weird though. The task is submitted by the original user and then run on the agent. The task however is still registered by the original user, since it is created by the original user.
Makes more sense to just inherit the user from the task than from the agent?
Oh you are right. I did not think this through... To implement this properly it gets too enterprisey for me, so I'll just leave it for now :D
At least when you use docker containers the agent will reuse the existing python environment.
I was wondering whether some solution is built into ClearML, so I do not have to configure each server manually. However, from your answer I take it that this is not the case.
Yea, something like this seems to be the best solution.
I have no idea myself, but what the serverfault thread says about man-in-the-middle makes sense. However this also prohibits an automatic solution except for a shared known_hosts file I guess.
Latest version for everything. I will message you again, if I encounter this problem again.
It is not explained there, but do you mean CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-} and CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}?
If you compare the two outputs I put at the top of this thread, the one being the output if executed locally and the other one being the output if executed remotely, it seems like the command is different and wrong on remote.
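For context, the `${NAME:-default}` form substitutes the default when the variable is unset or empty, so `${CLEARML_API_ACCESS_KEY:-}` falls back to an empty string. A rough Python emulation of that rule, only to illustrate the semantics (the function `expand_defaults` is a made-up helper, and it handles only the `:-` form):

```python
import re

# Matches ${NAME:-default}; the default may be empty.
_PATTERN = re.compile(r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}")


def expand_defaults(text, env):
    """Replace ${NAME:-default} with env[NAME], or default if unset or empty."""
    def replace(match):
        value = env.get(match.group("name"))
        return value if value else match.group("default")
    return _PATTERN.sub(replace, text)


line = "CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}"
print(expand_defaults(line, {}))
print(expand_defaults(line, {"CLEARML_API_ACCESS_KEY": "my-key"}))
```

With an empty environment the value after the colon is blank; with the variable set (here to the placeholder `my-key`) it is passed through unchanged.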