` # Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name="examples", task_name="artifacts example")
task.set_base_docker(
    "my_docker",
    docker_arguments="--memory=60g --shm-size=60g -e NVIDIA_DRIVER_CAPABILITIES=all",
)
if not running_remotely():
    task.execute_remotely("docker", clone=False, exit_process=True)

timer = Timer()
with timer:
    # add and upload Numpy Object (stored as .npz file)
    task.upload_a...
Obviously a lot is missing from my examples. I just want to show that the user should be able to replicate Task.init easily, so it can be configured in every way, while still making use of the magic that ClearML provides for everything that does not differ from the default, comfortable way.
Mhhm, now conda env creation takes forever, probably because it is resolving conflicts. At least that is what happened when I tried to install my environment manually.
I just manually went into the docker container and ran python -m venv env --system-site-packages and activated the virtual env.
When I then run pip list, it correctly shows the preinstalled packages, including torch 1.12.0a0+2c916ef
Hi TimelyMouse69 Thank you for your answer.
I use 3.10.8 locally and 3.10.6 remotely. Everything is run in a docker container, locally and remotely on the docker-agent (exactly the same docker image).
Thank you for looking into the disappearing dev suffix. It seems like this would be the reason pip tries to install a stable version of 1.14, which only exists as a nightly build.
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary the agent prints before the task is executed, it shows true.
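For reference, this is the block I mean in clearml.conf (a sketch from memory, double-check against your own file):
` agent {
    package_manager {
        # if true, the task venv inherits the packages already installed in the docker image
        system_site_packages: false
    }
} `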
You mean I can add exactly what you wrote (--extra-index-url clearml torch == 1.14.0.dev20221205+cu117 torchvision == 0.15.0.dev20221205+cpu) to the installed packages section?
What I am trying to do is install this: torch == 1.14.0.dev20221205+cu117 torchvision == 0.15.0.dev20221205+cpu. Is this what you mean by a specific build?
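In other words, I would expect the installed packages section to end up looking roughly like this, one entry per line (the extra index URL is a placeholder, it was not spelled out above):
` # placeholder: whichever extra index URL the nightly wheels come from
--extra-index-url <pytorch-nightly-index-url>
clearml
torch==1.14.0.dev20221205+cu117
torchvision==0.15.0.dev20221205+cpu `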
Ah, okay, that's weird 🙂 Thank you for answering!
My code is in classes, indeed. But I have more than one model. Actually, all the things that people usually store in, for example, yaml or json configs, I store in Python files. And I do not want to statically import all the models/configs.
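To make that concrete, a minimal sketch of what I mean (module and class names are made up):
` # hypothetical sketch: configs live in Python modules and are imported by name at runtime
import importlib

def load_config(module_path: str):
    # e.g. module_path = "configs.resnet50" imports configs/resnet50.py
    module = importlib.import_module(module_path)
    return module.Config()  # assumes each config module defines a Config class `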
Sounds like a good hack, but not like a good solution 😄 But thank you anyways! 🙂
Another example of what I would expect:
` ### start_carla.py
def get_task():
    task = Task.init(project_name="examples", task_name="start-carla", task_type="application")
    # experiment is not run here. The experiment is only run when this is executed as standalone or on a clearml-agent.
    return task

def run_experiment(task):
    ...

# This task can also be run as standalone or run by a clearml-agent
if __name__ == "__main__":
    task = get_task()
    run_experiment(task)
run_pi...
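And, as a guess at how I would then reuse this from another script without duplicating the Task.init boilerplate (file and names are made up, not my actual code):
` ### run_all.py (hypothetical)
from start_carla import get_task, run_experiment

if __name__ == "__main__":
    # reuse the fully configured task from start_carla.py
    task = get_task()
    run_experiment(task) `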
Btw: I think Task.init is more confusing than Task.create and I would rather rename the former.
Or alternatively, I just saw that Task.create takes a requirements.txt as an argument. This would also be fine for me; however, I am not sure whether I should use Task.create?
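Something like this is what I have in mind (argument names from memory and the repo/script values are placeholders, so please double-check against the docs):
` task = Task.create(
    project_name="examples",
    task_name="start-carla",
    repo="https://github.com/my-user/my-repo",  # placeholder
    script="start_carla.py",
    requirements_file="requirements.txt",  # explicit requirements instead of auto-detection
) `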
However, this seems like a pretty edge case to me: why would you do that on a regular basis?
For me this is how I use ClearML as a tensorboard replacement: to start some debug runs before adding them to a clearml-agent queue. This seems like the most common use case to me, or am I misunderstanding ClearML?
Also, is max_workers about compression threads or upload threads or both?
Just tested it again. Here is my config:
https://gist.github.com/mctigger/086c5f8071a604605e9f7a172800b51d
In the Web UI under Configuration -> Hyper Parameters -> Environment I can see the following: MUJOCO_GL osmesa
Maybe the problem is that I do not start my docker containers as the root user, so 1001 is a mapping inside the docker to my actual user. Could it be that on the host the owner of your .ssh files is root?
It is only a single agent that is sending a single artifact. server-->agent is fast, but agent-->server is slow.
Thank you. I am not trying to use this option to speed up the setup. I have a package (the carla simulator PythonAPI) that has no pip support (only easy_install), so I am thinking about just installing it manually on the worker, so that tasks can assume carla is provided by the system.
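If I go that route, one way might be to put the install into the agent's container setup, e.g. via extra_docker_shell_script in clearml.conf (option name from memory, the egg path is a placeholder):
` agent {
    # shell commands executed inside the task's docker container at startup,
    # before the task environment is created
    extra_docker_shell_script: [
        "easy_install /path/to/carla/PythonAPI/carla/dist/carla-<version>.egg"
    ]
} `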
Unfortunately not. Quick question: is there caching happening somewhere besides .clearml? Does the boto3 driver create a cache?
mytask.get_logger().current_logger().set_default_upload_destination("s3://ip:9000/clearml") is what I do. Do you do the same?
Or there should be an early error for trying to run conda-based tasks on pip agents.
Afaik, clearml-agent will use existing installed packages if they satisfy the requirements.txt. E.g. pytorch >= 1.7 will only install PyTorch if the environment does not already provide some version of PyTorch greater than or equal to 1.7.
Thanks! I am fascinated by what you guys offer with clearml 🙂