The one I posted on top: 22.03-py3
😄
btw: I am pretty sure this used to work, but then stopped working some time ago.
Maybe the difference is that I am using pip now and I used to use conda! The NVIDIA PyTorch container uses conda. Could that be the reason?
How can I see that?
I have venv_update.enabled: true and detect_with_conda_freeze: true
Good to know!
I think the current solutions are fine. I will try it first and probably will have some more questions/problems 🙂
test_clearml, so directly from top-level.
` # Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name="examples", task_name="artifacts example")
task.set_base_docker(
    "my_docker",
    docker_arguments="--memory=60g --shm-size=60g -e NVIDIA_DRIVER_CAPABILITIES=all",
)
if not running_remotely():
    task.execute_remotely("docker", clone=False, exit_process=True)
timer = Timer()
with timer:
    # add and upload Numpy Object (stored as .npz file)
    task.upload_a...
I think sometimes there can be dependencies that require a newer pip version or something like that. I am not sure though. Why can we even change the pip version in the clearml.conf?
I just manually went into the docker container, ran python -m venv env --system-site-packages and activated the virtual env.
When I then run pip list, it correctly shows the preinstalled packages, including torch 1.12.0a0+2c916ef
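For anyone who wants to reproduce that check from Python instead of the shell, here is a minimal sketch using the standard library venv module. The function name create_env_with_system_packages is made up for illustration, and I pass with_pip=False (unlike the plain command above) only so the sketch does not depend on ensurepip being available. It is not what clearml-agent does internally.

```python
import venv
from pathlib import Path


def create_env_with_system_packages(env_dir: Path) -> Path:
    """Create a venv that can also see the interpreter's preinstalled packages.

    Hypothetical helper: roughly `python -m venv <env_dir> --system-site-packages`,
    but without bootstrapping pip into the new environment.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=False)
    builder.create(env_dir)
    # pyvenv.cfg records whether system site-packages are visible in the venv
    return env_dir / "pyvenv.cfg"
```

Reading the returned pyvenv.cfg should show the include-system-site-packages setting, which is the part that decides whether the container's preinstalled torch is visible inside the environment.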
Here it is
I am going to try it again and send you the relevant part of the logs in a minute. Maybe I am interpreting something wrong.
` =============
== PyTorch ==
NVIDIA Release 22.03 (build 33569136)
PyTorch Version 1.12.0a0+2c916ef ...
Looking in indexes: ,
Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages (22.0.4)
2022-04-07 16:40:57
Looking in indexes: ,
Requirement already satisfied: Cython in /opt/conda/lib/python3.8/site-packages (0.29.28)
Looking in indexes: ,
Requirement already satisfied: numpy==1.22.3 in /opt/conda/...
I have a related question: I read here that 4GB is an HTTP limitation and that ClearML will not chunk single files. I take from that that there has been no need for ClearML to implement its own solution so far. But what about models that are larger than 4GB?
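Just to make concrete what chunking on the user side could look like, here is a rough sketch that uses only the standard library. The names split_file and join_files are invented for this example, this is not a ClearML feature, and I have not tried uploading the resulting parts as separate artifacts.

```python
from pathlib import Path
from typing import Iterable, List


def split_file(path: Path, chunk_bytes: int) -> List[Path]:
    """Write consecutive chunks of `path` next to it as <name>.part000, .part001, ..."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    parts: List[Path] = []
    with path.open("rb") as src:
        index = 0
        while True:
            # Note: each chunk is held in memory once; use a smaller
            # inner buffer if chunk_bytes is in the gigabyte range.
            data = src.read(chunk_bytes)
            if not data:
                break
            part = path.with_name(f"{path.name}.part{index:03d}")
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: Iterable[Path], target: Path) -> None:
    """Concatenate the parts, in the given order, back into one file."""
    with target.open("wb") as dst:
        for part in parts:
            dst.write(part.read_bytes())
```

Each part would stay below the chosen limit, and the consumer has to call join_files with the parts in the same order before loading the model.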
If you compare the two outputs I put at the top of this thread, one from local execution and the other from remote execution, it seems like command is different and wrong on remote.
Here is some code that shows exactly what goes wrong. I do local execution only. It seems not to be related to remote execution as I thought, but more related to clearml.Task:
` args = parser.parse_args()
print(args) # FIRST OUTPUT
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker
# Initialize ClearML Tas...
One last question, then I have everything solved: Is it possible to pass clearml the files to analyze manually? For example, my setup consists of a run_this.py, a this_should_be_run_A.py and a this_should_be_run_B.py. I can then programmatically choose which file to import with importlib. Is there a way to tell clearml programmatically to analyze the files, so it can build up the requirements correctly?
Or alternatively, I just saw that Task.create takes a requirements.txt as an argument. This would also be fine for me; however, I am not sure whether I should use Task.create?
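To illustrate the requirements.txt route without Task.create, here is a minimal sketch. parse_requirements and register_requirements are hypothetical helper names, and the parser only understands plain name==version lines. As far as I know, Task.add_requirements(package_name, package_version) exists in the clearml SDK and is meant to be called before Task.init, but please treat the exact behaviour as an assumption.

```python
import re
from typing import List, Optional, Tuple


def parse_requirements(text: str) -> List[Tuple[str, Optional[str]]]:
    """Turn simple requirement lines into (name, pinned_version_or_None) pairs."""
    pairs: List[Tuple[str, Optional[str]]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        match = re.match(r"^([A-Za-z0-9_.\-]+)\s*(?:==\s*(\S+))?", line)
        if match:
            # Only '==' pins are kept; other specifiers (>=, ~=) become None
            pairs.append((match.group(1), match.group(2)))
    return pairs


def register_requirements(pairs: List[Tuple[str, Optional[str]]]) -> None:
    """Hand the pairs to ClearML; assumed to be called before Task.init."""
    from clearml import Task  # lazy import so the parser works without clearml

    for name, version in pairs:
        Task.add_requirements(name, version)
```

The idea would be to read the requirements.txt that belongs to the chosen this_should_be_run_*.py, call register_requirements on the parsed pairs, and only then call Task.init.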
I think such an option can work, but actually if I had free wishes I would say that the clearml.Task code would need some refactoring (but I am not an experienced software engineer, so I could be totally wrong). It is not clear what Task.init does or how it does it, and the very long method declaration is confusing. I think there should be two ways to initialize tasks:
Specify a lot manually, e.g. ` task = Task.create()
task.add_requirements(from_requirements_files(..))
task.add_entr...
Mhhm, then maybe it is not clear 😂 to me how clearml.Task is meant to be used. I thought of it as being a container for all the information regarding a single experiment that is reflected on the server-side and by this in the WebUI. Now I init() a Task and it will show in the WebUI. I thought after initialization I can still update the task to my liking, i.e. it being a documentation of my experiment.
I think doing all that work is not worth it right now, I am just trying to understand why clearml seems not to be designed something like this:
` task_name = args.task_name
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
task.requirements.add(...)
await task.synchronize()
task.execute_remotely(queue_name, exit=True) `
Long story short, the Task requirements are async, so if one puts it after creating the object (at least in theory), it might be too late.
AgitatedDove14 Is there no await/synchronize method to wait for task update?
Then I could also do this:
` # My custom very special use case
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
await task.synchronize()
await run_code_analysis()
task.add_requirement("myreq")
await task.synchronize() `
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion 😄
Yes, I did not change this part of the config.
ManiacalLizard2 Thank you, but afaik this only works locally and not if you run your task on a clearml-agent!