Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
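For instance, a minimal sketch of that pattern (queue, project, and repo names here are placeholders): you enqueue tasks onto a named queue, and a single agent running on the EC2 instance pulls them one after the other:

from clearml import Task

# create a task from an existing repo/script (hypothetical names)
task = Task.create(
    project_name="examples",
    task_name="packed-job",
    repo="https://github.com/your-org/your-repo.git",
    script="train.py",
)
# push it onto the queue that the single EC2 instance is serving
Task.enqueue(task, queue_name="single-ec2-instance")

# on the instance itself, one agent serves that queue, e.g.:
#   clearml-agent daemon --queue single-ec2-instance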
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
I will take any suggestion 🙂
git remote -v
could be a good start but I'm not familiar with the output structure, is there a template for parsing ?
By default the agent will add the root of the git repository into the PYTHONPATH, so that you can import...
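To illustrate (hypothetical repo layout), having the repo root on the PYTHONPATH means sibling packages resolve without any sys.path tweaking:

# my_repo/
#   utils/helpers.py
#   experiments/train.py   <- the task's entry point
# because the repo root is on PYTHONPATH, inside experiments/train.py:
from utils.helpers import load_config  # hypothetical module/function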
I think CostlyOstrich36 managed to reproduce?!
So basically development on a "shared" GPU?
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders and clearml takes care of cache cleanup
That said, notice that get_mutable_local_copy takes a target folder you specify; in this case you should definitely delete it after usage. Wdyt ?
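A minimal sketch of that cleanup pattern, assuming an existing dataset (project, name, and target folder are placeholders):

import shutil
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my-dataset")
# the target folder is yours, i.e. NOT managed by the clearml cache:
local_copy = ds.get_mutable_local_copy(target_folder="/tmp/my_dataset_copy")
try:
    ...  # process/modify the files
finally:
    shutil.rmtree(local_copy)  # so you clean up the mutable copy yourself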
I managed to set up my (Windows) laptop as a worker and reproduce the issue.
Any insight on how we can reproduce the issue?
PompousParrot44
Check out the task.execute_remotely()
You can call it right after the task init, and it will enqueue your running Task, and leave the process (if you want).
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/task.py#L1437
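A minimal sketch of the pattern (the queue name is a placeholder):

from clearml import Task

task = Task.init(project_name="examples", task_name="remote-run")
# everything above runs locally (capturing repo, env, params);
# this call enqueues the task and exits the local process:
task.execute_remotely(queue_name="default", exit_process=True)
# from here on, the code only runs on the agent that pulled the task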
SubstantialElk6 if you call Task.init with continue_last_task=<task_id> it will automatically add the last_iteration of the previous run to any logging/report, so you never overwrite the previous reports 🙂
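A minimal sketch, with the previous run's ID as a placeholder:

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="resumed-training",
    continue_last_task="<previous_task_id>",  # placeholder task ID
)
# iteration counters continue from the previous run's last_iteration,
# so new scalars/plots are appended instead of overwriting old ones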
Hi @<1562610699555835904:profile|VirtuousHedgehong97>
I think you need to upgrade your self-hosted clearml-server, could that be the case?
Omg that's a lot of submodules!
It has nothing to do with what the task sees. If you are inside a git repo, you will have to clone it on the remote machine. Let me check in the code, maybe you have a workaround.
We just don't want to pollute the server when debugging.
Why not ?
you can always remove it later (with Task.delete) ?
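Something like this sketch (the task ID is a placeholder):

from clearml import Task

debug_task = Task.get_task(task_id="<debug_task_id>")
debug_task.delete()  # removes the task from the server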
I know about clearml.conf but wanted to avoid ssh-ing through 50 instances to edit it.
LOL yeah, btw: this is exactly the reason the enterprise version has a vault feature, so one could edit the base configuration in the UI and it automatically propagates everywhere
but docker_arguments doesn't propagate if I leave docker_image as None
yeah, that's correct, you have to select a container to be used
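For example, a minimal sketch setting both together (the image and arguments here are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="docker-run")
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_arguments="--ipc=host -e MY_VAR=1",
)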
No, it should be fine... Let me see if I can get a windows box 🙂
How are you getting:
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
Is this what you had on the original manual execution? (i.e. not the one executed by the agent) - you can also look under the "org_pip" dropdown in the "installed packages" of the failed Task
Is there any documentation on versioning for Datasets?
You mean how to select the version name ?
Could it be someone deleted the file? This is inside the temp venv folder, but it should not get there.
For example, the Task object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
This is a very good point (the current documentation is basically the docstrings, but we should create a structured one)
... but some visualization/inline code with explanation is also very much welcome.
I'm assuming this is connected with the previous po...
single task in the DAG is an entire ClearML pipeline.
Just making sure details are not lost, "entire ClearML pipeline": the pipeline logic is process A running on machine AA.
Every step of that pipeline can be (1) a subprocess, but that means the exact same environment is used for everything, or (2) the DEFAULT behavior, where each step B is running on a different machine BB.
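A minimal sketch of the two modes (project and step names are placeholders):

from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0")
pipe.add_step(name="step_b", base_task_project="examples", base_task_name="step B")

# (1) everything as local subprocesses, one shared environment:
pipe.start_locally(run_pipeline_steps_locally=True)

# (2) the default: each step is enqueued and picked up by an agent
#     on a (potentially) different machine:
# pipe.start()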
The non-ClearML steps would orchestrate putting messages into a queue, doing retry logic, and tr...
Hi GreasyPenguin66
So the way clearml can store your notebook is by using the jupyter-notebook REST API. It assumes that it can communicate with it, as the kernel is running on the same machine. What exactly is the setup? Is the jupyter-lab/notebook running inside the docker? Maybe the docker itself is running with some --network argument ?
how did you try to restart them ?
Yes, but how did you restart the agent on the remote machine ?
Since I'm assuming there is no actual task to run, and you do not need to setup the environment (is that correct?)
you can do:
$ CLEARML_OFFLINE_MODE=1 python3 my_main.py
wdyt?
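A minimal sketch of that round-trip (paths are placeholders):

from clearml import Task

Task.set_offline(offline_mode=True)  # same effect as CLEARML_OFFLINE_MODE=1
task = Task.init(project_name="examples", task_name="offline-run")
# ... my_main.py logic; everything is recorded into a local session zip ...
task.close()

# later, on a machine that can reach the server:
# Task.import_offline_session("/path/to/offline_session.zip")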
Hi VexedCat68
can you supply more details on the issue ? (probably the best is to open a github issue, and have all the details there, so we have better visibility)
wdyt?
It's the same but done from outside, you want the same and "offline" as well right?
Failed to initialize NVML: Unknown Error
yeah this is a driver issue. I think you need to check the VM image to see if the drivers match the GPU on that machine
Hi ColossalAnt7, I think we ran into it on a few dockers; I believe the bug was fixed in the latest trains-agent RC. Could you verify please ?
Well it should work out of the box as long as you have the full route, i.e. Section/param
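A minimal sketch of addressing a value by its full route (the names here are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="params-demo")
params = {"param": 42}
task.connect(params, name="Section")     # appears in the UI as Section/param
task.set_parameter("Section/param", 13)  # same value addressed by full route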
PYTHONPATH is still not working as expected
inside your code, if you do:
import os
print("PYTHONPATH", os.environ["PYTHONPATH"])
what are you getting?