If I understood correctly, if I tried to print(os.environ["MUJOCO_GL"])
after the clearml Task is created, this should be set?
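For reference, this is roughly what I have in mind (a minimal sketch; project and task names are placeholders):
```
import os

from clearml import Task

# Create/attach the ClearML task first, then check whether the variable from
# the task's Environment section is visible to the process.
task = Task.init(project_name="examples", task_name="env-check")
print(os.environ["MUJOCO_GL"])  # expectation: prints "osmesa" when run via the agent
```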
Just tested it again. Here is my config:
https://gist.github.com/mctigger/086c5f8071a604605e9f7a172800b51d
In the Web UI under Configuration -> Hyper Parameters -> Environment
I can see the following: `MUJOCO_GL = osmesa`
I forgot to add this:
Here is my error:
```
Traceback (most recent call last):
  File "src/run_gym.py", line 25, in <module>
    print(os.environ["MUJOCO_GL"])
  File "/home/tim/.clearml/venvs-builds/3.7/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'MUJOCO_GL'
```
This is at the top of my script.
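As a stopgap I could guard the lookup so the script does not crash when the variable is not injected (sketch; the fallback value is my assumption):
```
import os

# Avoid the KeyError by falling back to a default backend when the variable
# was not set by the agent/SDK (the fallback value "osmesa" is an assumption).
mujoco_gl = os.environ.get("MUJOCO_GL", "osmesa")
print(mujoco_gl)
```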
So the environment variables are not set by the clearml-agent, but by clearml itself
Thank you very much. I am going to try that.
Thank you, good to know!
(btw: the simulator is called carla, not clara :))
Actually, my current approach looks like this (see the sketch below):
carla-server-task: Launch a carla server instance on a random port, set the port as a task parameter, and then block the task/process so I can kill carla when this task is aborted. This task keeps running the whole time.
start-carla-task: Launch a carla-server-task and wait for its port parameter to be set. Set the launched carla-server-task's task ID and the port as parameters. Set the task to completed.
main-task: Run the experiment when all start-carla-task are...
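A rough sketch of that flow, assuming the ClearML SDK calls below behave as I expect (start_carla_server() is a hypothetical helper, and the project, queue, and parameter names are placeholders):
```
import random
import time

from clearml import Task


def run_carla_server():
    # carla-server-task: start carla on a random port, publish the port as a
    # task parameter, then block so aborting the task also kills carla.
    task = Task.init(project_name="carla", task_name="carla-server-task")
    port = random.randint(20000, 30000)
    server = start_carla_server(port)  # hypothetical helper returning a process handle
    task.set_parameter("General/port", str(port))
    try:
        while True:  # keep the task alive until it is aborted
            time.sleep(10)
    finally:
        server.kill()


def run_start_carla(server_template_id, queue="carla"):
    # start-carla-task: clone and enqueue a carla-server-task, wait for its
    # port parameter, then record the server task id and port on this task.
    task = Task.init(project_name="carla", task_name="start-carla-task")
    server_task = Task.clone(source_task=server_template_id)
    Task.enqueue(server_task, queue_name=queue)
    while not server_task.get_parameter("General/port"):
        time.sleep(5)
        server_task.reload()
    task.set_parameter("General/carla_task_id", server_task.id)
    task.set_parameter("General/carla_port", server_task.get_parameter("General/port"))
```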
@<1576381444509405184:profile|ManiacalLizard2> Yea, that makes sense. However, my problem is that I do not want to set it on the remote clearml-agent, since every user may have a different storage backend. E.g. one user pushes to Azure, while another one pushes to S3.
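What I would prefer is that each user sets the destination per task instead of per agent, along these lines (sketch; the bucket/container URLs are placeholders):
```
from clearml import Task

# Per-user upload destination instead of a global agent-side setting
# (the bucket URL below is a placeholder).
task = Task.init(
    project_name="examples",
    task_name="per-user-storage",
    output_uri="s3://my-team-bucket/experiments",  # or e.g. "azure://mycontainer/experiments"
)
```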
Yea, it was finished after 20 hours. Since the artifact only starts uploading when the experiment otherwise finishes, there is no reporting for the time during which it uploaded. I will debug it and report what I find out.
Yea, and the script ends with `clearml.Task - INFO - Waiting to finish uploads`
I see a `python3 fileserver.py` process running on a single thread with 100% load.
481.2130692792125 seconds
Done
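A sketch of how such an upload can be timed (the file path is a placeholder, and I am assuming wait_on_upload blocks until the upload finishes):
```
import time

from clearml import Task

# Time a single artifact upload to the fileserver (path is a placeholder).
task = Task.init(project_name="examples", task_name="upload-timing")

start = time.time()
task.upload_artifact(
    name="large-artifact",
    artifact_object="/path/to/large_file.bin",
    wait_on_upload=True,  # assumed to block until the upload completes
)
print(f"{time.time() - start} seconds")
print("Done")
```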
Yes, I did not change this part of the config.
I am wondering where to put my experiment logic so that it gets lazily executed and not at task definition time (i.e. how to get my experiment logic into get_task_experiment() without running it).
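To make the question more concrete, this is the kind of thing I mean, assuming a function step only executes its body when the pipeline step actually runs (names, paths, and queues are placeholders):
```
from clearml import PipelineController


def main_experiment(dataset_path):
    # The experiment/training code goes here; with a function step it should
    # only run when the step is executed, not when the pipeline is defined.
    return "model-id"


pipe = PipelineController(name="example-pipeline", project="examples", version="1.0")
pipe.add_function_step(
    name="main_task",
    function=main_experiment,
    function_kwargs=dict(dataset_path="/data/train"),  # placeholder
    function_return=["model_id"],
)
pipe.start(queue="default")
```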
Maybe let's put it in a different way:
Pipeline: Preprocess Task -> Main Task -> Postprocess Task
My main task is my experiment, so my training code. When I ran the main task standalone, I just used Task.init
and set up the project name, task name, etc.
Now what I could do is push this task to the server, then just reference the task by its task-ID and run the pipeline. However, I do not want to push the main task to the server before running. Instead I want to push the whole pipeline, but st...
Wow, thank you very much. And how would I bind my code to a task? Should I still use Task.init so that it just uses the file it is called in as the entrypoint, or should I create a task using Task.create and specify the script?
Another example of what I would expect:
```
### start_carla.py
from clearml import Task


def get_task():
    task = Task.init(project_name="examples", task_name="start-carla", task_type="application")
    # The experiment is not run here. The experiment is only run when this file
    # is executed standalone or by a clearml-agent.
    return task


def run_experiment(task):
    ...


# This task can also be run standalone or by a clearml-agent
if __name__ == "__main__":
    task = get_task()
    run_experiment(task)

run_pi...
```
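And roughly how I imagine wiring that script into a pipeline without executing it at definition time (a sketch; the Task.create arguments, repo URL, and queue names are assumptions on my side):
```
from clearml import Task, PipelineController

# Create a draft task from the script without running it locally
# (repo/branch/script values are placeholders).
carla_task = Task.create(
    project_name="examples",
    task_name="start-carla",
    repo="https://github.com/me/my-repo.git",
    branch="main",
    script="start_carla.py",
)

pipe = PipelineController(name="carla-pipeline", project="examples", version="1.0")
pipe.add_step(name="start_carla", base_task_id=carla_task.id, execution_queue="default")
pipe.start(queue="services")
```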
Also here is how I run my experiments right now, so I can execute them locally and remotely:
```
from clearml import Task

# Initialize ClearML Task
task = (
    Task.init(
        project_name="examples",
        task_name=args.name,
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)

# Execute remotely via ClearML
if enqueue is not None and not running_remotely():
    if enqueue == "None":
        queue_name = None
        task.reset()
        ...
```
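A possibly simpler variant of the same local/remote switch, assuming Task.execute_remotely fits this use case (the queue and argument names are placeholders):
```
import argparse

from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--name", default="my-experiment")
parser.add_argument("--enqueue", default=None, help="queue to run on remotely")
args = parser.parse_args()

task = Task.init(project_name="examples", task_name=args.name, output_uri=True)

# If a queue is given, stop local execution here and re-launch the task on an
# agent listening to that queue; otherwise keep running locally.
if args.enqueue:
    task.execute_remotely(queue_name=args.enqueue, exit_process=True)

# ... experiment code continues here (locally, or on the agent after the switch) ...
```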
I am also wondering how I integrate my (preexisting) main task in the pipeline. I start my main task like this: `python my_script.py --myarg "myargs"`. How are the arguments captured? I am very confused about how one integrates this correctly...
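My current understanding, written out as a sketch (names are placeholders): Task.init hooks into argparse, so arguments parsed in the script show up as hyperparameters and can be overridden when the task is cloned or enqueued.
```
import argparse

from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--myarg", default="myargs")

# Task.init patches argparse, so the parsed arguments are logged as
# hyperparameters of the task (and can be overridden on a clone).
task = Task.init(project_name="examples", task_name="my-script")
args = parser.parse_args()

print(args.myarg)
```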
Is there a way to capture uncommitted changes with Task.create, just like Task.init does? Actually, I would like to populate the repo, branch, and packages automatically...
@<1576381444509405184:profile|ManiacalLizard2> Maybe you are using the enterprise version with the vault? I suppose the enterprise version works differently, but I don't have experience with it.
For the open-source version, each clearml-agent is using its own clearml.conf
So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.
An upload of 11 GB took around 20 hours, which cannot be right. Do you have any idea whether ClearML could have something to do with this slow upload speed? If not, I am going to start debugging the hardware/network.
But it is not related to network speed, rather to clearml. A simple file transfer test gives me approximately 1 Gbit/s transfer rate between the server and the agent, which is what is to be expected from the 1 Gbit/s network.
Artifact Size: 74.62 MB