So if I understand correctly, something like this should work?
```
task = Task.init(...)
task.connect_configuration(
    {"agent.package_manager.system_site_packages": False}
)
task.execute_remotely(queue_name, clone=False, exit_process=True)
```
Exactly. I don't want people to circumvent the queue 🙂
But I do not have anything linked correctly, since I rely on conda installing CUDA/cuDNN for me.
You can add and remove clearml-agents to/from the clearml-server anytime.
So deleting from the client (e.g. a dataset with clearml-data) actually works.
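For reference, the SDK equivalent of the CLI delete looks roughly like this (the id is a placeholder):
```
from clearml import Dataset

# the id below is a placeholder; dataset_project/dataset_name also work
Dataset.delete(dataset_id="<dataset_id>")
```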
I am not sure what happened, but my experiments are gone. However, the data directory is still filled.
Here it is
There is no way to create an artifact/model/dataset without a task, right? Just always inherit from the parent task, and if cloned, change the user to the user who did the clone.
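i.e. something like this minimal sketch, where the artifact can only be attached through a Task (names and the file path are placeholders):
```
from clearml import Task

# project/task names and the file path are placeholders
task = Task.init(project_name="examples", task_name="artifact_demo")
task.upload_artifact(name="results", artifact_object="my_results.csv")
```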
(Just out of my own interest: how much does the enterprise version diverge from the open-source version? Is it just extended, or are there core changes in the enterprise version?)
Let me check again.
Maybe something like this is how it is intended to be used?
```
# run_with_clearml.py
from clearml import Task
from clearml.automation.controller import PipelineController

def get_main_task():
    task = Task.create(
        project_name="my_project",
        task_name="my_experiment",
        script="main_script.py",
    )
    return task

def run_standalone(task_factory):
    # queue name is a placeholder
    Task.enqueue(task_factory(), queue_name="default")

def run_in_pipeline(task_factory):
    pipe = PipelineController(...)
    pipe.add_step(preprocess, ...)
    pipe.add_step(base_task_factory=task_factory, ...)
    pipe.add_step(postprocess, ...)
    pipe.start()

if...
```
So can I implement this logic with pipeline decorators?
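Something like this is what I have in mind (a hedged sketch of the decorator API; names and step bodies are placeholders):
```
from clearml.automation.controller import PipelineDecorator

# step names, project and the bodies below are placeholders
@PipelineDecorator.component(return_values=["data"])
def preprocess():
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["result"])
def train(data):
    return sum(data)

@PipelineDecorator.pipeline(name="my_pipeline", project="my_project", version="0.1")
def run_pipeline():
    data = preprocess()
    result = train(data)
    print(result)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally; drop this to enqueue for real
    run_pipeline()
```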
Thanks for answering, but I still do not get it. `file_history_size` decides how many past files are shown? So if `file_history_size=100` and I have 1 image/iteration and ran 1000 iterations, will I see the images for iterations 900-1000?
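For reference, this is the setting I mean in clearml.conf (default value shown; section path assumed from the shipped example config):
```
sdk {
    metrics {
        # history size for debug files per metric/variant combination
        file_history_size: 100
    }
}
```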
It seems like the services docker is always started with Ubuntu 18.04, even when I use
```
task.set_base_docker(
    "continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(
        file_server_mount
    )
)
```
It seems like clearml removes the dev... suffix from `torch == 1.14.0.dev20221205+cu117` in the cached requirements.txt under /tmp/.
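A workaround sketch I am considering (assuming `Task.add_requirements` keeps an exact pin when called before `Task.init`; names are placeholders):
```
from clearml import Task

# must be called before Task.init so the pin ends up in the recorded requirements
Task.add_requirements("torch", "==1.14.0.dev20221205+cu117")
task = Task.init(project_name="examples", task_name="nightly_torch")
```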
Is there a clearml.conf for this agent somewhere?
```
[root@dc01deffca35 elasticsearch]# curl
{
  "cluster_name" : "clearml",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_nu...
```
Can you ping me when it is updated, so I can update my installation?
It could be a clean log after the restart. Unfortunately, I restarted the server right away 😞 I will post the appropriate logs if it happens again.
`args` is similar to what is shown by `print(args)` when executed remotely. So args that are not specified are not `None` as intended; they just do not exist in `args` at all. And `command` is a list instead of a single `str`.
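To be safe I now guard against the missing keys, roughly like this (stand-in `args` object; the names are placeholders):
```
from types import SimpleNamespace

# stand-in for the args namespace received remotely (names are placeholders)
args = SimpleNamespace(command=["python", "main.py"])

# an arg that was never specified is simply absent, so guard with getattr
batch_size = getattr(args, "batch_size", None)

# command arrives as a list, so normalize it to a single string if needed
command = args.command if isinstance(args.command, str) else " ".join(args.command)
print(batch_size, command)
```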
AnxiousSeal95 Thanks a lot. Seems to be working fine for me. I see the clearml-agent version that pip installs in the docker is now pinned to the host version 🙂 PyTorch Nightly is also installed correctly now!
Okay, this seems to work fine.
Also here is how I run my experiments right now, so I can execute them locally and remotely:
```
# Initialize ClearML Task
task = (
    Task.init(
        project_name="examples",
        task_name=args.name,
        output_uri=True,
    )
    if track_remote or enqueue
    else None
)

# Execute remotely via ClearML
if enqueue is not None and not running_remotely():
    if enqueue == "None":
        queue_name = None
        task.reset()
...
```
I have set `default_output_uri` to `s3://my_minio_instance:9000/clearml`.
If I set `files_server` to `s3://my_minio_instance:9000/bucket_that_does_not_exist`, it fails at uploading metrics, but model upload still works:
```
WARNING - Failed uploading to s3://my_minio_instance:9000/bucket_that_does_not_exist ('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to s3://my_minio_instance:9000/clearml
```
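For reference, these are the two settings involved in my clearml.conf (section paths per the shipped config, values are my MinIO endpoint):
```
api {
    files_server: s3://my_minio_instance:9000/clearml
}
sdk {
    development {
        default_output_uri: s3://my_minio_instance:9000/clearml
    }
}
```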
What is `default_out...