
AnxiousSeal95 Okay, it seems to work with a compute-optimized c2-standard-4 instance
I'm referring to https://clearml.slack.com/archives/CTK20V944/p1668070109678489?thread_ts=1667555788.111289&cid=CTK20V944 (mapping the project to a ClearML project) and https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml; when calling training.py from my machine it successfully logged the training on ClearML and uploaded the artifact correctly
SmugDolphin23 But training.py already has a ClearML task created under the hood thanks to its ClearML integration; besides, isn't initing the task before executing the file, like in my snippet, sufficient?
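For context, a rough sketch of what I mean by initing the task before executing the file; project/task names and the train.run() arguments are placeholders, and I'm assuming the YOLOv5 ClearML logger then reuses the existing task rather than creating its own:
from clearml import Task

# create/attach the task first (placeholder names)
task = Task.init(project_name='my_project', task_name='yolov5_training')

# then run the YOLOv5 training entry point in the same process,
# assuming the repo's train.py is importable from the working directory
import train
train.run(data='data.yaml', weights='yolov5s.pt', epochs=10)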
AgitatedDove14 I got that invalid region error on set_upload_destination(), while the region (aws-global) I specified in my agent config worked fine to retrieve a dataset from the same bucket
` 2022-11-04 15:05:40,784 - clearml.storage - ERROR - Failed testing access to bucket XXXXX: incorrect region specified for bucket XXXX (detected region eu-central-1)
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/model.py", l...
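If it helps anyone later: the error suggests the bucket actually lives in eu-central-1, so one thing to try is setting that region on the per-bucket credentials in clearml.conf. A minimal sketch, assuming your ClearML version supports a per-bucket region field (bucket name and keys are placeholders):
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"
                    key: "AWS_ACCESS_KEY_ID"
                    secret: "AWS_SECRET_ACCESS_KEY"
                    region: "eu-central-1"
                }
            ]
        }
    }
}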
SuccessfulKoala55 Mostly the VM instance types and properties, the execution queue, and the app name.
I already deleted ~/.clearml/cache
but I'll try deleting the entire folder
Ah no, I can't, since the pipeline lives in its own dummy project and you cannot reattach pipelines to real projects, so I must instantiate a dummy task just to attach the output model to the correct project
Nice, thank you for the quick response ❤
I got some credentials issues too in some pipeline steps, and I solved it using
task = Task.current_task()
task.setup_aws_upload(...)
It allows you to explicitly specify credentials
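For reference, a minimal sketch of what that looks like inside a step; bucket, keys and region are placeholders, and you should double-check the setup_aws_upload signature for your ClearML version:
from clearml import Task

# inside a pipeline step: grab the step's task and attach explicit S3 credentials
task = Task.current_task()
task.setup_aws_upload(
    bucket='my-bucket',
    key='AWS_ACCESS_KEY_ID',
    secret='AWS_SECRET_ACCESS_KEY',
    region='eu-central-1',
)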
So the main difference in behavior must be coming from the _debug_execute_step_function property in the Controller class; I'm currently skimming through it to try to identify a cause. Did I provide you enough info btw CostlyOstrich36 ?
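(For anyone comparing behaviors: as far as I understand, this is the flag toggled by the debug helpers. A minimal sketch, assuming the usual decorator entry point and a hypothetical my_pipeline function:)
from clearml.automation import PipelineDecorator

# run the steps as plain function calls in the current process (debug mode)
PipelineDecorator.debug_pipeline()

# or: run the pipeline logic locally while still creating a task per step
# PipelineDecorator.run_locally()

# then start the pipeline function as usual
# my_pipeline(...)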
I have a pipeline with a single component:
` @PipelineDecorator.component(
return_values=['dataset_id'],
cache=True,
task_type=TaskTypes.data_processing,
execution_queue='Quad_VCPU_16GB'
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
"""
Convert autocut logs from a specified time window into usable dataset in generic format.
"""
print('[STEP 1/4] Generating dataset from autocut logs...')
import os
...
Ah thank you I'll try that ASAP
Well it's not working; this param seems to be used to override the repo to pull, since it has a str type annotation. Anyway, ClearML still attempted to pull the repo
Does it happen for all your packages or only for a specific one?
Hmm, it must be something more arcane then; I guess the official support would be able to provide an answer, they usually answer within 24 hours
I would gladly try to run it on a remote instance to verify the hypothesis of some local cache acting up, but unfortunately I also ran into an issue with the GCP autoscaler https://clearml.slack.com/archives/CTK20V944/p1665664690293529
Oh, it's a little strange that the comment lines about it were in the agent section
Well if you have:
ret_obj = None
for i in range(5):
    ret_obj = step_x(ret_obj)
Since the orchestration automatically determines the order of execution from the logic of the returned objects, the controller will execute them sequentially.
However, if your steps don't have dependencies like this:
for i in range(5):
    step_x(...)
It will try to execute them concurrently
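To make that concrete, a minimal sketch with the same decorator syntax as above; step bodies, names and the queue are placeholders:
from clearml.automation import PipelineDecorator

@PipelineDecorator.component(return_values=['out'], execution_queue='Quad_VCPU_16GB')
def step_x(prev=None):
    # placeholder step body
    return (prev or 0) + 1

@PipelineDecorator.pipeline(name='demo_pipeline', project='demo', version='0.0.1')
def my_pipeline():
    # chained: each call consumes the previous return value,
    # so the controller schedules them one after the other
    ret_obj = None
    for _ in range(5):
        ret_obj = step_x(ret_obj)

    # independent: no data dependency between calls,
    # so the controller can launch them concurrently
    for i in range(5):
        step_x(i)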
Old tags are not deleted. When executing a Task (experiment) remotely, this method has no effect.
This description in the add_tags() doc intrigues me though; I would like to remove a tag from a dataset version and add it to another version (e.g. a used_in_last_training tag), and this method seems to only add new tags.
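A possible workaround sketch, assuming each dataset version is backed by a task with the same ID, so Task.set_tags() (which rewrites the whole tag list) can be used instead; dataset IDs and the tag name are placeholders:
from clearml import Dataset, Task

TAG = 'used_in_last_training'

old_ds = Dataset.get(dataset_id='OLD_DATASET_ID')
new_ds = Dataset.get(dataset_id='NEW_DATASET_ID')

# remove the tag from the old version by rewriting its tag list
old_task = Task.get_task(task_id=old_ds.id)
old_task.set_tags([t for t in (old_task.get_tags() or []) if t != TAG])

# add the tag to the new version
new_task = Task.get_task(task_id=new_ds.id)
tags = set(new_task.get_tags() or [])
tags.add(TAG)
new_task.set_tags(list(tags))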
Oh, that's nice; if I import a model using InputModel, do I still need to specify an OutputModel ?
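For reference, the pattern I had in mind; model ID, names and the weights path are placeholders, and whether the OutputModel is still needed is exactly the question:
from clearml import Task, InputModel, OutputModel

task = Task.init(project_name='my_project', task_name='finetune')

# register an existing model as this task's input
input_model = InputModel(model_id='EXISTING_MODEL_ID')
task.connect(input_model)

# optionally declare an output model for the newly trained weights
output_model = OutputModel(task=task, framework='PyTorch')
# output_model.update_weights('path/to/new_weights.pt')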
My agents are started through systemd, so maybe I should specify the env in the service file; the clearml.conf file looks like it has a section to do it properly (see 2nd point above)
I would try not to run it locally but in your execution queues on a remote worker; if that's not it, it is likely a bug
Hey CostlyOstrich36 did you find anything of interest on the issue ?
Well, solved; it's not as beautiful but I guess I can put them in an env file with an arbitrary name in the init script and just pass that file as an exec argument...
Did you properly install Docker and the Docker NVIDIA toolkit ? Here's the init script I'm using on my autoscaled workers:
#!/bin/sh
sudo apt-get update -y
sudo apt-get install -y \
ca-certificates \
curl \
gnupg \
lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | s...
Hey, did you check that out ?