Hi @<1523703961872240640:profile|CrookedWalrus33> ! The way connect works by default is:
While running locally, all the values (and value changes) of a connected object are sent to the backend.
While running remotely (in your case here), all the values sent in the local run are fetched from the backend and the connected dictionary is populated with these values. The values are read-only; changing them will not have any effect.
To avoid this behaviour, you could use the `ignore_remote_override...
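For reference, a minimal sketch of what that would look like (assuming the full keyword is `ignore_remote_overrides` and that it is available in your SDK version):
from clearml import Task

task = Task.init(project_name="examples", task_name="connect demo")
config = {"learning_rate": 0.1, "batch_size": 32}
# with the override flag set, the local values are kept even when running remotely,
# instead of being replaced by the read-only values fetched from the backend
task.connect(config, ignore_remote_overrides=True)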
Hi @<1523701842515595264:profile|PleasantOwl46> ! This looks like a python problem. A useful SO thread: None
First, I would verify that I can access the api server without using the SDK. To do so, run this code after filling the credentials yourself (just login should be enough to verify that the api server is reachable)
api_server = ""
access_key = ""
secret_key = ""
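For example, something along these lines after filling in the variables above (just a rough sketch; it only checks that auth.login answers with your credentials):
import requests

# auth.login exchanges the key/secret for a token; a 200 response means the
# api server is reachable and the credentials are accepted
response = requests.post(api_server + "/auth.login", auth=(access_key, secret_key))
print(response.status_code, response.text)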
ShinyPuppy47 does add_task_init_call help your case? https://clear.ml/docs/latest/docs/references/sdk/task/#taskcreate
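Something like this, for example (a rough sketch; the repo and script values are just placeholders):
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="created from code",
    repo="https://github.com/user/repo.git",  # placeholder repository
    script="train.py",                        # placeholder entry point
    add_task_init_call=True,  # injects a Task.init call into the executed script
)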
Hi @<1689446563463565312:profile|SmallTurkey79> ! Regarding "Prior runs of this pipeline worked just fine": what SDK version were you using for the prior runs? Does this still happen if you revert to that version?
Can you provide a script that imitates what you are doing?
In the pipeline you are running, are you creating new tasks/pipelines/datasets?
Hi @<1643060801088524288:profile|HarebrainedOstrich43> ! Could you please share some code that could help us reproduce the issue? I tried cloning, changing parameters and running a decorated pipeline, but the whole process worked as expected for me.
After you do s['Function']['random_number'] = random.random() you still need to call set_parameters_as_dict(s)
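In other words, something like this (a quick sketch, assuming `task` is your Task object and `s` is the dict returned by `get_parameters_as_dict`):
import random

s = task.get_parameters_as_dict()
s["Function"]["random_number"] = random.random()
# the dict is a local copy; nothing reaches the backend until it is written back
task.set_parameters_as_dict(s)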
@<1657556312684236800:profile|ManiacalSeaturtle63> can you share how you are creating your pipeline?
Also, do you need to close the task? It will close automatically when the program exits
@<1523701240951738368:profile|RoundMosquito25> sorry, actually `add_pipeline_tags` will add the tag `pipe: ID` to all steps, not a predefined tag. You will need to set the `tags` argument to your desired tags for each step individually
Hi @<1657918706052763648:profile|SillyRobin38> ! If it is compatible with http/rest, you could try setting api.files_server to the endpoint or sdk.storage.default_output_uri in clearml.conf (depending on your use-case).
@<1626028578648887296:profile|FreshFly37> can you please screenshot this section of the task? Also, what does your project's directory structure look like?
Hi @<1679661969365274624:profile|UnevenSquirrel80> ! Pipeline projects are hidden. You can try to pass task_filter={"search_hidden": True, "_allow_extra_fields_": True} to the query_tasks function to fetch the tasks from hidden projects
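Something along these lines (a rough sketch; the project name is just a placeholder):
from clearml import Task

task_ids = Task.query_tasks(
    project_name="my_project/.pipelines/my_pipeline",  # placeholder hidden project path
    task_filter={"search_hidden": True, "_allow_extra_fields_": True},
)
print(task_ids)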
Can you actually add the bucket to the credentials just to try it out?
Also, can you check that this snippet works for you (with your creds):
import boto3
import json
import six
key = ""
secret = ""
host = "our_host.com"
bucket_name = "bucket"
profile = None
filename = "test"
data = {"test": "data"}
boto_session = boto3.Session(aws_access_key_id=key, aws_secret_access_key=secret, profile_name=profile)
endpoint = "https://" + host
boto_resource = boto_session.resource("s3", region_name=None, endpoint_url=endpoint)
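and then, roughly, write and read back a small object to confirm the credentials and endpoint work (a sketch of the check):
bucket = boto_resource.Bucket(bucket_name)
bucket.put_object(Key=filename, Body=six.b(json.dumps(data)))
print(bucket.Object(filename).get()["Body"].read())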
Hi @<1523715429694967808:profile|ThickCrow29> ! We identified the issue. We will soon release a fix for it
@<1552101474769571840:profile|DepravedLion86> You shouldn't need to call wait explicitly. What happens if you don't?
Hi @<1702492411105644544:profile|YummyGrasshopper29> ! Parameters can belong to different sections. You need to prepend the section name to `some_parameter`. You likely want `${step2.parameters.kwargs/some_parameter}`
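For example, when wiring the next step (a rough sketch; `pipe` is your PipelineController and the step/parameter names are placeholders):
pipe.add_step(
    name="step3",
    parents=["step2"],
    base_task_project="examples",
    base_task_name="step3 base task",
    # the section ("kwargs" here) has to be part of the reference
    parameter_override={"General/some_parameter": "${step2.parameters.kwargs/some_parameter}"},
)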
I will ask internally about this
Hi @<1523701949617147904:profile|PricklyRaven28> ! We released ClearML SDK 1.9.1 yesterday. Can you please try it?
[package_manager.force_repo_requirements_txt=true] Skipping requirements, using repository "requirements.txt"
Try adding `clearml` to the requirements.txt in your repository
Hi NonchalantGiraffe17 ! Thanks for reporting this. It would be easier for us to check if there is something wrong with ClearML if we knew the number and sizes of the files you are trying to upload (content is not relevant). Could you maybe provide those?
@<1545216070686609408:profile|EnthusiasticCow4> a PR would be greatly appreciated. If the problem lies in _query_tasks then it should be addressed there
what do you get when you run this code?
from clearml.backend_api import Session
print(Session.check_min_api_server_version("2.17"))
Hi @<1693795212020682752:profile|ClumsyChimpanzee88> ! Not sure I understand the question. If the commit ID does not exist remotely, then it can't be pulled. How would you pull the commit to another machine otherwise? Is this possible using your current workflow?
Hi @<1578555761724755968:profile|GrievingKoala83> ! The only way I see this error appearing is:
- your process gets forked while `launch_multi_node` is called
- there has been a network error when receiving the response to `Task.enqueue`, then the call has been retried, resulting in this error
Can you verify one or the other?
Hi @<1523703472304689152:profile|UpsetTurkey67> ! What if in Task.init you set auto_connect_frameworks={"joblib": False} . Do you still have this issue?
@<1523701949617147904:profile|PricklyRaven28> thank you for the feedback. We will investigate this further
Hi @<1626028578648887296:profile|FreshFly37> ! You could try getting the version via user properties as well: None .
so something like p._task.get_user_properties().get("version")
Hi @<1578555761724755968:profile|GrievingKoala83> ! Are you trying to launch 2 nodes, each using 2 GPUs, on only 1 machine? I think that will likely not work because of an NCCL limitation
Also, I think that you should actually do
current_conf = task.launch_multi_node(nodes)  # keep the returned config; it contains this node's rank
os.environ["LOCAL_RANK"] = "0"  # env vars must be strings; this process should fork the other one
os.environ["NODE_RANK"] = str(current_conf.get("node_rank", ""))
os.environ["GLOBAL_RANK"] = str(current_conf.get("node_rank", 0) * gpus)
os.environ["WORLD...
Hi @<1581454875005292544:profile|SuccessfulOtter28> ! You could take a look at how the HPO was built using optuna: None .
Basically: you should create a new class which inherits from SearchStrategy. This class should convert the ClearML hyperparameters to parameters that Ray Tune understands, then create a Tuner and run the Ray Tune hyperparameter optimization.
The function Tuner will optim...
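As a very rough sketch of the conversion part (attribute names like name / min_value / max_value / values are assumptions based on the clearml parameter classes, so double-check them against the SDK you are using):
from clearml.automation import DiscreteParameterRange, UniformIntegerParameterRange, UniformParameterRange
from ray import tune

def to_ray_search_space(hyper_parameters):
    # map each clearml parameter definition to the closest Ray Tune sampler
    space = {}
    for param in hyper_parameters:
        if isinstance(param, UniformIntegerParameterRange):
            space[param.name] = tune.randint(param.min_value, param.max_value)
        elif isinstance(param, UniformParameterRange):
            space[param.name] = tune.uniform(param.min_value, param.max_value)
        elif isinstance(param, DiscreteParameterRange):
            space[param.name] = tune.choice(param.values)
    return space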
Hi @<1639074542859063296:profile|StunningSwallow12> !
This happens because the output_uri in Task.init is likely not set.
You could either set the env var CLEARML_DEFAULT_OUTPUT_URI to the file server you want the model to be uploaded to before running train.py, or set sdk.development.default_upload_uri: true (or to that file server) in your clearml.conf.
Also, you could call Task.init(output_uri=True) in your train.py scri...
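i.e. roughly (a minimal sketch of that last option):
from clearml import Task

# output_uri=True uploads the model to the default file server;
# a string such as "s3://bucket/models" would send it somewhere else instead
task = Task.init(project_name="examples", task_name="train", output_uri=True)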