PanickyMoth78 there is no env var for sdk.google.storage.pool_connections/pool_maxsize . We will likely add these env vars in a future release.
Yes, setting max_workers to 1 would not make a difference. The docs look a bit off, but they do specify that the default is 1 if the upload destination is a cloud provider ('s3', 'gs', 'azure').
I'm thinking now that the memory issue might also be caused by the fact that we prepare the zips in the background. Maybe a higher max_workers wou...
Please add it to GitHub! No other info is needed; we know what the issue is.
You might want to prefix both the host in the configuration file and the URI in Task.init / StorageHelper.get with s3. Check whether the script above works if you do that.
You're welcome! Feel free to write here again if you believe this might be a ClearML problem
If the task is running remotely and the parameters are populated, then the local run parameters will not be used; instead, the parameters that are already on the task will be used. This is because we want to allow users to change these parameters in the UI if they want to, so the parameters that are in the code are ignored in favor of the ones in the UI.
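To make that concrete, here is a minimal sketch of the behaviour (project/task names and parameter values are hypothetical):

from clearml import Task

# minimal sketch: when this script is executed remotely by an agent, the values
# already stored on the task (e.g. edited in the UI) replace the ones defined below
task = Task.init(project_name="examples", task_name="remote_params_demo")
params = {"lr": 0.001, "batch_size": 32}
params = task.connect(params)  # returns the effective values
print(params["lr"])  # local run: 0.001; remote run: whatever is set in the UI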
Something like:
dataset = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project, parent_datasets=[dataset.id])
Hi NonchalantGiraffe17 ! Thanks for reporting this. It would be easier for us to check if there is something wrong with ClearML if we knew the number and sizes of the files you are trying to upload (content is not relevant). Could you maybe provide those?
Hi @<1603198134261911552:profile|ColossalReindeer77> ! The usual workflow is that you modify the fields of your remote run in either the Hyperparameters section or the Configuration section, but not usually both (as in Hydra's case). When using CLI tools, people mostly modify the Hyperparameters section, so we chose to set allow_omegaconf_edit to False by default for parity.
@<1719524641879363584:profile|ThankfulClams64> you could try using the compare function in the UI to compare the experiments run on the machine where the scalars are not reported properly with the experiments run on a machine that reports them properly. I suggest then replicating the environment exactly on the problematic machine. None
Hi @<1679661969365274624:profile|UnevenSquirrel80> ! Pipeline projects are hidden. You can try to pass task_filter={"search_hidden": True, "_allow_extra_fields_": True} to the query_tasks function to fetch the tasks from hidden projects
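Something along these lines (the project name is just a placeholder):

from clearml import Task

# fetch task IDs from a hidden (pipeline) project by passing extra filter fields
task_ids = Task.query_tasks(
    project_name="my_pipeline_project",
    task_filter={"search_hidden": True, "_allow_extra_fields_": True},
)
print(task_ids)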
@<1626028578648887296:profile|FreshFly37> I see that create_dataset doesn't have a repo set. Can you try setting it manually via the repo, repo_branch and repo_commit arguments in the add_function_step method?
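Roughly like this (the repository URL, branch and step function are placeholders):

from clearml import PipelineController

def create_dataset(dataset_name):
    # placeholder for your actual step function
    ...

pipe = PipelineController(name="my_pipeline", project="examples", version="1.0.0")
pipe.add_function_step(
    name="create_dataset",
    function=create_dataset,
    function_kwargs=dict(dataset_name="my_dataset"),
    repo="https://github.com/org/my_repo.git",  # hypothetical repository
    repo_branch="main",
    repo_commit=None,  # or pin a specific commit hash
)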
Hi @<1639074542859063296:profile|StunningSwallow12> !
This happens because the output_uri in Task.init is likely not set.
You could either set the env var CLEARML_DEFAULT_OUTPUT_URI to the file server you want the model to be uploaded to before running train.py, or set sdk.development.default_output_uri: true (or to the file server you want the model to be uploaded to) in your clearml.conf .
Also, you could call Task.init(output_uri=True) in your train.py scri...
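For example, a minimal sketch of that last option (project/task names are placeholders):

from clearml import Task

# output_uri=True uploads model checkpoints to the default file server;
# you can also pass a bucket or file-server URL instead of True
task = Task.init(project_name="examples", task_name="train", output_uri=True)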
@<1523701168822292480:profile|ExuberantBat52> Do you have pandas installed on your machine?
Hi OutrageousSheep60 ! Regarding your questions:
- No it's not. We will have an RC that fixes that ASAP, hopefully by tomorrow.
- You can use add_external_files , which you already do. If you wish to upload local files to the bucket, you can specify the output_url of the dataset to point to the bucket you wish to upload the data to. See the parameter here: https://clear.ml/docs/latest/docs/references/sdk/dataset/#upload . Note that you CAN mix external_files and regular files (see the sketch below).
- We don't hav...
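Here is a minimal sketch of mixing external and local files (the bucket and paths are hypothetical):

from clearml import Dataset

dataset = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
dataset.add_external_files(source_url="s3://my-bucket/already-uploaded/")  # only links are stored
dataset.add_files(path="local_data/")  # these local files will be uploaded
dataset.upload(output_url="s3://my-bucket/clearml-datasets/")  # destination for the local files
dataset.finalize()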
Hi BoredHedgehog47 ! We tried to reproduce this, but failed. What we tried is running the attached main.py , which uses Popen to launch sub.py .
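In case the attachment doesn't come through, this is not the actual attached file, just a minimal sketch of the pattern we tried:

# main.py - spawn sub.py as a subprocess after initializing a task
import subprocess
import sys

from clearml import Task

task = Task.init(project_name="debug", task_name="popen_parent")
subprocess.Popen([sys.executable, "sub.py"]).wait()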
Can you please run main.py as well and tell us if you still encounter the bug? If not, is there anything else you can think of that could trigger this bug besides creating a subprocess?
Thank you!
Hi @<1594863230964994048:profile|DangerousBee35> ! This looks like an ok solution, but I would make the package pip-installable and push it to another repo, then add that repo to a requirements file such that the agent can install it. Other than that, I can’t really think of another easy way to use your package
Hi @<1674226153906245632:profile|PreciousCoral74> !
Sadly, Logger.report_matplotlib_figure(…) doesn't seem to log plots. Only the automatic integration appears to behave.
What do you mean by that? report_matplotlib_figure should work. See this example on how to use it: None .
If it still doesn't work for you, could you please share a code snippet that could help us track down...
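For reference, a minimal sketch of the explicit call (project/task names are placeholders):

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="matplotlib_manual_report")
fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])
task.get_logger().report_matplotlib_figure(
    title="My Plot", series="manual", iteration=0, figure=fig
)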
Hi OutrageousSheep60 . The list_datasets function is currently broken and will be fixed in the next release.
@<1634001100262608896:profile|LazyAlligator31> it looks like the args get passed to a Python thread, so they should be specified the same way you would pass them to the args argument of a thread (i.e. a tuple of positional arguments): func_args=("something", "else") . It looks like passing kwargs is not directly supported, but you could build a partial:
from functools import partial
scheduler.add_task(schedule_function=partial(clone_enqueue, arg_1="something", arg_2="else")...
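A fuller sketch of what that could look like (the schedule and the function body are hypothetical):

from functools import partial

from clearml.automation import TaskScheduler

def clone_enqueue(arg_1, arg_2):
    # placeholder for your actual scheduling function
    print(arg_1, arg_2)

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_function=partial(clone_enqueue, arg_1="something", arg_2="else"),
    minute=30,  # e.g. run every 30 minutes
)
scheduler.start_remotely(queue="services")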
Hi @<1546303293918023680:profile|MiniatureRobin9> ! I think the UI is not aware of tags. Anyway, the repository will likely get checked out to your desired tag. Can you please tell us if that's the case?
That is a clear bug to me. Can you please open a GH issue?
Hi @<1546303293918023680:profile|MiniatureRobin9> ! When it comes to pipelines from functions/other tasks, this is not really supported. You could however cut the execution short when your step is being run by evaluating the return values from other steps.
Note that you should however be able to skip steps if you are using pipeline from decorators
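For instance, with decorators you can decide in the controller logic whether to call a step at all; a minimal sketch (the step logic is hypothetical):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["value"])
def step_one():
    return 42

@PipelineDecorator.component(return_values=["result"])
def step_two(value):
    return value * 2

@PipelineDecorator.pipeline(name="conditional_pipeline", project="examples", version="1.0.0")
def pipeline_logic():
    value = step_one()
    if value > 0:  # skip step_two entirely based on step_one's output
        step_two(value)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline_logic()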
ok, that is very useful actually
@<1526734383564722176:profile|BoredBat47> How would you connect with boto3 ? ClearML uses boto3 as well; what it basically does is get the key/secret/region from the conf file and then open a Session with those credentials. Have you tried deleting the region altogether from the conf file?
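For example, a quick check with boto3 directly could look like this (credentials and bucket are placeholders; try it with and without region_name):

import boto3

session = boto3.Session(
    aws_access_key_id="<key>",
    aws_secret_access_key="<secret>",
    # region_name="<region>",  # try commenting this out, as suggested above
)
s3 = session.client("s3")  # pass endpoint_url="https://..." here for non-AWS S3 servers
print(s3.list_objects_v2(Bucket="<bucket>", MaxKeys=1))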
you could also try using gloo as the backend (it uses CPU) just to check that the subprocesses spawn properly
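A minimal sketch of initializing with gloo (assumes the usual MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE env vars are set by your launcher):

import torch.distributed as dist

# "gloo" runs on CPU, so this isolates process-spawning issues from any GPU/NCCL problems
dist.init_process_group(backend="gloo", init_method="env://")
print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")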
can you share the logs of the controller?