Hi @<1654294820488744960:profile|DrabAlligator92> ! The way chunk size works is:
the upload will try to produce zips that are smaller than the chunk size, so it keeps adding files to the same zip until the chunk size is exceeded. Once it is exceeded, a new chunk (zip) is created, and the first file in that new chunk is the file that caused the previous chunk's size to be exceeded (even if that file by itself exceeds the chunk size).
So in your case: an empty chunk is creat...
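To illustrate the rule, here is a minimal sketch in Python (this is just the packing logic as described above, not ClearML's actual upload code):
```
def pack_into_chunks(file_sizes, chunk_size):
    # illustrative sketch only, not ClearML's implementation
    chunks, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > chunk_size:
            # this file pushes the chunk over the limit: close the current
            # chunk and make this file the first one of a new chunk
            chunks.append(current)
            current, current_size = [], 0
        current.append(size)  # added even if this single file exceeds chunk_size
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```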
Hi @<1523708920831414272:profile|SuperficialDolphin93> ! What if you do just `controller.start()` (to start it locally)? The task should not quit in this case.
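For reference, a minimal sketch of what I mean (project and pipeline names are placeholders):
```
from clearml import PipelineController

controller = PipelineController(
    name="my-pipeline", project="examples", version="1.0.0"
)
# ... add steps with controller.add_function_step(...) / controller.add_step(...) ...
controller.start()
```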
@<1523703472304689152:profile|UpsetTurkey67> can you please open a GitHub issue as well, so we can better track this one?
Hello MotionlessCoral18. I have a few questions that might help us find out why you experience this problem:
Is there any chance you are running the program in offline mode?
Is there any other message being logged that might help? The error messages might include `Action failed`, `Failed sending`, `Retrying, previous request failed`, `contains illegal schema`.
Are you able to connect to the backend at all from the program in which you are trying to get the dataset?
Thank you!
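For the last point, a quick connectivity check could be pinging the API server directly. A minimal sketch, assuming the default hosted server URL (replace it with the `api_server` value from your clearml.conf):
```
import requests

# expect HTTP 200 if the ClearML API server is reachable from this machine
response = requests.get("https://api.clear.ml/debug.ping")
print(response.status_code, response.text)
```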
@<1626028578648887296:profile|FreshFly37> can you please screenshot this section of the task? Also, what does your project's directory structure look like?
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! What function are you using to upload the data?
@<1531445337942659072:profile|OddCentipede48> Looks like this is indeed not supported. What you could do is return the ID of the task that outputs the models, then use `Task.get_task` and get the model from there. Here is an example:
```
from clearml import PipelineController


def step_one():
    from clearml import Task
    from clearml.binding.frameworks import WeightsFileHandler
    from clearml.model import Framework

    WeightsFileHandler.create_output_model(
        "obj", "file...
```
@<1523701083040387072:profile|UnevenDolphin73> are you composing the code you want to execute remotely by copy-pasting it from various cells into one standalone cell?
FierceHamster54 initing the task before the execution of the file like in my snippet is not sufficient?
It is not, because `os.system` spawns a whole different process than the one you initialized your task in, so no patching is done on the framework you are using. Child processes need to call `Task.init` because of this, unless they were forked, in which case the patching is already done.
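A minimal sketch of what that looks like in practice (file names are made up):
```
# parent.py
import os
from clearml import Task

task = Task.init(project_name="examples", task_name="parent")
# os.system spawns a brand-new process: no framework patching happens there
os.system("python child.py")

# child.py (a separate file)
from clearml import Task

# the child process must call Task.init itself so the frameworks it uses get patched
task = Task.init(project_name="examples", task_name="child")
```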
But the training.py has already a ClearML task created under the hood since its integratio...
Hi @<1523702000586330112:profile|FierceHamster54> ! Looks like we pull all the ancestors of a dataset when we finalize it. I think this can be optimized. We will keep you posted when we make some improvements.
FreshParrot56 You could modify this entry in your `clearml.conf` to point to your drive: `sdk.storage.cache.default_base_dir`.
Or, if you don't want to touch your conf file, you could set the env var `CLEARML_CACHE_DIR` to your remote drive before you call `get_local_copy`. See this example:
```
dataset = Dataset.get(DATASET_ID)
os.environ["CLEARML_CACHE_DIR"] = "/mnt/remote/drive"  # change the clearml cache, make it point to your remote drive
copy_path = dataset.get_loc...
```
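For completeness, the conf-file alternative would look something along these lines (the path is a placeholder):
```
sdk {
    storage {
        cache {
            default_base_dir: "/mnt/remote/drive"
        }
    }
}
```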
Hi @<1581454875005292544:profile|SuccessfulOtter28> ! You could take a look at how the HPO was built using optuna: None .
Basically: you should create a new class which inherits from `SearchStrategy`. This class should convert clearml `hyper_parameters` to parameters Ray Tune understands, then create a `Tuner` and run the Ray Tune hyperparameter optimization. The function `Tuner` will optim...
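Very roughly, the skeleton could look like this (a hedged sketch modeled on the shape of the optuna strategy; the class and method names are illustrative, and the import path should be verified against your clearml version):
```
from clearml.automation.optimization import SearchStrategy


class OptimizerRayTune(SearchStrategy):  # hypothetical name
    def start(self):
        # 1. translate the clearml hyper_parameters given to this strategy
        #    into a Ray Tune search space (e.g. tune.uniform / tune.choice)
        # 2. build a ray.tune.Tuner over that search space
        # 3. run tuner.fit(), launching one clearml job per trial and
        #    reporting the objective metric back
        pass
```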
Hi @<1657918706052763648:profile|SillyRobin38> ! If it is compatible with http/rest, you could try setting `api.files_server` to the endpoint, or `sdk.storage.default_output_uri` in `clearml.conf` (depending on your use-case).
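e.g. in clearml.conf (the endpoint is a placeholder):
```
api {
    files_server: "http://my-endpoint:8081"
}
# or, depending on the use-case:
sdk {
    storage {
        default_output_uri: "http://my-endpoint:8081"
    }
}
```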
Indeed, pipelines that were started with `pipe.start_locally` cannot be cloned and re-run. We will change this behaviour ASAP so that you can use just one queue for your use case.
Hi RoughTiger69 ! Can you try adding the files using a Python script, so that we can get an exception traceback? Something like this:
```
from clearml import Dataset

# or just use the ID of the dataset you previously created instead of creating a new one
parent_dataset = Dataset.create(dataset_name="xxxx", dataset_project="yyyyy", output_uri=" ")
parent_dataset.add_files("folder1")
parent_dataset.upload()
parent_dataset.finalize()

child_dataset = Dataset.create(dataset_name="xxxx", dat...
```
@<1554638160548335616:profile|AverageSealion33> Can you run the script with `HYDRA_FULL_ERROR=1`? Also, what if you run the script without clearml? Do you get the same error?
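i.e. (the script name is a placeholder):
```
HYDRA_FULL_ERROR=1 python train.py
```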
SmallGiraffe94 You should use `dataset_version=2022-09-07` (not `version=...`). This should work for your use-case.
`Dataset.get` shouldn't actually accept a `version` kwarg, but it does because it accepts some `**kwargs` used internally. We will make sure to warn users in case they pass values to `**kwargs` from now on.
Anyway, this issue still exists, but in another form: `Dataset.get` can't get datasets with a non-semantic version, unless the version is sp...
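For example (project and dataset names are placeholders):
```
from clearml import Dataset

dataset = Dataset.get(
    dataset_project="my_project",
    dataset_name="my_dataset",
    dataset_version="2022-09-07",
)
```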
That's unfortunate. Looks like this is indeed a problem 😕 We will look into it and get back to you.
In the meantime, we should have fixed this. I will ping you when 1.9.1 is out so you can try it out!
DangerousDragonfly8 Yes, this is correct, we mixed up the places we call these functions
FiercePenguin76 Looks like there is actually a bug when loading models remotely. We will try to fix this ASAP
Hi @<1523701345993887744:profile|SillySealion58> ! We allow finer-grained control over model uploads. Please refer to this GH thread for an example of how to achieve that: None
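(The linked example is elided above; as a hedged pointer, one mechanism for this kind of control is the `auto_connect_frameworks` argument of `Task.init`, which is not necessarily what the thread shows.)
```
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="model-upload-control",
    # e.g. only auto-log pytorch checkpoints matching a pattern
    # (wildcard support depends on your clearml version)
    auto_connect_frameworks={"pytorch": "*.pt"},
)
```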
Hi again, @<1526734383564722176:profile|BoredBat47> ! I actually took a closer look at this. The config file should look like this:
```
s3 {
    key: "KEY"
    secret: "SECRET"
    use_credentials_chain: false
    credentials: [
        {
            host: "myendpoint:443"  # no http(s):// and no s3:// prefix, also no bucket name
            key: "KEY"
            secret: "SECRET"
            secure: true  # ...
```
Hi @<1688721797135994880:profile|ThoughtfulPeacock83> ! Make sure you set `agent.package_manager.type: poetry` in your `clearml.conf`. If you do, the poetry.lock or pyproject.toml will be used to install the packages. See None
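i.e. in the agent's clearml.conf:
```
agent {
    package_manager {
        # install packages from poetry.lock / pyproject.toml
        type: poetry
    }
}
```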