I will ask internally about this
would it be on the pipeline task itself then, since that's what's disappearing?
that's likely the case
Hi @<1668427950573228032:profile|TeenyShells80>, the parent_datasets should be a list of dataset IDs or clearml.Dataset objects, not dataset names. Maybe that is the issue
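A minimal sketch of what I mean (the dataset/project names below are placeholders):
```python
from clearml import Dataset

# look up the parent by name/project, then pass the object (or its ID), not the name
parent = Dataset.get(dataset_name="my-parent-dataset", dataset_project="my-project")

child = Dataset.create(
    dataset_name="my-child-dataset",
    dataset_project="my-project",
    parent_datasets=[parent],  # or parent_datasets=[parent.id]
)
```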
Hi @<1597762318140182528:profile|EnchantingPenguin77> ! You should be able to see the overrides under CONFIGURATION->HYPERPARAMETERS->Args->overrides:
would that mean that multiple pre_callback()s would have to be defined for every add_step, since every step would have different configs? Sorry if there's something I'm missing, I'm still not quite good at working with ClearML yet.
Yes, you could have multiple callbacks, or you could check the name of each step via node.name and map the name of the node to its config.
One idea would be to have only 1 pipeline config file, that would look like:
step_1:
# step_1 confi...
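Given such a file, a rough sketch of a single callback that maps each node to its section (assuming the config is YAML and that applying it via node.parameters fits your steps; all names are placeholders):
```python
import yaml
from clearml import PipelineController

with open("pipeline_config.yaml") as f:  # the single pipeline config file from above
    pipeline_config = yaml.safe_load(f)

def pre_callback(pipeline, node, param_override):
    # look up this step's section by the node's name
    step_config = pipeline_config.get(node.name, {})
    # apply it as parameter overrides for this step
    node.parameters.update(step_config)
    return True  # returning False would skip the step

pipe = PipelineController(name="my-pipeline", project="my-project", version="1.0.0")
pipe.add_step(
    name="step_1",
    base_task_project="my-project",
    base_task_name="step 1 task",
    pre_execute_callback=pre_callback,
)
```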
Hi @<1523715429694967808:profile|ThickCrow29> ! We identified the issue. We will soon release a fix for it
is it just this script that you are running that breaks? What happens if instead of pipe.upload_model you call print(pipe._get_pipeline_task()) ?
DangerousDragonfly8 I'm pretty sure you can use pre_execute_callback or post_execute_callback for this. You get the PipelineController and the Node in the callback. Then you can modify the next step/node. Note that you might need to access the Task object directly to change the execution_queue and docker_args. You can get it from node.job.task
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#add_funct...
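A rough sketch of that idea (queue name, docker args and step names are placeholders; exactly when node.job is populated may depend on your ClearML version and which callback you use, so treat this as a starting point rather than a definitive implementation):
```python
from clearml import PipelineController

def before_step_2(pipeline, node, param_override):
    # per the suggestion above, reach the underlying Task of this step
    task = node.job.task
    # change the docker args directly on the Task
    task.set_base_docker(docker_arguments="--env MY_VAR=1")
    # the queue can be set on the node itself
    node.queue = "my-other-queue"
    return True  # returning False would skip the step

pipe = PipelineController(name="my-pipeline", project="my-project", version="1.0.0")
pipe.add_step(
    name="step_2",
    parents=["step_1"],
    base_task_project="my-project",
    base_task_name="step 2 task",
    pre_execute_callback=before_step_2,
)
```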
FierceHamster54 As long as you are not forking, you need to use Task.init such that the libraries you are using get patched in the child process. You don't need to specify the project_name, task_name or output_uri. You could try locally as well with a minimal example to check that everything works after calling Task.init.
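Something as small as this should do for the local check (a sketch, assuming the rest of your training code follows the init call):
```python
from clearml import Task

# no project_name / task_name / output_uri needed here;
# the call is what patches the libraries used in this (non-forked) process
task = Task.init()

# ... the rest of your training / logging code goes here
```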
Hi @<1702492411105644544:profile|YummyGrasshopper29> ! To enable caching while using a repo, you also need to specify a commit (as the repo might change, which would invalidate the caching). We will add a warning regarding this in the near future.
Regarding the imports: we are aware that there are some problems when executing the pipeline remotely as described. At the moment, appending to sys.path is one of the only solutions (other than making utils a package on your local machine so...
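For example, something along these lines when adding the step (repo URL, commit hash, function and kwargs are placeholders):
```python
from clearml import PipelineController

def prepare_data(source_url: str):
    # placeholder step body
    return source_url

pipe = PipelineController(name="my-pipeline", project="my-project", version="1.0.0")
pipe.add_function_step(
    name="prepare_data",
    function=prepare_data,
    function_kwargs=dict(source_url="s3://placeholder/data"),
    repo="https://github.com/me/my-repo.git",
    repo_commit="0123abcd",       # pin the commit so the cached step stays valid
    cache_executed_step=True,
)
```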
Hi @<1524560082761682944:profile|MammothParrot39> ! A few thoughts:
You likely know this, but the files may be downloaded to something like /home/user/.clearml/cache/storage_manager/datasets/ds_e0833955ded140a69b4c9c9d8e84986c . The .clearml directory may be hidden, and if you are using a file explorer you may not be able to see it.
If that is not the issue: are you able to download some other datasets, such as our example one: UrbanSounds example ? I'm wondering if the problem only happens fo...
RoundMosquito25 you might need to use cast=True when you get the parameters. See this snippet:
` from clearml import Task
t = Task.init()
params = {}
params["Function"] = {}
params["Function"]["number"] = 123
t.set_parameters_as_dict(params)
t.close()
cloned = Task.clone(t.id)
s = cloned.get_parameters_as_dict(cast=True)
s["Function"]["number"] = 321
cloned.set_parameters_as_dict(s)
print(type(cloned.get_parameters_as_dict(cast=True)["Function"]["number"])) # will print 'int' `
Hi RoundMole15 ! Are you able to see a model logged when you run this simple example?
` from clearml import Task
import torch.nn.functional as F
import torch.nn as nn
import torch
class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
s...
PanickyMoth78 there is no env var for sdk.google.storage.pool_connections/pool_maxsize. We will likely add these env vars in a future release.
Yes, setting max_workers to 1 would not make a difference. The docs look a bit off, but it is specified that it defaults to 1 if the upload destination is a cloud provider ('s3', 'gs', 'azure').
I'm thinking now that the memory issue might also be caused by the fact that we prepare the zips in the background. Maybe a higher max_workers wou...
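In case it helps with the experiments, this is roughly where max_workers comes in when uploading a dataset via the SDK (paths, bucket and names are placeholders):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my-dataset", dataset_project="my-project")
ds.add_files("/path/to/files")
# vary max_workers here to see how it affects memory usage during upload
ds.upload(output_url="s3://my-bucket/datasets", max_workers=1)
ds.finalize()
```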
Hi HandsomeGiraffe70 ! You could try setting dataset.preview.tabular.table_count to 0 in your clearml.conf file
HandsomeGiraffe70 your conf file should look something like this:
` {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.clearml/cache"
            # default_cache_manager_size: 100
        }
        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will ...
You're correct. There are 2 main entries in the conf file: api and sdk. The dataset entry should be under sdk
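So, for the earlier suggestion, the relevant part of clearml.conf would look roughly like this (a sketch; merge it into your existing sdk section):
```
sdk {
    dataset {
        preview {
            tabular {
                table_count: 0
            }
        }
    }
}
```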
Each step is a separate task, with its own separate logger. You will not be able to reuse the same logger. Instead, you should get the logger in the step where you want to use it by calling current_logger
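Roughly like this inside the step (metric names are placeholders):
```python
from clearml import Logger

def my_step():
    # grab this step's own logger instead of reusing one from another step
    logger = Logger.current_logger()
    logger.report_scalar(title="metrics", series="loss", value=0.5, iteration=0)
```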
@<1578555761724755968:profile|GrievingKoala83> what error are you getting when using gloo? Is it the same one?
@<1719162259181146112:profile|ShakySnake40> the data is still present in the parent and it won't be uploaded again. Also, when you pull a child dataset you are also pulling the dataset's parent data. dataset.id is a string that uniquely identifies each dataset in the system. In my example, you are using the ID to reference a dataset which would be a parent of the newly created dataset (that is, after getting the dataset via Dataset.get)
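So pulling the child gives you the parent's files as well, something like (the dataset id is a placeholder):
```python
from clearml import Dataset

# pulling the child also materializes the files inherited from its parent
child = Dataset.get(dataset_id="<child-dataset-id>")
local_path = child.get_local_copy()  # contains the parent's data plus this version's files
```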
And I believe that by default we send artifacts to the clearml server if not specified
(We will deprecate continue_on_fail)
in the meantime, we should have fixed this. I will ping you when 1.9.1 is out to try it out!
@<1578555761724755968:profile|GrievingKoala83> did you call task.launch_multi_node(4) or 2? I think the right value is 4 in this case
Hi @<1581454875005292544:profile|SuccessfulOtter28> ! The logger is likely outdated. Can you please open a Github issue about it?
you might want to prefix both the host in the configuration file and the uri in Task.init / StorageHelper.get with s3.
Check if the script above works if you do that
Regarding 1., are you trying to delete the project from the UI? (I can't see an attached image in your message)
OutrageousSheep60 that is correct, each dataset is in a different subproject. That is why bug 2. happens as well
Regarding number 2., that is indeed a bug and we will try to fix it as soon as possible