
I have manually verified with hashlib.sha256() that the line-by-line content of the CSV files is identical. The file content is the same, and the files are generated by the same process (literally just rerunning the same code twice), so why does ClearML treat them differently?
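For reference, here is a minimal sketch of the kind of manual check described above: hash every CSV in two output folders with hashlib.sha256() and compare the digests per file name. The helper names (`sha256_of_file`, `hash_folder`, `diff_folders`) and the two-folder layout are assumptions made up for illustration. None of this is ClearML code, and it compares raw bytes rather than parsed lines.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_folder(folder, pattern="*.csv"):
    """Map each matching file name in `folder` to its SHA-256 digest."""
    return {p.name: sha256_of_file(p) for p in sorted(Path(folder).glob(pattern))}


def diff_folders(old_folder, new_folder):
    """Return the sorted names of files that were added, removed, or changed."""
    old, new = hash_folder(old_folder), hash_folder(new_folder)
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))
```

If two runs really produce byte-identical CSVs, `diff_folders(first_run_dir, second_run_dir)` should come back as an empty list.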
As far as I can tell there's nothing else running that isn't running on our hardware. Is there some way to see what application instances are active?
Sorry I disappeared (went on a well-deserved vacation). The problem is happening because of the ordering of the install. If I install using `pip install -r ./requirements.txt`, then pip installs the packages in the order of the requirements file. However, during the installation process from ClearML, it installs the packages in order UNLESS there's a custom path provided, in which case that package is saved for last. The reason this breaks my code is I have later packages that depend on the custom packages, as ...
The verbose output:
Generating SHA2 hash for 123 files
100%|██████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 310.04it/s]
Hash generation completed
Add 2022-12.csv
Add 2020-10.csv
Add 2021-06.csv
Add 2022-02.csv
Add 2021-04.csv
Add 2013-03.csv
Add 2021-02.csv
Add 2015-02.csv
Add 2016-07.csv
Add 2022-05.csv
Add 2021-10.csv
Add 2018-04.csv
Add 2019-06.csv
Add 2017-11.csv
Add 2016-01.csv
Add 2013-06.csv
Add 2018-08.csv
Add 2020-05.csv
Add 2020-03.csv
Add 20...
Alright, I tried testing it out by commenting out the code for generating new CSVs, so for successive runs the CSVs are identical. However, when I use dataset.add_files() it still generates a new version of the dataset.
# log the data to ClearML if a task is passed
if self.task:
    self.clearml_dataset = Dataset.create(dataset_name="[LTV] Dataset")
    self.clearml_dataset.add_files(path=save_path, verbose=True)
    if self.tags is not None:
        ...
Unfortunately, that doesn't seem to have solved the problem. I tried the same thing with https and it seems to skip the lines with the @ symbol like it did before. Honestly, it seems more like it just isn't parsing those lines during the install.
Collecting darts==0.25.0
Using cached darts-0.25.0-py3-none-any.whl (760 kB)
Collecting lightgbm
Using cached lightgbm-4.1.0-py3-none-manylinux_2_28_x86_64.whl (3.1 MB)
Collecting prophet
Using cached prophet-1.1.4-py3-none-manylinux_2_1...
I see. Thanks for the insight. That seems to be the case. I'm struggling a bit with datasets. For example, I'm not sure how I would trace the genealogy of a dataset that's used by traditional tasks and pipelines. I'll try and write something up about the challenges around that when I get the chance. But your comment revealed another issue:
It appears that the partial name matching isn't going well. I'm unclear why this wouldn't be matching. In the attached photo you can see the input for `partial_nam...
I might have found the answer. I'll reply if it works as expected.
Yeah, it's because it's just hooking into the save operation and capturing the output, regardless of the parent call.
Thanks Martin. I read this method as "getting the data associated with the model training" not "getting metadata for the model". This is what I'm looking for.
Thanks for your reply @<1523701070390366208:profile|CostlyOstrich36> Is there an example where a pipeline is built from existing tasks? I'd like to experiment with it and I don't see any examples of what you describe with my (clearly lacking) google-fu. What happens if you wrap a function with a task.init() with a pipeline decorator, or is that the process you're speaking of?
It's even attempting to install omegaconf but not from the repo, likely because it's a dependency of hydra-colorlog.
Collecting omegaconf<2.4,>=2.2
Using cached omegaconf-2.2.3-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.2-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.1-py3-none-any.whl (78 kB)
The plot thickens. It seems like there's something odd going on with the interaction between `[LTV]` and additional text. If I just search `[LTV]` it works, if I just search `Dataset Test` it works, but if I put them together it breaks the search. Now that I think about it, there are other oddities in the web interface that might be explained by some bugs around using brackets in names.
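One possible explanation, and this is purely an assumption on my part: if the name filter is evaluated as a regular expression, `[LTV]` is read as a character class (one of L, T, or V) rather than as literal brackets. The sketch below uses only Python's `re` module, not ClearML, and the `matches` helper is made up for illustration. It reproduces the same pattern of hits and misses, and shows that escaping the query with `re.escape()` makes the combined search match.

```python
import re


def matches(query, name):
    """Return True if `query`, treated as a regex, is found anywhere in `name`."""
    return re.search(query, name) is not None


name = "[LTV] Dataset Test"

# "[LTV]" as a regex is a character class, so it matches the single letter "L".
bracket_only = matches("[LTV]", name)

# Plain text without special characters is found as-is.
text_only = matches("Dataset Test", name)

# Combined: this needs one of L/T/V directly followed by " Dataset Test",
# but in the name the "V" is followed by "]", so nothing matches.
combined = matches("[LTV] Dataset Test", name)

# Escaping makes the brackets literal, so the full name matches.
escaped = matches(re.escape("[LTV] Dataset Test"), name)

print(bracket_only, text_only, combined, escaped)
```

If that is what's happening, escaping the brackets in the search string might be a workaround until the behavior is confirmed.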
Thanks for the reply @<1523701070390366208:profile|CostlyOstrich36> !
It says in the documentation that:
Add a folder into the current dataset. calculate file hash, and compare against parent, mark files to be uploaded
It seems to recognize the dataset as another version of the data but doesn't seem to be validating the hashes on a per file basis. Also, if you look at the photo, it seems like some of the data does get recognized as the same as the prior data. It seems like it's the correct...
Since this could happen with a lot of services, maybe it would be worth a retry option? Especially if it's part of a pipeline.
It hooks into the calls made by the code. If you never save the model to disk, add it to a tool like MLflow/Tensorboard, or manually add the artifact to ClearML, afaik it won't save the artifact.
This does appear to resolve the issue. I'll keep you updated if I find any other issues. Thanks @<1523701435869433856:profile|SmugDolphin23>
Results:
I first tried uncommenting `enable_git_ask_pass: false`, but it didn't resolve the issue. I then cleared the cache in the `vcs-cache` folder, and that did fix the issue. This is the second time the cache seems to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?
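In case it helps anyone else hitting this, clearing the cache can be scripted. This is a rough sketch, not an official ClearML utility: the `clear_vcs_cache` helper is made up, and the default path `~/.clearml/vcs-cache` is an assumption, so check the VCS cache path in your own clearml.conf and stop the agent before running it.

```python
import shutil
from pathlib import Path

# Assumed default location of the agent's VCS cache; verify it against
# the cache path configured in your own clearml.conf.
DEFAULT_VCS_CACHE = Path.home() / ".clearml" / "vcs-cache"


def clear_vcs_cache(cache_dir=DEFAULT_VCS_CACHE):
    """Delete every entry inside `cache_dir` and return the removed names."""
    cache_dir = Path(cache_dir)
    removed = []
    if not cache_dir.is_dir():
        # Nothing to clear if the folder does not exist.
        return removed
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

The cache folder itself is kept, so the agent can repopulate it on the next run with the current credentials.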
Sounds good. Lmk if any changes are required.
I just checked the clearml.conf and I'm not specifying any version of python for the agents.
The answer is simple but also not completely obvious to someone new to the platform. So you can inject new command line args that hydra will recognize. This is what the Hydra section of args is for. However, if you enable `_allow_omegaconf_edit_: True`, I think ClearML will "inject" the OmegaConf saved under the configuration object of the prior run, overwriting the overrides. I'll experiment with this behavior a bit more to be sure.
You might want to start with the first steps guide then.
@<1523701435869433856:profile|SmugDolphin23> Yeah, I just wanted to validate it was worth spending the time. Since there is already a parameter that takes a callable (i.e. `schedule_function`), it might make sense to reuse that parameter. If it returns a str, we validate that it's a task, and if it is, we can run the task as if we had originally passed it as the `task_id` in `.add_task()`. This would only be a breaking change if the callable that was passed happened to return a `task_id` ...
Awesome! Did you manage to solve the tailscale issue with ClearML sessions? Sorry I wasn't active with that. I don't use sessions often and I found a suitable alternative in the short term. Any hopes of the changes making their way to a PR for the official release?
That makes sense. I was confused about what the source was.
Ah, that makes sense. What is supposed to be hidden changes depending on the section you're in, which makes sense. Now there needs to be a Pac-Man sprite easter egg hidden somewhere else.
I had 2 datasets on archive and 0 unarchived. When I ran the following command:
Dataset.list_datasets(dataset_project=self.task.get_project_name(), only_completed=True)
It returned two entries for the two datasets I had archived.
In this case it's the ID of the "output" model from the first task.