Alright, I'll try and put that together for Monday.
@<1523701435869433856:profile|SmugDolphin23> Yes. I'll try it in about 14 hours when I'm back at work and let you know how it goes. 😂
They will be related through the task. Get the task information from the dataset, then get the model information from the task.
So far, when I use the web interface to delete a task or dataset that has artifacts on S3, it doesn't prompt me for credentials.
It hooks into the calls made by the code. If you never save the model to disk, add it to a tool like MLflow/TensorBoard, or manually add the artifact to ClearML, afaik it won't save the artifact.
Hi @<1523701205467926528:profile|AgitatedDove14> . I think I'm misunderstanding something here. I have the scheduler service running. Now that it's running, how does one add a new task to the scheduler or remove an existing one? I get that I can add them before starting the scheduler service, but once the service is running, is there any way to connect to it and change the schedule?
I thought the advantage of this service would be we could schedule tasks just by connecting to the existing t...
Why? That's not how I authenticate. Also, if it was simply an issue with authentication wouldn't there be some error message in the log?
Yes, it indeed appears to be a regex issue. If I run:
Dataset.list_datasets(
    dataset_project=self.task.get_project_name(),
    partial_name=re.escape('[LTV] Dataset Test'),
    only_completed=True,
)
It works as expected. I'm not sure how raw you want to leave the partial_name features. I could create a PR to fix this, but would you want me to re.escape at the list_datasets() level? Or go deeper and do it at `Task._query_task...
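For anyone following along, here is a minimal standalone sketch of why the escaping matters. It only uses Python's own `re` module, so it assumes the name matching behaves like a Python regex search; I haven't checked how the lookup is actually evaluated on the ClearML side.

```python
import re

name = "[LTV] Dataset Test"

# Used as a raw pattern, "[LTV]" is a character class (one of L, T, or V),
# so the pattern never matches the literal brackets in the dataset name.
raw_match = re.search(name, name)

# re.escape backslash-escapes the metacharacters, so the brackets match literally.
escaped_match = re.search(re.escape(name), name)

print("raw pattern matched:", raw_match is not None)
print("escaped pattern matched:", escaped_match is not None)
```

The same thing would apply to any dataset name containing characters like `(`, `+` or `.`.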
Project 2:
2024-01-22 17:21:56
task 6518c3cd13394aa4abbc8f0dc34eb763 pulled from 8a69a982f5824762aeac7b000fbf2161 by worker bigbrother:10
2024-01-22 17:22:03
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.bojpliyx.cfg):
----------------------
agent.worker_id = bigbrother:10
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_v...
Hi Jake 👍 ,
Maybe the content is cached? The repo isn't big. I didn't realize the log was missing content. I believe I copied everything but I'll double check in a moment.
I'm not sure why the logs were incomplete. I think part of the reason it wasn't pulling from the repo was that it was pulling from cache. I cleared the clearml cache for that project and reran it. This should be the full log.
After some digging, we found it was actually caused by the router's IPS protection. I thought it would be strange for GitHub to be throttling things at this scale.
Actually, clearing the cache on the other project might have fixed it. I just tested it out and it seems to be working.
@<1523701435869433856:profile|SmugDolphin23> I spoke too soon. It does resolve the error I posted, but it introduces a new error. While this error does seem to be related to VS Code, the strange thing is it doesn't occur if I run it with earlier versions of clearml.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/natephysics/.vscode-server/extensions/ms-python.python-2023.22.1/pythonFiles/lib/python/debugpy/_vendo...
This doesn't really make a lot of sense. ClearML is better suited to tracking which version of the code you used for a given task, while something like GitHub or GitLab tracks and hosts the code itself. You could use ClearML to help you reconstruct the environment and code from a task, provided the code is tracked by git and hosted somewhere you can access.
I might have found the answer. I'll reply if it works as expected.
Thanks for your reply @<1523701070390366208:profile|CostlyOstrich36> . Is there an example where a pipeline is built from existing tasks? I'd like to experiment with it and I don't see any examples of what you describe with my (clearly lacking) Google-fu. What happens if you wrap a function that calls Task.init() with a pipeline decorator, or is that the process you're speaking of?
The verbose output:
Generating SHA2 hash for 123 files
100%|██████████████████████████████████████████████████████████| 123/123 [00:00<00:00, 310.04it/s]
Hash generation completed
Add 2022-12.csv
Add 2020-10.csv
Add 2021-06.csv
Add 2022-02.csv
Add 2021-04.csv
Add 2013-03.csv
Add 2021-02.csv
Add 2015-02.csv
Add 2016-07.csv
Add 2022-05.csv
Add 2021-10.csv
Add 2018-04.csv
Add 2019-06.csv
Add 2017-11.csv
Add 2016-01.csv
Add 2013-06.csv
Add 2018-08.csv
Add 2020-05.csv
Add 2020-03.csv
Add 20...
Actually, this is not how it works. pip will install in whatever way it sees fit, and it is not consistent between versions (it has to do with dependency resolution).
Oh I see. What a pain. 🤣
You can configure the agent to first install specific packages, and only then others, just add the package names here:
That's an interesting solution. I'll keep that in mind as I work more with ClearML.
Thanks for your help Martin!
Since this could happen with a lot of services, maybe it would be worth a retry option? Especially if it's part of a pipeline.
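Until something like that exists, a rough sketch of what I mean by a retry option is below. This is plain Python, not a ClearML feature: `call_with_retry` and its parameters are names I made up for illustration, and you'd wrap whichever call is flaky.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry, doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing something narrow for `retry_on` (say `ConnectionError`) is probably safer than the default, so real bugs still fail fast.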
It sounds like you didn't set up your config. Did you ever initialize clearml?
Depending on the framework you're using, it'll just hook into the save model operation, so the model gets picked up every time you save it, which will probably happen every epoch for some subset of the training. If you want to do it with the existing framework, you could change the checkpoint so that it only clones the best model in memory and saves the write operation for last. The risk with this is that if the training crashes, you'll lose your best model.
Optionally, you could also disable the ClearML integration with...
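To make the "clone in memory, write once at the end" idea concrete, here is a framework-agnostic toy sketch. Everything in it is a stand-in: `state` plays the role of the model weights, `losses` the per-epoch validation loss, and `pickle` the framework's save call, so adapt it to whatever your framework actually uses.

```python
import copy
import pickle


def train(state, losses, out_path):
    """Toy loop that keeps the best state in memory and writes it to disk once."""
    best_loss = float("inf")
    best_state = None
    for epoch, loss in enumerate(losses):
        state["epoch"] = epoch  # pretend the weights changed this epoch
        if loss < best_loss:
            best_loss = loss
            # Clone in memory only; no disk write, so nothing for a save hook to catch.
            best_state = copy.deepcopy(state)
    # Single write at the very end; lost if the process crashes before this point.
    with open(out_path, "wb") as f:
        pickle.dump(best_state, f)
    return best_loss, best_state
```

The trade-off is the one mentioned above: one save instead of one per epoch, at the cost of losing the best model if training dies before the final write.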
Hyperdatasets are the only ones that require a premium. If you're using normal datasets it should be fine.
It's a corporate one. We are also looking into options on GitHub's end.
@<1539780284646428672:profile|PoisedElephant79> Are you sure you're not simply referring to the get operation? That seems to exclude archived datasets. But I don't see anything like that for the list_datasets operation.