
Reputation
Badges 1
108 × Eureka!Sounds good. Lmk if there's some changes that are required.
@<1523701435869433856:profile|SmugDolphin23> Yes. I'll try it in about 14 hours when I'm back at work and let you know how it goes. 😂
That behavior seems strange. In the pipeline in the clearML pagem if you click on one of the steps and select full details (see attached) you can see the commit ID and the branch. Can you validate that the branch is correct but the commit ID is incorrect?
Awesome! Did you managed to solve the tailscale issue with ClearML sessions? Sorry I wasn't active with that. I don't use sessions often and I found a suitable alternative in the short time. Any hopes of the changes making their way to a PR for the official release?
1707128614082 bigbrother:gpu0 INFO task 59d23c5919b04fd6947c1e463fa8c78c pulled from 9890a035b8f84872ab18d7ff207c26c6 by worker bigbrother:gpu0
Current configuration (clearml_agent v1.7.0, location: /tmp/.clearml_agent.vo_oc47r.cfg):
----------------------
agent.worker_id = bigbrother:gpu0
agent.worker_name = bigbrother
agent.force_git_ssh_protocol = true
agent.python_binary = /home/natephysics/anaconda3/bin/python
agent.package_manager.type = pip
agent.package_manager.pip_version.0 = ...
You might want to start with the first steps guide then:
None
Hyperdatasets are the only ones that require a premium. If you're using normal datasets it should be fine.
Let me give that a try. Thanks for all the help.
@<1523701435869433856:profile|SmugDolphin23> Yeah, I just wanted to validate it was worth spending the time. Since there is already a parameter that takes callable (i.e. schedule_function
) it might make sense that we reuse the parameter. If it returns a str we validate that it's a task and if it does we can run the task as if we originally passed it as the task_id
in .add_task()
. This would only be a breaking change if the callable that was passed happened to return a task_id
...
Thanks for always checking in @<1523701087100473344:profile|SuccessfulKoala55> 😛
Is it possible the cached repository was cloned before you changed your agent settings?
Which settings are you referring to? I can't remember if I was using https auth when the project would have been first cached. Would that make a difference?
Also, did you set
agent.enable_git_ask_pass: true
?
The only instance of it in the config is commented out.
# if set, use GIT_ASKPASS to pass user/pass when cloning / fetch repositories
# it solves pas...
What version of ClearML server are you using?
Results:
I first tried uncommenting enable_git_ask_pass: false
but it didn't resolve the issue.
I then cleared the cache in the vcs-cache
folder, and that did fix the issue. This is the second time the cache seemed to have been the root cause of the problem. At some point I did move from token-based auth to ssh keys. Would this require clearing the cache for any project that was cached prior to the auth change?
Provide a bit more detail. What framework are you using?
I had 2 datasets on archive and 0 unarchived. When I ran the following command:
Dataset.list_datasets(dataset_project=self.task.get_project_name(), only_completed=True)
It returned two entrees for the two datasets I had on archive.
Alright, I'll try and put that together for Monday.
Interesting approach. I'll give that a try. Thanks for the reply!
I see. Thanks for the insight. That seems to be the case. I'm struggling a bit with datasets. For example, if I wanted to trace the genealogy of a dataset that's used by traditional tasks and pipelines. I'll try and write something up about the challenges around that when I get the chance. But your comment revealed another issue:
It appears that the partial name matching isn't going well. I'm unclear why this wouldn't be matching. In the attached photo you can see the input for `partial_nam...
The original file sizes are the same but the compressed sizes seem to be different.
Actually this is not how it works, pip will install in any way it sees fit, and it is not consistent between versions (it has to do with dependency resolving)
Oh I see. What a pain. 🤣
You can configure the agent to first install specific packages, and only then others, just add the package names here:
That's an interesting solution. I'll keep that in mind as I work more with ClearML.
Thanks for your help Martin!
After some digging we found it was actually caused by the routers IPS protection. I thought it would be strange for github to be throttling things at this scale.
Well, if I stop the cron service and start it back up I don't have to re-register each schedule. If, for instance, I start the TaskScheduler, register a task, and stop the scheduler, how do I restart the TaskScheduler in a way that re-register the tasks? Because, in theory, they could be registered from several users and I might be unaware of tasks that were previously scheduled. What is the best practices to preserve state?
Hi @<1523701205467926528:profile|AgitatedDove14> . I think I'm misunderstanding something here. I have the scheduler service running. Now that it's running how does one add a new task or remove an existing task from the scheduler? I get that I can add them before starting the scheduler service but once the service is running is there any way to connect to it and change the schedule?
I thought the advantage of this service would be we could schedule tasks just by connecting to the existing t...
Depending on the framework you're using it'll just hook into the save model operation. Every time you save a model, which will probably happen every epoch for some subset of the training. If you want to do it with the existing framework you could change the checkpoint so that it only clones the best model in memory and saves the write operation for last. The risk with this is if the training crashes, you'll lose your best model.
Optionally, you could also disable the ClearML integration with...
Thanks for the reply @<1523701070390366208:profile|CostlyOstrich36> !
It says in the documentation that:
Add a folder into the current dataset. calculate file hash, and compare against parent, mark files to be uploaded
It seems to recognize the dataset as another version of the data but doesn't seem to be validating the hashes on a per file basis. Also, if you look at the photo, it seems like some of the data does get recognized as the same as the prior data. It seems like it's the correct...
We have a server that has many agents running on it because there are many instances where training can be run over several agents as a single agent doesn't take up all the resources available to the server.
I have manually verified that the line-by-line content of the csv files is identical using hashlib.sha256(). Why would it be that the file content is the same, they are generated by the same process (literally just rerunning the same code twice) but ClearML treats them differently.
It's even attempting to install omegaconf but not from the repo, likely because it's a dependency of hydra-colorlog.
Collecting omegaconf<2.4,>=2.2
Using cached omegaconf-2.2.3-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.2-py3-none-any.whl (79 kB)
Using cached omegaconf-2.2.1-py3-none-any.whl (78 kB)