Hi JitteryCoyote63
I think this is the default Python str() casting.
But you can specify the preview text when you call upload_artifact:
https://clear.ml/docs/latest/docs/references/sdk/task#upload_artifact
(see the preview argument)
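A minimal sketch of passing your own preview text instead of the default str() casting. It assumes `task` is a `clearml.Task` (e.g. returned by `Task.init(...)`); `make_preview` and `upload_with_preview` are hypothetical helpers, and the task is passed in as a parameter so the snippet can be tried without a server.

```python
from typing import Any


def make_preview(obj: Any, max_len: int = 200) -> str:
    """Hypothetical helper: build a short, readable preview string for an artifact."""
    if isinstance(obj, dict):
        # One "key: value" pair per line reads better than the default str() casting
        text = "\n".join(f"{key}: {value!r}" for key, value in obj.items())
    else:
        text = str(obj)
    # Truncate long previews so the UI stays readable
    return text if len(text) <= max_len else text[: max_len - 3] + "..."


def upload_with_preview(task: Any, name: str, obj: Any) -> None:
    """Upload `obj` as an artifact, overriding the default preview text."""
    task.upload_artifact(name=name, artifact_object=obj, preview=make_preview(obj))
```

Usage would be something like `upload_with_preview(Task.init(project_name="examples", task_name="preview demo"), "stats", {"accuracy": 0.93})`, where the project and task names are placeholders.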
JitteryCoyote63 Great to hear!
BTW:
Would it be possible to extend Task.init with a force_reuse argument that would enforce reusing these tasks?
You can pass continue_last_task=True
I think it should be equivalent to what you suggest
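A minimal sketch of the suggestion above. `task_cls` is expected to be `clearml.Task`; it is a parameter here (a hypothetical wrapper, not part of the SDK) only so the call can be exercised without a ClearML server.

```python
from typing import Any


def init_reusable_task(task_cls: Any, project: str, name: str) -> Any:
    """Initialize a task, continuing the previous run instead of creating a new one."""
    return task_cls.init(
        project_name=project,
        task_name=name,
        # Continue the last task with this project/name rather than starting fresh
        continue_last_task=True,
    )
```

In a script this would be `init_reusable_task(Task, "examples", "training")` after `from clearml import Task`.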
instead of the one that I want, or the one from the environment it is started from.
By default, the agent uses the Python it was launched with. To force a specific interpreter, set the following in the agent's clearml.conf:
agent.ignore_requested_python_version = true
agent.python_binary = /my/selected/python3.8
Hi ShallowArcticwolf27
Does the clearml-task CLI command currently support remote repositories that are intended to be used with SSH?
It does.
But with the git@ prefix used for GitLab's SSH, it seems to default to looking for the repository locally.
git@ is always the prefix for SSH repositories (it does not necessarily mean SSH is actually used; it is what git returns when asked for the origin of the repository). The agent knows (if SSH credentials ...
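To make the "git@ means SSH form" point concrete, here is a simplified, hypothetical check (not the agent's actual code) that classifies a remote URL the way git reads it: either an explicit ssh:// URL or the scp-like `[user@]host:path` form.

```python
import re

# scp-like SSH syntax: [user@]host:path, e.g. git@gitlab.com:group/repo.git
_SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?[\w.-]+:(?!//)")
# Windows drive paths such as C:\work\repo look scp-like but are local
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def is_ssh_remote(url: str) -> bool:
    """Return True if a git remote URL uses SSH (scp-like or ssh:// form)."""
    if url.startswith(("ssh://", "git+ssh://")):
        return True
    if "://" in url or _DRIVE_PATH.match(url):
        # http(s)://, git://, file:// and local drive paths are not SSH
        return False
    return bool(_SCP_LIKE.match(url))
```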
Hi JitteryRaven85
I have also deleted some hyper-params but they appear again when training starts.
Yes, you cannot "delete" parameters, as any missing parameter is synced back (making sure you have a full log).
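A hypothetical illustration of why a deleted parameter reappears (not ClearML's implementation): values stored on the Task override what the code defines, and anything missing from the Task is filled back in from the code, so the log stays complete.

```python
from typing import Dict


def effective_params(code_defaults: Dict[str, str], task_params: Dict[str, str]) -> Dict[str, str]:
    """Task values win; parameters missing from the Task come back from the code."""
    merged = dict(code_defaults)  # start from everything the code defines
    merged.update(task_params)    # edited values from the (cloned) Task override them
    return merged
```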
The problem is that when I clone an experiment and change the hyper-params, some change and some remain the same.
Could you expand on which parameters stay the same? (Obviously this should not happen.)
Yes, it is fully supported and should work.
Could you share the full execution log ?
VirtuousFish83 I remember a similar issue on GitHub; which clearml-server version are you using?
In your code, can you print the following: import os; print(os.environ.keys())
There should be a few keys the PyCharm plugin sends from the local machine, pointing to the git repo.
Can you also make sure you did not check "Disable local machine git detection" in the clearml PyCharm plugin?
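A slightly narrower version of the print above, showing only variables with a given prefix. The exact key names the PyCharm plugin sets are not listed here, so the `CLEARML_` prefix is an assumption; adjust it to whatever keys show up.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_prefix(prefix: str, environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return environment variables whose names start with `prefix` (case-insensitive)."""
    env = os.environ if environ is None else environ
    return {key: value for key, value in env.items() if key.upper().startswith(prefix.upper())}


if __name__ == "__main__":
    # "CLEARML_" is an assumed prefix, not a documented list of plugin keys
    for key, value in sorted(env_with_prefix("CLEARML_").items()):
        print(f"{key}={value}")
```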
DefeatedOstrich93 many thanks I was able to reproduce it (basically newly added files caused git apply to fail)
Fix will be part of the next clearml-agent RC
Also, how do I make the files other than the entry script visible to the job?
The assumption in clearml (regardless of how you create a Task) is that your code is either a standalone script (or Jupyter notebook) or inside a git repository. In the case of a git repository, clearml-agent will clone the git repository of the code, apply the uncommitted changes and run your code.
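The steps above can be sketched as a list of commands. This is a hypothetical, simplified outline (the function name, the diff file and the working directory are illustrative, not the agent's internals); each command could be run with `subprocess.run(cmd, check=True)`.

```python
from typing import List, Optional


def agent_run_commands(repo_url: str, commit: str, script: str,
                       diff_file: Optional[str] = None, workdir: str = "repo") -> List[List[str]]:
    """Clone the repo, check out the recorded commit, re-apply uncommitted
    changes (if any), then run the entry script from the checkout."""
    commands = [
        ["git", "clone", repo_url, workdir],
        ["git", "-C", workdir, "checkout", commit],
    ]
    if diff_file:
        # Uncommitted changes are stored as a diff and applied on top of the commit
        commands.append(["git", "-C", workdir, "apply", diff_file])
    # Other files in the repository are available because the whole repo is cloned
    commands.append(["python", f"{workdir}/{script}"])
    return commands
```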
Really stoked to start using it and introduce a more sane ML ops workflow at my workplace lol.
Totally with you!
... would that be a Model Registry Store plugin?
YES please!
So we actually just introduced "Applications" into the clearml free tier, https://app.community.clear.ml/applications
Allowing you to take any Task in the system and make it an "application" (a python script running on one of the service agents), with the ability to configu...
I'm thinking of a few plots in my current in-house tooling which are slightly different than the standard charts we look at. For example a custom parallel coordinate chart that can use aggregations, categorical variables, etc.
This can be done by comparing experiments, then checking the Hyper-Parameters tab and selecting Graph from the drop-down at the top.
So my question in general is pertaining to if I would need to get better at Javascript if I were to make those changes. My guess is ...
Hi SmilingFrog76
Great question; sadly, multi-node is never simple.
Let's start with the basics: assume one worker is available and the other is not, what would you want to happen? (P.S. I'm not aware of flexible multi-node training frameworks, i.e. a framework that can detect that another node is available and connect to it mid-training; that said, one might exist.)
Hi UnevenDolphin73
Took a long time to figure out that there was a specific Python version with a specific virtualenv that was old ...
NICE!
Then the task requested to use Python 3.7, and that old virtualenv version was broken.
Yes, if the Task is using a specific Python version, it will first try to find it (i.e. which python3.7), then use it to create the new venv.
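A rough sketch of that lookup order, under the assumption that the environment is created with the virtualenv package; the function and its fallback are hypothetical, not the agent's code. `which` is injectable so it can be tested without the interpreters installed.

```python
import shutil
from typing import Callable, List, Optional


def venv_command(requested_version: str, venv_dir: str,
                 which: Callable[[str], Optional[str]] = shutil.which,
                 fallback: str = "python3") -> List[str]:
    """Prefer the interpreter matching the requested version (e.g. `which python3.7`),
    otherwise fall back, then use it to create the new virtualenv."""
    python = which(f"python{requested_version}") or which(fallback) or fallback
    # Create the environment with the selected interpreter via the virtualenv package
    return [python, "-m", "virtualenv", venv_dir]
```

On the worker, `python3.7 -m virtualenv --version` prints which virtualenv version that interpreter would use.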
As a result -> Could the agent maybe also output the virtualenv version used ...
ReassuredTiger98 both are running with pip as the package manager; I thought you mentioned conda as the package manager, no?
agent.package_manager.type = pip
Also, the failed execution is looking for "ruamel_yaml_conda", but it is nowhere to be found in the original one. How is that possible?
for example, one notebook will be dedicated to explore columns, spot outliers and create transformations for specific column values.
This actually implies each notebook is a standalone "process", which makes a ton of sense. But this is where notebooks break with proper SW design: in traditional SW the notebooks would be Python files, and then of course you could import one from another; unfortunately this does not work with notebooks...
If you are really keen on using notebooks I wou...
Hi DeliciousBluewhale87
Hmm, good question.
Basically the idea is that if you have an ingestion service on the pods (i.e. as part of the YAML template used by the k8s glue), you can tell the glue which ports are exposed, so that (1) it knows the maximum number of instances it can spin up, e.g. one per port, and (2) it sets the external port number on the Task, so the running agent/code is aware of the exposed port.
A use case for it would be combining the clearml-session with the k8s gl...
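A hypothetical illustration of the port idea above (not the glue's code): each launched instance takes one exposed port, so the size of the port list caps the number of concurrent instances. Writing the port back onto the Task is left out here.

```python
from typing import Dict, Iterable, Optional


class PortPool:
    """One exposed port per instance: the number of ports is the instance limit."""

    def __init__(self, ports: Iterable[int]) -> None:
        self._free = sorted(set(ports))
        self._assigned: Dict[str, int] = {}

    @property
    def max_instances(self) -> int:
        return len(self._free) + len(self._assigned)

    def acquire(self, task_id: str) -> Optional[int]:
        """Reserve a port for a task; None means no capacity is left."""
        if task_id in self._assigned:
            return self._assigned[task_id]
        if not self._free:
            return None
        port = self._free.pop(0)
        self._assigned[task_id] = port
        return port

    def release(self, task_id: str) -> None:
        """Return a finished task's port to the pool."""
        port = self._assigned.pop(task_id, None)
        if port is not None:
            self._free.append(port)
            self._free.sort()
```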
Regarding the limit interface, let me check; I think this is being worked on (i.e. a nice interface that should be pushed in the next few days). Let me get back to you on this one.
How will imposing an instance limit prevent or allow the --order-fairness feature, for example, which exists in the clearml-agent version but not in the k8s_glue_example version?
A bit of background on how the glue works:
It pulls jobs from the clearml queue, then it prepares a k8s job, and launches the k8s jobs...
Hi MortifiedCrow63
I have to admit this is very strange, I think the fact it works for the artifacts and not for the model is kind of a fluke ...
If you use the "wait_on_upload" argument in upload_artifact, you end up with the same behavior. Even when uploading in the background the issue is still there; for me it was revealed the minute I limited the upload bandwidth to under 300kbps. It seems the internal GS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk...
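Some back-of-the-envelope arithmetic for that observation, assuming 300kbps means 300,000 bit/s: 37,500 B/s × 60 s = 2,250,000 bytes can go through per timeout window. Google Cloud Storage resumable-upload chunk sizes (e.g. `Blob.chunk_size` in google-cloud-storage) must be multiples of 256 KiB, so the largest chunk that fits is 8 × 262,144 = 2,097,152 bytes. How to apply this to ClearML's uploader is not shown here.

```python
GCS_CHUNK_MULTIPLE = 256 * 1024  # resumable-upload chunks must be a multiple of 256 KiB


def max_chunk_bytes(bandwidth_bits_per_s: int, timeout_s: float = 60.0) -> int:
    """Largest chunk (bytes) that can be sent within `timeout_s` at the given
    bandwidth, rounded down to a multiple of 256 KiB."""
    bytes_in_window = int(bandwidth_bits_per_s / 8 * timeout_s)
    chunks = bytes_in_window // GCS_CHUNK_MULTIPLE
    # Below roughly 35 kbps even the minimum 256 KiB chunk exceeds the window;
    # the minimum is returned anyway since smaller chunks are not allowed
    return max(chunks, 1) * GCS_CHUNK_MULTIPLE
```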
Thanks FiercePenguin76 , I can totally understand your point on running proper tests, and reluctance to break other things.
I suggest adding a comment with the temp fix that solved the problem for you, and we will make sure the team takes it from there. wdyt?
(i.e. importing the trains package is enough to patch argparse; the arguments are only logged when you call Task.init, and until then they are stored in memory)
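A toy sketch of that mechanism with hypothetical names (not the trains implementation): `parse_args` is patched, parsed values are kept in memory, and they are only handed over ("logged") once the stand-in for Task.init runs.

```python
import argparse
from typing import Any, Dict, List

# Hypothetical in-memory store, standing in for what the SDK keeps until Task.init
_pending_args: List[Dict[str, Any]] = []
_original_parse_args = argparse.ArgumentParser.parse_args


def _patched_parse_args(self, args=None, namespace=None):
    parsed = _original_parse_args(self, args, namespace)
    # Store the parsed arguments; nothing is logged yet
    _pending_args.append(vars(parsed).copy())
    return parsed


def patch_argparse() -> None:
    """In the real package this happens as a side effect of the import."""
    argparse.ArgumentParser.parse_args = _patched_parse_args


def init_task() -> List[Dict[str, Any]]:
    """Stand-in for Task.init: only now are the stored arguments 'logged'."""
    logged = list(_pending_args)
    _pending_args.clear()
    return logged
```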
Hi UpsetTurkey67
repository discovery stores github repo in the form:
...
while for others
git@github.com:...
Yes that depends on how they locally cloned the repo (via SSH or user/pass/token)
Interestingly in the former case the ssh config is ignored and cloning repository breaks on the worker
If you have passed git user/pass to the agent it should use them not SSH, how did you configure the agent ?
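For context, the git user/pass mentioned here is set in the agent's clearml.conf (agent.git_user / agent.git_pass) and is used for HTTPS clones. A hypothetical illustration (not the agent's code) of mapping an scp-like SSH remote to its HTTPS form, so such credentials could be used:

```python
import re
from typing import Optional

_SCP_LIKE = re.compile(r"^(?:(?P<user>[\w.-]+)@)?(?P<host>[\w.-]+):(?P<path>(?!//).+)$")


def ssh_to_https(url: str) -> Optional[str]:
    """Convert git@github.com:org/repo.git to https://github.com/org/repo.git;
    return None for anything that is not an scp-like SSH remote."""
    match = _SCP_LIKE.match(url)
    # A single-letter "host" is a Windows drive path such as C:\repo, not a remote
    if not match or len(match.group("host")) == 1:
        return None
    return f"https://{match.group('host')}/{match.group('path').lstrip('/')}"
```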
Hi ReassuredTiger98
I do not want to create extra queues for this since this will not be able to properly distribute tasks.
Queues are the way to abstract different resources into "compute capabilities". They create a simple interface for users on the one hand, and allow you to control the compute on the other. Agents can listen to multiple queues with priority. This means an RTX agent can pull from an RTX queue, and if it is empty, it will pull from the "default" queue. Would that work for ...
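A small sketch of the priority rule described above (a hypothetical function, not the agent's scheduler): queues are checked in order, and "default" is only used when "rtx" is empty.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def pull_next(queues: Dict[str, Deque[str]], priority: List[str]) -> Optional[str]:
    """Return the next task id from the first non-empty queue in priority order."""
    for name in priority:
        queue = queues.get(name)
        if queue:
            return queue.popleft()
    return None  # nothing to run in any of the queues
```

On a real worker this corresponds to listing the queues in priority order when starting the agent, e.g. `clearml-agent daemon --queue rtx default`.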
and I have no way to save those as clearml artifacts
You could add this at the end of the code: from pathlib import Path; task.upload_artifact('profiler', Path('./fil-result/'))
wdyt?
Also SoreDragonfly16, could you test whether the issue exists with trains==0.16.2rc0?
BTW: we are now adding "datasets chunks for a more efficient large dataset storage"
Thanks ShakyJellyfish91 this really helps to narrow it down!
Let me see what I can find