Reputation
Badges 1
25 × Eureka!Hi @<1687643893996195840:profile|RoundCat60> , I just saw the message,
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Let me explain, the default use case for the agent is to use user/pass (as configured in the clearml.conf file(
It will change any ssh links to https links and will add the credentials to clone the repository.
You can also provide SSH keys (basicall...
That is a bit odd, But SSH keys have to have a specific chmod flags for them to work (security issues)
What was the error ?
Last but not least - can I cancel the offline zip creation if I'm not interested in it
you can override with OS environment, would that work?
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling
task.close()
takes a long time
It actually zips the entire offline folder so you can later upload it. Maybe we can disable that part?!
` # generate the script section
script = (
"fr...
Hi StickyBlackbird93
Yes, this agent version is rather old ( clearml_agent v1.0.0
)
it had a bug where pytorch wheel aaarch broke the agent (by default the agent in docker mode, will use the latest stable version, but not in venv mode)
Basically upgrade to the latest clearml-agent version it should solve the issue:pip3 install -U clearml-agemnt==1.2.3
BTW for future debugging, this is the interesting part of the log (Notice it is looking for the correct pytorch based on the auto de...
ClearML maintains a github action that sets up a dummy clearml-server,
You have one, it's the http://app.clear.ml (not a dummy one, but for this purpose it will work)
thoughts ?
s there any way to see datasets uploaded to ClearML Data without downloading them using ClearML Data?
Hi VexedCat68
Currently when you create datasets with clearml-data it has to repackage your files, i.e. upload them. That said we have received numerous requests on "registering data", and we are looking into it.
Here is the main technical hurdles we are facing, and I would love to get your perspective:
If the data is not available locally, we cannot calculate the hash of the conten...
why are there indefinitely growing anonymous tasks, even after i've closed the main schedulers.
The anonymous Tasks are The Dataset you are creating (a Dataset version is also a Task of a certain type with artifacts, the idea is usually Datasets are created from code, hence the need to combine the two).
Make sense ?
FlatStarfish45
In the parent task, the libs appear installed.
What do you mean by "parent Task"? Is this the base task we are optimizing (i.e. the experiment / model we are optimizing) ?
Or is it the "Optimization Task" itself?
Hi VexedCat68
Could it be the python version is not the same? (this is the only reason not to find a specific python package version)
FrustratingWalrus87 Unfortunately TB's TSNE is not automatically captured by ClearML (Scalars, histograms etc. are)
That said, matplotlib will be automatically captured do you can run your own PCA/tSNE and use matplotlib to visualize (ClearML will capture it).
The same applies for plotly.
What do you think?
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! a pipeline is at the end a Task, you can take the pipeline ID and clone and enqueue it
pipeline_task = Task.clone("pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
You can also monitor the pipeline with the same Task inyerface.
wdyt?
Is there any way to make that increment from last run?
pipeline_task = Task.clone("pipeline_id_here", name="new execution run here")
Task.enqueue(pipeline_task, queue_name="services")
wdyt?
Hmm yeah I think that makes sense. Can you post here the arguments?
I'm assuming you have something like '1.23a' in the arguments?
@<1720249421582569472:profile|NonchalantSeaanemone34>
dso = Dataset.create(
dataset_project= project_name,
dataset_name= dataset_name,
parent_datasets=[parent_datasets_id],
)
dso = Dataset.get(
dataset_project= project_name,
dataset_name= dataset_name,
only_completed=True,
only_published=False,
alias='latest',
)
why are you creating a dataset then getting a dataset on the same object?
it seems you are trying to upload...
This workflow however is the only way I have found to easily fix my previous ‘Module not found’ errors
Hmm okay make sense,
Did you try to set these ?
or even hack the sys.path with something likeimport sys, os sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)+"/../")
Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages and remove the "Hydra" line and replace with just "hydra-core" (no need for version).
The format is the same as "requirements.txt" and will effect the venv created by the agent
And I think the default is 100 entries, so it should not get cleaned.
and then they are all removed and for a particular task it even happens before my task is done
Is this reproducible ? Who is cleaning it and when?
looks like service-writing-time for me!
Nice!
persist/restore state so that tasks are restartable?
You mean if you write preemption-ready training code ?
DeliciousBluewhale87
node.base_task_id
is the base task, which will always be in draft mode, Instead we should use the
node.executed
which references the current executed node.
YES, maybe we should add that into the example, so it is clearer ? WDYT?
Yea I know, I reported this
LOL, apologies these days it a miracle I still remember my login passwords 😉
Do we have it on the git issue ?
So what you are saying is the workers randomly report on one another's experiments ?
There is some overhead, but it should be negligible.
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched to cached folders and clearml takes care of cache cleanup
That said notice that get mutable copy is a target you specify, in this case you should definetly delete after usage. Wdyt ?
Is
mark_completed
used to complete a task from a different process and
close
from the same process - is that the idea?
Yes
However, when I tried them out,
mark_completed
terminated the process that called
mark_completed
.
Yes if you are changing the state of the Task externally or internally the SDK will kill the process. If you are calling task.close()
from the process that created the Task it will gra...
Failed to initialize NVML: Unknown Error
yeah this is a driver issue. I think you need to check the VM image if the drivers match the GPU on that machine