
I'm guessing it is since there were datasets that I could not see - but actually they were there (as sub-projects), so everything is related
Hi SmugDolphin23
Do you have a timeline for fixing this? https://clearml.slack.com/archives/CTK20V944/p1661260956007059?thread_ts=1661256295.774349&cid=CTK20V944
Is this running from the same Linux user with which you checked the git SSH clone on that machine?
yes
The only thing that could account for this issue is somehow the agent is not getting the right info from the ~/.ssh folder
maybe -
Question - if we change the clearml.conf
do we need to stop and start the daemon?
will do
A workaround that worked for me is to explicitly complete the task; it seems like the flush has a bug:
from clearml import Task

task = Task.get_task('...')
task.close()
task.mark_completed()
ds.is_final()
True
Hi CostlyOstrich36,
After verifying, I can confirm that there is no custom certificate.
Any other ideas?
I think I have a lead.
Looking at the list of workers from clearml-agent list
e.g. https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
is there a way to find the worker_name?
In the above example the worker_id is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
but I'm not able to stop this worker using the command clearml-agent daemon --stop
since this orphan worker has no corresponding clearml.conf
Looking in the repo I was not able to see an example - only a reference to https://github.com/allegroai/clearml/blob/b9b0a506f35a414f6a9c2da7748f3ec3445b7d2d/docs/clearml.conf#L13 - do I just need to add company.id or user.id in the credentials dict?
I've updated the configuration and now I'm able to see sub-projects that I didn't see before.
As I can see, each dataset is a separate sub-project - is that correct?
agree -
we understand now that the worker is the default worker that is installed after running pip install clearml-agent
is it possible to remove it? Since all tasks that use this worker don't have the correct credentials.
We need to convert it to a DataFrame, since:
Displaying metadata in the UI is only supported for pandas Dataframes for now. Skipping!
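For reference, a minimal sketch of that conversion - assuming the metadata starts as a plain dict and is attached with Dataset.set_metadata (the id and values are placeholders):

import pandas as pd
from clearml import Dataset

ds = Dataset.get(dataset_id='...')  # the dataset in question
meta = {'sample_rate': 16000, 'num_files': 1000}  # hypothetical metadata dict
# wrap the dict in a one-row DataFrame so the UI can render it
ds.set_metadata(pd.DataFrame([meta]), metadata_name='metadata')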
Just for the record - for whoever will be searching for a similar setup with colab
prerequisites:
1. Create a dedicated Service Account (I was not able to authenticate with regular User credentials, only with a SA).
2. Get the SA key ( credentials.json ).
3. Upload the json to an ephemeral location (e.g. the root of the colab).
4. Log into the ClearML Web UI and create an access key for the user - https://clear.ml/docs/latest/docs/webapp/webapp_profile#creating-clearml-credentials
5. Prepare the credentials:
%%bash
export api=`ca...
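For the "prepare credentials" step, a sketch of a Python alternative, assuming the standard Task.set_credentials API (all values are placeholders):

from clearml import Task

# placeholders - copy the real values from the ClearML Web UI credentials page
Task.set_credentials(
    api_host='https://api.clear.ml',
    web_host='https://app.clear.ml',
    files_host='https://files.clear.ml',
    key='<access_key>',
    secret='<secret_key>',
)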
Here is the screenshot - we deleted all the workers, except for the one that we couldn't.
Great - Thx TimelyPenguin76 for your input
This does not work, since all the files are stored as a single ZIP file (which, if unzipped, will have all the data), but we would like to have access to the raw files in their original format.
AgitatedDove14 -
I also tried https://github.com/allegroai/clearml-session - running the session within docker - but got the same error
clearml-session --docker --git-credentials
(there is a typo in it: --git-credentilas -> --git-credentials)
and still got the same error
clearml_agent: ERROR: Can not run task without repository or literal script in script.diff
Hi SweetBadger76 ,
Well - apparently I was mistaken.
I still have a ghost worker that I'm not able to remove (I had 2 workers on the same queue - that caused my confusion).
I can see it in the UI and when I run clearml-agent list
And although I'm stopping the worker specifically: clearml-agent daemon --stop <worker_id>
I'm getting: Could not find a running clearml-agent instance with worker_name=<worker_id> worker_id=<worker_id>
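For anyone else hitting such a ghost worker, one thing that may clear it is unregistering it directly through the server API; a sketch, assuming the APIClient exposes the workers.unregister endpoint:

from clearml.backend_api.session.client import APIClient

client = APIClient()
# worker id exactly as shown by clearml-agent list / the UI
client.workers.unregister(worker='<worker_id>')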
But this is not on the pods, is it? We're talking about the Python code running from Colab or locally...?
correct - but where is the clearml.conf file?
We have assets in a GCP bucket.
The dataset is created and then the assets are linked to the dataset via the add_external_files method.
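A minimal sketch of that flow, assuming a GCS URI and the public Dataset API (names and paths are placeholders):

from clearml import Dataset

ds = Dataset.create(dataset_name='audio-raw', dataset_project='my-project')  # hypothetical names
ds.add_external_files(source_url='gs://<bucket>/<prefix>/')  # link the assets in place, no copy
ds.upload()    # nothing heavy to push for external links, just the dataset state
ds.finalize()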
In order to create a webdataset we need to create tar files - so we need to unzip and then recreate the tar file, as sketched below.
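Roughly what that unzip-and-retar step looks like, assuming the dataset is first pulled locally with get_local_copy (the shard name is a placeholder):

import tarfile
from pathlib import Path

from clearml import Dataset

# local, extracted copy of the dataset
local_dir = Path(Dataset.get(dataset_id='...').get_local_copy())

# re-pack the raw files into a single webdataset-style tar shard
with tarfile.open('shard-000000.tar', 'w') as tar:
    for f in sorted(local_dir.rglob('*')):
        if f.is_file():
            tar.add(f, arcname=f.relative_to(local_dir))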
Additionally, when the files are in GCS in their raw format you can easily review them with the preview (e.g. a wav file can be listened to directly within the GCP console - web browser).
I think the main difference is that I see value in having access to the raw format within the cloud vendor, and not only having it as an archive.
Using the https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/task_module/task_task.html?highlight=upload_artifact#clearml.task.Task.upload_artifact method. It works well, but only saves it as a CSV
(which is very problematic, since when loading the artifact none of the column data types are preserved...)
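A possible workaround sketch - serializing to Parquet first so the dtypes survive, and uploading the file itself (names are placeholders; this assumes upload_artifact stores a given file path as-is):

import pandas as pd
from clearml import Task

task = Task.current_task()
df = pd.DataFrame({'id': [1, 2], 'score': [0.5, 0.9]})  # stand-in for the real table
df.to_parquet('table.parquet')  # Parquet preserves column dtypes
task.upload_artifact('table', artifact_object='table.parquet')

# consumer side: dtypes survive the round trip
local = Task.get_task(task_id='...').artifacts['table'].get_local_copy()
df2 = pd.read_parquet(local)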
This is my current solution:
[ds for ds in dataset.list_datasets() if ds['project'].split('/')[0] == <PROJECT_NAME>]
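Spelled out as a runnable snippet (assuming <PROJECT_NAME> stands for the top-level project name):

from clearml import Dataset

project = '<PROJECT_NAME>'  # top-level project to filter on
datasets = [
    ds for ds in Dataset.list_datasets()
    if ds['project'].split('/')[0] == project
]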
not sure i understand
we are running the daemon in detached mode:
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
using the helm charts
https://github.com/allegroai/clearml-helm-charts
Thx for your reply
Hi HugeArcticwolf77
I've run the following code, which uploads the files with compression although compression=None:
ds.upload(show_progress=True, verbose=True, output_url='...', compression=None)
ds.finalize(verbose=True, auto_upload=True)
Any idea why?
yes - the pre_installations.sh runs and completes, but the pytorch/main.py file doesn't run.
so the Task completes successfully but without running the script