
In order to create a webdataset
we need to create tar files -
so we need to unzip and then recreate the tar file.
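A minimal sketch of that unzip-and-repack step, assuming the ClearML dataset archive was downloaded as dataset.zip (file names here are placeholders):
```python
import io
import tarfile
import zipfile

# Repack the downloaded ZIP archive as a tar that webdataset can read.
with zipfile.ZipFile("dataset.zip") as zf, tarfile.open("dataset.tar", "w") as tf:
    for name in zf.namelist():
        if name.endswith("/"):  # skip directory entries
            continue
        data = zf.read(name)
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
```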
Additionally, when the files are in GCS in their raw format you can easily review them with the preview (e.g. a wav file can be listened to directly within the GCP console - web browser).
I think the main difference is that I see value in having access to the raw format within the cloud vendor, and not only having it as an archive.
This does not work -
since all the files are stored as a single ZIP file (which, if unzipped, contains all the data), while we would like to have access to the raw files in their original format.
Possibly - thinking more of https://github.com/pytorch/data/blob/main/examples/vision/caltech256.py - using a ClearML dataset as the root path.
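Roughly this kind of sketch, loosely following the caltech256.py pattern - the dataset name/project here are hypothetical:
```python
from clearml import Dataset
from torchdata.datapipes.iter import FileLister, FileOpener

# Use the local copy of a ClearML dataset as the root path of a datapipe.
root = Dataset.get(dataset_name="caltech256", dataset_project="vision").get_local_copy()
dp = FileLister(root=root, recursive=True)
dp = FileOpener(dp, mode="b")  # yields (path, stream) pairs for downstream decoding
```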
This is my current solution:
[ds for ds in Dataset.list_datasets() if ds['project'].split('/')[0] == <PROJECT_NAME>]
Dataset.list_datasets(dataset_project='XXXX')
always returns an empty list.
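For reference, a runnable version of that workaround (the project name is a placeholder):
```python
from clearml import Dataset

project_name = "my-project"  # placeholder

# Since list_datasets(dataset_project=...) comes back empty, filter the
# full listing by the top-level project component instead.
datasets = [
    ds for ds in Dataset.list_datasets()
    if ds["project"].split("/")[0] == project_name
]
print(datasets)
```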
I'm looking for the bucket URI.
I think my workflow needs to change:
get the data into the bucket, then create the Dataset using add_external_file,
and then be able to consume the data locally or stream it. And then I can use link_entries.
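A sketch of that flow as I understand it, assuming the files already sit in a GCS bucket (all names are placeholders; the SDK method appears as add_external_files):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="raw-audio", dataset_project="media")

# Register the bucket objects as links without re-uploading them, so the
# raw files stay browsable in GCS in their original format.
ds.add_external_files(source_url="gs://my-bucket/raw-audio/")

ds.upload()    # nothing heavy to push for external links
ds.finalize()

# Consumers can then pull (or stream) the linked files:
local_root = Dataset.get(dataset_name="raw-audio", dataset_project="media").get_local_copy()
```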
Hi SmugDolphin23
Do you have a timeline for fixing this https://clearml.slack.com/archives/CTK20V944/p1661260956007059?thread_ts=1661256295.774349&cid=CTK20V944
That is a workaround - but surely not optimal.
If we want to generate a dataset from a set of files that are on a local computer (e.g. a local GPU workstation that ran some media transformation) -
then instead of creating the Dataset directly, we need to first upload the files and only then use the ClearML SDK.
Do you see any option for integrating this kind of workflow into ClearML?
We want to use the dataset output_uri as common ground for creating additional dataset formats such as https://webdataset.github.io/webdataset/
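For example, a sketch of pointing a dataset's upload destination at a shared bucket (the bucket path is hypothetical; note the content is uploaded as compressed archives by default, which is the ZIP issue above):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="transformed-media", dataset_project="media")
ds.add_files("/data/local/transformed")           # files produced on the local workstation
ds.upload(output_url="gs://my-bucket/datasets")   # shared destination other jobs can read
ds.finalize()
```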
Hi SuccessfulKoala55
Is this section only relevant to AWS or also to GCP?
ClearML key/secret provided to the agent
When is this provided? Is this during the build?
BTW - is the CLEARML_HOST_IP relevant for the clearml-agent?
I can see that we can create a worker with these environment variables, e.g.
CLEARML_WORKER_NAME=MY-WORKER CLEARML_WORKER_ID=MY-WORKER:0 CLEARML_HOST_IP=X.X.X.X clearml-agent daemon --detached
My mistake - it doesn't use it to create a dedicated IP.
Are there any settings that we need to take into account when working with a session?
In the https://clear.ml/docs/latest/docs/apps/clearml_session#accessing-a-git-repository it mentions accessing a Git repository -
can you run clearml sessions without accessing Git? Assuming we are using ssh - what is the correct configuration?
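If SSH-based cloning is the goal, the relevant knob may be the agent's git-over-SSH setting in clearml.conf - a sketch, assuming the machine's ~/.ssh keys should be used:
```
agent {
    # rewrite https git URLs to ssh:// so the agent clones with the local ssh keys
    force_git_ssh_protocol: true
}
```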
I found the task in the UI -
and in the UNCOMMITTED CHANGES execution section there is
No changes logged
Any other suggestions?
Hi AgitatedDove14
OK - the issue was the firewall rules that we had.
Now both the jupyter lab and vscode servers are up.
But now there is an issue with the Setting up connection to remote session step.
After
```
Environment setup completed successfully
Starting Task Execution:
ClearML results page:
```
there is a WARNING:
```
clearml - WARNING - Could not retrieve remote configuration named 'SSH'...
```
Looking in the repo I was not able to see an example - only a reference to https://github.com/allegroai/clearml/blob/b9b0a506f35a414f6a9c2da7748f3ec3445b7d2d/docs/clearml.conf#L13 - do I just need to add company.id or user.id in the credentials dict?
Still trying to understand what this default worker is.
I've removed clearml.conf and reinstalled clearml-agent.
Then running clearml-agent list
gets the following error:
```
Using built-in ClearML default key/secret
clearml_agent: ERROR: Could not find host server definition (missing ~/clearml.conf or Environment CLEARML_API_HOST)
To get started with ClearML: setup your own clearml-server, or create a free account at and run clearml-agent init
```
Then returning the
...
Here is the screenshot - we deleted all the workers, except for the one that we couldn't.
Not sure I understand -
we are running the daemon in detached mode:
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
I think I have a lead.
looking at the list of workers from clearml-agent list,
e.g. https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
is there a way to find the worker_name?
in the above example the worker_id is clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
but I'm not able to stop this worker using the command clearml-agent daemon --stop
since this orphan worker has no corresponding clearml.conf
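One way to at least recover the exact worker id strings might be to query the server directly - a sketch, assuming valid credentials in ~/clearml.conf and that workers.get_all is available to them:
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

# List every worker the server currently considers registered,
# including orphans that no longer show up in ps.
for worker in client.workers.get_all():
    print(worker.id)
```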
Is this running as the same Linux user under which you checked the git ssh clone on that machine?
yes
The only thing that could account for this issue is that somehow the agent is not getting the right info from the ~/.ssh folder
maybe -
Question - if we change the clearml.conf, do we need to stop and start the daemon?
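If a restart is needed (the agent reads its configuration at startup), it would look something like this - queue and worker ids are placeholders:
```
clearml-agent daemon --stop <worker_id>
clearml-agent daemon --queue <execution_queue_to_pull_from> --detached
```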
so running the command clearml-agent -d list
returns the https://clearml.slack.com/archives/CTK20V944/p1657174280006479?thread_ts=1657117193.653579&cid=CTK20V944
Well, it seems that we have a similar issue: https://github.com/allegroai/clearml-agent/issues/86
we are not able to reference this orphan worker (it does not show up with ps -ef | grep clearml-agent) -
but it still appears with clearml-agent list,
and we are not able to stop it with clearml-agent daemon --stop clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0
getting
```
Could not find a running clearml-agent instance with worker_name=clearml-server-agent-group-cpu-agent-5df4476cfc-j54gh:0 wo...
```
Hi SweetBadger76,
Well - apparently I was mistaken.
I still have a ghost worker that I'm not able to remove (I had 2 workers on the same queue - that caused my confusion).
I can see it in the UI and when I run clearml-agent list
And although I'm stoping the worker specificallyclearml-agent daemon --stop <worker_id>
I'm gettingCould not find a running clearml-agent instance with worker_name=<worker_id> worker_id=<worker_id>
Agree -
we understand now that the worker is the default worker that is installed after running pip install clearml-agent.
Is it possible to remove it? Since all tasks that use this worker don't have the correct credentials.