GiddyTurkey39
I would guess your VM cannot access the trains-server, meaning an actual network configuration issue.
What are the VM IP and the trains-server IP? (The first two octets are enough, e.g. 10.1.X.Y, 174.4.X.Y)
Oh that is odd... let me check something
I'm not sure if it matters, but 'kwcoco' is being imported inside one of the repo's functions and not in the script's header.
Should work.
when you run pip freeze inside the same env, what are you getting?
Also, is there any other import that is missing? (Basically 'clearml' tries to be smart and checks whether the script, even though it sits inside a repo, actually imports anything from the repo itself; if not, it will only analyze the original script.) Basically...
Can you post here the docker-compose.yml you are spinning up? Maybe it is the wrong one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase
IntriguedRat44 If the monitoring only shows a single GPU (the selected one), it means it reads the correct CUDA_VISIBLE_DEVICES (this is how it knows that you are only using the selected GPU, not all of them).
There is nothing else in the code that will change the OS environment.
Could you print os.environ['CUDA_VISIBLE_DEVICES'] while running the code to verify?
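A minimal sketch of that check, run from inside the training script (the env var name is standard CUDA behavior, nothing clearml-specific):

```python
import os

# Print the GPUs this process is allowed to see; CUDA uses this variable
# to mask devices, so an empty/absent value means "all GPUs are visible".
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<not set>"))
```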
when I am running the pipeline remotely is there a way the remote machine can access it?
Well, for the dataset to be accessible you need to upload it with the Dataset class; then the remote machine can call Dataset.get(...).get_local_copy() to fetch the actual data.
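A rough sketch of that flow, wrapped in two helpers (the names, project, and folder are placeholders; this assumes clearml is installed and a server is configured):

```python
def publish_dataset(name, project, folder):
    """Upload a local folder as a clearml Dataset (placeholder names)."""
    from clearml import Dataset
    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(folder)   # register the local files
    ds.upload()            # push the data to the configured storage
    ds.finalize()          # lock the version so it can be fetched
    return ds.id

def fetch_dataset(name, project):
    """On the remote machine: download (or reuse the cached) local copy."""
    from clearml import Dataset
    return Dataset.get(dataset_name=name, dataset_project=project).get_local_copy()
```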
Wait, it shows "hydra==2.5" not "hydra-core==x.y" ?
Yes I do have a GOOGLE_APPLICATION_CREDENTIALS environment variable set, but nowhere do we save anything to GCS. The only usage is in the code which reads from BigQuery
Are you certain you have no artifacts on GS?
Are you saying that if GOOGLE_APPLICATION_CREDENTIALS is set and clearml.conf contains no "project" section, it crashes on start?
Hi @<1523706266315132928:profile|DefiantHippopotamus88>
The idea is that clearml-server acts as a control plane and can sit on a different machine; obviously you can run both on the same machine for testing. Specifically it looks like clearml-serving is not configured correctly, as the error points to an issue with the initial handshake/login between the triton containers and the clearml-server. How did you configure the clearml-serving docker compose?
Hi SkinnyPanda43
This issue was fixed with clearml-agent 1.5.1, can you verify?
seems like pip 20.1.1 has the issue, but >= 22.2.2 does not.
Notice we changed the value there; it now has two versions, one for Python < 3.10 and one for Python >= 3.10.
The main reason is that pip changed their resolving algorithm, and the new one can break its own dependencies (i.e. pip freeze > requirements.txt followed by pip install might not actually work)
None
task.connect(model_config)
task.connect(DataAugConfig)
If these are separate dictionaries, you should probably use two sections:
task.connect(model_config, name="model config")
task.connect(DataAugConfig, name="data aug")
It is still getting stuck.
I notice that one of the scalars logged early reports the epoch, while the remaining scalars seem to use iterations, because the iteration value is 1355 instead of 26
wait, so you are seeing some scalars?...
MistakenBee55 how about a Task doing the Model quantization, then trigger it with TriggerScheduler ?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
total size 5.34 GB, 1 chunk stored (average size 5.34 GB)
PanickyAnt52 The Dataset itself will not break files (it will package a large folder into multiple zip files, but it will not split an individual file).
The upload itself is limited by the HTTP interface (i.e. 2GB file size limit)
I would just encode it into multiple Arrow files
does that make sense ?
Found it
GiganticTurtle0 nice catch! Thank you for stumbling across this one as well.
Fix will be pushed later today 🙂
Hi GiganticTurtle0
you should actually get "file://home/user/local_storage_path"
With "file://" prefix.
We always store the file:// prefix to note that this is a local path
p.s. StraightCoral86 I might be missing something here, please feel free to describe the entire execution scenario and what you are trying to achieve 🙂
Correct 🙂
but we run everything in docker containers. Will it still help?
As long as you are running with clearml-agent (in docker mode), all the cache folders (this one included) are mounted on the host machine for persistency
Hmm okay let me check that, I think I understand the issue
if so is there any doc/examples about this?
Good point, passing to docs 🙂
https://github.com/allegroai/clearml/blob/51af6e833ddc5a8ba1efaaf75980f58616b25e85/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py#L123
I mean it is mentioned, but we should highlight it better
I see, so there's no way to launch a variant of my last run (with say some config/code tweaks) via CLI, and have it re-use the cached venv?
Try:
clearml-task ... --requirements requirements.txt
You can also clone / override args with:
clearml-task --base-task-id <ID-of-original-task-post-agent> --args ...
See full doc: https://clear.ml/docs/latest/docs/apps/clearml_task/
Hi CostlyElephant1
What do you mean by "delete raw data"? Data is always fetched into cached folders, and clearml takes care of cache cleanup
That said, notice that get_mutable_local_copy copies into a target folder you specify; in that case you should definitely delete it after usage. Wdyt?
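One way to make that cleanup explicit, sketched with a generic fetch callable (with clearml, `fetch` would be a wrapper around something like Dataset.get(...).get_mutable_local_copy(target); the helper name here is mine, not a clearml API):

```python
import shutil

def with_mutable_copy(fetch, use):
    """Run use(path) on a freshly fetched mutable copy, then delete it.

    fetch: any callable returning a local folder path (e.g. a wrapper
           around a dataset's mutable-copy call -- assumption for illustration).
    use:   callable that consumes the folder and returns a result.
    """
    path = fetch()
    try:
        return use(path)
    finally:
        # mutable copies are outside the managed cache, so remove them ourselves
        shutil.rmtree(path, ignore_errors=True)
```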
This means that in your "Installed packages" you should see the line:
Notice that this is not a PyPI artifactory (i.e. a server to add to the extra index url for pip); this is a direct pip install from a git repository, hence it should be listed in the "installed packages".
If this is the way the package was installed locally, you should have had this line in the installed packages.
The clearml agent should take care of the authentication for you (specifically here, it should do nothing).
If ...
Hi CurvedHedgehog15
I would like to optimize hparams saved in Configuration objects.
Yes, this is a tough one.
Basically the easiest way to optimize is with hyperparameter sections, as they are basically key/value pairs you can control from the outside (see the HPO process)
Configuration objects are, well, blobs of data that "someone" can parse. There is no real restriction on them, since there are many standards to store them (yaml, json, ini, dot notation, etc.)
The quickest way is to add...
I think it would be nicer if the CLI had a subcommand to show the content of ~/.clearml_data.json
Actually, it only stores the last dataset id at the moment, so not much there 🙂
But maybe we should have a command line that just outputs the current dataset id; this means it will be easier to grab and pipe
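Until then, a tiny sketch of grabbing the stored id yourself (the file location comes from the thread above; the JSON key name "latest_id" is an assumption for illustration, so check your local file for the actual field):

```python
import json
from pathlib import Path

def read_last_dataset_id(state_file="~/.clearml_data.json"):
    """Return the last dataset id recorded by clearml-data, or None.

    The "latest_id" key is an assumed name; inspect your own
    ~/.clearml_data.json to confirm the real field.
    """
    path = Path(state_file).expanduser()
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("latest_id")
```

This keeps the lookup pipe-friendly, e.g. `python -c "...; print(read_last_dataset_id())"`.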
WDYT?
Hi ShakyJellyfish91
It seems clearml is using a single connection, which takes a long time to download
Hmm, I found this one:
https://github.com/allegroai/clearml/blob/1cb5dbb276026644ae20fef63d58256cdc887818/clearml/storage/helper.py#L1763
Does max_connections=10 mean 10 concurrent connections ?
However, SNPE performs quantization with precompiled CLI binary instead of python library (which also needs to be installed). What would be the pipeline in this case?
I would imagine a container with a preinstalled SNPE compiler/quantizer, and a python script triggering the process?
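As a sketch, the Task's script can just shell out to the precompiled binary with subprocess (the binary name and flags below are placeholders, not real SNPE arguments):

```python
import subprocess

def run_quantizer(binary, args):
    """Invoke a precompiled CLI quantizer and return its stdout.

    binary/args are placeholders; substitute the actual SNPE tool
    and its flags as shipped inside the container image.
    """
    result = subprocess.run(
        [binary, *args],
        capture_output=True,  # collect stdout/stderr for logging
        text=True,
        check=True,           # raise if the tool exits non-zero
    )
    return result.stdout

# Hypothetical usage (placeholder command, not verified SNPE syntax):
# run_quantizer("snpe-dlc-quantize", ["--input_dlc", "model.dlc"])
```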
one more question: in case of triggering the quantization process, will it be considered as separate task?
I think this makes sense, since you probably want a container with the SNPE environment, m...