EnviousStarfish54 following on this issue, the root cause is that dictConfig will clean All handlers if Not passed "incremental": True
conf_logging = { "incremental": True, ... }
Since you pointed that Kedro is internally calling logging.config.dictConfig(conf_logging)
,
this seems like an issue with Kedro as this call will remove All logging handlers, which seems problematic. wdyt ?
EnviousStarfish54 Yes i'm not sure what happens there we will have to dive deeper, but now that you got us a code snippet to reproduce the issue it should not be very complicated to fix (I hope 🤞 )
Hi EnviousStarfish54
You mean the console output ? if that's the case, the Task.init call will monkey patch the sys.stdout/sys.stderr to report to clearml
as well as the console
Thanks EnviousStarfish54
Let me check if I can reproduce it
Maybe you should make
naming_function
as public variable in
SearchStrategy
class or allow changing it in
HyperParameterOptimizer
class?
I like this idea, let's do that
Just making sure, you hit the 1024 character limit on S3 path?
If this is the case we should also fix the "artifact naming" to take that into account (it already does and has a limit, see here:
https://github.com/allegroai/clearml/blob/24464b7c1019f7a7b3149ecb80a379...
when u say use
Task.current_task()
you for logging? which i’m guessing that the fastai binding should do right?
right, this is a fancy way to say, make sure the actual sub-process is initializing ClearML so all the automagic kicks in, since this is not "forked" but a whole new process, calling Task.current_task is the equivalent of calling Task.init with the same arguments (which you can also do, I'm not sure which one is more straight forward, wdyt?)
Hi EnviousStarfish54
docker on windows , with nvidia runtime support is only with WSL (I think)
https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wip
https://medium.com/@dalgibbard/docker-with-gpu-support-in-wsl2-ebbc94251cf5
RoughTiger69 how did you end up with a Task with just "origin" in the repo field ?
Did you experiment any drop of performances using forkserver?
No, seems to be working properly for me.
If yes, did you test the variant suggested in the pytorch issue? If yes, did it solve the speed issue?
I haven't tested it, that said it seems like a generic optimization of the DataLoader
I want to store only my raw data in my blob storage, and I want to create a Hyperdataset with all the artificats, metrics, frames,
Yes that's exactly how it works.
None
This line adds a reference to raw file (local/remote)
[https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_wit...
EnviousStarfish54
and the 8 charts are actually identical
Are you plotting the same plot 8 times?
is there something else in the conf that i should change ?
I'm assuming the google credentials?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
seems like pip 20.1.1 has the issue, but >= 22.2.2 do not.
Notice we changed the value there, it now has two versions, pne for python 3.10 < and one for python 3.10>=
The main reason is that pip changed their resolving algorithm, and the new one can break its own dependencies (i.e. pip freeze > requirements.txt -> pip install might not actually work)
None
This line 🙂
None
Notice Triton (and so is clearml-serving) needs the pytorch model to be converted into torchscript, so that the triton backend can load it
Yes, offline got broken in 1.3.0 😞 , RC fixed it:pip install clearml==1.3.1rc0
Stable release later this week
How can I track in clearML that this and that row was part of experiment x because it belonged to test/training data set y?
Hi @<1543766544847212544:profile|SorePelican79>
the experiments themselves will have a link to the Dataset they were using. From a dataset perspective, the idea is not to limit you, so essentially it will package all your files, and retrieve them when you fetch the datset. In terms of specifying a row / sample. My suggestion is to mark those rows when training a...
Hi @<1523706645840924672:profile|VirtuousFish83>
could it be you have some permission issues ?
: Forbidden: updates to statefulset spec for fields other than 'replicas',
It might be that you will need to take it down and restart it. not while it is running.
(do make sure you backup your server 🙂 )
Hmm, you can delete the artifact with:task._delete_artifacts(artifact_names=['my_artifact']
However this will not delete the file itself.
Do delete the file I would do :remote_file = task.artifacts['delete_me'].url h = StorageHelper.get(remote_file) h.delete(remote_file) task._delete_artifacts(artifact_names=['delete_me']
Maybe we should have a proper interface for that? wdyt? what's the actual use case?
Is there a way I could move the JWT authentication (not authorization) logic into an API Gateway or Load Balancer?
Hmm in theory, but not in practice 😞
if ClearML is following OAuth 2.0, t
This is for the SSO part, not for the API, API is only using JWT for verification, the login process itself is with external SSO (OAuth 2.0). But the open-source version does not support SSO 😞
Why are you trying to add another ELB with JWT verification on it ? ...
So the way it works when you run a component the return value with the entire function execution is cached, basically:
this did NOT add the artifact to the pipeline via caching on subsequent runs ❌
you just need to do:
PipelineDecorator.upload_artifact(name='images', artifact_object=img_dir, wait_on_upload=True)
return Task.current_task().artifacts['images'].url
This will return the URL of the uploaded images (i.e. S3 bucket)
which means if this is cached you will get it...
Oh this is so internally, the background thread can signal it is not deferred, are you saying there is bug or the code is odd?
,
remote_execute
kills the thread so the multirun stops at the first sub-task.
Hmm
task = Task.init(...)
# config some stuff
task.remote_execute(queue_name_here, exit_process=False)
# this means that the local execution will stop but when running on the remote agent it will be skipped
if Task.running_locally():
return
the SDK is unable to see each of the nodes?
Exactly ! I mean I love the idea of "nested" component, but implementation wise this is not trivial, it will also hurt the ability of caching individual component. The workaround is to have all the "business logic" in the pipeline function itself, routing data between components is basically "free". The data does not actually go through the pipeline logic, it only passes reference (unless the pipeline logic actually tries to access the data o...
files_server:
://genuin-ai/
should be:
files_server:
Oh that is odd. Is this reproducible? @<1533620191232004096:profile|NuttyLobster9> what was the flow that required another task.init?
I can't see any reason it should not work 😀
It seems to fail when trying to download the modellocal_download = StorageManager.get_local_copy(uri, extract_archive=False) File "/opt/venv/lib/python3.7/site-packages/clearml/storage/manager.py", line 47, in get_local_copy cached_file = cache.get_local_copy(remote_url=remote_url, force_download=force_download) File "/opt/venv/lib/python3.7/site-packages/clearml/storage/cache.py", line 55, in get_local_copy if helper.base_url == "file://":
And based on the error I suspect the...
- In a notebook, create a method and decorate it by fastai.script’s
@call_parse
.Any chance you have a very simple code/notebook to reference (this will really help in fixing the issue)?