ThickDove42 looking at the code, I suspect it fails interacting with the actual jupyter server (that is running on the same machine, but still).
Any chance you have a firewall on the Windows machine?
But this gives me an idea: I will check whether the notebook is considered trusted; perhaps it isn't, and that is causing the issue?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
Oh I see the pipeline controller itself (not the components) is the one with the repo
To fix that, add the following at the top of the script:
`from clearml import Task
Task.force_store_standalone_script()

@PipelineDecorator.pipeline(...) `
That should do the trick.
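(As I understand it, force_store_standalone_script() makes the controller Task store the script itself as a standalone snippet instead of referencing the git repository, which is why it sidesteps the repo detected on the pipeline controller.)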
Hi SkinnyPanda43
Do you mean the clearml-agent or the clearml python package (a.k.a. the auto package detection)?
Need: in my CI, the URL used is https but I need the SSH URL to be used. I see that we can pass repo to Task.create but not to Task.init.
Are you cloning an existing Task, or creating a new one?
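If it is a new one, a minimal sketch of passing the SSH form explicitly through Task.create (all argument values below are placeholders):
`from clearml import Task

# Task.create accepts an explicit repository URL (Task.init does not),
# so the SSH form can be passed directly instead of the https one
task = Task.create(
    project_name="examples",  # placeholder project
    task_name="ci task",  # placeholder name
    repo="git@github.com:user/project.git",  # hypothetical SSH URL
    branch="main",
    script="train.py",
)`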
The agent works when I am running it from a virtual environment, but it gets stuck in the same place every time when I use Docker.
Can you please provide a log? I'm not sure what "stuck" means here.
So I had to add it explicitly via a docker init script
Oh yes, that makes sense, can't think of a better hack other than sys.path.append(os.path.join(os.path.dirname(__file__), "src"))
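Spelled out, that hack sits at the top of the entry script (the "src" folder name comes from the snippet above; my_module is a hypothetical module living there):
`import os
import sys

# Make modules under ./src importable when the script runs directly
sys.path.append(os.path.join(os.path.dirname(__file__), "src"))

import my_module  # hypothetical module under ./src`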
But this is not a copy, this is a mount; your log showed cp failing.
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but in this case the result is not useful.
I think you are correct, the TB histograms are actually 3d histograms (i.e. 2d histograms over time, which would be the default for kernel/bias etc.)
Is there a way to ungroup the result by iteration, and is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side)?
Can you provide a toy example...
Will this still be considered as global site-packages?
This is a pip setting; I "think" it inherits from the local user's installation, but I would actually install with "sudo pip", which will definitely be "inherited".
Hi RoundMosquito25
The main problem here is that there is no way to know, before running the Task, how much memory it will need... and without that parameter, maximizing GPU utilization is quite challenging. wdyt?
SkinnyPanda43 issue verified, this seems to be related to python 3.9 and subprocesses.
Let me check what we can do
ColossalAnt7 I would do the following:
- Configure trains-server user/pass, mounting the API server configuration file as described in the trains-server documentation (intermediate temporary step).
- Start by providing the ML guys with VPN access that lets them reach the trains-server api/web/file ports directly (caveat: the IP/sub-domain needs to be resolved).
- Configure a ConfigMap to do the routing/ingress (this solves the IP/sub-domain issue) and allow the VPN to access the single entrypoint...
This is odd, how are you spinning clearml-serving?
You can also do it synchronously:
`predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)`
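For context, a hedged sketch of where such calls might live, inside a custom clearml-serving Preprocess class (class/method names follow the clearml-serving custom-endpoint examples; treat the exact signature as an assumption):
`from typing import Any


class Preprocess(object):
    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # send_request is attached to the instance by the clearml-serving
        # inference container at runtime (assumption based on the snippet above)
        predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
        predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
        # combine the two synchronous predictions into one response
        return {"a": predict_a, "b": predict_b}`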
I did not start it with python -m (as a module). I'll try that.
I do not think this is the issue.
It sounds like anything you do on your specific setup will end with the same error, which might point to a problem with the git repo/folder?
Is this example working for you?
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_config.py
Hmm, good point, it should probably return the clearml python version. Is this what you mean?
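i.e. something along the lines of this quick check (clearml exposes its version string on the package):
`import clearml

# Print the installed clearml python package version
print(clearml.__version__)`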
Wait, @<1686547375457308672:profile|VastLobster56>, per your config: clearml-fileserver
Who sets this domain name? Could it be that it only resolves on your host machine? You can quickly test by running any docker on your machine and running ping clearml-fileserver from inside the docker itself.
Also, your log showed "could not download None ...", I would not expect it to be None, no?
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: task.set_initial_iteration(0)
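A minimal sketch of that call in context (project/task names are placeholders):
`from clearml import Task

task = Task.init(project_name="examples", task_name="continued run")
# Reset the reported iteration counter so reporting starts from 0
# instead of continuing from the last reported iteration
task.set_initial_iteration(0)`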
Hi FranticCormorant35, the Reporter is an internal implementation detail that the Logger uses. In general you should use the Logger.
Hi ColossalAnt7
Following on SuccessfulKoala55's answer:
I saw that there is a config file where you can specify specific users and passwords, but it currently requires mounting the configuration file (the one holding the user/pass) into the pod from a persistent volume.
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
Do I need to have the repo that I am running on my account?
If it is a public repo, then no need, credentials are only needed for private repos 🙂
Am I missing something?
Notice this is only when:
- Using Conda as the package manager in the agent
- The requested python version is already installed (multiple python versions installed on the same machine/container are supported)
Hmm, I think you have a point here, the confusing part is the cp cmd. Can you send the full log? (Regardless, can I assume you are running a rootless container?)
Weird issue, I'll make sure we fix compatibility with python 3.9
Hi PanickyMoth78
I had several pipeline components getting it and uploading files to it concurrently.
Should not be a problem
I've attached its log file, which only mentions skipping one file (a warning)
So what exactly is the error you are getting?
You can see the class here:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/clearml/binding/frameworks/__init__.py#L52
Basically you do:
`from clearml.binding.frameworks import WeightsFileHandler

def my_callback(load_or_save, model):
    # type: (str, WeightsFileHandler.ModelInfo) -> WeightsFileHandler.ModelInfo
    assert load_or_save in ('load', 'save')
    # do something with the model info here
    skip = False  # placeholder: set True to drop this load/save event
    if skip:
        return None  # returning None skips this model event
    return model

WeightsFileHandler.add_pre_callback(my_callback) `
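Note that the callback is registered at the class level, so every framework load/save that clearml intercepts passes through it: returning the (possibly modified) ModelInfo lets the event proceed, while returning None drops it.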