
Will this still be considered as global site-packages?
This is a pip setting; I "think" it inherits from the local user's installation, but I would actually install with "sudo pip", which will definitely be "inherited".
Hi PompousParrot44
Let's stick with a single question per thread, it will make my life a lot easier 🙂
What do you mean by "and not in the terminal directly when executed manually through script"?
trains-agent is usually executed as a daemon, pulling jobs and executing them.
The other option is to use it to manually execute a single task.
What am I missing?
RoundMosquito25 good news, no need to open any ports 🙂
Basically the agents are always polling the server for "jobs", creating http/s requests from them to the server, so all connections are outgoing. The firewall stays intact 🙂
I looked at your task log on the GitHub issue. It seems the main issue is that your notebook is not stored as Python code. Are you running it in Jupyter Notebook, or in IPython? Is this reproducible? If so, what are the Jupyter, Python, and OS versions?
Could it be pandas was not installed on the local machine ?
- Maybe we should add an option, archive components as well ...
Run clearml-agent and enqueue the pipeline? What am I missing?
I... did not, ashamed to admit.
UnevenDolphin73 🙂 I actually think you are correct, meaning I "think" what you are asking is whether low-level logging (for example debug messages that usually are not printed to the console) should also be logged? Is that correct?
Suppose a new model version 2 is trained but does not fulfill our target metrics. Is it possible to just save the model to the model repo and not serve it, if model version 1 is already being served?
Sure, just do not "publish" the model. It will be stored in the model repository, fully accessible, but clearml-serving will not serve it 🙂
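Something along these lines should work (just a rough sketch using the OutputModel API; the project/metric names, the weights file, and the 0.9 threshold are made up):
```
from clearml import Task, OutputModel

task = Task.init(project_name="demo", task_name="train model v2")

# ... training happens here ...
accuracy = 0.81  # hypothetical metric produced by the training loop

# register the weights in the model repository either way
output_model = OutputModel(task=task, name="my-model")
output_model.update_weights(weights_filename="model_v2.pt")  # assumed path to the trained weights

# only publish when the target metric is met; an unpublished model stays
# in the repository but is not picked up for serving
if accuracy >= 0.9:
    output_model.publish()
```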
The API server by default spins up multiple processes (they might all be busy at the time with a huge flood of requests, but this is still multi-process). Let me check if there is an easy way to set more processes.
Go to https://demoapp.trains.allegro.ai/profile
You should see something like 0.16.2-123
You can definitely configure the watchdog to set the timeout to 15 min. It should not have any effect on running processes, they basically send an alive ping every 30 sec.
This seems to be the issue: PYTHONPATH = '.'
How is that happening ?
Can you try to run the agent with: PYTHONPATH= clearml-agent daemon ....
(Notice the PYTHONPATH= prefix; it clears the environment variable that is obviously breaking the python commands)
Great, if this is what you do, how come you need to change the entry script in the UI?
Hi DepressedChimpanzee34
I think main issue here is slow response time from the API server, I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM do you have on the machine?
Hi @<1730396272990359552:profile|CluelessMouse37>
However, the caching doesn't seem to be working correctly. Despite not changing the configuration, the first step runs every time.
How are you creating the cached component?
is this a standalone script or a git repo link?
These parameters are dictionaries of specific configurations (dict of dict) that are the same but might not be taken into account properly by the caching mechanism.
hmm for the component to be cached (or reuse...
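A rough sketch of a cached component taking a dict-of-dict argument (assuming the PipelineDecorator interface; all names and values are made up):
```
from clearml import PipelineDecorator

# cache=True means the step is skipped (and previous outputs reused) only when
# the code AND all input arguments hash to the same value. A dict-of-dict
# argument is part of that hash, so any difference in it re-runs the step.
@PipelineDecorator.component(cache=True, return_values=["dataset_path"])
def prepare_data(config: dict):
    print("preparing data with", config)
    return "/tmp/prepared_data"

@PipelineDecorator.pipeline(name="demo pipeline", project="demo", version="1.0")
def run_pipeline():
    config = {"loader": {"batch_size": 32}, "split": {"test": 0.2}}
    prepare_data(config=config)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    run_pipeline()
```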
SmarmyDolphin68 okay, what's happening is the process exits before the actual data is sent (report_matplotlib_figure is an async call, and the data is sent in the background)
Basically you should just wait for all the events to be flushed: task.flush(wait_for_uploads=True)
That said, quickly testing it, it seems it does not wait properly (again I think this is due to the fact we do not have a main Task here, I'll continue debugging)
In the meantime you can just do sleep(3.0)
And it wil...
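Something like this (rough sketch; the project and plot names are made up):
```
import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="demo", task_name="matplotlib report")

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])

# report_matplotlib_figure is asynchronous, the data is sent in the background
task.get_logger().report_matplotlib_figure(
    title="my plot", series="series A", iteration=0, figure=fig
)

# block until all pending events/uploads are actually sent
task.flush(wait_for_uploads=True)
```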
Yes, only task.execute_remotely()
should be the last call, because it will literally stop the local run before you add the Args section
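Rough sketch of the ordering (the queue name and the argument are made up):
```
from argparse import ArgumentParser
from clearml import Task

task = Task.init(project_name="demo", task_name="remote run")

parser = ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()  # captured into the task's Args section

# last call before the actual work: it stops the local run and enqueues the
# task, so the Args section above is already populated for the remote run
task.execute_remotely(queue_name="default")

print(f"training for {args.epochs} epochs")  # placeholder for the real training code
```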
MoodyCentipede68 seems you did not pass any configuration (OS env or conf file), so it does not know how to find the server and authenticate. Make sense?
LazyTurkey38 configuration pushed to github :)
Thanks CharmingShrimp37 !
Could you PR the fix ?
It will be just in time for the 0.16 release 🙂
Then the dynamic GPU allocation is exactly what you need, I suggest talking to the sales people, I'm sure they can help. https://clear.ml/contact-us/
and it's in the "installed packages" from the child task:
This is because the agent always updates back the full venv setup, so you will be able to always reproduce the entire thing (as opposed to dev time, where it lists only the directly imported packages)
I think I found something, let me test my theory
after generating a fresh set of keys
when you have a new set, copy-paste them directly into the 'clearml.conf' (should be at the top, can't miss it)
with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent.
This is odd, could you send the full Task log?
Hi NastyFox63 yes I think the problem was found (actually backend side).
It will be solved in the upcoming release (due after this weekend 🙂)