
PompousBeetle71 oh no 😞
okay this is a bit drastic, but let's see if it helps.
In your trains.conf, add the following section:
loggers { loggers { trains { level: ERROR } } }
Hi PompousBeetle71
Could you test the latest RC, I think the warnings were fixed:
pip install trains==0.16.2rc0
Let me know...
or
pip install -U trains
Oh that is odd. Is this reproducible? @<1533620191232004096:profile|NuttyLobster9> what was the flow that required another Task.init call?
Hi @<1547028031053238272:profile|MassiveGoldfish6>
Is there a way for ClearML to simply save the model once training is done and to ignore the model checkpoints?
Yes, you can simply disable the auto-logging of the model and manually save the checkpoint:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
...
task.update_output_model("/my/model.pt", ...)
Or for example, just "white-label" the final model
task = Task.init(..., auto_connect_frameworks={'pyt...
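For reference, a minimal end-to-end sketch of the first approach (the project/task names and the model path are placeholders):

from clearml import Task

# disable automatic PyTorch checkpoint logging
task = Task.init(
    project_name="examples",          # placeholder
    task_name="train final model",    # placeholder
    auto_connect_frameworks={'pytorch': False},
)

# ... training loop, checkpoints are saved locally but not auto-logged ...

# manually register only the final model
task.update_output_model(model_path="/my/model.pt", name="final model")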
Thanks DilapidatedDucks58! We ❤ suggestions for improvements 🙂
Did you try to print the page using the browser? (I think they can all save a page as PDF these days.) Yes, I agree, it would 🙂 We have some thoughts on creating plugins for the system, and I think this could be a good use-case. Wait a week or two ;)
Hi @<1726410010763726848:profile|DistinctToad76>
Why not just report scalars? You can use the x-axis as "iterations" if this is running in real time to collect the prompts.
If this is a summary, then just report a scatter plot (you can also specify the names of the axes and the series)
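A minimal sketch of both options (project/task names and values are placeholders):

from clearml import Task
import numpy as np

task = Task.init(project_name="examples", task_name="prompt metrics")  # placeholder names
logger = task.get_logger()

# real time: one scalar per prompt, using the prompt index as the "iteration"
logger.report_scalar(title="prompt score", series="score", value=0.87, iteration=42)

# summary: a 2D scatter plot with named axes and series
scatter = np.random.rand(10, 2)
logger.report_scatter2d(
    title="prompt scores",
    series="run 1",
    scatter=scatter,
    iteration=0,
    xaxis="prompt #",
    yaxis="score",
)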
If you have idea on where to start looking for a quick win, I'm open to suggestions 🙂
I can't seem to find a difference between the two, why would matplotlib get listed and pandas not... Any other package that is missing?
BTW: as an immediate "hack", before your Task.init call add the following:
Task.add_requirements("pandas")
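To make the ordering explicit, a minimal sketch (project/task names are placeholders):

from clearml import Task

# must run before Task.init so "pandas" is added to the Task's detected requirements
Task.add_requirements("pandas")

task = Task.init(project_name="examples", task_name="missing requirement")  # placeholder names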
https://github.com/allegroai/clearml/issues/199
Seems already supported for a while now ...
Okay, this is odd, the request returned exactly 100 out of 100.
It seems not all of them were reported?!
Could you post the toy code, I'll check what's going on.
Sounds great! let me know what you find out 🙂
Will this still be considered as
global site-packages
This is a pip setting, I "think" it inherits from the local user's installation, but I would actually install with "sudo pip", which will definitely be "inherited"
Hi PompousParrot44
Let's stick with a single question per thread, it will make my life a lot easier 🙂
What do you mean by "and not in the terminal directly when executed manually through script"?
trains-agent is (usually) executed as a daemon, pulling jobs and executing them.
The other option is to use it to manually execute a single task.
What am I missing?
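To illustrate the two modes (the queue name and task id are placeholders):

# run as a daemon, continuously pulling jobs from a queue
trains-agent daemon --queue default

# or manually execute a single task and exit
trains-agent execute --id <task_id>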
RoundMosquito25 good news, no need to open any ports 🙂
Basically the B_i agents are always polling the server for "jobs", i.e. they create an http/s request from the agent to the server, so all connections are outgoing connections. Firewall is intact 🙂
I looked at your task log on the github issue. It seems the main issue is that your notebook is Not stored as python code. Are you running it in Jupyter Notebook, or is it ipython that you are running it in? Is this reproducible? If so, what are the jupyter, python and OS versions?
Could it be pandas was not installed on the local machine?
- Maybe we should add an option to archive components as well ...
Run clearml-agent and enqueue the pipeline? What am I missing?
Suppose that a new model version 2 is trained, but it does not fulfill our target metrics, is it possible to just save the model to model repo and not serve it, if a model version 1 is already being served?
Sure, just do not "publish" the model, it will be stored in the model repository, fully accessible, but clearml-serving will not serve it 🙂
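A minimal sketch of registering a model without publishing it (names and paths are placeholders):

from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="train v2")  # placeholder names
model = OutputModel(task=task)
model.update_weights(weights_filename="model_v2.pt")  # stored in the model repository

# only publish once the model meets the target metrics;
# an unpublished model stays fully accessible but is not served
# model.publish()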
The api server by default spins up multiple processes (they might all be busy at a given time with a huge flood of requests, but this is still multi-process). Let me check if there is an easy way to set more processes
Go to https://demoapp.trains.allegro.ai/profile
You should see something like 0.16.2-123
You can definitely configure the watchdog to set the timeout to 15min, it should not have any effect on running processes, they basically send an alive ping every 30 sec
This seems to be the issue:
PYTHONPATH = '.'
How is that happening?
Can you try to run the agent with:
PYTHONPATH= clearml-agent daemon ....
(Notice the PYTHONPATH= prefix clears the environment variable that is obviously making the python commands fail)
Great, but if this is what you do, how come you need to change the entry script in the UI?
Hi DepressedChimpanzee34
I think the main issue here is the slow response time from the API server, I "think" you can increase the number of API server processes, but considering the 16GB, I'm not sure you have the headroom.
At peak usage, how much free RAM do you have on the machine?
Hi @<1730396272990359552:profile|CluelessMouse37>
However, the caching doesn't seem to be working correctly. Despite not changing the configuration, the first step runs every time.
How are you creating the cached component?
is this a standalone script or a git repo link?
These parameters are dictionaries of specific configurations (dict of dict) that are the same but might not be taken into account properly by the caching mechanism.
hmm for the component to be cached (or reuse...
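For reference, a minimal sketch of a cached component using the decorator API (pipeline/project names are placeholders); caching only reuses a previous run when the component's inputs are unchanged:

from clearml import PipelineDecorator

@PipelineDecorator.component(cache=True)  # reuse a previous run if the inputs hash the same
def preprocess(config: dict):
    # the dict (of dicts) is part of the component's input hash
    return len(config)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="1.0")  # placeholders
def pipeline(config: dict):
    return preprocess(config)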
SmarmyDolphin68 okay, what's happening is that the process exits before the actual data is sent (report_matplotlib_figure is an async call, and the data is sent in the background)
Basically you should just wait for all the events to be flushed:
task.flush(wait_for_uploads=True)
That said, quickly testing it, it seems it does not wait properly (again I think this is due to the fact we do not have a main Task here, I'll continue debugging)
In the meantime you can just do:
sleep(3.0)
And it wil...
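A minimal sketch of the flush pattern (project/task names are placeholders):

from clearml import Task
import matplotlib.pyplot as plt

task = Task.init(project_name="examples", task_name="matplotlib flush")  # placeholder names

fig = plt.figure()
plt.plot([1, 2, 3])
task.get_logger().report_matplotlib_figure(title="demo", series="curve", figure=fig, iteration=0)

# make sure all background events/uploads are sent before the process exits
task.flush(wait_for_uploads=True)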