I see,
@HollowPeacock58, can you please send the full log?
(The odd thing is that it is trying to install the Python 3.10 build of torch, while your command line suggests it is running Python 3.8.)
Can you do the following:
Clone the Task whose installed packages you previously sent me, then enqueue the cloned Task to the queue served by the agent running with conda.
Then send me the full log of the Task that the agent ran.
Yes, I think you are correct; I verified it on Firefox and Chrome. I'll make sure to pass it along.
Thanks, SteadyFox10!
As I suspected, your log shows:
agent.package_manager.system_site_packages = false
This is exactly what causes the missing tensorflow: the agent creates a new venv inside the docker container, and unless this flag is on, the venv does not inherit the packages preinstalled in the docker image.
This flag should have been true.
Could it be that the clearml.conf you are providing for the glue includes this value?
(basically you should only have the sections that are either credentials or missing from the default, there...
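The same inheritance rule can be seen with Python's standard venv module. This is only an analogy: it assumes the agent's venv behaves like a stdlib venv and is not the agent's code. A venv created without system_site_packages records `include-system-site-packages = false` in its pyvenv.cfg, which hides anything preinstalled in the base image.

```python
import tempfile
import venv
from pathlib import Path


def venv_inherits_system_packages(system_site_packages: bool) -> bool:
    """Create a throwaway venv and report whether its pyvenv.cfg lets it
    see the base interpreter's site-packages (e.g. a docker image's tensorflow)."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        # with_pip=False keeps creation fast; only pyvenv.cfg matters here
        venv.EnvBuilder(system_site_packages=system_site_packages,
                        with_pip=False).create(env_dir)
        cfg = (env_dir / "pyvenv.cfg").read_text()
    for line in cfg.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


print(venv_inherits_system_packages(False))  # False: preinstalled packages hidden
print(venv_inherits_system_packages(True))   # True: preinstalled packages visible
```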
(you can find it in the pipeline component page)
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically, the docker service should already be logged in to ECR before you run the agent; the agent then uses docker to pull and run the image from ECR.
Makes sense?
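To illustrate why nothing needs to "know" the image is from ECR: the registry host is part of the image name itself, and docker uses whatever credentials `docker login` stored for that host. The sketch below mimics docker's rule for splitting off the registry (first path component containing "." or ":", or equal to "localhost") and checks it against the ECR host pattern for commercial AWS regions. The account ID and repository names are made up.

```python
import re

# ECR registry hosts look like <12-digit account>.dkr.ecr.<region>.amazonaws.com
# (commercial regions only; other AWS partitions use different suffixes).
ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com$")


def registry_of(image: str) -> str:
    """Return the registry host encoded in a docker image reference."""
    first, sep, _ = image.partition("/")
    # The first component is a registry only if it looks like a host name.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # no explicit host: Docker Hub


def is_ecr_image(image: str) -> bool:
    return bool(ECR_HOST.match(registry_of(image)))


print(is_ecr_image("123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest"))  # True
print(is_ecr_image("nvidia/cuda:11.8.0-runtime-ubuntu22.04"))  # False
```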
So the "packages" are the packages you need in the steps themselves?
Is "my_package" a local package?
What is the output of:
pip freeze | grep my_package
What exactly do you get automatically on the "Installed Packages" (meaning the "my_package" line)?
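One way to read that freeze output: recent pip versions print a package installed from a local path as `name @ file:///...`, and editable installs start with `-e`; neither can be installed by an agent on another machine. Below is a rough heuristic sketch of that check; the sample freeze output and path are made up.

```python
from typing import List


def find_package_lines(freeze_output: str, name: str) -> List[str]:
    """Return the pip-freeze lines mentioning `name` (like `grep`)."""
    # pip treats '-' and '_' as equivalent in names, so normalize both sides.
    needle = name.lower().replace("-", "_")
    return [line for line in freeze_output.splitlines()
            if needle in line.lower().replace("-", "_")]


def is_local_install(line: str) -> bool:
    """Heuristic: editable or file-URL installs are not reproducible from PyPI."""
    return line.startswith("-e ") or "@ file://" in line


sample = """numpy==1.24.4
my_package @ file:///home/user/src/my_package
requests==2.31.0"""

hits = find_package_lines(sample, "my_package")
print(hits, [is_local_install(h) for h in hits])
```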
I think this is the temporary requirements file it creates, not your requirements file. If you attach a log here with the "Installed Packages" section, maybe we can help debug it.
Can I also get GPU usage over a time frame via the API?
task.get_reported_scalars()
But this will get you all the scalars; I think the next version of the server supports asking for a specific one as well.
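Until then, the one series you need can be picked out client side. This sketch assumes the result is a nested dict keyed by graph title, then series name, each holding "x" and "y" lists (check this against your clearml version); the ":monitor:gpu" title and all values below are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed shape of task.get_reported_scalars():
# {title: {series: {"x": [...], "y": [...]}}}
Scalars = Dict[str, Dict[str, Dict[str, List[float]]]]


def find_series(scalars: Scalars,
                series_name: str) -> Optional[Tuple[str, List[float], List[float]]]:
    """Locate one series (e.g. 'gpu_0_utilization') under any graph title."""
    for title, series_map in scalars.items():
        if series_name in series_map:
            points = series_map[series_name]
            return title, points["x"], points["y"]
    return None  # series not reported by this task


# With a real task this would come from something like:
# scalars = Task.get_task(task_id="<task id>").get_reported_scalars()
sample: Scalars = {
    ":monitor:gpu": {
        "gpu_0_utilization": {"x": [0, 30, 60], "y": [12.0, 85.0, 91.0]},
        "gpu_0_mem_used_gb": {"x": [0, 30, 60], "y": [1.5, 7.8, 8.1]},
    },
    "loss": {"train": {"x": [0, 1, 2], "y": [0.9, 0.5, 0.3]}},
}
hit = find_series(sample, "gpu_0_utilization")
print(hit)
```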
How are you implementing the alert monitoring?
Is it a stateless process starting every X minutes, or is it a stateful process that keeps running and monitoring?
Hi DrabCockroach54
Do gpu_0_mem_usage and gpu_0_mem_used_gb both show current GPU usage?
The first is the percentage of memory used (at any specific moment) and the second is the memory used in GiB, both for the video memory.
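Because one is a percentage and the other an absolute amount, the two readings together give the card's total video memory: percent = used / total × 100, so total = used × 100 / percent. A small sketch, assuming both values were sampled at the same moment and the percentage is relative to the card's full memory:

```python
def total_gpu_memory_gib(mem_used_gb: float, mem_usage_percent: float) -> float:
    """Infer total video memory from gpu_0_mem_used_gb and gpu_0_mem_usage.

    mem_usage_percent = mem_used_gb / total * 100  =>  total = used * 100 / percent
    """
    if mem_usage_percent <= 0:
        raise ValueError("need a non-zero usage percentage to infer the total")
    return mem_used_gb * 100.0 / mem_usage_percent


print(total_gpu_memory_gib(8.0, 50.0))  # 16.0
```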
How can I tell from this how much GPU is reserved for the task while it is in progress?
What do you mean by how much is reserved? Are you running with an agent?
I am asking this because my NGINX server is giving Gateway Timeouts for delete calls sometimes.
Sync... it might make sense if you have a lot of load. It might also be that the server is busy with other requests.
Hi @CooperativeOtter46
Is there a way to set the name/path of the requirements.txt file the agent uses to install packages?
When the agent is installing packages, it takes them from the "Installed Packages" section of the Task. Only if that section is empty does it revert to "requirements.txt" from the git repository.
That said, you can add the following line to your "Installed Packages":
-r my_other_requirements.txt
And the agent will `my_...
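For reference, pip's own `-r` semantics simply splice the included file's lines in place. The sketch below flattens such includes; it is an illustration of pip-style `-r` handling, not the agent's code, and it assumes the included path is resolved relative to the repository root. The file contents are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def expand_requirements(lines: List[str], base_dir: Path) -> List[str]:
    """Flatten '-r other.txt' includes into a plain list of requirement lines."""
    result: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("-r ") or line.startswith("--requirement "):
            included = base_dir / line.split(maxsplit=1)[1]
            nested = included.read_text().splitlines()
            # nested includes resolve relative to the file that contains them
            result.extend(expand_requirements(nested, included.parent))
        else:
            result.append(line)
    return result


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)  # stands in for the repository root (assumption)
    (repo / "my_other_requirements.txt").write_text("einops\n# a comment line\ntorch==2.0.1\n")
    installed = ["numpy==1.24.4", "-r my_other_requirements.txt"]
    flat = expand_requirements(installed, repo)
print(flat)  # ['numpy==1.24.4', 'einops', 'torch==2.0.1']
```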
GiganticTurtle0
If there are several tasks running concurrently, which task should Task.current_task() return?
How could you have that?
Per process, there is one Main current Task (until you close it).
Are you referring to a pipeline with multiple steps?
If this is the case, Task.current_task() will return the Task of the component (if executed from the component) and the pipeline (if called from the pipeline logic function).
Notice we added the ability to s...
Failing when passing the diff to the git command...
Hi @JitteryCoyote63
I found a memory leak in Logger.report_matplotlib_figure
Are you sure this is the Logger's fault and not a Matplotlib leak? I'm trying to think how we could create such a memory leak.
wdyt?
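One way to separate the two suspects is to measure memory growth over repeated calls: once with a callable that only creates and closes a figure, and once with one that also reports it. A minimal sketch using the standard tracemalloc module; note tracemalloc only sees Python-level allocations, so a leak in native code could be missed. The leaky/clean callables below are toy stand-ins.

```python
import gc
import tracemalloc
from typing import Callable


def memory_growth(fn: Callable[[], None], repeats: int = 50) -> int:
    """Bytes still allocated after calling `fn` `repeats` times (after a GC)."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            fn()
        gc.collect()  # drop anything that is merely unreferenced
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_kept = []


def leaky() -> None:
    _kept.append(bytearray(10_000))  # reference kept alive: grows every call


def clean() -> None:
    bytearray(10_000)  # freed immediately


print(memory_growth(leaky) > memory_growth(clean))  # True
```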
Is gpu_0_utilization also in % then?
Correct
I was trying to find the min and max values for the above metrics.
Oh, that makes sense. Notice that you can get the values over time, so you can track the usage over the experiment's lifetime (you can of course also see it in the experiment's Scalars tab).
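Given the x/y lists of a monitored series over time (for example gpu_0_utilization, which is in percent), the min, max, and mean follow directly. A minimal sketch; the sample values are made up.

```python
from statistics import mean
from typing import Dict, List


def summarize(xs: List[float], ys: List[float]) -> Dict[str, float]:
    """Min / max / mean of a monitored metric, plus where the peak occurred."""
    if not ys or len(xs) != len(ys):
        raise ValueError("expected equally sized, non-empty x and y lists")
    peak_index = max(range(len(ys)), key=ys.__getitem__)
    return {
        "min": min(ys),
        "max": ys[peak_index],
        "mean": mean(ys),
        "peak_at": xs[peak_index],  # x value (e.g. time or iteration) of the peak
    }


stats = summarize([0, 30, 60], [12.0, 85.0, 91.0])
print(stats)
```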
It was installed by 'pip install kwcoco' while my conda env was active.
Well, I guess my question is: how does conda know where to install it from, if it is not on the public channels? Is there a specific conda channel you added (or preconfigured)?
clearml should detect the "main" packages used in the repository (not just in the main script). The derivatives (their dependencies) will be installed automatically by pip when the agent sets up the environment. Once the agent is done setting up the environment, it updates the Task with the full list of packages, including all required packages.
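Conceptually, that full list is the closure of the main packages under "depends on". A toy sketch with a made-up dependency graph (helper_lib is hypothetical); pip does the real resolution, including versions, which this ignores.

```python
from typing import Dict, List, Set


def resolve_closure(main: List[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Return the main packages plus everything they (transitively) depend on."""
    seen: Set[str] = set()
    stack = list(main)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(deps.get(pkg, []))  # derivatives pulled in by this package
    return seen


# Hypothetical graph: only "my_package" is imported directly by the code.
deps = {"my_package": ["helper_lib", "numpy"], "helper_lib": ["einops"]}
closure = resolve_closure(["my_package"], deps)
print(sorted(closure))  # ['einops', 'helper_lib', 'my_package', 'numpy']
```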
TrickySheep9
You are absolutely correct.
LudicrousParrot69, there is already Task.add_tags:
https://github.com/allegroai/clearml/blob/2d561bf4b3598b61525511a1a5f72a9dba74953e/clearml/task.py#L964
AdventurousButterfly15, this one is quite self-contained:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py
So I guess pip install finished working
But the task is evidently not being executed.
This is very odd... you can run the agent in debug mode with --debug --foreground to see all the outputs and logs.
You need to mount it to ~/clearml.conf (i.e. /root/clearml.conf)
Hi AbruptWorm50
I am currently using the repo cache,
What do you mean by "using the repo cache"? This is transparent: the agent manages it, and users should not need to access that folder.
I also looked at the log you sent; why do you think it is re-downloading the repo?
it fails when installing my_package using pip... so I have to manually edit the section and remove the "my_package" line
MagnificentSeaurchin79 did you manually add both "." and my_package ?
If so, what was the reasoning to add my_package if pip cannot install it ?
After removing the task.connect lines, it encountered another error: 'einops' is not recognized. It does exist in my environment file but was not installed by the agent (according to what I see in 'Summary - installed python packages'). Should I add this manually?
Yes, I'm assuming this is a derivative package that is needed by one of your packages?
from clearml import Task
# add_requirements() must be called before Task.init() to be recorded
Task.add_requirements("einops")
task = Task.init(...)
DeliciousSeal67, the agent uses the "Installed Packages" section to install packages for the code. If you clear the entire section (you can do that in the UI or programmatically), it will revert to requirements.txt.
Makes sense?
Hmm I see your point.
Any chance you can open a github issue with a small code snippet to make sure we can reproduce and fix it?