Okay let's see if I can reproduce it:
new conda env py==3.8
install clearml==0.17.5rc5 matplotlib==3.3.4 numpy==1.20.1 seaborn==0.11.1
Clone the repo, run `python examples/frameworks/matplotlib/matplotlib_example.py`
Right ?
In theory, one could go over previously executed tasks, and create a copy of a specific scalar metric.
ShallowCat10 does that make sense in your scenario ?
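Something along these lines should work (untested sketch; the task id and the "loss"/"train" metric names are placeholders):
```
from clearml import Task

# fetch a previously executed task (placeholder id)
source = Task.get_task(task_id="<source_task_id>")
# returns {title: {series: {"x": [iterations], "y": [values]}}}
scalars = source.get_reported_scalars()

# re-report one scalar series on a new task
target = Task.init(project_name="examples", task_name="copied scalar")
series = scalars["loss"]["train"]
for iteration, value in zip(series["x"], series["y"]):
    target.get_logger().report_scalar(
        title="loss", series="train", value=value, iteration=int(iteration)
    )
```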
It should take you directly to the queue page.
Let me double check (working on the community server)
Hmm what do you mean? Isn't it under installed packages?
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
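For example (sketch; the queue name and task id are placeholders): you create a queue per resource, point the agent(s) running on that machine at it, and enqueue cloned tasks onto it:
```
from clearml import Task

# clone an existing task and push it onto a queue that maps to the resource
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="packed job")
Task.enqueue(cloned, queue_name="2xgpu_machine")
```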
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
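For the event/trigger part, something like the TriggerScheduler could do it (sketch; ids and project names are placeholders):
```
from clearml.automation import TriggerScheduler

# fire a pre-created pipeline/task whenever a new dataset version lands
scheduler = TriggerScheduler(pooling_frequency_minutes=5)
scheduler.add_dataset_trigger(
    schedule_task_id="<pipeline_task_id>",  # task to clone & enqueue on trigger
    schedule_queue="services",
    trigger_project="datasets/my_lake",     # watch this dataset project
)
scheduler.start()
```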
TenseOstrich47 this looks like elasticsearch is out of space...
(BTW: draft means they are in edit mode, i.e. before execution; then they should be queued (i.e. pending), then running, then completed)
Yes, this seems like the problem, you do not have an agent (trains-agent) connected to your server.
The agent is responsible for pulling the experiments and executing them:
pip install trains-agent
trains-agent init
trains-agent daemon --gpus all
Hi ElegantCoyote26
sometimes the agents load an earlier version of one of my libraries.
I'm assuming some internal package that is installed from a wheel file, not a direct git repo+commit link ?
Yes it does. I'm assuming each job is launched using a multiprocessing.Pool (which translates into a subprocess). Let me see if I can reproduce this behavior.
Let me try to add some color to this package analysis process.
Basically clearml will try to statically analyze the code (i.e. look for import/from packages)
Then it will list them in a pip requirements.txt format under installed packages.
When running inside a conda environment, it will check which packages were installed via "conda install" (instead of pip install) and mark them internally. This process ensures that when the clearml-agent is running with the conda package manager, it "knows" whic...
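BTW, if the static analysis misses a package (or picks up the wrong version), you can pin it explicitly before Task.init (sketch; package name/version are placeholders):
```
from clearml import Task

# must be called before Task.init
Task.add_requirements("my_internal_package", "1.2.3")
task = Task.init(project_name="examples", task_name="pinned requirements")
```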
Can you share the log?
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Actually I am as well, this is Kubernetes doing the resource scheduling, and Kubernetes actually decided it is okay to run two pods on the same GPU, which is cool, but I was not aware Nvidia had already added this feature (I know it was in beta for a long time)
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and Memory Protection:
Notice you can control ...
hmm this might help:
https://pip.pypa.io/en/stable/topics/configuration/#environment-variables
basically you might be able to define:
PIP_NO_USE_PEP517=1
(ignoring still having to fix the problem with LazyEvalWrapper return values).
fix will be pushed post weekend 🙂
such as displaying the step execution DAG in the PLOTS tab.
Wait, what are you getting on the DAG plot ? I think we "should" be able to see all the steps
yes
an argument saying "always create from code" can be helpful
@<1523701523954012160:profile|ShallowCormorant89> any chance you can open a github issue on that, just so we do not forget ?
if we could edit the configuration objects of a pipeline, that would be beneficial too, which we're currently unable to do from the UI
Actually you already can: after you clone the pipeline, press on Details, then go to the Configuration tab and edit the pipeline object. The format is HOCON (...
Regarding the missing packages, you might want to test with:
force_analyze_entire_repo: false
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L162
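i.e. something like this in your trains.conf (sketch; see the linked file for the exact section and nesting):
```
sdk {
    development {
        # analyze only the entry-point script's imports instead of the entire repo
        force_analyze_entire_repo: false
    }
}
```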
Or if you have a full venv you'd like to store instead:
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L169
BTW:
What is the missed package?
Still, my problem is that calling `pipe.start()` crashes.
is supposed to kill the process
2022-08-19 09:17:56,626 - clearml - WARNING - Terminating local execution process
This is what it writes before killing the local process.
` /opt/homebrew/anaconda3/envs/py39/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 16 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be ...
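For reference, a minimal pipeline of this shape reproduces the call path (sketch; project/step names are placeholders):
```
from clearml import PipelineController

def step_one():
    return 42

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
pipe.add_function_step(name="step_one", function=step_one)
# this is the call that crashes
pipe.start(queue="default")
```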
Hi @<1523701523954012160:profile|ShallowCormorant89>
This is generally based on number of agents, or am I missing something ? Also is it based on Task or decorated functions ?
Where do you store those ?
@<1523701523954012160:profile|ShallowCormorant89> can you verify it is reproducible in 1.9.3 ? because if it is I'd like to fix that 🙂
will it be possible for us to configure the "new run" button so that it always clones from a particular pipeline ?
What do you mean by "particular pipeline" ? by default it will clone the last successful one, and by right clicking a specific one you can run a copy of that one. what am I missing ?
Hi @<1729309120315527168:profile|ShallowLion60>
How did you create those credentials ?
I have to admit, I haven't had the time 😞
Trying to get pip to be twice as fast 🤞
https://github.com/pypa/pip/pull/8215
Please keep pinging me, I would really like to follow up on it.
Okay this seems correct...
Can you share both yaml files (server & serving) and env file?
Hmm UnevenDolphin73 I just checked with v.1.1.6, the first time the configuration file is loaded is when calling Task.init (if not running with an agent, which is your case).
But the main point I just realized I missed 🤯
"http://"${CLEARML_ENDPOINT}":8080"
The code does not try to resolve OS environment variables there!
Which, well, is a nice feature to add
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L576
should p...
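e.g. a minimal sketch of what that resolution could look like (hypothetical, not what the code does today):
```
import os

# expand ${VAR} style references in a value read from the conf file
raw = "http://${CLEARML_ENDPOINT}:8080"
resolved = os.path.expandvars(raw)
print(resolved)  # http://my-server:8080 when CLEARML_ENDPOINT=my-server
```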
Hi SmallDeer34
Is the Dataset in clearml-data ? If it is, then Dataset.get().get_local_copy() will get you a cached local copy of the entire dataset.
If it is not, then you can use StorageManager.get_local_copy(url_here) to download the dataset.
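Both in a nutshell (sketch; project/name/URL are placeholders):
```
from clearml import Dataset, StorageManager

# dataset managed by clearml-data: cached local copy of the entire dataset
local_dir = Dataset.get(
    dataset_project="my_project", dataset_name="my_dataset"
).get_local_copy()

# plain file on remote storage: download (and cache) a single URL
local_file = StorageManager.get_local_copy("s3://my-bucket/data/train.zip")
```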
- Any argparse arguments are automatically logged (and can later be overridden from the UI), as in the sketch below. Specifically HfArgumentParser will be automatically logged https://github.com/huggingface/transformers/blob/e43e11260ff3c0a1b3cb0f4f39782d71a51c0191/examples/pytorc...
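e.g. (sketch; the specific argument is just an illustration):
```
import argparse
from clearml import Task

# once Task.init is called, parsed arguments are captured automatically
task = Task.init(project_name="examples", task_name="argparse logging")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=1e-3)
args = parser.parse_args()  # values can later be overridden from the UI
```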
Yeah I think that for some reason the merge of the pbtxt raw file is not working.
Any chance you have an end-to-end example we could debug? (maybe just add a pbtxt to one of the examples?)