I'm trying to achieve a workflow similar to the one
You mean running everything on a single machine (manually)?
Hi BoredGoat1
From this warning: "TRAINS Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring"
It seems trains failed to load the NVIDIA shared library (.so) that does the GPU monitoring:
This is based on pynvml, and I think it is trying to access "libnvidia-ml.so.1"
Basically, if you can run nvidia-smi from inside the container, it should work.
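A quick way to check this from Python, as a minimal sketch: it only asks the dynamic linker to load the library named above and does not reproduce how trains or pynvml actually initialise. The helper name `can_load_nvml` is made up for illustration.

```python
import ctypes


def can_load_nvml(lib_name="libnvidia-ml.so.1"):
    """Return True if the dynamic linker can load the given shared library."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    print("NVML loadable:", can_load_nvml())
```

If this prints False inside the container, the NVIDIA driver libraries are most likely not exposed to it, and nvidia-smi would be expected to fail as well.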
Hi TrickyRaccoon92
Are you sure plotly (the front-end module displaying the plots in the UI) supports it ?
Interesting... TrickyRaccoon92 could it be the validation phase was creating a new Tensorboard file ?
sorry typo client.task.
should be client.tasks.
AstonishingRabbit13 so is it working now ?
(Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac
Where is the code running (agent) GCP instance ? your machine ?
There are also "completed, aborted, queued" .
Archived is actually a tag (a system tag, not a user tag). There is a "state machine" for moving from one state to another. The special case is "published", which we probably should have called "locked". The idea is that if a Task/Model is published, you cannot reset it (and even deleting it requires the force flag).
I would use additional user tags (or even system-tags) to mark "deployed" state, wdyt?
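Something along these lines could do the tagging, as a rough sketch. The helper `mark_deployed` and the tag name "deployed" are illustrative assumptions; it only relies on the object exposing `add_tags()`, which the clearml `Task` does.

```python
def mark_deployed(task, tag="deployed"):
    """Attach a user tag to any task-like object that exposes add_tags()."""
    task.add_tags([tag])
    return tag


# Usage with ClearML (assumes clearml is installed and configured;
# the task ID below is a placeholder):
# from clearml import Task
# task = Task.get_task(task_id="<your-task-id>")
# mark_deployed(task)
```

Filtering on that tag in the UI would then give you the "deployed" view without touching the built-in states.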
Thanks @<1523703472304689152:profile|UpsetTurkey67>
I'm pretty sure it has!
Let me check how we can merge it into the clearml-agent, sounds good?
HurtWoodpecker30
The agent uses the requirements.txt
What do you mean by that? Aren't the packages listed in the "Installed packages" section of the Task?
(or is it empty when starting, i.e. it uses the requirements.txt from the github, and then the agent lists them back into the Task)
Hi StormyOx60
Yes, by default it assumes any "file://" or local files are accessible (which makes sense, because if they are not, it will not be able to download them).
Is there some way to force it to download the dataset to a specified location that is actually on my local machine?
You can specify that a specific folder is not "local"; in that case it will copy the zip locally and unzip it.
Is this what you are after ?
Hi @<1523701304709353472:profile|OddShrimp85>
the venv setup is totally based on my requirements.txt instead of adding on to what the image has before. Why?
Are you using the agent in docker mode ? if this is the case it creates a venv inside the docker, inheriting from the preinstalled docker system packages,
Then, as you suggested, I would just use sys.path; it is probably the easiest and actually very safe (because the subfolders are always next to the "main" source code).
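A minimal sketch of the sys.path approach, assuming the subfolder sits next to the main script. The helper name `add_sibling_to_path` and the folder name in the usage comment are hypothetical.

```python
import sys
from pathlib import Path


def add_sibling_to_path(subfolder, base_file):
    """Prepend <directory of base_file>/<subfolder> to sys.path once."""
    target = str(Path(base_file).resolve().parent / subfolder)
    if target not in sys.path:
        sys.path.insert(0, target)
    return target


# In the main script (folder name is a placeholder):
# add_sibling_to_path("my_subfolder", __file__)
```

Resolving relative to `__file__` rather than the current working directory keeps it working regardless of where the script is launched from.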
Hi DepressedChimpanzee34
Why do you need to have the configuration added manually? Isn't the clearml.conf easier? If not, I think OS environment variables are easier, no? I ran the above code and everything worked with no exception/warning... What does the try/except solve exactly?
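For the OS environment route, a minimal sketch could look like this. The helper `configure_clearml_env` is made up for illustration and the values in the usage comment are placeholders; the variable names are the ones the clearml SDK reads for the API server and credentials.

```python
import os


def configure_clearml_env(api_host, access_key, secret_key):
    """Export ClearML connection settings as environment variables."""
    os.environ["CLEARML_API_HOST"] = api_host
    os.environ["CLEARML_API_ACCESS_KEY"] = access_key
    os.environ["CLEARML_API_SECRET_KEY"] = secret_key


# Placeholder values; setting them before importing clearml is the safest order:
# configure_clearml_env("https://api.example.com", "<access-key>", "<secret-key>")
# from clearml import Task
```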
Any chance you can zip the entire folder? I can't figure out what's missing, specifically "from config_files" , i.e. I have no packages nor file named config_files
Noooooooooo, it is still working
@<1523701868901961728:profile|ReassuredTiger98> what are you getting with:
nvidia-smi
And here:
ls -la /usr/local/
ShallowGoldfish8 I believe it was solved in 1.9.0, can you verify?
pip install clearml==1.9.0
SubstantialElk6 feel free to tweet them on their very inaccurate comparison table
TrickySheep9
you are absolutely correct
Thanks CynicalBee90 I appreciate the discussion! since I'm assuming you will actually amend the misrepresentation in your table, let me followup here.
1.
SPSS license may be a significant consideration for some, and so we thought it was important to point this out clearly.
SPSS is fully open-source compliant unless you have the intention of selling it as a service. I hardly think this is any user's consideration, just like anyone would be using mongodb or elastic search without think...
Hmm HandsomeGiraffe70
This seems like a bug, let me see what we can do about that
could it be the parent version was created with an older version of clearml sdk ?
Notice both need to be str
btw, if you need the entire folder just use StorageManager.upload_folder
I double checked the code, it's always being passed
but I have no idea what's behind 1, 2 and 3 compared to the first execution
This is why I would think multiple experiments, since it will store all the arguments (and I think these arguments are somehow being lost).
wdyt?