` name: XXXXXXXXXX
on:
  workflow_dispatch:
jobs:
  test-monthly-predictions:
    runs-on: self-hosted
    env:
      DATA_DIR: ${{ secrets.RUNNER_DATA_DIR }}
      GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.RUNNER_CREDS }}
    steps:
      # Checkout
      - name: Check out repository code
        uses: actions/checkout@v2
      # Set up Python environment
      - name: Set up Python environment using Poetry
        run: |
          /home/elior/.poetry/bin/poetry env use python3.9
          ... `
SuccessfulKoala55 seems like you got it spot on, it contains the entire repo, but no .git directory
So what can we do about it? All I want is to create templates for some tasks, so I can later execute them through a PipelineController
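For reference, a minimal sketch of that flow, assuming a recent clearml version (1.1+) and an existing draft experiment to serve as the template; all project/task/queue names here are hypothetical:
` from clearml.automation import PipelineController

# The controller clones each template task and enqueues the clone for execution.
pipe = PipelineController(name="my-pipeline", project="examples", version="0.1")

# "train-template" is an existing draft task acting as the template.
pipe.add_step(
    name="stage_train",
    base_task_project="examples",
    base_task_name="train-template",
)

pipe.start(queue="default") `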
I mean, I barely have 20 experiments
I get this
` [ec2-user@ip-10-0-0-95 ~]$ docker-compose down
WARNING: The TRAINS_HOST_IP variable is not set. Defaulting to a blank string.
WARNING: The TRAINS_AGENT_GIT_USER variable is not set. Defaulting to a blank string.
WARNING: The TRAINS_AGENT_GIT_PASS variable is not set. Defaulting to a blank string.
ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it running?
If it's at a non-standard location, specify the URL with the DOCKER_HOST environment variable. `
SuccessfulKoala55 AppetizingMouse58
` [ec2-user@ip-10-0-0-95 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        3.9G     0  3.9G   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           3.9G  880K  3.9G   1% /run
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1  8.0G  6.5G  1.5G  82% /
tmpfs           790M     0  790M   0% /run/user/1000 `
Increased to 20 GB, let's see how long it will last 🙂
what should I paste here to diagnose it?
I guess the AMI auto-updated
This error just keeps coming back... I already set the watermarks to something like 0.5 GB
Now I see the watermarks are 2 GB
(it works now, with 20 GB)
Also, being able to separate their configuration files would be good (maybe there is a way and I don't know it?)
Could be. My point is that, in general, the ability to attach a named scalar (without an iteration/series dimension) to an experiment is valuable and basic when you want to track a metric across different experiments
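A common workaround, sketched here with hypothetical names, is to pin the value to a fixed iteration (e.g. 0) so it behaves like a single named scalar; newer clearml releases also expose Logger.report_single_value(name, value) for exactly this, if your version has it:
` from clearml import Task

task = Task.init(project_name="examples", task_name="summary-metric-demo")
logger = task.get_logger()

# A named scalar pinned to iteration 0, comparable across experiments.
logger.report_scalar(title="summary", series="test_accuracy", value=0.923, iteration=0) `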
That is not very informative
TimelyPenguin76 this fixed it: setting detect_with_pip_freeze to true solves the issue
Is there a way to do so without touching the config, directly through the Task object?
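If I remember correctly, the pip-freeze behaviour can also be forced in code, before Task.init(), without touching clearml.conf; a sketch with hypothetical project/task names:
` from clearml import Task

# Must be called before Task.init(); has the same effect as setting
# sdk.development.detect_with_pip_freeze: true in clearml.conf.
Task.force_requirements_env_freeze()

task = Task.init(project_name="examples", task_name="pip-freeze-demo") `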
Okay so that is a bit complicated
In our setup, the DSes don't really care about agents; the agents are managed by our MLOps team.
So essentially, if you imagine it, the use case looks like this:
A data scientist wants to execute some CPU-heavy task. The MLOps team supplied them with a queue name, and the data scientist knows that when they need something heavy they push it there. The DS doesn't know anything about where it is executed; the execution environment is fully managed by the ML...
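To illustrate the DS side of that flow, a minimal sketch (the queue name is whatever the MLOps team hands out; all names here are hypothetical):
` from clearml import Task

task = Task.init(project_name="examples", task_name="cpu-heavy-job")

# Everything after this call runs on whichever agent serves the "cpu"
# queue; the DS never needs to know where that agent lives.
task.execute_remotely(queue_name="cpu", exit_process=True)

# ... the heavy work below executes remotely ... `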