I also tried detection with pip freeze and that didn’t work
SuccessfulKoala55 Actually, just resolved this. I had to allow access from all ports (not secure, but I'll fix that ASAP), but everything seems to be running now. Thanks for the help!
Thanks! Yeah, that sounds good. Is there any way to modify the config aside from using the UI?
AgitatedDove14 Thanks, just got the pipeline to run 🙂 Just one final question: the documentation says not to queue any training/inference tasks into the services queue, so should I be creating a different queue for training tasks, or is using the default queue okay?
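For reference, a minimal sketch of pushing a draft training task into its own dedicated queue instead of services (the task ID and the "training" queue name are placeholders, and this assumes a trains version that exposes Task.enqueue):

from trains import Task

# Fetch the draft task by ID (placeholder) and enqueue it into a
# dedicated training queue, keeping the services queue free for
# long-running service tasks only.
task = Task.get_task(task_id="<draft-task-id>")
Task.enqueue(task, queue_name="training")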
The path for setting up trains-server on GCP is deploying the server and then configuring it, from my understanding. I'm unable to run trains-init to set it up, as it tells me the command isn't found.
SuccessfulKoala55 At the time, yes, it was v0.15.1. How do I actually launch trains-server from the custom image? Do I use the external IP associated with the VM, or are there other steps I have to take?
The default GKE OS is Container-Optimized OS, so some directories (e.g. /opt) are read-only. I solved the problem by mounting under /var/lib instead. In the future, is Allegro considering supporting deployment of trains-server on GKE?
SuccessfulKoala55 Getting an error with mounting on a read-only file system (Error response from daemon: error while creating mount source path '/opt/trains/logs': mkdir /opt/trains: read-only file system: RunContainerError). Is there any workaround for this?
I did see that, just wanted to confirm that there's no way of deleting artifacts without deleting the experiment as well
AgitatedDove14 So I ran a cell to create the task and execute it remotely. After execution, the cell shows an error: NameError: name 'exit' is not defined. Trains does store the task as a draft, but when I try to execute the task via trains-agent, it aborts in the middle of installing the dependencies.
The notebook is contained within a git repo. I have the Jupyter notebook pulled up and am running the cell that creates the task and executes it remotely. The draft task shows a git repo, uncommitted changes (which contain all of the cells of the notebook), and installed packages.
Yeah, I’m trying to connect to my own server, and I installed trains-agent on a VM. How could I diagnose the issue?
I ran ping <ip-here>:8008 and it returned: ping: <ip-here>:8008: Name or service not known
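Side note: ping works at the ICMP level and ignores the :8008 part entirely, which is why it fails to resolve the name. A minimal sketch for checking whether the server's TCP port itself is reachable (the host below is a placeholder for your trains-server address):

import socket

# Try to open a TCP connection to the trains-server API port.
# Replace the placeholder host with the actual server IP.
host, port = "<trains-server-ip>", 8008
try:
    with socket.create_connection((host, port), timeout=5):
        print("port is reachable")
except OSError as err:
    print("connection failed:", err)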
Yeah, the command is executed and nothing is returned.
The VM IP is 153.73.X.Y and the trains-server IP is 34.74.X.Y
This line: “if task.running_locally():” makes sure that when the code is executed by the agent, it will not reset its own requirements (the agent updates the requirements/installed_packages after it installs them from requirements.txt, so that later you know exactly which packages/versions were used)
got it! Thanks for explaining
Let me experiment with force_analyze_entire_repo and I’ll get back to you! It was missing google-cloud-bigquery, an import from a separate file in the repository. Hopefully that should resolve the problem. A flag would be really cool, just in case there’s any problem with the package analysis.
AgitatedDove14 I tried your code snippet, something like this:

task = Task.init(...)
if task.running_locally():
    # wait for the repo detection and requirements update
    task._wait_for_repo_detection()
    # reset requirements
    task._update_requirements(None)
task.execute_remotely(queue="default")
# ... rest of task here ...
It’s able to create a draft successfully with no items in the installed packages section, but it only installed Cython and began the execution immediately...
AgitatedDove14 worked like a charm, thank you!
If you have a few lines for resetting, I would love to have that as an option as well!
task = Task.init(...)
if task.running_locally():
    # wait for the repo detection and requirements update
    task._wait_for_repo_detection()
    # reset requirements
    task._update_requirements(None)
Regarding this, does this work if the task is not running locally and is being executed by the trains agent?
Just reran and got the same results with the force_analyze_entire_repo flag set to false 😞 Still getting the following import error: ImportError: cannot import name 'bigquery' from 'google.cloud' (unknown location)
I was thinking more along the lines of a draft task that has yet to be enqueued (so it can still be edited). The installed packages section grabs most packages except for a few, which is why I am resorting to clearing the packages so that requirements.txt is used. I’ve been using the UI to do it so far, but would like to move the process to a script.
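A minimal sketch of what that script might look like, reusing the private _update_requirements helper from the snippet above (it's an assumption that it behaves the same on a fetched draft task, and being a private method it may change between versions; the task ID is a placeholder):

from trains import Task

# Fetch the existing draft task by its ID (placeholder) and clear the
# recorded packages, so the agent falls back to the repo's
# requirements.txt when the task is enqueued.
draft = Task.get_task(task_id="<draft-task-id>")
draft._update_requirements(None)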
Since the package requirements vary from task to task, setting it up in trains.conf might be a little tedious to go back and change. Most of the time, the package analysis has worked very well, but in this particular use case it failed to detect. I think a Task.init flag would be great!
It was set to true earlier; I changed it to false to see if there would be any difference, but it doesn’t seem like there is.
AgitatedDove14 Yes, I understand that, but as I mentioned earlier, I don't want to have to edit installed_packages manually, as others will also be running the same scripts from their own local development machines. This is why I was inquiring about the requirements.txt file, so that this manual editing would not have to be done by each person running the same scripts.
Additionally, since I'm running a pipeline, would I have to execute each task in the pipeline locally (but still...