AgitatedDove14 So I ran a cell to create the task and execute it remotely. The cell (after execution) shows up with an error that says NameError: name 'exit' is not defined. Trains does store the task as a draft, but when I try to execute the task via trains-agent, it aborts in the middle of installing all of the dependencies.
The notebook is contained within a git repo. I have the Jupyter notebook pulled up and am running the cell for creating the task and executing it remotely. In the draft task, there is a git repo, uncommitted changes (which contain all of the cells of the notebook), and installed packages.
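For reference, the remote-execution cell in question looks roughly like this. This is a minimal sketch, not the exact cell: the project/task/queue names are placeholders, and it assumes the trains SDK is available. In a notebook, execute_remotely() stopping the local run is what can surface the NameError on exit mentioned above.

```python
# Minimal sketch of registering a notebook as a task and handing it
# to a trains-agent. Names below are placeholders.
try:
    from trains import Task
except ImportError:  # trains not installed in this environment
    Task = None


def launch_remote(project_name, task_name, queue_name="default"):
    """Register the current script/notebook as a task and enqueue it.

    execute_remotely() stops the local run and hands execution to the
    agent listening on `queue_name`.
    """
    task = Task.init(project_name=project_name, task_name=task_name)
    task.execute_remotely(queue_name=queue_name, exit_process=True)
    return task
```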
Yeah, the command is executed and nothing is returned.
I ran ping <IP HERE>:8008, and it returns ping: <ip-here>:8008: Name or service not known.
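(Side note on that error: ping speaks ICMP and doesn't understand ports, which is why ping <ip>:8008 fails to resolve. A TCP-level check against the actual port is what's needed; a small self-contained sketch, with the address as a placeholder:)

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable
        return False


# Example (placeholder address for the trains-server):
# port_open("<trains-server-ip>", 8008)
```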
The VM ip is 153.73.X.Y and the trains-server IP is 34.74.X.Y
Yeah, I’m trying to connect to my own server, and I installed trains-agent on a VM. How could I diagnose the issue?
Since I'm running a pipeline, does this mean I have to execute each task of the pipeline individually and manually, then uncomment task.execute_remotely() from each task and run it via trains-agent?
AgitatedDove14 Thanks! That resolved a lot of my questions.
Which would mean trains will update the installed packages, no?
Can you explain what is meant by this?
I think what you're trying to say is that, in the UI, the installed packages will be determined through the code via the imports as usual, and users will have to manually clear the installed packages so that requirements.txt is used instead.
AgitatedDove14 Yes, I understand that, but as I mentioned earlier, I don't want to have to edit installed packages manually, as others will also be running the same scripts from their own local development machines. This is why I was inquiring about the requirements.txt file, so that this manual editing would not have to be done by each person running the same scripts.
Additionally, since I'm running a pipeline, would I have to execute each task in the pipeline locally (but still...
I didn't execute the code manually, I executed it from trains-agent to begin with. So to conclude: it has to be executed manually first, then with trains agent?
SuccessfulKoala55 I looked back into my configuration settings for the VM (I already created a VM instance from the custom image yesterday), and that's where I got my external IP from. I checked "Allow HTTP access", but it's giving me a gateway timeout error now and telling me that the server at the IP is unreachable at the moment.
Regarding the trains-server SDK, the end of the documentation for GCP says 'Next step, configuring trains for trains server', where the first step is running the trains-init setup wizard.
SuccessfulKoala55 Actually, just resolved this. I had to allow access from all ports (not secure, but I will fix that ASAP), and everything seems to be running now. Thanks for the help!
Let me look into that and I’ll let you know
Thanks! yeah that sounds good. Is there any way to modify the config aside from using the UI?
SuccessfulKoala55 At the time, yes it was v0.15.1. How do I actually launch trains server from the custom image? Do I use the external IP associated with the VM or are there any other steps I have to take?
Is there a way to do this without manually editing installed packages? I'm trying to create a setup in which others can utilize the same repo and have their venvs be built similarly without having to do much work on their end (i.e. editing the installed packages). As for why it's failing to detect, I'm not sure. The error is:
from google.cloud import bigquery
ImportError: cannot import name 'bigquery' from 'google.cloud' (unknown location)
That's the error I get after the task makes use of the function ...
I mean the config file. Can I change the parameters before executing the draft task? Or do changes to the parameters have to be committed to the git repo before the draft task can be executed?
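(For the parameter side of this question: hyper-parameters of a draft task live on the server, not in the git repo, so they can be changed from a script without a commit. A hedged sketch, where the task ID and parameter names are placeholders and the trains SDK is assumed to be installed:)

```python
# Sketch: editing parameters of a draft (not yet enqueued) task
# from a script. Task ID and parameter names are placeholders.
try:
    from trains import Task
except ImportError:  # trains not installed in this environment
    Task = None


def edit_draft_params(task_id, params):
    """Update hyper-parameters on a draft task.

    Parameter edits are stored server-side, so no git commit is
    needed before the draft is enqueued.
    """
    task = Task.get_task(task_id=task_id)
    task.set_parameters(params)  # e.g. {"batch_size": 64}
```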
AgitatedDove14 Thanks, just got the pipeline to run 🙂 Just one final question: the documentation says not to queue any training/inference tasks into the services queue, so should I be creating a different queue for training tasks, or is using the default queue okay?
The path for setting up trains-server on GCP is deploying the server and then configuring the trains server, from my understanding. I'm unable to run trains-init to set up, as it tells me the command isn't found.
I did see that, just wanted to confirm that there's no way of deleting artifacts without deleting the experiment as well
The default GKE OS is Container-Optimized OS, so some directories (e.g. opt) are read-only. I solved the problem by mounting it in var/libs. In the future, is Allegro considering including deployment of trains-server on GKE?
TimelyPenguin76 thanks! Do you know when the next version will be released?
I was thinking more along the lines of a draft task that has yet to be enqueued (so it can still be edited). The installed packages section grabs most packages except for a few, which is why I am resorting to clearing the packages so that requirements.txt is used. I’ve been using the UI so far to do it, but would like to move the process to a script.
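(Moving that to a script could look roughly like this. This is only a sketch: it assumes the trains SDK exposes update_task on a fetched task, and the exact payload shape for blanking the requirements is an assumption that may differ between SDK versions. The task ID is a placeholder.)

```python
# Sketch: blank out a draft task's "installed packages" section so the
# agent falls back to the repository's requirements.txt.
try:
    from trains import Task
except ImportError:  # trains not installed in this environment
    Task = None


def clear_installed_packages(task_id):
    """Clear the stored pip requirements on a draft task.

    NOTE: the update_task payload below is an assumption; verify the
    field layout against the SDK/server version you are running.
    """
    task = Task.get_task(task_id=task_id)
    task.update_task({"script": {"requirements": {"pip": ""}}})
```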
Let me experiment with force_analyze_entire_repo and I’ll get back to you! It was missing google-cloud-bigquery, an import from a separate file in the repository. Hopefully that should resolve the problem. A flag would be really cool, just in case there's any problem with the package analysis.
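(For reference, that flag sits in the trains.conf configuration file under the sdk.development section, assuming the setting name hasn't changed between versions:)

```
sdk {
  development {
    # Analyze imports across the whole repository,
    # not just the entry-point script.
    force_analyze_entire_repo: true
  }
}
```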